Intuiting ProQuest’s New Research Tool

By Bill Kovarik

You never start a story without research. It’s standard practice.

In the newsroom, you always check the library (“the morgue”) and look up old clips about the subject you are working on.

Until recently, journalism students couldn’t do this initial research at a newspaper; these private libraries were not usually open to students.

Research in a standard library usually involved a two-step process: first, a check of a printed index, such as the shelf full of indices for The New York Times, and then a tedious run through miles of microfilm or microfiche.

In the past decade or so, as Lexis-Nexis became more affordable for universities, it became easier to look up recent articles. But a full history still meant a long, tedious slog through indices and microfilm.

Recently, a new kind of online database became available. In July 2002 ProQuest (formerly Bell & Howell) began offering The New York Times’ entire backfile from 1851 to 1999. The backfile has 3 million pages and more than 25 million articles covering 148 years of history. In 2003 ProQuest added The Washington Post. Other papers, including The Wall Street Journal, the Chicago Tribune and the Los Angeles Times, are also available.

A search using Boolean operators and date limits returns a list of possible articles, just like any other database, but the full text is returned as a PDF file. The full page, showing where the article was positioned, is also available as a lower-resolution PDF. Although the result is not cut-and-paste text, its value to researchers is that pages look exactly as they did when they were printed, so omissions or alterations to the material are unlikely.

These databases are also an improvement in that they are significantly more comprehensive than the old printed indices. For instance, a search of The New York Times for articles on air pollution and smoke nuisance during the Progressive era turned up many more hits in the ProQuest database than the Times’ original printed index referenced.

As an experimental teaching assignment, and to teach students how to use these databases, I asked members of a media history class to help me find articles for an SEJournal column on environmental coverage 10, 25, 50, 75, 100 and 150 years ago. The assignment was called “Help your professor make his deadline for the SEJournal.”

The experiment was OK but not entirely successful. About half the students attempted the extra-credit assignment, and of these, only about two-thirds were able to follow instructions and e-mail back four or more PDF files along with a summary.

Part of the problem was conceptual: most students did not believe there had been any news coverage of environmental issues before the 1970s, so the assignment was a challenge.

Part of the problem was semantic. Although students were instructed to use alternative search terms (sometimes specific ones), much of what is now considered “environmental” fell into different categories in years past, and some students gave up in frustration. One student, for example, could not find any information about endangered species from the 1900s to the 1920s, but she did find one article on buffalo extinction, which she offered very tentatively as possibly not meeting the requirements.

Students said finding articles and text on the environment in earlier years was difficult. “I would assume that much of this is due in part to the lack of concern for things such as forest conservation until more recent years,” said one. “Assume,” of course, is the operative word here, because there are literally thousands of articles on forest conservation in the ProQuest papers from the 1880s through the 1950s.

A link might be made between scientific research and historical research in that both often depend on the interplay of deductive and inductive approaches to find the truth.

Bill Kovarik is a professor in the department of media studies at Radford University and an SEJ board member. This report originally appeared in the SEJournal.