COVID-19 has quickly become a decade-defining issue – and analyzing ProQuest’s newspaper content with text and data mining (TDM) can help shed light on how the world has reacted to it.
Princeton University researcher Gavin Cook explains how he tackled this project in a new case study, Trends in COVID-19 News Coverage: An Introduction to Topic Models Using Text and Data Mining.
“How the media has covered COVID-19 is an empirical question – one we can get a more accurate picture of with statistical techniques for natural language processing,” said Cook. “The detail of human analysis is impossible to replicate with machines, but we can supplement human depth with machine breadth. TDM lets us process many thousands more documents than any human could at once.”
Cook used TDM Studio, ProQuest’s new text and data mining solution, to analyze a ProQuest dataset that includes nearly a million news articles related to the coronavirus. As he noted, this is something no human could do without the assistance of technology.
Using the results of the analysis, Cook created a series of “word clouds,” which offer visually powerful insights into the trends we’re seeing in COVID-19 news coverage.
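For readers curious about what sits behind a word cloud, here is a minimal sketch in Python. It is not Cook's method or TDM Studio's API. It assumes a word cloud is driven by simple word counts, whereas the case study uses topic models. The `word_weights` function, the stop-word list and the sample headlines are all hypothetical and made up for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "as"}

def word_weights(documents, top_n=5):
    """Count non-stop-word tokens across all documents and return the top_n (word, count) pairs."""
    counts = Counter()
    for text in documents:
        # Lowercase the text and keep runs of letters, digits, apostrophes and hyphens.
        tokens = re.findall(r"[a-z0-9'-]+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(top_n)

# Made-up sample headlines, not taken from the ProQuest dataset.
sample = [
    "Hospitals prepare for a surge in coronavirus cases",
    "Coronavirus cases rise as testing expands",
    "Schools close as coronavirus spreads",
]
weights = word_weights(sample, top_n=3)
print(weights)
```

In a word cloud, each word's count would set its size, so the most frequent terms stand out.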
TDM Studio gives researchers like Cook fast access to current news and scholarly content by significantly reducing the time and effort needed for up-front data collection and formatting. This enables researchers to quickly reveal relationships, patterns and connections within and between datasets from a variety of sources, including current and historical ProQuest content.