Collaborative project will train OCR technology to read early modern fonts

Information powerhouse ProQuest is participating in a project that will vastly accelerate research into 15th- through 17th-century cultural history. The company will give the Early Modern OCR Project (eMOP) at Texas A&M access to page images from the venerable Early English Books Online and newcomer Early European Books. eMOP will use the content to create a database of typefaces used in the early modern era, train OCR software to read them, and then apply crowdsourcing to edit the results. The project will turn the rich corpus of works from this pivotal historical period into fully searchable digital documents.

"Digitization of the historical archives of the early modern era made this literature far more accessible. Page images provide scholars with unprecedented access to books that previously could have only been viewed in their source library. However, precision search — the ability to use technology to zero in on very specific text — has been hampered by the fact that OCR technology can't read the peculiarities of early printing," said Mary Sauer-Games, ProQuest vice-president, publishing. "We're thrilled to participate in an effort that we feel will drive new levels of historical discovery. We love the application of modern ingenuity to turn these very old archives into works that are as searchable as text that was born digital."

ProQuest has played a key worldwide role in preservation of and access to early modern history, ensuring the survival of printed works from as early as 1450. In the 1930s, the company became a pioneer of microfilm when it filmed the contents of the vast archives of the British Library and other major libraries across England, capturing virtually every English-language book printed in the 15th, 16th and 17th centuries. The microfilm collection, ProQuest's flagship Early English Books, opened these works to global study and created an avenue for preservation. It has since become the quintessential collection for study of the early modern era.

In the 1990s, ProQuest began a massive effort to capture the collection digitally. Early English Books Online enables scholars to manage, share and collaborate on their research virtually. The company even created a social network that allows the scholars who use the collection as a base for their research to connect with each other.

Then, early in the 21st century, ProQuest expanded the program to include major European libraries, launching Early European Books with the Danish Royal Library in Copenhagen and the Biblioteca Nazionale Centrale di Firenze in Italy. Digitization projects are also underway with the U.K.'s famed scientific and medical library — The Wellcome — and the National Library of the Netherlands.

eMOP is led by Texas A&M Professors Laura Mandell, Director of the Initiative for Digital Humanities, Media, and Culture (IDHMC), Ricardo Gutierrez-Osuna of Computer Science, and Richard Furuta, Director of the Center for the Study of Digital Libraries (CSDL), along with Anton DuPlessis and Todd Samuelson, book historians from Cushing Rare Books Library. The scholars earned a two-year, $734,000 development grant from the Andrew W. Mellon Foundation to support the work. ProQuest is one of a variety of participating publishers and software organizations that are collaborating on the project.

To learn more about eMOP visit For more information about ProQuest's role in access to and preservation of the world's knowledge, visit

About ProQuest
ProQuest connects people with vetted, reliable information. Key to serious research, the company has forged a 70-year reputation as a gateway to the world's knowledge — from dissertations to governmental and cultural archives to news, in all its forms. Its role is essential to libraries and other organizations whose missions depend on the delivery of complete, trustworthy information.

ProQuest's massive information pool is made accessible in research environments that accelerate productivity, empowering users to discover, create, and share knowledge.

An energetic, fast-growing organization, ProQuest includes the ProQuest®, Bowker®, Dialog®, ebrary®, and Serials Solutions® businesses and notable research tools such as the RefWorks® and Pivot services, as well as its Summon® web-scale discovery service. The company is headquartered in Ann Arbor, Michigan, with offices around the world.

06 November 2012

ProQuest Press Team

Beth Dempsey
Public Relations Manager
(734) 707-2665