
In the last two decades, mass digitization has dramatically changed the landscape of scholarly research. The ability to search digital transcriptions of sources for specific keywords saves valuable time, and scholars are no longer confined to archives and libraries if they wish to comb through a text.
However, with the spread of digital transcriptions come new concerns surrounding the labor required to enable such accessibility. A new article in The Sixteenth Century Journal suggests methods for researchers to obtain transcriptions of digitized early modern sources while also avoiding unethical labor practices.
“Unlocking the Digitized Archive of Early Modern Print: The Automatic Transcription of Early Modern Printed Books,” by authors Serena Strecker and Kimberly Lifton, begins with a brief history of the two kinds of software used to produce transcriptions. Optical Character Recognition (OCR) software has proven itself well-suited to transcribing late 19th-century and 20th-century works, but the irregularities common in early modern print render OCR inadequate for reliable transcription of these sources.
Instead, early modern scholars have turned to Handwritten Text Recognition (HTR) technology. Transkribus, the leading HTR software, lets users either apply publicly available transcription models or train their own. Strecker and Lifton compare various HTR models on a selection of pages from four 16th-century exempla collections and highlight how Transkribus allows a scholar to create, in five basic steps, a purpose-built transcription model tailored to a particular source.
Using Transkribus’s public models, researchers can generate the training data necessary to train their own highly accurate models. This process, the authors argue, makes it “no longer necessary nor desirable” to rely on outsourced labor, such as the labor of graduate students or workers in the Global South.
“With the accurate and automated transcription of early modern print no longer a goal but a reality, the field of early modern studies must consider what combination of human labor and machine learning technology will be accepted, supported, and will ultimately shape the future of research,” the authors conclude.
“Only by insisting on ethical labor practices can scholars avoid either exacerbating inequities within the academic hierarchy or perpetuating the lasting inequalities of colonialism.”
More information: Serena Strecker et al., “Unlocking the Digitized Archive of Early Modern Print: The Automatic Transcription of Early Modern Printed Books,” The Sixteenth Century Journal (2025). DOI: 10.1086/735052
Provided by University of Chicago
Citation: Researchers explore machine learning to automate early modern text transcription ethically (2025, July 18), retrieved 18 July 2025 from https://phys.org/news/2025-07-explore-machine-automate-early-modern.html