Original scientific paper
https://doi.org/10.31745/s.72.5
Handwritten Text Recognition for Croatian Glagolitic
Achim Rabus
orcid.org/0000-0002-5366-1430
University of Freiburg
Abstract
The paper presents and discusses recent advances in Handwritten Text Recognition (HTR) technologies for handwritten and early printed texts in Croatian Glagolitic script. After elaborating on the general principles of training HTR models on the Transkribus platform, which was used for these experiments, the paper discusses the characteristics of the trained models. Specifically, the models use the Latin script to transcribe the Glagolitic source. In doing so, they transcribe ligatures and resolve abbreviations correctly in the majority of cases. The computed error rate of the models is below 6%, and real-world performance seems to be similar. Using the models for pre-transcription can save a great deal of time when editing manuscripts and, thanks to fuzzy search (keyword spotting), even uncorrected HTR transcriptions can be used for various kinds of analysis. The models are publicly available via the Transkribus platform. Every scholar working on Glagolitic manuscripts and early printings is encouraged to use them.
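To make the reported error rate concrete, the following is a minimal sketch of how a character error rate is commonly computed for HTR output: the edit distance between a reference transcription and the model output, divided by the reference length. It assumes that the abstract's "error rate" is a character-level measure; the abstract does not say how the figure was calculated. The function names and the sample strings are hypothetical illustrations, not data or code from the paper or from Transkribus.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    # Keep the shorter string in the inner loop to limit row size.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the length of the reference text."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical strings, invented for illustration only.
    reference = "v ime otca i sina"
    hypothesis = "v ime otca i sima"
    print(f"CER: {character_error_rate(reference, hypothesis):.2%}")
```

In this invented example, one wrong character in a 17-character line gives a rate just under 6%, which conveys the order of magnitude reported above: roughly one error per 17 characters or better.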
Keywords
Handwritten Text Recognition; Glagolitic script; Digital Humanities; manuscripts; early printings
Hrčak ID:
269768
Publication date:
31.12.2021.