Original scientific paper

https://doi.org/10.31745/s.72.5

Handwritten Text Recognition for Croatian Glagolitic

Achim Rabus (ORCID: orcid.org/0000-0002-5366-1430), University of Freiburg


Full text: Croatian (PDF, 449 KB)

Pages: 181-192

Full text: English (PDF, 449 KB)

Abstract

The paper presents and discusses recent advances in Handwritten Text Recognition (HTR) technologies for handwritten and early printed texts in Croatian Glagolitic script. After elaborating on the general principles of training HTR models with respect to the Transkribus platform used for these experiments, the characteristics of the trained models are discussed. Specifically, the models use the Latin script to transcribe the Glagolitic source. In doing so, they transcribe ligatures and resolve abbreviations correctly in the majority of cases. The computed error rate of the models is below 6%, and real-world performance seems to be similar. Using the models for pre-transcription can save a great amount of time when editing manuscripts, and, thanks to fuzzy search (keyword spotting), even uncorrected HTR transcriptions can be used for various kinds of analysis. The models are publicly available via the Transkribus platform. Every scholar working on Glagolitic manuscripts and early printings is encouraged to use them.

Keywords

Handwritten Text Recognition; Glagolitic script; Digital Humanities; manuscripts; early printings

Hrčak ID: 269768

URI: https://hrcak.srce.hr/269768

Publication date: 31 December 2021
