Original scientific paper

https://doi.org/10.31745/s.72.5

Handwritten Text Recognition for Croatian Glagolitic

Achim Rabus (ORCID: orcid.org/0000-0002-5366-1430), University of Freiburg


Full text: Croatian, PDF (449 KB), pp. 181-192

Full text: English, PDF (449 KB), pp. 181-192


Abstract

The paper presents and discusses recent advances in Handwritten Text Recognition (HTR) technologies for handwritten and early printed texts in Croatian Glagolitic script. After elaborating on the general principles of training HTR models with respect to the Transkribus platform used for these experiments, the paper discusses the characteristics of the trained models. Specifically, the models use the Latin script to transcribe the Glagolitic source. In doing so, they transcribe ligatures and resolve abbreviations correctly in the majority of cases. The computed error rate of the models is below 6%, and real-world performance seems to be similar. Using the models for pre-transcription can save a great deal of time when editing manuscripts and, thanks to fuzzy search (keyword spotting), even uncorrected HTR transcriptions can be used for various kinds of analysis. The models are publicly available via the Transkribus platform. Every scholar working on Glagolitic manuscripts and early printings is encouraged to use them.
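
To make the reported error rate concrete, the sketch below shows one common way such a figure is computed. It assumes the rate is a character error rate (CER), i.e. the edit distance between a reference transcription and the HTR output divided by the reference length; the abstract does not state the exact metric. The function names and the sample strings are hypothetical and are not taken from the paper's data or from Transkribus.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference characters yields the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference character
                curr[j - 1] + 1,     # insertion of a hypothesis character
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = edit distance / reference length (assumed metric, see lead-in)."""
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(ref, hyp) / len(ref)


# Made-up example: one wrong character in a 17-character reference line.
reference = "v ime otca i sina"
hypothesis = "v ime otea i sina"
cer = character_error_rate(reference, hypothesis)
print(f"CER: {cer:.2%}")
```

Under this assumption, an error rate below 6% corresponds to roughly one wrong character per 17 or more characters of reference text, as in the made-up line above.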

Keywords

Handwritten Text Recognition; Glagolitic script; Digital Humanities; manuscripts; early printings

Hrčak ID:

269768

URI

https://hrcak.srce.hr/269768

Publication date:

31 December 2021
