Professional paper

https://doi.org/10.31724/rihjj.46.2.32

A Web Tool for Managing Material for SASA Dictionary and the Annotation of Lexicographic Card Files

Rada Stijović; Institute for the Serbian Language of SASA
Ranka Stanković; University of Belgrade, Faculty of Mining and Geology
Mihailo Škorić; University of Belgrade, Faculty of Mining and Geology


Full text: serbian pdf 1.571 Kb

pages 1085-1094


Abstract

The material for the Dictionary of the Serbo-Croatian Standard and Vernacular Language was collected over 160 years and is recorded on roughly 5,000,000 lexicographic citation cards. It was manually excerpted from over 4,500 written sources and collected in the field across all pronunciations of the Štokavian dialect. At least 15 new volumes of the dictionary are planned on the basis of these card files. The card files can also serve as a basis for phonetic, morphological, and syntactic research, for analysing language development over the past two centuries, for studying dialectal features from the time the collections were created (often the only data on the speech of a region at that time), and for etymological studies. The material's cultural value is also exceptional, as it includes contributions from many illustrious figures in Serbian cultural history, among them Jovan Jovanović Zmaj, Jovan Skerlić, Radoje Domanović, Isidora Sekulić, and Milan Rešetar.
This precious, delicate material was scanned between 2016 and 2018, and in 2017 a web application was developed for efficient annotation of the electronic cards. It was subsequently enhanced in response to user needs: in addition to constricted annotation, in which only card headwords are marked, it supports a more detailed annotation that records the dictionary entry form, homonym tag, attestation and bibliographic reference, the abbreviation used in the dictionary, and card type (handwritten or typed).
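The two annotation levels described above can be pictured as a single record in which only the headword is mandatory. The following is a minimal sketch under stated assumptions, not the tool's actual data model: the class name, field names, and the sample headword are hypothetical and chosen only to mirror the fields listed in the abstract.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class CardType(Enum):
    """Card types named in the abstract."""
    HANDWRITTEN = "handwritten"
    TYPED = "typed"


@dataclass
class CardAnnotation:
    """Hypothetical record for one scanned card (illustrative field names)."""
    headword: str                                   # constricted annotation: headword only
    entry_form: Optional[str] = None                # dictionary entry form
    homonym_tag: Optional[str] = None               # distinguishes homonymous headwords
    attestation: Optional[str] = None               # quoted example on the card
    bibliographic_reference: Optional[str] = None   # source of the attestation
    source_abbreviation: Optional[str] = None       # abbreviation used in the dictionary
    card_type: Optional[CardType] = None

    def is_detailed(self) -> bool:
        """Return True if any field beyond the headword has been filled in."""
        extra_fields = (
            self.entry_form,
            self.homonym_tag,
            self.attestation,
            self.bibliographic_reference,
            self.source_abbreviation,
            self.card_type,
        )
        return any(value is not None for value in extra_fields)


# Illustrative values only; "reka" is a made-up sample, not taken from the card files.
constricted = CardAnnotation(headword="reka")
detailed = CardAnnotation(headword="reka", card_type=CardType.HANDWRITTEN)
```

Keeping every field except the headword optional lets one record type serve both levels, so a card annotated at the constricted level could later be enriched without changing its structure.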
This paper presents the web tool and the annotation results for the letters P to Š. So far, 12 annotators have worked on the annotation, with 3-5 annotating simultaneously at any given time and with varying intensity. Of the 813 sections containing 2,010,508 paper slips, 795 sections containing 1,934,583 paper slips have been processed with constricted annotation, yielding 201,487 distinct headwords. The annotation results will provide an estimate of the number of words remaining for the SASA Dictionary, together with a headword list.
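The coverage implied by these figures can be worked out directly. The short sketch below uses only the counts quoted in the abstract; the variable names are illustrative, and the slips-per-headword ratio is a simple average computed here, not a figure reported by the authors.

```python
# Counts quoted in the abstract (letters P to Š).
total_sections, done_sections = 813, 795
total_slips, done_slips = 2_010_508, 1_934_583
distinct_headwords = 201_487

# Share of sections and slips already processed with constricted annotation.
section_coverage = done_sections / total_sections
slip_coverage = done_slips / total_slips

# What is still left to annotate.
remaining_sections = total_sections - done_sections
remaining_slips = total_slips - done_slips

# Average number of processed slips per distinct headword (derived, not reported).
slips_per_headword = done_slips / distinct_headwords

print(f"Sections processed: {section_coverage:.1%} ({remaining_sections} remaining)")
print(f"Slips processed:    {slip_coverage:.1%} ({remaining_slips:,} remaining)")
print(f"Slips per headword: {slips_per_headword:.1f}")
```

Under these figures, roughly 97.8% of sections and 96.2% of slips have been processed, leaving 18 sections and 75,925 slips, with about 9.6 slips per distinct headword on average.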

Keywords

lexicographic material; lexicographic tool; card files; digitization; annotation

Hrčak ID:

245484

URI

https://hrcak.srce.hr/245484

Publication date:

30.10.2020.

