
Review article

The Problems of Marking of Foreign Language Elements within the TEI Standard

Vuk-Tadija Barbarić (ORCID: orcid.org/0000-0003-1001-437X); Institut za hrvatski jezik i jezikoslovlje (Institute of Croatian Language and Linguistics)
Antun Halonja; Institut za hrvatski jezik i jezikoslovlje (Institute of Croatian Language and Linguistics)


Full text: Croatian (PDF, 213 KB)

Pages 1-17




Abstract

The Croatian Language Corpus is being compiled within the Croatian Language Repository project of the Institute of Croatian Language and Linguistics. It consists of a selection of texts on various subjects and in various genres of Croatian. Its written sources range from the period in which the Croatian standard language had been more or less definitively formed, i.e. the second half of the 19th century, to the present day.
In their paper, the authors focus on the problem of recognizing and marking foreign language elements in the texts being prepared for the Croatian Language Corpus, which are encoded in the XML markup language according to the TEI standard. They pay particular attention to the possibilities of applying the element <foreign> and the global attribute xml:lang.
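A minimal Python sketch of what such marking can look like, and how marked elements might later be retrieved (for instance for lexicographic work), is given below. It assumes TEI P5 markup in the standard TEI namespace with Croatian (`hr`) as the base language; the sample sentence and the function name `foreign_spans` are hypothetical and do not come from the authors' corpus or tools. It uses only the standard-library `xml.etree.ElementTree`, in which `xml:lang` is exposed under the reserved XML namespace URI.

```python
import xml.etree.ElementTree as ET

# Namespace URIs: TEI P5, and the reserved XML namespace that xml:lang belongs to.
TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
FOREIGN = f"{{{TEI_NS}}}foreign"

# Hypothetical sample text (invented for illustration, not from the corpus).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text xml:lang="hr">
    <body>
      <p>Bio je to pravi <foreign xml:lang="fr">tour de force</foreign>,
         a publika je vikala <foreign xml:lang="it">bravo</foreign>.</p>
      <p xml:lang="de">Ende gut, alles gut.</p>
    </body>
  </text>
</TEI>"""


def foreign_spans(xml_text, default_lang="hr"):
    """Return (language, text) pairs, in document order, for every element
    whose xml:lang differs from the assumed base language of the document."""
    root = ET.fromstring(xml_text)
    spans = []
    for elem in root.iter():
        lang = elem.get(XML_LANG)
        if elem.tag == FOREIGN and lang is None:
            # <foreign> without xml:lang: flag it as undetermined (ISO 639 "und")
            lang = "und"
        if lang and lang != default_lang:
            # Gather all text inside the element and normalize whitespace
            text = " ".join("".join(elem.itertext()).split())
            spans.append((lang, text))
    return spans


if __name__ == "__main__":
    for lang, text in foreign_spans(SAMPLE):
        print(f"{lang}\t{text}")
```

Here both mechanisms appear side by side: short insertions are wrapped in `<foreign>`, while a whole paragraph in another language simply carries `xml:lang` on an existing element. Treating a `<foreign>` element that lacks `xml:lang` as `und` is one possible convention, chosen so that unlabeled foreign material is flagged rather than silently dropped.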
Since unified criteria for marking foreign language elements are needed, guidelines for solving this problem have to be devised. These must take into account both the usefulness of such a corpus for future linguistic research (e.g. the compilation of dictionaries) and practical constraints, i.e. the ratio of effort invested to results obtained. Regardless of the practical value of this work, a theoretical question must also be posed: which language elements in a text are foreign? This question is relevant for any corpus and, indeed, for any individual text.
The authors identify the basic problems and provide a broad, applicable theoretical and practical framework, based on the concept of code-switching, for identifying and labeling foreign elements in the corpus.

Keywords

corpus; Croatian Language Corpus; foreign language elements; TEI standard; marking

Hrčak ID:

98048

URI

https://hrcak.srce.hr/98048

Publication date:

28.1.2013.

Article data in other languages: Croatian
