Filologija, No. 58, 2012.
Review article
The Problems of Marking of Foreign Language Elements within the TEI Standard
Vuk-Tadija Barbarić
orcid.org/0000-0003-1001-437X
Institut za hrvatski jezik i jezikoslovlje
Antun Halonja
Institut za hrvatski jezik i jezikoslovlje
Abstract
The Croatian Language Corpus is being compiled as part of the Croatian Language Repository project of the Institute of Croatian Language and Linguistics. It comprises a selection of Croatian texts covering various subject matters and genres. The written sources range from the period in which the Croatian language standard was more or less definitively formed, i.e. the second half of the 19th century, to contemporary sources.
In this paper the authors focus on the problem of recognizing and marking foreign language elements in texts being prepared for the Croatian Language Corpus, using the XML markup language within the TEI standard. They pay particular attention to the possibilities of applying the relevant element together with the global attribute xml:lang.
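As an illustration of the kind of marking discussed here, the following minimal sketch shows a foreign-language phrase tagged with xml:lang and then retrieved from the text. It assumes the TEI `<foreign>` element, which the abstract does not name explicitly. The sample sentence, the constant names and the helper `foreign_spans` are hypothetical and are not taken from the Croatian Language Corpus.

```python
import xml.etree.ElementTree as ET

# TEI namespace and the expanded name of the global xml:lang attribute.
TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample: a Croatian paragraph containing a Latin phrase.
# The use of <foreign> here is an assumption made for illustration.
SAMPLE = (
    '<p xmlns="http://www.tei-c.org/ns/1.0" xml:lang="hr">'
    'Rekao je <foreign xml:lang="la">in medias res</foreign> i nastavio.'
    "</p>"
)


def foreign_spans(tei_xml):
    """Return (language code, text) pairs for every <foreign> element."""
    root = ET.fromstring(tei_xml)
    spans = []
    for el in root.iter(f"{{{TEI_NS}}}foreign"):
        # Join nested text so that inner markup does not split the phrase.
        text = "".join(el.itertext()).strip()
        spans.append((el.get(XML_LANG), text))
    return spans


print(foreign_spans(SAMPLE))  # [('la', 'in medias res')]
```

A query of this kind presupposes that a decision has already been made about which elements count as foreign, which is the theoretical question the paper raises.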
Since the need has arisen for unified criteria for marking foreign language elements, guidelines for solving this problem have to be devised. These guidelines must take into account both the usefulness of such a corpus for future linguistic research (e.g. the compilation of dictionaries) and what is realistically achievable, i.e. the input/output ratio. Regardless of the practical value of this work, it is necessary to pose a theoretical question: which language elements in a text are foreign? This question is relevant for any corpus and for any particular text.
The authors identify the basic problems and provide a broad, applicable theoretical and practical framework, based on the concept of code-switching, for identifying and labeling foreign elements in the corpus.
Keywords
corpus; Croatian Language Corpus; foreign language elements; TEI standard; marking
Hrčak ID:
98048
Publication date:
28.1.2013.