Identification of translational equivalents in Croatian-English parallel corpus

Tadić, Marko; Šojat, Krešimir

Filologija, No. 38-39, 2002.

Original scientific paper

Marko Tadić ; Odsjek za lingvistiku Filozofski fakultet Sveučilišta u Zagrebu, Zagreb
Krešimir Šojat ; Zavod za lingvistiku Filozofski fakultet Sveučilišta u Zagrebu, Zagreb

Full text: in Croatian (PDF, 6.749 Kb)

Pages: 247-262

Abstract

This contribution investigates the possibilities for identifying translational equivalents (TEs) in a Croatian-English parallel corpus aligned at the sentence level and collected at the Institute of Linguistics, Faculty of Philosophy, University of Zagreb. First, TEs between single words are identified by generating all possible word pairs, with the first word of each pair taken from the source language and the second from the target language. Only sentences with 1:1 alignment were included in the processing. The statistical measure of mutual information (MI) was applied to the generated word pairs, yielding the statistically relevant co-occurrences. Pairs with a high MI value are considered good TE candidates. In the second part of the paper, multi-word units (MWUs; here only those with two elements) are identified by applying the same statistical measure in both the source (Croatian) and the target (English) language. MI was then computed for pairs of word pairs, giving possible candidates for translational patterns. High MI values revealed pairs of words in the source language that were regularly translated by a fixed pair of words in the target language, although the MI values for the monolingual pairs in each language were extremely low. The contribution aims to show how statistical methods in parallel corpus processing can facilitate the detection of collocations (possible multi-word terms) and their TEs. At the same time, corresponding co-textual examples of word usage are provided in both the source and the target language. This is relevant to multilingual lexicographers as dictionary writers and to translators as the most important group of dictionary users.
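
The pairing and scoring procedure summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact MI formulation, the tokenization, or any frequency threshold, so the code assumes pointwise mutual information estimated over sentence pairs, with each unit counted at most once per sentence. The function names `mi_scores` and `bigrams` and the four-sentence toy corpus are invented for illustration and do not come from the paper's data.

```python
import math
from collections import Counter
from itertools import product


def mi_scores(aligned_units, min_count=1):
    """Pointwise MI for every source/target unit pair that co-occurs.

    aligned_units: list of (source_units, target_units) tuples, one per
    1:1 aligned sentence pair. Probabilities are estimated as the share
    of sentence pairs containing a unit (an assumption of this sketch).
    """
    n = len(aligned_units)
    src_freq, tgt_freq, pair_freq = Counter(), Counter(), Counter()
    for src, tgt in aligned_units:
        src_set, tgt_set = set(src), set(tgt)  # count once per sentence
        src_freq.update(src_set)
        tgt_freq.update(tgt_set)
        # Generate all possible pairs: first element from the source
        # sentence, second element from the target sentence.
        pair_freq.update(product(src_set, tgt_set))
    scores = {}
    for (s, t), c in pair_freq.items():
        if c >= min_count:
            # log2( P(s,t) / (P(s) * P(t)) ) with P = count / n
            scores[(s, t)] = math.log2(c * n / (src_freq[s] * tgt_freq[t]))
    return scores


def bigrams(tokens):
    """Adjacent two-element units, standing in for 2-word MWUs."""
    return list(zip(tokens, tokens[1:]))


# Invented toy corpus of 1:1 aligned sentences (Croatian, English).
corpus = [
    ("vlada je donijela odluku", "the government made a decision"),
    ("vlada je odbila prijedlog", "the government rejected the proposal"),
    ("sud je donio odluku", "the court made a decision"),
    ("sud je odbio prijedlog", "the court rejected the proposal"),
]
tokenized = [(hr.split(), en.split()) for hr, en in corpus]

# Single-word TE candidates.
word_scores = mi_scores(tokenized)

# Candidates for pairs of word pairs: the same measure over bigrams.
pair_scores = mi_scores([(bigrams(hr), bigrams(en)) for hr, en in tokenized])

# List the highest-scoring single-word candidates first.
for pair, score in sorted(word_scores.items(), key=lambda kv: -kv[1])[:5]:
    print(pair, round(score, 2))
```

In this toy data a content word and its translation score higher than the same word paired with a function word such as "the", which occurs in every sentence. Real corpora would also need a minimum frequency threshold (the hypothetical `min_count` parameter here), because MI tends to overrate pairs of rare words.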

Keywords

Croatian-English parallel corpus; multi-word units; translational equivalents; word alignment; mutual information

Hrčak ID:

173315

URI

https://hrcak.srce.hr/173315

Publication date:

20 May 2002
