Technical gazette, Vol. 24 No. 2, 2017.
Preliminary communication
https://doi.org/10.17559/TV-20150831012553
Document similarity in repeatedly translated corpora
Vladimir Mateljan; University of Zagreb, Faculty of Humanities and Social Sciences, Ivana Lučića 3, 10000 Zagreb, Croatia
Vedran Juričić; University of Zagreb, Faculty of Humanities and Social Sciences, Ivana Lučića 3, 10000 Zagreb, Croatia
Dario Ogrizović; University of Rijeka, Faculty of Maritime Studies, Studentska 2, 51000 Rijeka, Croatia
Abstract
The paper analyses how the relationships between documents in a textual corpus change when the corpus is translated into another language. The authors analysed the similarities between documents in the original corpus, in Croatian, and compared them with the similarities between the corresponding documents in the translated corpus, in English. The changes were analysed using two measures: the P-value of the chi-square test and a newly proposed measure, the correction coefficient.
Keywords
analysis; document similarity; multilingual; translated corpus; translation
Hrčak ID: 179882
Publication date: 14.4.2017.