Original scientific paper
https://doi.org/10.2498/cit.1001917
Statistical Machine Translation of Croatian Weather Forecasts: How Much Data Do We Need?
Nikola Ljubešić; Department of Information Sciences, Faculty of Humanities and Social Sciences, University of Zagreb, Croatia
Petra Bago; Department of Information Sciences, Faculty of Humanities and Social Sciences, University of Zagreb, Croatia
Damir Boras; Department of Information Sciences, Faculty of Humanities and Social Sciences, University of Zagreb, Croatia
Abstract
This research is the first step towards developing a system
for translating Croatian weather forecasts into multiple
languages. This step deals with the Croatian-English
language pair. The parallel corpus is a one-year sample
of weather forecasts for the Adriatic, comprising
7,893 sentence pairs. Evaluation is performed
with the automatic evaluation measures BLEU, NIST and
METEOR, as well as by manually evaluating a sample of
200 translations. We have shown that with a small
training set and the state-of-the-art Moses system, decoding
can be done with 96% accuracy concerning adequacy
and fluency. Additional improvement is expected from
increasing the training set size. Finally, the correlation
between the recorded evaluation measures is explored.
Keywords
statistical machine translation; automatic evaluation; manual evaluation; correlation between evaluation measures
Hrčak ID:
63862
Publication date:
30.12.2010.