
Original scientific paper

https://doi.org/10.2498/cit.1001917

Statistical Machine Translation of Croatian Weather Forecasts: How Much Data Do We Need?

Nikola Ljubešić ; Department of Information Sciences, Faculty of Humanities and Social Sciences, University of Zagreb, Croatia
Petra Bago ; Department of Information Sciences, Faculty of Humanities and Social Sciences, University of Zagreb, Croatia
Damir Boras ; Department of Information Sciences, Faculty of Humanities and Social Sciences, University of Zagreb, Croatia


Full text: English, PDF, 165 KB

pp. 303-308


Abstract

This research is the first step towards developing a system
for translating Croatian weather forecasts into multiple
languages. This step deals with the Croatian-English
language pair. The parallel corpus is a one-year sample
of the weather forecasts for the Adriatic, comprising
7,893 sentence pairs. Evaluation is performed
with the automatic evaluation measures BLEU, NIST and
METEOR, as well as by manually evaluating a sample of
200 translations. We show that with a small
training set and the state-of-the-art Moses system, decoding
can be done with 96% accuracy concerning adequacy
and fluency. Additional improvement is expected from
increasing the training set size. Finally, the correlation
between the recorded evaluation measures is explored.
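
As a minimal sketch of how the correlation between two evaluation measures could be explored, the snippet below computes a Pearson correlation coefficient over paired scores. The abstract does not say which correlation statistic the authors used, so the choice of Pearson is an assumption, and the function name `pearson` and the score lists are hypothetical placeholders, not results from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


# Hypothetical placeholder scores, NOT taken from the paper: one value per
# experiment (e.g. per training-set size) for an automatic and a manual measure.
automatic_scores = [0.41, 0.48, 0.55, 0.59, 0.62]
manual_scores = [0.80, 0.86, 0.91, 0.94, 0.96]

print(f"Pearson r = {pearson(automatic_scores, manual_scores):.3f}")
```

A rank-based statistic such as Spearman's coefficient would be a reasonable alternative when only the ordering of systems matters.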

Keywords

statistical machine translation; automatic evaluation; manual evaluation; correlation between evaluation measures

Hrčak ID:

63862

URI

https://hrcak.srce.hr/63862

Publication date:

30 December 2010
