
Original scientific paper

https://doi.org/10.2498/cit.1001917

Statistical Machine Translation of Croatian Weather Forecasts: How Much Data Do We Need?

Nikola Ljubešić ; Department of Information Sciences, Faculty of Humanities and Social Sciences, University of Zagreb, Croatia
Petra Bago ; Department of Information Sciences, Faculty of Humanities and Social Sciences, University of Zagreb, Croatia
Damir Boras ; Department of Information Sciences, Faculty of Humanities and Social Sciences, University of Zagreb, Croatia



page 303-308



Abstract

This research is the first step towards developing a system
for translating Croatian weather forecasts into multiple
languages. This step deals with the Croatian-English
language pair. The parallel corpus is a one-year
sample of the weather forecasts for the Adriatic, comprising
7,893 sentence pairs. Evaluation is performed
by the automatic evaluation measures BLEU, NIST and
METEOR, as well as by manually evaluating a sample of
200 translations. We have shown that with a small-sized
training set and the state-of-the-art Moses system, decoding
can be done with 96% accuracy concerning adequacy
and fluency. Additional improvement is expected by
increasing the training set size. Finally, the correlation
of the recorded evaluation measures is explored.

Keywords

statistical machine translation; automatic evaluation; manual evaluation; correlation between evaluation measures

Hrčak ID:

63862

URI

https://hrcak.srce.hr/63862

Publication date:

30.12.2010.
