Original scientific paper

https://doi.org/10.2498/cit.2006.04.08

Comparison of Collocation Extraction Measures for Document Indexing

Bojana Dalbelo Basic
Mladen Kolar
Jan Snajder
Sasa Petrovic


Full text: English, PDF, 211 KB

pp. 321-327

Downloads: 873


Abstract

Automatic extraction of collocations from a corpus is a well-known problem in natural language processing. It is typically carried out by employing a statistical measure that indicates whether two words occur together more often than would be expected by chance. As there is an abundance of such measures proposed by various authors, we have compared some of them on the task of extracting collocations from a corpus of Croatian legal documents for the purpose of document indexing. We also propose and evaluate extensions of these measures for collocations consisting of three words.
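As an illustration of what such a statistical measure can look like, the sketch below scores adjacent word pairs with pointwise mutual information (PMI). The abstract does not name the measures that were compared, so the choice of PMI, the function name bigram_pmi, the min_count parameter, and the toy token list are all assumptions made for this example, not the authors' method.

```python
import math
from collections import Counter


def bigram_pmi(tokens, min_count=1):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    PMI(w1, w2) = log2( P(w1, w2) / (P(w1) * P(w2)) ).
    A positive score means the pair occurs together more often than
    independence of the two words would predict.
    Illustrative sketch only; not the measure set used in the paper.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)  # number of adjacent pairs

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI estimates
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


if __name__ == "__main__":
    # Hypothetical toy corpus, used only to show the call.
    toy = "the supreme court ruled that the supreme court may review the law".split()
    ranked = sorted(bigram_pmi(toy, min_count=2).items(), key=lambda kv: -kv[1])
    for pair, score in ranked:
        print(pair, round(score, 3))
```

PMI is known to favor rare pairs, which is why the sketch exposes a frequency cutoff; other association measures trade off frequency and strength of association differently.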

Keywords

Hrčak ID:

44648

URI

https://hrcak.srce.hr/44648

Publication date:

30.12.2006.

Visits: 1,341 *