Comparison of Collocation Extraction Measures for Document Indexing

Dalbelo Basic, Bojana; Kolar, Mladen; Snajder, Jan; Petrovic, Sasa

doi:10.2498/cit.2006.04.08

Journal of computing and information technology, Vol. 14 No. 4, 2006.

Original scientific paper

https://doi.org/10.2498/cit.2006.04.08

Pages 321-327

Abstract

Automatic extraction of collocations from a corpus is a well-known problem in the field of natural language processing. It is typically carried out by employing a statistical measure that indicates whether or not two words occur together more often than by chance. As there is an abundance of such measures proposed by various authors, we have compared some of them on the task of extracting collocations from a corpus of Croatian legal documents for the purpose of document indexing. We also propose and evaluate extensions of these measures for collocations consisting of three words.
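
To make the idea of a measure that "indicates whether or not two words occur together more often than by chance" concrete, here is a minimal Python sketch. It uses pointwise mutual information (PMI) purely as an assumed example of such a measure; the abstract does not name the measures that were compared, so this should not be read as the authors' method. The function name `bigram_pmi` and the toy token list are hypothetical.

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """Score each adjacent word pair with pointwise mutual information.

    PMI is used here as an assumed, illustrative association measure:
    log2( P(w1, w2) / (P(w1) * P(w2)) ). A positive score means the pair
    co-occurs more often than independence would predict.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)  # number of adjacent pairs

    scores = {}
    for (w1, w2), count in bigrams.items():
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Toy usage: rank candidate pairs from a tiny, made-up token sequence.
tokens = "official gazette of the republic published in the official gazette".split()
ranked = sorted(bigram_pmi(tokens).items(), key=lambda kv: kv[1], reverse=True)
for pair, score in ranked[:3]:
    print(pair, round(score, 3))
```

On a real corpus one would normally discard low-frequency pairs before ranking, since PMI tends to overrate rare combinations; that filtering step is left out of this sketch.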

Hrčak ID: 44648

URI: https://hrcak.srce.hr/44648

Publication date: 30.12.2006.
