
Preliminary communication

ASSIGNING KEYWORDS TO DOCUMENTS USING MACHINE LEARNING

Dunja Mladenić ; Department of Intelligent Systems, J. Stefan Institute, Ljubljana, Slovenia
Marko Grobelnik ; Department of Intelligent Systems, J. Stefan Institute, Ljubljana, Slovenia


Full text: English, pdf, 5.444 Kb

pp. 123-131




Abstract

This paper describes the use of machine learning techniques to assign keywords to documents. The Yahoo hierarchy, a large hierarchy of documents available on the Web, serves as a real-world problem domain. Machine learning techniques developed for text data are applied within this hierarchical classification structure. The large number of features is reduced by exploiting the hierarchical structure and by applying feature subset selection based on a method used in information retrieval. Documents are represented as word-vectors that include word sequences (n-grams) rather than single words alone. The hierarchical structure of the examples and class values is taken into account when defining the subproblems and forming their training examples. The hierarchy of class values is also used during classification, where only promising paths in the hierarchy are considered.
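To make two of the ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It builds a word-vector containing n-grams and then descends a category hierarchy while following only promising paths. The toy hierarchy, the `score` function (a stand-in for a trained per-node classifier), and the `threshold` parameter are all assumptions made for illustration.

```python
from collections import Counter


def ngram_features(text, max_n=2):
    """Count every word sequence of length 1..max_n in the text."""
    words = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts


# Hypothetical toy hierarchy; the node names and features are invented.
HIERARCHY = {
    "name": "Root",
    "features": set(),
    "children": [
        {"name": "Science", "features": {"learning", "data"}, "children": [
            {"name": "Machine Learning", "features": {"machine learning"},
             "children": []},
            {"name": "Biology", "features": {"cell"}, "children": []},
        ]},
        {"name": "Sports", "features": {"match", "team"}, "children": []},
    ],
}


def score(node, features):
    """Toy score (assumption): share of the node's features found in the document."""
    if not node["features"]:
        return 1.0
    hits = sum(1 for f in node["features"] if f in features)
    return hits / len(node["features"])


def assign_keywords(node, features, threshold=0.5):
    """Descend only into children whose score reaches the threshold."""
    keywords = []
    for child in node["children"]:
        if score(child, features) >= threshold:
            keywords.append(child["name"])
            keywords.extend(assign_keywords(child, features, threshold))
    return keywords


doc = ngram_features("Machine learning on text data")
print(assign_keywords(HIERARCHY, doc))
```

Because a subtree is skipped as soon as its root scores below the threshold, most nodes of a large hierarchy are never evaluated. In a real system the toy score would be replaced by a classifier trained for each subproblem.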

Keywords

machine learning; assigning keywords; Yahoo hierarchy; document categorization; F1-measure; F2-measure

Hrčak ID:

78771

URI

https://hrcak.srce.hr/78771

Publication date:

15.12.1999.
