
Preliminary communication

GENERATION OF A SET OF KEY TERMS CHARACTERISING TEXT DOCUMENTS

Kristína Machová ; Technical University, Košice, Slovakia
Andrea Szabóová ; Technical University, Košice, Slovakia
Peter Bednár ; Technical University, Košice, Slovakia


Full text: English, PDF, 239 KB

pp. 101-113

Downloads: 531



Abstract

This paper describes statistical methods (information gain, mutual information, χ² statistics, and the TF-IDF method) for generating key words from a collection of text documents. These key words should characterise the content of the documents and can be used to retrieve relevant documents from the collection. Term relations were detected on the basis of the conditional probability of term occurrences, with a focus on detecting words that very often occur together. Key words consisting of two terms were thus generated in addition. Several tests were carried out using the 20 News Groups collection of text documents.
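As a rough illustration of two of the ideas named in the abstract, the sketch below computes TF-IDF weights and detects term pairs by the conditional probability of co-occurrence within a document. It is not the authors' implementation: the function names, the `threshold` and `min_df` parameters, and the use of document-level co-occurrence are all assumptions made for this example.

```python
import math
from collections import Counter
from itertools import permutations


def tfidf_scores(docs):
    """Return one {term: tf-idf} dict per tokenised document.

    Uses raw term frequency times log(N / df); other weighting
    variants exist and the paper may use a different one.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return scores


def related_pairs(docs, threshold=0.8, min_df=2):
    """Return {(a, b): P(b in doc | a in doc)} for strongly related terms.

    `threshold` and `min_df` are hypothetical tuning parameters:
    a pair is kept when the conditional probability reaches
    `threshold` and term `a` occurs in at least `min_df` documents.
    """
    df = Counter(term for doc in docs for term in set(doc))
    pair_df = Counter()
    for doc in docs:
        # Count every ordered pair of distinct terms once per document.
        pair_df.update(permutations(sorted(set(doc)), 2))
    result = {}
    for (a, b), n_both in pair_df.items():
        p_b_given_a = n_both / df[a]
        if df[a] >= min_df and p_b_given_a >= threshold:
            result[(a, b)] = p_b_given_a
    return result
```

Pairs returned by `related_pairs` are candidates for two-term key words. The relation is asymmetric, so `(a, b)` may qualify while `(b, a)` does not.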

Keywords

text documents; key terms generation; TF-IDF method; information gain; mutual information; term relation

Hrčak ID:

21449

URI

https://hrcak.srce.hr/21449

Publication date:

12 June 2007

Visits: 897