Review article

HOW TO APPROACH DATA ANALYSIS OF TEXTS

Dunja Mladenič; J. Stefan Institute, Ljubljana, Slovenia


page 123-134

Abstract

Analysis of large text data sets is gaining popularity because it gives users insights into their own (potentially very unstructured) data that were difficult to obtain with standard methods. This kind of data analysis differs from standard analysis in three respects: (1) the methods used differ from standard statistical methods, (2) the data being analyzed have different characteristics from standard, structured databases, and (3) the users of the results have different needs and requirements from the usual users of common analytical services (statistics, data mining, OLAP). This paper gives a brief overview of the area that addresses this kind of data analysis, commonly referred to as text mining. It is a growing area at the intersection of information retrieval (IR), data mining (DM), machine learning (ML), and natural language processing (NLP). The problems usually addressed in text mining include topic detection and tracking, document categorization, visualization of document collections, user profiling, information extraction, construction and updating of hierarchical indices and document collections, and intelligent search.

Keywords

text data analysis; data mining; example applications of text mining; personalized information delivery

Hrčak ID:

78313

URI

https://hrcak.srce.hr/78313

Publication date:

15.12.2004.
