A Comparison of Algorithms for Text Classification of Albanian News Articles

Kadriu, Arbana; Abazi, Lejla

ENTRENOVA - ENTerprise REsearch InNOVAtion Journal, Vol. 3 No. 1, 2017.

Other

Arbana Kadriu orcid.org/0000-0003-4922-4753 ; SEE University, Macedonia
Lejla Abazi orcid.org/0000-0002-4354-146X ; SEE University, Macedonia

Full text: English PDF 292 Kb

pp. 62-68

downloads: 299

Abstract

Text classification is an essential task in text mining and information retrieval. Many algorithms have been developed to classify computational data, and most of them have been extended to classify textual data. We used some of these algorithms to train classifiers on one part of our crawled Albanian news articles and to classify the other part with the trained classifiers. The categories used are: latest news, economy, sport, showbiz, technology, culture, and world. First, we remove all stop words from the collected articles; the output of this step is a separate text file for each category. These files are then split into sentences, and each sentence is assigned the appropriate category. The sentences are then collected into a single list of sentence/category tuples. This list is shuffled to randomize the order of the categories and is then used to train (80% of the list) and to test (the remaining 20%) different classifiers. We trained and tested the classifiers on our articles, measuring the accuracy of each classifier separately. We also analysed the training and testing time.
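The pipeline described in the abstract (sentence/category tuples, shuffling, an 80/20 split, then accuracy and timing per classifier) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the abstract does not name the classifiers or the tooling, so a small hand-written multinomial Naive Bayes stands in for "a classifier", and the two-category English sample text in `TEXT_BY_CATEGORY` is a hypothetical placeholder for the stop-word-filtered Albanian files.

```python
import math
import random
import re
import time
from collections import Counter, defaultdict

# Hypothetical placeholder data. In the paper, each category has its own
# text file of crawled Albanian news with stop words already removed.
TEXT_BY_CATEGORY = {
    "sport": "Team wins match. Striker scores goal. Coach praises team. "
             "Fans cheer goal. Match ends late.",
    "economy": "Bank raises rates. Market prices fall. Bank reports profit. "
               "Prices hit market. Rates affect profit.",
}


def build_dataset(text_by_category, seed=0):
    """Split each category's text into sentences, pair each sentence with
    its category, and shuffle the resulting list of tuples."""
    pairs = [
        (sentence.strip(), category)
        for category, text in text_by_category.items()
        for sentence in re.split(r"[.!?]+", text)
        if sentence.strip()
    ]
    random.Random(seed).shuffle(pairs)  # randomize the order of categories
    return pairs


def split_80_20(pairs):
    """Use the first 80% of the list for training and the rest for testing."""
    cut = int(len(pairs) * 0.8)
    return pairs[:cut], pairs[cut:]


class NaiveBayes:
    """Illustrative multinomial Naive Bayes with add-one smoothing.
    This is an assumed stand-in, not the classifier set used in the paper."""

    def train(self, pairs):
        self.word_counts = defaultdict(Counter)
        self.class_counts = Counter()
        for sentence, category in pairs:
            self.class_counts[category] += 1
            self.word_counts[category].update(sentence.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}

    def classify(self, sentence):
        total = sum(self.class_counts.values())

        def score(category):
            counts = self.word_counts[category]
            denom = sum(counts.values()) + len(self.vocab)
            log_prob = math.log(self.class_counts[category] / total)
            for word in sentence.lower().split():
                log_prob += math.log((counts[word] + 1) / denom)
            return log_prob

        return max(self.class_counts, key=score)


def accuracy(classifier, pairs):
    """Fraction of (sentence, category) pairs classified correctly."""
    hits = sum(classifier.classify(s) == c for s, c in pairs)
    return hits / len(pairs)


def evaluate(classifier, train_set, test_set):
    """Train and test one classifier, timing both phases separately."""
    start = time.perf_counter()
    classifier.train(train_set)
    train_time = time.perf_counter() - start

    start = time.perf_counter()
    acc = accuracy(classifier, test_set)
    test_time = time.perf_counter() - start
    return {"accuracy": acc, "train_time": train_time, "test_time": test_time}


if __name__ == "__main__":
    dataset = build_dataset(TEXT_BY_CATEGORY)
    train_set, test_set = split_80_20(dataset)
    print(evaluate(NaiveBayes(), train_set, test_set))
```

To compare several algorithms as the abstract describes, `evaluate` would be called once per classifier on the same `train_set` and `test_set`, so that accuracy and timing are measured on identical data.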

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Keywords

Hrčak ID:

251111

URI

https://hrcak.srce.hr/251111

Publication date:

31.10.2017.

Visits: 930 *
