Advancing natural language processing (NLP) applications of morphologically rich languages with bidirectional encoder representations from transformers (BERT): an empirical case study for Turkish

Özçift, Akın; Akarsu, Kamil; Yumuk, Fatma; Söylemez, Cevhernur

doi:10.1080/00051144.2021.1922150

Automatika : časopis za automatiku, mjerenje, elektroniku, računarstvo i komunikacije, Vol. 62 No. 2, 2021.

Original scientific paper

https://doi.org/10.1080/00051144.2021.1922150

Akın Özçift ; Hasan Ferdi Turgutlu Technology Faculty, Software Engineering Department, Manisa Celal Bayar University, Manisa, Turkey
Kamil Akarsu ; Hasan Ferdi Turgutlu Technology Faculty, Software Engineering Department, Manisa Celal Bayar University, Manisa, Turkey
Fatma Yumuk ; Hasan Ferdi Turgutlu Technology Faculty, Software Engineering Department, Manisa Celal Bayar University, Manisa, Turkey
Cevhernur Söylemez ; Hasan Ferdi Turgutlu Technology Faculty, Software Engineering Department, Manisa Celal Bayar University, Manisa, Turkey

Full text: English, PDF, 2.605 Kb

pp. 226-238

Cite

APA 6th Edition

Özçift, A., Akarsu, K., Yumuk, F., & Söylemez, C. (2021). Advancing natural language processing (NLP) applications of morphologically rich languages with bidirectional encoder representations from transformers (BERT): an empirical case study for Turkish. Automatika, 62(2), 226-238. https://doi.org/10.1080/00051144.2021.1922150

MLA 8th Edition

Özçift, Akın, et al. "Advancing natural language processing (NLP) applications of morphologically rich languages with bidirectional encoder representations from transformers (BERT): an empirical case study for Turkish." Automatika, vol. 62, no. 2, 2021, pp. 226-238. https://doi.org/10.1080/00051144.2021.1922150. Accessed 7 May 2024.

Chicago 17th Edition

Özçift, Akın, Kamil Akarsu, Fatma Yumuk, and Cevhernur Söylemez. "Advancing natural language processing (NLP) applications of morphologically rich languages with bidirectional encoder representations from transformers (BERT): an empirical case study for Turkish." Automatika 62, no. 2 (2021): 226-238. https://doi.org/10.1080/00051144.2021.1922150

Harvard

Özçift, A., et al. (2021). 'Advancing natural language processing (NLP) applications of morphologically rich languages with bidirectional encoder representations from transformers (BERT): an empirical case study for Turkish', Automatika, 62(2), pp. 226-238. https://doi.org/10.1080/00051144.2021.1922150

Vancouver

Özçift A, Akarsu K, Yumuk F, Söylemez C. Advancing natural language processing (NLP) applications of morphologically rich languages with bidirectional encoder representations from transformers (BERT): an empirical case study for Turkish. Automatika [Internet]. 2021 [cited 2024 May 7];62(2):226-238. https://doi.org/10.1080/00051144.2021.1922150

IEEE

A. Özçift, K. Akarsu, F. Yumuk, and C. Söylemez, "Advancing natural language processing (NLP) applications of morphologically rich languages with bidirectional encoder representations from transformers (BERT): an empirical case study for Turkish", Automatika, vol. 62, no. 2, pp. 226-238, 2021. [Online]. https://doi.org/10.1080/00051144.2021.1922150

Abstract

Language model pre-training architectures have proven useful for learning language representations. Bidirectional encoder representations from transformers (BERT), a recent deep bidirectional self-attention representation learned from unlabelled text, has achieved remarkable results in many natural language processing (NLP) tasks with fine-tuning. In this paper, we demonstrate the efficiency of BERT for a morphologically rich language, Turkish. Traditionally, morphologically difficult languages require heavy language pre-processing steps to make the data suitable for machine learning (ML) algorithms. In particular, tokenization, lemmatization or stemming, and feature engineering are needed to obtain an efficient data model and to overcome data sparsity or high-dimensionality problems. In this context, we selected five Turkish NLP research problems from the literature: sentiment analysis, cyberbullying identification, text classification, emotion recognition and spam detection. We then compared the empirical performance of BERT with that of baseline ML algorithms. We found that BERT improved on the baseline ML algorithms in the selected NLP problems while eliminating heavy pre-processing tasks.
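
The data sparsity problem mentioned in the abstract can be illustrated with a toy sketch. This is not the paper's pipeline: the word list, the suffix list and the `strip_suffix` helper are all hypothetical, chosen only to show how suffixation multiplies surface forms in a bag-of-words vocabulary and how a stemming step collapses them again.

```python
# Toy illustration only: a hand-picked word list and suffix list,
# not a real Turkish morphological analyser and not the paper's method.

# Inflected forms of "ev" (house) and "kitap" (book).
TOKENS = [
    "ev", "evler", "evlerde", "evlerimiz", "evde",
    "kitap", "kitaplar", "kitaplarda", "kitapta",
]

# Hypothetical suffix list, ordered longest first so that the longest
# matching suffix is the one removed.
SUFFIXES = ("lerimiz", "lerde", "larda", "ler", "lar", "de", "da", "te", "ta")


def strip_suffix(token: str, min_stem: int = 2) -> str:
    """Remove at most one known suffix, keeping at least min_stem characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= min_stem:
            return token[: -len(suffix)]
    return token


def vocabulary(tokens, normalise=None):
    """Return the set of distinct features a bag-of-words model would see."""
    if normalise is None:
        return set(tokens)
    return {normalise(t) for t in tokens}


raw_vocab = vocabulary(TOKENS)
stemmed_vocab = vocabulary(TOKENS, normalise=strip_suffix)

print(len(raw_vocab), "surface forms without pre-processing")
print(len(stemmed_vocab), "features after toy stemming:", sorted(stemmed_vocab))
```

In this toy corpus every inflected form is a separate feature until the stemming step is applied, which is the kind of pre-processing that traditional ML baselines depend on for a morphologically rich language.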

Keywords

Bidirectional encoder representations transformers; language pre-processing; morphologically rich language; natural language processing; Turkish

Hrčak ID:

269829

URI

https://hrcak.srce.hr/269829

Publication date:

4 June 2021
