A Hybrid Model for Monolingual and Multilingual Toxic Comment Detection

Song, Guizhe; Huang, Degen; Zhang*, Yanping

doi:10.17559/TV-20210325125414

Tehnički vjesnik, Vol. 28 No. 5, 2021.

Original scientific paper

https://doi.org/10.17559/TV-20210325125414

Guizhe Song ; Dalian University of Technology, School of Computer Science and Technology, No. 2 Linggong Road, Ganjingzi District, Dalian City, Liaoning Province, P. R. China, 116024
Degen Huang ; Dalian University of Technology, School of Computer Science and Technology, No. 2 Linggong Road, Ganjingzi District, Dalian City, Liaoning Province, P. R. China, 116024
Yanping Zhang* ; Gonzaga University, Department of Computer Science, 502 East Boone Avenue, Spokane, WA 99258-0102, USA

pp. 1667-1673

Abstract

Social media provides a public and convenient platform for people to communicate. However, it is also open to hateful behavior and toxic comments. Social networks such as Facebook, Twitter, and many others have been working on effective toxic comment detection methods to provide better service. A monolingual language model focuses on a single language and provides high detection accuracy. A multilingual language model provides better generalization performance. To improve the effectiveness of detecting toxic comments in multiple languages, we propose a hybrid model that fuses a monolingual model and a multilingual model. We fine-tune the monolingual pre-trained model on labeled data. For the multilingual pre-trained model, we first apply masked language modeling on unlabeled data as a semi-supervised fine-tuning step, and then fine-tune the model on labeled data. In this way, we can make full use of the large amount of unlabeled data, reduce dependence on labeled comment data, and improve detection effectiveness. We also design several comparative experiments. The results demonstrate the effectiveness and advantage of the proposed model, especially compared to the fine-tuned multilingual XLM-RoBERTa model.
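
The abstract does not say how the monolingual and multilingual models are fused. As a minimal sketch only, the Python below assumes the simplest possible rule: a weighted average of the two models' toxicity probabilities, falling back to the multilingual score when no monolingual model covers the comment's language. The names `fuse_scores`, `is_toxic`, `mono_weight`, and `threshold` are hypothetical and are not taken from the paper.

```python
from typing import Optional


def fuse_scores(p_multi: float,
                p_mono: Optional[float] = None,
                mono_weight: float = 0.5) -> float:
    """Blend two toxicity probabilities.

    Assumption: fusion is a weighted average. The paper's actual
    fusion rule is not given in the abstract.
    """
    if not 0.0 <= mono_weight <= 1.0:
        raise ValueError("mono_weight must be in [0, 1]")
    if p_mono is None:
        # No monolingual model for this language:
        # use the multilingual score unchanged.
        return p_multi
    return mono_weight * p_mono + (1.0 - mono_weight) * p_multi


def is_toxic(score: float, threshold: float = 0.5) -> bool:
    """Turn a fused probability into a binary label (hypothetical threshold)."""
    return score >= threshold


if __name__ == "__main__":
    # Illustrative numbers only, not results from the paper.
    fused = fuse_scores(p_multi=0.4, p_mono=0.9)
    print(round(fused, 2), is_toxic(fused))
```

In practice the weight would be tuned on held-out labeled data, and other fusion rules (for example, a small classifier over both models' outputs) are equally plausible readings of "model fusion".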

Keywords

masked language modelling; model fusion; toxic commenting; XLM-RoBERTa

Hrčak ID:

261344

URI

https://hrcak.srce.hr/261344

Publication date:

15.8.2021.
