Effective Spam Detection with Machine Learning

Borotić, Gordana; Granoša, Lara; Kovačević, Jurica; Bagić Babac, Marina

doi:10.2478/crdj-2023-0007

Croatian Regional Development Journal, Vol. 4 No. 2, 2023.

Original scientific paper

https://doi.org/10.2478/crdj-2023-0007


Gordana Borotić ; University of Zagreb, Faculty of Electrical Engineering and Computing
Lara Granoša ; University of Zagreb, Faculty of Electrical Engineering and Computing
Jurica Kovačević ; University of Zagreb, Faculty of Electrical Engineering and Computing
Marina Bagić Babac orcid.org/0000-0003-4979-2216 ; University of Zagreb, Faculty of Electrical Engineering and Computing

Full text: English pdf 910 Kb

pp. 43-64

Cite

APA 6th Edition

Borotić, G., Granoša, L., Kovačević, J., & Bagić Babac, M. (2023). Effective Spam Detection with Machine Learning. Croatian Regional Development Journal, 4(2), 43-64. https://doi.org/10.2478/crdj-2023-0007

MLA 8th Edition

Borotić, Gordana, et al. "Effective Spam Detection with Machine Learning." Croatian Regional Development Journal, vol. 4, no. 2, 2023, pp. 43-64. https://doi.org/10.2478/crdj-2023-0007. Accessed 10 July 2026.

Chicago 17th Edition

Borotić, Gordana, Lara Granoša, Jurica Kovačević, and Marina Bagić Babac. "Effective Spam Detection with Machine Learning." Croatian Regional Development Journal 4, no. 2 (2023): 43-64. https://doi.org/10.2478/crdj-2023-0007

Harvard

Borotić, G., et al. (2023). 'Effective Spam Detection with Machine Learning', Croatian Regional Development Journal, 4(2), pp. 43-64. https://doi.org/10.2478/crdj-2023-0007

Vancouver

Borotić G, Granoša L, Kovačević J, Bagić Babac M. Effective Spam Detection with Machine Learning. Croatian Regional Development Journal [Internet]. 2023 [cited 2026 Jul 10];4(2):43-64. https://doi.org/10.2478/crdj-2023-0007

IEEE

G. Borotić, L. Granoša, J. Kovačević, and M. Bagić Babac, "Effective Spam Detection with Machine Learning", Croatian Regional Development Journal, vol. 4, no. 2, pp. 43-64, 2023. [Online]. https://doi.org/10.2478/crdj-2023-0007

Abstract

This paper reports empirical experiments on the accuracy of different machine learning algorithms for detecting spam messages, using a public, open-source dataset of spam messages. The originality of our study lies in the integration of topic modeling, specifically Latent Dirichlet Allocation (LDA), with machine learning algorithms for spam detection. By extracting hidden topics and uncovering patterns in spam and non-spam messages, we provide insights into the distinguishing characteristics of spam messages. Moreover, machine learning is a powerful tool for bolstering risk control measures, helping to ensure the sustainability of digital platforms and communication channels. The key findings of this study reveal that the Logistic Regression classifier achieved the highest F-score of 0.986, followed by the Support Vector Machine classifier with a score of 0.98 and the Naive Bayes classifier with a score of 0.955. The study concludes that Logistic Regression outperforms Naive Bayes and Support Vector Machine in text classification, particularly in spam detection, emphasizing the role of machine learning techniques in optimizing risk management strategies for sustained digital ecosystems. This performance stems from Logistic Regression's adeptness in modeling complex relationships, enabling it to achieve high accuracy on both training and test datasets.
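
The abstract compares classifiers by their F-score without stating how it is computed. The following is a minimal sketch of the standard definition, assuming the reported metric is the balanced F1 score (the harmonic mean of precision and recall) measured on the spam class. The function name `f_score`, the label strings, and the toy labels are hypothetical and are not taken from the paper.

```python
def f_score(y_true, y_pred, positive="spam", beta=1.0):
    """Return the F-beta score for the `positive` class (beta=1.0 gives F1)."""
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct positive prediction: precision or recall is zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical toy labels for illustration only, not data from the paper.
y_true = ["spam", "ham", "spam", "ham", "spam", "ham"]
y_pred = ["spam", "ham", "ham", "ham", "spam", "spam"]
score = f_score(y_true, y_pred)
print(f"F1 on the toy labels: {score:.3f}")
```

Because the F-score balances precision and recall, it is often preferred over plain accuracy on datasets where spam messages are a minority class, since always predicting "not spam" would score 0 under this definition.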

Keywords

spam; email; naive Bayes; logistic regression; support vector machine; risk; sustainability

Hrčak ID:

313834

URI

https://hrcak.srce.hr/313834

Publication date:

28 December 2023
