Can Public Code Smells Datasets Be Trusted?

Journal of Communications Software and Systems, Vol. 21 No. 4, 2025.

Original scientific paper

https://doi.org/10.24138/jcomss-2025-0131

Ruchin Gupta ; Jaypee Institute of Information Technology, Noida, India *
Jitendra Kumar Seth ; KIET Group of Institutions, Delhi-NCR, Ghaziabad, India
Anupama Sharma ; Ajay Kumar Garg Engineering College, Ghaziabad, India
Abhishek Goyal ; KIET Group of Institutions, Delhi-NCR, Ghaziabad, India

* Corresponding author.

Full text: English, PDF (1,255 KB)

pp. 477-488

APA 6th Edition

Gupta, R., Kumar Seth, J., Sharma, A. & Goyal, A. (2025). Can Public Code Smells Datasets Be Trusted? Journal of Communications Software and Systems, 21(4), 477-488. https://doi.org/10.24138/jcomss-2025-0131

MLA 8th Edition

Gupta, Ruchin, et al. "Can Public Code Smells Datasets Be Trusted?" Journal of Communications Software and Systems, vol. 21, no. 4, 2025, pp. 477-488. https://doi.org/10.24138/jcomss-2025-0131. Accessed 26 May 2026.

Chicago 17th Edition

Gupta, Ruchin, Jitendra Kumar Seth, Anupama Sharma and Abhishek Goyal. "Can Public Code Smells Datasets Be Trusted?" Journal of Communications Software and Systems 21, no. 4 (2025): 477-488. https://doi.org/10.24138/jcomss-2025-0131

Harvard

Gupta, R., et al. (2025). 'Can Public Code Smells Datasets Be Trusted?', Journal of Communications Software and Systems, 21(4), pp. 477-488. https://doi.org/10.24138/jcomss-2025-0131

Vancouver

Gupta R, Kumar Seth J, Sharma A, Goyal A. Can Public Code Smells Datasets Be Trusted? Journal of Communications Software and Systems [Internet]. 2025 [cited 2026 May 26];21(4):477-488. https://doi.org/10.24138/jcomss-2025-0131

IEEE

R. Gupta, J. Kumar Seth, A. Sharma and A. Goyal, "Can Public Code Smells Datasets Be Trusted?", Journal of Communications Software and Systems, vol. 21, no. 4, pp. 477-488, 2025. [Online]. https://doi.org/10.24138/jcomss-2025-0131

Abstract

Code smells signal potential issues in a codebase and indicate technical debt. Early detection is crucial for maintaining code quality. Researchers often rely on public datasets to automate and enhance smell detection, but the trustworthiness of these datasets is frequently assumed rather than verified. While the datasets are valuable for developing detection tools, key questions arise: Can they be fully trusted? Are the labels accurate? Do they reflect real-world software development? Recent studies reveal inconsistencies, biases, and misclassifications, raising concerns about their reliability. This paper examines the integrity of two widely used sets of public code smell datasets, referred to as Group A and Group B, by assessing their internal consistency and their alignment with established facts. Through this investigation, we aim to determine whether these datasets can be used with confidence in research and practical applications, or whether their inherent issues undermine the validity of the results they produce. Group A datasets are smaller, balanced, and factually aligned but lack industry relevance, while Group B datasets deviate from known facts. The study acknowledges academic–industry differences, viewing divergence as a reflection of real-world variability rather than a flaw, and emphasizes the need for rigorous validation of public datasets to ensure reliable research outcomes.

Keywords

Code Smell; code smell datasets; validation

Hrčak ID:

341492

URI

https://hrcak.srce.hr/341492

Publication date:

31 December 2025
