KEY FEATURES OF OPEN DATA SETS IN SENTIMENT ANALYSIS OF TWITTER POSTS

Thakkar, Gaurish

doi:10.19279/TVZ.PD.2024-13-1-01

Polytechnic and design, Vol. 13 No. 1, 2025.

Original scientific paper

Gaurish Thakkar (orcid.org/0000-0002-8119-5078); University of Zagreb, Faculty of Humanities and Social Sciences, Ivana Lučića 3, 10000 Zagreb, Croatia *

* Corresponding author.

Full text: Croatian, PDF (413 KB)

Pages: 1-14

Cite

APA 6th Edition

Thakkar, G. (2025). Key features of open data sets in sentiment analysis of Twitter posts. Polytechnic and design, 13(1), 1-14. https://doi.org/10.19279/TVZ.PD.2024-13-1-01

MLA 8th Edition

Thakkar, Gaurish. "Key Features of Open Data Sets in Sentiment Analysis of Twitter Posts." Polytechnic and design, vol. 13, no. 1, 2025, pp. 1-14. https://doi.org/10.19279/TVZ.PD.2024-13-1-01. Accessed 18 Jul. 2026.

Chicago 17th Edition

Thakkar, Gaurish. "Key Features of Open Data Sets in Sentiment Analysis of Twitter Posts." Polytechnic and design 13, no. 1 (2025): 1-14. https://doi.org/10.19279/TVZ.PD.2024-13-1-01

Harvard

Thakkar, G. (2025). 'Key features of open data sets in sentiment analysis of Twitter posts', Polytechnic and design, 13(1), pp. 1-14. https://doi.org/10.19279/TVZ.PD.2024-13-1-01

Vancouver

Thakkar G. Key features of open data sets in sentiment analysis of Twitter posts. Polytechnic and design [Internet]. 2025 [cited 2026 July 18];13(1):1-14. https://doi.org/10.19279/TVZ.PD.2024-13-1-01

IEEE

G. Thakkar, "Key Features of Open Data Sets in Sentiment Analysis of Twitter Posts," Polytechnic and design, vol. 13, no. 1, pp. 1-14, 2025. [Online]. https://doi.org/10.19279/TVZ.PD.2024-13-1-01

Abstract

Open-source datasets are fundamental to the advancement of sentiment analysis models, yet their practical utility is often hampered by a lack of standardisation and comprehensive documentation. This paper provides a critical review of the open dataset landscape for Twitter sentiment analysis, examining 48 papers that introduce datasets in 30 different languages. We analyse key elements, including naming conventions, labelling schemes, data distribution methods, and the inclusion of essential metadata such as tweet IDs. Our findings reveal significant inconsistencies that create challenges for reproducibility and the comparative evaluation of models. We identify a critical need for standard practices in dataset creation and dissemination. Based on this analysis, we offer concrete recommendations to enhance the scientific value, discoverability, and long-term usability of open datasets for the research community.

Keywords

sentiment analysis; natural language processing; Twitter; sentiment datasets; multilingual

Hrčak ID: 341614

URI: https://hrcak.srce.hr/341614

Publication date: 30.8.2025.

