Preliminary communication
https://doi.org/10.36978/cte.7.2.2
How large language models "think" and can we trust them: a case study of testing ChatGPT on tasks in an introductory statistics course
Jasminka Dobša
orcid.org/0000-0002-1684-1010
Faculty of Organization and Informatics, Varaždin, Croatia
* Corresponding author.
Abstract
The aim of the article is to identify cases in which large language models show behaviour similar to human thinking and cases in which they "think" differently, and to point out the opportunities, risks and limits of applying artificial intelligence in teaching, in the context of testing the ChatGPT model on student tasks in the field of statistics. The possibilities and limitations of large language models are analysed, as well as ways to overcome existing biases and shortcomings in this rapidly growing field. In the paper, ChatGPT, a chatbot based on the large language model GPT-4, is tested as part of an introductory statistics course taught to second-year computer science students. The tests were conducted by manually entering 170 statistics quiz questions into the ChatGPT web interface. The questions are divided into three categories: theoretical questions that require reproduction of knowledge, theoretical questions that test understanding of the field, and exercises. The quiz questions were asked in Croatian and the answers given in Croatian were analysed. The accuracy of students and of ChatGPT in solving the quiz questions was compared by question category with the Wilcoxon rank-sum test. The results show that ChatGPT performs statistically significantly better than students in the categories of theoretical questions requiring reproduction of knowledge and understanding, while students are more successful in solving the practice exercises, but for exercises the difference in accuracy is not statistically significant at the 0.01 level.
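The comparison described above can be illustrated with a small sketch of the Wilcoxon rank-sum test using a normal approximation. The accuracy values below are hypothetical placeholders, not the paper's data, and the tie correction to the variance is omitted for brevity.

```python
import math

def rank_sum_test(x, y):
    """Wilcoxon rank-sum test (normal approximation, two-sided).

    Returns the rank sum of sample x and an approximate p-value.
    Ties receive average ranks; the tie correction is omitted.
    """
    combined = sorted((v, i) for i, v in enumerate(x + y))
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        # find the block of tied values starting at position i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank for the tied block
        for k in range(i, j + 1):
            ranks[combined[k][1]] = avg
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(ranks[:n1])  # rank sum of the first sample
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mu) / sigma
    # two-sided p-value from the standard normal distribution
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return w, p

# Hypothetical per-category accuracies (illustrative only)
chatgpt_acc = [0.9, 0.85, 1.0, 0.8]
student_acc = [0.6, 0.7, 0.65, 0.75]
w, p = rank_sum_test(chatgpt_acc, student_acc)
```

In practice one would use a library implementation such as `scipy.stats.ranksums`, which also handles the exact distribution for small samples; the sketch above only shows the idea behind comparing the two groups' accuracies by rank.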
Keywords
large language models; ChatGPT; statistics; testing; Croatian language
Hrčak ID:
311603
Date of publication:
18.12.2023.