Preliminary communication
https://doi.org/10.36978/cte.7.2.2
How large language models "think" and can we trust them: a case study of testing ChatGPT on tasks in an introductory statistics course
Jasminka Dobša
orcid.org/0000-0002-1684-1010
Faculty of Organization and Informatics, Varaždin, Croatia
* Corresponding author.
Abstract
The aim of the article is to identify cases in which large language models show behaviour similar to human thinking and cases in which they "think" differently, and to point out the opportunities, risks and limits of applying artificial intelligence in teaching, in the context of testing the ChatGPT model on student tasks in the field of statistics. The possibilities and limitations of large language models are analysed, as well as ways to overcome existing biases and shortcomings in this rapidly growing field. In the paper, ChatGPT, a chatbot based on the large language model GPT-4, is tested as part of an introductory statistics course taught to second-year computer science students. The tests were conducted by manually entering 170 statistics quiz questions into the ChatGPT web interface. The questions are divided into three categories: theoretical questions that require reproduction of knowledge, theoretical questions that test understanding of the field, and exercises. The quiz questions were asked in Croatian and the answers given in Croatian were analysed. The accuracy of students and of ChatGPT in solving the quiz questions was compared by question category with the Wilcoxon rank-sum test. The results show that ChatGPT performs statistically significantly better than students in the categories of theoretical questions requiring reproduction of knowledge and understanding, while students are more successful in solving the practice exercises, but for exercises the difference in accuracy is not statistically significant at the 0.01 level.
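The comparison described above can be illustrated with a small sketch of the Wilcoxon rank-sum test using a normal approximation. The accuracy values below are hypothetical placeholders, not the paper's data, and the tie correction to the variance is omitted for brevity.

```python
import math

def rank_sum_test(x, y):
    """Wilcoxon rank-sum test (normal approximation, two-sided).

    Returns the rank sum of sample x and an approximate p-value.
    Ties receive average ranks; the tie correction is omitted.
    """
    combined = sorted((v, i) for i, v in enumerate(x + y))
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        # find the block of tied values starting at position i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank for the tied block
        for k in range(i, j + 1):
            ranks[combined[k][1]] = avg
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(ranks[:n1])  # rank sum of the first sample
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mu) / sigma
    # two-sided p-value from the standard normal distribution
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return w, p

# Hypothetical per-category accuracies (illustrative only)
chatgpt_acc = [0.9, 0.85, 1.0, 0.8]
student_acc = [0.6, 0.7, 0.65, 0.75]
w, p = rank_sum_test(chatgpt_acc, student_acc)
```

In practice one would use a library implementation such as `scipy.stats.ranksums`, which also handles the exact distribution for small samples; the sketch above only shows the idea behind comparing the two groups' accuracies by rank.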
Keywords
large language models; ChatGPT; statistics; testing; Croatian language
Hrčak ID:
311603
Date of publication:
18.12.2023.