When news sites “catch” the coronavirus: Development and comparative analysis of the 2019 and 2020 articles published on the Index.hr news portal

Authors

  • Petra Bago, Filozofski fakultet Sveučilišta u Zagrebu (Faculty of Humanities and Social Sciences, University of Zagreb)

Keywords:

statistical corpus analysis, specialized corpus, newspaper articles, Sketch Engine, Python, Index.hr

Abstract

The goal of this paper is to present the methodology, tools and results of a comparative computational analysis of online newspaper articles, covering every stage from collecting the documents and cleaning the language data to build specialized corpora of newspaper articles, through a presentation of the tools used, to the comparative statistical analysis of the corpora. The research was conducted on two specialized corpora developed specifically for this study, based on 500 newspaper articles from the "News" category of the Index.hr news portal. One corpus is based on articles published in the pre-pandemic year 2019, and the other on articles published in the pandemic year 2020. The analysis showed that the vocabulary of the pandemic corpus is significantly poorer than that of the pre-pandemic corpus, that Croatia's neighboring countries were written about less in 2020 than in 2019, and that the pre-pandemic corpus mentions domestic cities more often than foreign ones, whereas the opposite appears to hold for the pandemic corpus. Finally, we examined whether automatic term extraction is adequate for identifying the specific topics covered in the two corpora.

Published

2022-08-05