Original scientific paper

https://doi.org/10.31820/f.34.2.13

EXTRACTING ENGLISH WORDS FROM A CORPUS OF CROATIAN

Mirjana Borucinsky (ORCID: orcid.org/0000-0002-1132-9720); Sveučilište u Rijeci, Pomorski fakultet
Irena Bogunović (ORCID: orcid.org/0000-0002-2956-7014); Sveučilište u Rijeci, Pomorski fakultet


Full text: Croatian (PDF, 1.656 Kb)

Pages: 435-461

Abstract

As the lingua franca of the modern age, English has become the dominant donor language for many languages, including Croatian. The influence of English on Croatian is evident across different registers and linguistic levels, especially the lexical one. Recently, more and more English words have started to appear in Croatian in their unadapted form (e.g., freelancer, chat, e-mail), especially in the news and on social media. English words can be extracted from corpora manually, with existing corpus linguistics tools, or with newly developed tools. The aim of this paper is to analyse whether the existing tools for Croatian can yield a list of unadapted English words. For that purpose, the Croatian web corpus (hrWaC) was analysed using the Sketch Engine platform, and a list of 1217 English words was compiled with this method. The results showed that it is possible to compile a list of English words and their frequencies with the available tools and resources for the Croatian language, but also that many problems prevent the results from being considered completely reliable. Moreover, the procedure itself still has to be combined with manual methods and classifications, and new tools need to be developed for the automatic extraction of English words from a corpus of Croatian.

Keywords

English words; Croatian language; corpus linguistics

Hrčak ID:

289279

URI

https://hrcak.srce.hr/289279

Publication date:

30 December 2022
