EXTRACTING ENGLISH WORDS FROM A CORPUS OF CROATIAN

Authors

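  • Mirjana Borucinsky, University of Rijeka, Faculty of Maritime Studies (Sveučilište u Rijeci, Pomorski fakultet)
  • Irena Bogunović, University of Rijeka, Faculty of Maritime Studies (Sveučilište u Rijeci, Pomorski fakultet)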

Keywords:

English words, Croatian, corpus linguistics

Abstract

As the lingua franca of the modern age, English has become the dominant donor language for many languages, including Croatian. Its influence on Croatian is evident across different registers and linguistic levels, especially the lexical level. Increasingly, English words appear in Croatian in their unadapted form (e.g., freelancer, chat, e-mail), particularly in the news and on social media. English words can be extracted from corpora manually, with existing corpus linguistics tools, or with newly developed tools. The aim of this paper is to analyse whether the existing tools for Croatian can yield a list of unadapted English words. For that purpose, the Croatian web corpus hrWaC was analysed using the Sketch Engine platform, and a list of 1,217 English words was compiled. The results showed that it is possible to compile a list of English words and their frequencies with the available tools and resources for Croatian, but also that many problems prevent the results from being considered fully reliable. Moreover, the procedure still has to be combined with manual methods and classifications, and new tools need to be developed for the automatic extraction of English words from Croatian corpora.
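The abstract does not describe the exact Sketch Engine queries used, so the following is only a rough illustration of the kind of filtering an automatic extraction tool might perform, not the authors' procedure. It is a minimal Python sketch that assumes a token frequency list exported from a corpus and two word lists, one English and one Croatian; the function name `candidate_english_words` and all example inputs are hypothetical. It also shows one plausible source of unreliability: a homograph such as "more" (Croatian for "sea") is discarded because it appears in both lexicons.

```python
from collections import Counter


def candidate_english_words(token_freqs, english_lexicon, croatian_lexicon, min_freq=1):
    """Hypothetical filter: return (word, frequency) pairs that look like unadapted English.

    Assumptions (not taken from the paper):
    - token_freqs maps surface tokens to corpus frequencies;
    - english_lexicon and croatian_lexicon are sets of lowercase word forms.
    """
    # Merge case variants (e.g. "Chat" and "chat") into one lowercase entry.
    merged = Counter()
    for token, freq in token_freqs.items():
        merged[token.lower()] += freq

    # Keep words known in English but unknown in Croatian and frequent enough.
    # Homographs present in both lexicons (e.g. "more") are lost here.
    candidates = [
        (word, freq)
        for word, freq in merged.items()
        if freq >= min_freq and word in english_lexicon and word not in croatian_lexicon
    ]

    # Sort by descending frequency, then alphabetically for stable output.
    return sorted(candidates, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    freqs = {"freelancer": 12, "chat": 40, "Chat": 5, "e-mail": 30, "more": 50, "kuća": 80}
    english = {"freelancer", "chat", "e-mail", "more"}
    croatian = {"more", "kuća"}
    print(candidate_english_words(freqs, english, croatian))
    # [('chat', 45), ('e-mail', 30), ('freelancer', 12)]
```

A lexicon-based filter like this is easy to run but depends entirely on the coverage of both word lists, which is one reason such output would still need manual checking.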

Published

2023-05-09

Section

Original scientific paper (Izvorni znanstveni članak)