Original scientific paper
https://doi.org/10.31820/f.34.2.13
EXTRACTING ENGLISH WORDS FROM A CORPUS OF CROATIAN
Mirjana Borucinsky
orcid.org/0000-0002-1132-9720
Sveučilište u Rijeci, Pomorski fakultet
Irena Bogunović
orcid.org/0000-0002-2956-7014
Sveučilište u Rijeci, Pomorski fakultet
Abstract
As the lingua franca of the modern age, English has become the dominant donor language for many languages, including Croatian. Its influence on Croatian is evident across registers and linguistic levels, most of all the lexical one. Recently, a growing number of English words have started to appear in Croatian in their unadapted form (e.g., freelancer, chat, e-mail), especially in the news and on social media. English words can be extracted from corpora manually, with existing corpus linguistics tools, or with newly developed tools. The aim of this paper is to analyse whether the existing tools for Croatian can yield a list of unadapted English words. For that purpose, the Croatian web corpus (hrWaC) was analysed using the Sketch Engine platform, and a list of 1217 English words was compiled with this method. The results showed that a list of English words and their frequencies can be compiled with the tools and resources available for Croatian, but also that many problems remain, so the results cannot be considered completely reliable. Moreover, the procedure still has to be combined with manual methods and classifications, and new tools need to be developed for the automatic extraction of English words from a corpus of Croatian.
Keywords
English words; Croatian language; corpus linguistics
Hrčak ID:
289279
Publication date:
30.12.2022.