Original scientific paper
https://doi.org/10.31820/f.35.2.1
A CORPUS-BASED APPROACH TO ENGLISH LOANWORDS: INTRODUCING THE DATABASE OF ENGLISH LOANWORDS IN CROATIAN
Irena Bogunović
orcid.org/0000-0002-2956-7014
University of Rijeka, Faculty of Maritime Studies
Abstract
Unadapted English loanwords have become part of informal communication in many languages, including Croatian. Their use is often motivated by the lack of adequate native equivalents and by exposure to English through the media, but also by the prestige of the English language. A large body of research has been devoted to lexical borrowing, especially from English, yet corpus analyses have mostly been conducted on smaller, ad hoc corpora. The goal of this paper is therefore to present the database of English loanwords in Croatian. The database was developed through algorithmic and manual classification of words from the Corpus of Croatian news portals (ENGRI), and it provides a list of 9,452 unadapted English loanwords together with their absolute and relative frequencies. The analysis showed that most loanwords (75.85%) appear fewer than 50 times, and 44.78% appear 10 times or fewer. The sharpest drop in the number of loanwords occurs in the bands above 500 occurrences, and only 27 words appear 5,000 times or more. The most frequent English loanword in the corpus is ‘show’, with 80,805 occurrences, or 0.0122% of all words in the corpus. The analysis of loanwords occurring more than 5,000 times showed that most of them have Croatian translation equivalents, which confirms the role of the media in introducing new words. In addition to offering insight into the use of English loanwords in Croatian, the database is a valuable contribution to Croatian computational linguistics resources and, by providing word frequency data, enables future experimental research.
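The frequency figures in the abstract rest on simple arithmetic. A word's relative frequency is its count divided by the total number of tokens in the corpus, expressed as a percentage. The distribution is summarized as the share of loanword types that fall into occurrence bands. The Python sketch below illustrates both computations on toy data. The words, counts, corpus size and the function names `relative_frequency` and `band_shares` are hypothetical. They do not reproduce the database's actual data or the author's procedure; only the band cutoffs (10 or fewer, fewer than 50, 5,000 or more) follow the abstract.

```python
def relative_frequency(count: int, total_tokens: int) -> float:
    """Share of all corpus tokens taken by one word, as a percentage."""
    return count / total_tokens * 100


def band_shares(freqs: dict[str, int]) -> dict[str, float]:
    """Percentage of loanword types in the occurrence bands cited in the abstract."""
    n = len(freqs)
    at_most_10 = sum(1 for c in freqs.values() if c <= 10)
    under_50 = sum(1 for c in freqs.values() if c < 50)
    at_least_5000 = sum(1 for c in freqs.values() if c >= 5000)
    return {
        "<=10": at_most_10 / n * 100,
        "<50": under_50 / n * 100,
        ">=5000": at_least_5000 / n * 100,
    }


if __name__ == "__main__":
    # Hypothetical toy counts, NOT taken from the ENGRI-based database.
    toy = {"weekend": 6000, "deadline": 300, "smartphone": 45, "hashtag": 8}
    print(band_shares(toy))  # {'<=10': 25.0, '<50': 50.0, '>=5000': 25.0}
    # Hypothetical corpus of 1,000,000 tokens in which a word occurs 122 times.
    print(round(relative_frequency(122, 1_000_000), 4))  # 0.0122
```

The bands are computed over word types (distinct loanwords), not tokens. This matches how the abstract reports percentages such as "75.85% appear fewer than 50 times".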
Keywords
English loanwords; Croatian; lexical borrowing; database; corpus
Hrčak ID:
312508
Publication date:
28.12.2023.