Original scientific paper

https://doi.org/10.31820/f.37.2.3

Exploring the Interplay of Lexis and Grammar Through N-Grams in English and Croatian

Mirjana Borucinsky (ORCID: orcid.org/0000-0002-1132-9720); Sveučilište u Rijeci, Pomorski fakultet


Full text: croatian pdf 674 Kb

page 427-457

Abstract

This paper examines the intricate relationship between lexis and grammar by studying N-grams in English and Croatian. N-grams, i.e. sequences of N words extracted from corpora, are also referred to as lexical bundles (Biber et al. 1999), clusters (Scott 2008), chains (Stubbs 2001; Stubbs & Barth 2003), recurrent sequences (De Cock 2004), and recurrent word combinations (Altenberg 1998). Lexical bundles (e.g. as well as, in order to, in case of, in terms of) are formulaic sequences that provide “the building blocks of coherent discourse” (Hyland 2008: 6). The lexical bundle approach has focused mainly on register and text-type differences (Hyland 2008), seeking to establish whether there is a discipline-specific lexical repertoire or a core vocabulary. One such attempt is described in Borucinsky and Pritchard (2022). However, since lexical bundles represent incomplete structural units that cross grammatical structures (Biber, Conrad and Cortes 2004), there is room for further applications of this approach, both in linguistics and in language teaching (e.g. in cross-linguistic analyses), as suggested by Römer (2009). One such application is the study of the interplay between lexis and grammar. Hence, this paper aims to answer the following question: What can we uncover about the interplay of lexis and grammar through corpus-based research by studying multi-word expressions (MWEs), and in particular lexical bundles? The starting point of the analysis is the English language, while a contrastive analysis aims to provide insight into MWEs in Croatian, where such expressions are almost entirely unexplored. We use Sketch Engine (Kilgarriff et al. 2004) to extract N-grams from English and Croatian corpora (enTenTen21 and MaCoCu), and to identify four-word lexical bundles based on the following criteria: (1) a cut-off frequency of at least ten occurrences (cf. Biber et al. 1999); (2) the average reduced frequency (Hlaváčová 2006); and (3) exclusion criteria (cf. Salazar 2014) to eliminate noise resulting from corpus processing. We focus only on NP- and PP-based lexical bundles and study them cross-linguistically. Furthermore, special attention is paid to grammatical variation or syntactic synonyms such as in case of + Ng (e.g. in case of legal dispute), in case + CLAUSE (e.g. in cases that could cause conflict); in the event of + Ng (e.g. in the event of an emergency), in the event + CLAUSE (e.g. in the event that no personal representative has been appointed). These forms show that grammatical structures are motivated by meaning and that the boundary between lexis and grammar is fluid. Contrastive analysis brings this out even more clearly, as illustrated by the following examples: u slučaju + Ng (e.g. u slučaju pogiblji, lit. ‘in case of distress’) and u slučaju + CLAUSE (e.g. u slučaju da članica ne odgovori na upozorenje, lit. ‘in case that the member does not respond to the warning’; u slučaju kada sud donese rješenje, lit. ‘in case when the court passes a decision’). The contribution of this paper is threefold. Empirically, it provides new data on the use of lexico-grammatical constructions in Croatian. Theoretically, it confirms that the structure of these constructions is motivated by meaning and that it interacts with grammar in different ways in the two languages. Methodologically, the study demonstrates the effectiveness of combining frequency-based and functional analysis in contrastive lexico-grammar. The results may have multiple applications: in language teaching, where raising awareness of frequent patterns can contribute to the development of language competence; in contrastive research that links lexico-grammatical patterns across different languages; and in advancing corpus methodology, particularly with regard to more precise annotation of nominal groups and their postmodifiers.
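
The first two selection criteria named in the abstract (a frequency cut-off of ten and the average reduced frequency) can be illustrated with a minimal Python sketch. It is not the Sketch Engine implementation and not necessarily the author's exact procedure: the function names `arf` and `extract_bundles`, the whitespace tokenisation in the test data, and the use of the token count as corpus size are assumptions made for illustration. The ARF formula follows the commonly cited definition, in which the corpus is treated as cyclic and each distance between consecutive occurrences is capped at the average distance v = N / f.

```python
from collections import defaultdict


def arf(positions, corpus_size):
    """Average reduced frequency of an item occurring at the given positions.

    Assumed definition: ARF = (1 / v) * sum(min(d_i, v)), where v = N / f and
    d_i are the distances between consecutive occurrences in a cyclic corpus.
    """
    f = len(positions)
    if f == 0:
        return 0.0
    v = corpus_size / f
    ordered = sorted(positions)
    # Distances between neighbouring occurrences ...
    distances = [ordered[i] - ordered[i - 1] for i in range(1, f)]
    # ... plus the wrap-around distance from the last occurrence to the first.
    distances.append(ordered[0] + corpus_size - ordered[-1])
    return sum(min(d, v) for d in distances) / v


def extract_bundles(tokens, n=4, min_freq=10):
    """Return {n-gram: (raw frequency, ARF)} for n-grams meeting the cut-off."""
    positions = defaultdict(list)
    for i in range(len(tokens) - n + 1):
        positions[tuple(tokens[i:i + n])].append(i)
    size = len(tokens)
    return {
        gram: (len(pos), arf(pos, size))
        for gram, pos in positions.items()
        if len(pos) >= min_freq
    }
```

An evenly dispersed sequence has an ARF close to its raw frequency, while a sequence whose occurrences are clustered in one part of the corpus has an ARF close to 1, which is why ARF is useful as a dispersion-sensitive complement to a plain frequency cut-off. The third criterion (manual or rule-based exclusion of noise) is not modelled here.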

Keywords

lexicogrammar; corpus linguistics; N-grams; lexical bundles; English; Croatian

Hrčak ID:

342893

URI

https://hrcak.srce.hr/342893

Publication date:

31.12.2025.
