
Original scientific paper

A Corpus-Linguistic Analysis of Sportske novosti

Tomislav Stojanov (ORCID: orcid.org/0000-0002-6972-6518); Institut za hrvatski jezik i jezikoslovlje
Zoran Vučić

Full text: Croatian, PDF (785 KB), pp. 103-129
APA 6th Edition
Stojanov, T., & Vučić, Z. (2012). Korpusnojezikoslovna obradba tekstova Sportskih novosti. Filologija, (59), 103-129. Retrieved from https://hrcak.srce.hr/98089

Abstract
The paper examines the role of the corpus in linguistic research, using two Croatian corpus interfaces, Philologic and Bonito, as examples. It considers linguistic inquiries into the relation between document and content, as well as the level of character and information display. For specialized linguistic search queries we built a sports newspaper database from the online texts of Sportske novosti (http://sportske.jutarnji.hr/), containing 3.6 million tokens published from April 2008 to July 2009.
We present computational procedures for information retrieval and n-gram SQL/regex queries that extract token co-frequencies and reveal phrases, collocations and more stable syntagmemes. The JavaScript wiring library WireIt is used to visualize token frequencies in the browser.
We compared the output with Google search results, on the basis of which we identified seven shortcomings of Google search for linguistic investigation, and concluded that our approach could produce unique results in linguistic research.
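
The n-gram SQL/regex procedure named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the table layout (`tokens` with `doc_id`, `pos`, `token`), the function names, and the three toy sentences are all assumptions made for illustration. Tokens are extracted with a regular expression, stored in SQLite, and adjacent pairs are counted with a self-join.

```python
import re
import sqlite3


def tokenize(text):
    # Lowercased word tokens; \w is Unicode-aware in Python 3,
    # so Croatian letters such as č, ć, š, ž, đ are preserved.
    return re.findall(r"\w+", text.lower())


def build_db(documents):
    # Hypothetical schema: one row per token, with its position in the document.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tokens (doc_id INTEGER, pos INTEGER, token TEXT)")
    for doc_id, text in enumerate(documents):
        rows = [(doc_id, pos, tok) for pos, tok in enumerate(tokenize(text))]
        conn.executemany("INSERT INTO tokens VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def bigram_counts(conn, min_count=1):
    # Self-join pairs each token with its right-hand neighbour in the same
    # document, then groups the pairs to obtain co-occurrence frequencies.
    query = """
        SELECT a.token, b.token, COUNT(*) AS freq
        FROM tokens AS a
        JOIN tokens AS b
          ON b.doc_id = a.doc_id AND b.pos = a.pos + 1
        GROUP BY a.token, b.token
        HAVING COUNT(*) >= ?
        ORDER BY freq DESC, a.token, b.token
    """
    return conn.execute(query, (min_count,)).fetchall()


if __name__ == "__main__":
    # Invented toy sentences, not taken from the Sportske novosti corpus.
    docs = [
        "Dinamo je pobijedio Hajduk",
        "Hajduk je pobijedio Rijeku",
        "Dinamo je izgubio",
    ]
    conn = build_db(docs)
    for first, second, freq in bigram_counts(conn, min_count=2):
        print(first, second, freq)
```

Extending the join to a third alias (`c.pos = a.pos + 2`) would give trigram counts in the same way; frequent n-grams found like this are only candidates for collocations and would still need linguistic inspection.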

Keywords
text search; SQLite; information retrieval; Google search; corpus linguistics; Sportske novosti; n-gram; collocation; Croatian language

Hrčak ID: 98089

URI
https://hrcak.srce.hr/98089

