Original scientific paper

https://doi.org/10.31724/rihjj.46.2.31

From Specialized Web Corpora of Tourism to a Learner’s Dictionary

Irena Srdanović (ORCID: orcid.org/0000-0003-1281-176X); Juraj Dobrila University of Pula


Full text: Croatian, PDF (4,014 KB)

pp. 1059-1083


Abstract

This paper presents two approaches to creating specialized web corpora of Croatian tourism in Japanese for use in building a specialized learner’s dictionary. Both approaches use the WebBootCat technology (Baroni et al. 2006, Kilgarriff et al. 2014) to create specialized web corpora automatically. The first approach builds the corpora from selected seed words most relevant to the topic. The second approach specifies a number of web pages, available in Japanese, that cover tourism-oriented information on specific regions, cities, and sites in Croatia; these pages are then used for web corpus creation inside the Sketch Engine platform. Both approaches yield specialized web corpora that are small in size but quite useful for lexical profiling in the specific field of tourism. In the process of dictionary creation, the second approach has proven especially useful for the selection of lexical items, while both approaches have proven highly useful for exploring and selecting authentic examples from the corpora. The research exposes some shortcomings in Japanese language processing, such as errors in the lemmatization of some culturally specific terms, and indicates the need to refine existing language processing tools for Japanese. The Japanese-Croatian bilingual learner’s dictionary (Srdanović 2018) is currently in the pilot phase and is being used and built by learners and teachers through the open-source dictionary platform Lexonomy (Mechura 2017). Work on the bilingual dictionary is useful as a means of training students in language analysis and description using modern technologies (e.g. corpora, corpus query systems, dictionary editing platforms). The dictionary is also important in educating new personnel capable of working in tourism using the Japanese language, for whom there is a strong need.
In the future, the same approach could be used for creating specialized corpora and dictionaries for Japanese and other language pairs.

Keywords

corpus building; BootCat technology; tourism domain; learner’s dictionary; Sketch Engine; specialized web corpus of Croatian tourism in Japanese

Hrčak ID:

245483

URI

https://hrcak.srce.hr/245483

Publication date:

30.10.2020.
