Original scientific paper
Functional lexicography of an online spellchecker
APA 6th Edition
Dembitz, Š. (2012). Functional lexicography of an online spellchecker. Filologija, (58), 0-0. Retrieved from https://hrcak.srce.hr/98051
Online spellchecking offers a unique opportunity to continuously improve a spellchecker's linguistic functionality through interaction with its community of users. This opportunity is crucial for spellchecking in languages that are not central to natural language processing (NLP), such as Croatian, because it helps close the gap in NLP tools between them and NLP-central languages (English, Japanese, German, French, Russian, Mandarin Chinese, etc.). The paper discusses this opportunity using Hascheck as an example. Hascheck started as the first Croatian public spellchecker, operating with a very modest dictionary of 100,000 Croatian common word-types. Through learning, the dictionary has grown to 830,000 common word-types and 600,000 name-types, acronyms, abbreviations, etc. This growth is the result of processing a corpus of 260 million tokens, the largest corpus ever processed in Croatia for lexicographic purposes. All of this was made possible by the Learning System incorporated into the spellchecker's software environment, which converts individual users' language competence into collective value. The Learning System is highly automated, but its results do not enter Hascheck's dictionary without human supervision, which is needed to maintain precision. The supervision pays special attention to potentially valid words that are close to frequent or potentially frequent misspellings or typos. The abundance of collected data allows mathematical modeling of many aspects of Hascheck's life, which are also presented in the paper.
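The supervised learning loop described in the abstract can be illustrated with a small Python sketch. It is not Hascheck's actual implementation, which the abstract does not specify. It assumes that unknown tokens are counted, that a frequency threshold (the hypothetical `min_count`) filters out one-off strings, and that "close to a misspelling" can be approximated by a single insertion, deletion, or substitution relative to a dictionary word. The function names and the sample words are illustrative only. Nothing is added to the dictionary automatically; the sketch only builds a queue for a human reviewer.

```python
from collections import Counter


def edit_distance_at_most_one(a: str, b: str) -> bool:
    """True if a and b differ by at most one insertion, deletion, or substitution."""
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equal-length) string
    # Skip the common prefix.
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        # Same length: the rest must match after one substituted character.
        return a[i + 1:] == b[i + 1:]
    # b is one character longer: the rest must match after skipping one character of b.
    return a[i:] == b[i + 1:]


def triage_candidates(unknown_tokens, dictionary, min_count=2):
    """Build a review queue of unknown words; never modifies the dictionary.

    min_count is a hypothetical threshold, not a value from the paper.
    """
    counts = Counter(unknown_tokens)
    queue = []
    for word, count in counts.items():
        if word in dictionary or count < min_count:
            continue
        # Dictionary words one edit away suggest the candidate may be a typo.
        near = sorted(w for w in dictionary if edit_distance_at_most_one(word, w))
        queue.append({
            "word": word,
            "count": count,
            "near": near,
            "needs_special_care": bool(near),
        })
    # Most frequent candidates first, ties broken alphabetically.
    queue.sort(key=lambda entry: (-entry["count"], entry["word"]))
    return queue


if __name__ == "__main__":
    dictionary = {"riječ", "rječnik", "jezik"}
    tokens = ["rječ", "rječ", "pravopis", "pravopis", "pravopis", "jezikk"]
    for entry in triage_candidates(tokens, dictionary):
        print(entry)
```

In this toy run, "pravopis" is queued as an ordinary candidate, "rječ" is flagged for special care because it is one edit away from "riječ", and "jezikk" is dropped for appearing only once. The linear scan over the dictionary is chosen for readability; a dictionary of the size reported in the abstract would need an index, and a fuller distance measure would also cover transpositions.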