Original scientific paper

Information Retrieval and Terminology Extraction In Online Resources for Patients with Diabetes

Sanja Seljan ; University of Zagreb, Faculty of Humanities and Social Sciences, Department of Information and Communication Sciences, Zagreb, Croatia
Maja Baretić ; University of Zagreb, University Hospital Center Zagreb, Division of Endocrinology, Department of Internal Medicine, Zagreb, Croatia
Vlasta Kučiš ; University of Maribor, Faculty of Arts, Department of Translation Studies, Maribor, Slovenia


page 705-710

Abstract

Terminology use, as a means of information retrieval or document indexing, plays an important role in health literacy. Specific types of users, such as patients with diabetes, need access to various online resources (in a foreign and/or native language) when searching for information on self-education in basic diabetes knowledge, on self-care activities concerning the importance of dietetic food, medications and physical exercise, and on self-management of insulin pumps. Automatic extraction of corpus-based terminology from online texts, manuals or professional papers can help in building terminology lists or lists of “browsing phrases” useful in information retrieval or document indexing. Specific terminology lists represent an intermediate step between free-text search and controlled vocabulary, and between users’ demands and existing online resources in native and foreign languages. The research, which aims to detect the role of terminology in online resources, is conducted on English and Croatian manuals and Croatian online texts, and is divided into three interrelated parts: i) comparison of professional and popular terminology use; ii) evaluation of automatic, statistically based terminology extraction on English and Croatian texts; iii) comparison and evaluation of terminology extracted from an English manual using statistical and hybrid approaches. Extracted terminology candidates are evaluated by comparison with three types of reference lists: a list created by a medical professional, a list of highly professional vocabulary contained in MeSH, and a list created by non-medical persons, made as the intersection of 15 lists. Results report on the use of popular and professional terminology in online diabetes resources, on the evaluation of automatically extracted terminology candidates in English and Croatian texts, and on the comparison of statistical and hybrid extraction methods in English text.
Automatic and semi-automatic terminology extraction methods are evaluated by recall, precision and F-measure.
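As a rough illustration of this kind of evaluation, the sketch below scores a list of extracted term candidates against a reference list. It is not the authors' procedure: the function name `evaluate_terms`, the exact-match comparison after lowercasing, the balanced form of the F-measure, and the sample terms are all assumptions made for the example.

```python
def evaluate_terms(candidates, reference):
    """Return (precision, recall, f_measure) for extracted term candidates.

    Assumption: a candidate counts as correct only if it matches a
    reference term exactly after lowercasing and trimming whitespace.
    """
    cand = {term.strip().lower() for term in candidates}
    ref = {term.strip().lower() for term in reference}

    hits = len(cand & ref)  # candidates that also appear in the reference list

    precision = hits / len(cand) if cand else 0.0
    recall = hits / len(ref) if ref else 0.0

    # Balanced F-measure (harmonic mean of precision and recall); assumed here.
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Hypothetical term lists, invented for illustration only.
extracted = ["insulin pump", "blood glucose", "basal rate", "user manual"]
gold = ["insulin pump", "blood glucose", "bolus", "basal rate", "hypoglycemia"]

precision, recall, f_measure = evaluate_terms(extracted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} F={f_measure:.2f}")
```

Running the same function once per reference list (medical professional, MeSH, non-medical intersection) would give one precision, recall and F-measure triple per list, which is one plausible way to compare extraction methods across the three references.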

Keywords

health literacy, terminology, information extraction, diabetes mellitus type 1, documentation online, language barriers

Hrčak ID:

127612

URI

https://hrcak.srce.hr/127612