Professional paper
TEIMark Program – the Processing of Transcribed Old Library Materials
Mario Essert
Faculty of Mechanical Engineering and Naval Architecture, University of Zagreb, Zagreb, Croatia
Vlado Cingel
orcid.org/0000-0002-1121-8430
Nikola Glumac
orcid.org/0000-0003-0716-0405
Mario Lončarić
orcid.org/0000-0001-7735-3697
Božidar Štimac
Abstract
Croatian institutions hold a large quantity of digitized heritage material, much of which is made accessible over the internet for presentation purposes. These digitized documents testify to Croatia's rich cultural and written heritage. Following the completion of DocMark, a program for marking digitized document images and for analyzing and comparing those marks within or across documents, the TEIMark program was created to mark text, whether typed, transliterated, or machine-recognized. In DocMark, marking is performed on the image of the document, where the points of interest are material properties and peculiarities rather than the content of the text. In TEIMark, marking is performed on the actual text rather than on its image, which makes possible linguistic and other research focused on the content of the document. The program takes its name from TEI (Text Encoding Initiative) markup. Such markup is usually entered through commercial editors such as oXygen, XMLSpy, or XmlBlueprint, using XML elements and their attributes, which makes the marked text difficult to read and analyze. TEIMark instead offers a simpler, entirely new visual approach that removes the need to know or read XML (eXtensible Markup Language) or XSLT transformation programs, although these remain available for supplementary analysis and processing of the marked text. The program has advanced generic features, so beyond TEI markup it can also be used to create wiki sites, ReST or Markdown applications, and the like. Documents can be marked locally (with the text in HTML format) or over the internet, which, as in DocMark, allows visual marking in several independent layers. This lets several people, i.e. experts from different fields, work on the same document. Only a web browser is required.
The marking results can be exported to XML and other formats and further processed with existing or newly written analysis programs (e.g. counting marks, studying conceptual classes, grammatical research, and the like). In addition to manual marking, TEIMark includes an automatic marking option based on words supplied in advance (e.g. from a database), on parts of words, and even on phrases (words dispersed through the text). Visual marks can be defined hierarchically in depth and by conceptual domain in breadth, and can be displayed in groups, individually, or in layers within a marked document. TEIMark is built into the new (fifth) version of the electronic edition of the BIBLE (© KS, Zagreb) and has been demonstrated in the HAZU library for marking and analyzing selected digitized documents of the Institute of Croatian Language and Linguistics and the online encyclopedia of the Miroslav Krleža Institute of Lexicography.
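As an illustration only, the automatic marking and XML export described above can be sketched in a few lines of Python. This is not the TEIMark implementation, which the abstract does not describe in detail. The function names `auto_mark` and `to_xml`, the lexicon format (a word mapped to a label), and the choice of the TEI-style `<name type="...">` element are all assumptions made for this sketch.

```python
import re
import xml.etree.ElementTree as ET


def auto_mark(text, lexicon):
    """Find every whole-word occurrence of a lexicon entry in the text.

    `lexicon` is a hypothetical word -> label mapping; the result is a
    sorted list of (start, end, label) character spans.
    """
    spans = []
    for word, label in lexicon.items():
        pattern = r"\b" + re.escape(word) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.append((match.start(), match.end(), label))
    return sorted(spans)


def to_xml(text, spans):
    """Serialize the text as a <p> with each span wrapped in <name type=...>.

    Spans that overlap an earlier span are skipped, because inline XML
    elements cannot partially overlap.
    """
    paragraph = ET.Element("p")
    position = 0
    last = None  # most recently created <name> element
    for start, end, label in spans:
        if start < position:
            continue  # overlaps the previous mark
        chunk = text[position:start]
        if last is None:
            paragraph.text = chunk
        else:
            last.tail = chunk
        last = ET.SubElement(paragraph, "name", {"type": label})
        last.text = text[start:end]
        position = end
    rest = text[position:]
    if last is None:
        paragraph.text = rest
    else:
        last.tail = rest
    return ET.tostring(paragraph, encoding="unicode")


if __name__ == "__main__":
    sample = "Marko Marulić was born in Split."
    marks = auto_mark(sample, {"Split": "place", "Marulić": "person"})
    print(to_xml(sample, marks))
```

Keeping the marks as character spans separate from the text, and only converting to inline XML on export, is one simple way to allow several independent layers over the same document; whether TEIMark stores its layers this way is not stated in the abstract.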
Keywords
digitalized legacy; TEIMark program; real text; content; digitalized documents
Hrčak ID:
150028
Publication date:
10 November 2015