Network Framework for Research of Croatian Cultural Heritage Prepared for Linked Data

Mario Essert; Faculty of Mechanical Engineering and Naval Architecture, University of Zagreb

This paper presents a framework for archiving and analysing documents from various categories of Croatian cultural heritage, e.g. literature, painting and architecture, stored in various media, e.g. digitised handwriting, text, images, sound recordings and films. The framework allows digital records and their properties to be categorised by time and place, and it provides a search facility with many search criteria. Beyond classical bibliographic search, individual words can be tracked through the centuries: when they first appeared, when they fell out of use and when they reappeared. A visual editor called TEIMark was developed for marking up syntactic and semantic data, and a second editor called DocMark was developed for marking up visual data, e.g. digitised handwriting. Both editors support visual markings, i.e. tags placed over the content (words or images), which are organised in layers that can be hidden, shown or saved in the XML/TEI format. Each document can carry its own set of triples, which are stored in the Virtuoso triple store and can be searched with SPARQL queries. The framework also contains a development system for linguistic text analysis and a program called IExtract, which extracts subject-predicate-object (s-p-o) information from a set of sentences using user-defined patterns. The development system enables semi-automatic creation of alphabets and dictionaries, which can then be connected to existing online dictionaries using the linked data paradigm. This forms a foundation for future ontology systems that connect such data as linked open data (LOD) in a global network cloud.
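The abstract does not specify the pattern syntax used by IExtract, so the following is only a minimal sketch of what pattern-based s-p-o extraction can look like. It assumes, purely for illustration, that a user-defined pattern is a regular expression with the named groups `s`, `p` and `o`; the names `PATTERNS` and `extract_triples` are hypothetical and are not taken from the framework.

```python
import re
from typing import List, Pattern, Tuple

Triple = Tuple[str, str, str]

# Hypothetical user-defined patterns (an assumption, not IExtract's syntax):
# each regular expression names the subject (s), predicate (p) and object (o).
PATTERNS: List[Pattern[str]] = [
    re.compile(r"^(?P<s>[A-Z][\w ]*?) (?P<p>wrote|painted|designed) (?P<o>.+?)\.?$"),
]


def extract_triples(sentences: List[str],
                    patterns: List[Pattern[str]] = PATTERNS) -> List[Triple]:
    """Return one (subject, predicate, object) triple per matching sentence."""
    triples: List[Triple] = []
    for sentence in sentences:
        text = sentence.strip()
        for pattern in patterns:
            match = pattern.match(text)
            if match:
                triples.append((match["s"], match["p"], match["o"]))
                break  # the first matching pattern wins for this sentence
    return triples


if __name__ == "__main__":
    sample = ["Marko Marulić wrote Judita.", "The sea is calm."]
    for triple in extract_triples(sample):
        print(triple)
```

Triples produced this way could then be serialised as RDF and loaded into a triple store for querying; that step is omitted here because the abstract gives no details of the storage schema.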

tools for visual annotation, semantic framework, information extraction, linguistics, heritage linked open data

