
Original scientific paper

Historical-Linguistic and Computational-Linguistic Aspects of the Description and Norming of Dashes in the Croatian Language

Tomislav Stojanov, ORCID: orcid.org/0000-0002-6972-6518; Institut za hrvatski jezik i jezikoslovlje

Fulltext: Croatian, pdf (2 MB), pages 127-161
APA 6th Edition
Stojanov, T. (2015). Jezičnopovijesni i računalnojezikoslovni aspekti opisa i normiranja pisanja vodoravnih crta u hrvatskome jeziku. Rasprave: Časopis Instituta za hrvatski jezik i jezikoslovlje, 41 (1), 127-161. Retrieved from https://hrcak.srce.hr/141876

Abstracts
This paper describes one of the two punctuation marks (dashes and quotation marks) that deviate significantly from the relationship of one character per (Unicode) semantic value. While quotation marks have multiple graphemes (eight, specifically) for one semantic value, dashes typically have two graphemes (a short and a long dash) that cover as many as 11 (Unicode and Latin) dash characters. Although the criterion of line length has typically been highly prominent in orthography manuals, in the categorization presented here it appears only on the sixth hierarchical level.
In addition to the standardization, in the meantime, of two new Unicode dash characters (the two-em dash and three-em dash, Unicode 6.1, January 2012), a different methodology and a comparison of the historical-linguistic and computational-linguistic aspects have expanded the knowledge of dash characters in the Croatian language as described in Portada-Stojanov (2009). A categorization sensitive to the dichotomy of graphic representation and meaning is presented, which divides all dash characters into five hierarchical levels. Of the 44 horizontal, unbroken Unicode dash characters, a division by type, time, functionality, direction, and line height yields 11 contemporary Latin-alphabet horizontal central characters, from which each language written in the Latin alphabet chooses its own. The semantic value and usage of all Unicode dash graphemes are described.
The paper also describes dash characters from the perspective of Croatian historical linguistics and orthography. Compared with the rich repository of standardized Unicode dash characters, orthographic standards are shown to be significantly reductive. Orthographic norming of dash characters is divided into two periods and three groups, depending on their graphemic form (the first and second generation of orthography manuals) and terminology (the pre-standard phase and the two standard norming schools, depending on the acceptance of the terminological pairs “spojnica – crtica” and “crtica – crta”).
The comparative historical-linguistic and computational-linguistic research, together with the contrastive analysis of the Unicode standardization of dash characters against traditional orthographic descriptions, was intended to highlight (i) the need for a broader, interdisciplinary approach to describing written linguistic practice, (ii) the insufficiency of descriptions in primary and secondary school orthography manuals for modern writing, and (iii) the insufficiency of the existing Croatian codification of both terminological schools. The paper claims that, for orthography manuals to be called scholarly, computer writing should be better described and a differentiation between characters and graphemes should be introduced at the level of punctuation. One area in which orthography manuals could bring themselves technologically up to date is the writing of compound words at the beginning of a broken line; the paper provides eight reasons to abandon the current tradition.
Analysis has shown that it would be justified to base dash codification on three or four characters, which reduces the 11 Latin Unicode characters to basic groups of dashes – the short, medium, long, and very long dashes, referred to as c1, c2, c3 and c4.
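As a rough illustration of the Unicode inventory the abstract refers to, the following Python sketch lists every character whose Unicode general category is Pd (Dash Punctuation), using the standard library's unicodedata module. It rests on an assumption: the Pd category is used here only as a convenient stand-in and is not the paper's own selection criterion, so the result should not be expected to reproduce the paper's counts of 44 or 11 characters, nor its c1 to c4 grouping. The function name dash_punctuation is hypothetical.

```python
import sys
import unicodedata


def dash_punctuation():
    """Return (code point, name) pairs for characters in category Pd.

    Assumption: general category Pd is only an approximation of what
    the paper calls dash characters; the paper applies its own criteria.
    """
    found = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.category(ch) == "Pd":
            # Fall back to a placeholder if the character has no name.
            found.append((cp, unicodedata.name(ch, "<unnamed>")))
    return found


if __name__ == "__main__":
    for cp, name in dash_punctuation():
        print(f"U+{cp:04X}  {name}")
```

The exact list depends on the Unicode version bundled with the Python interpreter. The two characters mentioned in the abstract as added in Unicode 6.1, TWO-EM DASH (U+2E3A) and THREE-EM DASH (U+2E3B), appear in it, whereas MINUS SIGN (U+2212) does not, since it is classified as a mathematical symbol.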

Keywords
Croatian language; orthography; linguography; Unicode; dash; hyphen

Hrčak ID: 141876

URI
https://hrcak.srce.hr/141876
