
Društvena istraživanja : journal for general social issues, Vol. 14 No. 1-2 (75-76), 2005

Original scientific paper

The Text Vocabulary Size Law. Heaps' Law and Determining Text Vocabulary Size in Croatian Language

Miroslav TUĐMAN

Full text: Croatian, PDF (295 KB), pages 227-250
APA 6th Edition
TUĐMAN, M. (2005). ZAKON O VELIČINI VOKABULARA TEKSTA Heapsov zakon i određivanje veličine vokabulara tekstova na hrvatskom jeziku. Društvena istraživanja, 14 (1-2 (75-76)), 227-250. Retrieved from https://hrcak.srce.hr/16266

Abstract
The existing formula of Heaps' Law for the size of a text's
vocabulary, V_r(n) = K·n^β, is not universal, so the law needs to
be redefined before it can be used to analyse a corpus in a different
language. The analysis of a corpus of texts in the Croatian
language confirms the hypothesis that the share of
functional items (F) in a text is constant and amounts to 21% of
the text size n (the corresponding figure for English texts is 26%).
The author shows that the percentage of functional items
in a text can be used as the value of the parameter K, and that
K is a constant for every language corpus.
Empirical research has confirmed the author's thesis that
the number of functional items in a text can be calculated with
the formula F = nK/100, and that the frequency of the
most frequent item (MF) can be calculated with the formula MF = n(K/100)^2.
The value of the other parameter of Heaps' Law can also
be determined accurately: β = log K/100. The author therefore
proposes a new form of the text vocabulary size law: V_r(n) = (Kn)^β.
The number of words appearing only once in the text (HL) can be
calculated with the formula HL = ((Kn)/2)^β. Research
confirms a very high correlation between the calculated
and real values of the vocabulary size, and likewise between the real
and calculated numbers of words appearing only once in the text.
Interpreted and defined in this way, the law of text vocabulary size
allows the vocabulary size of a text to be calculated in any language,
provided the constant percentage of functional words for that language is
known. Beyond the size of the text's vocabulary, this interpretation
of the law also allows the calculation of the number of functional
items in the text, the frequency of the most frequent word in the
text, and the number of words appearing only once in the text's
vocabulary.
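
The formulas quoted in the abstract can be tried out numerically. The following is only a minimal sketch, not the author's implementation: the function names are hypothetical, K is assumed to be entered as a percentage (21 for Croatian, 26 for English, as stated above), and β is passed in by the caller because the abstract's expression "log K/100" does not make the grouping or the base of the logarithm clear.

```python
# Sketch of the formulas stated in the abstract.
# Assumptions (not confirmed by the source): K is a percentage such as 21,
# and beta is supplied by the caller rather than derived from K.


def functional_items(n: float, k: float) -> float:
    """F = n*K/100: estimated number of functional items in a text of n words."""
    return n * k / 100.0


def most_frequent_item(n: float, k: float) -> float:
    """MF = n*(K/100)^2: estimated frequency of the most frequent item."""
    return n * (k / 100.0) ** 2


def vocabulary_size(n: float, k: float, beta: float) -> float:
    """V_r(n) = (K*n)^beta: the proposed form of the vocabulary size law."""
    return (k * n) ** beta


def hapax_count(n: float, k: float, beta: float) -> float:
    """HL = ((K*n)/2)^beta: estimated number of words appearing only once."""
    return ((k * n) / 2.0) ** beta


if __name__ == "__main__":
    n = 10_000          # text length in words (illustrative value)
    k_croatian = 21     # percentage of functional items given in the abstract
    beta = 0.5          # placeholder exponent, NOT a value from the paper

    print("F  =", functional_items(n, k_croatian))
    print("MF =", most_frequent_item(n, k_croatian))
    print("V  =", vocabulary_size(n, k_croatian, beta))
    print("HL =", hapax_count(n, k_croatian, beta))
```

One consequence of the two power-law formulas as written is that the ratio HL / V_r(n) equals 2^(-β) regardless of n and K, so under this reading the share of once-only words in the vocabulary depends on β alone.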

Hrčak ID: 16266

URI
https://hrcak.srce.hr/16266
