Natural Language Processing Using Neighbour Entropy-based Segmentation

Qiao, Jianfeng; Yan, Xingzhi; Lv, Shuran

doi:10.20532/cit.2021.1005393

Journal of Computing and Information Technology, Vol. 29 No. 2, 2021.

Original scientific paper

https://doi.org/10.20532/cit.2021.1005393


Jianfeng Qiao orcid.org/0000-0002-4379-0810 ; Capital University of Economics and Business, Beijing, China
Xingzhi Yan ; University of Birmingham, UK
Shuran Lv ; Capital University of Economics and Business, Beijing, China


page 113-131



Abstract

In natural language processing (NLP) of Chinese hazard text collected during hazard identification, Chinese word segmentation (CWS) is the first step in extracting meaningful information from such semi-structured texts. This paper proposes a new neighbor entropy-based segmentation (NES) model for CWS. The model considers the segmentation benefits of neighbor entropies, adopting the concept of "neighbor" from optimization research. It is defined by the benefit ratio of text segmentation, which accounts for both the benefits and the losses of combining segmentation units and therefore uses more information than other popular statistical models. In the experiments performed, the NES model, together with the maximum-based segmentation algorithm, achieves 99.3% precision, 98.7% recall, and a 99.0% F-measure for text segmentation; these results are higher than those of existing tools based on seven other popular statistical models. The results show that the NES model is a valid CWS method, especially when the segmentation task requires longer character sequences. The text corpus comes from the Beijing Municipal Administration of Work Safety and was recorded in the fourth quarter of 2018.
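For readers unfamiliar with the quantities named in the abstract, the following is a minimal sketch and not the authors' NES model. It assumes the commonly used definition of neighbor (branching) entropy, namely the Shannon entropy of the characters found immediately to the left or right of a candidate unit, and the standard balanced F-measure. The function names `neighbor_entropy` and `f_measure` and the toy string are hypothetical; the paper's benefit ratio is not reproduced here.

```python
import math
from collections import Counter


def neighbor_entropy(text: str, unit: str, side: str = "right") -> float:
    """Shannon entropy (bits) of the characters adjacent to `unit` in `text`.

    Assumption: this is the textbook left/right neighbor entropy, not the
    benefit-ratio score defined in the paper.
    """
    neighbors = Counter()
    start = text.find(unit)
    while start != -1:
        # Index of the character just after (right) or just before (left) the unit.
        idx = start + len(unit) if side == "right" else start - 1
        if 0 <= idx < len(text):
            neighbors[text[idx]] += 1
        start = text.find(unit, start + 1)  # allow overlapping occurrences
    total = sum(neighbors.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in neighbors.values())


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    toy = "abcabd"  # hypothetical toy text, not from the paper's corpus
    print(neighbor_entropy(toy, "ab", side="right"))
    print(neighbor_entropy(toy, "ab", side="left"))
    # Precision and recall as reported in the abstract.
    print(round(f_measure(0.993, 0.987), 3))
```

A unit whose neighbors vary widely (high entropy on both sides) is more likely to be a free-standing word, while low entropy on one side suggests the unit should be merged with its neighbor. The last line checks that the reported precision and recall are consistent with the reported F-measure.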

Keywords

Text Mining; Text Segmentation; Chinese Word Segmentation; Safety Management; Hazard Analysis

Hrčak ID:

280125

URI

https://hrcak.srce.hr/280125

Publication date:

4.7.2022.
