Original scientific paper

https://doi.org/10.17559/TV-20230324000477

Chinese Named Entity Recognition Method for Domain-Specific Text

He Liu ; College for Artificial Intelligence, North China University of Science and Technology, Tangshan 063210, China
Yuekun Ma ; 1) College for Artificial Intelligence, North China University of Science and Technology, Tangshan 063210, China 2) School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China 3) Hebei Key Laboratory of Industrial Intelligent Perception, Tangshan 063210, China *
Chang Gao ; College for Artificial Intelligence, North China University of Science and Technology, Tangshan 063210, China
Jia Qi ; Inspur Electronic Information Industry Co., Ltd.
Dezheng Zhang ; School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China *

* Corresponding author.


Full text: English, PDF (634 KB)

Pages: 1799-1808


Abstract

Chinese named entity recognition (NER) is a critical task in natural language processing that aims to identify and classify named entities in text. However, because domain texts are highly specific and large-scale labelled datasets are scarce, NER methods trained on public-domain corpora perform poorly on domain texts. This paper proposes a named entity recognition method that incorporates sentence semantic information: an attention mechanism and a gating mechanism adaptively fuse sentence semantic information into character semantic information, enhancing entity feature representation while attenuating the noise generated by irrelevant character information. To address the lack of large-scale labelled samples, we use data self-augmentation to expand the training samples. Because low-quality samples generated during self-augmentation can harm the model, we also introduce a Weighted Strategy. Experiments on the TCM prescriptions corpus show that our method achieves higher F1 values than the comparison methods.
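
The abstract does not give the fusion equations, so the following is only an illustrative sketch of one common way to combine an attention score with a gate, not the authors' model. The function name `fuse_sentence_into_chars`, the scaled dot-product scoring, the scalar gate, and the parameters `gate_w` and `gate_b` are all assumptions made for this example; in a trained model the gate parameters would be learned.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fuse_sentence_into_chars(char_vecs, sent_vec, gate_w, gate_b=0.0):
    """Hypothetical attention + gate fusion (an assumption, not the paper's equations).

    char_vecs: one d-dimensional vector per character
    sent_vec:  one d-dimensional sentence vector
    gate_w:    2*d gate weights (learned in a real model)
    """
    d = len(sent_vec)
    # Attention: how relevant each character is to the sentence representation.
    scores = [dot(h, sent_vec) / math.sqrt(d) for h in char_vecs]
    alphas = softmax(scores)

    fused = []
    for h, a in zip(char_vecs, alphas):
        # Sentence information scaled by this character's attention weight.
        ctx = [a * s for s in sent_vec]
        # Scalar gate in (0, 1) computed from the concatenation [h; ctx].
        g = sigmoid(dot(gate_w, h + ctx) + gate_b)
        # The gate controls how much sentence information is added to the character.
        fused.append([x + g * c for x, c in zip(h, ctx)])
    return fused, alphas


chars = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy character vectors
sent = [1.0, 0.0]                             # toy sentence vector
fused, alphas = fuse_sentence_into_chars(chars, sent, [0.1, 0.1, 0.1, 0.1])
```

In this toy setup, characters whose vectors align with the sentence vector receive larger attention weights, and a strongly negative `gate_b` closes the gate so the character vectors pass through almost unchanged.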

Keywords

attention mechanism; data augmentation; domain text; meta-learning; named entity recognition

Hrčak ID:

309230

URI

https://hrcak.srce.hr/309230

Publication date:

25.10.2023.
