
Original scientific paper

https://doi.org/10.17559/TV-20230121000257

A Named Entity Recognition Method Enhanced with Lexicon Information and Text Local Feature

Yuekun Ma ; 1) School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China 2) College for Artificial Intelligence, North China University of Science and Technology, Tangshan 063210, China 3) Hebei Key Laboratory of Industrial Intelligent Perception, Tangshan 063210, China
He Liu ; College for Artificial Intelligence, North China University of Science and Technology, Tangshan 063210, China
Dezheng Zhang ; 1) School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China 2) Beijing Key Laboratory of Knowledge Engineering for Materials Science, Beijing, China, 100083
Chang Gao ; College for Artificial Intelligence, North China University of Science and Technology, Tangshan 063210, China
Yujue Liu ; College for Artificial Intelligence, North China University of Science and Technology, Tangshan 063210, China


Full text: English, PDF, 366 KB

pp. 899-906




Abstract

Named Entity Recognition (NER) is one of the fundamental tasks for extracting knowledge from traditional Chinese medicine (TCM) texts. The variable length of TCM entities and the linguistic characteristics of TCM texts make entity boundaries ambiguous. In addition, better extraction and exploitation of local text features can improve the accuracy of named entity recognition. In this paper, we propose a TCM NER model enhanced with lexicon information and local text features. In this model, a lexicon is introduced to encode the characters in the text and obtain a context-sensitive global semantic representation of the text. A convolutional neural network (CNN) and a gate-joined collaborative attention network form a local feature extraction module that captures the important semantic features of local text. Experiments on two TCM domain datasets yield F1 values of 91.13% and 90.21%, respectively.
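The abstract does not say how lexicon words are attached to characters. As a rough illustration only, the sketch below shows one common way char-word fusion models collect candidate words: for every character, list the lexicon words whose span covers it. The function name `match_lexicon_words`, the `max_len` cutoff, and the toy lexicon are assumptions made for this example and are not taken from the paper.

```python
from typing import Dict, List, Set


def match_lexicon_words(text: str, lexicon: Set[str], max_len: int = 4) -> Dict[int, List[str]]:
    """Map each character index to the lexicon words that cover it.

    Hypothetical helper: scans every substring of 2..max_len characters
    and records a match for each character position inside a matched word.
    """
    matches: Dict[int, List[str]] = {i: [] for i in range(len(text))}
    for start in range(len(text)):
        last_end = min(start + max_len, len(text))
        for end in range(start + 2, last_end + 1):
            word = text[start:end]
            if word in lexicon:
                for i in range(start, end):
                    matches[i].append(word)
    return matches


# Toy lexicon of TCM terms, invented for illustration (not the paper's data).
toy_lexicon = {"人参", "人参汤", "黄芪"}
result = match_lexicon_words("服用人参汤", toy_lexicon)

for index, char in enumerate("服用人参汤"):
    print(char, result[index])
```

Characters covered by several overlapping words, such as 人 and 参 here, are exactly where entity boundaries are ambiguous. A model in the spirit of the abstract would presumably fuse embeddings of these candidate words with the character representation, but that step is not shown.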

Keywords

attention mechanism; char-word fusion coding; gate mechanism; NER

Hrčak ID:

300700

URI

https://hrcak.srce.hr/300700

Publication date:

23 April 2023
