Original scientific paper
https://doi.org/10.48188/so.7.8
The impact of artificial intelligence and natural language processing on the efficiency of the business process of standardizing unstructured textual data
Antonija Buzov*; Faculty of Economics, Business and Tourism, University of Split, Split, Croatia
Mario Jadrić; Faculty of Economics, Business and Tourism, University of Split, Split, Croatia
* Corresponding author.
Abstract
Aim: To examine the role of natural language processing (NLP) in supporting business processes by reliably transforming user-submitted unstructured textual data, specifically requests for medicines, into standardized product entries.
Methods: We collected a dataset of 24 medicine requests and processed them with a Python-based pipeline that combined text preprocessing, BERT embeddings, and fuzzy string matching. In this study, association means correctly linking a free-text request to a database entry, and the quality of this association is measured through accuracy, precision, recall, and F1-score. Natural language refers to the unstructured text provided by users; processing denotes the computational steps used to clean, tokenize, and match the data; and the business process is the transformation of user-submitted unstructured requests into structured database records.
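The preprocessing and matching steps can be illustrated with a minimal Python sketch. This sketch does not reproduce the authors' pipeline, and several parts are assumptions:

- The abbreviation table, the catalogue entries, and the names `normalize_request`, `similarity` and `match_request` are hypothetical.
- The BERT embedding stage is omitted.
- Python's standard-library `difflib.SequenceMatcher` stands in for whichever fuzzy-matching library the study actually used.

A request is linked to a catalogue entry only if its best similarity score reaches the threshold. The threshold defaults to 0.95 here, mirroring the 95% setting reported in the Results.

```python
import re
from difflib import SequenceMatcher
from typing import Optional, Sequence, Tuple

# Hypothetical dosage-form abbreviations (the study's actual rules are not given).
FORM_ABBREVIATIONS = {
    "tab": "Tablet",
    "tabs": "Tablet",
    "cap": "Capsule",
    "caps": "Capsule",
}

# Hypothetical standardized product catalogue.
CATALOGUE = [
    "Paracetamol 500 mg Tablet",
    "Ibuprofen 400 mg Tablet",
    "Amoxicillin 500 mg Capsule",
]


def normalize_request(text: str) -> str:
    """Lowercase, split letters from digits ("500mg" -> "500 mg"),
    drop punctuation, and expand dosage-form abbreviations."""
    # Letter runs (Unicode-aware) or numbers with an optional decimal part.
    tokens = re.findall(r"[^\W\d_]+|\d+(?:[.,]\d+)?", text.lower())
    return " ".join(FORM_ABBREVIATIONS.get(token, token) for token in tokens)


def similarity(a: str, b: str) -> float:
    """Case-insensitive character similarity in [0, 1] (difflib stand-in)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_request(
    request: str, catalogue: Sequence[str], threshold: float = 0.95
) -> Tuple[Optional[str], float]:
    """Return (best entry, score) if the score reaches the threshold,
    otherwise (None, best score) so the request can be reviewed manually."""
    normalized = normalize_request(request)
    best_entry, best_score = None, 0.0
    for entry in catalogue:
        score = similarity(normalized, entry)
        if score > best_score:
            best_entry, best_score = entry, score
    if best_score >= threshold:
        return best_entry, best_score
    return None, best_score


if __name__ == "__main__":
    print(normalize_request("500 mg tab"))
    print(match_request("paracetamol 500mg tab", CATALOGUE))
    print(match_request("aspirin 100 mg tab", CATALOGUE))
```

With this toy catalogue:

- "500 mg tab" normalizes to "500 mg Tablet".
- "paracetamol 500mg tab" matches "Paracetamol 500 mg Tablet" with a score of 1.0.
- "aspirin 100 mg tab" scores below 0.95 and is left unmatched.

Returning `None` below the threshold, rather than the nearest entry, reflects the trade-off reported in the Results. A strict threshold avoids false links at the cost of leaving some requests for manual handling.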
Results: At a similarity threshold of 95%, the model achieved an accuracy of 0.94, a precision of 0.89, a recall of 1.0, and an F1-score of 0.941. Reducing the threshold to 85% lowered accuracy to 0.25, mainly because of false duplicate matches. The model consistently standardized strength and dosage form (e.g., “500 mg tab” → “500 mg Tablet”). Errors occurred when distinct medicines had highly similar names.
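The reported scores can be related through the standard binary-classification definitions, in which F1 is the harmonic mean of precision and recall. The sketch below assumes those textbook definitions, since the abstract does not state how true and false positives were counted. The function names are illustrative.

Using the reported precision of 0.89 and recall of 1.0 gives an F1 of about 0.942. This agrees with the reported 0.941 up to rounding of the precision value.

```python
from typing import Dict


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


if __name__ == "__main__":
    # Reported precision and recall at the 95% threshold.
    print(round(f1_score(0.89, 1.0), 3))  # 0.942
    # Toy counts for illustration only, not the study's confusion matrix.
    print(classification_metrics(tp=3, fp=1, tn=4, fn=0))
```

The toy counts (3 true positives, 1 false positive, 4 true negatives, 0 false negatives) give an accuracy of 0.875, a precision of 0.75, a recall of 1.0 and an F1 of about 0.857. This shows how a single false link lowers precision while recall stays perfect.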
Conclusions: NLP methods can support the automated standardization of unstructured textual data in business processes, provided that high similarity thresholds are used and the underlying databases are well structured. Our findings highlight both the potential efficiency gains and the limitations of lightweight NLP models.
Keywords
natural language processing; artificial intelligence; Python; unstructured textual data
Hrčak ID: 346865
Publication date: 4 May 2026