
Original scientific paper

https://doi.org/10.32985/ijeces.14.10.3

Semi-automated Software Requirements Categorisation using Machine Learning Algorithms

Pratvina Talele (ORCID: 0000-0003-1400-7812); Department of Computer Engineering and Technology, Dr. Vishwanath Karad MIT World Peace University, Pune, India *
Siddharth Apte (ORCID: 0009-0001-3940-231X); Department of Computer Engineering and Technology, Dr. Vishwanath Karad MIT World Peace University, Pune, India
Rashmi Phalnikar; Department of Computer Engineering and Technology, Dr. Vishwanath Karad MIT World Peace University, Pune, India
Harsha Talele; Department of Computer Engineering, Pimpri Chinchwad College of Engineering, Pune, India

* Corresponding author.


page 1107-1114

Abstract

Requirements engineering is a mandatory phase of the software development life cycle (SDLC) in which system requirements are defined and documented in the Software Requirements Specification (SRS). As system complexity increases, it becomes difficult to categorise requirements as functional or non-functional. At present, a dearth of automated techniques forces reliance on labour-intensive, time-consuming manual methods. This research addresses that gap by comparing two prominent feature extraction techniques and their efficacy in automating requirements classification. Natural language processing methods are applied in the text pre-processing phase, after which Term Frequency-Inverse Document Frequency (TF-IDF) and Word2Vec are used for feature extraction. The extracted features serve as input to the machine learning algorithms. The study compares existing machine learning algorithms and discusses how correctly they categorise software requirements. We assessed Decision Tree (DT), Random Forest (RF), Logistic Regression (LR), Neural Network (NN), K-Nearest Neighbour (KNN) and Support Vector Machine (SVM) on precision and accuracy. On the publicly available PURE dataset, TF-IDF features performed better than Word2Vec features: TF-IDF reached an accuracy of 91.20% with both SVM and RF, whereas Word2Vec reached 87.36% with SVM, a difference of 3.84 percentage points. We believe these results will help developers build tools that support requirements engineering.
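
The pipeline summarised in the abstract (pre-processing, TF-IDF feature extraction, then a supervised classifier) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the six requirement sentences, the stop-word list, and all function names are hypothetical, the IDF formula is one common smoothed variant, and the classifier is a simple cosine-similarity K-Nearest Neighbour standing in for the algorithms the paper actually evaluates.

```python
import math
import re
from collections import Counter

# Hypothetical toy requirements; NOT taken from the PURE dataset.
TRAIN = [
    ("The system shall allow users to reset their password", "functional"),
    ("The system shall export reports as PDF files", "functional"),
    ("The user shall be able to search orders by date", "functional"),
    ("The system shall respond within two seconds under peak load", "non-functional"),
    ("The application shall be available 99 percent of the time", "non-functional"),
    ("All stored data shall be encrypted for security", "non-functional"),
]

# Assumed, deliberately tiny stop-word list for the pre-processing step.
STOPWORDS = {"the", "shall", "be", "to", "of", "as", "by", "for", "all", "their", "a", "an"}


def tokenize(text):
    """Lower-case, split on non-alphanumerics, and drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def fit_idf(docs):
    """Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf_vector(tokens, idf):
    """Sparse, L2-normalised TF-IDF vector; unseen terms are ignored."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two already-normalised sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def train(examples):
    """Fit IDF weights and vectorise every labelled training requirement."""
    docs = [tokenize(text) for text, _ in examples]
    idf = fit_idf(docs)
    vectors = [(tfidf_vector(doc, idf), label) for doc, (_, label) in zip(docs, examples)]
    return idf, vectors


def predict(text, idf, vectors, k=1):
    """Label a requirement by majority vote among its k most similar neighbours."""
    query = tfidf_vector(tokenize(text), idf)
    ranked = sorted(vectors, key=lambda v: cosine(query, v[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


idf, vectors = train(TRAIN)
print(predict("The system shall encrypt data for security", idf, vectors))
print(predict("Users shall reset a forgotten password", idf, vectors))
```

With only six training sentences the sketch says nothing about accuracy; it only shows how TF-IDF weights turn requirement text into vectors that a classifier can compare. Swapping in Word2Vec would replace `tfidf_vector` with an embedding-based representation, which needs a trained embedding model and is therefore left out here.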

Keywords

Natural Language Processing; Machine Learning; Software Engineering; Supervised Machine Learning

Hrčak ID:

311152

URI

https://hrcak.srce.hr/311152

Publication date:

12.12.2023.
