
Original scientific paper

ProAll-D: protein allergen detection using long short term memory - a deep learning approach

Pallavi M Shanthappa ; Amrita School of Arts and Sciences, Mysuru, Amrita Vishwa Vidyapeetham, India
Rakshitha Kumar ; Amrita School of Arts and Sciences, Mysuru, Amrita Vishwa Vidyapeetham, India

Full text: English, PDF, 1.076 Kb

pp. 231-240




Background: An allergic reaction is an overreaction of the immune system to a previously encountered, typically benign molecule, frequently a protein. Allergic reactions can result in rashes, itching, mucous membrane swelling, asthma, coughing, and other unusual symptoms. A wide range of principles and methods have been applied in bioinformatics to predict allergenicity. Sequence-similarity methods based on the FAO/WHO criteria have a very low positive predictive value, which makes it difficult to predict possible allergens.

Method: This work proposes the deep learning model LSTM (Long Short-Term Memory) to overcome the limitations of traditional approaches and of lower-performing machine learning models in predicting the allergenicity of dietary proteins. The dataset comprises 2,427 allergens and 2,427 non-allergens from a variety of sources, including the Central Science Laboratory and the NCBI. The data was divided 80:20 for training and testing. Five E-descriptors were used to describe the protein sequences of allergens and non-allergens: E1 (hydrophilic character of peptides), E2 (length), E3 (propensity to form helices), E4 (abundance and dispersion), and E5 (propensity of beta strands). The auto cross covariance (ACC) transformation was then used to convert the variable-length protein sequences into vectors of uniform length. A total of eight machine learning techniques were considered, all implemented in Python.

Results: Gaussian Naive Bayes achieved an accuracy of 64.14 %, Radius Neighbors Classifier 49.2 %, Bagging Classifier 85.8 %, AdaBoost 76.9 %, Linear Discriminant Analysis 76.13 %, Quadratic Discriminant Analysis 84.2 %, Extra Tree Classifier 90 %, and LSTM 91.5 %.

Conclusion: With an AUC value of 91.5 %, the LSTM is regarded as the best of these models for predicting allergens. A web server called ProAll-D has been created that successfully identifies novel allergens using the LSTM approach.
Users can use the link to access the ProAll-D server and data.
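
The ACC step mentioned in the Method section can be illustrated with a minimal Python sketch. It is not the authors' code: the descriptor table `PLACEHOLDER_E` is filled with random numbers rather than the published E-descriptor values, the helper names `encode` and `acc_transform` are hypothetical, and the maximum lag of 5 is an assumption that the abstract does not state. The sketch only shows how averaging lagged products of descriptors yields a vector whose length is independent of the sequence length.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
N_DESCRIPTORS = 5  # E1..E5

# Placeholder table: random numbers, NOT the published E-descriptor values.
_rng = random.Random(0)
PLACEHOLDER_E = {
    aa: [_rng.gauss(0.0, 1.0) for _ in range(N_DESCRIPTORS)]
    for aa in AMINO_ACIDS
}


def encode(sequence, table=PLACEHOLDER_E):
    """Map a protein sequence to a list of per-residue descriptor vectors."""
    return [table[aa] for aa in sequence]


def acc_transform(descriptors, max_lag=5):
    """Auto cross covariance: returns d * d * max_lag features.

    For each lag and each descriptor pair (j, k), average the product
    E_j(i) * E_k(i + lag) over all positions i where both residues exist.
    """
    n = len(descriptors)
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    d = len(descriptors[0])
    features = []
    for lag in range(1, max_lag + 1):
        for j in range(d):
            for k in range(d):
                total = sum(
                    descriptors[i][j] * descriptors[i + lag][k]
                    for i in range(n - lag)
                )
                features.append(total / (n - lag))
    return features


# Two sequences of different length give vectors of the same length.
short = acc_transform(encode("MKTAYIAKQRQISFVK"))
long_ = acc_transform(encode("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"))
print(len(short), len(long_))  # 125 125
```

With five descriptors and a maximum lag of 5, every sequence maps to 5 × 5 × 5 = 125 features, so sequences of any length above the lag can be fed to the same classifier.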

Keywords

Allergen prediction; ACC transformation; LSTM model; Gaussian naive bayes; Classifier; Extra tree classifier; Bagging classifier; ADA boost; Linear discriminant analysis; Quadratic discriminant analysis





