A Multi-Label Machine Learning Approach to Support Pathologist's Histological Analysis
Keywords:
machine learning, health problems, knowledge extraction, data mining, classification
Abstract
This paper proposes a new tool for telemedicine, the branch in which IT supports medicine when distance impairs the delivery of proper care to a patient. The information contained in medical texts, if properly extracted, can be used for searching, classification, or statistical analysis. A suitable information extraction tool may therefore help reduce errors and improve quality control. To this end, this work presents a multi-label machine learning approach that classifies the information extracted from pathology reports into relevant categories. The aim is to integrate automatic classifiers into the current workflow of medical experts through a multi-label approach that considers all the features of a model together with their relationships.
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.