Visualization of Big Data Text Analytics in Financial Industry: A Case Study of Topic Extraction for Italian Banks

Authors

  • Živko Krstić, Atomic Intelligence, Data Science Department, Zagreb, Croatia
  • Sanja Seljan, Faculty of Humanities and Social Sciences, Information and Communication Sciences, Zagreb, Croatia
  • Jovana Zoroja, Faculty of Economics and Business, Zagreb, Croatia

Keywords:

visualization, data science, FinTech, topic modelling, LDA

Abstract

Analysis of textual data can yield new and valuable business insights, which can in turn support better business decisions. Sources used for text analysis in the financial industry range from internal Word documents and email to external sources such as social media, websites, and open data. The system described in this paper uses social media data, specifically Italian-language tweets from Twitter related to Italian banks. The system is built on open-source tools (the R language), and a topic extraction model was created to gather valuable information. The paper describes the methods used for data ingestion, modelling, and visualization of results and insights.

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Published

2019-10-31

How to Cite

Krstić, Ž., Seljan, S., & Zoroja, J. (2019). Visualization of Big Data Text Analytics in Financial Industry: A Case Study of Topic Extraction for Italian Banks. ENTRENOVA - ENTerprise REsearch InNOVAtion, 5(1), 35–43. Retrieved from https://hrcak.srce.hr/ojs/index.php/entrenova/article/view/13728

Section

Mathematical and Quantitative Methods