Technical Journal, Vol. 16 No. 2, 2022.
Professional paper
https://doi.org/10.31803/tg-20220408162739
A Personalized and Scalable Machine Learning-Based File Management System
Bansal Veena; Indian Institute of Technology Kanpur, Kalyanpur, Kanpur, 208016, India
Sati Dhiraj; Capital Business Systems Pvt. Ltd., 288 A, Phase IV, Udyog Vihar, Sector 19, Gurugram, 122016, India
Abstract
In this work, we present a hybrid image and document filing system that we have built. When a user stores a file in the system, the file is processed by an appropriate open-source machine learning system to generate tags. Presently, we use OpenCV and Tesseract OCR for tagging files: OpenCV recognizes objects in images, and Tesseract recognizes text. An image file is processed with OpenCV for object recognition and with Tesseract for text and captions, and the results of both are used to tag the file. All other files are processed with Tesseract only. Users can also enter their own tags. A database system has been built that stores the tags and the file path. Every file is stored with its owner's identification and a timestamp. The system has a client-server architecture, can store and retrieve a large number of files, and is highly scalable.
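As an illustration of the tag database described in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes SQLite as the store, and the table layout and the helper names `open_db`, `store_file`, and `find_by_tag` are hypothetical. The tags passed in stand for the output of the OpenCV and Tesseract steps, which are not reproduced here. Restricting searches to one owner is also an assumption about how personalization might work.

```python
import sqlite3
from datetime import datetime, timezone


def open_db(path=":memory:"):
    """Open (or create) the tag database. The schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS files (
            id        INTEGER PRIMARY KEY,
            path      TEXT NOT NULL,
            owner     TEXT NOT NULL,
            stored_at TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS tags (
            file_id INTEGER NOT NULL REFERENCES files(id),
            tag     TEXT NOT NULL,
            UNIQUE (file_id, tag)
        );
        CREATE INDEX IF NOT EXISTS idx_tags_tag ON tags(tag);
        """
    )
    return conn


def store_file(conn, path, owner, generated_tags, user_tags=()):
    """Record a file with its owner, a UTC timestamp, and its tags."""
    stored_at = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "INSERT INTO files (path, owner, stored_at) VALUES (?, ?, ?)",
        (path, owner, stored_at),
    )
    file_id = cur.lastrowid
    # Merge machine-generated and user-entered tags, normalized to lower case.
    tags = {t.strip().lower() for t in (*generated_tags, *user_tags) if t.strip()}
    conn.executemany(
        "INSERT OR IGNORE INTO tags (file_id, tag) VALUES (?, ?)",
        [(file_id, t) for t in sorted(tags)],
    )
    conn.commit()
    return file_id


def find_by_tag(conn, owner, tag):
    """Return the paths of the owner's files that carry the given tag."""
    rows = conn.execute(
        "SELECT f.path FROM files f JOIN tags t ON t.file_id = f.id "
        "WHERE f.owner = ? AND t.tag = ? ORDER BY f.id",
        (owner, tag.strip().lower()),
    )
    return [row[0] for row in rows]
```

A caller would pass the tags produced by object detection and OCR as `generated_tags` and any tags typed by the user as `user_tags`. Keeping tags in a separate indexed table is one way to keep lookups by tag cheap as the number of files grows.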
Keywords
document database; file tags; image tags; object detection; personalized filing system; scalable; tags database
Hrčak ID: 276163
Publication date: 8.5.2022.