
Professional paper

A Personalized and Scalable Machine Learning-Based File Management System

Bansal Veena ; Indian Institute of Technology Kanpur, Kalyanpur, Kanpur, 208016, India
Sati Dhiraj ; Capital Business Systems Pvt. Ltd. 288 A, Phase IV, Udyog Vihar, Sector 19 Gurugram, 122016, India


page 288-292




In this work, we present a hybrid image and document filing system that we have built. When a user stores a file in the system, the file is processed by an appropriate open-source machine learning system to generate tags. At present, we use OpenCV and Tesseract OCR for tagging: OpenCV recognizes objects in images, and Tesseract recognizes text in them. An image file is processed with OpenCV for object recognition and with Tesseract for text and captions, and the results of both are used to tag the file. All other files are processed with Tesseract only. Users can also enter their own tags. A database stores the tags and the image path. Every file is stored with its owner's identification and a timestamp. The system has a client-server architecture and can be used to store and retrieve a large number of files. It is highly scalable.
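The tagging and storage flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the schema, the table and function names, and the list of image suffixes are all assumptions, and SQLite stands in for whatever database the system actually uses. The two taggers are passed in as plain callables (`detect_objects`, `extract_text`), which in a real deployment would presumably wrap OpenCV and Tesseract.

```python
import sqlite3
from datetime import datetime, timezone
from pathlib import Path

# Assumed set of image suffixes; the paper does not list the supported formats.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".tiff"}


def generate_tags(path, detect_objects, extract_text):
    """Images get object tags and text tags; all other files get text tags only."""
    tags = set(extract_text(path))
    if Path(path).suffix.lower() in IMAGE_SUFFIXES:
        tags |= set(detect_objects(path))
    # Normalize case and drop empty strings so lookups are consistent.
    return {t.strip().lower() for t in tags if t.strip()}


def open_store(db_path=":memory:"):
    """Create the (hypothetical) tags database: one row per file, one per tag."""
    conn = sqlite3.connect(db_path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS files (
            id        INTEGER PRIMARY KEY,
            path      TEXT NOT NULL,
            owner     TEXT NOT NULL,
            stored_at TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS tags (
            file_id INTEGER NOT NULL REFERENCES files(id),
            tag     TEXT NOT NULL,
            PRIMARY KEY (file_id, tag)
        );
        CREATE INDEX IF NOT EXISTS idx_tags_tag ON tags(tag);
    """)
    return conn


def store_file(conn, path, owner, auto_tags, user_tags=()):
    """Record a file with its owner, a UTC timestamp, and all of its tags."""
    stored_at = datetime.now(timezone.utc).isoformat()
    all_tags = {t.strip().lower() for t in (*auto_tags, *user_tags) if t.strip()}
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO files (path, owner, stored_at) VALUES (?, ?, ?)",
            (path, owner, stored_at),
        )
        file_id = cur.lastrowid
        conn.executemany(
            "INSERT OR IGNORE INTO tags (file_id, tag) VALUES (?, ?)",
            [(file_id, t) for t in sorted(all_tags)],
        )
    return file_id


def find_files(conn, owner, tag):
    """Return the paths of one owner's files that carry the given tag."""
    rows = conn.execute(
        "SELECT f.path FROM files f JOIN tags t ON t.file_id = f.id "
        "WHERE f.owner = ? AND t.tag = ? ORDER BY f.id",
        (owner, tag.strip().lower()),
    )
    return [row[0] for row in rows]
```

Keeping tags in their own indexed table, rather than as a delimited string on the file row, is one plausible way to keep tag lookups fast as the number of files grows; whether the system described here does so is not stated in the abstract.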


document database; file tags; image tags; object detection; personalized filing system; scalable; tags database


