Estimation of minimum sample size for identification of the most important features: a case study providing a qualitative B2B sales data set


  • Marko Bohanec Salvirt ltd.
  • Mirjana Kljajić Borštnar University of Maribor, Faculty of Organizational Sciences, Kidričeva cesta 55a, 4000 Kranj
  • Marko Robnik-Šikonja University of Ljubljana, Faculty of Computer and Information Science, Večna pot 113, 1001 Ljubljana


An important task in machine learning is to reduce data set dimensionality, which in turn reduces computational load and data collection costs while improving human understanding and interpretation of models. We introduce an operational guideline for determining the minimum number of instances sufficient to identify the correct ranks of the features with the highest impact. We conduct tests on a qualitative B2B sales forecasting data set. The results show that a relatively small subset of instances is sufficient for identifying the most important features when their rank is not important.






CRORR Journal Regular Issue