Original scientific paper

https://doi.org/10.17535/crorr.2017.0033

Estimation of minimum sample size for identification of the most important features: a case study providing a qualitative B2B sales data set

Marko Bohanec (ORCID: orcid.org/0000-0002-5295-5111); Salvirt Ltd., Dunajska cesta 136, SL-1 000 Ljubljana, Slovenia
Mirjana Kljajić Borštnar (ORCID: orcid.org/0000-0003-4608-9090); University of Maribor, Faculty of Organizational Sciences, Kidričeva cesta 55a, SL-4 000, Slovenia
Marko Robnik-Šikonja (ORCID: orcid.org/0000-0002-1232-3320); University of Ljubljana, Faculty of Computer and Information Science, Večna pot 113, SL-1 001 Ljubljana, Slovenia


Full text: English, pdf, 423 Kb

pp. 515-524

Downloads: 1,118

Abstract

An important task in machine learning is to reduce data set dimensionality, which in turn contributes to reducing computational load and data collection costs, while improving human understanding and interpretation of models. We introduce an operational guideline for determining the minimum number of instances sufficient to identify correct ranks of features with the highest impact. We conduct tests based on qualitative B2B sales forecasting data. The results show that a relatively small instance subset is sufficient for identifying the most important features when rank is not important.

Keywords

data set reduction; B2B sales forecasting; machine learning; sample size

Hrčak ID:

193640

URI

https://hrcak.srce.hr/193640

Publication date:

30 December 2017

Visits: 1,731