News
NEW DOCTORAL DEGREES Methods and algorithms of data clustering based on special distance-like functions
Ivan Vazler
Department of Mathematics, University of Osijek, Osijek, Croatia
Abstract
This dissertation considers the problem of grouping a data set $\mathcal{A}\subset\mathbb{R}$ into $k$ disjoint nonempty subsets, called clusters, using either the least squares (LS) or the least absolute deviations (LAD) distance-like function, with the goal of finding well-separated, compact clusters.
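For orientation, the clustering problem described above is commonly written as the minimization of an objective over partitions. The notation below ($\Pi$, $\pi_j$, $c_j$, $d$, $F$) is an assumption made for illustration, stated for data in $\mathbb{R}^n$ (with $n=1$ as a special case), and is not necessarily the notation used in the dissertation:

\[
F(\Pi) = \sum_{j=1}^{k} \sum_{a \in \pi_j} d(c_j, a),
\qquad
c_j \in \operatorname*{arg\,min}_{x \in \mathbb{R}^n} \sum_{a \in \pi_j} d(x, a),
\]

where $\Pi = \{\pi_1,\dots,\pi_k\}$ is a partition of $\mathcal{A}$, and $d$ is either $d_{LS}(x,a) = \|x-a\|_2^2$ or $d_{LAD}(x,a) = \|x-a\|_1$. Under these assumptions, the LS center $c_j$ is the arithmetic mean of the cluster $\pi_j$, while an LAD center is a component-wise median of $\pi_j$.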
New results concerning the existence and characterization of LAD-optimal partitions are presented, together with appropriate adjustments of the well-known k-means algorithm for finding locally optimal LAD partitions. This approach is particularly interesting in practice because it is largely insensitive to strongly protruding data (outliers). In this connection, we analyzed the properties of the weighted median of data from $\mathbb{R}^n$ and methods for computing it.
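The idea of adapting k-means to the LAD setting can be illustrated with a minimal sketch. It assumes the $\ell_1$ distance and component-wise medians as cluster centers (often called k-medians); the function names `weighted_median` and `k_medians` are hypothetical and this is not the algorithm or software from the dissertation.

```python
import numpy as np

def weighted_median(values, weights):
    """Return a minimizer of sum_i weights[i] * |x - values[i]| for 1-D data.

    Assumes all weights are positive.
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cum = np.cumsum(w)
    # First sorted point at which the cumulative weight reaches half the total.
    idx = np.searchsorted(cum, 0.5 * cum[-1])
    return v[idx]

def assign(data, centers):
    """Assign each point to its nearest center in the l1 distance."""
    dist = np.abs(data[:, None, :] - centers[None, :, :]).sum(axis=2)
    return dist.argmin(axis=1)

def k_medians(data, centers, n_iter=100):
    """Illustrative k-means-style iteration with LAD (l1) centers."""
    data = np.asarray(data, dtype=float)
    centers = np.asarray(centers, dtype=float).copy()
    for _ in range(n_iter):
        labels = assign(data, centers)
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = data[labels == j]
            if len(members) == 0:
                continue  # simplification: an empty cluster keeps its old center
            unit = np.ones(len(members))  # unit weights in this sketch
            new_centers[j] = [weighted_median(members[:, d], unit)
                              for d in range(data.shape[1])]
        if np.array_equal(new_centers, centers):
            break  # centers stopped changing: a locally optimal partition
        centers = new_centers
    return assign(data, centers), centers

# Two tight groups plus one outlier at (50, 50).
points = [[0, 0], [1, 0], [0, 1], [10, 10], [11, 10], [10, 11], [50, 50]]
labels, centers = k_medians(points, [[0, 0], [10, 10]])
```

In this example the outlier is assigned to the second cluster, yet that cluster's center remains at (10, 10), whereas the arithmetic mean used by ordinary k-means would be pulled toward (50, 50).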
Special attention is given to the problem of choosing the ``proper'' number of clusters in a partition, for which a substantial body of recent literature was consulted. With the goal of defining a new index of acceptability for the number of clusters, we paid close attention to the problem of cluster stability.
The dissertation presents several important applications, with a strong emphasis on the use of cluster analysis for forecasting hourly energy consumption (natural gas, electricity, water, etc.).
The original scientific contributions of this thesis are: a new data clustering algorithm based on the LS and LAD distance-like functions; new methods for determining an appropriate number of clusters, including the definition of new indices and a comparison with known approaches, verified on large data sets; and the application of the new clustering algorithms to dynamic modeling of hourly consumption of natural resources, with verification of computational complexity and a comparison with other approaches.
It should also be noted that supporting software complementing the thesis is available at \texttt{http://www.mathos.unios.hr/oml/software.htm}.
Hrčak ID:
93306
Publication date:
5.12.2012.