In connection with the dissertation abstract, here is a link I find interesting (it also covers tricks with the two-dimensional median, clustering, and other such games):
http://valis.cs.uiuc.edu/~sariel/papers/ The work itself is interesting; it is a pity the author does not try to look a bit more broadly, or further back into the history of the methods.
The applications are interesting too (both in software and in practice).
Also:
http://users.cis.fiu.edu/~giri/publications.html And in addition:
X-means: Extending K-means with Efficient Estimation of the Number of Clusters (2000)
Dan Pelleg, Andrew Moore
Abstract
A K-means tutorial.
Despite its popularity for general clustering, k-means suffers three major shortcomings: it scales poorly computationally, the number of clusters K has to be supplied by the user, and the search is prone to local minima. We propose solutions for the first two problems, and a partial remedy for the third. Building on prior work for algorithmic acceleration that is not based on approximation, we introduce a new algorithm that efficiently searches the space of cluster locations and number of clusters to optimize the Bayesian Information Criterion (BIC) or the Akaike Information Criterion (AIC) measure. The innovations include two new ways of exploiting cached sufficient statistics and a new very efficient test that in one k-means sweep selects the most promising subset of classes for refinement. This gives rise to a fast, statistically founded algorithm that outputs both the number of classes and their parameters. Experiments show this technique reveals the true number of classes in the underlying distribution, and that it is much faster than repeatedly using accelerated k-means for different values of K.
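
To make the idea of scoring K by BIC concrete, here is a rough sketch of the brute-force baseline the abstract mentions: rerun k-means for each K and keep the K with the best score. This is not X-means itself, which splits centroids locally and reuses cached statistics. The BIC below assumes identical spherical Gaussians with one shared variance and is written from the standard definition, not copied from the paper. The names `kmeans`, `bic`, `choose_k` and `grid_blob`, the farthest-first initialisation, and the toy data are all my own assumptions for illustration.

```python
import math

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first_centers(points, k):
    # Deterministic seeding (an assumption, not from the paper): start at the
    # first point, then repeatedly take the point farthest from all chosen centers.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, max_iter=100):
    # Plain Lloyd iterations: assign to nearest center, then recompute means.
    centers = farthest_first_centers(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, labels

def bic(points, centers, labels):
    # Assumed model: spherical Gaussians, one shared variance, hard assignments.
    r, m, k = len(points), len(points[0]), len(centers)
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    if r <= k or sse <= 0.0:
        raise ValueError("BIC undefined: need more points than clusters and nonzero spread")
    var = sse / (m * (r - k))                      # per-dimension variance estimate
    sizes = [labels.count(j) for j in range(k)]
    log_lik = sum(n * math.log(n / r) for n in sizes if n > 0)   # mixing weights
    log_lik -= 0.5 * r * m * math.log(2 * math.pi * var)
    log_lik -= sse / (2 * var)
    n_params = (k - 1) + m * k + 1                 # weights + centers + variance
    return log_lik - 0.5 * n_params * math.log(r)  # higher is better

def choose_k(points, k_max):
    scores = {}
    for k in range(1, k_max + 1):
        centers, labels = kmeans(points, k)
        scores[k] = bic(points, centers, labels)
    return max(scores, key=scores.get), scores

def grid_blob(cx, cy, step=0.5):
    # Toy data: a 5x5 grid of points centered at (cx, cy).
    return [(cx + step * i, cy + step * j) for i in range(-2, 3) for j in range(-2, 3)]

points = grid_blob(0, 0) + grid_blob(10, 0) + grid_blob(0, 10)
best_k, scores = choose_k(points, k_max=6)
print(best_k, {k: round(v, 1) for k, v in scores.items()})
```

On three well-separated blobs like these, the score should peak at the true number of clusters. Note that every candidate K costs a full k-means run here, which is exactly the cost the abstract says their method avoids.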