Estimating the number of clusters in multivariate data by various fittings of the L-curve
Second Author's Department
Mathematics & Actuarial Science Department
Document Type
Research Article
Publication Title
Computational and Applied Mathematics
Publication Date
2-1-2025
DOI
10.1007/s40314-024-02839-8
Abstract
The goal of this paper is to estimate the true but unknown number of clusters $K$ in multivariate data. The contributions are twofold. The first is to narrow the search space for the estimate $\hat{k}$ to $1 \le \hat{k} \le K_{\max}$; we propose a new method for finding $K_{\max}$ that performs better than existing ones. The second is to propose three indices for computing $\hat{k}$ within the range $1 \le \hat{k} \le K_{\max}$: the R-Index, the FB Index, and the CSum Index. All three indices are based on the L-curve, the plot of $W_k$ against $k$ for values of $k$ in the above range, where $W_k$ is the total within-cluster similarity (withinness). We give the rationale for each method. We investigate the performance of the three indices and compare them with six of the most commonly used indices, using both real benchmark datasets and challenging synthetic datasets with sample sizes from $n = 200$ to $n = 3600$ and true numbers of clusters from $K = 2$ to $K = 36$. We use both hierarchical clustering and k-means clustering, but the approach can also be used with other clustering methods. The three indices are shown to outperform the existing ones. An additional advantage of our indices is their low computational cost: they are shown to take much less time to compute than the existing ones.
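The abstract does not define the R, FB, or CSum indices, so the sketch below only illustrates the general L-curve idea they build on: compute $W_k$ for $k = 1, \dots, K_{\max}$ and read an elbow off the resulting curve. It rests on several assumptions not taken from the paper. $W_k$ is measured as the within-cluster sum of squared Euclidean distances to cluster centroids. $K_{\max}$ is fixed by hand rather than found by the authors' method. The elbow is picked with a generic "farthest point from the chord" rule, which is not any of the paper's indices. All function names (`kmeans_labels`, `withinness`, `l_curve`, `chord_elbow`) are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: List[Point]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def kmeans_labels(data: List[Point], k: int, seed: int, n_iter: int = 100) -> List[int]:
    """Lloyd's k-means with k-means++ seeding; returns one cluster label per point."""
    rng = random.Random(seed)
    centres = [list(rng.choice(data))]
    while len(centres) < k:
        # k-means++: draw the next centre with probability proportional to D(x)^2
        d2 = [min(sq_dist(p, c) for c in centres) for p in data]
        if sum(d2) == 0:
            centres.append(list(rng.choice(data)))
        else:
            centres.append(list(rng.choices(data, weights=d2)[0]))
    labels: List[int] = []
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centres[j])) for p in data]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # keep the old centre if a cluster becomes empty
                centres[j] = centroid(members)
    return labels


def withinness(data: List[Point], labels: List[int]) -> float:
    """Assumed W_k: total within-cluster sum of squared distances to centroids."""
    total = 0.0
    for j in set(labels):
        members = [p for p, lab in zip(data, labels) if lab == j]
        c = centroid(members)
        total += sum(sq_dist(p, c) for p in members)
    return total


def l_curve(data: List[Point], k_max: int, n_init: int = 5) -> List[Tuple[int, float]]:
    """(k, W_k) for k = 1..k_max, keeping the best of n_init k-means runs per k."""
    return [(k, min(withinness(data, kmeans_labels(data, k, s)) for s in range(n_init)))
            for k in range(1, k_max + 1)]


def chord_elbow(curve: List[Tuple[int, float]]) -> int:
    """Generic elbow rule (NOT the paper's R, FB, or CSum index): after scaling both
    axes to [0, 1], return the k farthest from the chord joining the curve's endpoints."""
    ks = [k for k, _ in curve]
    ws = [w for _, w in curve]
    w_hi, w_lo = max(ws), min(ws)
    if len(ks) < 3 or w_hi == w_lo:
        return ks[0]
    xs = [(k - ks[0]) / (ks[-1] - ks[0]) for k in ks]
    ys = [(w - w_lo) / (w_hi - w_lo) for w in ws]
    x0, y0, x1, y1 = xs[0], ys[0], xs[-1], ys[-1]
    # Unnormalised point-to-line distance; the constant denominator does not affect argmax
    dist = [abs((y1 - y0) * x - (x1 - x0) * y + x1 * y0 - y1 * x0) for x, y in zip(xs, ys)]
    return ks[dist.index(max(dist))]


# Hypothetical toy data: three tight 3x3 grids centred at (0, 0), (10, 0), (0, 10)
offsets = [(dx, dy) for dx in (-0.5, 0.0, 0.5) for dy in (-0.5, 0.0, 0.5)]
data = [(cx + dx, cy + dy) for cx, cy in [(0, 0), (10, 0), (0, 10)] for dx, dy in offsets]

curve = l_curve(data, k_max=6)  # K_max chosen by hand here
k_hat = chord_elbow(curve)
print(k_hat)
```

On this toy data, $W_1 = 1209$ and $W_3 = 9$, and the curve is nearly flat beyond $k = 3$, so the chord rule returns $\hat{k} = 3$. The same loop works with hierarchical clustering by swapping `kmeans_labels` for a function that cuts a dendrogram into $k$ groups.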
Recommended Citation
APA Citation
Moustafa, R., & Hadi, A. (2025). Estimating the number of clusters in multivariate data by various fittings of the L-curve. Computational and Applied Mathematics, 44(1). https://doi.org/10.1007/s40314-024-02839-8
https://fount.aucegypt.edu/faculty_journal_articles/6222
MLA Citation
Moustafa, Rida, et al. "Estimating the number of clusters in multivariate data by various fittings of the L-curve." Computational and Applied Mathematics, vol. 44, no. 1, 2025, https://fount.aucegypt.edu/faculty_journal_articles/6222
Comments
Article. Record derived from SCOPUS.