Faculty Journal Articles

Estimating the number of clusters in multivariate data by various fittings of the L-curve

Rida Moustafa, Dmining-Technology
Ali S. Hadi, The American University in Cairo

Second Author's Department

Mathematics & Actuarial Science Department

Find in your Library

https://doi.org/10.1007/s40314-024-02839-8

All Authors

Rida Moustafa, Ali S. Hadi

Document Type

Research Article

Publication Title

Computational and Applied Mathematics

Publication Date

2-1-2025

doi

10.1007/s40314-024-02839-8

Abstract

The goal of this paper is to estimate the true but unknown number of clusters K in multivariate data. The contributions are twofold. The first is to narrow the search space for the estimate k̂ to 1 ≤ k̂ ≤ Kmax; we propose a new method for finding Kmax, which is better than the existing ones. The second is to propose three indices for computing k̂ within this range: the R-Index, the FB Index, and the CSum Index. All three indices are based on the L-curve (the plot of Wk vs. k), where Wk is the total within-cluster similarity (withinness), for values of k in the above range. We give the rationale for each method. We investigate the performance of these three indices and compare them with six of the most commonly used indices using both real benchmark datasets and challenging synthetic datasets of varying sample sizes (n = 200 to n = 3600) and varying numbers of true clusters, from K = 2 to K = 36. We use both hierarchical clustering and k-means clustering algorithms, but the approach can also be used with other clustering methods. The three indices are shown to outperform the existing ones. An additional advantage of our indices is computational cost: they are shown to take much less time to compute than the existing ones.
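As a rough illustration of the L-curve mentioned in the abstract, the sketch below computes Wk for k = 1 to Kmax on toy data. It rests on several assumptions that are not taken from the paper: Wk is computed as the within-cluster sum of squared Euclidean distances, the clustering is a plain Lloyd-style k-means with random restarts, Kmax is simply fixed at 6, and the function names (`kmeans_withinness`, `l_curve`) are hypothetical. It does not implement the R-Index, the FB Index, the CSum Index, or the proposed Kmax method, none of which are defined in this record.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def kmeans_withinness(points, k, n_iter=50, seed=0):
    """Run Lloyd's k-means once and return W_k.

    Assumption: W_k is the within-cluster sum of squared distances.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k distinct data points as initial centers
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            clusters[j].append(p)
        # An empty cluster keeps its previous center.
        new_centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def l_curve(points, k_max, n_restarts=5):
    """Return [(k, W_k)] for k = 1..k_max, keeping the best of several restarts."""
    return [
        (k, min(kmeans_withinness(points, k, seed=s) for s in range(n_restarts)))
        for k in range(1, k_max + 1)
    ]


# Toy data (illustrative only): three Gaussian blobs in two dimensions.
rng = random.Random(42)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
data = [
    (cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
    for cx, cy in blob_centers
    for _ in range(30)
]

curve = l_curve(data, k_max=6)
for k, w in curve:
    print(f"k={k}  W_k={w:.2f}")
```

Plotting the printed Wk values against k gives the L-shaped curve. Because k-means can stop at a local optimum, the values are not guaranteed to decrease monotonically, which is why the sketch keeps the best of several restarts.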

Comments

Article. Record derived from SCOPUS.

Recommended Citation

APA Citation

Moustafa, R., & Hadi, A. S. (2025). Estimating the number of clusters in multivariate data by various fittings of the L-curve. Computational and Applied Mathematics, 44(1). https://doi.org/10.1007/s40314-024-02839-8

MLA Citation

Moustafa, Rida, and Ali S. Hadi. "Estimating the number of clusters in multivariate data by various fittings of the L-curve." Computational and Applied Mathematics, vol. 44, no. 1, 2025, https://doi.org/10.1007/s40314-024-02839-8.
