Data visualization has attracted considerable attention because of the pressing need to make sense of the large amounts of data collected every day. Low-dimensional embedding techniques such as Isomap, Locally Linear Embedding (LLE), and t-SNE help us visualize high-dimensional data by projecting it onto a two- or three-dimensional space. t-SNE (t-Distributed Stochastic Neighbor Embedding) has proved successful at producing low-dimensional mappings that make the underlying structure of the data easier for humans to interpret. We test the hypothesis that a visualization that people can easily understand will also simplify the task of classification models and boost their performance. To test this hypothesis, we use t-SNE to reduce the dimensionality of a student performance dataset to 2D and 3D, then feed the resulting 2D and 3D feature vectors into a classifier that classifies students according to their predicted performance. We compare classifier performance before and after the dimensionality reduction. Our experiments showed that t-SNE improves the classification accuracy of NN and KNN on a benchmark dataset as well as on a user-curated dataset on the performance of students at our home institution. We also visually compared the 2D and 3D mappings produced by t-SNE and principal component analysis (PCA). The comparison favored t-SNE's visualization over PCA's. This was also reflected in classification accuracy: all classifiers used scored higher on the t-SNE mapping than on the PCA mapping.
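The comparison described in the abstract can be illustrated with a minimal sketch. It is not the thesis code. It assumes scikit-learn, uses the public Wine dataset as a stand-in for the student performance data, and covers only the KNN classifier. The function names `knn_accuracy` and `compare_embeddings`, the perplexity, the value of k, and the 70/30 split are all illustrative assumptions. Because scikit-learn's t-SNE has no `transform` method for unseen points, the sketch embeds all rows together without using the labels and only then splits them into training and test sets.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler


def knn_accuracy(features, labels, train_idx, test_idx, k=5):
    """Fit KNN on the training rows and return accuracy on the test rows."""
    clf = KNeighborsClassifier(n_neighbors=k)
    clf.fit(features[train_idx], labels[train_idx])
    return clf.score(features[test_idx], labels[test_idx])


def compare_embeddings(X, y, n_components=2, seed=0):
    """Return KNN test accuracy on the original, PCA, and t-SNE features."""
    # Standardize so that no single feature dominates the distances.
    X = StandardScaler().fit_transform(X)

    # One fixed split of row indices, reused for every feature representation.
    idx = np.arange(len(y))
    train_idx, test_idx = train_test_split(
        idx, test_size=0.3, stratify=y, random_state=seed
    )

    # t-SNE cannot project new points, so all rows are embedded at once.
    # Labels are never passed to the embedding. PCA is fitted the same way
    # so that both reductions are treated alike (an assumption of this sketch).
    tsne_features = TSNE(
        n_components=n_components, perplexity=30, init="pca", random_state=seed
    ).fit_transform(X)
    pca_features = PCA(n_components=n_components, random_state=seed).fit_transform(X)

    return {
        "original": knn_accuracy(X, y, train_idx, test_idx),
        "pca": knn_accuracy(pca_features, y, train_idx, test_idx),
        "tsne": knn_accuracy(tsne_features, y, train_idx, test_idx),
    }


if __name__ == "__main__":
    X, y = load_wine(return_X_y=True)
    for dims in (2, 3):
        print(dims, compare_embeddings(X, y, n_components=dims))
```

The accuracies depend on the dataset, the random seed, and the t-SNE settings, so this sketch should not be expected to reproduce the results reported in the thesis.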
Computer Science & Engineering Department
MS in Computer Science
The author retains all rights with regard to copyright. The author certifies that written permission from the owner(s) of third-party copyrighted matter included in the thesis, dissertation, paper, or record of study has been obtained. The author further certifies that IRB approval has been obtained for this thesis, or that IRB approval is not necessary for this thesis. Insofar as this thesis, dissertation, paper, or record of study is an educational record as defined in the Family Educational Rights and Privacy Act (FERPA) (20 USC 1232g), the author has granted consent to disclosure of it to anyone who requests a copy.
Approval has been obtained for this item
Atteya, H. A. (2017). Visualization as a guidance to classification for large datasets [Master’s thesis, the American University in Cairo]. AUC Knowledge Fountain.
Atteya, Heba Abdelfattah. Visualization as a guidance to classification for large datasets. 2017. American University in Cairo, Master's thesis. AUC Knowledge Fountain.