Abstract

Classification is a central problem in the fields of data mining and machine learning. Using a training set of labeled instances, the task is to build a model (classifier) that can be used to predict the class of new unlabeled instances. Data preparation is crucial to the data mining process, and its focus is to improve the fitness of the training data so that the learning algorithms produce more effective classifiers. Two widely applied data preparation methods are feature selection and instance selection, which fall under the umbrella of data reduction. For my research I propose ADR-Miner, a novel data reduction algorithm that utilizes ant colony optimization (ACO). ADR-Miner is designed to perform instance selection to improve the predictive effectiveness of the constructed classification models. Two versions of ADR-Miner are developed: a base version that uses a single classification algorithm during both training and testing, and an extended version that uses separate classification algorithms for each phase. The base version of the ADR-Miner algorithm is evaluated on 20 data sets using three classification algorithms, and the results are compared to a benchmark data reduction algorithm. The non-parametric Wilcoxon signed-ranks test is employed to gauge the statistical significance of the results obtained. The extended version of ADR-Miner is evaluated on 37 data sets using pairings from five classification algorithms, and these results are benchmarked against the performance of the same classification algorithms without data reduction applied as pre-processing.

Keywords: Ant Colony Optimization (ACO), Data Mining, Classification, Data Reduction.

Department

Computer Science & Engineering Department

Degree Name

MS in Computer Science

Graduation Date

6-1-2015

Online Submission Date

May 2015

First Advisor

Abdelbar, Ashraf

Committee Member 1

Goneid, Amr

Committee Member 2

Ismail, Ismail Amr

Document Type

Master's Thesis

Extent

129 p.

Library of Congress Subject Heading 1

Artificial intelligence.

Library of Congress Subject Heading 2

Data mining.

Rights

The author retains all rights with regard to copyright. The author certifies that written permission from the owner(s) of third-party copyrighted matter included in the thesis, dissertation, paper, or record of study has been obtained. The author further certifies that IRB approval has been obtained for this thesis, or that IRB approval is not necessary for this thesis. Insofar as this thesis, dissertation, paper, or record of study is an educational record as defined in the Family Educational Rights and Privacy Act (FERPA) (20 USC 1232g), the author has granted consent to disclosure of it to anyone who requests a copy.

IRB

Approval has been obtained for this item
