Abstract

Ensemble encoding is a biologically motivated, distributed data representation scheme for multilayer perceptron (MLP) networks. Multiple overlapping receptive fields are used to enhance locality of representation. The number, form, and placement of receptive fields have a great impact on performance. This thesis presents four heuristics for optimizing receptive field configuration, two based on descriptive statistics and two based on clustering, and compares their performance on four benchmark data sets. The two statistical approaches are based on the mean and median properties of the data set. The two clustering methods are c-means and fuzzy c-means clustering. The four data sets are well-known machine learning benchmarks: breast cancer diagnosis, predicting the contraceptive method used by women in Indonesia based on social and economic status, predicting whether a hepatitis patient will live or die based on symptoms and clinical observations, and predicting protein localization sites in E. coli bacteria. Performance varies among the benchmarks, but on one benchmark the fuzzy clustering heuristic yields a 56.6% improvement in test set classification over unencoded data, and a 48.98% improvement over symmetrical-placement three-receptor ensemble encoding. The thesis also extends the original symmetrical placement approach proposed by Narayan, which focused only on the use of three receptive fields; some experiments extend the number of receptors allocated using symmetrical placement. Furthermore, this thesis explores the possibility of extending backpropagation to incorporate the parameters of the receptive fields into the learning process. Results show that such an extension is outperformed by the proposed clustering heuristics. Previous work introduced the idea of using standard deviation for dilation of receptive fields. Experiments were run using different fractions of the standard deviation with the available data sets. Results show that this approach does not significantly improve performance. All experiments were run using leave-one-out cross-validation to ensure a fair evaluation of the trained networks. Moreover, analysis of variance (ANOVA) is used to confirm that the results are statistically significant. A total of 182,844 networks were trained during the experimental process.
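The idea of clustering-based receptive field placement can be illustrated with a minimal sketch. It is not the thesis's implementation. The Gaussian field shape, the quantile initialization, the use of one full standard deviation as the field width, the function names `cmeans_centers` and `ensemble_encode`, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def cmeans_centers(values, c, iters=50):
    """Plain (hard) c-means on 1-D data; returns c sorted cluster centers."""
    values = np.asarray(values, dtype=float)
    # Assumption: start from evenly spaced quantiles so the run is deterministic.
    centers = np.quantile(values, np.linspace(0.0, 1.0, c + 2)[1:-1])
    for _ in range(iters):
        # Assign each value to its nearest center.
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        new_centers = centers.copy()
        for k in range(c):
            members = values[labels == k]
            if members.size:  # an empty cluster keeps its previous center
                new_centers[k] = members.mean()
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return np.sort(centers)

def ensemble_encode(values, centers, width):
    """Encode each scalar as its activation under every receptive field.

    Gaussian fields are an assumption; the abstract does not fix the form.
    """
    values = np.asarray(values, dtype=float)
    return np.exp(-0.5 * ((values[:, None] - centers[None, :]) / width) ** 2)

# Hypothetical toy feature with two clumps; not taken from the benchmarks.
feature = np.array([0.10, 0.15, 0.20, 0.80, 0.85, 0.90])
centers = cmeans_centers(feature, c=2)
# Assumption: field width set to one standard deviation of the feature.
encoded = ensemble_encode(feature, centers, width=feature.std())
```

Each input value becomes a vector with one activation per receptive field, and that vector would replace the raw scalar as input to the MLP. Placing the field centers at cluster centers concentrates the fields where the data actually lies, in contrast to spacing them symmetrically over the input range.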

School

School of Sciences and Engineering

Department

Computer Science & Engineering Department

Degree Name

MS in Computer Science

Date of Award

6-1-2003

Online Submission Date

1-1-2003

First Advisor

Ashraf Abdelbar

Document Type

Thesis

Extent

244 leaves

Library of Congress Subject Heading 1

Computer networks.

Library of Congress Subject Heading 2

Cluster set theory

Rights

The American University in Cairo grants authors of theses and dissertations a maximum embargo period of two years from the date of submission, upon request. After the embargo elapses, these documents are made available publicly. If you are the author of this thesis or dissertation, and would like to request an exceptional extension of the embargo period, please write to thesisadmin@aucegypt.edu

Call Number

Thesis 2003/43

Location

mgfth;mrs2
