Information gain as a feature selection method for the efficient classification of influenza based on viral hosts
Author's Department
Computer Science & Engineering Department
Document Type
Research Article
Publication Title
Lecture Notes in Engineering and Computer Science
Publication Date
1-1-2014
Abstract
This paper demonstrates that applying feature selection to classical machine learning techniques improves the classification of influenza A by viral host. The impact of using only the most informative DNA positions on classifier efficiency and performance was measured for both decision trees (DTs) and neural networks (NNs). The experiments were conducted on DNA sequences belonging to the PB1 and HA segments of subtypes H1 and H5, respectively. Sequences from each segment were further divided into human and nonhuman hosts prior to classification analysis. Accuracy, sensitivity, specificity, precision, and time were used as performance measures. Extracting the hundred most informative positions with information gain increased classification efficiency by 90% for both classifiers without significantly compromising performance. NNs performed better than DTs on both DNA segments when the number of informative positions was decreased below a hundred. The classification speed of NNs improved vastly compared to that of DTs when classifying the H1 PB1 segment.
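The position-ranking step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`entropy`, `information_gain`, `top_positions`), the toy aligned sequences, and the choice of `k` are all assumptions made for illustration. The sketch treats each alignment column as a categorical feature, scores it by information gain with respect to the host label, and keeps the highest-scoring positions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(column, labels):
    """Information gain of one alignment column with respect to the labels."""
    total = len(labels)
    # Group the class labels by the symbol observed at this position.
    groups = {}
    for symbol, label in zip(column, labels):
        groups.setdefault(symbol, []).append(label)
    # Weighted entropy of the labels after splitting on the symbol.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_positions(sequences, labels, k):
    """Return the k position indices with the highest gain, plus all gains."""
    columns = list(zip(*sequences))  # assumes equal-length, aligned sequences
    gains = [information_gain(col, labels) for col in columns]
    ranked = sorted(range(len(gains)), key=lambda i: gains[i], reverse=True)
    return ranked[:k], gains


# Hypothetical toy data: four aligned sequences with host labels.
sequences = ["ACGT", "ACGA", "TCGT", "TCGA"]
labels = ["human", "human", "nonhuman", "nonhuman"]

selected, gains = top_positions(sequences, labels, k=2)
print(selected, gains)
```

In this toy data only the first position separates the two host classes, so it receives the highest gain. In the setting the abstract describes, `k` would be 100 and the selected columns would then be passed to the DT or NN classifier.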
First Page
625
Last Page
631
Recommended Citation
APA Citation
Shaltout, N., El-Hefnawi, M., Rafea, A., & Moustafa, A. (2014). Information gain as a feature selection method for the efficient classification of influenza based on viral hosts. Lecture Notes in Engineering and Computer Science, 1, 625–631.
https://fount.aucegypt.edu/faculty_journal_articles/1849
MLA Citation
Shaltout, Nermeen A., et al. "Information gain as a feature selection method for the efficient classification of influenza based on viral hosts." Lecture Notes in Engineering and Computer Science, vol. 1, 2014, pp. 625–631.
https://fount.aucegypt.edu/faculty_journal_articles/1849