Information gain as a feature selection method for the efficient classification of influenza based on viral hosts
Computer Science & Engineering Department
Lecture Notes in Engineering and Computer Science
The paper demonstrates how feature selection improves the classification of Influenza A by viral host when it is applied with classical machine learning techniques. It measures the impact of using only the most informative DNA positions on classifier efficiency and performance, using both decision trees (DTs) and neural networks (NNs). The experiments were conducted on DNA sequences from the PB1 segment of subtype H1 and the HA segment of subtype H5. Before classification, the sequences of each segment were divided by host into human and nonhuman groups. Accuracy, sensitivity, specificity, precision and time were used as performance measures. Selecting the hundred most informative positions by information gain increased classification efficiency by 90% for both classifiers without significantly compromising performance. When the number of informative positions was reduced below a hundred, NNs outperformed DTs on both DNA segments. When classifying the H1 PB1 segment, the classification speed of NNs improved vastly compared with that of DTs.
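The feature selection step summarized above ranks each aligned DNA position j by its information gain with respect to the host label Y, IG(Y; X_j) = H(Y) - H(Y | X_j), and keeps the top k positions. The following is a minimal sketch of that idea, not the authors' implementation. It assumes the sequences are already aligned to equal length and labelled "human" or "nonhuman". All function names are hypothetical. It also computes the four classification measures named in the abstract from true and predicted labels. The DT and NN classifiers themselves are not shown.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    # sum of p * log2(1/p); written this way so a pure list gives exactly 0.0
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())


def information_gain(sequences, labels, pos):
    """IG of the host label given the nucleotide at aligned column `pos`."""
    n = len(labels)
    groups = {}
    for seq, label in zip(sequences, labels):
        groups.setdefault(seq[pos], []).append(label)
    # H(Y | X_pos): entropy of each nucleotide group, weighted by group size
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_k_positions(sequences, labels, k):
    """Indices of the k positions with the highest information gain."""
    length = len(sequences[0])  # assumes an alignment of equal-length sequences
    scores = [(information_gain(sequences, labels, p), p) for p in range(length)]
    scores.sort(key=lambda t: (-t[0], t[1]))  # highest IG first, ties by index
    return [p for _, p in scores[:k]]


def project_positions(sequences, positions):
    """Keep only the selected positions of every sequence."""
    return ["".join(seq[p] for p in positions) for seq in sequences]


def _safe_div(a, b):
    return a / b if b else float("nan")


def metrics(y_true, y_pred, positive="human"):
    """Accuracy, sensitivity, specificity and precision for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": _safe_div(tp + tn, len(pairs)),
        "sensitivity": _safe_div(tp, tp + fn),
        "specificity": _safe_div(tn, tn + fp),
        "precision": _safe_div(tp, tp + fp),
    }
```

For example, with the toy alignment `["ACGT", "ACGA", "TCGT", "TCGA"]` and labels `["human", "human", "nonhuman", "nonhuman"]`, position 0 separates the hosts perfectly and gets an information gain of 1 bit. Position 3 gets 0 bits. `top_k_positions(..., 1)` therefore returns `[0]`. In a full pipeline, the reduced sequences from `project_positions` would be encoded (for instance one-hot) before being passed to a decision tree or neural network. That encoding is an assumption, since the abstract does not state it.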
(2014). Information gain as a feature selection method for the efficient classification of influenza based on viral hosts. Lecture Notes in Engineering and Computer Science, 1, 625–631.
Shaltout, Nermeen A., et al. "Information gain as a feature selection method for the efficient classification of influenza based on viral hosts." Lecture Notes in Engineering and Computer Science, vol. 1, 2014, pp. 625–631.