Information gain as a feature selection method for the efficient classification of influenza based on viral hosts

Author's Department

Computer Science & Engineering Department

Document Type

Research Article

Publication Title

Lecture Notes in Engineering and Computer Science

Publication Date

1-1-2014

Abstract

The paper demonstrates that applying feature selection to classical machine learning techniques improves the classification of Influenza A by viral host. The impact of using only the most informative DNA positions on classifier efficiency and performance was measured. Both decision trees (DTs) and neural networks (NNs) were used. The experiments were conducted on DNA sequences from the PB1 segment of subtype H1 and the HA segment of subtype H5. Sequences from each segment were divided into human and nonhuman hosts prior to classification. Accuracy, sensitivity, specificity, precision, and time were used as performance measures. Selecting the hundred most informative positions by information gain increased classification efficiency by 90% for both classifiers without significantly compromising performance. NNs performed better than DTs on both DNA segments when the number of informative positions was reduced below a hundred. The classification speed of NNs improved substantially compared to DTs when classifying the H1 PB1 segment.
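
The position-ranking step described in the abstract can be illustrated with a minimal sketch. It assumes aligned sequences of equal length and one host label per sequence. The function names (`entropy`, `information_gain`, `top_positions`) and the toy data are hypothetical and are not taken from the paper, whose actual implementation is not described here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(sequences, labels, position):
    """Reduction in label entropy from splitting on the nucleotide at `position`."""
    groups = {}
    for seq, label in zip(sequences, labels):
        groups.setdefault(seq[position], []).append(label)
    # Weighted entropy of the labels after the split.
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def top_positions(sequences, labels, k):
    """Indices of the k positions with the highest information gain.

    Ties are broken by the lower position index (an arbitrary choice).
    """
    length = len(sequences[0])  # assumes all sequences are aligned to one length
    gains = [(information_gain(sequences, labels, i), i) for i in range(length)]
    gains.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in gains[:k]]


# Toy example (made-up data): only position 0 separates the two host classes.
sequences = ["ACGT", "ACGA", "TCGT", "TCGA"]
labels = ["human", "human", "nonhuman", "nonhuman"]
best = top_positions(sequences, labels, 1)
```

In a setting like the one the abstract describes, `k` would be 100 and the selected positions would become the input features to the decision tree or neural network.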

First Page

625

Last Page

631
