Document Type: Research Paper
University of California, San Francisco
Systems, Life science and Control Engineering Lab., Electrical & Computer Engineering School, Tarbiat Modares University, Tehran, Iran
Systems, Life science and Control Engineering Lab., Electrical & Computer Engineering School, Tarbiat Modares University, Tehran, Iran.
Identifying the genes underlying complex diseases and traits, which generally involve multiple etiological mechanisms and contributing genes, is difficult. Although microarray technology has enabled researchers to investigate changes in gene expression, identifying pathobiologically relevant genes remains a challenge. To address this challenge, we apply a new method for selecting disease-relevant genes from a published microarray dataset. The approach combines the Fisher criterion, SAM (Significance Analysis of Microarrays), and GA/SVM (Genetic Algorithm/Support Vector Machine). The Fisher method is used to remove noisy and redundant genes from the high-dimensional microarray data. The SAM technique is applied, and different subsets of highly informative genes are selected by GA/SVM using different training sets. The final subset of highly informative genes is obtained by counting the number of times each gene occurs across the different gene subsets. The proposed method was tested on microarray data from Alzheimer’s disease (AD); the biological significance of the identified genes was evaluated, and the results were compared with those of previous studies. The results indicate that the proposed method has good selection and classification performance, achieving a classification accuracy of 94.55% with only 44 genes. From a biological point of view, at least 24 (55%) of these genes are Alzheimer-associated genes. Analysis of these genes with GO and KEGG led to the identification of AD-related terms and pathways. These genes can act as predictors of the disease as well as a means of finding new candidate genes.
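As an illustration of two steps named in the abstract, the following minimal sketch shows one common form of the Fisher criterion for a two-class problem and a simple occurrence count over several gene subsets. The exact formula, thresholds, and data layout used in the study are not given in this section, so the score definition (squared difference of class means over the sum of class variances), the function names, the `min_count` parameter, and the toy gene names are all assumptions made for illustration. The SAM and GA/SVM stages are not reproduced here.

```python
from collections import Counter
from statistics import mean, pvariance


def fisher_score(values_a, values_b, eps=1e-12):
    """Assumed two-class Fisher criterion: (mean_a - mean_b)^2 / (var_a + var_b)."""
    num = (mean(values_a) - mean(values_b)) ** 2
    den = pvariance(values_a) + pvariance(values_b)
    return num / (den + eps)  # eps avoids division by zero for constant genes


def rank_genes_by_fisher(expr, labels):
    """Rank genes from most to least discriminative.

    expr   : dict mapping gene name -> list of expression values, one per sample
    labels : list of 0/1 class labels aligned with the samples (hypothetical layout)
    """
    scores = {}
    for gene, values in expr.items():
        a = [v for v, lab in zip(values, labels) if lab == 0]
        b = [v for v, lab in zip(values, labels) if lab == 1]
        scores[gene] = fisher_score(a, b)
    return sorted(scores, key=scores.get, reverse=True)


def consensus_genes(subsets, min_count):
    """Keep genes that occur in at least `min_count` of the selected subsets."""
    counts = Counter(g for subset in subsets for g in set(subset))
    return sorted(g for g, c in counts.items() if c >= min_count)


# Toy usage with hypothetical gene names and values.
toy_expr = {"g1": [0.0, 0.1, 1.0, 1.1], "g2": [5.0, 5.1, 5.0, 5.1]}
toy_labels = [0, 0, 1, 1]
ranking = rank_genes_by_fisher(toy_expr, toy_labels)
final = consensus_genes([["g1", "g2"], ["g1"], ["g1", "g3"]], min_count=2)
```

In this toy example, `g1` separates the two classes while `g2` does not, so `g1` ranks first; the occurrence count then retains only the genes that recur across subsets, which mirrors the frequency-based final selection described above.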