Automatic classification of highly related Malate Dehydrogenase and L-Lactate Dehydrogenase based on 3D-pattern of active sites

Document Type: Original Research Papers

Authors

1 Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran

2 Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran; Department of Life Sciences, Barcelona Supercomputing Center, Barcelona, Spain

3 School of Computer Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran

Abstract

Accurate protein function prediction is an important subject in bioinformatics, especially where
sequentially and structurally similar proteins have different functions. Malate dehydrogenase
and L-lactate dehydrogenase are two evolutionarily related enzymes that exist in a wide
variety of organisms. These enzymes are sequentially and structurally similar and share
common active site residues, spatial patterns and molecular mechanisms. Here, we study
various features of the active site cavities of 229 PDB chain entries and classify them
automatically using several classifiers, including the support vector machine (SVM),
k-nearest neighbour and random forest methods. The results show that the SVM yields the
highest predictive performance among the classifiers tested. Despite very close and conserved
patterns among malate dehydrogenases and L-lactate dehydrogenases, the SVM predicts the
function accurately, achieving a Matthews correlation coefficient of 0.973 and an F-score of
0.987. The same approach can be used in other enzyme families for automatic discrimination
between homologous enzymes that share common active site elements but act on different
substrates.
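
For readers unfamiliar with the two performance measures quoted above, the following is a minimal sketch of how the Matthews correlation coefficient and the F-score are commonly computed from the confusion matrix of a binary classifier. The function names and the example counts are illustrative assumptions only; they are not taken from the paper and do not reproduce its data or its exact evaluation protocol.

```python
import math


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


def f_score(tp, fp, fn):
    """F1 score (harmonic mean of precision and recall)."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


if __name__ == "__main__":
    # Hypothetical counts for illustration; NOT the paper's results.
    tp, tn, fp, fn = 8, 8, 2, 2
    print("MCC:", matthews_cc(tp, tn, fp, fn))
    print("F-score:", f_score(tp, fp, fn))
```

Unlike the F-score, the Matthews correlation coefficient also takes true negatives into account, which is one reason the two measures are often reported together for two-class problems such as this one.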
 

Keywords

Main Subjects