Authors
1
Department of Mathematics, K.N.Toosi University of Technology. Tehran, Iran.; Bioinformatics Research Group, School of Computer Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran.
2
The National Organization for Educational Testing (NOET), Ministry of Science, Research and Technology.
3
School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran, Iran.; Bioinformatics Research Group, School of Computer Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran.
4
Department of Biophysics, National Institute of Genetic Engineering and Biotechnology, Tehran, Iran.
5
Faculty of Mathematical Science, Shahid-Beheshti University, G. C., Tehran, Iran.
Abstract
A profile hidden Markov model (PHMM) is widely used in assigning protein sequences to protein families. In this model, the hidden states only depend on the previous hidden state and observations are independent given hidden states. In other words, in the PHMM, only the information of the left side of a hidden state is considered. However, it makes sense that considering the information of the both left and right sides of a hidden state can improve the assignment task. For this purpose, bidirectional profile hidden Markov model (BPHMM) can be used. Also, because of the evolutionary relationship between sequences in a protein family, the information of the corresponding amino acid in the preceding sequence of residues in the PHMM can be considered. For this purpose the hidden Markov random field on regular lattice (HMRFRL) is introduced. In a PHMM, the parameters are defined by the transition and emission probability matrices. The parameters are usually estimated using an EM (Expectation-Maximization) algorithm known as Baum-Welch algorithm. In this paper, the bidirectional Baum-Welch algorithm and theBaum-Welch algorithm on regular lattice are defined for estimating the parameters of the BPHMM and the HMRFRL respectively. We also compare the performance of common Baum-Welch algorithm, bidirectional Baum-Welch algorithm and the Baum-Welch algorithm on regular lattice by applying them to the real top ten protein families from Pfam database. Results show that using the lattice model for sequence assignment increases the number of correctly assigned protein sequences to profiles compared to BPHMM .