A generalization of the Profile Hidden Markov Model (PHMM) using one-by-one dependency between sequences

Document Type : Original Research Papers

Abstract

The Profile Hidden Markov Model (PHMM) can be poor at capturing dependencies between observations because of the statistical assumptions it makes. To overcome this limitation, the dependency between residues in a multiple sequence alignment (MSA), which a PHMM represents, can be combined with the PHMM. Because the sequences in the final MSA are arranged according to their similarity, the one-by-one dependency between corresponding amino acids of two adjacent sequences can be appended to the PHMM. This perspective leads to a generalization of the PHMM. To estimate the parameters of the generalized PHMM (emission and transition probabilities), we introduce new forward and backward algorithms. We evaluate the generalized PHMM by applying it to twenty protein families from the Pfam database. The results show that the generalized PHMM significantly increases accuracy compared with the ordinary PHMM.
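The abstract does not give the model's equations, so the following Python sketch is only an illustration, under our own assumptions, of what a "one-by-one dependency" adds to match-state emissions. An ordinary PHMM uses one emission distribution per match column; the sketch additionally estimates, per column, P(residue in sequence k | residue in sequence k-1) from adjacent rows of the MSA. All function names, the Laplace pseudocount, and the gap handling are hypothetical choices, and the authors' forward and backward algorithms (which also cover insert and delete states and transitions) are not reproduced here.

```python
import math
from collections import Counter, defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_emissions(msa, pseudocount=1.0, alphabet=AMINO_ACIDS):
    """Ordinary PHMM-style match emissions: P(a | column j), ignoring other rows."""
    probs = []
    for j in range(len(msa[0])):
        counts = Counter(seq[j] for seq in msa if seq[j] != "-")
        total = sum(counts.values()) + pseudocount * len(alphabet)
        probs.append({a: (counts[a] + pseudocount) / total for a in alphabet})
    return probs


def pairwise_emissions(msa, pseudocount=1.0, alphabet=AMINO_ACIDS):
    """Hypothetical one-by-one dependency: P(a in row k | b in row k-1) per column.

    Assumes the MSA rows are ordered so that similar sequences are adjacent.
    """
    probs = []
    for j in range(len(msa[0])):
        pair_counts = defaultdict(Counter)
        for prev, cur in zip(msa, msa[1:]):
            b, a = prev[j], cur[j]
            if a != "-" and b != "-":  # only residue-residue pairs are counted
                pair_counts[b][a] += 1
        table = {}
        for b in alphabet:
            total = sum(pair_counts[b].values()) + pseudocount * len(alphabet)
            table[b] = {a: (pair_counts[b][a] + pseudocount) / total for a in alphabet}
        probs.append(table)
    return probs


def log_likelihood_independent(query, emissions):
    """Log-likelihood of an aligned query under column-independent emissions."""
    return sum(math.log(emissions[j][a]) for j, a in enumerate(query) if a != "-")


def log_likelihood_dependent(query, prev, pair_emissions, emissions):
    """Log-likelihood of an aligned query conditioned on its aligned neighbour."""
    total = 0.0
    for j, (a, b) in enumerate(zip(query, prev)):
        if a == "-":
            continue
        if b == "-":
            # Assumption: fall back to the column model when the neighbour has a gap.
            total += math.log(emissions[j][a])
        else:
            total += math.log(pair_emissions[j][b][a])
    return total
```

For a toy two-letter alignment `["AA", "AA", "CC", "CC"]` with `alphabet="AC"`, the query `"CC"` scores log(1/4) under the column-independent model but log(4/9) when conditioned on its neighbour `"CC"`, because the pairwise table rewards residues that agree with the adjacent, similar sequence.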
