iProsite: an improved PROSITE database achieved by replacing ambiguous positions with more informative representations

Document Type : Research Paper

Authors

1 Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran

2 National Institute of Genetic Engineering and Biotechnology (NIGEB), Tehran 14155-6346, Iran

Abstract

The PROSITE database contains a set of entries corresponding to protein families, which are used to identify the family of a protein from its sequence. Although patterns and profiles are designed to be highly selective, each may produce false positive or false negative hits. Because false positives reduce the selectivity of a pattern, a more selective pattern yields more accurate protein family detection. In this paper, we present a method for improving PROSITE patterns by reconstructing them so that they still match the true positive hits but match fewer false positive hits. Of 973 PROSITE patterns, 283 were improved by our method. We applied the method to the PROSITE database, and the resulting improved database is available at http://cbp.ut.ac.ir/iPROSITE.
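To make the idea of a "more selective pattern" concrete, the following is a minimal Python sketch, not the authors' implementation. It translates a pattern written in standard PROSITE syntax into a regular expression and counts how many sequences it hits. The two patterns, the sequences, and the function names `prosite_to_regex` and `count_hits` are hypothetical and chosen only for illustration; the refined pattern replaces one ambiguous `x` position with a residue set assumed to be observed in the true positives. Anchors placed inside square brackets are not handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-syntax pattern into a Python regular expression."""
    pattern = pattern.strip().rstrip(".")  # PROSITE patterns end with a period
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # N-terminal anchor
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # C-terminal anchor
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (3) or (2,4).
        m = re.fullmatch(r"([^()]+)(?:\((\d+(?:,\d+)?)\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, repeat = m.groups()
        if core == "x":  # any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):  # forbidden residues
            core = "[^" + core[1:-1] + "]"
        # Single residues and [..] sets are already valid regex syntax.
        quantifier = "{" + repeat + "}" if repeat else ""
        parts.append(prefix + core + quantifier + suffix)
    return "".join(parts)


def count_hits(pattern: str, sequences: list[str]) -> int:
    """Count the sequences that contain at least one match of the pattern."""
    regex = re.compile(prosite_to_regex(pattern))
    return sum(1 for seq in sequences if regex.search(seq))


# Hypothetical toy data: two family members and one unrelated sequence.
true_positives = ["MKCADLLHQ", "TTCGELKHA"]
false_positives = ["PPCWDQQHR"]

original = "C-x-[DE]-x(2)-H."     # ambiguous second position
refined = "C-[AG]-[DE]-x(2)-H."   # 'x' replaced by residues seen in true positives

print(count_hits(original, true_positives), count_hits(original, false_positives))  # 2 1
print(count_hits(refined, true_positives), count_hits(refined, false_positives))    # 2 0
```

In this toy case the refined pattern keeps both true positive hits while dropping the single false positive, which is the kind of improvement the abstract describes.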

Keywords