biolover: The PSIPRED Protein Structure Prediction Server

Predict Secondary Structure (PSIPRED)

PSIPRED is a simple and accurate secondary structure prediction method, incorporating two feed-forward neural networks which perform an analysis on output obtained from PSI-BLAST (Position Specific Iterated - BLAST). Using a very stringent cross validation method to evaluate the method's performance, PSIPRED 2.6 achieves an average Q₃ score of 80.7%.
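For readers unfamiliar with the Q₃ measure, the sketch below shows how it is commonly computed: the percentage of residues whose predicted three-state label (H for helix, E for strand, C for coil) matches the observed label. The function name `q3_score` and the two example strings are hypothetical and chosen only for illustration. They are not part of PSIPRED or its benchmark data.

```python
def q3_score(predicted: str, observed: str) -> float:
    """Return the percentage of positions where the three-state labels agree.

    Both arguments are strings over the alphabet H (helix), E (strand),
    C (coil), one character per residue.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Hypothetical 15-residue example (not real PSIPRED output).
predicted = "CCHHHHHCCEEEECC"
observed = "CCHHHHCCCEEEECC"
print(f"Q3 = {q3_score(predicted, observed):.1f}%")
```

An average Q₃ such as the one quoted above would then be the mean of this per-protein value over all proteins in the test set.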

Predictions produced by PSIPRED were also submitted to the CASP4 evaluation and assessed during the CASP4 meeting, which took place in December 2000 at Asilomar. PSIPRED 2.0 achieved an average Q₃ score of 80.6% across all 40 submitted target domains with no obvious sequence similarity to structures present in PDB, which ranked PSIPRED top out of 20 evaluated methods (an earlier version of PSIPRED was also ranked top in CASP3 held in 1998).

It is important to realise, however, that due to the small sample sizes, the results from CASP are not statistically significant, although they do give a rough guide as to the current "state of the art". For a more reliable evaluation, the EVA web site at Columbia University provides a continuous evaluation. Also see the EVA servlet to visualize a breakdown of specific types of errors made by PSIPRED and other secondary structure prediction methods. NOTE that at the time of writing, the EVA site is no longer being updated.
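The point about small sample sizes can be illustrated with a rough confidence interval on a mean score. The sketch below uses a normal approximation (z = 1.96), which, if anything, understates the uncertainty for small samples. The function name `mean_with_ci` and the per-target scores are hypothetical and are not CASP results.

```python
import math
import statistics


def mean_with_ci(scores, z=1.96):
    """Return (mean, lower, upper) for an approximate 95% confidence interval.

    Uses the sample standard deviation and a normal approximation; for very
    small samples a t-distribution would give a wider interval.
    """
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    mean = statistics.mean(scores)
    sem = statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, mean - z * sem, mean + z * sem


# Hypothetical per-target Q3 values (illustration only).
scores = [70.0, 80.0, 90.0]
mean, lower, upper = mean_with_ci(scores)
print(f"mean = {mean:.1f}, approx. 95% CI = [{lower:.1f}, {upper:.1f}]")
```

Because the interval width shrinks only with the square root of the number of targets, differences of a point or two between methods evaluated on a few dozen targets can easily fall inside the interval.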

Downloads: The PSIPRED V2.6 software can be downloaded from HERE. Please note that you should read the license terms given in the README file if you wish to incorporate PSIPRED in another program or Web server.

Older releases of PSIPRED can be downloaded HERE.

Predict Transmembrane Topology (MEMSAT)

MEMSAT V3 is the latest version of the widely used all-helical membrane protein prediction method MEMSAT. The method was benchmarked on a test set of transmembrane proteins of known topology. From sequence data MEMSAT was estimated to have an accuracy of over 78% at predicting the structure of all-helical transmembrane proteins and the location of their constituent helical elements within a membrane.

Academic users can download MEMSAT3 code here.

Fold Recognition (GenTHREADER)

GenTHREADER is a fast and relatively powerful fold recognition method, which can be applied either to whole translated genomic sequences (proteomes), as in the case of the GTD, or to individual protein sequences, as in the case of the PSIPRED server. It is not as sensitive as mGenTHREADER but is much faster.

Fold Recognition (mGenTHREADER)

This method is now our recommended method for fold recognition and identification of distant homologues. Essentially it is based on the original GenTHREADER method, but makes use of profile-profile alignments and predicted secondary structure (using PSIPRED) as inputs. This both increases the sensitivity of the method and enhances the accuracy of alignments, but it also makes the method much slower than the normal GenTHREADER method, as PSI-BLAST needs to be run on the target sequence before the search can begin.

Domain Recognition (pDomTHREADER)

pDomTHREADER is an accurate and sensitive superfamily discrimination method, combining information from both sequence and structure to produce highly accurate domain alignments. The method employs the same underlying threading algorithm as pGenTHREADER; however, it aligns sequences to a domain-based template library rather than a chain-based template library. The use of smaller regions of structure for templates means that different features of the alignments are required for optimal scoring. The final prediction score results from an SVM trained on a combination of 5 different feature inputs: template coverage, alignment score, template length, solvation and pairwise potentials.

Compared with other superfamily discrimination methods using Hidden Markov Models and PSI-BLAST profile alignments, we found that pDomTHREADER provided higher coverage on the CATH S35 superfamilies. Additionally, pDomTHREADER produced more accurate alignments that can be used to better predict domain boundaries. For more information regarding the method, please consult the reference above.

Please note that the pDomTHREADER method is tuned for performance in fine superfamily discrimination; for fold recognition problems or structural annotation of very distant sequences, pGenTHREADER should be used.

Currently loaded data banks

Sequences: Filtered UNIREF90 (updated weekly)
Fold library: 16820 chains (last updated 1/3/2008) + weekly updates

biolover

Sunday, January 8, 2012