biolover: Prediction


Sunday, January 8, 2012

ΔG prediction server v1.0

Given the amino acid sequence of a putative transmembrane (TM) helix, the server gives a prediction of the corresponding apparent free energy difference, ΔG_app, for insertion of this sequence into the Endoplasmic Reticulum (ER) membrane by means of the Sec61 translocon. The server runs in two different "modes", for two different types of queries:

ΔG prediction. Predict ΔG_app for membrane insertion of a potential TM helix.
Full protein scan. Scan a protein sequence for putative TM helices.

http://dgpred.cbr.su.se/index.php?p=home

TOPCONS


1. Summary

Given the amino acid sequence of a putative alpha-helical membrane protein, TOPCONS predicts the topology of the protein, i.e. a specification of the membrane spanning segments and their IN/OUT orientation relative to the membrane. The prediction is a consensus from five different topology prediction algorithms: SCAMPI (single sequence mode), SCAMPI (multiple sequence mode), PRODIV-TMHMM, PRO-TMHMM and OCTOPUS. These five predictions are used as input to the TOPCONS hidden Markov model (HMM), which gives a consensus prediction for the protein, together with a reliability score based on the agreement of the included methods across the sequence. In addition, ZPRED is used to predict the Z-coordinate (i.e. the distance to the membrane center) of each amino acid, and the ΔG-scale is used to predict the free energy of membrane insertion for a window of 21 amino acids centered around each position in the sequence. For an explanation of the methods included in the server, see the corresponding links in the left hand menu.
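The 21-residue window scan mentioned above can be sketched in Python as follows. This is only an illustration of the windowing idea: the scorer here is the mean Kyte-Doolittle hydropathy, used as a stand-in because the ΔG scale itself is not reproduced in this post. The function names and the choice to leave the sequence ends unscored are assumptions, not the server's behavior.

```python
from typing import Callable, List, Optional

# Kyte-Doolittle hydropathy values, used only as a stand-in scorer.
# This is NOT the ΔG scale used by the server.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(window: str) -> float:
    """Average Kyte-Doolittle value over a window (KeyError on non-standard residues)."""
    return sum(KYTE_DOOLITTLE[aa] for aa in window) / len(window)


def scan_windows(
    sequence: str,
    score: Callable[[str], float] = mean_hydropathy,
    width: int = 21,
) -> List[Optional[float]]:
    """Score a window of `width` residues centered on each position.

    Positions too close to either end for a full window get None
    (an assumption; the server may treat the ends differently).
    """
    half = width // 2
    scores: List[Optional[float]] = []
    for i in range(len(sequence)):
        start, end = i - half, i + half + 1
        if start < 0 or end > len(sequence):
            scores.append(None)
        else:
            scores.append(score(sequence[start:end]))
    return scores
```

Any other per-window scoring function can be passed as `score`, so a real ΔG-based scorer could be plugged in without changing the scan itself.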

Note that the server does not predict cleavable signal peptides, which are easily confused with TM segments. If signal peptides are likely to be present in the input data, a separate signal peptide predictor such as SignalP should first be applied and predicted signal peptides cleaved off before submitting the sequence to TOPCONS.

2. Usage

Input to the server is an amino acid sequence in FASTA format. Due to computational limitations, only one sequence per query is allowed. For large benchmark sets and full proteome scans, use the SCAMPI server instead. A sequence profile is created for the input sequence using BLAST, and this profile is used as input to all the different methods (except SCAMPI-seq, where only the query sequence is used).

Example input:
>sp|O93740|BACR_HALS4 Bacteriorhodopsin Halobacterium sp.
MCCAALAPPMAATVGPESIWLWIGTIGMTLGTLYFVGRGRGVRDRKMQEFYIITIFITTI
AAAMYFAMATGFGVTEVMVGDEALTIYWARYADWLFTTPLLLLDLSLLAGANRNTIATLI
GLDVFMIGTGAIAALSSTPGTRIAWWAISTGALLALLYVLVGTLSENARNRAPEVASLFG
RLRNLVIALWFLYPVVWILGTEGTFGILPLYWETAAFMVLDLSAKVGFGVILLQSRSVLE
RVATPTAAPT
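Since the server accepts only one FASTA record per query, it can be handy to check a file before submitting it. The following is a minimal sketch of such a check; `parse_fasta` and `single_query` are hypothetical helper names, not part of TOPCONS.

```python
from typing import List, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Return (header, sequence) pairs from FASTA-formatted text."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def single_query(text: str) -> Tuple[str, str]:
    """Return the only record, or raise if there is not exactly one."""
    records = parse_fasta(text)
    if len(records) != 1:
        raise ValueError(f"expected exactly one sequence, found {len(records)}")
    return records[0]
```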

Optionally, parts of the sequence can be constrained to a known Inside/Outside/Membrane location by clicking the Restrainment options. Apart from N- and C-terminal constraints, any type of constraint can be entered in the Other textbox using the format: [first]-[last]-[label]; where [first] is the first residue and [last] is the last residue in the restrained range, and [label] is i (Inside), o (Outside) or M (Membrane).

Example:
1-1-o;20-25-M;
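To sanity-check a constraint string before pasting it into the Other textbox, the format above can be parsed as in this sketch. The function name is hypothetical, and the assumptions that residues are numbered from 1 and that labels are case-sensitive are inferred from the example, not confirmed by the server documentation.

```python
from typing import List, Tuple

# Labels as listed in the format description.
LABELS = {"i": "Inside", "o": "Outside", "M": "Membrane"}


def parse_constraints(spec: str) -> List[Tuple[int, int, str]]:
    """Parse '[first]-[last]-[label];' items into (first, last, label) tuples."""
    constraints: List[Tuple[int, int, str]] = []
    for item in spec.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate the trailing ';'
        first_s, last_s, label = item.split("-")
        first, last = int(first_s), int(last_s)
        if label not in LABELS:
            raise ValueError(f"unknown label {label!r} in {item!r}")
        if first < 1 or last < first:
            raise ValueError(f"invalid residue range in {item!r}")
        constraints.append((first, last, label))
    return constraints
```

For the example above, `parse_constraints("1-1-o;20-25-M;")` returns `[(1, 1, 'o'), (20, 25, 'M')]`.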

3. Output

The server outputs the topology predictions using all the individual methods, as well as the consensus prediction (TOPCONS). In addition, predicted Z-coordinates, predicted ΔG-values and reliability scores are given for each position in the sequence. The results are both displayed graphically and are available for download in text format in the TOPCONS result file. The BLAST output, which is used as input to the methods, is available in the BLAST result file. High-resolution versions of the images are also available for download.

The PSIPRED Protein Structure Prediction Server

Predict Secondary Structure (PSIPRED)

PSIPRED is a simple and accurate secondary structure prediction method, incorporating two feed-forward neural networks which perform an analysis on output obtained from PSI-BLAST (Position Specific Iterated - BLAST). Using a very stringent cross validation method to evaluate the method's performance, PSIPRED 2.6 achieves an average Q₃ score of 80.7%.
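For readers unfamiliar with the Q₃ score: it is the percentage of residues whose three-state secondary structure (helix, strand or coil) is predicted correctly. A minimal sketch, assuming the prediction and the reference are given as equal-length strings of H/E/C characters (the function name is hypothetical):

```python
def q3_score(predicted: str, observed: str) -> float:
    """Percentage of positions where the predicted state matches the observed one."""
    if not observed or len(predicted) != len(observed):
        raise ValueError("strings must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Two of ten positions disagree, so Q3 is 80%.
print(q3_score("HHHHCCEEEC", "HHHCCCEEEE"))  # 80.0
```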

Predictions produced by PSIPRED were also submitted to the CASP4 evaluation and assessed during the CASP4 meeting, which took place in December 2000 at Asilomar. PSIPRED 2.0 achieved an average Q₃ score of 80.6% across all 40 submitted target domains with no obvious sequence similarity to structures present in PDB, which ranked PSIPRED top out of 20 evaluated methods (an earlier version of PSIPRED was also ranked top in CASP3 held in 1998).

It is important to realise, however, that due to the small sample sizes, the results from CASP are not statistically significant, although they do give a rough guide as to the current "state of the art". For a more reliable evaluation, the EVA web site at Columbia University provides a continuous evaluation. Also see the EVA servlet to visualize a breakdown of specific types of errors made by PSIPRED and other secondary structure prediction methods. NOTE that at the time of writing, the EVA site is no longer being updated.

Downloads: The PSIPRED V2.6 software can be downloaded from HERE. Please note that you should read the license terms given in the README file if you wish to incorporate PSIPRED in another program or Web server.

Older releases of PSIPRED can be downloaded HERE.

Predict Transmembrane Topology (MEMSAT)

MEMSAT V3 is the latest version of the widely used all-helical membrane protein prediction method MEMSAT. The method was benchmarked on a test set of transmembrane proteins of known topology. From sequence data MEMSAT was estimated to have an accuracy of over 78% at predicting the structure of all-helical transmembrane proteins and the location of their constituent helical elements within a membrane.

Academic users can download MEMSAT3 code here.

Fold Recognition (GenTHREADER)

GenTHREADER is a fast and relatively powerful fold recognition method, which can be applied to either whole, translated genomic sequences (proteomes) as in the case of the GTD or individual protein sequences as in the case of the PSIPRED server. It is not as sensitive as mGenTHREADER but is much faster.

Fold Recognition (mGenTHREADER)

This method is now our recommended method for fold recognition and identification of distant homologues. Essentially it is based on the original GenTHREADER method, but makes use of profile-profile alignments and predicted secondary structure (using PSIPRED) as inputs. This increases the sensitivity of the method and the accuracy of its alignments, but also makes it much slower than the normal GenTHREADER method, as PSI-BLAST needs to be run on the target sequence before the search can begin.

Domain Recognition (pDomTHREADER)

pDomTHREADER is an accurate and sensitive superfamily discrimination method, combining information from both sequence and structure to produce highly accurate domain alignments. The method employs the same underlying threading algorithm as pGenTHREADER; however, it aligns sequences to a domain-based template library rather than a chain-based template library. The use of smaller regions of structure for templates means that different features of the alignments are required for optimal scoring. The final prediction score results from an SVM trained on a combination of 5 different feature inputs: template coverage, alignment score, template length, solvation and pairwise potentials.

Compared with other superfamily discrimination methods using Hidden Markov Models and PSI-BLAST profile alignments, we found that pDomTHREADER provided higher coverage on the CATH S35 superfamilies. Additionally, pDomTHREADER produced more accurate alignments that can be used to better predict domain boundaries. For more information regarding the method, please consult the reference above.

Please note that the pDomTHREADER method is tuned for performance in fine superfamily discrimination. For fold recognition problems or structural annotation of very distant sequences, pGenTHREADER should be used.

Currently loaded data banks

Sequences: Filtered UNIREF90 (updated weekly)
Fold library: 16820 chains (last updated 1/3/2008) + weekly updates

The NPS@ Web server

NPS@ stands for Network Protein Sequence @nalysis.

NPS@ is an interactive Web server dedicated to protein sequence analysis and available for the biologist community at URL: http://npsa-devel.ibcp.fr/.

NPS@ is the "protein part" of the "Pôle Bio-Informatique Lyonnais" (PBIL).

What kind of analysis can you carry out with NPS@ ?

Sequence similarity search with FASTA, BLAST, PSI-BLAST, and SSEARCH on protein databases such as SWISS-PROT, SP-TrEMBL or NRL_3D.
Sites and signatures detection with PATTINPROT or PROSCAN. PATTINPROT allows a search for one or several patterns in a protein database or in an individual sequence. PROSCAN scans a protein sequence against PROSITE.
Multiple alignment with CLUSTALW or MULTALIN.
Secondary structure prediction with 12 different methods and a consensus prediction of those methods. Available methods are SOPM, SOPMA, HNN, MLRC, DPM, DSC, GORI, GORII, GORIV, PHD, PREDATOR and SIMPA96.
Primary structure analysis such as: physico-chemical profiles (7 profiles), coiled-coil detection (Lupas method), helix-turn-helix DNA-binding motif prediction (Dodd & Egan), amino acid composition and sequence coloring.
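To give an idea of what a consensus over several secondary structure predictions can look like, here is a small sketch using a per-position majority vote. How NPS@ actually builds its consensus is not described in this post, so the voting rule, the '?' marker for ties and the function name are all assumptions.

```python
from collections import Counter
from typing import Sequence


def consensus(predictions: Sequence[str]) -> str:
    """Majority vote per position over equal-length prediction strings.

    Positions where the two most frequent states tie are marked '?'.
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("need at least one prediction, all of equal length")
    result = []
    for column in zip(*predictions):
        counts = Counter(column).most_common()
        if len(counts) > 1 and counts[0][1] == counts[1][1]:
            result.append("?")
        else:
            result.append(counts[0][0])
    return "".join(result)


print(consensus(["HHEC", "HHCC", "HECC"]))  # HHCC
```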

What do you need to use NPS@ ?

a sequence (HELP).
a sequence base in Pearson/FASTA format (HELP).
a pattern with PROSITE syntax (HELP).

What are NPS@'s strong points ?

All methods proposed by NPS@ are piped.
The output of one method can be the input of another. For example, after you've performed a BLAST search, you can make a database of full or partial sequences. Thereafter, this database can be aligned by multiple alignment programs (CLUSTALW, MULTALIN) or filtered by a pattern search (PATTINPROT), or you can apply NPS@'s methods to each sequence of the database. All this with no cut and paste.
You can insert secondary structure prediction in multiple alignment.
You can upload your own database and apply NPS@'s methods on it.
You can download NPS@'s data into protein sequence analysis software on your local computer for further analysis, to save them or insert them in an article...
The NPSA link allows you to apply NPS@'s methods on a sequence. Even more, when the sequence comes from a 3D database (NRL-3D), you have some useful links to retrieve and work with 3D data.
NPS@ offers links to international databases (SWISSPROT, PROSITE, CATH, SCOP,...).
NPS@ works with data of an ACNUC query.

http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSAHLP/npsahlp_npsageneral.html

Friday, January 6, 2012

3D structure prediction based on amino acid sequences

I-TASSER

On-line I-TASSER Video

http://www.jove.com/video/3259

What is I-TASSER server?

The I-TASSER server is an internet service for protein structure and function predictions. It allows academic users to automatically generate high-quality predictions of 3D structure and biological function of protein molecules from their amino acid sequences.

How does I-TASSER generate structure and function predictions?

When users submit an amino acid sequence, the server first tries to retrieve template proteins of similar folds (or super-secondary structures) from the PDB library by LOMETS, a locally installed meta-threading approach.

In the second step, the continuous fragments excised from the PDB templates are reassembled into full-length models by replica-exchange Monte Carlo simulations with the threading unaligned regions (mainly loops) built by ab initio modeling. In cases where no appropriate template is identified by LOMETS, I-TASSER will build the whole structures by ab initio modeling. The low free-energy states are identified by SPICKER through clustering the simulation decoys.

In the third step, the fragment assembly simulation is performed again starting from the SPICKER cluster centroids, where the spatial restraints collected from both the LOMETS templates and the PDB structures by TM-align are used to guide the simulations. The purpose of the second iteration is to remove steric clashes as well as to refine the global topology of the cluster centroids. The decoys generated in the second simulations are then clustered and the lowest energy structures are selected. The final full-atomic models are obtained by REMO, which builds the atomic details from the selected I-TASSER decoys through the optimization of the hydrogen-bonding network (see Figure 1).

Figure 1. I-TASSER protocol for protein structure and function prediction.
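The replica-exchange Monte Carlo simulations mentioned in the second step run several copies of the system at different temperatures and occasionally swap configurations between them. The sketch below shows only the textbook Metropolis swap criterion, with the Boltzmann constant set to 1. It is not I-TASSER's implementation, and the function names are hypothetical.

```python
import math
import random
from typing import List


def swap_probability(energy_i: float, energy_j: float,
                     temp_i: float, temp_j: float) -> float:
    """Probability of exchanging the configurations of replicas i and j.

    Standard criterion: min(1, exp((1/T_i - 1/T_j) * (E_i - E_j))), with k_B = 1.
    """
    delta = (1.0 / temp_i - 1.0 / temp_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def exchange_pass(states: List[object], energies: List[float],
                  temps: List[float], rng=random) -> None:
    """Attempt one swap between each pair of neighboring temperatures, in place."""
    for k in range(len(temps) - 1):
        p = swap_probability(energies[k], energies[k + 1], temps[k], temps[k + 1])
        if rng.random() < p:
            states[k], states[k + 1] = states[k + 1], states[k]
            energies[k], energies[k + 1] = energies[k + 1], energies[k]
```

With this rule, a low-energy configuration found at a high temperature is always handed down to the colder replica, which is what lets the simulation escape local minima while still refining good structures.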

For predicting the biological function of the protein (the last column in Figure 1), the I-TASSER server matches the predicted 3D models to the proteins in 3 independent libraries which consist of proteins of known enzyme classification (EC) number, gene ontology (GO) vocabulary, and ligand-binding sites. The final results of function predictions are deduced from the consensus of top structural matches with the function scores calculated based on the confidence score of the I-TASSER structural models, the structural similarity between model and templates as evaluated by TM-score, and the sequence identity in the structurally aligned regions [A similar approach to structure-based function annotation was proposed by Brylinski and Skolnick (PNAS 2008. 105:129) who tried to match the target structures on the threading templates. Here the I-TASSER server matches the target models on all template proteins in the libraries].

What are the performances of I-TASSER server compared with other methods?

CASP (or Critical Assessment of Techniques for Protein Structure Prediction) is a community-wide experiment for testing the state of the art of protein structure prediction, which has taken place every two years since 1994. The experiment (often referred to as a competition) is strictly blind because the structures of the test proteins are unknown to the predictors.

The I-TASSER server (as "Zhang-Server") participated in the Server Section of 7th (2006), 8th (2008), and 9th CASPs (2010), and was ranked as the No 1 server in CASP7 and CASP8. In CASP9, I-TASSER server and QUARK (another server from our lab) were ranked as No 1 and No 2 servers, respectively. The detailed rank results can be seen here for CASP7, CASP8, and CASP9. Figure 2 shows histograms of the Z-score of GDT-TS scores of all servers in CASP7 (68 servers), CASP8 (81 servers), and CASP9 (81 servers).

Figure 2. Histogram of Z-scores of all server groups at CASP7, CASP8 and CASP9.

What is the output of the I-TASSER server if you submit a sequence?

The output of the I-TASSER server includes:

· Up to five full-length atomic models (ranked based on cluster density)

· Estimated accuracy of the predicted models (including a confidence score of all models, and predicted TM-score and RMSD for the first model)

· GIF images of the predicted models

· Predicted secondary structures

· Predicted solvent accessibility

· Top 10 threading alignments from LOMETS

· Top 10 proteins in PDB which are structurally closest to the predicted models

· Predicted Enzyme Classification and the confidence score

· Predicted GO terms and the confidence score

· Predicted ligand-binding sites and the confidence score

· An image of the predicted ligand-binding sites

An illustrative example of the I-TASSER output can be seen from here.

How to use known information (e.g. templates and function) to improve I-TASSER modeling?

If users know some information about the structure of the modeled proteins, the information can be conveniently uploaded to the I-TASSER server. This information can significantly improve the quality of structure and function predictions.

The I-TASSER server currently accepts two types of user-specified restraints:

(1) inter-residue contact and distance restraints;
(2) template structures and template-target alignments.

The server provides 4 convenient options to assign the restraints:

· Assign contact/distance restraints: If you know which atom pairs should be in contact or at specific distances, you can use this option to upload a text file including the contact and/or distance information of atom pairs.

· Specify template without alignment: If you want I-TASSER to use a specific PDB structure as a template, you can use this option to specify the PDB structure. You only need to type in the PDBID:ChainID, e.g. 1wor:A, without specifying the target-template alignments. If the chain information is not present in the PDB file, indicate the ChainID using "_". I-TASSER will first fetch the structure from the PDB library and then generate the target-template alignment based on our in-house alignment tool, MUSTER.

· Specify template without alignment: You can actually use any 3D structure as the template, which does not necessarily exist in the PDB library. In this case, you can use this option to upload the 3D structure. This structure file must be in the standard PDB format. You do not need to input the target-template alignments. I-TASSER will generate the target-template alignment based on our in-house alignment tool, MUSTER.

· Specify template with alignment: This option allows you (usually the advanced users) to specify both template structure and the target-template alignment.

Please refer to adding restraints to I-TASSER modeling to view more detailed illustrations.

Can I exclude some proteins from the I-TASSER template library?

I-TASSER needs templates to generate high-resolution structure predictions. In general, excluding close templates will decrease the quality of the I-TASSER modeling. However, users can exclude some templates from the I-TASSER template library for special purposes (e.g. knowing that some templates are different from the target, or benchmark testing of the current algorithms).

The I-TASSER server accepts two ways of excluding templates:

· Exclude templates that are homologous to the query protein: Users can use this option to exclude templates that are homologous to the query protein from the I-TASSER template library. Homology is defined by a sequence identity cutoff, i.e. the number of identical residues between template and query divided by the total number of residues in the query sequence. For example, if you type "60%", I-TASSER will automatically exclude all templates which have a sequence identity >60% to the query protein. The minimum cutoff is set at 25%, and all values below 25% are treated as 25%.

· Exclude specific template proteins: This option allows users to upload a list of template structures that will be excluded from the I-TASSER template library. As the PDB library is redundant and the same protein can exist as multiple entries, the I-TASSER server will by default exclude the user-specified templates as well as all templates that have a sequence identity >90% to the specified templates. Users can also specify a different sequence identity cutoff, e.g. 70%, in which case I-TASSER will exclude all templates with a sequence identity >70% to the specified template proteins.

The format of the file should be "PDBID:ChainID %Sequence_Identity", e.g.

1wor:A 70
3mxu:A 80
1zko:B 40

or, omitting the cutoff to use the default of 90%:

1wor:A
3mxu:A
1zko:B
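The exclusion rules above can be mimicked locally, for example to prepare or check an exclusion file before uploading it. The following Python sketch follows the definitions given in the text (identity relative to query length, 25% minimum for the homology cutoff, 90% default for listed templates). The function names and the use of '-' as the gap character in aligned strings are assumptions.

```python
from typing import List, Tuple


def sequence_identity(aligned_query: str, aligned_template: str,
                      query_length: int) -> float:
    """Identical aligned residues divided by the query length, in percent.

    Both strings are alignment rows of equal length; '-' marks a gap (assumed).
    """
    identical = sum(
        q == t and q != "-" for q, t in zip(aligned_query, aligned_template)
    )
    return 100.0 * identical / query_length


def effective_cutoff(requested: float, minimum: float = 25.0) -> float:
    """Homology cutoff actually applied: values below the minimum become the minimum."""
    return max(requested, minimum)


def parse_exclusions(text: str,
                     default_cutoff: float = 90.0) -> List[Tuple[str, str, float]]:
    """Parse 'PDBID:ChainID [identity]' lines into (pdb_id, chain, cutoff) tuples."""
    entries: List[Tuple[str, str, float]] = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        pdb_id, _, chain = fields[0].partition(":")
        cutoff = float(fields[1]) if len(fields) > 1 else default_cutoff
        entries.append((pdb_id, chain, cutoff))
    return entries
```

For instance, `parse_exclusions("1wor:A 70\n3mxu:A")` gives `[('1wor', 'A', 70.0), ('3mxu', 'A', 90.0)]`.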

What is new?

· 2011/11/03: The I-TASSER Video was published in JoVE: Journal of Visualized Experiments.

· 2011/10/15: Visualization was enabled for the top 10 proteins analogous to the I-TASSER models and for the enzyme commission predictions.

· 2011/10/09: The function homology search was extended to the entire PDB library. This helps increase the coverage and accuracy of the structure-based function predictions of the I-TASSER server.

· 2011/08/01: The I-TASSER server had the 20000th user registered (Congratulations!).

· 2011/03/28: The first version of the I-TASSER Suite (Version 1.1) was publicly released for download and installation. It is free for academic and non-profit users.


How long does it take for I-TASSER to generate the predictions for your protein?

It usually takes several hours to 1-2 days from submitting a sequence to receiving the prediction results. But if too many sequences have accumulated in the queue, the procedure may take much longer. The time also depends on protein size: a smaller protein takes less time than a larger one.

Currently, the most time-consuming part of the I-TASSER protocol is the structural assembly and refinement simulations. For users who want a quicker response or who do not need refined models, we recommend our LOMETS (meta-server) or MUSTER (single-server fold-recognition) servers. Because these two servers do not attempt to refine the threading models, their response time is faster than that of the I-TASSER server.

http://zhanglab.ccmb.med.umich.edu/I-TASSER/about.html