Wednesday, September 5, 2012

How to Blast sequences against a genome

1. Open a DOS window (e.g. with the Run command).

2. Type the following command to run Blast:

blastp -db databaseName -query contigFile -out filename -evalue e-value

For example:

blastp -db octdata -query maydata.fna -out myResults.txt -evalue .00001
  • blastp runs the program that compares individual protein sequences to a database of protein sequences.
  • The Blast programs to choose from:
    • blastn to compare nucleotide sequence(s) against a database of nucleotide sequences
    • blastp to compare protein sequence(s) against a database of protein sequences
    • blastx to compare nucleotide sequence(s) translated in all six reading frames against a database of protein sequences
    • tblastn to compare protein sequence(s) against a database of nucleotide sequences translated in all six reading frames
    • tblastx to compare nucleotide sequence(s) translated in all six reading frames against a database of nucleotide sequences translated in all six reading frames
  • -db databaseName tells the program which database to search. Use the name you gave the database when you set it up.
  • -query contigFile tells the program to use the specified file as the query (input) to Blast. Give the full path if the file isn't in the same directory as Blast.
  • -out filename tells the program to use the specified file as the output file.
  • -evalue e-value tells the program to ignore matches whose e-value (the number of equally good matches expected by chance) is greater than the decimal number given.
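For anyone who would rather launch Blast from a script than type the command by hand, here is a minimal Python sketch of the options described above. The function names (choose_program, build_blast_command, run_blast) are made up for illustration and are not part of Blast; the sketch also assumes the Blast programs are on your PATH.

```python
import subprocess

# Which program fits which (query type, database type) pair,
# following the list of Blast programs above.
PROGRAMS = {
    ("protein", "protein"): "blastp",
    ("nucleotide", "nucleotide"): "blastn",
    ("nucleotide", "protein"): "blastx",
    ("protein", "nucleotide"): "tblastn",
}


def choose_program(query_type, db_type, translate_both=False):
    """Return the Blast program name for the given sequence types.

    translate_both=True selects tblastx, which translates a nucleotide
    query and a nucleotide database in all six reading frames.
    """
    if translate_both:
        if (query_type, db_type) != ("nucleotide", "nucleotide"):
            raise ValueError("tblastx needs a nucleotide query and database")
        return "tblastx"
    return PROGRAMS[(query_type, db_type)]


def build_blast_command(program, db, query, out, evalue):
    """Assemble the command line as a list of arguments (hypothetical helper)."""
    return [program, "-db", db, "-query", query, "-out", out,
            "-evalue", str(evalue)]


def run_blast(command):
    """Run Blast and wait for it to finish; raises if Blast reports an error."""
    subprocess.run(command, check=True)


# Example (not executed here):
# run_blast(build_blast_command("blastp", "octdata", "maydata.fna",
#                               "myResults.txt", "0.00001"))
```

Passing the command as a list rather than one long string avoids quoting problems when a file path contains spaces.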
3. Be prepared to wait. With only a few contigs, you shouldn't have to wait more than a few tens of seconds, but with the number of sequences we are using, the output may take hours. The program gives no indication of its progress; it simply returns you to a DOS prompt (>) when it's done.
4. Output may be modest when comparing two small sequences, but with many sequences the output file can take up a lot of disk space (dozens of megabytes).

5. How do you know whether the program worked? If you have a large output file (i.e. dozens of megabytes), don't try to read it into something like Word (you risk choking it). I don't think that Microsoft has any solution for us, but there is an ancient freeware program from the pre-Windows era that will do the job: DR (standing for DiRectory). Download it and put it in the Blast directory.

6. To run DR, type DR at a DOS prompt to get a list of files in \Blast, then press the F10 key to sort the files by date of creation, then press the End key to go to the end of the list. You should see the file you just made. Press the Enter key to see the contents of the file (you can scroll through the file using the usual keys).
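If DR isn't at hand, a few lines of Python can stand in for it. This is only a sketch under the assumption that Python is installed; the helper names file_size_mb and head are made up for illustration. It reports the file's size and reads just the first few lines, so even a very large output file is never loaded whole.

```python
import os
from itertools import islice


def file_size_mb(path):
    """Return the size of the file in megabytes."""
    return os.path.getsize(path) / (1024 * 1024)


def head(path, n=20):
    """Return the first n lines of a text file without reading the rest."""
    with open(path, "r", errors="replace") as handle:
        return [line.rstrip("\n") for line in islice(handle, n)]


# Example (not executed here):
# print(round(file_size_mb("myResults.txt"), 1), "MB")
# for line in head("myResults.txt", 15):
#     print(line)
```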

7. However you look at the output file, you should see something like:

BLASTP 2.2.9 [May-01-2004]

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.
Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.

Query= Contig240-R (500 letters)

Database: octdata.fna
1 sequences; 2,160,837 total letters

If so, you win!
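The same check can be roughed out in Python. This sketch assumes the report starts with a program banner such as BLASTP and contains a Query= line near the top, as in the sample above; other Blast versions may lay the header out differently. The function name looks_like_blast_report is made up for illustration.

```python
from itertools import islice


def looks_like_blast_report(path, program="BLASTP", max_lines=200):
    """Return True if the file begins with the program banner and a
    'Query=' line appears within the first max_lines lines."""
    saw_banner = False
    with open(path, "r", errors="replace") as handle:
        for line in islice(handle, max_lines):
            text = line.strip()
            if not saw_banner:
                if text == "":
                    continue  # skip blank lines before the banner
                if not text.upper().startswith(program.upper()):
                    return False  # first real line is not the banner
                saw_banner = True
            if "Query=" in text:
                return True
    return False


# Example (not executed here):
# print(looks_like_blast_report("myResults.txt"))
```

A True result only says the file looks like a Blast report. It says nothing about whether the matches inside are meaningful.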


From http://www.vcu.edu/csbc/bbsi/inst/archives/bioinf/RunLocalBlast.html
