biolover: September 2012

Thursday, September 6, 2012

Batch BLAST

Batch BLAST means running BLAST on several sequences at once.

blastall -p blastn -d BlastDB -i in_file.fasta >blast_output

When in_file.fasta contains only one sequence, this is a single BLAST. in_file.fasta can also hold multiple FASTA-format sequences, and then it is a batch BLAST.
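To check whether a query file will trigger a single or a batch run, it is enough to count its FASTA header lines. The sketch below is illustrative only; the function name and the two example records are made up for this note.

```python
def count_fasta_records(lines):
    """Count sequences: every FASTA record starts with a '>' header line."""
    return sum(1 for line in lines if line.startswith(">"))


# Two invented records, standing in for the contents of in_file.fasta.
example = """>seq1
ACGTACGT
>seq2
GGCCTTAA
""".splitlines()

print(count_fasta_records(example))  # more than one record means a batch run
```

For a real file, pass the open file handle instead of the list, e.g. `with open("in_file.fasta") as handle: count_fasta_records(handle)`.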

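The awkward part, of course, is the output of a batch BLAST. With one query we can read the result by eye, but with thousands of queries we cannot go through them one by one. BLAST anticipated this long ago, which is where the -m8 parameter comes in: it writes the results as a tab-delimited table, one hit per line. The -b5 parameter limits the output to the top 5 matches.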
blastall -p blastn -d BlastDB -i in_file.fasta -m8 -b5 >blast_output
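The -m8 table has 12 tab-separated columns per hit: query id, subject id, percent identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, e-value, and bit score. A minimal sketch of reading that table and keeping the best hit per query follows. The field names, function names, and the three sample lines are invented for illustration.

```python
from collections import namedtuple

# The 12 columns of blastall -m8 output, in order.
FIELDS = ("query", "subject", "pct_identity", "length", "mismatches",
          "gap_opens", "q_start", "q_end", "s_start", "s_end",
          "evalue", "bit_score")
Hit = namedtuple("Hit", FIELDS)


def parse_m8(lines):
    """Yield one Hit per non-empty, non-comment line of tabular output."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        c = line.split("\t")
        yield Hit(c[0], c[1], float(c[2]),
                  int(c[3]), int(c[4]), int(c[5]),
                  int(c[6]), int(c[7]), int(c[8]), int(c[9]),
                  float(c[10]), float(c[11]))


def best_hit_per_query(hits):
    """Keep, for every query id, the hit with the highest bit score."""
    best = {}
    for hit in hits:
        if hit.query not in best or hit.bit_score > best[hit.query].bit_score:
            best[hit.query] = hit
    return best


# Invented sample lines in -m8 layout (not real BLAST results).
sample = [
    "q1\tsA\t98.50\t200\t3\t0\t1\t200\t501\t700\t1e-100\t370",
    "q1\tsB\t85.00\t120\t18\t0\t10\t129\t40\t159\t2e-20\t95.3",
    "q2\tsC\t100.00\t50\t0\t0\t1\t50\t1\t50\t3e-22\t99.0",
]
best = best_hit_per_query(parse_m8(sample))
for query, hit in best.items():
    print(query, hit.subject, hit.evalue)
```

With a real run, replace `sample` with the open blast_output file; the parser streams line by line, so a file with thousands of queries is not a problem.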

The recommended command line is:
blastall -p blastn -d BlastDB -i in_file.fasta -m8 -b5 -b1 -a2 -FF >blast_output
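The -a2 parameter uses two CPUs to speed things up. -FF turns off filtering of simple repeats and low-complexity sequences (filtering is on by default).
Original source of this article: http://liucheng.name/1221/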
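When the same search has to be repeated for many input files, the command can be assembled in a script. The sketch below uses only the flags explained in this post (-p, -d, -i, -m, -b, -a, -F), written with a space between flag and value. The function names and default values are hypothetical, and `run_blastall` assumes that blastall is installed and on the PATH.

```python
import subprocess


def blastall_args(database, query_file, program="blastn",
                  hits=5, cpus=2, filter_low_complexity=False):
    """Build the argument list for a tabular (-m 8) blastall run."""
    return [
        "blastall",
        "-p", program,
        "-d", database,
        "-i", query_file,
        "-m", "8",                                   # tabular output
        "-b", str(hits),                             # top N matches
        "-a", str(cpus),                             # number of CPUs
        "-F", "T" if filter_low_complexity else "F"  # low-complexity filter
    ]


def run_blastall(database, query_file, output_file, **options):
    """Run blastall and write its standard output to output_file."""
    with open(output_file, "w") as out:
        subprocess.run(blastall_args(database, query_file, **options),
                       stdout=out, check=True)


print(" ".join(blastall_args("BlastDB", "in_file.fasta")))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.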

Wednesday, September 5, 2012

How to Blast sequences against a genome

1. Open a DOS window (e.g. with the Run command)

2. Type the following command to run Blast:

blastp -db databaseName -query contigFile -out filename -evalue e-value

For example:

blastp -db octdata -query maydata.fna -out myResults.txt -evalue .00001

blastp invokes the program that compares protein sequences against a database of protein sequences.
Blast programs to consider:
- blastn to compare nucleotide sequence(s) against a database of nucleotide sequences
- blastp to compare protein sequence(s) against a database of protein sequences
- blastx to compare nucleotide sequence(s) translated in all six reading frames against a database of protein sequences
- tblastn to compare protein sequence(s) against a database of nucleotide sequences translated in all six reading frames
- tblastx to compare nucleotide sequence(s) translated in all six reading frames against a database of nucleotide sequences translated in all six reading frames
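The list above boils down to a small lookup on the type of the query and the type of the database. A sketch of that lookup follows; the function name and the `compare_as_protein` switch are made up for illustration.

```python
def pick_program(query_type, db_type, compare_as_protein=False):
    """Return the Blast program for a query/database combination.

    query_type and db_type are "nucleotide" or "protein".
    compare_as_protein only matters when both are nucleotide: it selects
    tblastx (six-frame translation of both sides) instead of blastn.
    """
    if query_type == "protein" and db_type == "protein":
        return "blastp"
    if query_type == "nucleotide" and db_type == "protein":
        return "blastx"      # query translated in all six frames
    if query_type == "protein" and db_type == "nucleotide":
        return "tblastn"     # database translated in all six frames
    if query_type == "nucleotide" and db_type == "nucleotide":
        return "tblastx" if compare_as_protein else "blastn"
    raise ValueError("types must be 'nucleotide' or 'protein'")


print(pick_program("nucleotide", "protein"))
```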

-db databaseName tells the program to use the databaseName you identified when you set up the database.
-query contigFile tells the program to use the specified file as the query (input) to Blast. Give the full path if the file isn't in the same directory as Blast.
-out filename tells the program to use the specified file as the output file.
-evalue e-value tells the program to ignore matches whose e-value (the number of matches of that quality expected to occur by chance) is greater than the decimal number given
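To get a feel for the -evalue cutoff: an e-value is an expected count of chance matches, and it can be approximated from the bit score S' of a match as E = m * n * 2^(-S'), where m is the query length and n is the total number of letters in the database. The sketch below applies that formula. It is a simplification, since Blast itself works with corrected "effective" lengths, so the numbers it reports will differ somewhat. The function names are invented; the lengths in the example are taken from the sample output in this post.

```python
import math


def expected_chance_hits(bit_score, query_length, db_letters):
    """Approximate e-value: E = m * n * 2**(-bit_score)."""
    return query_length * db_letters * 2.0 ** (-bit_score)


def prob_at_least_one(evalue):
    """Probability of seeing one or more chance hits: 1 - exp(-E)."""
    return -math.expm1(-evalue)


# A 500-letter query against a database of 2,160,837 letters.
for score in (20, 30, 40, 50):
    e = expected_chance_hits(score, 500, 2160837)
    verdict = "kept" if e <= 0.00001 else "ignored"
    print(score, e, verdict, "with -evalue .00001")
```

Because E is a count, it can exceed 1 for weak matches; for small values it is nearly equal to the probability of at least one chance hit.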

3. Be prepared to wait a while. With only a few contigs, you shouldn't have to wait more than a few tens of seconds, but with the number of sequences we are using, the output may take hours to arrive. The program gives no indication of its progress; it simply returns you to a DOS prompt (>) when it's done.
4. Output could be modest when comparing two small sequences, but with lots of sequences, you can fill your disk drive with LOTS of output (dozens of megabytes).

5. How do you know whether the program worked? If you have a large output file (i.e. dozens of megabytes), don't try to read it into something like Word (you risk choking it). I don't think that Microsoft has any solution for us, but there is an ancient freeware program from the pre-Windows era that will do the job: DR (standing for DiRectory). Download it and put it in the Blast directory.

6. To run DR, type DR at a DOS prompt to get a list of files in \Blast, then press the F10 key to sort the files by date of creation, then press the End key to go to the end of the list. You should see the file you just made. Press the Enter key to see the contents of the file (you can scroll through the file using the usual keys).

7. However you look at the output file, you should see something like:

BLASTP 2.2.9 [May-01-2004]

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.
Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.

Query= Contig240-R (500 letters)

Database: octdata.fna
1 sequences; 2,160,837 total letters

If so, you win!
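As an alternative to a file viewer, a short script can check a large output file without loading it into memory. The sketch below streams the report, keeps the first few lines, and counts the "Query=" lines, i.e. how many query sequences have been written so far. The function name is made up, and the sample lines are abridged from the output shown in step 7.

```python
def summarize_blast_report(lines, preview=5):
    """Return (first `preview` lines, number of 'Query=' records)."""
    head = []
    queries = 0
    for line in lines:
        if len(head) < preview:
            head.append(line.rstrip("\n"))
        if line.startswith("Query="):
            queries += 1
    return head, queries


# Abridged from the sample output in step 7.
report = [
    "BLASTP 2.2.9 [May-01-2004]\n",
    "\n",
    "Query= Contig240-R (500 letters)\n",
    "\n",
    "Database: octdata.fna\n",
    "1 sequences; 2,160,837 total letters\n",
]
head, queries = summarize_blast_report(report)
looks_ok = bool(head) and head[0].startswith("BLAST")
print(looks_ok, queries)
```

For a real run, pass the open file, e.g. `with open("myResults.txt") as handle: summarize_blast_report(handle)`. Comparing the query count with the number of sequences in the query file gives a rough idea of how far a long run has progressed.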

From http://www.vcu.edu/csbc/bbsi/inst/archives/bioinf/RunLocalBlast.html

How to run a sequence through BLAST at TIGR

Go to the TIGR Comprehensive Microbial Resource
Click on CMR Blast on the blue bar near the top
Click on the down arrow next to the Program window and choose the appropriate program.
Click on the down arrow next to the Database window and choose the appropriate database.
Paste your sequence into the window supplied for that purpose.
Click the Submit BLAST job button.

From the internet

How to set up a local Blast database

Get to directory where you put Blast files
Type in the following:
makeblastdb -in file -out name -dbtype prot -hash_index
(for a database of proteins)
OR
makeblastdb -in file -out name -dbtype nucl -hash_index
(for a database of DNA or RNA)

What it means:

makeblastdb invokes the Blast accessory program that creates the database.
-in tells the program that the path that follows leads to the input file.
-out tells the program that the characters that follow should be used as the name of the database (you can name it anything you want, so long as you use 8 or fewer legal characters).
-dbtype prot tells the program "the file consists of protein sequences".
-dbtype nucl tells the program "the file consists of nucleotide sequences".
-hash_index tells the program "you should make an index of hash values for the sequences" (indexing the sequence identification numbers is the job of a separate option, -parse_seqids). Frankly, I don't know what good the index does, but it's cheap.
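Choosing the wrong -dbtype is an easy mistake, so a script can guess the type from the letters in the file before building the command. The sketch below is a rough heuristic of my own, not part of Blast: it calls a file nucleotide when at least 90% of its sequence letters are A, C, G, T, U or N. The function names and the threshold are assumptions.

```python
def guess_dbtype(fasta_lines, threshold=0.9):
    """Guess 'nucl' or 'prot' from the letter composition of a FASTA file."""
    nucleotide_letters = set("ACGTUN")
    total = 0
    nucleotide = 0
    for line in fasta_lines:
        if line.startswith(">"):      # skip header lines
            continue
        for ch in line.strip().upper():
            if ch.isalpha():
                total += 1
                if ch in nucleotide_letters:
                    nucleotide += 1
    if total == 0:
        raise ValueError("no sequence letters found")
    return "nucl" if nucleotide / total >= threshold else "prot"


def makeblastdb_args(fasta_file, name, dbtype):
    """Build the makeblastdb command shown above as an argument list."""
    return ["makeblastdb", "-in", fasta_file, "-out", name,
            "-dbtype", dbtype, "-hash_index"]


dna = [">contig1", "ACGTACGTNNACGT"]       # invented example records
protein = [">prot1", "MKVLAAGIVGLLLAQPSW"]
print(guess_dbtype(dna), guess_dbtype(protein))
print(" ".join(makeblastdb_args("file", "name", guess_dbtype(dna))))
```

A very short or unusual protein made mostly of A, C, G and T residues would fool the heuristic, so treat the guess as a sanity check rather than a decision.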

WARNING #1: Upper/Lower case matters for the commands following -!

WARNING #2: Windows XP and NT users may experience trouble cutting and pasting the command line makeblastdb. Evidently the system does something strange to the hyphens. Type the command in instead.
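One likely cause of the pasting problem is that the plain hyphens were replaced along the way by typographic dashes, which look similar but are different characters. That is an assumption on my part, not something the original author confirmed. Under that assumption, a pasted command can be repaired as follows; the function name is made up.

```python
# Dash-like characters commonly substituted for the ASCII hyphen:
# hyphen, non-breaking hyphen, figure dash, en dash, em dash, minus sign.
DASHES = "\u2010\u2011\u2012\u2013\u2014\u2212"


def normalize_hyphens(command):
    """Replace typographic dashes with the plain ASCII hyphen '-'."""
    return command.translate({ord(ch): "-" for ch in DASHES})


# A command line as it might arrive after pasting (en dashes, \u2013).
pasted = "makeblastdb \u2013in file \u2013out name \u2013dbtype prot \u2013hash_index"
print(normalize_hyphens(pasted))
```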

From http://www.vcu.edu/csbc/bbsi/inst/archives/bioinf/SetupLocalBlast.html