NCBI-PSI-BLAST and EBI CLUSTALW
Web sites:
Summary:
Procedure for obtaining a multiple sequence alignment, starting from a query
sequence in FASTA format, using web-based tools.
Procedure:
Use NCBI Entrez/Protein, RCSB, or another protein database of your choice
to search for and find the desired sequence. Display it in FASTA format.
Use it as the query on the PSI-BLAST page. Continue iterating until no
"New" sequences are detected (that is, until convergence). In the
formatting output options, specify "gi number output".
Copy/paste the headers for the desired sequences into a new file: filename.gin
Use the following command sequence to obtain the gi-identifier numbers
from the above file.
gi_parse.pl filename.gin | grep -v codes > filename.gib
Note that the suffixes mean: gin = "gi numbers with full headers" and
gib = "gi numbers isolated for batch query". The original Perl script
prints the number of lines in the file at the bottom of the generated
list of numbers; the pipe to "grep -v codes" removes that summary line,
and the remaining output (a column of unique gi numbers) is redirected
to filename.gib.
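If gi_parse.pl is not at hand, the same extraction can be approximated with standard tools. The sketch below is not the original script: the function name extract_gi is hypothetical, and it assumes each header carries its identifier as a "gi|NNN" token, as in NCBI FASTA-style definition lines.

```shell
# Hypothetical stand-in for gi_parse.pl (not the original script).
# Assumption: each header contains a token of the form "gi|NNN".
# Reads headers on standard input and writes one unique gi number per line.
extract_gi() {
    grep -o 'gi|[0-9][0-9]*' |   # keep only the "gi|NNN" tokens
        cut -d '|' -f 2 |        # drop the "gi|" prefix
        sort -n -u               # numeric order, duplicates removed
}
```

It would be used as: extract_gi < filename.gin > filename.gib. No summary line is produced, so the "grep -v codes" step is not needed with this version.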
Now go to http://www.ncbi.nlm.nih.gov:80/Entrez/batch.html
and upload filename.gib as the query to retrieve all of the sequences
in FASTA format (or another format, as specified). Save the downloaded
file as filename.fst (FASTA-format list).
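Before moving on, it can be worth checking that the download contains one sequence per requested identifier. The following is a minimal sketch under two assumptions: filename.gib holds one gi number per line, and every FASTA record in filename.fst starts with a ">" header line. The function names are hypothetical.

```shell
# Count FASTA records: each record begins with a ">" header line.
count_fasta_records() {
    awk '/^>/ { n++ } END { print n + 0 }' "$1"
}

# Count identifiers: lines that contain at least one digit.
count_ids() {
    awk '/[0-9]/ { n++ } END { print n + 0 }' "$1"
}

# Compare the number of requested ids with the number of retrieved records.
# Usage: check_retrieval filename.gib filename.fst
check_retrieval() {
    ids=$(count_ids "$1")
    seqs=$(count_fasta_records "$2")
    if [ "$ids" -eq "$seqs" ]; then
        echo "OK: $seqs sequences retrieved"
    else
        echo "MISMATCH: $ids ids requested, $seqs sequences retrieved"
    fi
}
```

A mismatch does not say which identifiers are missing; it only signals that the batch retrieval should be inspected before the alignment step.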
Consider alternate output formats. For example, select Format: GenBank/GenPept
and deselect the HTML checkbox. Submit the query, and save the output
as filename.gbp. This file can be searched with grep for interesting
results. For example, to get an alphabetically sorted list of the
organisms found in the query, use:
grep ORGANISM filename.gbp | sort -k 2 > filename.org
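The organism lines can also be condensed into a tally of how many sequences come from each organism. This is a sketch, assuming the usual GenBank/GenPept flat-file layout in which the organism name follows an indented ORGANISM keyword on a single line (names that wrap onto a second line are truncated to the first). The function name organism_counts is hypothetical.

```shell
# Read a GenBank/GenPept flat file on standard input and print
# "count<TAB>organism", sorted alphabetically by organism name.
organism_counts() {
    grep '^ *ORGANISM' |                  # keep the ORGANISM lines
        sed 's/^ *ORGANISM *//' |         # strip the keyword and padding
        sort | uniq -c |                  # tally identical names
        awk '{ n = $1; sub(/^ *[0-9]+ +/, ""); print n "\t" $0 }'
}
```

It would be used as: organism_counts < filename.gbp > filename.org.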
The file of sequences in FASTA format (filename.fst) can be loaded onto
the EBI ClustalW server, which also interfaces with a nice sequence
viewer/editor (Jalview).
Another useful site, the JPred2 secondary structure prediction server
(from the Barton Group), takes a single query sequence, performs a
PSI-BLAST search, and generates a consensus secondary structure
prediction after comparing predictions from multiple algorithms.