GCG Blast Search and Multiple Sequence Alignment
Web sites:
Use NCBI Entrez/Protein, RCSB, or another favorite protein database to
search for and find the desired sequence
Display it in FASTA format
Make sure you have a separate window open in which GCG is enabled (e.g.,
genetics.rutgers.edu; you must type "gcg" after logging on to activate the
program for use)
Copy/paste the sequence into a separate window where an editor (vi, emacs,
jot) has already opened a new, blank file (either by edit-copy-paste, or by
selecting the text with the mouse and then middle-clicking in the editor
window)
Edit the file into which the sequence has been pasted so that two dots
("..") occur by themselves on a line immediately after the header (the
FASTA format header is the line that starts with a greater-than symbol,
">")
Save the file as: filename.gcg
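If you would rather not do this edit by hand, the following is a minimal
Python sketch of the same step. It assumes the pasted text holds a single
FASTA record whose first non-blank line is the header; the function name
add_gcg_divider and the example record are made up for illustration and are
not part of GCG.

```python
def add_gcg_divider(fasta_text: str) -> str:
    """Insert a line containing only '..' right after the FASTA header."""
    # Drop stray blank lines that copy/paste may have added around the record.
    lines = fasta_text.strip().splitlines()
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA header starting with '>'")
    # Header first, then the two-dot divider, then the sequence lines as-is.
    return "\n".join([lines[0], ".."] + lines[1:]) + "\n"


if __name__ == "__main__":
    # Hypothetical record; the name and residues are illustrative only.
    example = ">example_protein hypothetical sequence\nMKTAYIAKQR\nQISFVKSHFS\n"
    print(add_gcg_divider(example), end="")
```

The result can then be written to filename.gcg and carried through the
remaining steps as usual.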
Reformat this file to gcg format:
reformat filename.gcg
Carry out netblast search using this sequence as query:
netblast filename.gcg
The output, typically named filename.blastp, can be used to "fetch" the
sequences:
netfetch filename.blastp
The output file is typically named "filename_blastp.rsf"
If the query sequence is not in the NCBI database but should be included
in the multiple sequence alignment, create a new "list" file named
filename.list with the following contents:
!! SEQUENCE_LIST 1.0
..
filename.gcg
filename_blastp.rsf{*}
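The list file can also be generated rather than typed. This is a small
Python sketch that assumes the naming used in these notes (basename.gcg for
the query, basename_blastp.rsf for the fetched hits); the helper names
list_file_text and write_list_file are hypothetical.

```python
from pathlib import Path


def list_file_text(basename: str) -> str:
    """Return the contents of a list file for the given base name."""
    lines = [
        "!! SEQUENCE_LIST 1.0",
        "..",
        f"{basename}.gcg",                # the query sequence itself
        f"{basename}_blastp.rsf{{*}}",    # every sequence in the RSF file
    ]
    return "\n".join(lines) + "\n"


def write_list_file(basename: str, directory: str = ".") -> Path:
    """Write basename.list into the given directory and return its path."""
    path = Path(directory) / f"{basename}.list"
    path.write_text(list_file_text(basename))
    return path


if __name__ == "__main__":
    print(list_file_text("filename"), end="")
```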
Carry out multiple sequence alignment using the listfile as input:
pileup @filename.list
I usually take the default parameters for the analysis and typically name
the output filename.msf
Alternatively, the pileup command can be used on the filename_blastp.rsf
file directly:
pileup -out=filename_blastp.msf filename_blastp.rsf{*}
The output from pileup can be used to generate nice PostScript output
using prettybox:
prettybox -cas -con -plu=40 -font=5 -seqn=p -out=filename_msf.ps filename_blastp.msf
The "-plu" option sets the plurality, which is the number of residues
required to constitute a consensus
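To make the idea of plurality concrete, here is a simplified Python sketch
for a single alignment column. It assumes plain identity counting (each
sequence casts one vote for its own residue) and treats ".", "-" and "~" as
gaps; prettybox itself may score votes differently, so take this as an
illustration of the concept rather than of the program's actual method. The
function name column_consensus is hypothetical.

```python
from collections import Counter


def column_consensus(column: str, plurality: int) -> str:
    """Return the consensus residue for one column, or '-' if there is none."""
    # Count residues only; gap characters do not vote.
    counts = Counter(residue for residue in column if residue not in ".-~")
    if not counts:
        return "-"
    residue, votes = counts.most_common(1)[0]
    # A consensus exists only if the top residue reaches the plurality.
    return residue if votes >= plurality else "-"


if __name__ == "__main__":
    print(column_consensus("AAAAG", 4))
    print(column_consensus("AAGGC", 3))
```

With five sequences and a plurality of 4, the first column has a consensus
(four A's) while the second does not, since no residue reaches 3 votes.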
The output PostScript file can be viewed on lion (or with another
ghostscript-enabled PostScript viewer) and/or printed:
gs filename_msf.ps
lp -dhp8100 filename_msf.ps
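Because the file names change from step to step, it can help to see the
whole chain derived from one base name. The Python sketch below only builds
the command strings used in these notes (from the netblast step onward, on
the direct pileup route); it does not run GCG, and the function name
gcg_workflow is made up for illustration. The output names follow the
"typical" names mentioned above and may differ on your system.

```python
from typing import List


def gcg_workflow(basename: str) -> List[str]:
    """Return the command lines from these notes for one base name."""
    blast_out = f"{basename}.blastp"   # typical netblast output name
    rsf = f"{basename}_blastp.rsf"     # typical netfetch output name
    msf = f"{basename}_blastp.msf"     # pileup output chosen via -out
    ps = f"{basename}_msf.ps"          # prettybox PostScript output
    return [
        f"netblast {basename}.gcg",
        f"netfetch {blast_out}",
        f"pileup -out={msf} {rsf}{{*}}",
        f"prettybox -cas -con -plu=40 -font=5 -seqn=p -out={ps} {msf}",
        f"gs {ps}",
    ]


if __name__ == "__main__":
    for command in gcg_workflow("filename"):
        print(command)
```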