GCG Blast Search and Multiple Sequence Alignment

Web sites:

Use NCBI Entrez/Protein, RCSB, or other favorite protein database to search and find desired sequence

Display it in FASTA format

Make sure you have a separate window open in which GCG is enabled (e.g., on genetics.rutgers.edu; you must type "gcg" after logging on to activate the program)

Copy and paste the sequence into a new, blank file that is already open in an editor (vi, emacs, jot) in a separate window; use either edit-copy-paste or select the text with the mouse and middle-click in the editor window

Edit the file so that a line containing only two dots ("..") follows the header (the FASTA header is the line that starts with a greater-than symbol, ">")

Save the file as:  filename.gcg
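
The manual edit above can also be done from the command line. The following is only a sketch: the input name query.fasta and the example sequence are made up for illustration, and it assumes the file holds a single FASTA record.

```shell
# Hypothetical example record, created here so the sketch is self-contained.
cat > query.fasta <<'EOF'
>example_protein hypothetical sequence for illustration
MKTAYIAKQRQISFVKSHFSRQ
LEERLGLIEVQAPILSRVGDGT
EOF

# Copy every line; after the first header line (starting with ">"),
# also write a line containing only "..".
awk '{ print } /^>/ && !done { print ".."; done = 1 }' query.fasta > filename.gcg

cat filename.gcg
```

The result still has to be converted to GCG format in the next step; the sketch only adds the divider line.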

Reformat this file to gcg format:

reformat filename.gcg

Carry out a netblast search using this sequence as the query:

netblast filename.gcg

The output, typically named filename.blastp, can be used to "fetch" the sequences:

netfetch filename.blastp
The output file is typically named "filename_blastp.rsf"

If the query sequence is not in the NCBI database but should be included in the multiple sequence alignment, create a new "list" file named filename.list with the following contents:

!! SEQUENCE_LIST 1.0
..
filename.gcg
filename_blastp.rsf{*}
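
Instead of typing the list file in an editor, it can be written from the shell. This is a minimal sketch; the variable "base" is a hypothetical convenience, and the file names assume the typical output names mentioned above.

```shell
base=filename   # hypothetical base name shared by all files for this query

# Single quotes keep "!!" literal; "{*}" inside double quotes is not expanded.
printf '%s\n' \
    '!! SEQUENCE_LIST 1.0' \
    '..' \
    "${base}.gcg" \
    "${base}_blastp.rsf{*}" > "${base}.list"

cat "${base}.list"
```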

Carry out multiple sequence alignment using the listfile as input:

pileup @filename.list
I usually take the default parameters for the analysis and typically name the output filename.msf.
Alternatively, the pileup command can be used on the filename_blastp.rsf file directly:
pileup -out=filename_blastp.msf filename_blastp.rsf{*}

The output from pileup can be used to generate a nice PostScript output using prettybox:

prettybox -cas -con -plu=40 -font=5 -seqn=p -out=filename_msf.ps filename_blastp.msf
The "-plu" option sets the plurality, which is the number of agreeing residues required at an alignment position to constitute a consensus

The output PostScript file can be viewed on lion (or with another ghostscript-enabled PostScript viewer) and/or printed:

gs filename_msf.ps
lp -dhp8100 filename_msf.ps
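
To keep the file names consistent from step to step, the command sequence can be printed for a given base name before anything is run. This is a dry-run sketch only: the function name "plan" is hypothetical, it does not call GCG, and it assumes the query is already in GCG format and that netblast and netfetch produce their typical output names.

```shell
# Print (do not run) the commands for one query, given its base name.
plan() {
    b=$1
    echo "netblast ${b}.gcg"
    echo "netfetch ${b}.blastp"
    echo "pileup -out=${b}_blastp.msf ${b}_blastp.rsf{*}"
    echo "prettybox -cas -con -plu=40 -font=5 -seqn=p -out=${b}_msf.ps ${b}_blastp.msf"
    echo "gs ${b}_msf.ps"
}

plan filename
```

Calling plan with a different base name (for example, plan myprot) prints the same steps with the names substituted, which can then be typed or pasted into the GCG window one at a time.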