STAMP Multiple Sequence Alignment

Web sites:

  • http://barton.ebi.ac.uk/manuals/stamp.4.2/stamp.html
  • (From Geoff Barton's group at the EBI)

    Edit .cshrc file to create environment variables and path:

    #For Stamp
    setenv STAMPDIR /max2/people/dbuckler/stamp_4_2
    set path = ($STAMPDIR $STAMPDIR/bin/sgi $path)

    Similarly, if ALSCRIPT is in use:

    #For Alscript
    set path = (/max2/people/dbuckler/alscript.2_07a/bin/sgi $path)

    Copy the pdb.directories and dssp.directories files to the top-level stamp_4_2 directory:

    cd /max2/people/dbuckler/stamp_4_2/
    cp defs/pdb.directories .
    cp defs/dssp.directories .

    Edit these files so that they point to the directories that hold your PDB and DSSP files.

    Download or copy the PDB files to the proper directory:  /max2/people/dbuckler/STAMP/PDB

    Run the dssp program to generate the DSSP output files:

    cd /max2/people/dbuckler/STAMP/DSSP
    dssp ../PDB/2CHE.pdb 2CHE.dssp
    Repeat the above for every *.pdb file.
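
    The repetition can be scripted. The following is a minimal sketch in Bourne shell (sh) rather than csh; the function name run_dssp_all and the DSSP override variable are hypothetical names introduced here, and the sketch assumes that dssp takes an input file followed by an output file, as in the command above.

```shell
#!/bin/sh
# Sketch: run dssp on every *.pdb file in one directory and write the
# results to another. DSSP (hypothetical variable) overrides the program name.
run_dssp_all() {
    pdb_dir=$1    # directory holding the *.pdb files
    out_dir=$2    # directory that receives the *.dssp files
    for pdb in "$pdb_dir"/*.pdb; do
        [ -e "$pdb" ] || continue        # nothing matched the glob
        code=$(basename "$pdb" .pdb)     # e.g. 2CHE.pdb -> 2CHE
        "${DSSP:-dssp}" "$pdb" "$out_dir/$code.dssp" ||
            echo "dssp failed for $pdb" >&2
    done
}
# Example call, using the directories from this document:
#   run_dssp_all /max2/people/dbuckler/STAMP/PDB /max2/people/dbuckler/STAMP/DSSP
```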

    Create new working directory and move there:

    cd /max2/people/dbuckler
    mkdir WORK
    cd WORK

    Run pdbc to check that the program finds the PDB files that will be used in this alignment (they must already have been placed in /max2/people/dbuckler/STAMP/PDB, with dssp run on each as described above):

    pdbc -m wbi2.pdb

    Run pdbc to generate a domains file:

    pdbc -d wbi2 >! regdom.domains
    pdbc -d 1a04 >> regdom.domains
    pdbc -d 1a2o >> regdom.domains
    Repeat this for each PDB file.  Note that only the identifier code of the PDB file is used, without the ".pdb" suffix, and that it is case insensitive.  pdbc finds all chains for the PDB file located in the /STAMP/PDB directory.  The resulting file, regdom.domains, will now include the path/filename for all files (PDB files with multiple segments will have multiple listings, one for each segment).  It is often worthwhile to edit the *.domains file and comment out all but one chain for each protein, which reduces the number of structures used in the alignment to the minimal representative subset for the proteins of interest.

    Example files:

    orig_regdom.domains (the original file after the above series of pdbc -d commands)
    regdom.domains (the edited file to select one chain from each protein)
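
    The series of pdbc -d commands can be wrapped in a loop. This is a minimal sketch in Bourne shell (sh), not csh; the function name build_domains and the PDBC override variable are hypothetical. It writes the unedited listing to orig_<prefix>.domains and leaves a copy in <prefix>.domains for hand editing.

```shell
#!/bin/sh
# Sketch: collect "pdbc -d" output for several PDB codes into one domains file.
# PDBC (hypothetical variable) overrides the program name.
build_domains() {
    prefix=$1; shift                 # e.g. regdom; remaining arguments are PDB codes
    orig="orig_$prefix.domains"
    : > "$orig"                      # start from an empty file
    for code in "$@"; do
        "${PDBC:-pdbc}" -d "$code" >> "$orig" ||
            echo "pdbc failed for $code" >&2
    done
    cp "$orig" "$prefix.domains"     # working copy to edit by hand
}
# Example call, using the codes from this document:
#   build_domains regdom wbi2 1a04 1a2o
```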

    If the sequences are similar (e.g. homologs), STAMP can probably be run in rough mode:

    stamp -l regdom.domains -rough -n 2 -prefix regdom -o regdom_stamp.log
    The rough mode is appropriate when the sequences/structures are reasonably similar and align reasonably well when left-justified (aligned from the N-terminus) as a starting point.  The "-n 2" option indicates that a conformation-biased fit will be performed before the final fit.

    Alternatively, do an initial superimposition of all structures by scanning a database with one of the domain structures, and then carry out the multiple structural alignment on the resulting initial superimposition:

    Edit the original *domains file to comment out (with %) the sequence that will be used to scan the remaining *domains database:

    Example file:  reg1.domains

    Create a domains file for the single domain that will be used as the query for the scan:

    pdbc -d 3chy > 3chy.domains

    Resulting output file: 3chy.domains

    Scan the "query" against the reg1.domains database:

    stamp -l 3chy.domains -n 2 -s -slide 5 -d reg1.domains -prefix reg1 -cut
    The "-cut" option removes parts of sequences that are not similar.

    SORTTRANS will remove redundancies, in this case taking only fits with Sc > 2.5:

    sorttrans -f reg1.scan -s Sc 2.5 > reg1.sorted

    Then run STAMP on the sorted output:

    stamp -l reg1.sorted -prefix reg1
    A series of files named reg1.n is generated.  The one with the largest "n" contains all of the structural alignments; use it for the subsequent steps (reg1.7 in the examples below).
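
    When many structures are aligned, the file with the largest numeric suffix can be picked out automatically. This is a minimal sketch in Bourne shell (sh); the function name latest_stamp_file is hypothetical, and the sketch assumes only that the alignment files are named <prefix>.<number>, as described above.

```shell
#!/bin/sh
# Sketch: print the <prefix>.<n> file with the largest numeric n.
# Files such as reg1.scan or reg1.7.post are skipped because their last
# suffix is not purely numeric.
latest_stamp_file() {
    prefix=$1
    best=
    best_n=-1
    for f in "$prefix".[0-9]*; do
        [ -e "$f" ] || continue                  # nothing matched the glob
        n=${f##*.}                               # text after the last dot
        case $n in
            ''|*[!0-9]*) continue ;;             # not a plain number
        esac
        if [ "$n" -gt "$best_n" ]; then
            best_n=$n
            best=$f
        fi
    done
    [ -n "$best" ] && printf '%s\n' "$best"
}
# Example call:
#   latest_stamp_file reg1
```

    The comparison is numeric, so reg1.10 is ranked above reg1.7, which a plain alphabetical listing would not do.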

    TRANSFORM will generate one large PDB file with each aligned structure getting a unique segment ID:

    transform -f reg1.7 -g -o reg1_7.pdb
    If -g (graphics) is omitted, separate pdb files are output for each aligned structure.

    Generate average structure from the STAMP alignment:

    avestruc -f reg1.7 -o reg1_7_ave.pdb

    POSTSTAMP to clean up and check:

    poststamp -f reg1.7 -min 0.5

    STAMP_CLEAN:

    stamp_clean reg1.7.post 3 > reg1.7.clean
    This creates a file, reg1.7.clean, in which all gaps that do not lie within structurally equivalent regions and that have fewer than 3 aligned residues in a row (i.e. blocks where all sequences are not aligned with a gap) are shortened to their minimum length.

    VER2HOR creates a more readable output in horizontal format:

    ver2hor -f reg1.7.clean > reg1.7.clean.v2h

    Use DSTAMP to generate output for ALSCRIPT:

    dstamp -f reg1.7.clean -prefix reg1_7_align

    Then run ALSCRIPT on the resulting file:

    alscript reg1_7_align.als
    This generates output reg1_7_align.ps

    View with ghostview, or print directly:

    gv reg1_7_align.ps
    or
    lp reg1_7_align.ps
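
    The clean-up and display steps from POSTSTAMP through ALSCRIPT can be chained in one function. This is a minimal sketch in Bourne shell (sh); the function name postprocess_alignment is hypothetical, while the commands, options and file names are the ones used above. The chain stops at the first step that fails.

```shell
#!/bin/sh
# Sketch: run the post-processing steps on one STAMP alignment file.
postprocess_alignment() {
    aln=$1    # STAMP alignment file, e.g. reg1.7
    out=$2    # prefix for the DSTAMP/ALSCRIPT files, e.g. reg1_7_align
    poststamp -f "$aln" -min 0.5 &&
    stamp_clean "$aln.post" 3 > "$aln.clean" &&
    ver2hor -f "$aln.clean" > "$aln.clean.v2h" &&
    dstamp -f "$aln.clean" -prefix "$out" &&
    alscript "$out.als"
}
# Example call:
#   postprocess_alignment reg1.7 reg1_7_align
```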