Pattern matching


In a number of cases, the active site of a protein can be recognized by a specific "fingerprint" or "template": a fairly small set of residues that are unique to a family of proteins. An example is the sequence GXGXXG (where G = glycine and X = any amino acid), which is characteristic of many nucleotide-binding sites. Searching a sequence for such a (rather loose) predefined string of characters is called pattern matching.
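As a minimal illustration (not part of EMBOSS), the GXGXXG template can be written as a regular expression in which each X becomes ".", the wildcard for any single character. The function find_gxgxxg and the toy sequence below are hypothetical. Positions are reported 1-based and inclusive, as sequence tools usually report them.

```python
import re

# G = glycine; each X ("any amino acid") becomes the regex wildcard "."
# The lookahead (?=...) also reports sites that overlap one another.
GXGXXG = re.compile(r"(?=(G.G..G))")


def find_gxgxxg(sequence: str) -> list[tuple[int, int, str]]:
    """Return (start, end, site) for every GXGXXG site, 1-based and inclusive."""
    hits = []
    for m in GXGXXG.finditer(sequence.upper()):
        start = m.start() + 1                 # 0-based offset -> 1-based position
        site = m.group(1)
        hits.append((start, start + len(site) - 1, site))
    return hits


print(find_gxgxxg("AAGCGDEGAA"))  # → [(3, 8, 'GCGDEG')]
```

The pattern is deliberately loose. Any residue may occupy the X positions, so short templates like this one can also match by chance in unrelated proteins.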

The wEMBOSS program patmatmotifs searches the given protein sequence for the patterns defined in the PROSITE database, compiled by Dr. Amos Bairoch at the University of Geneva. PROSITE is a database of protein families and domains. It is based on the observation that, while there is a huge number of different proteins, most of them can be grouped into a limited number of families on the basis of similarities in their sequences. Proteins or protein domains belonging to a particular family generally share functional attributes and are derived from a common ancestor.
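PROSITE writes its patterns as elements separated by "-". In this notation, x means any residue, [..] lists the allowed residues and {..} the disallowed ones. A count in parentheses, (n) or (n,m), gives a repeat, and < and > anchor the pattern to the N- or C-terminus. GXGXXG is therefore written G-x-G-x(2)-G.

The Python sketch below translates that syntax into a regular expression and scans one sequence against a small dictionary of patterns. This is roughly the kind of job patmatmotifs does against the whole PROSITE database. It is an illustration only, not how EMBOSS implements patmatmotifs. The function names, motif names and toy sequence are hypothetical, and the sketch does not cover every corner of the syntax (for example, ">" inside brackets).

```python
import re

# One PROSITE element: optional '<', a residue class, optional (n) or (n,m), optional '>'.
ELEMENT = re.compile(
    r"(<?)"
    r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})"
    r"(?:\((\d+)(?:,(\d+))?\))?"
    r"(>?)"
)


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE pattern, e.g. 'G-x-G-x(2)-G', into a Python regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        n_term, residue, low, high, c_term = m.groups()
        if residue == "x":
            rx = "."                          # x: any amino acid
        elif residue.startswith("{"):
            rx = "[^" + residue[1:-1] + "]"   # {..}: any residue except these
        else:
            rx = residue                      # 'G' or '[..]' is already valid regex
        if low is not None:                   # (n) or (n,m): repeat count
            rx += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if n_term else "") + rx + ("$" if c_term else ""))
    return "".join(parts)


def scan(sequence: str, patterns: dict[str, str]) -> list[tuple[str, int, int]]:
    """Report (motif name, start, end) for every hit, 1-based and inclusive."""
    hits = []
    for name, prosite in patterns.items():
        # The lookahead lets overlapping hits of the same motif be reported.
        finder = re.compile("(?=(" + prosite_to_regex(prosite) + "))")
        for m in finder.finditer(sequence.upper()):
            hits.append((name, m.start() + 1, m.start() + len(m.group(1))))
    return hits


# Hypothetical two-entry "motif database" and toy sequence.
TOY_MOTIFS = {
    "TOY_GXGXXG": "G-x-G-x(2)-G",
    "TOY_NTERM_MK": "<M-K",
}
print(scan("MKAGCGDEGAA", TOY_MOTIFS))
# → [('TOY_GXGXXG', 4, 9), ('TOY_NTERM_MK', 1, 2)]
```

Converting to regular expressions keeps the sketch short. A dedicated matcher could handle the remaining PROSITE syntax, or report one longest hit per position instead of every overlapping one.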


Exercise: patmatmotifs

Program: patmatmotifs

Search a motif database with a protein sequence

Input sequence: xlrhodop.pep

Result:

########################################
# Program: patmatmotifs
# Rundate: Mon 21 Apr 2008 15:48:30
# Commandline: patmatmotifs
#    -sequence xlrhodop.pep
#    -sbegin1 1
#    -send1 354
#    -nofull
#    -prune
#    -rformat dbmotif
#    -auto
# Report_format: dbmotif
# Report_file: .patmatmotifs.08.04.21:15.48.29/l07770_1.patmatmotifs
########################################

#=======================================
#
# Sequence: L07770_1     from: 1   to: 354
# HitCount: 2
#
# Full: No
# Prune: Yes
# Data_file: /usr/ebiotools/share/EMBOSS/data/PROSITE/prosite.lines
#
#=======================================

Length = 17
Start = position 123 of sequence
End = position 139 of sequence

Motif = G_PROTEIN_RECEP_F1_1

TLGGEVALWSLVVLAVERYMVVCKPMA
     |               |
   123               139

Length = 17
Start = position 290 of sequence
End = position 306 of sequence

Motif = OPSIN

PVFMTVPAFFAKSSAIYNPVIYIVLNK
     |               |
   290               306

#---------------------------------------
#---------------------------------------
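If you run patmatmotifs on many sequences, you may want to read the hits back programmatically. Below is a minimal Python sketch that assumes the record layout shown above: Length, Start and End, then the Motif name, then a residue line. The names parse_dbmotif, MotifHit and matched_residues are hypothetical and not part of EMBOSS.

The sketch checks that End - Start + 1 equals Length. It also assumes that five flanking residues are printed on each side of a hit. That assumption is inferred from the 27-residue lines and the position of the | markers, not from EMBOSS documentation.

```python
import re
from dataclasses import dataclass


@dataclass
class MotifHit:
    motif: str
    start: int     # 1-based position of the first matched residue
    end: int       # 1-based position of the last matched residue
    length: int
    context: str   # the residue line printed under "Motif = ..."


# One dbmotif record: Length, Start, End, Motif name, then the residue line.
# \s+ also absorbs blank lines, so extra spacing in the report does no harm.
RECORD = re.compile(
    r"Length = (\d+)\s+"
    r"Start = position (\d+) of sequence\s+"
    r"End = position (\d+) of sequence\s+"
    r"Motif = (\S+)\s+"
    r"([A-Z]+)"
)


def parse_dbmotif(report: str) -> list[MotifHit]:
    return [
        MotifHit(m[4], int(m[2]), int(m[3]), int(m[1]), m[5])
        for m in RECORD.finditer(report)
    ]


def matched_residues(hit: MotifHit, flank: int = 5) -> str:
    # Assumption: `flank` residues are printed on each side of the hit,
    # as the position of the | markers in the report suggests.
    return hit.context[flank:flank + hit.length]


REPORT = """
Length = 17
Start = position 123 of sequence
End = position 139 of sequence

Motif = G_PROTEIN_RECEP_F1_1

TLGGEVALWSLVVLAVERYMVVCKPMA

Length = 17
Start = position 290 of sequence
End = position 306 of sequence

Motif = OPSIN

PVFMTVPAFFAKSSAIYNPVIYIVLNK
"""

for hit in parse_dbmotif(REPORT):
    assert hit.end - hit.start + 1 == hit.length  # coordinates agree with Length
    print(hit.motif, hit.start, hit.end, matched_residues(hit))
```

Run on the two records above, this prints VALWSLVVLAVERYMVV for positions 123 to 139 and VPAFFAKSSAIYNPVIY for positions 290 to 306.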



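In our case we already know that the sequence is a rhodopsin, and the two hits (the G-protein coupled receptor family 1 signature and the opsin retinal-binding site) are consistent with that. For an unknown sequence, however, identifying motifs like these can give you clues to its function and help you plan further experiments.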
