Pattern matching
In a number of cases, the active site of a protein can be recognized by a specific "fingerprint" or "template": a fairly small set of residues that is characteristic of a family of proteins. An example is the sequence GXGXXG (where G = glycine and X = any amino acid), a glycine-rich motif associated with nucleotide binding sites. Searching a sequence for such a predefined, usually rather loose, string of characters is called pattern matching.
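As a minimal sketch of what such a search amounts to, the GXGXXG template can be written as a regular expression in Python. The toy sequence and the function name `find_motif` are invented for illustration and are not part of EMBOSS; positions are reported 1-based, as in the reports shown in this section.

```python
import re

# Hypothetical toy sequence, invented purely for illustration.
toy_sequence = "MKVAVIGAGSWGTALAL"

# G, any residue, G, any two residues, G.
# Wrapping the motif in a lookahead lets overlapping occurrences be found.
GXGXXG = re.compile(r"(?=(G.G..G))")


def find_motif(sequence):
    """Return a list of (start, end, matched_text), 1-based and inclusive."""
    hits = []
    for match in GXGXXG.finditer(sequence):
        start = match.start(1) + 1  # convert the 0-based index to 1-based
        end = match.end(1)          # exclusive 0-based end equals inclusive 1-based end
        hits.append((start, end, match.group(1)))
    return hits


for start, end, text in find_motif(toy_sequence):
    print(f"{text} at {start}-{end}")
```

A pattern this short is expected to occur by chance in many unrelated proteins, which is one reason curated pattern collections tend to use longer and more specific patterns.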
The wEMBOSS program patmatmotifs looks for sequence motifs: it searches the given protein sequence for the patterns defined in the PROSITE database, compiled by Dr. Amos Bairoch at the University of Geneva. PROSITE is a database of protein families and domains. It is based on the observation that, although there is a huge number of different proteins, most of them can be grouped, on the basis of similarities in their sequences, into a limited number of families. Proteins or protein domains belonging to a particular family generally share functional attributes and are derived from a common ancestor.
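PROSITE patterns are written in a compact notation: elements are separated by "-", "x" stands for any residue, "[AC]" lists the allowed residues, "{ED}" lists the forbidden ones, and "(n)" or "(n,m)" gives a repeat count. The sketch below translates a simplified subset of that notation into a Python regular expression. It is an illustration only and not how patmatmotifs is implemented; the function name `prosite_to_regex` is hypothetical, and anchors placed inside brackets are not handled.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simplified PROSITE-style pattern into a Python regex string."""
    pattern = pattern.strip().rstrip(".")  # a pattern may end with a period
    pieces = []
    for element in pattern.split("-"):
        starts_at_n_terminus = element.startswith("<")
        ends_at_c_terminus = element.endswith(">")
        element = element.strip("<>")

        # Separate the element from an optional repeat count "(n)" or "(n,m)".
        parsed = re.fullmatch(r"([^()]+)(?:\((\d+)(?:,(\d+))?\))?", element)
        if parsed is None:
            raise ValueError(f"cannot parse element: {element!r}")
        core, low, high = parsed.groups()

        if core == "x":
            regex = "."                        # any residue
        elif core.startswith("[") and core.endswith("]"):
            regex = core                       # allowed residues
        elif core.startswith("{") and core.endswith("}"):
            regex = "[^" + core[1:-1] + "]"    # forbidden residues
        elif len(core) == 1 and core.isupper():
            regex = core                       # one specific residue
        else:
            raise ValueError(f"unsupported element: {element!r}")

        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        if starts_at_n_terminus:
            regex = "^" + regex
        if ends_at_c_terminus:
            regex += "$"
        pieces.append(regex)
    return "".join(pieces)


# The GXGXXG template written in PROSITE-style notation (illustrative only,
# not quoted from a PROSITE entry).
print(prosite_to_regex("G-x-G-x(2)-G"))
```

The resulting string can be passed to `re.search` or `re.finditer` to scan a protein sequence.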
Exercise: patmatmotifs
Program: patmatmotifs
Search a motif database with a protein sequence
Input sequence: xlrhodop.pep
Result:
########################################
# Program: patmatmotifs
# Rundate: Mon 21 Apr 2008 15:48:30
# Commandline: patmatmotifs
# -sequence xlrhodop.pep
# -sbegin1 1
# -send1 354
# -nofull
# -prune
# -rformat dbmotif
# -auto
# Report_format: dbmotif
# Report_file: .patmatmotifs.08.04.21:15.48.29/l07770_1.patmatmotifs
########################################
#=======================================
#
# Sequence: L07770_1 from: 1 to: 354
# HitCount: 2
#
# Full: No
# Prune: Yes
# Data_file: /usr/ebiotools/share/EMBOSS/data/PROSITE/prosite.lines
#
#=======================================
Length = 17
Start = position 123 of sequence
End = position 139 of sequence
Motif = G_PROTEIN_RECEP_F1_1
TLGGEVALWSLVVLAVERYMVVCKPMA
     |               |
   123               139
Length = 17
Start = position 290 of sequence
End = position 306 of sequence
Motif = OPSIN
PVFMTVPAFFAKSSAIYNPVIYIVLNK
     |               |
   290               306
#---------------------------------------
#---------------------------------------
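If a report like the one above needs to be processed further, the hit records can be read back with a few lines of Python. This is a rough sketch that assumes the "Length", "Start", "End" and "Motif" lines appear exactly as printed above; the function name `parse_hits` is hypothetical, and the text being parsed is copied from the report in this exercise.

```python
import re

# Hit records copied from the report above (comment and alignment lines omitted).
report_text = """\
Length = 17
Start = position 123 of sequence
End = position 139 of sequence
Motif = G_PROTEIN_RECEP_F1_1
Length = 17
Start = position 290 of sequence
End = position 306 of sequence
Motif = OPSIN
"""


def parse_hits(text):
    """Collect one dictionary per hit; a 'Motif' line closes the current hit."""
    hits = []
    current = {}
    for line in text.splitlines():
        line = line.strip()
        number = re.match(r"(Length|Start|End)\s*=\s*(?:position\s+)?(\d+)", line)
        if number:
            current[number.group(1).lower()] = int(number.group(2))
            continue
        motif = re.match(r"Motif\s*=\s*(\S+)", line)
        if motif:
            current["motif"] = motif.group(1)
            hits.append(current)
            current = {}
    return hits


for hit in parse_hits(report_text):
    print(hit["motif"], hit["start"], hit["end"], hit["length"])
```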
In our case we already know that our sequence is a rhodopsin. For an unknown sequence, however, identifying motifs in this way can provide information that helps you plan further experiments.