
 

WCG Status Update

WCG Post

second update from the scientist slacking on the communication front
[Apr 28, 2005 7:03:30 PM]

Hello grid participants. Sorry for not letting you know how the project is going for a while (1 month). I was good at science but terrible at English, and this trend continues!

In any case we have some results rolling in.

IN BRIEF:
CORRECT STRUCTURES/TOPOLOGIES:
We have 50k proteins done out of 100-150k. We expect ~2/3 of them to result in correct topologies (rough shape and relative positions: are parts of the chain near each other, are key residues buried and structural, or are they on the surface and part of active sites?)

CORRECT FOLDS:
We expect ~1/3 of them to match previous folds. When we match folds, we can map some of the functional characteristics of the structure we match onto the sequence that we used to predict the structure.
So:
predicted structure for sequence A =match=> known structure
in hopes that unknown function for A =sharesSomeFeatures=> function of known structure

(fold match means that the predicted structure matches a structure in the Protein Data Bank/PDB)
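To make the mapping above a bit more concrete, here is a minimal sketch of carrying functional notes from a matched known structure over to the sequence we predicted. Everything in it is an assumption for illustration only: the `FoldMatch` record, the `score` field and its 0.5 cutoff, and the made-up IDs are hypothetical and are not the actual post-processing pipeline.

```python
from dataclasses import dataclass


@dataclass
class FoldMatch:
    query_id: str   # sequence whose structure was predicted
    pdb_id: str     # matched known structure (made-up ID here)
    score: float    # hypothetical match confidence in [0, 1]


def transfer_annotations(matches, known_functions, min_score=0.5):
    """Collect functional hypotheses for each query from confident fold matches."""
    hypotheses = {}
    for m in matches:
        # Only trust matches above the (assumed) cutoff that have a known function
        if m.score >= min_score and m.pdb_id in known_functions:
            hypotheses.setdefault(m.query_id, []).append(known_functions[m.pdb_id])
    return hypotheses


# Made-up example data
known_functions = {"pdbX": "nucleotide binding", "pdbY": "protease"}
matches = [
    FoldMatch("seqA", "pdbX", 0.8),
    FoldMatch("seqA", "pdbY", 0.2),  # too weak, ignored
    FoldMatch("seqB", "pdbZ", 0.9),  # no known function, ignored
]
result = transfer_annotations(matches, known_functions)
```

The output is only a list of hypotheses per sequence ("shares some features"), not a confirmed function.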


THESE ARE PROTEINS SELECTED BECAUSE WE DON'T KNOW MUCH ABOUT THEM:
The proteins we selected to put on the grid are the toughest proteins to annotate.
So, if we find a fold match, we look at everything else and try to synthesise (by integrating the structure prediction with everything else we know about the protein). E.g.: sometimes we know what the first part of a protein does, and we predict the structure of the second part to see what the whole protein could be doing. Sometimes the structure prediction is key to figuring out what the function is. One analogy is that getting the protein's function is like getting a word in a crossword. The protein function is the clue "1-down: alcoholic ghost", expression data might tell us that the first letter is D, and the protein-protein interaction network might tell us that the word has 5 letters. So, if we've got the Scrabble dictionary around, we know the word is djini. Obviously playing Scrabble is harder, but otherwise you get the point: we narrow the possibilities. This strategy allows us to get info about a lot of proteins that are in the twilight zone.
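The crossword analogy can be written as a tiny filtering sketch: each source of evidence is a check, and only candidates passing every check survive. The word list and the `narrow` helper are hypothetical stand-ins for illustration, not the real data or code.

```python
def narrow(candidates, constraints):
    """Keep only the candidates that satisfy every constraint."""
    return [c for c in candidates if all(check(c) for check in constraints)]


# Toy word list standing in for the dictionary (made up for this example)
words = ["demon", "djini", "ghost", "genie", "spirit", "dram"]

constraints = [
    lambda w: w.startswith("d"),  # like expression data: first letter is D
    lambda w: len(w) == 5,        # like the interaction network: five letters
]

remaining = narrow(words, constraints)
print(remaining)
```

Two words are still left after the filters, so the clue itself has to pick between them. That is the point of integrating everything: no single source of evidence settles it.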

In any case the next step is getting all this information organized. We don't want to jump the gun there, because giving people bad predictions or buggy things is a certain way of warding biologists off and making the whole thing moot.

I'll soon post some images of a couple of proteins that have popped out of the post-processing.