
 

WCG Status Update


HPF2 Update - March 2009

Hello Everyone,

I'd like to give everyone an update on our HPF1/HPF2 progress. I've been working in the Bonneau lab since last September, and it has taken me a while to produce my first update; sorry. Getting up to speed has been complicated, but the ball is rolling and I'm working diligently on validating and analyzing all of the great data you are producing. I was a WCG contributor long before I imagined I would be involved in one of its projects, and my new perspective has helped me appreciate how much we are all contributing to basic research.

The HPF project is an ongoing effort to automatically annotate the genomes of organisms important to humans with predictions of protein structure and function. It could never succeed without the massive computing power that all of you on the WCG provide. We're continuing to post-process HPF2 protein predictions and to grow our database and research tools for biologists. With every "experiment" (organism) folded on the grid, we expand our database and resource and contribute to the larger effort to map and understand the functional units of life.

Of particular interest at the moment is the GOS set we recently folded. These proteins come from the J. Craig Venter Institute's Global Ocean Sampling Expedition. You folded groups of proteins from the expedition that have no known sequence similarity to any previously discovered proteins. Many computational biology methods rely heavily on sequence similarity, so our effort to make meaningful predictions about these proteins will depend instead on the protein structure simulations produced on the World Community Grid. In the coming months we'll combine evolutionary analysis with these predictions, and any discoveries about these novel proteins will be truly exceptional and groundbreaking.

Regarding HPF2's higher-resolution protein predictions, and the GOS dataset in particular, we are seeing an increased yield of high-quality structure predictions. The extra computation and resolution are paying off: we can now make confident predictions for more proteins. Despite the scale and scope of the project, we really do appreciate each member's contribution. Thanks again for volunteering for HPF!

Patrick Winters
Bonneau Lab


I'd also like to share a little something I've been working on. We'll soon have HPF data fully integrated into our BioNetBuilder Cytoscape plug-in, but in the meantime I've put together a pre-built network of Saccharomyces cerevisiae (yeast) proteins, along with a handful of interacting neighbors, that we highlighted in "Superfamily Assignments for the Yeast Proteome through Integration of Structure Prediction with the Gene Ontology" by Lars Malmstrom, Michael Riffle, Charlie E. M. Strauss, Dylan Chivian, Trisha N. Davis, Richard Bonneau, and David Baker. You can launch the network with this Web Start link. It makes HPF-based predictions accessible and intuitive for all types of researchers.


Launch the Webstart

In the control panel, switch to the VizMapper tab and apply "hpf" as the current visual style. In the data panel, click "select attributes" and check all of the HPF attributes. When you select nodes, you should see their function and structure predictions in the data panel; those attributes also affect node size and color.


This is what you're currently folding. Rice has been pushed back, but it should start in a couple of months. The schedule is subject to change, but we usually keep at least one organism in the queue.

| Organism | Description | Status |
|---|---|---|
| GOS | possible sources of new antibiotics, new industrial enzymes, and new organisms that bind toxic metals | finished |
| Plasmodium vivax | recently sequenced malaria parasite; the disease it causes is usually not deadly, but truly awful | finished |
| Arabidopsis | model organism for studying plants | finished |
| Trypanosoma cruzi | causes Chagas disease in Central and South America | current (almost finished!) |
| Plasmodium knowlesi | malaria parasite from Southeast Asia that infects both humans and macaques | next |
| Rice | major food source for a large portion of the world's population | soon to come |

Some of you have asked what it means for the WCG to be working on, or to finish, an experiment. We upload work units in batches, usually sets consisting of the proteins in a specific organism that have no sequence similarity to known protein structures. To finish an "experiment" is to complete predictions for an entire set. For us, that means another batch of data to validate, analyze, and incorporate into our database. The post-processing takes time, and we perform all of it on the computing clusters at our NYU lab. The steps of our analysis, which aim to predict structural and functional annotations for the proteins folded on the WCG, are described in the paper cited above.
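To make the batch bookkeeping concrete, here is a minimal Python sketch of the idea that an experiment is finished only when every work unit in its set has come back. It assumes one work unit per protein, and the names `WorkUnit`, `Experiment`, and `protein_id` are hypothetical illustrations, not part of the WCG or HPF software.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WorkUnit:
    # Hypothetical: one protein sent out to volunteers for folding
    protein_id: str
    completed: bool = False


@dataclass
class Experiment:
    # Hypothetical: one uploaded batch, e.g. the proteins from one organism
    # that have no sequence similarity to known protein structures
    organism: str
    work_units: List[WorkUnit] = field(default_factory=list)

    def progress(self) -> float:
        """Fraction of work units whose predictions have come back."""
        if not self.work_units:
            return 0.0
        done = sum(1 for wu in self.work_units if wu.completed)
        return done / len(self.work_units)

    def is_finished(self) -> bool:
        """Finished only when every work unit in the set is complete."""
        return bool(self.work_units) and all(wu.completed for wu in self.work_units)


if __name__ == "__main__":
    exp = Experiment("Trypanosoma cruzi", [WorkUnit("p1", True), WorkUnit("p2")])
    print(exp.progress(), exp.is_finished())
```

In this sketch an empty batch is never "finished", so an experiment can't be marked complete before any work has been uploaded.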