
 

World Community Grid Status Update

World Community Grid Post - HPF2 Update, June/July 2012

Hello to all!

We've been busy around the lab these past few months - high-profile collaborations, revising, re-revising, and publishing papers, and a great deal of work on new and improved methods for protein structure and function prediction have all kept us going at quite a pace into the summer months. As there is so much to mention, let's dive right in…

Plant Proteins Published, Plus Web Resource

First, fantastic news: the paper mentioned in the last status update that we submitted to Genome Biology and Evolution (GBE) was accepted for publication, appearing online in February 2012 and in print in GBE Volume 4, Issue 3 as "The Plant Protein Folding Project: Structure and Positive Selection in Plant Protein Families."

This publication served in part as the release of our own Proteome Folding Project (online at http://pfp.bio.nyu.edu/) web resource, a tool for protein viewing and analysis. This site allows the public and other scientific research groups to look at protein sequence, structure, and evolutionary data with a set of field-standard tools geared toward interactively viewing three-dimensional protein structures. Take a look at the screenshots below, and feel free to browse the website. A good place to start is by looking at a popular example: the MADS-Box protein family.


List of proteins

[All of the proteins of the MADS-box family. All share a similar function. To view the details of a protein and its component structures, click on the drop-down point at (1), then click any of the domains, at (2) for example. To load a structure in the 3D jMol applet, click the link at (3) and then the jMol tab at (4) in the next picture.]



Protein structure in jMol

[A structure from the first domain of the first protein in the list. These types of 3D structural models are what the World Community Grid produces for scientists. You can click, drag, and zoom (with a mousewheel) in this view to better see structures.]


This paper was recently summarized and will be described in a World Community Grid news item, so I'll leave the details for then (you can also view the full text on GBE's website here: The Plant Protein Folding Project). It is important to note that the World Community Grid provided a great deal of structural information for the Rice and Arabidopsis plant proteomes, with which this project was able to more fully examine the structural trends of evolution in important plant proteins. For this, the World Community Grid and all its data crunchers were gratefully acknowledged.


Collaborative Efforts

In other great news regarding publications and interesting applications of our protein structure and function work, the Bonneau Lab was approached with an offer of collaboration from Dr. Markus Landthaler of the Max Delbrück Center for Molecular Medicine (MDC) in Berlin. Dr. Landthaler and co. were in the process of finishing a large-scale, state-of-the-art group of experiments for identifying RNA-binding (RNAB) proteins from human cell samples. Largely, this experiment aimed to discover novel RNA-binding proteins and to further understand the behaviors of known RNAB proteins in an effort to study the mechanics of human genetics.

The Landthaler group approached the Bonneau Lab looking for computational methods to understand the structure, and therefore the function, of their RNAB proteins. They were interested both in computationally validating what they had found in human cell samples and in demonstrating that some of their experimental results were truly novel (i.e., that they could not be discovered by computational methods).

After much work toward these goals, the rather impressive results (if I do say so myself) were published in Molecular Cell in June of 2012. The paper is online here, and represents a mammoth effort by all those involved.

Again, as in the previous paper, the World Community Grid has contributed to providing a more complete structural landscape spanning a set of selected proteins, giving the Bonneau Lab unparalleled resources for studying the structure and function of protein targets important to cutting-edge science.


The Whole Contribution

With that in mind, the Grid's general contribution to protein science has become more apparent (to me!). Effectively, with the help of the World Community Grid, we are working toward completing the predicted structural landscape of a huge set of proteins. While we can't predict them all (mostly due to software and knowledge constraints), working with the Grid is rapidly increasing the speed at which we can tame the wild west of protein science, and gives us the means to gain new insights into novel protein functions and protein-protein relationships. All pretty cool.

Now, on to what's been happening on the Grid, and a quick closing note about what we're working on now and for the future.


On the Grid

At the moment, we are very near the end of processing our new mouse batches (ox through qk), and will shortly move on to new human data! This is particularly exciting for us, as once we have a complete and newly high-resolution set of mouse and human data, we can start applying some of our new methods to data that no one has seen before.


Code | Experiment | Project/Organism | Description | Status
ox-qk | 1171 | Mouse | New proteome data for Mouse | Nearly Finished!
ql-qz | 1176 | Human | New proteome data for Human | Waiting..
oq-ow | 1146-1161 | Haloferax, Haloarcula | Two Archaea, part of the third domain of life | Prev. Suspended
ra+ | 1183 | Microbiome | More Gastro-Intestinal proteins from the Human Microbiome Project | Waiting

Also of interest is the fact that for these new mouse and human data sets, we have raised the sampling rate of Rosetta de Novo structure prediction from 30,000 resulting decoys to 100,000 decoys. If you have noticed that a batch is taking longer to complete, this is why - not only are we getting better specificity for protein domain structures, we're also surveying a broader range of structures - like having a bigger structure-catching net with smaller holes.
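For the curious, here is a rough toy sketch of the "bigger net" idea. It is not Rosetta and not our actual pipeline: the decoy "scores" are just random numbers drawn from a bell curve, the function name `best_score` is made up for this illustration, and it assumes that compute time grows in direct proportion to the number of decoys. It simply shows that drawing more samples costs proportionally more work and can only keep or improve the best candidate found.

```python
import random

def best_score(n_decoys, seed=0):
    """Return the lowest score among n_decoys toy 'decoys'.

    Each decoy score is a random draw from a standard normal
    distribution; this is a stand-in for a real energy function,
    not a model of Rosetta itself.
    """
    rng = random.Random(seed)
    return min(rng.gauss(0.0, 1.0) for _ in range(n_decoys))

OLD_DECOYS = 30_000    # previous sampling level mentioned above
NEW_DECOYS = 100_000   # new sampling level mentioned above

# Assumption: work scales linearly with the number of decoys.
work_ratio = NEW_DECOYS / OLD_DECOYS

# With the same seed, the first 30,000 draws are shared, so the
# larger run can only match or beat the smaller run's best score.
best_old = best_score(OLD_DECOYS)
best_new = best_score(NEW_DECOYS)

print(f"Relative work per domain: about {work_ratio:.2f}x")
print(f"Best toy score with {OLD_DECOYS} decoys: {best_old:.3f}")
print(f"Best toy score with {NEW_DECOYS} decoys: {best_new:.3f}")
```

Under that linear assumption, a batch at the new sampling level takes roughly three and a third times as long as before, which is in line with the longer completion times you may have seen.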

This increase in sampling size also allows us to consider applying new scientifically interesting filters on predicted protein structures. One such filter currently being worked on in the lab uses evolutionary data to determine how likely a structure is to exist based on its evolutionary relationship to already known structures. Hopefully, we'll have more to report on this front in the future.

After we've finished with this revised mouse and human information, we'll return to studying archaea data as well as human microbiome and bacterial data, which have a renewed presence in the recent popular press (for just one example, this Scientific American feature I keep seeing on the subway platform [cover below]).


[Bacteria in your body! Halt the presses!]


For the Future

To finish up, one of the most exciting things we have been working on in the lab is the development of new or improved methods to predict protein function using a variety of data types and sources. We have chosen to focus on a method broadly called Machine Learning, which has recently become very popular but for quite some time has enjoyed an incredibly important role in predictive biological sciences (as well as many, many popular applications such as search personalization and social networking).

Hopefully, with a combination of new and improved methods and novel prediction data provided by the World Community Grid, we'll be able to find some novel and interesting results. More to come...