World Community Grid Post - HPF2 Update, Spring 2013
Since our last update, quite a few things have changed. First we will talk about personnel:
Duncan Penfold-Brown:
Our programmer, Duncan Penfold-Brown, who has been the workhorse of the Human Proteome Folding (HPF) project for the last few years, is leaving for another position. This is an excellent opportunity for Duncan, and we are very excited for him but also sad to see him go. Duncan handled almost all of the day-to-day work of the project, everything from assembling work-units to post-processing grid results to writing status updates. Please wish him luck.
Kevin Drew just defended his PhD thesis and will soon be moving on as well. He joined the Bonneau Lab about 7 years ago as a programmer working on HPF and later became a student in the lab. For his thesis, he wrote an introduction that describes how the HPF project fits into the broader field of molecular biology. Over the years we have posted small snippets about how our project fits into the field, but Kevin wanted to share a fuller account for all those who are interested. Please take a look here: Kevin Drew Thesis Intro
Rich Bonneau remains at New York University and will lead some of the follow-up work using the results of the Human Proteome Folding project. He and others are considering some exciting research possibilities, such as investigating how mutations alter protein structure. Perhaps one of these ideas will grow into a new World Community Grid project in the future.
Cost of Cloud Computing
We do a lot of calculations in the lab to try to capture what is happening at the level of protein structure and function, but sometimes we do “back of the envelope” calculations just for fun. For instance, a new postdoctoral fellow (“post-doc”) joined our lab a few years ago, and while we were explaining the Human Proteome Folding Project to him, he blurted out:
“That's a lot of computation! How much would that cost on the cloud?!?”
The “cloud” he was referring to, of course, was the resizable web service grid offered by companies like Google, Amazon, or IBM. Cloud computing prices per CPU hour are decreasing rapidly, but 3.5 cents is a reasonable estimate to use for now. We plugged this value into our calculation, along with the total CPU time volunteers have contributed to the project, and got a very large number.
122901 cpu years * 365 days/year * 24 hours/day * .035 dollars/cpu hour = $37,681,446
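For anyone who would like to redo this arithmetic with a different price, here is a minimal Python sketch of the same back-of-the-envelope calculation. The function and constant names are made up for illustration; the only inputs are the two figures quoted above (122901 CPU years and an assumed 3.5 cents per CPU hour).

```python
# Back-of-the-envelope cloud cost for the HPF grid phase.
# Names below are illustrative; the numbers come from the post.
CPU_YEARS = 122_901        # total CPU time quoted in the post
DAYS_PER_YEAR = 365
HOURS_PER_DAY = 24
USD_PER_CPU_HOUR = 0.035   # assumed cloud price (3.5 cents per CPU hour)


def cloud_cost_usd(cpu_years, usd_per_cpu_hour):
    """Estimate the cost in dollars of buying cpu_years of compute."""
    cpu_hours = cpu_years * DAYS_PER_YEAR * HOURS_PER_DAY
    return cpu_hours * usd_per_cpu_hour


if __name__ == "__main__":
    cost = cloud_cost_usd(CPU_YEARS, USD_PER_CPU_HOUR)
    # Drop the cents to match the whole-dollar figure in the post.
    print(f"${int(cost):,}")
```

Changing `USD_PER_CPU_HOUR` shows how sensitive the total is to the price: at this scale, every cent per CPU hour adds roughly $10.8 million.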
Not to get too carried away with our estimate, but this number really speaks to the power of volunteer computing. If one person, lab or institution had needed to provide $37 million in order to run this project, it would never have gotten off the ground. Distributing the work in little bits to volunteers across the world made it possible. We are very proud of this number because it represents what can happen when individuals come together under the banner of scientific research. We are also glad we did not get this bill in the mail. Deepest thanks go to all of the volunteers on World Community Grid for supporting the Human Proteome Folding project.
In relative terms, $37 million is 0.1% of the US National Institutes of Health (NIH) annual budget, or the equivalent of 90 NIH Research Project Grant Program (R01) grants (the main funding vehicle for many university labs). If you have been following the US Congressional budget news, you will know that budget cuts targeting US research funding have been implemented, and the NIH announced that nearly 1000 R01 grants will not be funded due to the “sequester” (commentary on the subject). These cuts are destructive not only to research but to the economy as a whole; the rate of return on publicly funded research is estimated at 25% to 40% a year (see nih_research_benefits). So for every $1 put into research, between $1.25 and $1.40 is returned to the economy.
We realize we are preaching to the choir on this one; people who volunteer their computer time to World Community Grid have already made a conscious choice to support basic research, and they understand better than most the benefits of basic research. You have all contributed far more than what was asked of you, BUT if you would like to do more, contact your elected officials (worldwide!) and tell them you support basic research funding.
The Closing of the HPF Project
Over the last month, knowing of Duncan's departure, Kevin’s eventual departure, and the current budget climate, we have decided as a lab to end the HPF project's “grid phase” after the last Human protein batch has completed. We just do not have the manpower to continue the day-to-day tasks. While we are cutting short several batches, we felt that finishing the Human proteins was a natural stopping point. The “post-grid phase” of the project will, however, continue for a long time. We still have several projects in the lab which depend on the results generated from the grid, and we will be kept busy for a while sifting through the data. All results will remain open and available to other researchers for their benefit (http://www.yeastrc.org/pdr/). We also plan on keeping everyone up to date on the “post-grid phase” through our usual status updates.
There are already many researchers using these results via our database. For instance, we collaborated with researchers at the Max Delbruck Center in Berlin who found a large set of novel mRNA binding proteins. mRNAs are the intermediate transcript molecules between genes and proteins and are very important for many cellular processes. Many of the novel proteins were found to bind regions of RNA linked to childhood obesity and AIDS progression, and this work provides additional knowledge that may be used to address these problems. Our role was to find folds and functions in our database that help to better understand this set of mRNA binding proteins and also provide new hypotheses to test.
Another example, which demonstrates why we publish our results in academic journals, is the use of our database by a group of outside researchers at the Hungarian Academy of Sciences, who only discovered our database after reading our recent paper. They work on a set of proteins from transposons, elements that rearrange DNA in a genome. These are interesting to study because they are important for the evolution of organisms but are also associated with certain diseases, such as hemophilia and specific kinds of muscular dystrophy. Our structure predictions helped them better understand how these proteins compare between organisms.
This project has been extremely successful by any measure, including the academic measure of the number of publications produced and cited. Below is a list of publications that are about the project or have used the data it generated. We are sure to continue adding to this list in the coming years.
List of Peer Reviewed Research Publications:
The Proteome Folding Project: proteome-scale prediction of structure and function
K Drew, P Winters, GL Butterfoss, V Berstis, K Uplinger, J Armstrong, M ...
Genome research 21 (11), 1981-1994
(Main publication of all organisms run on grid)
Superfamily assignments for the yeast proteome through integration of structure prediction with the gene ontology.
Malmström, Lars, et al.
PLoS biology 5.4 (2007): e76.
(First paper to use de novo predictions run on WCG)
A Protein Domain-Based Interactome Network for C. elegans Early Embryogenesis
M Boxem, Z Maliga, N Klitgord, N Li, I Lemmens, M Mana, L de Lichtervelde ...et al.
Cell 134 (3), 534-545
(Used domain and structure predictions to bolster experimental evidence of protein interaction regions)
BioNetBuilder: automatic integration of biological networks
I Avila-Campillo, K Drew, J Lin, DJ Reiss, R Bonneau
Bioinformatics 23 (3), 392-393
(Interface to data)
The mRNA-bound proteome and its global occupancy profile on protein-coding transcripts
AG Baltz, M Munschauer, B Schwanhäusser, A Vasile, Y Murakawa, M Schueler, N ...
Molecular cell 46 (5), 674-690
(Used fold enrichment and function predictions)
The coat morphogenetic protein SpoVID is necessary for spore encasement in Bacillus subtilis
KH Wang, AL Isidro, L Domingues, HA Eskandarian, PT McKenney, K Drew, P ...
Molecular microbiology 74 (3), 634-649
(Used predictions as hypothesis for further experimental characterization)
The Plant Proteome Folding Project: Structure and Positive Selection in Plant Protein Families
MM Pentony, P Winters, D Penfold-Brown, K Drew, A Narechania, R DeSalle, R ...
Genome biology and evolution 4 (3), 360-371
(Mapped evolutionarily interesting sites onto structures)
Parametric Bayesian priors and better choice of negative examples improve protein function prediction
N Youngs, D Penfold-Brown, K Drew, D Shasha, R Bonneau
Bioinformatics 29 (9), 1190-1198
(Furthered function prediction algorithm using structure)
Structure prediction and analysis of DNA transposon and LINE retrotransposon proteins.
Abrusan, Gyorgy, Yang Zhang, and Andras Szilagyi.
Journal of Biological Chemistry (2013).
(Outside group used structure predictions for analysis of transposon studies)
Again, thank you so much for your support over the years. We will keep in touch about developments with this magnificent dataset that you were crucial in creating.
NYU HPF Team