

The Inferelator: Learning Regulatory Networks


Paper published in CELL

The efforts of our Th17 network researchers have come to fruition with a huge publication in CELL. Congratulations to Maria Ciofani and Aviv Madar, as well as many other collaborators and Bonneau lab members.

The paper, 'A Validated Regulatory Network for Th17 Cell Specification', is available online and via our publications page. Check out the graphical abstract:

[Figure: graphical abstract of the Th17 paper]

Methods for Simultaneous Learning of Network Topology and Dynamics

Learning regulatory networks from genomics data is one of the most important problems in biology today, with applications across biology and biomedicine. There are, however, many reasons to believe that regulatory network inference is beyond our current reach: the combinatorics of the problem, factors for which we can't (or don't often) collect genome-wide measurements, and dynamics that elude cost-effective experimental designs. In spite of these challenges, multiple groups have recently shown that we can reconstruct large fractions of many prokaryotic regulatory networks from compendia of genomics data, and that these global regulatory models can be used to predict the dynamics of the transcriptome. These global regulatory models can be combined with models of metabolic and signaling networks to describe the global operation of cells with unprecedented completeness and accuracy. We review an overall strategy for the reconstruction of global networks resulting from the recent progress of several genomics consortia.

We are developing methods for learning biological networks that can infer regulatory topology as well as dynamical parameters directly from data. This aspect of the work is a new collaboration between my lab and Eric Vanden-Eijnden that we are both quite excited about. Our new methods for biological network reconstruction will combine the strengths of statistical learning approaches, such as those developed by Bonneau et al., with new methods for scalable modeling and simulation of complex systems developed by Vanden-Eijnden et al. New biological datasets, just emerging, will allow explicit dynamical parameters to be learned directly from data, in a manner that then allows for modeling on a much larger, genome-wide scale. This project will result in mathematical and algorithmic advances (general methodological advancements that combine learning and modeling) as well as biological advances (specific applications of our new methods to analyzing biological datasets as part of ongoing collaborations with several groups here at NYU and at the Institute for Systems Biology in Seattle).

The development of methods for learning accurate models of global regulatory networks (and the coupling of these networks to metabolic and signaling networks) is essential to our understanding of a cell's dynamic behavior and regulatory response to internal and external stimuli. Methods for inferring and modeling regulatory networks must strike a balance between model complexity (a model must be sufficiently complex to describe the system accurately) and the limitations of the available data (in spite of dramatic advances in our ability to measure mRNA and protein levels in cells, nearly all biological systems are under-determined with respect to the problem of regulatory network inference). To date, most mathematical and computer science efforts in this problem domain fall into one of two classes:

  1. Modeling, where small- to medium-sized signaling and regulatory networks with known topology and/or parameters are modeled to determine the operational behavior of these well-defined sub-circuits.
  2. Learning, where systems-biology datasets are analyzed with the goal of learning associations between genes and network topology.

We will develop new methods that combine elements from these two classes. The output of these methods will be regulatory and signaling networks with explicit estimates of both network structure and dynamical properties of these networks.

The Inferelator expands on this earlier work, using an ODE model for regulatory dynamics and L1-shrinkage as a means of selecting parsimonious models. Key developments included: modeling of environment and TF interactions, a model fitting procedure that allows for missing data and irregular sampling intervals without the need to impute missing data, and testing of the model on new time series collected after the publication of the initial model. The kinetic description (an ODE or SDE model of the regulatory network) at the core of the original Inferelator encompasses several essential elements required to describe gene transcription, such as control by specific transcriptional activators (or repressors), activation kinetics, and transcript decay, while at the same time facilitating access to computationally efficient methods for searching among an astronomically large number of possible regulators and regulator combinations.
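The combination of an ODE model with L1-shrinkage can be illustrated with a minimal sketch. It is not the Inferelator code itself. It assumes the simplest linear form of the kinetic model, tau * dy/dt = -y + sum_j beta_j * z_j, where y is the target's expression, z_j are candidate regulators, and tau is a decay time constant. The derivative is approximated with a finite difference over each actual (irregular) sampling interval, and a sparse beta is selected by lasso coordinate descent. All names (`finite_difference_response`, `lasso_coordinate_descent`), the penalty value, and the synthetic data are hypothetical choices for illustration. The toy series is generated with the same discretization so the regression relation holds exactly, and the handling of missing measurements is not shown.

```python
import random

def soft_threshold(x, lam):
    """Soft-thresholding operator used by L1 (lasso) shrinkage."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def finite_difference_response(times, y, tau):
    """Approximate tau * dy/dt + y using each actual (irregular) interval."""
    return [tau * (y[m + 1] - y[m]) / (times[m + 1] - times[m]) + y[m]
            for m in range(len(y) - 1)]

def lasso_coordinate_descent(X, r, lam, n_sweeps=300):
    """Minimize (1/2n)*||r - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(r)  # residual r - X @ beta; beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of regulator j with the partial residual
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta

# --- Synthetic toy data (hypothetical, for illustration only) ---
rng = random.Random(0)
tau = 10.0                                # assumed decay time constant
true_beta = [1.0, 0.0, 0.0, -0.8, 0.0]    # one activator, one repressor
n_points = 100

# Irregularly spaced sampling times
times = [0.0]
for _ in range(n_points - 1):
    times.append(times[-1] + rng.uniform(0.5, 2.0))

# Candidate regulator levels at each time point
Z = [[rng.uniform(-1.0, 1.0) for _ in true_beta] for _ in range(n_points)]

# Target trajectory from a forward step of the assumed ODE on the same grid
y = [0.0]
for m in range(n_points - 1):
    dt = times[m + 1] - times[m]
    drive = sum(b * z for b, z in zip(true_beta, Z[m]))
    y.append(y[m] + (dt / tau) * (drive - y[m]))

response = finite_difference_response(times, y, tau)
beta_hat = lasso_coordinate_descent(Z[:-1], response, lam=0.02)
print([round(b, 3) for b in beta_hat])
```

Larger values of `lam` shrink more coefficients to exactly zero, which is how the penalty trades goodness of fit for a more parsimonious set of regulators.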

An application of the method to Halobacterium resulted in a regulatory network model that could predict mRNA levels of 2,000 of the 2,400 genes in the genome (a network with 1,431 predicted regulatory influences controlling 459 biclusters that represent 85% of the genes in the genome). The network model, trained on the 268 conditions available at the time, was tested using 130 additional new measurements collected after model fitting and, in fact, after the first publication of the Halobacterium ODE network model. The prediction error over the training set was essentially the same as that over the new data set, where the task was to predict genome-wide transcript levels from the levels measured at the prior time point (i.e., to predict the time evolution of the transcriptome). This is encouraging, as the new data included environmental perturbations, new combinations of environmental and genetic perturbations, and time series measurements after novel entrainments of the cell. Parts of the network were validated using ChIP-chip. A similar method has also been applied to learning human regulatory networks underlying TLR-5-mediated stimulation of macrophages, and to learning several other microbial networks. Future plans for applying these algorithms at NYU include application to Arabidopsis, T-cell differentiation, worm early embryogenesis, and Bacillus network evolution.
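The train-versus-new-data comparison described above can be sketched as a one-step-ahead error calculation: each time point is predicted from the measurements at the prior time point, and the error is summarized separately for the training series and the newly collected series. The sketch assumes the linear form tau * dy/dt = -y + sum_j beta_j * z_j and a root-mean-square error; the function names, the fitted parameters, and all numbers are made-up toy values, not results from the Halobacterium study, and the original evaluation may have used a different error measure.

```python
import math

def predict_next(y_prev, z_prev, dt, beta, tau):
    """One forward step of tau * dy/dt = -y + sum_j beta_j * z_j."""
    drive = sum(b * z for b, z in zip(beta, z_prev))
    return y_prev + (dt / tau) * (drive - y_prev)

def one_step_rmse(times, y, Z, beta, tau):
    """RMSE of predicting each y[m+1] from the measurements at time point m."""
    sq_errors = []
    for m in range(len(y) - 1):
        pred = predict_next(y[m], Z[m], times[m + 1] - times[m], beta, tau)
        sq_errors.append((pred - y[m + 1]) ** 2)
    return math.sqrt(sum(sq_errors) / len(sq_errors))

# Hypothetical fitted model (two regulators) and two made-up series
beta, tau = [0.9, -0.5], 8.0

train_times = [0.0, 1.0, 2.5, 3.0, 5.0]
train_Z = [[0.2, 0.1], [0.8, 0.0], [1.0, 0.3], [0.4, 0.9], [0.1, 1.0]]
train_y = [0.00, 0.03, 0.16, 0.20, 0.14]

new_times = [0.0, 2.0, 3.0, 4.5]
new_Z = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.0, 0.2]]
new_y = [0.10, 0.28, 0.27, 0.15]

train_err = one_step_rmse(train_times, train_y, train_Z, beta, tau)
new_err = one_step_rmse(new_times, new_y, new_Z, beta, tau)
print(f"training RMSE = {train_err:.3f}, new-data RMSE = {new_err:.3f}")
```

A new-data error close to the training error, as reported for the Halobacterium model, suggests the fitted network generalizes rather than overfitting the conditions it was trained on.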