


I aim to provide practical CS training for biologists, and hands-on experience with biology data and biology questions for CS majors. Biology is one of the more interdisciplinary fields and therefore presents curriculum challenges that we are a long way from solving. I hope to continue being a part of the ever-necessary renewal of our CS and biology curriculum. My main teaching activities are concentrated in two new courses that I designed and teach, a consistent effort to mentor 1-2 high school students at any given time, team teaching and guest lectures, and the mentorship of PhD and master's students.

My first new solo course at NYU was G23.1127 (cross-listed as G22.2520-001) Bioinformatics and Genomes. This class aimed to do three things:

  1. Provide students with a principled and consistent theoretical treatment of sequence analysis (e.g., alignment, RNA structure prediction, and building phylogenetic trees),
  2. Provide practical experience relevant to their research interests via projects, and
  3. Foster, via the projects, the close partnership of Biology and CS graduate students.
This course was designed from scratch, as I did not find any previous model or syllabus that met these three needs. The course ratings and student feedback have been quite positive, and several of the CS-biology bonds formed within the class were kept up beyond the class, resulting in lasting intellectual partnerships between our graduate students. This year the class has ~30 registered graduate and undergraduate students and ~10 people auditing (a very high level of interest for an advanced-topic graduate course).

My second new solo course stemmed from what I perceived to be a hole in the undergraduate CS curriculum: practical experience programming with large data sets and basic statistics. This class teaches students to program using higher-level numerical and statistics languages (like R, Octave, and NumPy) and relational databases, while also teaching them statistics. Evaluation is based on projects and homework. We use genomics data sets as examples throughout the course, but I encourage students to find problems they are more interested in, and I have had student projects focus on classification of music, integrating statistics into journalism, and social networking.

I have found that the incoming students (typically CS seniors and juniors) lack even the most rudimentary experience with statistics, so my class or something like it is needed. Student feedback has been positive and course ratings have been high, but I feel this area (programming with data) is something that we need to continue to develop at the undergraduate level at NYU, with major modifications to my course and better integration with courses that develop related topics (a data-science track).

After teaching these two classes I still struggle with balancing test- and homework-based evaluation against project-driven evaluation. Homework- and test-based evaluation has the advantage of locally reinforcing the material, helping to pace the students, and identifying students who need more attention. Project-driven evaluation and inquiry-based curricula, in my experience, lead to better long-term learning of skills and large improvements in student communication skills, and they better mirror what we are training these students to do. My undergraduate course is an even mix of these two methods of evaluation; my graduate course is all project driven (since I feel that graduate students should never be made to take quizzes or tests). Although I have taught both of these classes multiple times, they are still works in progress, and I hope to continue tuning them in the years to come.

Both of my classes are described in much greater detail on each course's wiki / webpage:

G23.1127 / G22.2520-001 Bioinformatics and Genomes

(Spring 2007, 2008, 2009, 2011 - Richard Bonneau, sole instructor)

Graduate course, cross-listed with CS and Biology. We were successful in forming balanced groups of CS and Bio students, and two of the class projects produced results that contributed to publications. One nice example was a student in Claude Desplan's lab teaming up with two CS students to make a very nice tool that they then used to explore promoters controlling the circuit surrounding a key step in fly eye development.

The course was well received and I am teaching a slightly revised version of it this semester (SP11). Two years ago 20 students took the class (17 for credit), ~1/2 CS and ~1/2 Bio; enrollment is now at 30+ students for credit and ~10 auditing. This year I am adapting the curriculum to include three new sections on next-generation sequencing analysis (by popular demand).

Course Outline

Course Wiki

V22.0480-003 Special Topics: Computing with Large Data Sets

(Spring 2009, Fall 2011 - Richard Bonneau, sole instructor)

Undergraduate course, cross-listed with CS and Biology. Enormous collections of data are being gathered in multiple fields of science and engineering that fundamentally challenge the way we analyze large data sets, and thereby our current offerings for undergraduates majoring in CS at NYU. For example, the Sloan Digital Sky Survey will represent more than 200 million objects, each with 100 dimensions. Other activities in physics, biology, astronomy, and medicine will soon gather ever-larger sets of data (for example, the planned sequencing of 10,000 human beings). This course discusses some of the associated unprecedented computational challenges, focusing on very large data sets arising in computational biology.

High-level languages for mathematical modeling and statistical analysis are a double-edged sword: use these languages correctly and you’ll be able to prototype methods for data analysis and discovery that amaze your co-workers and can be translated into stand-alone code and Web services; but use these languages incorrectly and you will end up with inefficient code that is impossible for others to understand. The course is intended to address some of the needed general principles by using the R statistical programming language to analyze large genomic data sets, which provide examples of data types and statistical analyses common to several problems. In the first few lectures I describe what these large biology data sets are, where they come from, and what we’ll try to learn from them. We then learn statistics fundamentals and how to program R in ways that are efficient in usage of processor, memory, and programmer time.

Course Outline

Course Wiki

Team Teaching

Team Taught:



My module in BioCore covered genomics technologies, such as microarray manufacture, error/quality control, proteomics, and sequencing.

Outreach and Undergraduate Research

I have been devoted to teaching high school students since graduate school. During my PhD I convinced my program to allow me to teach at an area high school (NOVA alternative high school, Seattle) instead of being a teaching assistant. As a Sr. Scientist at the ISB in Seattle I participated in high school outreach as part of the ISB Center for Inquiry Science.

My main outreach activity has been mentoring high school students over the summer. We have had one high school student in the lab each year; Kevin returned for a second year and was able to help Devorah in her first summer in the lab. Three of these interns, including the two high school students, made significant contributions and are included on one or more papers. Both of these students were great and went on to MIT and Princeton, respectively.

Guest Lectures

I have given guest lectures in the following courses:


Tutorials (NYU medical school PhD students): These are one-on-one tutorials, given by professors with experience in the area of interest, that serve the purpose of an alternate proposal. Students: Keren Klein (reviewing papers on protein design), Ben Bartelle (reviewing synthetic biology).