Functional Genomics with Deep Sequencing
The rapid adoption of next-generation sequencing for the study of protein-DNA interactions and transcriptomes is revolutionizing functional genomics. This tutorial introduces the main technologies and protocols, surveys the currently available analysis tools, and outlines some of the main challenges of analysing and integrating ChIP-seq and RNA-seq datasets.
Ali Mortazavi and colleagues recently published a review of ChIP-seq and RNA-seq software packages (Pepke et al., 2009). Our tutorial will build on this review and incorporate the continuing advances in RNA-seq analysis from each of our groups and others, particularly in the analysis of transcript-level expression, and will further expand on the integrative analysis of ChIP-seq and RNA-seq data.
- Introduction to next-generation sequencing technologies and counting assays. We will start with a general description of why biologists want quantitative assays to understand the systems they study, and motivate the measurement of protein-DNA interactions and transcriptomes. We will introduce the various technology platforms, their respective strengths and error profiles, and how raw reads are processed through the vendor pipelines, with a particular emphasis on how these affect the downstream analyses.
- ChIP-seq. We will first discuss the basics of a ChIP-seq experiment and the three signal classes observed:
- punctate binding (typical of transcription factors)
- medium binding (typical of histone marks and polymerase near transcribed genes)
- large-domain binding (typical of repressive marks).
We will then discuss the published algorithms for calling ChIP-seq enriched regions and their suitability for each of these signal classes. We will conclude with criteria for evaluating the quality of ChIP-seq datasets, as well as downstream uses of the enriched regions in motif and gene enrichment analyses.
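As a rough illustration of what calling enriched regions involves, the following is a minimal sketch of a fixed-window fold-enrichment test against a control sample. It is not any of the published algorithms; the function names, window size, fold threshold, and pseudocount are all assumptions chosen for illustration, and a fixed narrow window of this kind would mainly suit punctate signal rather than large domains.

```python
from collections import Counter


def window_counts(read_starts, window_size):
    """Count reads per fixed-width window, keyed by window index."""
    return Counter(pos // window_size for pos in read_starts)


def enriched_windows(chip_starts, control_starts, window_size=200,
                     min_fold=4.0, pseudocount=1.0):
    """Return (start, end, fold) for windows enriched in ChIP over control.

    All parameters are illustrative assumptions, not values taken from
    any published peak caller.
    """
    chip = window_counts(chip_starts, window_size)
    control = window_counts(control_starts, window_size)
    # Scale control counts to the ChIP library size so depths are comparable.
    scale = len(chip_starts) / max(len(control_starts), 1)
    hits = []
    for idx in sorted(chip):
        # The pseudocount avoids division by zero in windows with no control reads.
        fold = (chip[idx] + pseudocount) / (control[idx] * scale + pseudocount)
        if fold >= min_fold:
            hits.append((idx * window_size, (idx + 1) * window_size, fold))
    return hits


# Toy read start positions on a single chromosome (made-up values).
chip_reads = [1010, 1020, 1050, 1080, 1100, 1150, 1190, 5000, 9000]
control_reads = [100, 1100, 3000, 5000, 7000, 7500, 8000, 8500, 9000]
hits = enriched_windows(chip_reads, control_reads)
```

Real region callers typically add a statistical model of the background and handle strand shift between forward and reverse reads, neither of which this sketch attempts.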
- RNA-seq. After a quick introduction to the variants of RNA-seq, we will focus on the use of RNA-seq for discovery and quantitation of messenger RNA using the latest protocol enhancements, such as:
- longer reads
- strand-preserving protocols
- paired-end reads.
We will emphasize the additional challenges posed by spliced read mapping and introduce the various published strategies including annotation-assisted, ab initio (with a reference genome but not annotations), and de novo transcript assembly. We will discuss moving from gene-level quantification to transcript-level quantification. We will finish by detailing some additional uses of RNA-seq reads for detection of haplotype-specific expression, RNA-editing, and RNA-protein interactions (RIP-seq).
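To make gene-level quantification concrete, here is a minimal sketch of one common length- and depth-normalized measure, reads per kilobase of transcript per million mapped reads (RPKM). The abstract does not name a specific measure, so the choice of RPKM, the function names, and the toy counts are all assumptions. The sketch also assumes every mapped read has already been assigned to exactly one gene.

```python
def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million).
    return read_count * 1e9 / (length_bp * total_mapped_reads)


def gene_level_rpkm(counts, lengths_bp):
    """Compute RPKM per gene from read counts and gene lengths.

    Assumes the total mapped reads equals the sum of the per-gene counts,
    which is a simplification made for this example.
    """
    total = sum(counts.values())
    return {gene: rpkm(n, lengths_bp[gene], total) for gene, n in counts.items()}


# Hypothetical genes and counts, for illustration only.
counts = {"geneA": 500, "geneB": 1500}
lengths_bp = {"geneA": 1000, "geneB": 3000}
values = gene_level_rpkm(counts, lengths_bp)
```

In this toy input geneB has three times the reads of geneA but is also three times as long, so both receive the same value. Transcript-level quantification is harder because reads shared between isoforms must be apportioned among them, which this sketch does not attempt.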
- Integration. We will conclude the tutorial with approaches for jointly analysing multiple ChIP-seq and RNA-seq datasets, from time courses or across different cell types, using machine learning methods such as hidden Markov models for the integrative analysis of transcriptional regulation.
Pepke, S., Wold, B., Mortazavi, A. (2009) Computation for ChIP-seq and RNA-seq studies, Nature Methods, http://www.nature.com/nmeth/journal/v6/n11s/full/nmeth.1371.html.