The ENCODE-ing of higher order chromatin organisation

Ben Moore (@benjaminlmoore)
Research talk, April 22nd 2015

Slides online at: blm.io/talks/restalk

Introduction

chromatin
Risca and Greenleaf (2015)

Higher order chromatin organisation:

fractal globule
Lieberman-Aiden et al. (2009)

As a section:

higher order chromatin
Steensel and Dekker (2010)

The Hi-C method

hi-c technique
Belton et al. (2012)

Results

Statistical corrections

Yaffe biases

Figure from Yaffe and Tanay (2011)

Insights into higher order organisation from Hi-C data

Multi-megabase chromosome compartments


leeb

Left figure from Lieberman-Aiden et al. (2009)

Sub-megabase topological domains

tads

Figure from Dixon et al. (2012)

Higher-resolution structures called "topologically associating domains"

TADs


~800 kb in size (compartments ~5 Mb)

Identified using directional contact bias

Highlighted regions as TAD boundaries
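
The directional contact bias can be made concrete with a small sketch. The version below computes a per-bin directionality index following the formula described by Dixon et al. (2012). The function name, the toy matrix and the window size are illustrative assumptions, and the hidden Markov model that Dixon et al. apply to the index to call domains is left out.

```python
def directionality_index(matrix, window):
    """Per-bin directionality index (formula as in Dixon et al., 2012).

    matrix: square contact matrix as a list of lists of non-negative counts
    window: number of bins to look upstream and downstream (assumed value)
    """
    n = len(matrix)
    di = []
    for i in range(n):
        # A: contacts with the `window` bins upstream of bin i
        a = sum(matrix[i][max(0, i - window):i])
        # B: contacts with the `window` bins downstream of bin i
        b = sum(matrix[i][i + 1:i + 1 + window])
        # E: expected count in each direction if there were no bias
        e = (a + b) / 2
        if a == b:
            di.append(0.0)  # no bias (also covers the all-zero case)
            continue
        sign = 1.0 if b > a else -1.0
        di.append(sign * ((a - e) ** 2 / e + (b - e) ** 2 / e))
    return di


# Toy matrix (made-up counts): two 3-bin domains, dense inside, sparse between
toy = [[10 if (i < 3) == (j < 3) else 1 for j in range(6)] for i in range(6)]
di = directionality_index(toy, window=2)
```

In this toy matrix the index switches from negative at bin 2 (upstream bias) to positive at bin 3 (downstream bias), which is the signature expected at a boundary between the two blocks.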

TAD boundary enrichments

dixon_tad bounds

Figure from Dixon et al. (2012)

Our results

Plan


  1. Get raw Hi-C reads (three publications for different human cell lines)

  2. Uniformly process interaction matrices and normalise with ICE (Imakaev et al., 2012)

  3. Call compartments, TADs, boundaries — compare these across cell types

  4. Investigate relationship with ENCODE ChIP-seq data: uniformly-processed signal (fold-change relative to input, from Boyle et al., 2014)
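
The normalisation in step 2 can be illustrated with a simplified sketch of iterative correction in the spirit of ICE (Imakaev et al., 2012): each bin is repeatedly divided by its relative coverage until all rows have equal sums. This is not the pipeline used for the talk; the function name, iteration count and toy data are assumptions, and any filtering of bins that a real pipeline would perform is omitted.

```python
def iterative_correction(matrix, n_iter=50):
    """Balance a symmetric contact matrix so that all row sums are equal.

    Returns the corrected matrix and the per-bin bias vector, such that
    raw[i][j] is approximately bias[i] * bias[j] * corrected[i][j].
    """
    n = len(matrix)
    m = [row[:] for row in matrix]  # work on a copy
    bias = [1.0] * n
    for _ in range(n_iter):
        sums = [sum(row) for row in m]
        nonzero = [s for s in sums if s > 0]
        if not nonzero:
            break  # empty matrix, nothing to balance
        mean = sum(nonzero) / len(nonzero)
        # Relative coverage of each bin; empty bins are left untouched
        scale = [s / mean if s > 0 else 1.0 for s in sums]
        for i in range(n):
            for j in range(n):
                m[i][j] /= scale[i] * scale[j]
        bias = [b * s for b, s in zip(bias, scale)]
    return m, bias


# Toy data (made up): a balanced matrix distorted by known biases 1, 2, 4
true_bias = [1.0, 2.0, 4.0]
balanced = [[2, 1, 1], [1, 2, 1], [1, 1, 2]]
raw = [[true_bias[i] * true_bias[j] * balanced[i][j] for j in range(3)]
       for i in range(3)]
corrected, est_bias = iterative_correction(raw)
row_sums = [sum(row) for row in corrected]
```

On this toy input the corrected row sums come out equal and the estimated biases recover the 1 : 2 : 4 ratio up to a constant factor.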


Higher order structure highly conserved genome-wide

conserved chromatin structure

...and at multiple levels of higher-order chromatin structure.


conserved chromatin structure

But interesting biology in variable regions

flipped regions

Overall, flipped regions are enriched for tissue-specific enhancers

ChromHMM + SegWay consensus chromatin state predictions from ENCODE

Split into cell-type specific and shared (overlapping annotation in ≥ 2 cell types)


Enrichment for cell-type specific enhancers in flipped open regions in the two lineage-committed cell lines


enhancers

Boundary enrichments at different scales

top6


Boundary profiles have previously been examined for a handful of features;
we can do this for many more...

all TAD features

CTCF and YY1 consistently mark both TAD and compartment boundaries

all boundary p-vals

Can compartments be predicted from ENCODE ChIP-seq data?

And if so:
  • What are the most informative variables?
  • Do rules differ between cell types?
  • Can we call compartments when Hi-C data isn't available?


res

Comparing Random Forest with other approaches


comparison with lm and pls

Decent modelling accuracy, with a variety of informative features

res

But input features suffer from multicollinearity

cluster heatmap gm12878
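
One simple way to see multicollinearity among input features is to list the feature pairs whose pairwise Pearson correlation is high. The sketch below is only an illustration: the feature names, values and the 0.8 threshold are invented, and it is not the clustering used to draw the heatmap.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def collinear_pairs(features, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    pairs = []
    for (name_a, a), (name_b, b) in combinations(features.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            pairs.append((name_a, name_b, round(r, 3)))
    return pairs


# Hypothetical per-bin signal for three features (values are made up)
features = {
    "feature_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "feature_b": [1.1, 2.0, 2.9, 4.2, 5.1],
    "feature_c": [5.0, 1.0, 4.0, 2.0, 3.0],
}
pairs = collinear_pairs(features)
```

Here only feature_a and feature_b are flagged, since they carry nearly the same signal.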

Allowing models to cross-apply between cell types

cross-application

H1 embryonic stem cells look like an outlier

  …generally open, more permissive genome organisation than the lineage-committed lines.

TADs or sub-compartments?

gm tad heatmap

Summary


  • Open / closed compartments well-correlated with combinatorial patterns of histone modifications + DNA-binding proteins, enabling accurate predictive models

  • Cross-application shows common structural rules across human cell types

  • Still, variable importances reflect known biology (EGR1 in stem cells, H3K9me3 in K562)

  • CTCF and YY1 may combine to create organisational boundaries at multiple scales

  • Increased amounts of cell-type specific enhancers correlate with significant changes in chromatin architecture

Thanks for your attention


paper header


References:

Belton et al. (2012) Hi-C: a comprehensive technique to capture the conformation of genomes. Methods, 58, 268-76.
Boyle et al. (2014) Comparative analysis of regulatory information and circuits across distant species. Nature, 512, 453-6.
Dixon et al. (2012) Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature, 485, 376-80.
Imakaev et al. (2012) Iterative correction of Hi-C data reveals hallmarks of chromosome organization. Nature Methods, 9, 999-1003.
Kalhor et al. (2011) Genome architectures revealed by tethered chromosome conformation capture and population-based modeling. Nature Biotechnology, 30, 90-8.
Lieberman-Aiden et al. (2009) Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326, 289-93.
Risca and Greenleaf (2015) Unraveling the 3D genome: genomics tools for multiscale exploration. Trends in Genetics, epub ahead of print.
Yaffe and Tanay (2011) Probabilistic modeling of Hi-C contact maps eliminates systematic biases to characterize global chromosomal architecture. Nature Genetics, 43, 1059-65.

Code to reproduce all analyses is available at: github.com/blmoore/3dgenome.


Meeting 5pm today at 7 George Square!