The ENCODE-ing of higher order chromatin structure

Ben Moore (@benjaminlmoore)
BSA section meeting, September 16th 2014

Slides online at:


  • Introduce higher order chromatin structure and the Hi-C technique

  • Describe our work integrating Hi-C with new ENCODE datasets

  • Highlight some of the results from the first two years of my PhD

What's higher order chromatin structure?

Looks something like this:

fractal globule
Lieberman-Aiden et al. (2009)

Or a bit more clearly:

higher order chromatin
van Steensel and Dekker (2010)


hi-c technique
Belton et al. (2012)

The results

Statistical corrections

Yaffe biases

Figure from Yaffe and Tanay (2011)

Insights from Hi-C data

Multi-megabase chromosome compartments


Left figure from Lieberman-Aiden et al. (2009)

Sub-megabase topological domains


Figure from Dixon et al. (2012)

Higher-resolution structures called
"topologically associating domains" (TADs)


~800 kb in size (compartments ~5 Mb)

Identified using directional contact bias

Highlighted regions as TAD boundaries
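
As a rough illustration of the directional contact bias idea, the sketch below computes a directionality index (DI) per bin, following the formula described by Dixon et al. (2012): upstream contacts A and downstream contacts B are compared with their mean E. The function name, the `window` parameter and the toy matrix are assumptions made for this example, and the hidden Markov model step that turns DI values into domain calls is not shown.

```python
def directionality_index(matrix, i, window):
    """Directionality index for bin i of a symmetric contact matrix.

    A = contacts between bin i and the `window` bins upstream of it,
    B = contacts between bin i and the `window` bins downstream of it,
    E = (A + B) / 2 is the expectation if there were no bias.
    """
    n = len(matrix)
    a = sum(matrix[i][j] for j in range(max(0, i - window), i))
    b = sum(matrix[i][j] for j in range(i + 1, min(n, i + window + 1)))
    e = (a + b) / 2.0
    if a == b or e == 0:
        return 0.0
    sign = 1.0 if b > a else -1.0
    return sign * ((a - e) ** 2 / e + (b - e) ** 2 / e)


# Toy matrix (made-up counts): two domains of three bins each, with
# strong contacts inside a domain (10) and weak contacts between them (1).
toy = [[10 if (r < 3) == (c < 3) else 1 for c in range(6)] for r in range(6)]
di = [directionality_index(toy, i, window=2) for i in range(6)]

# DI is negative at bin 2 (upstream bias) and positive at bin 3
# (downstream bias); the sign flip marks the boundary between them.
print([round(x, 2) for x in di])
```

Bins near the ends of the matrix have truncated windows, so their DI values reflect edge effects rather than real bias.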

TAD boundaries

dixon_tad bounds

Figure from Dixon et al. (2012)

Our results


  1. Get raw Hi-C reads (three publications for different human cell lines)

  2. Uniformly process interaction matrices and normalise with ICE (Imakaev et al., 2012)

  3. Call compartments, TADs, boundaries — compare these across cell types

  4. Investigate relationship with ENCODE ChIP-seq data: uniformly-processed signal (fold-change relative to input, from Boyle et al., 2014)
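
To give a feel for the normalisation in step 2, here is a minimal sketch of the matrix-balancing idea behind iterative correction: each bin's counts are repeatedly divided by its relative coverage until all rows have equal sums. It is a simplified illustration, not the published implementation of Imakaev et al. (2012). The function name, the fixed iteration count and the toy matrix are assumptions, and low-coverage bin filtering is omitted.

```python
def iterative_correction(matrix, n_iter=50):
    """Balance a symmetric contact matrix so that all row sums are equal.

    Returns the corrected matrix and the accumulated per-bin bias vector.
    """
    n = len(matrix)
    m = [row[:] for row in matrix]          # work on a copy
    bias = [1.0] * n
    for _ in range(n_iter):
        sums = [sum(row) for row in m]
        nonzero = [s for s in sums if s > 0]
        if not nonzero:
            break
        mean = sum(nonzero) / len(nonzero)
        # Relative coverage of each bin; empty bins are left untouched.
        factors = [s / mean if s > 0 else 1.0 for s in sums]
        for i in range(n):
            for j in range(n):
                m[i][j] /= factors[i] * factors[j]
        bias = [b * f for b, f in zip(bias, factors)]
    return m, bias


# Toy data (made up): a balanced matrix [[2,1,1],[1,2,1],[1,1,2]]
# distorted by per-bin biases of 1, 2 and 0.5.
raw = [[2.0, 2.0, 0.5],
       [2.0, 8.0, 1.0],
       [0.5, 1.0, 0.5]]
corrected, bias = iterative_correction(raw)
row_sums = [sum(row) for row in corrected]
print(row_sums)   # all rows now have (near-)equal coverage
```

The recovered bias vector is only defined up to a constant factor, so it is the ratios between bins that are meaningful.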

Higher order structure highly conserved genome-wide

conserved chromatin structure

...and at multiple levels of higher-order chromatin structure.

conserved chromatin structure

Regions in which chromatin structure varies between cell types

variable regions

Overall, flipped regions are enriched for tissue-specific enhancers

ChromHMM + Segway consensus chromatin state predictions from ENCODE

Split into cell-type specific and shared (overlapping annotation in ≥ 2 cell types)

Enrichment for cell-type specific enhancers in flipped open regions in the two lineage-committed cell lines
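
The split into cell-type specific and shared annotations can be sketched as below. This assumes enhancer calls have already been reduced to sets of genomic bin identifiers per cell type, which is a simplification of overlapping real intervals. The function name and the bin numbers are hypothetical.

```python
def split_by_specificity(enhancer_bins):
    """Split enhancer annotations into cell-type specific and shared.

    enhancer_bins maps a cell type to the set of bins annotated as
    enhancer there. A bin is "shared" if annotated in >= 2 cell types,
    otherwise it is specific to the single cell type that carries it.
    """
    counts = {}
    for bins in enhancer_bins.values():
        for b in bins:
            counts[b] = counts.get(b, 0) + 1
    specific = {
        cell: {b for b in bins if counts[b] == 1}
        for cell, bins in enhancer_bins.items()
    }
    shared = {b for b, c in counts.items() if c >= 2}
    return specific, shared


# Toy annotations (made-up bin ids) for the three cell lines.
example = {
    "GM12878": {1, 2, 3},
    "H1": {3, 4},
    "K562": {2, 5},
}
specific, shared = split_by_specificity(example)
print(specific)
print(sorted(shared))
```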


Boundary enrichments at different scales


Boundary profiles have previously been examined for only a handful of features;
we can extend this to many more and test their significance
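
One generic way to test whether a feature is enriched at boundaries is a permutation test: compare the mean signal at boundary bins with the mean at randomly chosen bins. The slides do not say which test was actually used, so the sketch below is only an assumed stand-in. The function name, parameters and toy signal are all hypothetical.

```python
import random


def boundary_enrichment_pvalue(signal, boundary_bins, n_perm=1000, seed=0):
    """One-sided permutation test for signal enrichment at boundaries.

    signal        : per-bin ChIP-seq signal (list of floats)
    boundary_bins : indices of bins that contain a boundary
    Returns the observed mean and an empirical p-value.
    """
    rng = random.Random(seed)
    k = len(boundary_bins)
    observed = sum(signal[i] for i in boundary_bins) / k
    all_bins = range(len(signal))
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(all_bins, k)     # random "boundaries"
        if sum(signal[i] for i in sample) / k >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy signal (made up): flat background with peaks at three boundaries.
signal = [1.0] * 100
boundaries = [20, 50, 80]
for i in boundaries:
    signal[i] = 10.0
observed, p_value = boundary_enrichment_pvalue(signal, boundaries)
print(observed, p_value)
```

Sampling bins uniformly ignores chromosome structure and local signal autocorrelation, so a real analysis would likely need a more careful null model and multiple-testing correction across features.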

CTCF and YY1 consistently mark both TAD and compartment boundaries

all boundary p-vals

Can compartments be predicted from ENCODE ChIP-seq data?

And if so:
  • What are the most informative variables?
  • Do rules differ between cell types?
  • Can we call compartments when Hi-C data isn't available?


Decent modeling accuracy, a variety of informative features


But input features suffer from multicollinearity

cluster heatmap gm12878
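
Multicollinearity of this kind can be checked by computing pairwise Pearson correlations between input features, which is the sort of matrix a cluster heatmap is typically drawn from. The sketch below uses made-up values, and the feature names are only illustrative picks, not the actual model inputs.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(features):
    """Pairwise correlations for a dict of feature name -> per-bin values."""
    names = list(features)
    return {a: {b: pearson(features[a], features[b]) for b in names}
            for a in names}


# Toy per-bin signal (made-up values, illustrative feature names).
features = {
    "H3K27ac": [1.0, 2.0, 3.0, 4.0, 5.0],
    "H3K4me1": [1.1, 2.1, 2.9, 4.2, 4.8],   # tracks H3K27ac closely
    "H3K9me3": [5.0, 3.5, 3.0, 2.0, 1.0],   # anti-correlated
}
corr = correlation_matrix(features)
for name, row in corr.items():
    print(name, {k: round(v, 2) for k, v in row.items()})
```

Strongly correlated inputs like these make individual variable importances harder to interpret, since correlated features can substitute for one another.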

Allowing models to cross-apply between cell types


H1 embryonic stem cells look like an outlier

  …generally open, more permissive genome organisation than the lineage-committed lines.


  • Open / closed compartments well-correlated with combinatorial patterns of histone mods + DNA binding proteins, enabling accurate predictive models

  • Cross-application shows common structural rules across human cell types

  • Still, variable importances reflect known biology (EGR1 in stem cells, H3K9me3 in K562)

  • CTCF and YY1 may combine to create organisational boundaries at multiple scales

  • Cell-type specific enhancers likely responsible for local changes in chromatin architecture

Thanks for your attention

Supervisors: Colin Semple and Stuart Aitken


Belton et al. (2012) Hi-C: a comprehensive technique to capture the conformation of genomes. Methods, 58, 268-76.
Boyle et al. (2014) Comparative analysis of regulatory information and circuits across distant species. Nature, 512, 453-6.
Dixon et al. (2012) Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature, 485, 376-80.
Imakaev et al. (2012) Iterative correction of Hi-C data reveals hallmarks of chromosome organization. Nature Methods, 9, 999-1003.
Kalhor et al. (2011) Genome architectures revealed by tethered chromosome conformation capture and population-based modeling. Nature Biotechnology, 30, 90-8.
Lieberman-Aiden et al. (2009) Comprehensive Mapping of Long-Range Interactions Reveals Folding Principles of the Human Genome. Science, 326, 289-93.
Yaffe and Tanay (2011) Probabilistic modeling of Hi-C contact maps eliminates systematic biases to characterize global chromosomal architecture. Nature Genetics, 43, 1059-65.

HTML5 presentation written in RMarkdown using library("slidify").