Classifies regions by the histone mark combinations recorded in the epiRomics database, using curated chromatin state definitions that follow ChromHMM/Roadmap Epigenomics conventions. States are refined by TSS proximity, so "promoter" labels are assigned only to regions within tss_window bp of a transcription start site. Regions that carry the promoter-associated mark H3K4me3 but fall outside TSS windows are reclassified as enhancers (e.g., "active_enhancer" instead of "active_promoter").

Usage

classify_chromatin_states(
  database,
  histone_marks = NULL,
  regions = NULL,
  refine_by_tss = TRUE,
  tss_window = 2000L
)

Arguments

database

An epiRomics-class database object containing all data loaded initially.

histone_marks

character vector of histone mark names to use for classification. Names must match those in meta(database). If NULL (default), marks are auto-detected from meta(database).

regions

GRanges object of regions to classify. If NULL, uses all annotations in the database.

refine_by_tss

logical. If TRUE (default), promoter states are assigned only to regions within tss_window of an annotated TSS. Regions with promoter marks (H3K4me3) outside TSS windows are reclassified as enhancers.

tss_window

integer. Distance in bp around each TSS to define the promoter zone (default: 2000L). Regions within +/- tss_window of any annotated TSS are considered "promoter" context.

Value

data.frame with columns: seqnames, start, end, chromatin_state, genomic_context ("promoter"/"gene_body"/"intergenic"), marks_present (comma-separated), n_marks, is_hotspot
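As a rough illustration of working with a result of this shape, the sketch below builds a small hand-made data.frame with the documented columns and summarises it in base R. All values are invented, and the logical type of is_hotspot and the exact separator in marks_present (a comma with no space) are assumptions, not confirmed by this page.

```r
# Hand-made stand-in for the returned data.frame (all values are invented).
states <- data.frame(
  seqnames        = c("chr1", "chr1", "chr2"),
  start           = c(1000L, 50000L, 7000L),
  end             = c(1500L, 50800L, 7600L),
  chromatin_state = c("active", "primed", "active"),
  genomic_context = c("promoter", "intergenic", "gene_body"),
  marks_present   = c("H3K4me3,H3K27ac", "H3K4me1", "H3K4me1,H3K27ac"),
  n_marks         = c(2L, 1L, 2L),
  is_hotspot      = c(FALSE, FALSE, TRUE),  # assumed to be logical
  stringsAsFactors = FALSE
)

# Cross-tabulate state against genomic context.
cross_tab <- table(states$chromatin_state, states$genomic_context)

# Split the comma-separated marks back into one character vector per region.
mark_list <- strsplit(states$marks_present, ",", fixed = TRUE)

# Active regions away from promoters, i.e. candidate distal elements.
distal_active <- subset(
  states,
  chromatin_state == "active" & genomic_context != "promoter"
)
```

Keeping state and context in separate columns makes this kind of filtering a one-line subset.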

Details

Chromatin state definitions (6 simplified labels, priority order):

  • repressed: H3K27me3 + H3K9me3, or H3K9me3 alone, or H3K27me3 alone (Polycomb/heterochromatin)

  • bivalent: H3K4me3 + H3K27me3 (poised for activation)

  • active: H3K4me3 + H3K27ac; H3K4me1 + H3K27ac; H3K4me1 + H3K27ac + H3K36me3; H3K36me3 alone; H2A.Z + H3K27ac

  • poised: H3K4me1 + H3K27me3; H2A.Z + H3K27me3

  • primed: H3K4me1 only

  • unmarked: no marks, H2A.Z alone, or unclassifiable
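The list above can be read as a priority-ordered lookup. The following base R sketch is one possible reading, not the package's implementation: it assumes each listed combination is an exact set of marks, so anything not listed falls through to "unmarked". The names definitions, combo_key, and classify_marks are hypothetical helpers introduced for illustration.

```r
# Mark combinations copied from the list above, in priority order.
definitions <- list(
  repressed = list(c("H3K27me3", "H3K9me3"), "H3K9me3", "H3K27me3"),
  bivalent  = list(c("H3K4me3", "H3K27me3")),
  active    = list(c("H3K4me3", "H3K27ac"),
                   c("H3K4me1", "H3K27ac"),
                   c("H3K4me1", "H3K27ac", "H3K36me3"),
                   "H3K36me3",
                   c("H2A.Z", "H3K27ac")),
  poised    = list(c("H3K4me1", "H3K27me3"), c("H2A.Z", "H3K27me3")),
  primed    = list("H3K4me1")
)

# Canonical key for a set of marks: unique, locale-independent sort, joined.
combo_key <- function(marks) {
  paste(sort(unique(marks), method = "radix"), collapse = "+")
}

# Return the first state whose listed combinations contain the exact set
# of observed marks; everything else is "unmarked" (assumed exact matching).
classify_marks <- function(marks) {
  key <- combo_key(marks)
  for (state in names(definitions)) {
    keys <- vapply(definitions[[state]], combo_key, character(1))
    if (key %in% keys) return(state)
  }
  "unmarked"
}

classify_marks(c("H3K27ac", "H3K4me3"))   # "active"
classify_marks(c("H3K4me3", "H3K27me3"))  # "bivalent"
classify_marks("H2A.Z")                   # "unmarked"
```

With exact-set matching the priority order never has to break a tie; it would matter only if the real rules matched subsets of the observed marks.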

Genomic context (promoter/gene_body/intergenic) is reported separately so users can infer regulatory identity from position.

TSS Refinement

H3K4me3 is a promoter-associated mark that peaks at TSSs (Santos-Rosa et al. 2002, Bernstein et al. 2005). H3K4me3-containing states observed outside TSS windows likely represent either (a) strong or broad enhancers that recruit H3K4me3 (Pekowska et al. 2011) or (b) unannotated alternative promoters. By default, these regions are reclassified as enhancer states. Set refine_by_tss = FALSE to disable this behavior.
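A minimal sketch of the relabeling idea, assuming state labels carry a "_promoter" suffix as in the "active_promoter" / "active_enhancer" example in the description. The function refine_label and its near_tss argument are hypothetical and are not part of the package API; how the package actually stores and rewrites labels is not shown on this page.

```r
# Swap a "_promoter" suffix for "_enhancer" when a region is not near a TSS.
# Labels without that suffix, and regions near a TSS, are left unchanged.
refine_label <- function(state, near_tss) {
  ifelse(
    !near_tss & grepl("_promoter$", state),
    sub("_promoter$", "_enhancer", state),
    state
  )
}

refined <- refine_label(
  state    = c("active_promoter", "active_promoter", "primed"),
  near_tss = c(TRUE, FALSE, FALSE)
)
# refined is c("active_promoter", "active_enhancer", "primed")
```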

Genomic Context

A genomic_context column is added to the output indicating whether each region is at a "promoter" (within tss_window of a TSS), "gene_body" (overlapping a gene but not near TSS), or "intergenic" (not overlapping any annotated gene).
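The three contexts can be sketched in base R for a single chromosome. This is an illustration under stated assumptions rather than the package's code: the gene table is invented, assign_context is a hypothetical helper, and a region counts as "promoter" when it overlaps the +/- tss_window interval at all (the package may instead require containment or use a different overlap rule).

```r
# Invented annotation for one chromosome: gene span plus TSS position.
genes <- data.frame(
  start = c(10000, 50000),
  end   = c(20000, 60000),
  tss   = c(10000, 60000)   # second gene is on the minus strand
)

# Promoter takes precedence over gene_body; everything else is intergenic.
assign_context <- function(region_start, region_end, genes,
                           tss_window = 2000L) {
  near_tss <- any(region_start <= genes$tss + tss_window &
                  region_end   >= genes$tss - tss_window)
  if (near_tss) return("promoter")

  in_gene <- any(region_start <= genes$end & region_end >= genes$start)
  if (in_gene) "gene_body" else "intergenic"
}

assign_context(9000, 9500, genes)    # "promoter"
assign_context(14000, 15000, genes)  # "gene_body"
assign_context(30000, 31000, genes)  # "intergenic"
```

Checking the promoter window first matters because a region just downstream of a TSS also overlaps the gene body.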

References

  • Ernst J, Kellis M (2012) Nature Methods 9(3):215-216. "ChromHMM: automating chromatin-state discovery and characterization."

  • Kundaje A et al. (2015) Nature 518(7539):317-330. "Integrative analysis of 111 reference human epigenomes."

  • Creyghton MP et al. (2010) PNAS 107(50):21931-21936. "Histone H3K27ac separates active from poised enhancers."

  • Rada-Iglesias A et al. (2011) Nature 470(7333):279-283. "A unique chromatin signature uncovers early developmental enhancers in humans."

  • Santos-Rosa H et al. (2002) Nature 419(6905):407-411. "Active genes are tri-methylated at K4 of histone H3." (H3K4me3 TSS specificity.)

  • Pekowska A et al. (2011) EMBO J 30(20):4198-4210. (H3K4me3 at strong enhancers.)

Examples

db <- make_example_database()
states <- classify_chromatin_states(db)
table(states$chromatin_state)
#> 
#>    active    poised    primed repressed  unmarked 
#>         4         1         2         1         2