Classify genomic regions by histone chromatin state with genomic context
Source: R/chromatin_states.R
classify_chromatin_states.Rd
Given histone mark combinations in the epiRomics database, classifies
regions based on curated chromatin state definitions (ChromHMM/Roadmap
Epigenomics conventions). States are refined by TSS proximity so that
"promoter" labels are assigned only to regions near transcription start
sites (within tss_window bp). Regions with promoter-associated
marks (H3K4me3) that fall outside TSS windows are reclassified as
enhancers (e.g., "active_enhancer" instead of "active_promoter").
Usage
classify_chromatin_states(
database,
histone_marks = NULL,
regions = NULL,
refine_by_tss = TRUE,
tss_window = 2000L
)
Arguments
- database
epiRomics class database containing all data initially loaded
- histone_marks
character vector of histone mark names to use for classification. Must match names in meta(database). If NULL, auto-detects from meta.
- regions
GRanges object of regions to classify. If NULL, uses all annotations in the database.
- refine_by_tss
logical. If TRUE (default), promoter states are assigned only to regions within tss_window of an annotated TSS. Regions with promoter marks (H3K4me3) outside TSS windows are reclassified as enhancers.
- tss_window
integer. Distance in bp around each TSS to define the promoter zone (default: 2000L). Regions within +/- tss_window of any annotated TSS are considered "promoter" context.
Value
data.frame with columns: seqnames, start, end, chromatin_state, genomic_context ("promoter"/"gene_body"/"intergenic"), marks_present (comma-separated), n_marks, is_hotspot
Details
Chromatin state definitions (6 simplified labels, priority order):
repressed: H3K27me3 + H3K9me3, or H3K9me3 alone, or H3K27me3 alone (Polycomb/heterochromatin)
bivalent: H3K4me3 + H3K27me3 (poised for activation)
active: H3K4me3 + H3K27ac; H3K4me1 + H3K27ac; H3K4me1 + H3K27ac + H3K36me3; H3K36me3 alone; H2A.Z + H3K27ac
poised: H3K4me1 + H3K27me3; H2A.Z + H3K27me3
primed: H3K4me1 only
unmarked: no marks, H2A.Z alone, or unclassifiable
Genomic context (promoter/gene_body/intergenic) is reported separately so users can infer regulatory identity from position.
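To make the priority order concrete, here is a minimal base-R sketch of a first-match rule table. It is not the package's implementation: the helper names (rule, rules, classify_marks) are hypothetical, and the matching semantics are an assumption (combination rules match when all listed marks are present; "alone"/"only" rules require exactly that mark).

```r
# Illustrative only: hypothetical helpers, not part of epiRomics.
# Assumption: combination rules match when every listed mark is present;
# rules flagged exact = TRUE encode the "alone"/"only" definitions.
rule <- function(state, marks, exact = FALSE) {
  list(state = state, marks = marks, exact = exact)
}

# Rules in the priority order listed above; the first match wins.
rules <- list(
  rule("repressed", c("H3K27me3", "H3K9me3")),
  rule("repressed", "H3K9me3", exact = TRUE),
  rule("repressed", "H3K27me3", exact = TRUE),
  rule("bivalent", c("H3K4me3", "H3K27me3")),
  rule("active", c("H3K4me3", "H3K27ac")),
  rule("active", c("H3K4me1", "H3K27ac")),  # also covers + H3K36me3
  rule("active", "H3K36me3", exact = TRUE),
  rule("active", c("H2A.Z", "H3K27ac")),
  rule("poised", c("H3K4me1", "H3K27me3")),
  rule("poised", c("H2A.Z", "H3K27me3")),
  rule("primed", "H3K4me1", exact = TRUE)
)

classify_marks <- function(marks) {
  for (r in rules) {
    hit <- if (r$exact) setequal(marks, r$marks) else all(r$marks %in% marks)
    if (hit) return(r$state)
  }
  "unmarked"  # no marks, H2A.Z alone, or unclassifiable
}

classify_marks(c("H3K4me3", "H3K27me3"))
classify_marks("H3K4me1")
```

Under these assumptions, a region carrying H3K4me3, H3K27ac and H3K27me3 would be labelled bivalent rather than active, because the bivalent rule is checked first.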
TSS Refinement
H3K4me3 is a promoter-associated mark that peaks at TSS regions
(Santos-Rosa et al. 2002, Bernstein et al. 2005). When H3K4me3-containing
states are observed outside TSS windows, they likely represent either:
(a) strong/broad enhancers that recruit H3K4me3 (Pekowska et al. 2011),
or (b) unannotated alternative promoters. By default, these are
reclassified as enhancer states. Set refine_by_tss = FALSE to
disable this behavior.
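The window test behind this refinement can be sketched in base R as follows. This is a rough illustration, not the package's code: near_tss is a hypothetical helper, the coordinates are made up, everything is assumed to lie on a single chromosome, and "within tss_window" is interpreted here as any overlap with the +/- window interval.

```r
# Illustrative only: hypothetical helper, not part of epiRomics.
# TRUE when [start, end] overlaps the +/- window interval around any TSS.
near_tss <- function(start, end, tss, window = 2000L) {
  vapply(seq_along(start), function(i) {
    any(start[i] <= tss + window & end[i] >= tss - window)
  }, logical(1))
}

# Made-up regions and TSS positions on one chromosome.
regions <- data.frame(
  start = c(9500L, 50000L),
  end = c(10500L, 51000L),
  has_H3K4me3 = c(TRUE, TRUE)
)
tss <- c(10000L, 80000L)

regions$near_tss <- near_tss(regions$start, regions$end, tss)

# H3K4me3 regions outside every TSS window are the ones that
# refinement would relabel as enhancers.
regions$would_reclassify <- regions$has_H3K4me3 & !regions$near_tss
regions
```

With real data, an overlap query from a genomic-ranges package would normally replace the explicit loop; the loop is used here only to keep the sketch dependency-free.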
Genomic Context
A genomic_context column is added to the output indicating whether
each region is at a "promoter" (within tss_window of a TSS),
"gene_body" (overlapping a gene but not near TSS), or "intergenic"
(not overlapping any annotated gene).
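The three-way assignment described above can be sketched as a simple precedence check: promoter first, then gene body, then intergenic. The helper genomic_context below is hypothetical and works on made-up single-chromosome coordinates; how the package computes overlaps internally is not shown in this page.

```r
# Illustrative only: hypothetical helper, not part of epiRomics.
# Precedence: promoter (near a TSS) > gene_body (overlaps a gene) > intergenic.
genomic_context <- function(start, end, tss, gene_start, gene_end,
                            window = 2000L) {
  vapply(seq_along(start), function(i) {
    if (any(start[i] <= tss + window & end[i] >= tss - window)) {
      "promoter"
    } else if (any(start[i] <= gene_end & end[i] >= gene_start)) {
      "gene_body"
    } else {
      "intergenic"
    }
  }, character(1))
}

# One made-up gene spanning 10,000-30,000 with its TSS at 10,000.
ctx <- genomic_context(
  start = c(9000L, 20000L, 60000L),
  end = c(9500L, 20500L, 60500L),
  tss = 10000L,
  gene_start = 10000L,
  gene_end = 30000L
)
ctx
```

Checking the promoter condition first means a region that is both near a TSS and inside a gene is reported as "promoter", matching the "overlapping a gene but not near TSS" wording for "gene_body".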
References
Ernst J, Kellis M (2012) Nature Methods 9(3):215-216. "ChromHMM: automating chromatin-state discovery and characterization."
Kundaje A et al. (2015) Nature 518(7539):317-330. "Integrative analysis of 111 reference human epigenomes."
Creyghton MP et al. (2010) PNAS 107(50):21931-21936. "Histone H3K27ac separates active from poised enhancers."
Rada-Iglesias A et al. (2011) Nature 470(7333):279-283. "A unique chromatin signature uncovers early developmental enhancers in humans."
Santos-Rosa H et al. (2002) Nature 419(6905):407-411. "Active genes are tri-methylated at K4 of histone H3." H3K4me3 TSS specificity.
Pekowska A et al. (2011) EMBO J 30(20):4198-4210. H3K4me3 at strong enhancers.
Examples
db <- make_example_database()
states <- classify_chromatin_states(db)
table(states$chromatin_state)
#>
#>    active    poised    primed repressed  unmarked
#>         4         1         2         1         2