Identify putative enhancer regions using rule-based histone logic

Automatically scans all histone and histone variant marks in the epiRomics database and applies ChromHMM-based classification rules to identify putative enhancer regions.

Usage

find_putative_enhancers(
  database,
  chromatin_states = NULL,
  hic_contacts = NULL,
  enhancer_states = base::c("active", "poised", "primed")
)

Arguments

database: An epiRomics S4 database object.
chromatin_states: data.frame or NULL. Pre-computed output from classify_chromatin_states. If NULL (default), chromatin states are computed automatically from all available histone marks.
hic_contacts: data.frame or NULL. Hi-C contacts in BEDPE format. If provided, the contact anchors are added as putative enhancers and classified using the histone data available at each anchor.
enhancer_states: Character vector. Chromatin states to include as putative enhancers. The default, c("active", "poised", "primed"), includes all enhancer-related states.
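
As an illustration of the hic_contacts input, the sketch below builds a small BEDPE-style data.frame in base R and stacks both anchors of each contact into one table of candidate regions. The column names (chrom1, start1, end1, chrom2, start2, end2) follow the common BEDPE convention and are an assumption; the column names that find_putative_enhancers actually expects are not stated on this page, and the coordinates are invented.

```r
# Hypothetical Hi-C contacts in BEDPE layout: one row per contact,
# two anchors per row. Column names are assumed, not confirmed by the package.
hic <- data.frame(
  chrom1 = c("chr1", "chr1"),
  start1 = c(1000L, 20000L),
  end1   = c(2000L, 21000L),
  chrom2 = c("chr1", "chr1"),
  start2 = c(50000L, 30000L),
  end2   = c(51000L, 31000L),
  stringsAsFactors = FALSE
)

# Stack both anchors of every contact into one de-duplicated table of regions.
anchors <- unique(rbind(
  data.frame(chr = hic$chrom1, start = hic$start1, end = hic$end1,
             stringsAsFactors = FALSE),
  data.frame(chr = hic$chrom2, start = hic$start2, end = hic$end2,
             stringsAsFactors = FALSE)
))
```

A table like hic would then be passed as the hic_contacts argument, provided its columns match what the package expects.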

Value

A data.frame with columns:

putative_id: Integer. Unique enhancer index, assigned after sorting regions by TF co-binding and then by histone marks.
chr: Character. Chromosome.
start: Integer. Start position.
end: Integer. End position.
width: Integer. Region width.
source: Character. Origin: "histone", "hic", "tf", or comma-separated if multiple sources contribute.
chromatin_state: Character. Broad state category: Active, Poised, Repressed, or Unmarked.
chromatin_state_detail: Character. Specific state from classify_chromatin_states (e.g. active_enhancer, poised_enhancer, primed_enhancer).
histone_marks: Character. Comma-separated histone marks overlapping this region.
n_histone_marks: Integer. Number of histone marks.
h2az: Logical. Whether H2A.Z overlaps this region.
tf_names: Character. Comma-separated names of TFs with peaks overlapping this region. H2A.Z is reported in the h2az column and is excluded from the TF count.
n_tfs: Integer. Number of TFs with binding peaks.
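
Because the return value is a plain data.frame, it can be filtered with ordinary base R subsetting. The sketch below uses a mock table carrying a subset of the columns listed above; all values are invented for illustration and do not come from the package.

```r
# Mock result with a subset of the documented columns (values are invented).
pe <- data.frame(
  putative_id     = 1:4,
  chr             = "chr1",
  start           = c(1000L, 5000L, 10000L, 20000L),
  end             = c(2000L, 6000L, 11000L, 21000L),
  source          = c("histone,tf", "histone", "hic", "histone,tf"),
  chromatin_state = c("Active", "Poised", "Unmarked", "Active"),
  h2az            = c(TRUE, FALSE, TRUE, FALSE),
  n_tfs           = c(3L, 0L, 0L, 1L),
  stringsAsFactors = FALSE
)

# Active regions bound by at least one TF.
active_bound <- pe[pe$chromatin_state == "Active" & pe$n_tfs >= 1L, ]

# Regions to which the histone source contributed. `source` may be
# comma-separated, so split it instead of testing for equality.
has_histone <- vapply(
  strsplit(pe$source, ",", fixed = TRUE),
  function(parts) "histone" %in% trimws(parts),
  logical(1)
)
histone_regions <- pe[has_histone, ]
```

Splitting source on commas avoids missing regions where several sources contribute, which a test such as source == "histone" would skip.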

Details

This function uses a multi-source approach:

Chromatin states: Leverages classify_chromatin_states to classify all genomic regions covered by histone marks. Regions classified as enhancer-related states are included.
Hi-C contacts: If provided, Hi-C contact anchors are added as putative enhancers. Anchors are classified using the available histone data at each anchor; anchors with no histone coverage are labeled "Unmarked".
TF binding: Regions bound by at least one TF (type = "chip") are included as putative enhancers. TF binding alone yields "Unmarked" chromatin state.
H2A.Z regions: H2A.Z-positive regions that were classified as "unmarked" by chromatin states are recovered as putative enhancers, since H2A.Z is enriched at regulatory elements (Giaimo et al. 2019; Lai & Pugh 2017). H2A.Z alone is insufficient for specific chromatin state assignment.

Unlike earlier versions that required the user to specify exactly two histone marks, this function automatically uses ALL histone marks in the database and applies the full set of classification rules.
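
The way regions from several sources can collapse into one non-overlapping set with a combined source label can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: merge_regions is a hypothetical helper, the coordinates are invented, and the package may use a different merging strategy.

```r
# Hypothetical candidate regions from three sources (coordinates invented).
regions <- data.frame(
  chr    = "chr1",
  start  = c(1000L, 1500L, 5000L, 9000L),
  end    = c(2000L, 2500L, 6000L, 9500L),
  source = c("histone", "tf", "histone", "hic"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: merge overlapping regions and record every source
# that contributed to each merged region.
merge_regions <- function(df) {
  if (nrow(df) == 0L) return(df)
  df <- df[order(df$chr, df$start, df$end), ]
  # A new group starts on a chromosome change, or when a region begins
  # after the furthest end seen so far in the current group.
  group <- integer(nrow(df))
  g <- 0L
  cur_chr <- NA_character_
  cur_end <- -Inf
  for (i in seq_len(nrow(df))) {
    if (is.na(cur_chr) || df$chr[i] != cur_chr || df$start[i] > cur_end) {
      g <- g + 1L
      cur_chr <- df$chr[i]
      cur_end <- df$end[i]
    } else {
      cur_end <- max(cur_end, df$end[i])
    }
    group[i] <- g
  }
  # Collapse each group to one row spanning all of its members.
  pieces <- lapply(split(df, group), function(x) {
    data.frame(
      chr    = x$chr[1],
      start  = min(x$start),
      end    = max(x$end),
      source = paste(sort(unique(x$source)), collapse = ","),
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

merged <- merge_regions(regions)
```

Here the overlapping histone and TF regions merge into a single region labeled "histone,tf", while the other two regions keep their single-source labels.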

Examples

db <- make_example_database()
pe <- find_putative_enhancers(db)
#> Auto-computing chromatin states from all histone marks...
#> Histone-based enhancers: 7 regions (from 7 classified regions)
#>   States: active=4, poised=1, primed=2
#> TF binding source: 4 regions from 2 TFs (TF1, TF2)
#> Union: 7 non-overlapping putative enhancer regions
#> Putative enhancers: 7 total
#>   Active: 0 | Poised: 0 | Unmarked: 7 | Repressed: 0
#>   H2A.Z+: 0 | TF-bound: 4 | High co-binding (>=3 TFs): 0
#>   Sources: histone=3, histone,tf=4
head(pe[, c("chr", "start", "end", "chromatin_state")])
#>    chr start   end chromatin_state
#> 1 chr1  1000  2000        Unmarked
#> 2 chr1 50000 51000        Unmarked
#> 3 chr1  5000  6000        Unmarked
#> 4 chr1 20000 21000        Unmarked
#> 5 chr1 10000 11000        Unmarked
#> 6 chr1 30000 31000        Unmarked