Identify putative enhancer regions using rule-based histone logic
Source: R/putative_enhancers.R
find_putative_enhancers.Rd
Automatically scans all histone and histone variant marks in the epiRomics database and applies ChromHMM-based classification rules to identify putative enhancer regions.
Usage
find_putative_enhancers(
database,
chromatin_states = NULL,
hic_contacts = NULL,
enhancer_states = base::c("active", "poised", "primed")
)
Arguments
- database
An epiRomics S4 database object.
- chromatin_states
data.frame or NULL. Pre-computed output from classify_chromatin_states. If NULL (default), computed automatically from all available histone marks.
- hic_contacts
data.frame or NULL. Hi-C contacts in BEDPE format. Anchors are added as putative enhancers and classified using available histone data.
- enhancer_states
Character vector. Chromatin states to include as putative enhancers. Default includes all enhancer-related states.
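As an illustration of the hic_contacts argument, the sketch below builds a small BEDPE-style data.frame and lists the anchors it contains. The column names (chrom1, start1, end1, chrom2, start2, end2) follow the usual BEDPE convention and are an assumption here; the names find_putative_enhancers() actually expects are not stated on this page, and the coordinates are invented.

```r
# Hypothetical Hi-C contacts in BEDPE layout. Column names follow the
# standard BEDPE convention (an assumption, not confirmed by this page).
hic <- data.frame(
  chrom1 = c("chr1", "chr1"),
  start1 = c(1000, 20000),
  end1   = c(2000, 21000),
  chrom2 = c("chr1", "chr1"),
  start2 = c(50000, 30000),
  end2   = c(51000, 31000),
  stringsAsFactors = FALSE
)

# Each contact has two anchors; stack both ends into one table and
# drop duplicates so every anchor appears once.
anchors <- unique(rbind(
  data.frame(chr = hic$chrom1, start = hic$start1, end = hic$end1,
             stringsAsFactors = FALSE),
  data.frame(chr = hic$chrom2, start = hic$start2, end = hic$end2,
             stringsAsFactors = FALSE)
))
anchors <- anchors[order(anchors$chr, anchors$start), ]
print(anchors)
```

A table like hic would then be passed as hic_contacts, provided its columns match what the function expects.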
Value
A data.frame with columns:
- putative_id
Integer. Unique enhancer index (sorted by TF co-binding then histone marks).
- chr
Character. Chromosome.
- start
Integer. Start position.
- end
Integer. End position.
- width
Integer. Region width.
- source
Character. Origin: "histone", "hic", "tf", or comma-separated if multiple sources contribute.
- chromatin_state
Character. Broad state category: Active, Poised, Repressed, or Unmarked.
- chromatin_state_detail
Character. Specific state from classify_chromatin_states (e.g. active_enhancer, poised_enhancer, primed_enhancer).
- histone_marks
Character. Comma-separated histone marks overlapping this region.
- n_histone_marks
Integer. Number of histone marks.
- h2az
Logical. Whether H2A.Z overlaps this region.
- tf_names
Character. Comma-separated TF names with peaks overlapping this region (H2A.Z excluded from TF count).
- n_tfs
Integer. Number of TFs with binding peaks.
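The columns above lend themselves to plain data.frame filtering. The following is a minimal sketch on a hand-built stand-in for the return value: the rows are invented for illustration and only a subset of the documented columns is included.

```r
# Stand-in for the returned data.frame (values are made up; the column
# names are taken from the Value section above).
pe <- data.frame(
  putative_id     = 1:4,
  chr             = "chr1",
  start           = c(1000, 5000, 10000, 20000),
  end             = c(2000, 6000, 11000, 21000),
  source          = c("histone,tf", "histone", "tf", "hic"),
  chromatin_state = c("Active", "Poised", "Unmarked", "Unmarked"),
  h2az            = c(TRUE, FALSE, FALSE, TRUE),
  n_tfs           = c(3L, 0L, 1L, 0L),
  stringsAsFactors = FALSE
)

# Active regions bound by at least one TF.
active_bound <- pe[pe$chromatin_state == "Active" & pe$n_tfs >= 1L, ]

# source may be comma-separated, so split it before testing membership.
has_hic <- vapply(
  strsplit(pe$source, ",", fixed = TRUE),
  function(s) "hic" %in% trimws(s),
  logical(1)
)
hic_regions <- pe[has_hic, ]

# Tally regions by broad chromatin state.
state_counts <- table(pe$chromatin_state)
print(state_counts)
```

Splitting source before matching avoids partial-string pitfalls that a bare substring search could run into if further source labels were ever added.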
Details
This function uses a multi-source approach:
- Chromatin states
Leverages classify_chromatin_states to classify all genomic regions covered by histone marks. Regions classified as enhancer-related states are included.
- Hi-C contacts
If provided, Hi-C contact anchors are added as putative enhancers. Anchors are classified using the available histone data at each anchor; anchors with no histone coverage are labeled "Unmarked".
- TF binding
Regions bound by at least one TF (type = "chip") are included as putative enhancers. TF binding alone yields "Unmarked" chromatin state.
- H2A.Z regions
H2A.Z-positive regions that were classified as "unmarked" by chromatin states are recovered as putative enhancers, since H2A.Z is enriched at regulatory elements (Giaimo et al. 2019; Lai & Pugh 2017). H2A.Z alone is insufficient for specific chromatin state assignment.
Unlike earlier versions that required the user to specify exactly two histone marks, this function automatically uses ALL histone marks in the database and applies the full set of classification rules.
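To make the multi-source idea concrete, here is a rough base-R sketch of combining regions from several sources into a non-overlapping set while recording which sources contributed. It is not the package's implementation: merge_regions is a hypothetical helper, the input rows are invented, and treating regions that share a boundary coordinate as overlapping is an assumption.

```r
# Hypothetical helper: merge overlapping regions per chromosome and keep
# track of the sources that contributed to each merged region.
merge_regions <- function(regions) {
  regions <- regions[order(regions$chr, regions$start), ]
  out <- list()
  for (i in seq_len(nrow(regions))) {
    r <- regions[i, ]
    last <- if (length(out) > 0L) out[[length(out)]] else NULL
    if (!is.null(last) && last$chr == r$chr && r$start <= last$end) {
      # Overlaps the previous merged region: extend it and add the source.
      last$end <- max(last$end, r$end)
      last$sources <- union(last$sources, r$source)
      out[[length(out)]] <- last
    } else {
      # No overlap: start a new merged region.
      out[[length(out) + 1L]] <- list(
        chr = r$chr, start = r$start, end = r$end, sources = r$source
      )
    }
  }
  data.frame(
    chr    = vapply(out, function(x) x$chr, character(1)),
    start  = vapply(out, function(x) x$start, numeric(1)),
    end    = vapply(out, function(x) x$end, numeric(1)),
    source = vapply(out, function(x) paste(sort(x$sources), collapse = ","),
                    character(1)),
    stringsAsFactors = FALSE
  )
}

# Invented input: one histone-derived and one TF-derived region overlap.
regions <- data.frame(
  chr    = c("chr1", "chr1", "chr1", "chr2"),
  start  = c(1000, 1500, 5000, 1000),
  end    = c(2000, 2500, 6000, 2000),
  source = c("histone", "tf", "histone", "hic"),
  stringsAsFactors = FALSE
)
merged <- merge_regions(regions)
print(merged)
```

In this toy input the first two chr1 regions collapse into one region labeled with both sources, which mirrors the comma-separated source values described in the Value section.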
Examples
db <- make_example_database()
pe <- find_putative_enhancers(db)
#> Auto-computing chromatin states from all histone marks...
#> Histone-based enhancers: 7 regions (from 7 classified regions)
#> States: active=4, poised=1, primed=2
#> TF binding source: 4 regions from 2 TFs (TF1, TF2)
#> Union: 7 non-overlapping putative enhancer regions
#> Putative enhancers: 7 total
#> Active: 0 | Poised: 0 | Unmarked: 7 | Repressed: 0
#> H2A.Z+: 0 | TF-bound: 4 | High co-binding (>=3 TFs): 0
#> Sources: histone=3, histone,tf=4
head(pe[, c("chr", "start", "end", "chromatin_state")])
#> chr start end chromatin_state
#> 1 chr1 1000 2000 Unmarked
#> 2 chr1 50000 51000 Unmarked
#> 3 chr1 5000 6000 Unmarked
#> 4 chr1 20000 21000 Unmarked
#> 5 chr1 10000 11000 Unmarked
#> 6 chr1 30000 31000 Unmarked