
Automatically scans all histone and histone variant marks in the epiRomics database and applies ChromHMM-based classification rules to identify putative enhancer regions.

Usage

find_putative_enhancers(
  database,
  chromatin_states = NULL,
  hic_contacts = NULL,
  enhancer_states = base::c("active", "poised", "primed")
)

Arguments

database

An epiRomics S4 database object.

chromatin_states

data.frame or NULL. Pre-computed output from classify_chromatin_states. If NULL (the default), states are computed automatically from all available histone marks.

hic_contacts

data.frame or NULL. Hi-C contacts in BEDPE format. Anchors are added as putative enhancers and classified using available histone data.

enhancer_states

Character vector. Chromatin states to include as putative enhancers. The default includes all three enhancer-related states: "active", "poised", and "primed".
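To illustrate how enhancer_states narrows the selection, here is a minimal base-R sketch. It assumes that detailed state labels follow the "<state>_enhancer" pattern seen in chromatin_state_detail (active_enhancer, poised_enhancer, primed_enhancer); the helper select_enhancer_states(), the state column, and the "active_promoter" label are hypothetical and not part of the epiRomics API.

```r
# Hypothetical classified regions; column names and the non-enhancer label are illustrative.
classified <- data.frame(
  chr   = c("chr1", "chr1", "chr1", "chr1"),
  start = c(1000L, 5000L, 10000L, 20000L),
  end   = c(2000L, 6000L, 11000L, 21000L),
  state = c("active_enhancer", "poised_enhancer", "active_promoter", "primed_enhancer"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep regions whose detailed state is "<state>_enhancer"
# for one of the requested broad states.
select_enhancer_states <- function(regions,
                                   enhancer_states = c("active", "poised", "primed")) {
  wanted <- paste0(enhancer_states, "_enhancer")
  regions[regions$state %in% wanted, , drop = FALSE]
}

active_only <- select_enhancer_states(classified, "active")  # 1 region
all_enh <- select_enhancer_states(classified)                # 3 regions
```

Restricting enhancer_states to "active" drops the poised and primed regions, while the default keeps all three enhancer classes and excludes any non-enhancer state.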

Value

A data.frame with columns:

putative_id

Integer. Unique enhancer index (sorted by TF co-binding then histone marks).

chr

Character. Chromosome.

start

Integer. Start position.

end

Integer. End position.

width

Integer. Region width.

source

Character. Origin: "histone", "hic", "tf", or comma-separated if multiple sources contribute.

chromatin_state

Character. Broad state category: Active, Poised, Repressed, or Unmarked.

chromatin_state_detail

Character. Specific state from classify_chromatin_states (e.g. active_enhancer, poised_enhancer, primed_enhancer).

histone_marks

Character. Comma-separated histone marks overlapping this region.

n_histone_marks

Integer. Number of histone marks.

h2az

Logical. Whether H2A.Z overlaps this region.

tf_names

Character. Comma-separated names of TFs with peaks overlapping this region. H2A.Z is not counted as a TF.

n_tfs

Integer. Number of TFs with binding peaks.
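The relationship between the h2az, tf_names, n_tfs, histone_marks, n_histone_marks, and putative_id columns can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the per-region overlap lists are hypothetical inputs, the H2A.Z spellings are assumed, and the ordering direction (most co-bound TFs first, ties broken by more histone marks) is an assumption, since the documentation only says the index is sorted by TF co-binding and then histone marks.

```r
# Hypothetical per-region overlaps: ChIP peak sets (TFs plus H2A.Z) and histone marks.
overlaps <- list(c("TF1", "H2A.Z"), character(0), c("TF1", "TF2", "TF3"), c("TF2"))
marks <- list(c("H3K4me1", "H3K27ac"), c("H3K4me1"), c("H3K4me1"),
              c("H3K4me1", "H3K27ac"))

regions <- data.frame(
  chr   = "chr1",
  start = c(1000L, 5000L, 10000L, 20000L),
  end   = c(2000L, 6000L, 11000L, 21000L)
)

h2az_names <- c("H2A.Z", "H2AZ")  # assumed spellings of the H2A.Z mark

# H2A.Z is recorded as a flag and removed before counting TFs.
regions$h2az <- vapply(overlaps, function(x) any(x %in% h2az_names), logical(1))
tfs <- lapply(overlaps, function(x) setdiff(x, h2az_names))
regions$tf_names <- vapply(tfs, paste, character(1), collapse = ",")
regions$n_tfs <- lengths(tfs)
regions$histone_marks <- vapply(marks, paste, character(1), collapse = ",")
regions$n_histone_marks <- lengths(marks)

# Assumed ordering: more co-bound TFs first, then more histone marks (order() is stable).
regions <- regions[order(-regions$n_tfs, -regions$n_histone_marks), ]
regions$putative_id <- seq_len(nrow(regions))
rownames(regions) <- NULL
```

Because H2A.Z is removed before counting, the first region ends up with n_tfs = 1 and h2az = TRUE rather than two TFs.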

Details

This function combines putative enhancers from four sources:

Chromatin states

Uses classify_chromatin_states to classify all genomic regions covered by histone marks. Regions assigned one of the states listed in enhancer_states are included.

Hi-C contacts

If provided, Hi-C contact anchors are added as putative enhancers. Anchors are classified using the available histone data at each anchor; anchors with no histone coverage are labeled "Unmarked".

TF binding

Regions bound by at least one TF (database entries with type = "chip") are included as putative enhancers. A region supported by TF binding alone, with no overlapping histone mark, is assigned the "Unmarked" chromatin state.

H2A.Z regions

H2A.Z-positive regions that were classified as "unmarked" by chromatin states are recovered as putative enhancers, since H2A.Z is enriched at regulatory elements (Giaimo et al. 2019; Lai & Pugh 2017). H2A.Z alone is not sufficient to assign a specific chromatin state.
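The sources above are pooled into non-overlapping regions (the example output reports a "Union" step), and the source column records every contributing source. The base-R sketch below shows one way such a union could work, including splitting each BEDPE contact into its two anchors. It is a sketch under assumptions, not the epiRomics code: union_sources() is a hypothetical helper, the BEDPE column names (chr1, start1, end1, chr2, start2, end2) follow the common BEDPE convention, overlapping or touching intervals are merged, and R >= 4.0 is assumed so that strings are not converted to factors.

```r
# Hypothetical inputs in a simple chr/start/end layout.
histone <- data.frame(chr = "chr1", start = c(1000L, 5000L), end = c(2000L, 6000L))
tf <- data.frame(chr = "chr1", start = c(1500L, 30000L), end = c(2500L, 31000L))
bedpe <- data.frame(chr1 = "chr1", start1 = 5500L, end1 = 7000L,
                    chr2 = "chr1", start2 = 50000L, end2 = 51000L)

# Each Hi-C contact contributes two anchors.
hic <- rbind(
  data.frame(chr = bedpe$chr1, start = bedpe$start1, end = bedpe$end1),
  data.frame(chr = bedpe$chr2, start = bedpe$start2, end = bedpe$end2)
)

# Hypothetical helper: merge overlapping or touching intervals across named
# sources and record every contributing source, comma-separated.
union_sources <- function(sets) {
  pooled <- do.call(rbind, lapply(names(sets), function(s) transform(sets[[s]], source = s)))
  pooled <- pooled[order(pooled$chr, pooled$start), ]
  out <- list()
  for (i in seq_len(nrow(pooled))) {
    r <- pooled[i, ]
    n <- length(out)
    if (n > 0 && out[[n]]$chr == r$chr && r$start <= out[[n]]$end) {
      out[[n]]$end <- max(out[[n]]$end, r$end)          # extend the current region
      out[[n]]$source <- union(out[[n]]$source, r$source)
    } else {
      out[[n + 1]] <- list(chr = r$chr, start = r$start, end = r$end, source = r$source)
    }
  }
  data.frame(
    chr    = vapply(out, `[[`, character(1), "chr"),
    start  = vapply(out, `[[`, integer(1), "start"),
    end    = vapply(out, `[[`, integer(1), "end"),
    source = vapply(out, function(x) paste(sort(x$source), collapse = ","), character(1))
  )
}

merged <- union_sources(list(histone = histone, tf = tf, hic = hic))
```

Here the histone region at 1000-2000 and the TF peak at 1500-2500 collapse into one region labeled "histone,tf", while the distal Hi-C anchor at 50000-51000 stays a separate region with source "hic". Under the rules described above, that anchor would be labeled "Unmarked" because no histone data covers it.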

Unlike earlier versions, which required the user to specify exactly two histone marks, this function automatically uses all histone marks in the database and applies the full set of classification rules.
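Classifying from the full set of marks at a region, rather than from a fixed pair, can be pictured with a small rule function that receives all marks present. The rules below are simplified, textbook-style enhancer definitions (H3K4me1 with H3K27ac for active, H3K4me1 with H3K27me3 for poised, H3K4me1 alone for primed). They are an assumption for illustration only and are not the rule set used by classify_chromatin_states; classify_enhancer() is a hypothetical helper.

```r
# Simplified, textbook-style rules (NOT epiRomics' actual rule set).
# `marks` is the full character vector of histone marks overlapping one region.
classify_enhancer <- function(marks) {
  has <- function(m) m %in% marks
  if (!has("H3K4me1")) return("other")           # no enhancer-associated mark
  if (has("H3K27ac")) return("active_enhancer")  # H3K4me1 + H3K27ac
  if (has("H3K27me3")) return("poised_enhancer") # H3K4me1 + H3K27me3
  "primed_enhancer"                              # H3K4me1 alone
}

# Extra marks (e.g. H3K36me3, H3K4me3) are simply carried along with the rest.
region_marks <- list(
  c("H3K4me1", "H3K27ac", "H3K36me3"),
  c("H3K4me1", "H3K27me3"),
  "H3K4me1",
  c("H3K27ac", "H3K4me3")
)
states <- vapply(region_marks, classify_enhancer, character(1))
```

Because each region passes its complete mark set to the rules, adding a new mark to the database only requires new rules, not a different pair of user-selected marks.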

Examples

db <- make_example_database()
pe <- find_putative_enhancers(db)
#> Auto-computing chromatin states from all histone marks...
#> Histone-based enhancers: 7 regions (from 7 classified regions)
#>   States: active=4, poised=1, primed=2
#> TF binding source: 4 regions from 2 TFs (TF1, TF2)
#> Union: 7 non-overlapping putative enhancer regions
#> Putative enhancers: 7 total
#>   Active: 0 | Poised: 0 | Unmarked: 7 | Repressed: 0
#>   H2A.Z+: 0 | TF-bound: 4 | High co-binding (>=3 TFs): 0
#>   Sources: histone=3, histone,tf=4
head(pe[, c("chr", "start", "end", "chromatin_state")])
#>    chr start   end chromatin_state
#> 1 chr1  1000  2000        Unmarked
#> 2 chr1 50000 51000        Unmarked
#> 3 chr1  5000  6000        Unmarked
#> 4 chr1 20000 21000        Unmarked
#> 5 chr1 10000 11000        Unmarked
#> 6 chr1 30000 31000        Unmarked