Filter putative enhancers by chromatin accessibility evidence
Source: R/putative_enhancers.R
filter_accessible_regions.Rd
Multi-mode accessibility filter for putative enhancers. Supports four modes: three complementary evidence types that can be used independently, plus a combined mode that takes their union.
Usage
filter_accessible_regions(
  putative_enhancers,
  track_connection = NULL,
  mode = "signal",
  scope = "filter_distal",
  signal_threshold = 2,
  bed_path = NULL,
  gene_list = NULL,
  promoter_distance = 2000L
)
Arguments
- putative_enhancers
data.frame. Output from find_putative_enhancers. Must contain columns chr, start, end. For genelist mode, also requires a SYMBOL column.
- track_connection
data.frame or NULL. BigWig track connection sheet. Required for mode = "signal".
- mode
Character. Filtering mode: "signal" (default), "bed", "genelist", or "combined".
- scope
Character. Filtering scope: "filter_distal" (default) or "filter_all".
- signal_threshold
Numeric. Z-score threshold for signal mode (default 2).
- bed_path
Character or NULL. Path to a BED file for bed mode.
- gene_list
Character vector or NULL. Expressed gene symbols for genelist mode.
- promoter_distance
Integer. Distance from TSS to classify as promoter-proximal (default 2000 bp). Only used when scope = "filter_distal".
Value
The input data.frame with an atac_accessible logical column.
For signal mode, also includes per-sample signal and accessibility
columns.
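Because the result is the input data.frame plus a logical column, ordinary base R subsetting is enough to work with it. The sketch below uses a mock data.frame standing in for the returned object; the values are made up for illustration and are not package output.

```r
# Mock stand-in for the returned data.frame (hypothetical values).
res <- data.frame(
  chr   = c("chr1", "chr1", "chr2"),
  start = c(100L, 500L, 100L),
  end   = c(200L, 600L, 200L),
  atac_accessible = c(TRUE, FALSE, TRUE)
)

# Keep only the regions flagged as accessible.
accessible <- res[res$atac_accessible, , drop = FALSE]

# Summarise how many regions pass, as a count and a percentage.
n_pass   <- sum(res$atac_accessible)
pct_pass <- round(100 * mean(res$atac_accessible), 1)
```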
Modes
- signal
Import ATAC-seq/DNase-seq BigWig signal over each region. Regions with mean signal above a z-score threshold are flagged accessible. Requires track_connection with ATAC/DNase tracks.
- bed
Overlap with an external accessibility BED file (e.g., ENCODE peaks, DHS hotspots). Any region overlapping a BED entry is flagged. Requires bed_path.
- genelist
Retain enhancers linked to expressed genes. Regions whose SYMBOL column matches a gene in gene_list are flagged. Requires gene_list.
- combined
Union of all available evidence. A region is retained if ANY mode flags it as accessible.
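The logic of the four modes can be illustrated in base R on toy data. This is a minimal sketch, not the package's implementation: it assumes z-scores are computed across regions within one sample, that BED overlap means any shared position on the same chromosome under half-open coordinates, and it uses a toy threshold of 1, since with only four values no z-score can reach the documented default of 2. The names mean_signal, bed, and expressed are hypothetical.

```r
# Toy regions; chr/start/end/SYMBOL follow the documented input columns.
regions <- data.frame(
  chr    = c("chr1", "chr1", "chr2", "chr2"),
  start  = c(100L, 500L, 100L, 900L),
  end    = c(200L, 600L, 200L, 1000L),
  SYMBOL = c("GENE1", "GENE2", "GENE3", "GENE4"),
  stringsAsFactors = FALSE
)

# signal: hypothetical mean signal per region, z-scored across regions
# (assumed normalisation; the package may compute z-scores differently).
mean_signal <- c(1, 1, 1, 10)
z <- (mean_signal - mean(mean_signal)) / sd(mean_signal)
by_signal <- z > 1  # toy threshold, see the note above

# bed: flag a region if any BED entry on the same chromosome overlaps it.
bed <- data.frame(chr = "chr1", start = 150L, end = 250L,
                  stringsAsFactors = FALSE)
by_bed <- vapply(seq_len(nrow(regions)), function(i) {
  any(bed$chr == regions$chr[i] &
        bed$start < regions$end[i] &
        bed$end > regions$start[i])
}, logical(1))

# genelist: flag regions whose SYMBOL is in the expressed gene list.
expressed <- c("GENE2")
by_genes <- regions$SYMBOL %in% expressed

# combined: a region passes if ANY evidence type flags it.
combined <- by_signal | by_bed | by_genes
```

In this toy case each evidence type flags a different region, so the union keeps three of the four regions.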
Scope
The scope parameter controls which regions are evaluated:
- filter_distal
Only distal (non-promoter) enhancers are filtered; promoter-proximal regions are always retained. (Default)
- filter_all
All regions are subject to filtering, including promoter-proximal ones.
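The difference between the two scopes can be sketched as a small helper. This is an illustrative assumption about the behaviour described above, not the package's code: apply_scope and dist_to_tss are hypothetical names, and treating a distance exactly equal to promoter_distance as promoter-proximal is a guess.

```r
# Hypothetical helper: decide which regions to retain under each scope.
# `accessible` is the per-region evidence flag; `dist_to_tss` is an assumed
# per-region distance to the nearest TSS in bp.
apply_scope <- function(accessible, dist_to_tss,
                        scope = c("filter_distal", "filter_all"),
                        promoter_distance = 2000L) {
  scope <- match.arg(scope)
  if (scope == "filter_all") {
    # Every region must pass the accessibility evidence.
    return(accessible)
  }
  # filter_distal: promoter-proximal regions are retained regardless.
  proximal <- abs(dist_to_tss) <= promoter_distance
  accessible | proximal
}

accessible  <- c(TRUE, FALSE, FALSE)
dist_to_tss <- c(50000, 500, 10000)

keep_distal <- apply_scope(accessible, dist_to_tss, scope = "filter_distal")
keep_all    <- apply_scope(accessible, dist_to_tss, scope = "filter_all")
```

Here the second region lacks accessibility evidence but lies within 2000 bp of a TSS, so it is kept under filter_distal and dropped under filter_all.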
Examples
db <- make_example_database()
pe <- find_putative_enhancers(db)
#> Auto-computing chromatin states from all histone marks...
#> Histone-based enhancers: 7 regions (from 7 classified regions)
#> States: active=4, poised=1, primed=2
#> TF binding source: 4 regions from 2 TFs (TF1, TF2)
#> Union: 7 non-overlapping putative enhancer regions
#> Putative enhancers: 7 total
#> Active: 0 | Poised: 0 | Unmarked: 7 | Repressed: 0
#> H2A.Z+: 0 | TF-bound: 4 | High co-binding (>=3 TFs): 0
#> Sources: histone=3, histone,tf=4
## Gene-list mode: attach synthetic SYMBOL column, then filter
pe$SYMBOL <- paste0("GENE", seq_len(nrow(pe)))
pe_genes <- filter_accessible_regions(
  pe, mode = "genelist", gene_list = c("GENE1", "GENE2")
)
#> Gene list: 2 / 7 regions linked to expressed genes (28.6%)
#> Accessibility filter (mode=genelist, scope=filter_distal): 2 / 7 regions pass (28.6%)
sum(pe_genes$atac_accessible)
#> [1] 2