Multi-mode accessibility filter for putative enhancers. Offers four modes: three complementary evidence types (BigWig signal, BED overlap, expressed-gene list) that can be used independently, plus a combined mode that takes their union.

Usage

filter_accessible_regions(
  putative_enhancers,
  track_connection = NULL,
  mode = "signal",
  scope = "filter_distal",
  signal_threshold = 2,
  bed_path = NULL,
  gene_list = NULL,
  promoter_distance = 2000L
)

Arguments

putative_enhancers

data.frame. Output from find_putative_enhancers. Must contain columns chr, start, end. For genelist mode, also requires a SYMBOL column.

track_connection

data.frame or NULL. BigWig track connection sheet. Required for mode = "signal".

mode

Character. Filtering mode: "signal" (default), "bed", "genelist", or "combined".

scope

Character. Filtering scope: "filter_distal" (default) or "filter_all".

signal_threshold

Numeric. Z-score threshold for signal mode (default 2).

bed_path

Character or NULL. Path to an accessibility BED file. Required for mode = "bed".

gene_list

Character vector or NULL. Symbols of expressed genes. Required for mode = "genelist".

promoter_distance

Integer. Maximum distance from the TSS, in bp, at which a region is classified as promoter-proximal (default 2000). Only used when scope = "filter_distal".

Value

The input data.frame with an atac_accessible logical column. For signal mode, also includes per-sample signal and accessibility columns.

Modes

signal

Imports ATAC-seq/DNase-seq BigWig signal over each region. Regions whose mean signal exceeds the z-score threshold (signal_threshold) are flagged accessible. Requires track_connection with ATAC/DNase tracks.

bed

Overlaps regions with an external accessibility BED file (e.g., ENCODE peaks, DHS hotspots). Any region overlapping a BED entry is flagged accessible. Requires bed_path.

genelist

Retains enhancers linked to expressed genes. Regions whose SYMBOL value matches a gene in gene_list are flagged accessible. Requires gene_list.

combined

Union of all available evidence. A region is retained if ANY mode flags it as accessible.
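
The logic of the modes above can be sketched in base R on a toy table. This is only an illustration and not the package's implementation. The mean_signal column, the peaks table and the overlaps_any() helper are hypothetical, and computing the z-score across regions is an assumption, since the page does not say how it is computed.

```r
# Toy regions standing in for putative_enhancers (all values are made up).
regions <- data.frame(
  chr         = c("chr1", "chr1", "chr1", "chr2", "chr2", "chr2"),
  start       = c(100, 500, 900, 100, 500, 900),
  end         = c(200, 600, 1000, 200, 600, 1000),
  SYMBOL      = paste0("GENE", 1:6),
  mean_signal = c(1, 1, 1, 1, 1, 10),  # hypothetical per-region mean signal
  stringsAsFactors = FALSE
)

# signal: z-score the mean signal across regions (assumed), then threshold.
signal_threshold <- 2
z <- (regions$mean_signal - mean(regions$mean_signal)) / sd(regions$mean_signal)
by_signal <- z > signal_threshold

# bed: flag any region that overlaps at least one peak on the same chromosome.
# Intervals are treated as closed here for simplicity; real BED files are
# 0-based and half-open.
peaks <- data.frame(chr = "chr1", start = 150, end = 550,
                    stringsAsFactors = FALSE)
overlaps_any <- function(regions, peaks) {
  vapply(seq_len(nrow(regions)), function(i) {
    any(peaks$chr == regions$chr[i] &
          peaks$start <= regions$end[i] &
          peaks$end >= regions$start[i])
  }, logical(1))
}
by_bed <- overlaps_any(regions, peaks)

# genelist: flag regions whose SYMBOL is in the expressed-gene list.
gene_list <- c("GENE4")
by_gene <- regions$SYMBOL %in% gene_list

# combined: a region passes if ANY evidence type flags it.
combined <- by_signal | by_bed | by_gene
regions$atac_accessible <- combined
```

With real data, the overlap step would more typically be done with a genomic-interval package than with a row-wise loop; the loop is used here only to keep the sketch dependency-free.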

Scope

The scope parameter controls which regions are evaluated:

filter_distal

Only distal (non-promoter) enhancers are filtered; promoter-proximal regions are always retained. (Default)

filter_all

All regions are subject to filtering, including promoter-proximal ones.
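
The difference between the two scopes can be illustrated with a small sketch. It assumes a hypothetical dist_to_tss column and a hypothetical apply_scope() helper, neither of which is documented here, and it treats a region exactly promoter_distance away from the TSS as proximal, which is also an assumption.

```r
# Toy table: distance to the nearest TSS plus an accessibility call that
# would come from one of the modes (all values are made up).
regions <- data.frame(
  dist_to_tss = c(500L, 1500L, 5000L, 20000L),  # hypothetical column
  accessible  = c(FALSE, TRUE, FALSE, TRUE)
)

# Hypothetical helper: returns TRUE for the regions that would be retained.
apply_scope <- function(regions, scope, promoter_distance = 2000L) {
  scope <- match.arg(scope, c("filter_distal", "filter_all"))
  proximal <- abs(regions$dist_to_tss) <= promoter_distance
  if (scope == "filter_distal") {
    # Promoter-proximal regions bypass the filter; distal ones must pass it.
    proximal | regions$accessible
  } else {
    # Every region must pass the accessibility filter.
    regions$accessible
  }
}

keep_distal <- apply_scope(regions, "filter_distal")
keep_all    <- apply_scope(regions, "filter_all")
```

In this toy table the first region (500 bp from the TSS, not accessible) is kept under filter_distal but dropped under filter_all, which is the only case where the two scopes differ.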

Examples

db <- make_example_database()
pe <- find_putative_enhancers(db)
#> Auto-computing chromatin states from all histone marks...
#> Histone-based enhancers: 7 regions (from 7 classified regions)
#>   States: active=4, poised=1, primed=2
#> TF binding source: 4 regions from 2 TFs (TF1, TF2)
#> Union: 7 non-overlapping putative enhancer regions
#> Putative enhancers: 7 total
#>   Active: 0 | Poised: 0 | Unmarked: 7 | Repressed: 0
#>   H2A.Z+: 0 | TF-bound: 4 | High co-binding (>=3 TFs): 0
#>   Sources: histone=3, histone,tf=4
## Gene-list mode: attach synthetic SYMBOL column, then filter
pe$SYMBOL <- paste0("GENE", seq_len(nrow(pe)))
pe_genes <- filter_accessible_regions(
  pe, mode = "genelist", gene_list = c("GENE1", "GENE2")
)
#> Gene list: 2 / 7 regions linked to expressed genes (28.6%)
#> Accessibility filter (mode=genelist, scope=filter_distal): 2 / 7 regions pass (28.6%)
sum(pe_genes$atac_accessible)
#> [1] 2