Classify regions by cell-type-specific binary accessibility
Source:R/visualizations.R
classify_celltype_accessibility.RdFor each genomic region, determines whether it is accessible in each cell type using per-sample z-score thresholding on BigWig ATAC-seq signal. Unlike fold-change approaches (which compare relative enrichment between samples), this method makes an independent binary open/closed call per sample, then classifies regions based on the combination of binary calls.
Arguments
- bw_paths
named character vector of BigWig file paths. Names are cell-type labels (e.g.,
c(Alpha = "alpha.atac.bw", Beta = "beta.atac.bw")).- regions
GRanges object of regions to classify.
- z_threshold
numeric, z-score cutoff for calling a region as accessible (default: 1.0). Higher values are more stringent. Use
plot_signal_histogramto visualize the signal distribution and identify an appropriate threshold for your data.- auto_threshold
logical. If TRUE, automatically detect the optimal threshold from the signal distribution using kernel density valley detection, overriding
z_threshold(default: FALSE).
Value
A data.frame with columns:
- region_idx
integer, index into input regions
- cell_type
character, cell-type label (single name if accessible in only one cell type),
"Shared"if accessible in all cell types, or"Closed"if not accessible in any- accessible_in
character, comma-separated list of cell types where the region is accessible (empty string if closed)
The binary accessibility matrix (logical, regions x cell types) is
attached as the "accessibility_matrix" attribute.
Details
A region with high signal in BOTH cell types is correctly classified as
"Shared" regardless of the magnitude difference between them.
This avoids the fold-change pitfall where a region with signal 50 in one
cell type and 100 in the other is called "enriched" even though both have
accessible chromatin.
Methodology
For each BigWig file independently:
Import mean ATAC signal per region
Compute genome-wide null distribution (mean and SD of all region signals)
Apply z-score threshold: z = (signal - mean) / sd
Regions with z >= threshold are called accessible (open)
The z-score approach normalizes within each sample, making the open/closed call robust to differences in sequencing depth or library complexity between samples (Buenrostro et al. 2013, Nat Methods; Corces et al. 2018, Science).
Examples
bw_path <- make_example_bigwig()
regions <- GenomicRanges::GRanges(
"chr1", IRanges::IRanges(
start = c(1000L, 5000L, 10000L, 20000L, 50000L),
end = c(2000L, 6000L, 11000L, 21000L, 51000L)
)
)
ct <- classify_celltype_accessibility(
bw_paths = c(Alpha = bw_path, Beta = bw_path),
regions = regions
)
table(ct$cell_type)
#>
#> Closed Shared
#> 4 1
file.remove(bw_path)
#> [1] TRUE