Compute per-region signal z-scores from a BigWig file
Source:R/visualizations.R
call_accessible_regions.RdCalculates the mean ATAC/ChIP signal per region from a BigWig file and returns z-scores relative to the distribution across all input regions. The null distribution is estimated from the signal across all regions, assuming most of the genome has low/no signal. Regions with z-scores above the threshold are called as accessible.
Usage
call_accessible_regions(
bw_path,
regions,
z_threshold = 1,
auto_threshold = FALSE,
return_scores = FALSE
)Arguments
- bw_path
character, path to a BigWig file.
- regions
GRanges object of regions to classify.
- z_threshold
numeric, z-score cutoff for calling accessibility (default: 1.0). Regions with z >= threshold are called accessible. Set to NULL to return z-scores without thresholding. Ignored when
auto_threshold = TRUE.- auto_threshold
logical. If TRUE, automatically detects the optimal threshold by finding the valley between the noise and signal peaks in the log-transformed signal distribution using kernel density estimation. This is the recommended approach – let the data define background vs. signal rather than using a fixed z-score (default: FALSE).
- return_scores
logical. If TRUE, returns a list with z_scores, raw signal, and boolean calls. If FALSE (default), returns a logical vector for backward compatibility.
Value
If return_scores = FALSE (default): a logical vector the
same length as regions, TRUE for accessible regions.
If return_scores = TRUE: a list with:
- z_scores
numeric vector of z-scores
- signal
numeric vector of raw mean signal per region
- accessible
logical vector (z >= threshold)
- mu
mean signal (background estimate)
- sigma
SD of signal
Examples
bw_path <- make_example_bigwig()
regions <- GenomicRanges::GRanges(
"chr1", IRanges::IRanges(
start = c(1000L, 5000L, 10000L, 20000L, 50000L),
end = c(2000L, 6000L, 11000L, 21000L, 51000L)
)
)
accessible <- call_accessible_regions(bw_path, regions, z_threshold = 1)
file.remove(bw_path)
#> [1] TRUE