Skip to contents

Calculates the mean ATAC/ChIP signal per region from a BigWig file and returns z-scores relative to the distribution across all input regions. The null distribution is estimated from the signal across all regions, assuming most of the genome has low/no signal. Regions with z-scores above the threshold are called as accessible.

Usage

call_accessible_regions(
  bw_path,
  regions,
  z_threshold = 1,
  auto_threshold = FALSE,
  return_scores = FALSE
)

Arguments

bw_path

character, path to a BigWig file.

regions

GRanges object of regions to classify.

z_threshold

numeric, z-score cutoff for calling accessibility (default: 1.0). Regions with z >= threshold are called accessible. Set to NULL to return z-scores without thresholding. Ignored when auto_threshold = TRUE.

auto_threshold

logical. If TRUE, automatically detects the optimal threshold by finding the valley between the noise and signal peaks in the log-transformed signal distribution using kernel density estimation. This is the recommended approach – let the data define background vs. signal rather than using a fixed z-score (default: FALSE).

return_scores

logical. If TRUE, returns a list with z_scores, raw signal, and boolean calls. If FALSE (default), returns a logical vector for backward compatibility.

Value

If return_scores = FALSE (default): a logical vector the same length as regions, TRUE for accessible regions. If return_scores = TRUE: a list with:

z_scores

numeric vector of z-scores

signal

numeric vector of raw mean signal per region

accessible

logical vector (z >= threshold)

mu

mean signal (background estimate)

sigma

SD of signal

Examples

bw_path <- make_example_bigwig()
regions <- GenomicRanges::GRanges(
  "chr1", IRanges::IRanges(
    start = c(1000L, 5000L, 10000L, 20000L, 50000L),
    end   = c(2000L, 6000L, 11000L, 21000L, 51000L)
  )
)
accessible <- call_accessible_regions(bw_path, regions, z_threshold = 1)
file.remove(bw_path)
#> [1] TRUE