Skip to contents

For each genomic region, determines whether it is accessible in each cell type using per-sample z-score thresholding on BigWig ATAC-seq signal. Unlike fold-change approaches (which compare relative enrichment between samples), this method makes an independent binary open/closed call per sample, then classifies regions based on the combination of binary calls.

Usage

classify_celltype_accessibility(
  bw_paths,
  regions,
  z_threshold = 1,
  auto_threshold = FALSE
)

Arguments

bw_paths

named character vector of BigWig file paths. Names are cell-type labels (e.g., c(Alpha = "alpha.atac.bw", Beta = "beta.atac.bw")).

regions

GRanges object of regions to classify.

z_threshold

numeric, z-score cutoff for calling a region as accessible (default: 1.0). Higher values are more stringent. Use plot_signal_histogram to visualize the signal distribution and identify an appropriate threshold for your data.

auto_threshold

logical. If TRUE, automatically detect the optimal threshold from the signal distribution using kernel density valley detection, overriding z_threshold (default: FALSE).

Value

A data.frame with columns:

region_idx

integer, index into input regions

cell_type

character, cell-type label (single name if accessible in only one cell type), "Shared" if accessible in all cell types, or "Closed" if not accessible in any

accessible_in

character, comma-separated list of cell types where the region is accessible (empty string if closed)

The binary accessibility matrix (logical, regions x cell types) is attached as the "accessibility_matrix" attribute.

Details

A region with high signal in BOTH cell types is correctly classified as "Shared" regardless of the magnitude difference between them. This avoids the fold-change pitfall where a region with signal 50 in one cell type and 100 in the other is called "enriched" even though both have accessible chromatin.

Methodology

For each BigWig file independently:

  1. Import mean ATAC signal per region

  2. Compute genome-wide null distribution (mean and SD of all region signals)

  3. Apply z-score threshold: z = (signal - mean) / sd

  4. Regions with z >= threshold are called accessible (open)

The z-score approach normalizes within each sample, making the open/closed call robust to differences in sequencing depth or library complexity between samples (Buenrostro et al. 2013, Nat Methods; Corces et al. 2018, Science).

Examples

bw_path <- make_example_bigwig()
regions <- GenomicRanges::GRanges(
  "chr1", IRanges::IRanges(
    start = c(1000L, 5000L, 10000L, 20000L, 50000L),
    end   = c(2000L, 6000L, 11000L, 21000L, 51000L)
  )
)
ct <- classify_celltype_accessibility(
  bw_paths = c(Alpha = bw_path, Beta = bw_path),
  regions  = regions
)
table(ct$cell_type)
#> 
#> Closed Shared 
#>      4      1 
file.remove(bw_path)
#> [1] TRUE