Build epiRomics database — build_database • epiRomics

Reads epigenomic annotation files from a CSV manifest and builds a unified epiRomics database for downstream analysis. Supports optional extra columns for ChIP/histone peak files (signal, pval, qval, peak).

Usage

build_database(
  db_file,
  txdb_organism,
  genome,
  organism,
  extraCols = NULL,
  data_dir = NULL
)

Arguments

db_file: a character string giving the path to a properly formatted CSV manifest that lists the epigenomic annotation files to load. See the vignette for the required format.
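txdb_organism: a character string naming the TxDb object associated with your data, written as package::object (e.g. "TxDb.Hsapiens.UCSC.hg38.knownGene::TxDb.Hsapiens.UCSC.hg38.knownGene").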
genome: a character string naming the genome assembly associated with your data (e.g. "mm10", "hg38", "rn6", "dm6"). The value must match the assembly referenced by txdb_organism and by the genome column of the CSV manifest; epiRomics itself is organism-agnostic.
organism: a character string naming the OrgDb annotation package associated with your data (e.g. "org.Hs.eg.db").
extraCols: a named character vector of extra columns to read from ChIP/histone BED files, where each name is a column name and each value is its type. Default is NULL (no extra columns). Set to c(signal = "numeric", pval = "numeric", qval = "numeric", peak = "numeric") to read narrowPeak columns.
data_dir: optional character string specifying the root directory for resolving relative file paths in the CSV manifest. When provided, any relative path in the path column is prefixed with data_dir. This is especially useful with cached data from cache_data where the CSV uses relative paths. Default is NULL (paths used as-is).
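
The path handling described for data_dir can be pictured with a small base R sketch. This is an illustration only, not the epiRomics implementation: resolve_manifest_paths is a hypothetical helper, the file names are made up, and the rule used to detect absolute paths (a leading "/", "~", or drive letter) is an assumption.

```r
# Hypothetical helper (not part of epiRomics): prefix relative paths with
# data_dir; leave absolute paths, and all paths when data_dir is NULL, as-is.
resolve_manifest_paths <- function(paths, data_dir = NULL) {
  if (is.null(data_dir)) {
    return(paths)
  }
  # Assumption: a path is absolute if it starts with "/", "~", or a drive letter.
  is_absolute <- grepl("^(/|~|[A-Za-z]:)", paths)
  ifelse(is_absolute, paths, file.path(data_dir, paths))
}

# Made-up manifest entries: one relative path, one absolute path.
manifest_paths <- c("chip/FOXA2.bed", "/srv/data/H3K27ac.bed")

resolve_manifest_paths(manifest_paths, data_dir = "cache")
resolve_manifest_paths(manifest_paths)
```

With data_dir = "cache", only the relative entry gains the prefix; with the default NULL, both entries come back unchanged.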

Value

An object of class epiRomics for use in downstream analysis.

Examples

## build_database reads external BED/BigWig files from a CSV manifest.
## Confirm that a missing file produces a clean error:
tryCatch(
  build_database("nonexistent.csv",
    txdb_organism = paste0("TxDb.Hsapiens.UCSC.hg38.knownGene::",
                           "TxDb.Hsapiens.UCSC.hg38.knownGene"),
    genome = "hg38", organism = "org.Hs.eg.db"),
  error = function(e) message(e$message)
)
#> build_database: The following files do not exist: nonexistent.csv
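
A fuller call that reads narrowPeak columns and resolves relative manifest paths might look like the sketch below. The manifest name "example_manifest.csv" and the directory "epiRomics_cache" are hypothetical placeholders, and the call is guarded so that it is skipped when the package or the manifest is unavailable.

```r
# narrowPeak extra columns, as listed in the extraCols argument description.
narrowpeak_cols <- c(signal = "numeric", pval = "numeric",
                     qval = "numeric", peak = "numeric")

# Hypothetical manifest and data directory; replace with your own.
manifest <- "example_manifest.csv"
data_root <- "epiRomics_cache"

if (requireNamespace("epiRomics", quietly = TRUE) && file.exists(manifest)) {
  db <- epiRomics::build_database(
    db_file = manifest,
    txdb_organism = paste0("TxDb.Hsapiens.UCSC.hg38.knownGene::",
                           "TxDb.Hsapiens.UCSC.hg38.knownGene"),
    genome = "hg38",
    organism = "org.Hs.eg.db",
    extraCols = narrowpeak_cols,  # read signal/pval/qval/peak from peak files
    data_dir = data_root          # prefix for relative paths in the manifest
  )
} else {
  message("Skipping: epiRomics or the example manifest is not available.")
}
```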