Tests every pair of transcription factors (TFs) for co-occurrence at enhanceosome regions using Fisher's exact test or a permutation test, and reports odds ratios, Pointwise Mutual Information (PMI), and a hierarchical clustering of the TFs.

Usage

analyze_tf_cobinding(
  enhanceosome,
  database,
  fdr_threshold = 0.05,
  min_regions = 5L,
  method = c("fisher", "permutation"),
  n_permutations = 1000L
)

Arguments

enhanceosome

epiRomics class database containing enhanceosome calls

database

epiRomics class database containing all data initially loaded

fdr_threshold

numeric, FDR threshold for significance (default: 0.05)

min_regions

integer, minimum number of co-bound regions required for a TF pair to be reported (default: 5)

method

character, statistical method: "fisher" (default) for Fisher's exact test or "permutation" for permutation-based testing that accounts for spatial autocorrelation.

n_permutations

integer, number of permutations when method = "permutation" (default: 1000). Ignored for Fisher's test.

Value

list with components:

pairwise

data.frame with columns: tf1, tf2, n_both, n_tf1_only, n_tf2_only, n_neither, odds_ratio, pvalue, fdr, pmi, significant

presence_matrix

logical matrix (regions x TFs) of binding presence

clustering

hclust object from hierarchical clustering of TF co-occurrence (Jaccard distance, Ward.D2 linkage)

tf_names

character vector of TF names analyzed

n_regions

integer, total number of enhanceosome regions

method

character, statistical method used
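
As an illustration of the clustering component, the sketch below builds a Jaccard distance between TF columns of a toy presence matrix and clusters it with Ward.D2 linkage using base R. The matrix values are made up, and whether the function computes the distance this way internally is an assumption; only the distance and linkage names come from the description above.

```r
# Toy presence matrix (regions x TFs); values are illustrative only.
presence <- matrix(
  c(TRUE,  TRUE,  FALSE,
    TRUE,  TRUE,  FALSE,
    TRUE,  FALSE, TRUE,
    FALSE, FALSE, TRUE,
    FALSE, TRUE,  TRUE),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("TF1", "TF2", "TF3"))
)

# dist(method = "binary") is the Jaccard distance between rows, so the
# matrix is transposed to put TFs in rows. Multiplying by 1 converts the
# logical matrix to numeric while keeping its dimnames.
jaccard <- dist(1 * t(presence), method = "binary")

# Ward.D2 linkage on the Jaccard distances.
hc <- hclust(jaccard, method = "ward.D2")

round(as.matrix(jaccard), 2)
plot(hc)  # dendrogram of TFs
```

On a real result, `plot(cobinding$clustering)` should draw the equivalent dendrogram, since that component is documented as an hclust object.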

Details

This replaces the previous decision-tree approach (epiRomics_predictors) with statistically rigorous co-binding analysis. For each pair of TFs, a 2x2 contingency table (n_both, n_tf1_only, n_tf2_only, n_neither) is constructed from the enhanceosome presence matrix. P-values are adjusted for multiple testing using the Benjamini-Hochberg FDR procedure.
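
To make the contingency-table step concrete, here is a minimal base-R sketch that derives the four cell counts for one TF pair from a toy logical presence matrix. The matrix values and the object names (`presence`, `counts`, `tab`) are illustrative assumptions, not the package's internal code; the cell names follow the columns of the pairwise data.frame.

```r
# Toy presence matrix (regions x TFs); values are illustrative only.
presence <- matrix(
  c(TRUE,  TRUE,
    TRUE,  TRUE,
    TRUE,  FALSE,
    FALSE, TRUE,
    FALSE, FALSE,
    FALSE, FALSE),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("TF1", "TF2"))
)

a <- presence[, "TF1"]
b <- presence[, "TF2"]

# The four cells of the 2x2 table, named after the pairwise columns.
counts <- c(
  n_both     = sum(a & b),
  n_tf1_only = sum(a & !b),
  n_tf2_only = sum(!a & b),
  n_neither  = sum(!a & !b)
)

# Rows index TF1 status, columns index TF2 status.
tab <- matrix(
  counts, nrow = 2, byrow = TRUE,
  dimnames = list(TF1 = c("bound", "unbound"), TF2 = c("bound", "unbound"))
)
tab
```

The four counts always sum to the number of regions, which is a quick sanity check on any row of the pairwise output.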

Statistical methods

  • Fisher's exact test (method = "fisher"): Tests whether two TFs co-occur at enhanceosome regions more (or less) often than expected by chance. Assumes independence between regions. This is the default and is appropriate when regions are largely non-overlapping. Reference: Fisher, R.A. (1922) J Royal Stat Soc.

  • Permutation test (method = "permutation"): Shuffles the binding labels of the second TF in each pair across regions to generate a null distribution of co-occurrence counts, accounting for spatial autocorrelation between nearby genomic regions. More conservative than Fisher's exact test, but robust to violations of independence. Reference: Gel et al. (2016) Bioinformatics 32(2):289-291. "regioneR: an R/Bioconductor package for the association analysis of genomic regions."

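  • Odds ratio: Measures the strength of association between two TFs. OR > 1 indicates that the TFs co-occur more often than expected under independence; OR < 1 indicates that they co-occur less often than expected (a tendency toward mutual exclusion); OR = 1 indicates no association.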

  • PMI: Pointwise Mutual Information quantifies the degree of association between two TFs: PMI(A,B) = log2(P(A,B) / (P(A)*P(B))). PMI > 0 indicates co-occurrence; PMI < 0 indicates avoidance. Reference: Church & Hanks (1990) Computational Linguistics.

  • BH-FDR: Benjamini-Hochberg correction controls the false discovery rate across all pairwise tests. Reference: Benjamini & Hochberg (1995) J Royal Stat Soc B.
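
The quantities in the list above can be reproduced for a single TF pair with base R. The sketch below uses made-up counts and made-up p-values purely for illustration. Note that `fisher.test()` reports a conditional maximum-likelihood odds ratio, which differs slightly from the simple cross-product ratio; which of the two appears in the `odds_ratio` column is not stated here, so both are shown.

```r
# Hypothetical counts for one TF pair (not real data).
n_both <- 30; n_tf1_only <- 10; n_tf2_only <- 15; n_neither <- 45
n <- n_both + n_tf1_only + n_tf2_only + n_neither

tab <- matrix(c(n_both, n_tf1_only, n_tf2_only, n_neither),
              nrow = 2, byrow = TRUE)

# Fisher's exact test (two-sided by default).
ft <- fisher.test(tab)
ft$p.value
ft$estimate  # conditional MLE of the odds ratio

# Sample (cross-product) odds ratio: (a * d) / (b * c).
or_sample <- (n_both * n_neither) / (n_tf1_only * n_tf2_only)

# PMI(A,B) = log2(P(A,B) / (P(A) * P(B))), probabilities estimated as
# fractions of regions.
p_ab <- n_both / n
p_a  <- (n_both + n_tf1_only) / n
p_b  <- (n_both + n_tf2_only) / n
pmi  <- log2(p_ab / (p_a * p_b))

# Benjamini-Hochberg adjustment across several pairwise p-values
# (illustrative values), then thresholding at fdr_threshold = 0.05.
pvals <- c(0.001, 0.02, 0.04, 0.30)
fdr <- p.adjust(pvals, method = "BH")
significant <- fdr < 0.05
```

With these counts the cross-product odds ratio is above 1 and the PMI is positive, so both measures point toward co-occurrence. The third p-value (0.04) is below 0.05 before adjustment but not after, which shows why the `fdr` column, not `pvalue`, should drive the `significant` flag.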

Note on spatial autocorrelation

Fisher's exact test assumes independence between observations (regions). Nearby genomic regions may be spatially correlated (e.g., broad TF binding domains), which can inflate significance. If your enhanceosome regions contain many closely spaced or overlapping intervals, consider using method = "permutation" for more conservative p-values.
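
For intuition about how a label-shuffling permutation p-value is obtained, here is a minimal base-R sketch on simulated data. It shuffles one TF's labels uniformly across regions and compares the observed co-bound count with the resulting null distribution. The simulated vectors, the two-sided statistic, and the +1 correction are assumptions made for this example; the package's own permutation scheme, including any position-aware handling, is not shown and may differ.

```r
set.seed(1)

# Simulated binding labels for two TFs over 200 regions (illustrative only).
n_regions <- 200
tf_a <- rbinom(n_regions, 1, 0.3) == 1
tf_b <- rbinom(n_regions, 1, 0.3) == 1

# Observed number of co-bound regions.
observed <- sum(tf_a & tf_b)

# Null distribution: shuffle tf_b's labels and recount co-bound regions.
n_permutations <- 1000L
null_counts <- replicate(n_permutations, sum(tf_a & sample(tf_b)))

# Two-sided empirical p-value: how often a permuted count is at least as
# far from the null mean as the observed count. The +1 in numerator and
# denominator keeps the p-value strictly above zero.
center <- mean(null_counts)
p_perm <- (sum(abs(null_counts - center) >= abs(observed - center)) + 1) /
  (n_permutations + 1)
p_perm
```

The smallest p-value this scheme can return is 1 / (n_permutations + 1), so n_permutations limits the resolution of the test, which matters once p-values are adjusted across many TF pairs.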

See also

analyze_tf_overlap for overlap fractions without significance testing

Examples

db <- make_example_database()
eso <- make_example_enhanceosome(db)
cobinding <- analyze_tf_cobinding(eso, db)
cobinding$pairwise[, c("tf1", "tf2", "odds_ratio", "fdr")]
#>   tf1 tf2 odds_ratio fdr
#> 1 TF1 TF2          0   1