Analyze statistical significance of TF co-binding at enhanceosome regions
Source: R/predictors.R
analyze_tf_cobinding.Rd
Performs pairwise statistical testing of transcription factor co-occurrence at enhanceosome regions using Fisher's exact test or permutation testing, with odds ratios, Pointwise Mutual Information (PMI), and hierarchical clustering.
Usage
analyze_tf_cobinding(
enhanceosome,
database,
fdr_threshold = 0.05,
min_regions = 5L,
method = c("fisher", "permutation"),
n_permutations = 1000L
)
Arguments
- enhanceosome
epiRomics class database containing enhanceosome calls
- database
epiRomics class database containing all data initially loaded
- fdr_threshold
numeric, FDR threshold for significance (default: 0.05)
- min_regions
integer, minimum number of co-bound regions to report a pair (default: 5)
- method
character, statistical method: "fisher" (default) for Fisher's exact test or "permutation" for permutation-based testing that accounts for spatial autocorrelation.
- n_permutations
integer, number of permutations when method = "permutation" (default: 1000). Ignored for Fisher's test.
Value
list with components:
- pairwise
data.frame with columns: tf1, tf2, n_both, n_tf1_only, n_tf2_only, n_neither, odds_ratio, pvalue, fdr, pmi, significant
- presence_matrix
logical matrix (regions x TFs) of binding presence
- clustering
hclust object from hierarchical clustering of TF co-occurrence (Jaccard distance, Ward.D2 linkage)
- tf_names
character vector of TF names analyzed
- n_regions
integer, total number of enhanceosome regions
- method
character, statistical method used
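The clustering component can be reproduced in spirit with base R. The sketch below is illustrative only: the toy `presence` matrix and the object names are assumptions, and it uses `dist(method = "binary")` as the Jaccard distance, which may differ in detail from what the package does internally.

```r
# Toy presence matrix: 6 regions (rows) x 4 TFs (columns). Values are made up.
presence <- matrix(
  c(TRUE,  TRUE,  FALSE, FALSE,
    TRUE,  TRUE,  FALSE, TRUE,
    TRUE,  FALSE, FALSE, FALSE,
    FALSE, FALSE, TRUE,  TRUE,
    FALSE, FALSE, TRUE,  TRUE,
    FALSE, TRUE,  TRUE,  FALSE),
  ncol = 4, byrow = TRUE,
  dimnames = list(paste0("region", 1:6), c("TF1", "TF2", "TF3", "TF4"))
)

# Transpose so that TFs (not regions) are the objects being clustered, and
# convert logical to numeric. For presence/absence data, "binary" distance
# is 1 - |A and B| / |A or B|, i.e. the Jaccard distance.
jaccard <- dist(1 * t(presence), method = "binary")

# Ward linkage on the Jaccard distances, as named in the Value section.
tf_clusters <- hclust(jaccard, method = "ward.D2")

# Leaf order of the resulting dendrogram; plot(tf_clusters) would draw it.
tf_clusters$labels[tf_clusters$order]
```

Here TF1 and TF2 share 2 of the 4 regions bound by either factor, so their Jaccard distance is 1 - 2/4 = 0.5.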
Details
This replaces the previous decision-tree approach (epiRomics_predictors) with a statistically rigorous co-binding analysis. For each pair of TFs, a 2x2 contingency table is constructed from the enhanceosome presence matrix. P-values are adjusted across all pairs using the Benjamini-Hochberg FDR procedure.
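The per-pair procedure described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the simulated `presence` matrix and the helper `test_pair()` are hypothetical, and `fisher.test()` reports the conditional maximum-likelihood odds ratio, which may differ from the estimate the package returns.

```r
# Simulated presence matrix (200 regions x 3 TFs); not package output.
set.seed(1)
presence <- matrix(runif(200 * 3) < 0.4, ncol = 3,
                   dimnames = list(NULL, c("TF1", "TF2", "TF3")))

# Hypothetical helper: build the 2x2 table for one TF pair and test it.
test_pair <- function(pair, m) {
  a <- m[, pair[1]]
  b <- m[, pair[2]]
  counts <- matrix(
    c(sum(a & b),  sum(a & !b),
      sum(!a & b), sum(!a & !b)),
    nrow = 2, byrow = TRUE
  )
  ft <- fisher.test(counts)
  data.frame(
    tf1 = pair[1], tf2 = pair[2],
    n_both = counts[1, 1], n_tf1_only = counts[1, 2],
    n_tf2_only = counts[2, 1], n_neither = counts[2, 2],
    odds_ratio = unname(ft$estimate), pvalue = ft$p.value
  )
}

# Enumerate all unordered TF pairs and test each one.
pairs <- combn(colnames(presence), 2, simplify = FALSE)
pairwise <- do.call(rbind, lapply(pairs, test_pair, m = presence))

# Benjamini-Hochberg adjustment across all pairwise tests.
pairwise$fdr <- p.adjust(pairwise$pvalue, method = "BH")
pairwise$significant <- pairwise$fdr < 0.05
pairwise
```

The four counts in each row always sum to the number of regions, which is a quick sanity check on the table construction.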
Statistical methods
Fisher's exact test (method = "fisher"): Tests whether two TFs co-occur at enhanceosome regions more (or less) often than expected by chance. Assumes independence between regions. This is the default and is appropriate when regions are largely non-overlapping. Reference: Fisher, R.A. (1922) J Royal Stat Soc.
Permutation test (method = "permutation"): Shuffles TF_B binding labels across regions to generate a null distribution, accounting for spatial autocorrelation between nearby genomic regions. More conservative but robust to violations of independence. Reference: Gel et al. (2016) Bioinformatics 32(2):289-291. "regioneR: an R/Bioconductor package for the association analysis of genomic regions."
Odds ratio: Measures strength of association. OR > 1 indicates co-occurrence; OR < 1 indicates mutual exclusion.
PMI: Pointwise Mutual Information quantifies the degree of association between two TFs: PMI(A,B) = log2(P(A,B) / (P(A)*P(B))). PMI > 0 indicates co-occurrence; PMI < 0 indicates avoidance. Reference: Church & Hanks (1990) Computational Linguistics.
BH-FDR: Benjamini-Hochberg correction controls the false discovery rate across all pairwise tests. Reference: Benjamini & Hochberg (1995) J Royal Stat Soc B.
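As a worked example of the odds ratio and PMI definitions, the sketch below uses hypothetical counts for a single TF pair. The variable names mirror the columns of the pairwise data.frame. The odds ratio shown is the simple sample cross-product ratio, which is an assumption about the estimator rather than a statement of what the package computes.

```r
# Hypothetical counts for one TF pair over 100 regions (not package output).
n_both     <- 30
n_tf1_only <- 20
n_tf2_only <- 10
n_neither  <- 40
n <- n_both + n_tf1_only + n_tf2_only + n_neither

# Sample odds ratio: cross-product of the 2x2 table.
odds_ratio <- (n_both * n_neither) / (n_tf1_only * n_tf2_only)
odds_ratio
#> [1] 6

# Marginal and joint binding probabilities estimated from the counts.
p_ab <- n_both / n                  # P(A,B) = 0.3
p_a  <- (n_both + n_tf1_only) / n   # P(A)   = 0.5
p_b  <- (n_both + n_tf2_only) / n   # P(B)   = 0.4

# PMI(A,B) = log2(P(A,B) / (P(A) * P(B))) = log2(0.3 / 0.2) = log2(1.5)
pmi <- log2(p_ab / (p_a * p_b))
round(pmi, 3)
#> [1] 0.585
```

Both measures agree in direction here: OR > 1 and PMI > 0 each indicate that the two TFs co-occur more often than expected under independence.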
Note on spatial autocorrelation
Fisher's exact test assumes independence between observations (regions).
Nearby genomic regions may be spatially correlated (e.g., broad TF binding
domains), which can inflate significance. If your enhanceosome regions
contain many closely spaced or overlapping intervals, consider using
method = "permutation" for more conservative p-values.
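For intuition, a label-shuffling permutation test of the kind described under Statistical methods can be sketched as follows. This is a simplified illustration on simulated data: the vectors `tf_a` and `tf_b` and the one-sided enrichment p-value are assumptions. The plain shuffle shown here treats regions as exchangeable, and any additional handling of spatial structure in the package is not shown.

```r
set.seed(42)
n_regions <- 300

# Simulated binding calls; TF_B is made more likely wherever TF_A is bound.
tf_a <- runif(n_regions) < 0.3
tf_b <- ifelse(tf_a, runif(n_regions) < 0.6, runif(n_regions) < 0.2)

# Observed statistic: number of regions bound by both TFs.
observed <- sum(tf_a & tf_b)

# Null distribution: shuffle TF_B labels across regions and recount.
n_permutations <- 1000L
null_counts <- replicate(n_permutations, sum(tf_a & sample(tf_b)))

# One-sided empirical p-value for enrichment. The +1 in numerator and
# denominator keeps the estimate from ever being exactly zero.
p_enrich <- (sum(null_counts >= observed) + 1) / (n_permutations + 1)
p_enrich
```

With 1000 permutations the smallest attainable p-value is 1/1001, so `n_permutations` bounds the resolution of the test, which matters once p-values are adjusted across many TF pairs.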
See also
analyze_tf_overlap for overlap fractions without significance testing
Examples
db <- make_example_database()
eso <- make_example_enhanceosome(db)
cobinding <- analyze_tf_cobinding(eso, db)
cobinding$pairwise[, c("tf1", "tf2", "odds_ratio", "fdr")]
#>   tf1 tf2 odds_ratio fdr
#> 1 TF1 TF2          0   1