Analyze pairwise and multi-way overlap between transcription factor binding sites
Source: R/tf_overlap.R
For the N TFs in an enhanceosome, computes pairwise overlap fractions, unique region counts per TF, shared region counts, and UpSet-style intersection data for all TF combinations. The Jaccard index quantifies overlap symmetrically, while the overlap coefficient gives an asymmetric, per-TF measure (Church & Hanks, 1990). The presence/absence matrix pattern follows the approach used by DiffBind (Stark & Brown, 2011).
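As a rough illustration of the two measures, the sketch below derives both from a toy presence/absence matrix (regions in rows, TFs in columns). The matrix `pa` and its values are invented for illustration, and the code is an assumption about how such quantities can be computed, not the package's internal implementation.

```r
# Toy presence/absence matrix: rows are regions, columns are TFs (hypothetical data).
pa <- matrix(
  c(1, 1,
    1, 1,
    1, 0),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("region", 1:3), c("TF1", "TF2"))
)

# Pairwise intersection counts: entry [i, j] is the number of regions bound by both TFs.
overlap_counts <- crossprod(pa)

# Regions bound per TF (the diagonal of the intersection matrix).
n_bound <- diag(overlap_counts)

# Asymmetric overlap: fraction of the row TF's regions also bound by the column TF.
# Dividing a matrix by a vector of length nrow recycles down columns, so each
# row i is divided by n_bound[i].
overlap_matrix <- overlap_counts / n_bound

# Symmetric Jaccard index: intersection / union.
union_counts <- outer(n_bound, n_bound, "+") - overlap_counts
jaccard_matrix <- overlap_counts / union_counts
```

In this toy case TF1 binds three regions and TF2 binds two of them, so the TF1 row gives 2/3 against TF2 while the TF2 row gives 1 against TF1; the Jaccard index is 2/3 in both directions. A TF with no bound regions would produce a division by zero and would need separate handling.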
Value
list with components:
- overlap_matrix
matrix of pairwise overlap coefficients (asymmetric: fraction of row TF overlapping col TF)
- overlap_counts
matrix of pairwise absolute overlap counts
- jaccard_matrix
matrix of pairwise Jaccard indices (symmetric: intersection/union)
- unique_counts
named integer vector of regions bound by ONLY that TF
- shared_all
integer count of regions bound by ALL TFs
- n_tf_counts
table of the number of regions bound by exactly 1, 2, 3, ..., N TFs
- tf_names
character vector of TF names analyzed
- summary
data.frame with per-TF summary statistics
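The count-style components above can be sketched from a presence/absence matrix as follows. This is a minimal sketch under the assumption that regions are rows and TFs are columns; the matrix `pa` is hypothetical and the code is not taken from the package source.

```r
# Hypothetical presence/absence matrix: 5 regions x 3 TFs.
pa <- matrix(
  c(1, 1, 1,
    1, 1, 0,
    1, 0, 0,
    0, 1, 0,
    0, 0, 1),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("region", 1:5), c("TF1", "TF2", "TF3"))
)

# Number of TFs bound at each region.
n_bound_per_region <- rowSums(pa)

# Regions bound by exactly one TF, counted per TF.
unique_counts <- colSums(pa[n_bound_per_region == 1, , drop = FALSE])

# Regions bound by every TF.
shared_all <- sum(n_bound_per_region == ncol(pa))

# How many regions are bound by exactly 1, 2, ..., N TFs.
# The factor levels keep zero-count categories in the table.
n_tf_counts <- table(factor(n_bound_per_region, levels = seq_len(ncol(pa))))
```

Here each TF has one region of its own, one region is shared by all three TFs, and the table reports three regions bound by one TF, one by two, and one by three.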
References
Church KW, Hanks P (1990). "Word Association Norms, Mutual Information, and Lexicography." Computational Linguistics 16(1):22-29. Pointwise mutual information framework adapted for co-binding analysis.
Stark R, Brown GD (2011). DiffBind, Bioconductor. Presence/absence matrix approach for binding site overlap analysis.
Examples
db <- make_example_database()
eso <- make_example_enhanceosome(db)
overlap <- analyze_tf_overlap(eso, db)
overlap$overlap_matrix
#>     TF1    TF2
#> TF1   1 0.6667
#> TF2   1 1.0000