Testing Conditional Mutual Information in Discrete Distributions

The basic model in distribution testing is that we receive independent and identically distributed samples from an unknown (discrete) distribution and aim to decide whether or not it has a certain property. Here we are interested in how many samples are required to reliably answer fundamental questions about the relationships between the variables of a multivariate distribution, namely whether there are (conditional) correlations between them.

A natural information-theoretic formalization is to ask whether the two variables $A$ and $C$ of a bipartite distribution $P_{AC}$ are independent or have mutual information $I(A:C)_P$ above some threshold $\varepsilon$. The mutual information is zero exactly when $P_{AC} = P_A P_C$, i.e., when there are no correlations between $A$ and $C$. If the system has at least three variables, we may also ask whether $A$ and $C$ are conditionally independent given $B$, or have conditional mutual information $I(A:C|B)_P \geq \varepsilon$. Conditional independence intuitively means that $A$ and $C$ may be correlated, but their correlations are due to $B$, so that observed correlations between $A$ and $C$ can be explained without requiring ‘interactions’ between $A$ and $C$. Recall that $I(A:C|B)_P := D_{KL}(P_{ABC} \| Q_{ABC})$ with $Q_{ABC} := P_{AB} P_{BC} / P_B$ (on the support of $P_B$). Since the relative entropy vanishes if and only if its two arguments coincide, $A$ and $C$ are conditionally independent given $B$ if and only if $P_{ABC} = Q_{ABC}$.
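To make these definitions concrete, here is a minimal Python sketch (an illustration only, not code from the papers) that builds $Q_{ABC} = P_{AB} P_{BC} / P_B$ from a joint distribution and evaluates $I(A:C|B)_P = D_{KL}(P_{ABC} \| Q_{ABC})$ in nats. The dictionary representation and all function names are assumptions made for this example.

```python
import math
from collections import defaultdict


def marginals(p):
    """Return P_AB, P_BC and P_B for a joint distribution p: (a, b, c) -> prob."""
    p_ab, p_bc, p_b = defaultdict(float), defaultdict(float), defaultdict(float)
    for (a, b, c), w in p.items():
        p_ab[(a, b)] += w
        p_bc[(b, c)] += w
        p_b[b] += w
    return p_ab, p_bc, p_b


def markov_q(p):
    """Q_ABC(a, b, c) = P_AB(a, b) * P_BC(b, c) / P_B(b) on the support of P_B."""
    p_ab, p_bc, p_b = marginals(p)
    c_given_b = defaultdict(list)  # b -> [(c, P_BC(b, c)), ...]
    for (b, c), w in p_bc.items():
        c_given_b[b].append((c, w))
    q = {}
    for (a, b), w_ab in p_ab.items():
        if p_b[b] == 0:
            continue  # outside the support of P_B
        for c, w_bc in c_given_b[b]:
            q[(a, b, c)] = w_ab * w_bc / p_b[b]
    return q


def kl_divergence(p, q):
    """D_KL(p || q) in nats; assumes supp(p) is contained in supp(q)."""
    return sum(w * math.log(w / q[x]) for x, w in p.items() if w > 0)


def conditional_mutual_information(p):
    """I(A : C | B)_P = D_KL(P_ABC || P_AB P_BC / P_B)."""
    return kl_divergence(p, markov_q(p))


# A and C are both copies of a uniform bit B: a Markov chain, so P = Q.
markov = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
# B is constant and A = C is a uniform bit: all correlation is direct.
direct = {(0, 0, 0): 0.5, (1, 0, 1): 0.5}

print(conditional_mutual_information(markov))  # 0.0
print(conditional_mutual_information(direct))  # log 2 ≈ 0.6931
```

In the first example the correlations between $A$ and $C$ are fully explained by $B$, so $P_{ABC} = Q_{ABC}$; in the second, $B$ carries no information, and the conditional mutual information reduces to $I(A:C)_P = \log 2$.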

A related problem was previously studied by Canonne, Diakonikolas, Kane, and Stewart (STOC 2018), where the farness guarantee was expressed in terms of the total variation distance instead of the conditional mutual information. Their algorithm could also be applied to our problem of (conditional) mutual information testing; however, it is not sample-optimal for this task.

Our algorithm aims to reduce the problem to instances of equivalence testing, that is, testing whether two unknown distributions are equal. We first simulate samples from $Q_{ABC}$ and then compare this distribution to $P_{ABC}$. The testing itself uses a reduction to the $\ell_2$-distance, pioneered by Diakonikolas and Kane (FOCS 2016), in which the domain $\mathcal{D} = A \times B \times C$ is partitioned into subsets $\{S_i\}$ based on the weight that $Q_{ABC}$ assigns to its elements. This allows us to tightly link guarantees in the $\ell_2$-distance to the conditional mutual information. The subsets are then tested separately. Our partitioning is also tailored to the specific structure of $Q_{ABC} = P_{AB} P_{BC} / P_B$, which makes it even more efficient than general equivalence testing.
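The $\ell_2$ reduction can be illustrated with a short sketch. It is a simplified stand-in, not the tailored partitioning described above: it uses a hypothetical dyadic partition $S_j = \{x : 2^{-(j+1)} < w(x) \leq 2^{-j}\}$, where $w$ is an (assumed given) estimate of the weights of $Q_{ABC}$, and computes for each bucket the standard statistic $Z_S = \sum_{x \in S} [(X_x - Y_x)^2 - X_x - Y_x]$ from the sample counts $X_x$ and $Y_x$. If $\mathrm{Poi}(m)$ samples are drawn from each of the two distributions $p$ and $q$, then $\mathbb{E}[Z_S] = m^2 \sum_{x \in S} (p(x) - q(x))^2$. All function names and the bucketing rule are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


def dyadic_buckets(weights):
    """Partition {x : w(x) > 0} into S_j = {x : 2^-(j+1) < w(x) <= 2^-j}."""
    buckets = defaultdict(list)
    for x, w in weights.items():
        if w > 0:
            buckets[math.floor(-math.log2(w))].append(x)
    return dict(buckets)


def l2_statistic(xs, ys, subset):
    """Z_S = sum_{x in S} (X_x - Y_x)^2 - X_x - Y_x, with X, Y the sample counts.

    With Poissonized sample sizes (N ~ Poi(m) samples from each distribution),
    E[Z_S] = m^2 * sum_{x in S} (p(x) - q(x))^2.
    """
    cx, cy = Counter(xs), Counter(ys)  # missing elements count as 0
    return sum((cx[x] - cy[x]) ** 2 - cx[x] - cy[x] for x in subset)


def bucketed_l2_statistics(xs, ys, weights):
    """One statistic per bucket; each bucket would then be tested separately."""
    return {j: l2_statistic(xs, ys, s) for j, s in dyadic_buckets(weights).items()}


xs = ["x", "x", "y"]       # hypothetical samples from P
ys = ["x", "y", "y"]       # hypothetical (simulated) samples from Q
w = {"x": 0.5, "y": 0.25}  # hypothetical weights of Q
print(bucketed_l2_statistics(xs, ys, w))  # {1: -2, 2: -2}
```

Grouping elements of similar weight lets each bucket use a threshold matched to its own variance, which is the point of partitioning before applying an $\ell_2$ test; individual values of $Z_S$ can be negative, since only their expectation is a squared distance.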

However, a major obstacle is sampling from $Q_{ABC}$: for $b$ where $P_B(b)$ is ‘large’, this turns out to be relatively straightforward, but for $b$ where $P_B(b)$ is ‘small’ we are not able to sample perfectly from $Q_{ABC}$. Intuitively, this would require the ability to sample from $P_{AC|b}$. Instead, we approximately simulate $Q_{ABC}$ in this regime, but the resulting samples are both correlated and biased, which leads to several distinct regimes in the sample complexity. Our sample complexity and that of Canonne et al. are shown below, up to polylogarithmic factors. Here $d_A$, $d_B$, and $d_C$ denote the alphabet sizes of $A$, $B$, and $C$, respectively, $\varepsilon$ denotes the farness guarantee in conditional mutual information, and $\gamma$ the corresponding guarantee in total variation distance for the problem of Canonne et al.
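The easy regime can be sketched as follows: if $(a, b)$ is drawn from $P_{AB}$ and $c$ is drawn independently from $P_{C|B}(\cdot|b)$, then $(a, b, c)$ is distributed according to $P_{AB} P_{BC} / P_B = Q_{ABC}$. A hypothetical way to do this with samples of $P_{ABC}$ is to take $(a, b)$ from one batch and pair it with the $c$ of an unused sample with the same $b$ from a second, disjoint batch. The batch-splitting scheme and function names below are assumptions for illustration, not the published algorithm. The sketch also shows the obstacle: when $P_B(b)$ is small, there is often no second sample with the same $b$ left to pair with.

```python
from collections import defaultdict


def split_batches(samples):
    """Split samples (a, b, c) of P_ABC into two disjoint batches, keeping
    (a, b) from the first half and (b, c) from the second half."""
    half = len(samples) // 2
    first = [(a, b) for a, b, _ in samples[:half]]
    second = [(b, c) for _, b, c in samples[half:]]
    return first, second


def simulate_q_samples(batch_ab, batch_bc):
    """Pair each (a, b) with an unused c from the second batch with the same b.

    Conditioned on the b values, each assigned c is an independent draw from
    P_{C|B=b}, as in Q_ABC = P_AB * P_{C|B}.  Pairs (a, b) with no unused
    matching c are returned separately; dropping them skews the b-marginal,
    which is negligible only when every P_B(b) is large.
    """
    pool = defaultdict(list)  # b -> unused c values from the second batch
    for b, c in batch_bc:
        pool[b].append(c)
    simulated, unmatched = [], []
    for a, b in batch_ab:
        if pool[b]:
            simulated.append((a, b, pool[b].pop()))
        else:
            unmatched.append((a, b))
    return simulated, unmatched


samples = [(0, "b0", "x"), (1, "b0", "y"), (0, "b1", "z"),
           (1, "b0", "u"), (0, "b0", "v"), (1, "b0", "w")]
batch_ab, batch_bc = split_batches(samples)
print(simulate_q_samples(batch_ab, batch_bc))
# ([(0, 'b0', 'w'), (1, 'b0', 'v')], [(0, 'b1')])
```

Here the rare value `b1` appears only once, so its pair cannot be completed, which is the situation where an approximate, correlated, and biased simulation becomes necessary.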

Existing lower bounds on the sample complexity are marked in color (red: derived by Canonne et al.; blue: by Seyfried et al.; yellow: by Canonne et al. for the special case where $d_A = d_B = d_C$ and $\varepsilon = 1$). These bounds already showed that the observed structure with its various case distinctions is ‘right’: the five different regimes and nested case distinctions are necessary. However, the exact form of certain terms remained unclear. This left the door open for a possible improvement, in particular in the regime where we could not directly reduce the problem to equivalence testing.

In our follow-up work, we showed that the sample complexities derived by Canonne et al. for conditional independence testing and by Seyfried et al. for conditional mutual information testing are indeed optimal up to logarithmic factors. Our proof generalizes the lower-bound constructions of Canonne et al., which on their own did not resolve all regimes.

For further details, please see Seyfried, Sen, and Tomamichel (COLT 2025) for our algorithm to test conditional mutual information, and Seyfried, Mishra, Sen, and Tomamichel (COLT 2026) for the aforementioned lower bounds.
