Statistical experiments form the basis of much of classical statistics and its applications, such as hypothesis testing: every hypothesis corresponds to a probability distribution over possible observations, so a statistical experiment is simply a family P=\{p^{\vartheta}\}_{\vartheta\in\Theta} of probability distributions p^{\vartheta} on the same space of outcomes. Given two statistical experiments, one may naturally ask which one is more “informative”, for example by asking which one yields the higher expected payoff in the various tests one might apply. The comparison of statistical experiments is thus a highly relevant question, going back to the work of Blackwell. This notion of information order can be reformulated as follows: an experiment P=\{p^{\vartheta}\}_{\vartheta\in\Theta} is at least as informative as Q=\{q^{\vartheta}\}_{\vartheta\in\Theta} if there is a statistical operator (Markov kernel) T such that Tp^{\vartheta}=q^{\vartheta} for all \vartheta\in\Theta, i.e., all the information in Q is already available in P and can be extracted with a single statistical map. We also say that P majorizes Q.
Our work studies statistical experiments consisting of finitely many (d) probability measures on finite outcome spaces and their comparison. We may thus model a statistical experiment P=\{p^{(k)}\}_{k=1}^d as a matrix whose columns p^{(k)} are probability vectors, i.e., as a stochastic matrix. A statistical map is now also a stochastic matrix. Finally, a finite experiment P majorizes another finite experiment Q if there is a stochastic matrix T such that Q=TP. This sets up a preorder among stochastic matrices, also called matrix majorization. One may also study matrix majorization in the asymptotic and catalytic schemes. The former notion means that we have i.i.d. copies of the experiments P and Q at our disposal and ask whether a high enough number of copies of P majorizes the same number of copies of Q. Formally this means that, for all sufficiently large n, there is a stochastic matrix T_n such that T_n \left(p^{(k)}\right)^{\otimes n}=\left(q^{(k)}\right)^{\otimes n} for all k=1,\ldots,d. The latter notion means that we have an experiment R=\{r^{(k)}\}_{k=1}^d which catalyses the majorization, i.e., there is a stochastic matrix T such that T\left(p^{(k)}\otimes r^{(k)}\right)=q^{(k)}\otimes r^{(k)} for all k=1,\ldots,d. We give sufficient and almost necessary conditions for these relaxed modes of matrix majorization using the machinery developed earlier by one of the authors, T. Fritz.
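Deciding whether one finite experiment majorizes another is a linear feasibility problem, so it can be checked mechanically. Below is a minimal sketch of such a check, assuming NumPy and SciPy are available; the function name majorizes and the example matrices are illustrative, not taken from the paper.

```python
# Minimal sketch: does a column-stochastic T with T P = Q exist?
# Assumes NumPy/SciPy; `majorizes` is an illustrative name, not from the paper.
import numpy as np
from scipy.optimize import linprog

def majorizes(P, Q):
    """Return True if some column-stochastic matrix T satisfies T P = Q."""
    nP, d = P.shape
    nQ, dQ = Q.shape
    assert d == dQ, "P and Q must contain the same number d of distributions"
    # Unknowns: the entries of T, flattened row by row (T has shape nQ x nP).
    num_vars = nQ * nP
    A_eq = np.zeros((nQ * d + nP, num_vars))
    b_eq = np.zeros(nQ * d + nP)
    # Constraints (T P)[i, k] = Q[i, k]: coefficient of T[i, j] is P[j, k].
    for i in range(nQ):
        for k in range(d):
            for j in range(nP):
                A_eq[i * d + k, i * nP + j] = P[j, k]
            b_eq[i * d + k] = Q[i, k]
    # Constraints: each column j of T sums to 1.
    for j in range(nP):
        for i in range(nQ):
            A_eq[nQ * d + j, i * nP + j] = 1.0
        b_eq[nQ * d + j] = 1.0
    # Feasibility only: zero objective, entries of T bounded below by 0.
    res = linprog(np.zeros(num_vars), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * num_vars, method="highs")
    return res.status == 0

# A sharp pair of distributions majorizes a noisier pair:
P = np.array([[0.9, 0.1],
              [0.1, 0.9]])
Q = np.array([[0.7, 0.3],
              [0.3, 0.7]])
print(majorizes(P, Q))  # True: here T is a binary symmetric channel
```

For the asymptotic variant, the columns \left(p^{(k)}\right)^{\otimes n} can be built by repeatedly applying np.kron to each column before running the same check, though the matrices grow exponentially with n.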
We apply the theory of preordered semirings to the problem of asymptotic and catalytic matrix majorization. To this end, we have to build a semiring out of the space of stochastic matrices with d columns by allowing the columns not to sum to 1 and by introducing addition and multiplication compatible with the preorder obtained by extending the matrix majorization order to this larger set. Asymptotic majorization is now described by a family \mathcal{D} of ‘divergences’ D_{\underline{\alpha}} which are multipartite generalizations of the well-known Rényi divergences. Evaluated on a matrix P with columns p^{(k)}=(p^{(k)}_i)_{i=1}^n, k=1,\ldots,d, these divergences have the form
D_{\underline{\alpha}}(P)=\frac{1}{\max_{1\leq k\leq d}\alpha_k-1}\log\sum_{i=1}^n\prod_{k=1}^d \left(p^{(k)}_i\right)^{\alpha_k},

where \underline{\alpha}=(\alpha_k)_{k=1}^d\in\mathbb{R}^d is such that \alpha_1+\cdots+\alpha_d=1 and either \alpha_\ell<1 for all \ell=1,\ldots,d, or one \alpha_k>1 and the rest \alpha_\ell\leq 0. In addition to these, we have to include in the family \mathcal{D} certain pointwise limits of these divergences along trajectories in the parameter space \mathbb{R}^d.
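To make the formula concrete, here is a minimal sketch that evaluates D_{\underline{\alpha}} on a column-stochastic matrix with strictly positive entries, assuming NumPy; the function name D and the example values are illustrative.

```python
# Minimal sketch of the divergence D_alpha from the formula above.
# Assumes NumPy and strictly positive entries; `D` is an illustrative name.
import numpy as np

def D(P, alpha):
    """D_alpha(P) for a parameter tuple alpha summing to 1."""
    alpha = np.asarray(alpha, dtype=float)
    assert np.isclose(alpha.sum(), 1.0), "alpha_1 + ... + alpha_d must equal 1"
    # P ** alpha raises column k of P to the power alpha_k (broadcasting),
    # so the row products below are prod_k (p_i^(k))^alpha_k.
    inner = np.sum(np.prod(P ** alpha, axis=1))
    return np.log(inner) / (alpha.max() - 1.0)

P = np.array([[0.9, 0.2],
              [0.1, 0.8]])
print(D(P, (0.5, 0.5)))   # valid: all alpha_l < 1
print(D(P, (2.0, -1.0)))  # valid: one alpha_k > 1, the rest <= 0
```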
The family \mathcal{D} of divergences described above gives sufficient conditions for asymptotic and catalytic matrix majorization: A matrix P asymptotically majorizes Q if D(P)>D(Q) for all D\in\mathcal{D}. Moreover, there is a sequence \{Q_k\}_k of matrices of the same size as Q converging to Q such that P catalytically majorizes Q_k for all k if and only if D_{\underline{\alpha}}(P)\geq D_{\underline{\alpha}}(Q) for all the parameter tuples \underline{\alpha} as specified above. Thus, our multipartite divergences give necessary and sufficient conditions for approximate catalytic matrix majorization.
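As a purely heuristic illustration of the sufficient condition for d=2, one can scan the admissible parameter tuples over a finite grid; this only samples the family \mathcal{D} and omits the limiting divergences mentioned above, so it is an indication rather than a proof. The matrices below are illustrative.

```python
# Heuristic scan of the sufficient condition for d = 2; only finitely many
# parameter tuples are checked, and the limiting divergences are omitted.
import numpy as np

def D(P, alpha):  # the divergence from the previous sketch, restated
    alpha = np.asarray(alpha, dtype=float)
    return np.log(np.sum(np.prod(P ** alpha, axis=1))) / (alpha.max() - 1.0)

P = np.array([[0.9, 0.2],
              [0.1, 0.8]])
Q = np.array([[0.7, 0.4],
              [0.3, 0.6]])
# Admissible tuples for d = 2 are (a, 1 - a) with a in (0, 1), a > 1, or a < 0.
grid = np.concatenate([np.linspace(-10.0, -0.1, 19),
                       np.linspace(0.05, 0.95, 19),
                       np.linspace(1.1, 10.0, 19)])
print(all(D(P, (a, 1.0 - a)) > D(Q, (a, 1.0 - a)) for a in grid))  # True
```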
Setting d=2 in the above theory retrieves known results on relative majorization. Moreover, using the powerful semiring machinery, we may rederive several known results on majorization between individual probability vectors. Recall that a probability vector p majorizes another probability vector q if there is a bistochastic matrix T such that Tp=q. This mode of majorization can also be studied in the asymptotic and catalytic settings, and it is amenable to our semiring methods, which give sufficient conditions for majorization in the form of inequalities involving the Rényi entropies.
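As an illustrative numerical companion, one can scan Rényi-entropy inequalities between two probability vectors, using the standard convention H_\alpha(p)=\frac{1}{1-\alpha}\log\sum_i p_i^\alpha with the Shannon entropy as the \alpha\to 1 limit. The sketch below samples only a finite grid of positive orders, so it is a heuristic illustration rather than the full family of conditions; the function name and example vectors are ours.

```python
# Heuristic scan of Renyi-entropy inequalities between probability vectors.
# Only finitely many positive orders are sampled; this is an illustration,
# not the full family of conditions from the paper.
import numpy as np

def renyi_entropy(p, alpha):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # convention: drop zero entries
    if np.isclose(alpha, 1.0):  # Shannon entropy as the alpha -> 1 limit
        return -np.sum(p * np.log(p))
    return np.log(np.sum(p ** alpha)) / (1.0 - alpha)

p = np.array([0.6, 0.3, 0.1])    # the sharper vector (it majorizes q exactly)
q = np.array([0.4, 0.35, 0.25])  # the flatter, majorized vector
grid = np.linspace(0.1, 10.0, 100)
print(all(renyi_entropy(p, a) < renyi_entropy(q, a) for a in grid))  # True
```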
It remains an interesting problem to extend our classical results to the quantum regime. This would allow the characterization of relative information content between quantum statistical experiments, i.e., families of quantum states.

For further details, see the full work at arXiv:2301.07353.