Thursday, March 15, 2018

An Inequality Between Conditional Cross Entropy and Error Probability

Some time ago I wrote about an inequality between conditional entropy and error probability. Specifically, one can show that if $\hat{x}(Y)$ is the maximum a posteriori estimate of a discrete random variable $X$ given the random observation $Y$, then the probability of making an error is bounded from above by

$$
  P_e = \mathbb{P}(X\neq \hat{x}(Y)) \le 1- 2^{H(X|Y)}.
$$

This inequality is not new. I was not aware, however, of any previous references until I met Igal Sason at the International Zurich Seminar earlier this year. He was speaking about a paper he recently published which includes this inequality (eq. (2)), together with many more general results relating error probability with Arimoto-Renyi conditional entropy.

I was hoping to generalize above inequality to the mismatched case. Specifically, let $p_{X|Y}$ be the conditional distribution of $X$ given $Y$. The MAP estimator is $\hat{x}(y) = \arg\max_x p_{X|Y}(x|y)$. Now suppose that the MAP estimator does not have access to $p_{X|Y}$ but only to a surrogate distribution $q_{X|Y}$, in which case the estimator becomes $\tilde{x}(y) = \arg\max_x q_{X|Y}(x|y)$. I thought to get a bound similar to the one above that includes this surrogate distribution and that bounds the error probability $\tilde{P}_e = \mathbb{P}(X\neq \tilde{x}(Y))$. To be more precise, I was hoping to exchange the conditional entropy above by some conditional cross entropy, i.e., to obtain a bound similar to

$$
\tilde{P}_e \le 1- 2^{\mathbb{E}(H^\times(p_{X|Y}(\cdot|Y)\Vert q_{X|Y}(\cdot|Y)))}
$$

possibly with the order of the distributions reversed. Since cross entropy is an upper bound on entropy, the bound will be larger than the bound for $P_e$. (Interestingly, even if $p_{X|Y}$ and $q_{X|Y}$ are different, we nay have $\tilde{P}_e =P_e$: It suffices that, for every $y$, both conditional distributions have their modes at the same values of $x$.)

Unfortunately, my hope was not justified. We show this in a counterexample, i.e., we show that there exists distributions such that the inequality on $\tilde{P}_e$ is violated (for any choice of order of distributions). For the sake of simplicity, we now look at guessing rather than estimation, i.e., the conditional distributions satisfy $p_{X|Y}=p_X$ and $q_{X|Y}=q_{X}$. Now suppose that $X$ is binary, i.e., it takes values 0 and 1. Furthermore, let $p_X(0)=q_X(1)=\epsilon\ll 0.5$. We have that $\tilde{x}=\arg\max_x q_X(x) = 0$, thus

$$\tilde{P}_e=\mathbb{P}(X\neq 0)=1-\epsilon.$$

Since $p_X$ and $q_X$ are permutations of each other, we have that $H(q_X)=H(p_X)$, that $D(p_X\Vert q_X) = D(q_X\Vert p_X)$, and that thus

$$
H^\times(p_X\Vert q_X) = H^\times(q_X\Vert p_X) = -\epsilon\log(1-\epsilon)-(1-\epsilon)\log\epsilon = -\log \epsilon - \epsilon\log\left(\frac{1-\epsilon}{\epsilon}\right).
$$

With this we get

$$
2^{-H^\times(p_X\Vert q_X)} = \epsilon\cdot 2^{\epsilon\log\left(\frac{1-\epsilon}{\epsilon}\right)} > \epsilon
$$

and thus the bound $\tilde{P}_e \le 1- 2^{-H^\times(p_X\Vert q_X)}$ cannot hold.