If entropy is seen as the average number of binary (yes/no) questions required to reach the answer, then the KL divergence can be described as the extra number of questions you end up asking, on average, if you assume the wrong distribution.
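In symbols (with base-2 logarithms, so everything is counted in yes/no questions, i.e. bits), the entropy is H(p) = -Σ p(x) log2 p(x), and the KL divergence is D(p||q) = Σ p(x) log2(p(x)/q(x)), the expected number of extra questions when the questioning scheme is designed for q but the answers actually follow p.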
For example, let A, B, C, D actually occur with probabilities p = 1/4, 1/4, 1/4, 1/4.
The number of questions actually required is just 2 (1st question: is it (A/B) or (C/D)?; 2nd question: if (A/B), is it A?).
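This agrees with the entropy formula: H(p) = 4 x (1/4) x log2(4) = 2 questions on average.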
Now say you wrongly assume the distribution to be q = 1/2, 1/4, 1/8, 1/8. How many extra questions you would end up asking on average is what the KL divergence measures.
So first, suppose the distribution of A, B, C, D really were 1/2, 1/4, 1/8, 1/8. This is how you would calculate the entropy, i.e. the average number of binary questions, using the questioning scheme suited to that distribution:
Question 1: Is it A? Half the time the answer is yes, and A is identified with a single question. Contribution to the average: 1/2 x 1 = 0.5 questions.
Question 2: Is it B? This question is asked only when question 1 was answered no (probability 1/2), and half of those times the answer is yes, identifying B with two questions. Contribution: 1/2 x 1/2 x 2 = 0.5 questions.
Question 3: Is it C? This question is asked only when question 1 is answered no (probability 1/2) and question 2 is also answered no (probability 1/2); whether the answer is yes (C) or no (D), three questions have been used. Contribution: 1/2 x 1/2 x 3 = 0.75 questions.
Total average number of questions = 0.5 + 0.5 + 0.75 = 1.75 questions.
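As a quick check, here is a minimal Python sketch of the same calculation (the labels and question counts just mirror the scheme described above), computing the average both from the question counts and from the entropy formula; both give 1.75:

    import math

    # Assumed distribution q over the four outcomes.
    q = {"A": 1/2, "B": 1/4, "C": 1/8, "D": 1/8}

    # Questions needed by the scheme above to pin down each outcome:
    # "Is it A?" -> 1, "Is it B?" -> 2, "Is it C?" -> 3 (a final no means D, also 3).
    questions = {"A": 1, "B": 2, "C": 3, "D": 3}

    avg_questions = sum(q[x] * questions[x] for x in q)   # 0.5 + 0.5 + 0.375 + 0.375 = 1.75
    entropy = -sum(q[x] * math.log2(q[x]) for x in q)     # also 1.75
    print(avg_questions, entropy)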
But note that this is NOT the true distribution, which is p = 1/4, 1/4, 1/4, 1/4. You are wrongly assuming the distribution to be q = 1/2, 1/4, 1/8, 1/8.
So let us calculate how many binary questions you would end up asking on average if you keep the questioning scheme designed for the wrong distribution q while the outcomes actually follow p.
Question 1: Is it A? You will now receive yes only one-fourth of the time (as opposed to 1/2 in the previous case). So the contribution is 1/4 x 1 = 0.25 questions.
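The remaining questions contribute in the same way, and the gap between this average and the 2 questions needed under the true distribution is exactly the KL divergence. Here is a minimal Python sketch of the whole comparison (again, the labels and question counts just mirror the scheme above):

    import math

    p = {"A": 1/4, "B": 1/4, "C": 1/4, "D": 1/4}   # true distribution
    q = {"A": 1/2, "B": 1/4, "C": 1/8, "D": 1/8}   # wrongly assumed distribution

    # Questions used per outcome by the scheme designed for q
    # ("Is it A?", then "Is it B?", then "Is it C?").
    questions_q = {"A": 1, "B": 2, "C": 3, "D": 3}

    # Average questions asked when outcomes actually follow p (the cross-entropy).
    avg_wrong = sum(p[x] * questions_q[x] for x in p)      # 0.25 + 0.5 + 0.75 + 0.75 = 2.25
    # Average questions needed with the right scheme for p (the entropy of p).
    avg_right = -sum(p[x] * math.log2(p[x]) for x in p)    # 2.0

    kl = sum(p[x] * math.log2(p[x] / q[x]) for x in p)     # KL(p || q)
    print(avg_wrong - avg_right, kl)                       # both print 0.25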