Basic Concepts in Information Theory

Abstract: This post records the concepts of entropy, relative entropy, mutual information, and cross entropy. These concepts come up repeatedly in machine learning and data mining, for example in text feature selection for information retrieval, so they are collected here for reference.


Entropy and Information

Entropy measures the randomness (uncertainty, or unpredictability) of a sequence generated by a random variable. For a discrete distribution, the entropy is defined as:

$$
H = -\sum_{i=1}^m P_i \log_2 P_i
$$

This is a concave function of $p$. Taking the logarithm base 2 means the entropy is the number of bits needed to encode the sequence.
Note that for a discrete random variable, the entropy does not depend on the values the variable takes, but only on their probabilities.
The definition extends naturally to the joint entropy of multiple variables. If $(x, y)$ follows the distribution $p(x, y)$, the joint entropy of $(x, y)$ is defined as:
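As a quick illustration of the definition above, here is a minimal sketch that computes the entropy of a discrete distribution given as a list of probabilities (the function name `entropy` is my own choice):

```python
import math

def entropy(probs, base=2):
    """Shannon entropy of a discrete distribution, in bits when base=2."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# A fair coin is maximally unpredictable for two outcomes: 1 bit.
print(entropy([0.5, 0.5]))   # 1.0
# A biased coin is more predictable, hence lower entropy.
print(entropy([0.9, 0.1]))   # ~0.469
```

Note how the entropy depends only on the probabilities, not on what the outcomes are, exactly as stated above.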

$$
H(X, Y) = -\sum_{x\in X}\sum_{y\in Y} p(x, y)\log p(x, y)
$$

The conditional entropy is the entropy of one variable given knowledge of another:

$$
H(Y|X) = \sum_{x\in X} p(x)\, H(Y|x) = -\sum_{x\in X}\sum_{y\in Y} p(x)p(y|x)\log p(y|x) = -\sum_{x\in X}\sum_{y\in Y} p(x, y)\log p(y|x)
$$
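The joint and conditional entropies are tied together by the chain rule $H(X, Y) = H(X) + H(Y|X)$. A small sketch verifying this numerically on a made-up 2x2 joint distribution (the distribution itself is arbitrary, chosen only for illustration):

```python
import math

def H(probs):
    """Entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# An arbitrary joint distribution p(x, y); rows index x, columns index y.
joint = [[0.3, 0.2],
         [0.1, 0.4]]

px = [sum(row) for row in joint]                 # marginal p(x)
H_joint = H([p for row in joint for p in row])   # H(X, Y)

# H(Y|X) = -sum_{x,y} p(x,y) log2 p(y|x), with p(y|x) = p(x,y)/p(x)
H_cond = -sum(joint[x][y] * math.log2(joint[x][y] / px[x])
              for x in range(2) for y in range(2) if joint[x][y] > 0)

# Chain rule: H(X, Y) = H(X) + H(Y|X)
assert abs(H_joint - (H(px) + H_cond)) < 1e-12
```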

For the continuous case, the (differential) entropy is defined as:

$$
H = -\int_{-\infty}^{\infty} p(x)\ln p(x)\,\mathrm{d}x
$$

All of the concepts below are built on entropy.

Relative Entropy

Suppose a single discrete random variable $x$ has two different probability distributions, $p(x)$ and $q(x)$. To measure the distance between the two distributions of $x$, define the relative entropy:

$$
D_{KL}(p \,\|\, q) = \sum_x p(x)\ln\frac{p(x)}{q(x)}
$$

Relative entropy is also called the KL (Kullback-Leibler) divergence. It is not a true metric, however: as the formula above shows, $D_{KL}$ is neither symmetric nor does it satisfy the triangle inequality.
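The asymmetry is easy to see numerically. A minimal sketch (the two distributions are arbitrary examples; the code assumes $q(x) > 0$ wherever $p(x) > 0$):

```python
import math

def kl(p, q):
    """D_KL(p || q) = sum_x p(x) log2(p(x)/q(x)).

    Assumes q(x) > 0 wherever p(x) > 0, otherwise the divergence is infinite.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.9, 0.1]
q = [0.5, 0.5]
print(kl(p, q))  # ~0.531
print(kl(q, p))  # ~0.737 -- different: D_KL is not symmetric
```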

Mutual Information

Suppose two variables have marginal distributions $p(x)$ and $q(y)$. The mutual information is the reduction in uncertainty about one variable after observing the other:

$$
I(X; Y) = H(X) - H(X|Y) = \sum_{x,\, y} r(x, y)\log_2\frac{r(x, y)}{p(x)q(y)}
$$

r(x,y)x,y的联合概率分布函数. 结合相对熵,可以看到,互信息是联合概率分布r(x,y)与概率分布乘积p(x)q(y)的相对熵, 表示x,y的分布与统计独立的差异度. 互信息也称为信息增益(information gain)

Cross Entropy

Suppose the true distribution of $x$ is $p(x)$, i.e. $x \sim p(x)$, while a model assumes $x$ follows another distribution $q(x)$. The cross entropy between $X$ and $q$ is defined as:

$$
H(X, q) = \mathrm{E}_p[-\log q(x)] = H(X) + D_{KL}(p \,\|\, q)
$$

where $D_{KL}(p \,\|\, q)$ is the extra cost of wrongly assuming that $x$ follows the distribution $q(x)$; this cost is itself expressed as an entropy.

x是离散变量时,H(X,q)=xp(x)logp(x),是连续变量时,H(X,q)=Xp(x)logp(x).

Diagram of the relationships between the entropies:
![Alt text](./entropy.jpg)
Individual (H(X), H(Y)), joint (H(X,Y)), and conditional entropies for a pair of correlated subsystems X, Y with mutual information I(X; Y).

References

  1. Pattern Classification (模式分类)
  2. Information Theory
    CS769 Spring 2010 Advanced Natural Language Processing

  3. The Sociological Meaning of Entropy (熵的社会学意义)
    阮一峰的网络日志 (Ruan Yifeng's blog)


