Linear Methods for Regression
A linear regression model assumes that the regression function $E[Y|X]$ is linear in the inputs $X_1, \ldots, X_p$.
Before specifying a concrete model, $E[Y|X]$ is used as a generic stand-in for a discriminative model, whereas a generative model instead describes the joint distribution $P(X, Y)$.
A generative model has something of the feel of a god creating everything, but for many tasks finding a suitable generative process is very hard, while "recognizing and judging" is easy; becoming a good writer is hard, but recognizing good writing is much easier.
Linear regression assumes that $E[Y|X]$ is linear, or at least approximately linear, formalized as:

$$ f(X) = \beta_0 + \sum_{j=1}^{p} X_j \beta_j $$

The loss function is defined as the residual sum of squares:

$$ \mathrm{RSS}(\beta) = \sum_{i=1}^{N} \big(y_i - f(x_i)\big)^2 = (y - X\beta)^T (y - X\beta) $$

In other words, the least squares loss looks for the hyperplane $f(X;\beta)$ in feature space that minimizes the training error.
Assuming $X^TX$ is full rank, differentiating $\mathrm{RSS}(\beta)$ and setting the derivative to zero gives $X^T(y - X\beta) = 0$, hence $\hat\beta = (X^TX)^{-1}X^Ty$. From $X^T(y - X\hat\beta) = 0$ it also follows that the residual $y - X\hat\beta$ is orthogonal to the column space of $X$.
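As a quick numerical check of the closed form above, here is a minimal NumPy sketch with synthetic data (the variable names are just for illustration): it computes $\hat\beta = (X^TX)^{-1}X^Ty$ via a linear solve and verifies that the residual is orthogonal to the columns of $X$.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 100, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])  # include an intercept column
beta_true = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=N)

# beta_hat = (X^T X)^{-1} X^T y, computed with a solve rather than an explicit inverse
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

residual = y - X @ beta_hat
print(beta_hat)
print(X.T @ residual)  # ~0: the residual is orthogonal to the column space of X
```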
Assume the deviations of $Y$ around its expectation are additive and Gaussian, i.e.

$$ Y = \beta_0 + \sum_{j=1}^{p} X_j \beta_j + \varepsilon, $$

where $\varepsilon \sim N(0, \sigma^2)$. Then:
- $E[\hat\beta] = E\big[(X^TX)^{-1}X^Ty\big] = (X^TX)^{-1}X^T E[y] = (X^TX)^{-1}X^TX\beta = \beta$
- $\hat\beta - E[\hat\beta] = (X^TX)^{-1}X^Ty - (X^TX)^{-1}X^TX\beta = (X^TX)^{-1}X^T(y - X\beta) = (X^TX)^{-1}X^T\varepsilon$
- $\mathrm{Var}(\hat\beta) = E\big[(\hat\beta - E[\hat\beta])(\hat\beta - E[\hat\beta])^T\big] = E\big[(X^TX)^{-1}X^T \varepsilon\varepsilon^T X(X^TX)^{-1}\big] = (X^TX)^{-1}X^T E[\varepsilon\varepsilon^T] X(X^TX)^{-1} = (X^TX)^{-1}X^T (\sigma^2 I) X(X^TX)^{-1} = \sigma^2 (X^TX)^{-1}$

Therefore $\hat\beta \sim N\big(\beta, (X^TX)^{-1}\sigma^2\big)$.
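A small simulation sketch (synthetic numbers of our choosing) can confirm this empirically: hold $X$ fixed, redraw $\varepsilon$ many times, and compare the sample covariance of $\hat\beta$ with $\sigma^2 (X^TX)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(1)
N, sigma = 200, 0.5
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
beta = np.array([1.0, -2.0, 0.5])

XtX_inv = np.linalg.inv(X.T @ X)
betas = []
for _ in range(5000):                      # redraw epsilon with X held fixed
    y = X @ beta + rng.normal(scale=sigma, size=N)
    betas.append(XtX_inv @ X.T @ y)
betas = np.array(betas)

print(np.cov(betas, rowvar=False))         # empirical covariance of beta_hat
print(sigma**2 * XtX_inv)                  # theoretical sigma^2 (X^T X)^{-1}
```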
Usually $\sigma^2$ is replaced by its unbiased estimate (i.e. $E[\hat\sigma^2] = \sigma^2$):

$$ \hat\sigma^2 = \frac{1}{N - p - 1} \sum_{i=1}^{N} (y_i - \hat y_i)^2 $$

Rearranging slightly: $(N-p-1)\,\hat\sigma^2 = \sum_{i=1}^{N} (y_i - \hat y_i)^2 \sim \sigma^2 \chi^2_{N-p-1}$, a chi-squared distribution with $N-p-1$ degrees of freedom.
Under the null hypothesis that $\beta_j = 0$, $z_j$ is distributed as $t_{N-p-1}$, and hence a large (absolute) value of $z_j$ will lead to rejection of this null hypothesis.
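For reference, the z-score in question is the one in ESL (3.12), built from $\hat\sigma$ and the $j$th diagonal element $v_j$ of $(X^TX)^{-1}$:

$$ z_j = \frac{\hat\beta_j}{\hat\sigma \sqrt{v_j}}, \qquad v_j = \big[(X^TX)^{-1}\big]_{jj}. $$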
Page 47:
The variance-covariance matrix of the least squares parameter estimates is easily derived from (3.6) and is given by $\mathrm{Var}(\hat\beta) = (X^TX)^{-1}\sigma^2$.

Typically one estimates the variance $\sigma^2$ by $\hat\sigma^2 = \frac{1}{N-p-1}\sum_{i=1}^{N} (y_i - \hat y_i)^2$.
Extended to the multiple-output (multi-class) setting, the linear model is formalized as:

$$ Y_k = \beta_{0k} + \sum_{j=1}^{p} X_j \beta_{jk} + \varepsilon_k, \qquad k = 1, \dots, K $$

Further, in matrix form:

$$ Y = XB + E $$

where $Y$ is the $N \times K$ response matrix, $X$ the $N \times (p+1)$ input matrix, $B$ the $(p+1) \times K$ coefficient matrix, and $E$ the $N \times K$ error matrix. The loss function for the multiple-output case is then:

$$ \mathrm{RSS}(B) = \sum_{k=1}^{K}\sum_{i=1}^{N} \big(y_{ik} - f_k(x_i)\big)^2 = \mathrm{tr}\big[(Y - XB)^T(Y - XB)\big] $$

and the resulting parameter estimate is:

$$ \hat{B} = (X^TX)^{-1}X^TY $$
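A short sketch on synthetic data (names our own) showing that the joint estimate $\hat{B} = (X^TX)^{-1}X^TY$ coincides with running a separate least squares fit for each output column, so the outputs do not affect one another's coefficients:

```python
import numpy as np

rng = np.random.default_rng(2)
N, p, K = 150, 4, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])
B_true = rng.normal(size=(p + 1, K))
Y = X @ B_true + rng.normal(scale=0.2, size=(N, K))

# Joint multi-output estimate: B_hat = (X^T X)^{-1} X^T Y
B_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Column-by-column univariate fits give exactly the same coefficients
B_cols = np.column_stack([np.linalg.solve(X.T @ X, X.T @ Y[:, k]) for k in range(K)])
print(np.allclose(B_hat, B_cols))  # True
```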
Least squares estimates of the parameter β have the smallest variance among all linear unbiased estimates.
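This is the Gauss-Markov theorem. A quick sketch of the standard argument, introducing a matrix $D$ for the deviation from least squares: any other linear unbiased estimator can be written as $\tilde\beta = \big((X^TX)^{-1}X^T + D\big)y$, where unbiasedness forces $DX = 0$, and then

$$ \mathrm{Var}(\tilde\beta) = \sigma^2\big[(X^TX)^{-1} + DD^T\big] \succeq \sigma^2 (X^TX)^{-1} = \mathrm{Var}(\hat\beta), $$

since $DD^T$ is positive semidefinite.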
- Best-subset selection: a brute-force approach. Its limitations are computational cost and the question of how to choose the subset size; the ultimate goal is to minimize expected prediction error, but in practice cross-validation or AIC is used. It tends to have higher variance.
- Forward stepwise selection: a greedy strategy with lower computational cost and lower variance, widely used in practice (see the sketch after this list).
- Forward stagewise selection.
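As a concrete illustration of the greedy strategy, here is a minimal sketch of forward stepwise selection on synthetic data (the function name and data are ours): at each step it adds the predictor that most reduces the residual sum of squares.

```python
import numpy as np

def forward_stepwise(X, y, k):
    """Greedy forward selection: grow the active set one predictor at a time,
    each time picking the column that gives the largest drop in RSS."""
    N, p = X.shape
    active, remaining = [], list(range(p))
    for _ in range(min(k, p)):
        best_j, best_rss = None, np.inf
        for j in remaining:
            cols = active + [j]
            beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            rss = np.sum((y - X[:, cols] @ beta) ** 2)
            if rss < best_rss:
                best_j, best_rss = j, rss
        active.append(best_j)
        remaining.remove(best_j)
    return active

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 8))
y = 2.0 * X[:, 1] - 1.5 * X[:, 4] + rng.normal(scale=0.5, size=100)
print(forward_stepwise(X, y, 3))  # the informative columns (1 and 4) should come first
```

Practical implementations avoid refitting from scratch by updating a QR decomposition of the active set, but the greedy structure is the same.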
The subset selection methods above effectively apply a 0/1 encoding to the features (each variable is either kept or dropped), which typically leads to relatively high variance; shrinkage methods are smoother and correspondingly have lower variance.
Ridge regression does a proportional shrinkage. Lasso translates each coefficient by a constant factor λ, truncating at zero. This is called "soft thresholding".
Best-subset selection drops all variables with coefficients smaller than the Mth largest; this is a form of "hard-thresholding."
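In the special case of orthonormal inputs, the three approaches have explicit forms (ESL Table 3.4), with $\hat\beta_j$ the least squares estimate and $\hat\beta_{(M)}$ the $M$th largest coefficient in absolute value:

$$
\begin{aligned}
\text{Best subset (size } M\text{):}\quad & \hat\beta_j \cdot \mathbf{1}\big[\,|\hat\beta_j| \ge |\hat\beta_{(M)}|\,\big] \\
\text{Ridge:}\quad & \hat\beta_j / (1 + \lambda) \\
\text{Lasso:}\quad & \mathrm{sign}(\hat\beta_j)\,\big(|\hat\beta_j| - \lambda\big)_{+}
\end{aligned}
$$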
References:
- Chapter 3: Linear Methods for Regression
- Regression (statistics): What is Least Angle Regression and when should it be used?
- More Notes for Linear Regression
- 统计学习那些事
- Bias of an estimator
- Random Vectors and the Variance-Covariance Matrix
- LaTeX:Symbols
- Mean Vector and Covariance Matrix
- (3.11) chi-squared distribution: covers the definition and a simple example, not the properties of the chi-squared distribution.
  - Definition
  - Applications
- (3.12) Z-score: covers the definition and usage of the Z-score.
  - Standard Score
  - Hypothesis Testing
- 统计建模与R软件 - 多重共线性 (Statistical Modeling and R Software: Multicollinearity)