Linear Methods for Classification

LDA and logistic regression use the same underlying linear model.
For LDA:

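$$
\log\frac{\Pr(G=1 \mid X=x)}{\Pr(G=0 \mid X=x)}
= \log\frac{\pi_1}{\pi_0}
- \frac{1}{2}\langle \mu_0 + \mu_1,\ \Sigma^{-1}(\mu_1 - \mu_0)\rangle
+ \langle x,\ \Sigma^{-1}(\mu_1 - \mu_0)\rangle
= \alpha_0 + \langle \alpha, x\rangle
$$

Here $\pi_0, \pi_1$ are the class priors, $\mu_0, \mu_1$ the class means, and $\Sigma$ the covariance matrix shared by both classes.

For Logistic Regression:

$$
\log\frac{\Pr(G=1 \mid X=x)}{\Pr(G=0 \mid X=x)} = \beta_0 + \langle \beta, x\rangle
$$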
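To make the LDA expression concrete, here is a minimal sketch in plain Python that builds the intercept and slope from the priors, class means, and shared covariance, then checks that the linear form matches the log posterior ratio computed directly from the Gaussian densities. The function names and the numeric values are illustrative assumptions, and the 2x2 inverse is hard-coded to keep the example dependency-free.

```python
import math

def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def lda_coefficients(pi0, pi1, mu0, mu1, sigma):
    """Return (alpha0, alpha) of the two-class LDA log-odds."""
    diff = [m1 - m0 for m0, m1 in zip(mu0, mu1)]
    alpha = matvec(inv2(sigma), diff)            # Sigma^{-1} (mu1 - mu0)
    total = [m0 + m1 for m0, m1 in zip(mu0, mu1)]
    alpha0 = math.log(pi1 / pi0) - 0.5 * dot(total, alpha)
    return alpha0, alpha

def log_gauss_kernel(x, mu, sigma):
    """Gaussian log density at x, dropping the constant shared by both classes."""
    d = [xi - mi for xi, mi in zip(x, mu)]
    return -0.5 * dot(d, matvec(inv2(sigma), d))

# Illustrative (made-up) parameters
pi0, pi1 = 0.6, 0.4
mu0, mu1 = [0.0, 0.0], [2.0, 1.0]
sigma = [[1.0, 0.3], [0.3, 2.0]]
x = [1.0, -0.5]

alpha0, alpha = lda_coefficients(pi0, pi1, mu0, mu1, sigma)
linear = alpha0 + dot(alpha, x)
direct = (math.log(pi1) + log_gauss_kernel(x, mu1, sigma)
          - math.log(pi0) - log_gauss_kernel(x, mu0, sigma))
print(linear, direct)  # the two values should agree up to rounding
```

The quadratic terms in $x$ cancel only because both classes share the same $\Sigma$, which is why the log-odds end up linear in $x$.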

The two models differ in how they estimate the parameters.
LDA maximizes the complete (joint) likelihood of the covariates and the class labels, while logistic regression maximizes only the conditional likelihood of the class labels given the covariates.

That is, the two methods do not differ in their functional form. The difference lies in how the coefficients are estimated.
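As a sketch in standard textbook notation, which is an assumption here rather than the author's own, let $(x_i, g_i)$, $i = 1, \dots, N$, be the training pairs and $\phi(\cdot\,; \mu, \Sigma)$ the Gaussian density. The two criteria can then be written as:

$$
L_{\text{complete}} = \prod_{i=1}^{N} \Pr(X = x_i, G = g_i) = \prod_{i=1}^{N} \pi_{g_i}\, \phi(x_i; \mu_{g_i}, \Sigma)
$$

$$
L_{\text{conditional}} = \prod_{i=1}^{N} \Pr(G = g_i \mid X = x_i)
$$

Since $\Pr(X = x_i, G = g_i) = \Pr(G = g_i \mid X = x_i)\,\Pr(X = x_i)$, the complete likelihood also contains the marginal density of the covariates. Logistic regression leaves that factor unspecified, while LDA uses it through the Gaussian assumption.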

When the covariates are simulated from the normal distribution, LDA of course seems to be the more appropriate method. However, the results of the two methods are really close when the sample size is large.

From "Comparison of Logistic Regression and Linear Discriminant Analysis: A Simulation Study":

The main differences can be observed for small samples, as their distributions vary too much for the LR to be able to give good results. On the other hand, LDA assumes normality. The errors it makes in prediction are only due to the errors in estimation of the mean and the variance on the sample.

Whenever the explanatory variables are not normally distributed, the usage of LDA is theoretically wrong, as the assumptions are violated. The goodness-of-fit is therefore only more or less coincidental. On the other hand, the LR fits well to many types of distribution.
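The large-sample claim quoted above can be checked with a small simulation. The sketch below is not the setup used in the cited study. It assumes a single normally distributed covariate with equal variance in both classes, fits LDA by plugging in sample means and the pooled variance, and fits logistic regression with a hand-written Newton iteration. All function names and parameter values are illustrative.

```python
import math
import random

def simulate(n_per_class, mu0=0.0, mu1=2.0, sd=1.0, seed=0):
    """Draw a 1-D normal covariate with shared variance for two classes."""
    rng = random.Random(seed)
    xs = [rng.gauss(mu0, sd) for _ in range(n_per_class)]
    xs += [rng.gauss(mu1, sd) for _ in range(n_per_class)]
    ys = [0] * n_per_class + [1] * n_per_class
    return xs, ys

def fit_lda(xs, ys):
    """Plug-in LDA: class means, pooled variance, class proportions."""
    x0 = [x for x, y in zip(xs, ys) if y == 0]
    x1 = [x for x, y in zip(xs, ys) if y == 1]
    m0, m1 = sum(x0) / len(x0), sum(x1) / len(x1)
    ss = sum((x - m0) ** 2 for x in x0) + sum((x - m1) ** 2 for x in x1)
    var = ss / (len(xs) - 2)                      # pooled variance estimate
    slope = (m1 - m0) / var
    intercept = math.log(len(x1) / len(x0)) - 0.5 * (m0 + m1) * slope
    return intercept, slope

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(xs, ys, iterations=25):
    """Newton's method on the conditional log-likelihood (intercept + slope)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p                # gradient w.r.t. intercept
            g1 += (y - p) * x          # gradient w.r.t. slope
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

xs, ys = simulate(2000)
lda = fit_lda(xs, ys)
lr = fit_logistic(xs, ys)
print("LDA (intercept, slope):", lda)
print("LR  (intercept, slope):", lr)
```

With these settings the population values are a slope of 2 and an intercept of -2, and both fits should land close to them. Lowering `n_per_class` lets the two estimates drift apart. This bare Newton iteration has no safeguard against perfectly separated samples, so very small samples may make it fail.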

From "What is the difference between Logistic Regression and Discriminant Analysis?":

For both methods the categories in the outcome (i.e. the dependent variable) must be mutually exclusive. One of the ways to determine whether to use logistic regression or discriminant analysis in the cases where there are more than two groups in the dependent variable is to analyze the assumptions pertinent to both methods. The logistic regression is much more relaxed and flexible in its assumptions than the discriminant analysis. Unlike the discriminant analysis, the logistic regression does not have the requirements of the independent variables to be normally distributed, linearly related, nor equal variance within each group (Tabachnick and Fidell, 1996, p. 575). Being free from the assumptions of the discriminant analysis positions the logistic regression as a tool to be used in many situations. However, “when [the] assumptions regarding the distribution of predictors are met, discriminant function analysis may be more powerful and efficient analytic strategy” (Tabachnick and Fidell, 1996, p. 579).

Even though the logistic regression does not have many assumptions, thus usable in more instances, it does require larger sample size, at least 50 cases per independent variable might be required for an accurate hypothesis testing, especially when the dependent variable has many groups (Grimm and Yarnold, p. 221). However, given the same sample size, if the assumptions of multivariate normality of the independent variables within each group of the dependent variable are met, and each category has the same variance and covariance for the predictors, the discriminant analysis might provide more accurate classification and hypothesis testing (Grimm and Yarnold, p. 241). The rule of thumb though is to use logistic regression when the dependent variable is dichotomous and there are enough samples. [194:604]
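The rules of thumb quoted above can be collected into a small helper. This is only one possible reading of the quoted guidance: the function name, its arguments, and the order of the checks are assumptions made for illustration, and the 50-cases-per-predictor threshold is taken directly from the quote.

```python
def suggest_method(n_cases, n_predictors, assumptions_met):
    """Return a rough suggestion based on the quoted rules of thumb.

    assumptions_met: True if the predictors look multivariate normal within
    each group with equal covariance (judged by the analyst, not checked here).
    """
    # "at least 50 cases per independent variable" for logistic regression
    enough_cases = n_cases >= 50 * n_predictors
    if assumptions_met:
        return "discriminant analysis"
    if enough_cases:
        return "logistic regression"
    return "no clear recommendation"

print(suggest_method(500, 5, assumptions_met=False))
print(suggest_method(100, 5, assumptions_met=True))
```

The helper deliberately returns "no clear recommendation" when the distributional assumptions fail and the sample is small, since the quoted text gives no guidance for that case.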

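References:

  1. Lecture: Logistic Regression and LDA (http://www.ismll.uni-hildesheim.de/lehre/ml-07w/skript/ml-2up-02-logisticregression.pdf)
  2. Comparison of Logistic Regression and Linear Discriminant Analysis: A Simulation Study (http://mrvar.fdv.uni-lj.si/pub/mz/mz1.1/pohar.pdf)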