Non-negative Matrix Factorization


1. Problem Definition

Given a matrix $A \in \mathbb{R}^{m \times n}$ with entries $a_{ij} \ge 0$, and a pre-specified positive integer $k \ll \min(m, n)$, NMF decomposes $A$ into two non-negative matrices $W \in \mathbb{R}^{m \times k}$ and $H \in \mathbb{R}^{k \times n}$, i.e.:

$$A \approx WH$$

$k$ is the rank of the approximating matrix $WH$.

As illustrated below:
![Alt text](data:image,local://als_eqn.jpg)


2. Forms of the Objective Function

The NMF objective can be expressed as a nonlinear optimization problem:

2.1 Matrix-norm form:

$$\min_{W,H}\ f(W, H) = \frac{1}{2}\lVert A - WH \rVert_F^2$$
$$\text{subject to}\ \ W_{ia} \ge 0,\ H_{bj} \ge 0,\ \forall i, a, b, j$$

where $\lVert \cdot \rVert_F$ is the Frobenius norm, $\lVert A \rVert_F = \big(\sum_{i,j} \lvert a_{ij} \rvert^2\big)^{\frac{1}{2}}$.
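To make the objective concrete, here is a minimal NumPy sketch (the matrix sizes and random data are arbitrary illustrations, not from the text):

```python
import numpy as np

def frobenius_loss(A, W, H):
    """f(W, H) = 1/2 * ||A - W H||_F^2, the objective above."""
    R = A - W @ H
    return 0.5 * np.sum(R * R)

# Arbitrary example sizes: m=6, n=5, k=2.
rng = np.random.default_rng(0)
A = rng.random((6, 5))
W = rng.random((6, 2))
H = rng.random((2, 5))
print(frobenius_loss(A, W, H))
```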

2.2 Kullback-Leibler (KL) divergence form:

$$\min_{W,H}\ \sum_{i=1}^{m} \sum_{j=1}^{n} \left( V_{ij} \log \frac{V_{ij}}{(WH)_{ij}} - V_{ij} + (WH)_{ij} \right)$$
$$\text{subject to}\ \ W_{ia} \ge 0,\ H_{bj} \ge 0,\ \forall i, a, b, j$$

Here $V$ denotes the data matrix (called $A$ above), following Lee and Seung's notation.

If $V_{ij} = 0$ or $(WH)_{ij} = 0$, the expression above is not well-defined. Strictly speaking, this form of the objective therefore cannot be called a bound-constrained problem, which is why the remainder of this article considers only the matrix-norm form.
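The well-definedness issue shows up immediately in code: a direct implementation has to guard the logarithm and the division against zero entries. A sketch (the eps smoothing is an ad-hoc choice for illustration, not part of the formulation):

```python
import numpy as np

def kl_loss(V, W, H, eps=1e-12):
    """Generalized KL divergence between V and WH.

    Without eps, V_ij = 0 or (WH)_ij = 0 makes the log or the division
    undefined, which is exactly the issue noted above.
    """
    WH = W @ H
    return np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH)
```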


3. Optimization Algorithms

The general algorithmic framework is the second loop in the figure below:
![Alt text](data:image,local://algo1.jpg)

3.1 Multiplicative Update Methods

The pseudocode is shown in the figure below:

![Alt text](data:image,local://mu.jpg)
(Figure from recommended reading 7.)

For Algorithm 1 to be well-defined, one must ensure that the denominators in (4) and (5) are strictly positive. Moreover, if $W_{ia}^{k} = 0$ at the $k$th iteration, then $W_{ia} = 0$ at all subsequent iterations. The following theorem discusses when this property holds:

Theorem 1. If $V$ has neither a zero column nor a zero row, and $W_{ia}^{1} > 0$ and $H_{bj}^{1} > 0,\ \forall i, a, b, j$, then

$$W_{ia}^{k} > 0\ \text{and}\ H_{bj}^{k} > 0,\quad \forall i, a, b, j,\ \forall k \ge 1$$

The overall cost of Algorithm 1 is

$$\#\text{iterations} \times O(mnk)$$

Note that although Lee and Seung (2001) [1] showed that the loss-function value is non-increasing after every update, Gonzalez and Zhang indicate that this alone does not establish convergence to a stationary point.
Therefore, the multiplicative update method still lacks firm optimization guarantees.
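For concreteness, a minimal NumPy sketch of the standard Lee-Seung multiplicative updates for the Frobenius objective (the small eps keeps the denominators strictly positive, matching the requirement above; the function name and defaults are my own):

```python
import numpy as np

def nmf_mu(A, k, n_iter=200, eps=1e-9, seed=0):
    """Multiplicative-update NMF (Lee & Seung, Frobenius-norm version)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Strictly positive initialization, as Theorem 1 assumes.
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        # H <- H * (W^T A) / (W^T W H); eps keeps the denominator positive.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        # W <- W * (A H^T) / (W H H^T)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Each sweep costs $O(mnk)$, consistent with the per-iteration cost stated above.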

3.2 Alternating Least Squares (ALS)

The pseudocode is as follows:

![Alt text](data:image,local://algo2.jpg)

ALS converges to a fixed point, which may be a local extremum or a saddle point. The solutions of Eqns. (9) and (10) can, for example, be computed using a QR factorization or an LU decomposition, or by computing the pseudo-inverse of the factor that is currently held fixed (a sketch appears at the end of this subsection).

Formally, ALS has the following convergence result:
Theorem 2. Any limit point of the sequence $\{W^k, H^k\}$ generated by Algorithm 2 is a stationary point of the loss function (1).

In summary, in contrast to Algorithm 1, which still lacks convergence results, ALS has nice optimization properties.
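A minimal ALS sketch along these lines, solving the two least-squares problems via pseudo-inverses and truncating negative entries to zero (one common variant; Algorithm 2's Eqns. (9) and (10) may differ in detail):

```python
import numpy as np

def nmf_als(A, k, n_iter=100, seed=0):
    """Alternating least squares NMF with truncation to nonnegativity."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(n_iter):
        # Fix W, solve min_H ||A - W H||_F via pinv(W), then truncate.
        H = np.clip(np.linalg.pinv(W) @ A, 0.0, None)
        # Fix H, solve min_W ||A - W H||_F via pinv(H), then truncate.
        W = np.clip(A @ np.linalg.pinv(H), 0.0, None)
    return W, H
```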

3.3 Projected Gradient (PG)

The projected gradient method for bound-constrained optimization is used for NMF in two situations:

  1. solving the alternating non-negative least squares problems:
    Analogously to ALS, one factor ($W$ or $H$) is updated while $A$ and the other factor are kept constant:

    $$H^{t+1} = P\big[H^{t} - \alpha_H \nabla_H f(W^{t}, H^{t})\big]$$
    $$W^{t+1} = P\big[W^{t} - \alpha_W \nabla_W f(W^{t}, H^{t+1})\big]$$

    Here $P[\cdot]$ denotes projection onto the non-negative orthant, and $\alpha_H$ and $\alpha_W$ are step-size parameters.
    The partial derivatives are $\nabla_H f(W^{t}, H^{t}) = (W^{t})^{T}(W^{t}H^{t} - A)$
    and $\nabla_W f(W^{t}, H^{t+1}) = (W^{t}H^{t+1} - A)(H^{t+1})^{T}$, respectively.

    This method is computationally very competitive and in many cases has better convergence properties than the standard MU approach (see the sketch after this list).

  2. minimizing the loss function directly:
    In this method, projected gradient is used to minimize the loss function directly. From the current solution $(W^{t}, H^{t})$, both matrices are simultaneously updated to $(W^{t+1}, H^{t+1})$ in the general form:

    $$(W^{t+1}, H^{t+1}) = P\big[(W^{t}, H^{t}) - \alpha \big(\nabla_W f(W^{t}, H^{t}),\ \nabla_H f(W^{t}, H^{t})\big)\big]$$

    where $\alpha$ is a common step size for both factors.
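A sketch of variant 1 with a fixed step size (practical implementations such as Lin's choose $\alpha_H$ and $\alpha_W$ by a backtracking line search; the constant $\alpha$ here only keeps the illustration short):

```python
import numpy as np

def nmf_pg(A, k, n_iter=500, alpha=1e-3, seed=0):
    """Projected gradient NMF: gradient step, then projection P[.] via clip."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(n_iter):
        # H^{t+1} = P[H^t - alpha * grad_H f(W^t, H^t)]
        H = np.clip(H - alpha * (W.T @ (W @ H - A)), 0.0, None)
        # W^{t+1} = P[W^t - alpha * grad_W f(W^t, H^{t+1})]
        W = np.clip(W - alpha * ((W @ H - A) @ H.T), 0.0, None)
    return W, H
```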

3.4 Termination Criteria

  • a fixed number of iterations; the appropriate number is problem-dependent.
  • the approximation accuracy of the NMF loss function; this depends on the size and structure of the data.
  • the relative change of the factors $W$ and $H$ from one iteration to the next; if the change falls below some pre-defined threshold $\delta$, terminate (a sketch of this check follows the list).
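The relative-change criterion might be implemented as follows (a sketch; the norm and the threshold $\delta$ are the user's choice):

```python
import numpy as np

def relative_change(X_new, X_old):
    """Relative change of a factor between consecutive iterations."""
    return np.linalg.norm(X_new - X_old) / max(np.linalg.norm(X_old), 1e-12)

# Inside the iteration loop, after computing W_new and H_new:
# if relative_change(W_new, W) < delta and relative_change(H_new, H) < delta:
#     break  # factors have stabilized
```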

Source code

  1. [LibMF](http://pan.baidu.com/s/1dDGaPQp)
  2. [Python Matrix Factorization Module](https://code.google.com/p/pymf/)

Recommended reading:

  1. [Lee D D, Seung H S. Algorithms for non-negative matrix factorization. Advances in Neural Information Processing Systems, 2001, 13: 556-562.](http://hebb.mit.edu/people/seung/papers/nmfconverge.pdf)
  2. [Janecek A, Grotthoff S S, Gansterer W N. libNMF -- A Library for Nonnegative Matrix Factorization. Computing and Informatics, 2012, 30(2): 205-224.](http://www.cai.sk/ojs/index.php/cai/article/view/161/136)
  3. [Non-negative Matrix Factorization (NMF) by Chih-Jen Lin](http://www.csie.ntu.edu.tw/~cjlin/nmf/)
  4. [Gradient Projection and Reduced Gradient Methods](http://www2.esm.vt.edu/~zgurdal/COURSES/4084/4084-Docs/LECTURES/GradProj.pdf)
  5. [Parallelism on the Nonnegative Matrix Factorization](http://artecs.dacya.ucm.es/sites/default/files/PARCO.pdf)
  6. [Matrix Factorization: A Simple Tutorial and Implementation in Python](http://www.quuxlabs.com/blog/2010/09/matrix-factorization-a-simple-tutorial-and-implementation-in-python/)
  7. [Lin C J. Projected gradient methods for nonnegative matrix factorization. Neural Computation, 2007, 19(10): 2756-2779.](http://www.csie.ntu.edu.tw/~cjlin/papers/pgradnmf.pdf)