Non-negative Matrix Factorization
1 Problem Definition
Given a matrix $A \in \mathbb{R}^{m \times n}$ with $a_{ij} \ge 0$ and a pre-specified positive integer $k \ll \min(m, n)$, NMF factorizes $A$ into two non-negative matrices $W \in \mathbb{R}^{m \times k}$ and $H \in \mathbb{R}^{k \times n}$, i.e.,

$$A \approx WH,$$

where $k$ is the rank of the approximation $WH$.
(Figure omitted: schematic illustration of the factorization $A \approx WH$.)
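To make the dimensions concrete, here is a minimal sketch using scikit-learn's `NMF` class (an assumed dependency, not part of the original article; any NMF implementation would do):

```python
import numpy as np
from sklearn.decomposition import NMF

A = np.random.default_rng(0).random((100, 40))  # non-negative data, m=100, n=40

model = NMF(n_components=5, init='random', random_state=0)  # k = 5 << min(m, n)
W = model.fit_transform(A)  # W has shape (100, 5)
H = model.components_       # H has shape (5, 40)

print(np.linalg.norm(A - W @ H, 'fro'))  # reconstruction error of A ~= W H
```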
2 Forms of the Objective Function
The NMF objective function is formulated as a non-linear optimization problem:
2.1 It can be defined in matrix-norm form:

$$\min_{W \ge 0,\, H \ge 0} f(W, H) = \frac{1}{2}\lVert A - WH \rVert_F^2 \tag{1}$$

where $\lVert \cdot \rVert_F$ is the Frobenius norm, $\lVert A \rVert_F = \big(\sum_{ij} \lvert a_{ij} \rvert^2\big)^{1/2}$.
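As a reference for the implementations below, the loss (1) is one line of numpy (a trivial sketch):

```python
import numpy as np

def frob_loss(A, W, H):
    """NMF loss (1): half the squared Frobenius norm of the residual."""
    return 0.5 * np.linalg.norm(A - W @ H, 'fro') ** 2
```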
2.2 It can also be defined in the form of the Kullback-Leibler (KL) divergence:

$$D(V \,\Vert\, WH) = \sum_{ij} \Big( V_{ij} \log \frac{V_{ij}}{(WH)_{ij}} - V_{ij} + (WH)_{ij} \Big)$$

If $V_{ij} = 0$ or $(WH)_{ij} = 0$, the expression above is not well-defined. Strictly speaking, this form of the objective is therefore not a bound-constrained problem, so the remainder of this article considers only the matrix-norm form.
3 Optimization Algorithms
The generic algorithmic framework (shown as the second loop in a figure omitted here) alternates between updating $W$ and $H$ until a termination criterion is met.
3.1 Multiplicative Update Methods
The pseudocode (from recommended reading 7) iterates the multiplicative updates

$$W_{ia} \leftarrow W_{ia}\frac{(VH^T)_{ia}}{(WHH^T)_{ia}}, \tag{4}$$

$$H_{bj} \leftarrow H_{bj}\frac{(W^TV)_{bj}}{(W^TWH)_{bj}}, \tag{5}$$

where $V$ denotes the non-negative data matrix (the $A$ of Section 1).
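Since the pseudocode figure is not reproduced here, below is a minimal numpy sketch of Algo1 (a hypothetical implementation; the small `eps` keeps the denominators in (4) and (5) strictly positive):

```python
import numpy as np

def nmf_mu(V, k, n_iter=200, eps=1e-9, seed=0):
    """Multiplicative-update NMF for the Frobenius-norm loss (1)."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k)) + eps   # strictly positive init, cf. Theorem 1
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update (4)
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update (5)
    return W, H
```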
To have Algo1 well-defined, one must ensure that the denominators in (4) and (5) are strictly positive. Moreover, if $W^k_{ia} = 0$ at the $k$-th iteration, then $W_{ia} = 0$ at all subsequent iterations. The following theorem gives conditions under which all entries remain strictly positive:
Theorem 1 If $V$ has neither a zero column nor a zero row, and $W^1_{ia} > 0$ and $H^1_{bj} > 0$, $\forall i, a, b, j$, then $W^k_{ia} > 0$ and $H^k_{bj} > 0$, $\forall i, a, b, j$ and all $k \ge 1$.
The overall cost of Algo1 is $O(mnk)$ per iteration, i.e., #iterations $\times\, O(mnk)$.
Note that although Lee and Seung [1] have shown that the loss value is non-increasing after every update, their further claim that the iterates converge to a stationary point is wrong, as Gonzalez and Zhang point out: a non-increasing loss alone does not imply convergence to a stationary point.
Therefore, the multiplicative update method still lacks sound optimization properties.
3.2 Alternating Least Squares (ALS)
The pseudocode alternates between solving the two least-squares systems

$$W^T W H = W^T A \tag{9}$$

$$H H^T W^T = H A^T \tag{10}$$

and setting all negative elements of the respective solution to zero.
ALS will converge to a fixed point, which may be a local extremum or a saddle point. The solutions of Eqns. (9) and (10) can, for example, be computed using a QR factorization or an LU decomposition, or by computing the pseudoinverse of W and H, respectively.
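A minimal numpy sketch of Algo2 under these assumptions (using `numpy.linalg.lstsq` for the least-squares solves; a QR- or pseudoinverse-based solver would work equally well):

```python
import numpy as np

def nmf_als(A, k, n_iter=100, seed=0):
    """Alternating least squares NMF: exact LS solves plus projection to >= 0."""
    rng = np.random.default_rng(seed)
    W = rng.random((A.shape[0], k))
    H = None
    for _ in range(n_iter):
        H = np.linalg.lstsq(W, A, rcond=None)[0]       # solve (9): W H ~= A for H
        H = np.maximum(H, 0)                           # zero out negative entries
        Wt = np.linalg.lstsq(H.T, A.T, rcond=None)[0]  # solve (10): H^T W^T ~= A^T
        W = np.maximum(Wt.T, 0)                        # zero out negative entries
    return W, H
```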
Formally, ALS has the following convergence result:
Theorem 2 Any limit point of the sequence $\{W^k, H^k\}$ generated by Algo2 is a stationary point of the loss function (1).
In summary, contrary to Algo1, which still lacks convergence results, ALS has nice optimization properties.
3.3 Projected Gradient (PG)
The projected gradient method for bound-constrained optimization can be used for NMF in two ways:
solving the alternating non-negative least squares problems:
Analogously to ALS, one factor ($W$ or $H$) is updated while $A$ and the other factor are kept constant:

$$H^{t+1} = H^t - \alpha_H \nabla_H f(W^t, H^t)$$

$$W^{t+1} = W^t - \alpha_W \nabla_W f(W^t, H^{t+1})$$

where $\alpha_H$ and $\alpha_W$ are step-size parameters and each update is followed by projecting negative entries to zero. The partial derivatives are

$$\nabla_H f(W^t, H^t) = (W^t)^T (W^t H^t - A)$$

and

$$\nabla_W f(W^t, H^{t+1}) = (W^t H^{t+1} - A)\,(H^{t+1})^T,$$

respectively. This method is computationally very competitive and in many cases has better convergence properties than the standard MU approach (see the sketch at the end of this subsection).
minimizing the loss function directly:
In this method, projected gradient is used to minimize the loss function directly. From the current solution $(W^t, H^t)$, both matrices are updated simultaneously to $(W^{t+1}, H^{t+1})$ in the general form

$$(W^{t+1}, H^{t+1}) = (W^t, H^t) - \alpha \big(\nabla_W f(W^t, H^t),\, \nabla_H f(W^t, H^t)\big),$$

again followed by projection onto the non-negative orthant.
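A minimal numpy sketch of both variants (hypothetical helpers with fixed step sizes; Lin's paper, recommended reading 7, instead selects the step size by a line search):

```python
import numpy as np

def pg_anls_step(A, W, H, alpha_H=1e-3, alpha_W=1e-3):
    """Variant 1: alternating projected-gradient updates of H, then W."""
    H = np.maximum(H - alpha_H * (W.T @ (W @ H - A)), 0)  # step on H, project
    W = np.maximum(W - alpha_W * ((W @ H - A) @ H.T), 0)  # step on W, project
    return W, H

def pg_direct_step(A, W, H, alpha=1e-3):
    """Variant 2: one simultaneous projected-gradient step on (W, H)."""
    R = W @ H - A                                 # current residual W H - A
    W_new = np.maximum(W - alpha * (R @ H.T), 0)  # gradient w.r.t. W, project
    H_new = np.maximum(H - alpha * (W.T @ R), 0)  # gradient w.r.t. H, project
    return W_new, H_new
```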
3.4 Termination Criteria
- a fixed number of iterations; the number is problem-dependent.
- the approximation accuracy of the NMF loss function; this depends on the size and structure of the data.
- the relative change of the factors W and H from one iteration to the next; if the change is below some pre-defined threshold δ, the algorithm terminates, as sketched below.
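A minimal sketch of the relative-change criterion (a hypothetical helper; `delta` plays the role of the threshold δ above):

```python
import numpy as np

def factors_converged(W, W_prev, H, H_prev, delta=1e-4):
    """True when both factors moved by less than delta in relative Frobenius norm."""
    dW = np.linalg.norm(W - W_prev) / max(np.linalg.norm(W_prev), 1e-12)
    dH = np.linalg.norm(H - H_prev) / max(np.linalg.norm(H_prev), 1e-12)
    return dW < delta and dH < delta
```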
Source code
- LibMF
- Python Matrix Factorization Module
Recommended reading:
- Lee D D, Seung H S. Algorithms for non-negative matrix factorization[J]. Advances in Neural Information Processing Systems, 2001, 13: 556-562.
- Janecek A, Grotthoff S S, Gansterer W N. libNMF--A Library for Nonnegative Matrix Factorization[J]. Computing and Informatics, 2012, 30(2): 205-224.
- Non-negative Matrix Factorization (NMF) by Chih-Jen Lin
- Gradient Projection and Reduced Gradient Methods
- Parallelism on the Nonnegative Matrix Factorization
- Matrix Factorization: A Simple Tutorial and Implementation in Python
- Lin C J. Projected gradient methods for nonnegative matrix factorization[J]. Neural computation, 2007, 19(10): 2756-2779.