Non-negative Matrix Factorization
1 Problem Definition
Given a matrix $A \in \mathbb{R}^{m \times n}$ with $a_{ij} \ge 0$ and a pre-specified positive integer $k \ll \min(m, n)$, NMF factorizes $A$ into two non-negative matrices $W \in \mathbb{R}^{m \times k}$ and $H \in \mathbb{R}^{k \times n}$, i.e.,

$$A \approx WH,$$

where $k$ is the rank of the approximation $WH$.
(Figure omitted: schematic illustration of the factorization $A \approx WH$.)
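To make the dimensions concrete, here is a minimal sketch using scikit-learn's `NMF` class (an assumed dependency, not part of the original article; any NMF implementation would do):

```python
import numpy as np
from sklearn.decomposition import NMF

A = np.random.default_rng(0).random((100, 40))  # non-negative data, m=100, n=40

model = NMF(n_components=5, init='random', random_state=0)  # k = 5 << min(m, n)
W = model.fit_transform(A)  # W has shape (100, 5)
H = model.components_       # H has shape (5, 40)

print(np.linalg.norm(A - W @ H, 'fro'))  # reconstruction error of A ~= W H
```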
2 Forms of the Objective Function
The NMF objective function is formulated as a non-linear optimization problem:
2.1 It can be defined in matrix-norm form:

$$\min_{W \ge 0,\, H \ge 0} f(W, H) = \frac{1}{2}\lVert A - WH \rVert_F^2 \tag{1}$$

where $\lVert \cdot \rVert_F$ is the Frobenius norm, $\lVert A \rVert_F = \big(\sum_{ij} \lvert a_{ij} \rvert^2\big)^{1/2}$.
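As a reference for the implementations below, the loss (1) is one line of numpy (a trivial sketch):

```python
import numpy as np

def frob_loss(A, W, H):
    """NMF loss (1): half the squared Frobenius norm of the residual."""
    return 0.5 * np.linalg.norm(A - W @ H, 'fro') ** 2
```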
2.2 It can also be defined in the form of the Kullback-Leibler (KL) divergence:

$$D(V \,\Vert\, WH) = \sum_{ij} \Big( V_{ij} \log \frac{V_{ij}}{(WH)_{ij}} - V_{ij} + (WH)_{ij} \Big)$$

If $V_{ij} = 0$ or $(WH)_{ij} = 0$, the expression above is not well-defined. Strictly speaking, this form of the objective is therefore not a bound-constrained problem, so the remainder of this article considers only the matrix-norm form.
3 Optimization Algorithms
The generic algorithmic framework (shown as the second loop in a figure omitted here) alternates between updating $W$ and $H$ until a termination criterion is met.
3.1 Multiplicative Update Methods
The pseudocode (from recommended reading 7) iterates the multiplicative updates

$$W_{ia} \leftarrow W_{ia}\frac{(VH^T)_{ia}}{(WHH^T)_{ia}}, \tag{4}$$

$$H_{bj} \leftarrow H_{bj}\frac{(W^TV)_{bj}}{(W^TWH)_{bj}}, \tag{5}$$

where $V$ denotes the non-negative data matrix (the $A$ of Section 1).
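Since the pseudocode figure is not reproduced here, below is a minimal numpy sketch of Algo1 (a hypothetical implementation; the small `eps` keeps the denominators in (4) and (5) strictly positive):

```python
import numpy as np

def nmf_mu(V, k, n_iter=200, eps=1e-9, seed=0):
    """Multiplicative-update NMF for the Frobenius-norm loss (1)."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k)) + eps   # strictly positive init, cf. Theorem 1
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update (4)
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update (5)
    return W, H
```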
To have Algo1 well-defined, one must ensure that the denominators in (4) and (5) are strictly positive. Moreover, if $W^k_{ia} = 0$ at the $k$-th iteration, then $W_{ia} = 0$ at all subsequent iterations. The following theorem gives conditions under which all entries remain strictly positive:
Theorem 1 If $V$ has neither a zero column nor a zero row, and $W^1_{ia} > 0$ and $H^1_{bj} > 0$, $\forall i, a, b, j$, then $W^k_{ia} > 0$ and $H^k_{bj} > 0$, $\forall i, a, b, j$ and all $k \ge 1$.
The overall cost of Algo1 is $O(mnk)$ per iteration, i.e., #iterations $\times\, O(mnk)$.
Note that although Lee and Seung [1] have shown that the loss value is non-increasing after every update, their further claim that the iterates converge to a stationary point is wrong, as Gonzalez and Zhang point out: a non-increasing loss alone does not imply convergence to a stationary point.
Therefore, the multiplicative update method still lacks sound optimization properties.
3.2 Alternating Least Squares (ALS)
The pseudocode alternates between solving the two least-squares systems

$$W^T W H = W^T A \tag{9}$$

$$H H^T W^T = H A^T \tag{10}$$

and setting all negative elements of the respective solution to zero.
ALS will converge to a fixed point, which may be a local extremum or a saddle point. The solutions of Eqns. (9) and (10) can, for example, be computed using a QR factorization or an LU decomposition, or by computing the pseudoinverse of W and H, respectively.
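A minimal numpy sketch of Algo2 under these assumptions (using `numpy.linalg.lstsq` for the least-squares solves; a QR- or pseudoinverse-based solver would work equally well):

```python
import numpy as np

def nmf_als(A, k, n_iter=100, seed=0):
    """Alternating least squares NMF: exact LS solves plus projection to >= 0."""
    rng = np.random.default_rng(seed)
    W = rng.random((A.shape[0], k))
    H = None
    for _ in range(n_iter):
        H = np.linalg.lstsq(W, A, rcond=None)[0]       # solve (9): W H ~= A for H
        H = np.maximum(H, 0)                           # zero out negative entries
        Wt = np.linalg.lstsq(H.T, A.T, rcond=None)[0]  # solve (10): H^T W^T ~= A^T
        W = np.maximum(Wt.T, 0)                        # zero out negative entries
    return W, H
```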
Formally, ALS has the following convergence result:
Theorem 2 Any limit point of the sequence $\{W^k, H^k\}$ generated by Algo2 is a stationary point of the loss function (1).
In summary, contrary to Algo1, which still lacks convergence results, ALS has nice optimization properties.
3.3 Projected Gradient (PG)
The projected gradient method for bound-constrained optimization can be used for NMF in two ways:
solving the alternating non-negative least squares problems:
Analogously to ALS, one factor ($W$ or $H$) is updated while $A$ and the other factor are kept constant:

$$H^{t+1} = H^t - \alpha_H \nabla_H f(W^t, H^t)$$

$$W^{t+1} = W^t - \alpha_W \nabla_W f(W^t, H^{t+1})$$

where $\alpha_H$ and $\alpha_W$ are step-size parameters and each update is followed by projecting negative entries to zero. The partial derivatives are

$$\nabla_H f(W^t, H^t) = (W^t)^T (W^t H^t - A)$$

and

$$\nabla_W f(W^t, H^{t+1}) = (W^t H^{t+1} - A)\,(H^{t+1})^T,$$

respectively. This method is computationally very competitive and in many cases has better convergence properties than the standard MU approach (see the sketch at the end of this subsection).
minimizing the loss function directly:
In this method, projected gradient is used to minimize the loss function directly. From the current solution $(W^t, H^t)$, both matrices are updated simultaneously to $(W^{t+1}, H^{t+1})$ in the general form

$$(W^{t+1}, H^{t+1}) = (W^t, H^t) - \alpha \big(\nabla_W f(W^t, H^t),\, \nabla_H f(W^t, H^t)\big),$$

again followed by projection onto the non-negative orthant.
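A minimal numpy sketch of both variants (hypothetical helpers with fixed step sizes; Lin's paper, recommended reading 7, instead selects the step size by a line search):

```python
import numpy as np

def pg_anls_step(A, W, H, alpha_H=1e-3, alpha_W=1e-3):
    """Variant 1: alternating projected-gradient updates of H, then W."""
    H = np.maximum(H - alpha_H * (W.T @ (W @ H - A)), 0)  # step on H, project
    W = np.maximum(W - alpha_W * ((W @ H - A) @ H.T), 0)  # step on W, project
    return W, H

def pg_direct_step(A, W, H, alpha=1e-3):
    """Variant 2: one simultaneous projected-gradient step on (W, H)."""
    R = W @ H - A                                 # current residual W H - A
    W_new = np.maximum(W - alpha * (R @ H.T), 0)  # gradient w.r.t. W, project
    H_new = np.maximum(H - alpha * (W.T @ R), 0)  # gradient w.r.t. H, project
    return W_new, H_new
```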
3.4 Termination Criteria
- a fixed number of iterations; the number is problem-dependent.
- the approximation accuracy of the NMF loss function; this depends on the size and structure of the data.
- the relative change of the factors W and H from one iteration to the next; if the change is below some pre-defined threshold δ, the algorithm terminates, as sketched below.
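A minimal sketch of the relative-change criterion (a hypothetical helper; `delta` plays the role of the threshold δ above):

```python
import numpy as np

def factors_converged(W, W_prev, H, H_prev, delta=1e-4):
    """True when both factors moved by less than delta in relative Frobenius norm."""
    dW = np.linalg.norm(W - W_prev) / max(np.linalg.norm(W_prev), 1e-12)
    dH = np.linalg.norm(H - H_prev) / max(np.linalg.norm(H_prev), 1e-12)
    return dW < delta and dH < delta
```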
Source code
- LibMF
- Python Matrix Factorization Module
Recommended reading:
- Lee D D, Seung H S. Algorithms for non-negative matrix factorization[J]. Advances in Neural Information Processing Systems, 2001, 13: 556-562.
- Janecek A, Grotthoff S S, Gansterer W N. libNMF--A Library for Nonnegative Matrix Factorization[J]. Computing and Informatics, 2012, 30(2): 205-224.
- Non-negative Matrix Factorization (NMF) by Chih-Jen Lin
- Gradient Projection and Reduced Gradient Methods
- Parallelism on the Nonnegative Matrix Factorization
- Matrix Factorization: A Simple Tutorial and Implementation in Python
- Lin C J. Projected gradient methods for nonnegative matrix factorization[J]. Neural computation, 2007, 19(10): 2756-2779.