Notes for Conjugate Gradient Method


The Conjugate Gradient Method is the most prominent iterative method for solving **large (sparse) systems of linear equations** of the form:

1:

$$Ax = b$$

2:

$$f(x) = \dfrac{1}{2}x^TAx - b^Tx + c$$

If $A$ is symmetric and positive-definite, then $f(x)$ is minimized by the solution of $Ax = b$ (if $A$ is not symmetric, the minimizer instead satisfies $\dfrac{1}{2}(A^T + A)x = b$).
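A small numerical check of this fact (a minimal sketch assuming NumPy; the $2\times2$ system is purely illustrative):

```python
# For a symmetric positive-definite A, the solution of Ax = b minimizes
# f(x) = 1/2 x^T A x - b^T x + c, and any perturbation e increases f by 1/2 e^T A e.
import numpy as np

A = np.array([[3.0, 2.0],
              [2.0, 6.0]])            # symmetric positive-definite
b = np.array([2.0, -8.0])
c = 0.0

def f(x):
    return 0.5 * x @ A @ x - b @ x + c

x_star = np.linalg.solve(A, b)        # the point satisfying Ax = b

rng = np.random.default_rng(0)
for _ in range(5):
    e = rng.standard_normal(2)        # random nonzero perturbation
    assert f(x_star + e) >= f(x_star)
print("f is minimized at x* =", x_star)
```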

#### The Solution to $Ax = b$ Minimizes the Quadratic Form

Suppose $A$ is symmetric. Let $x$ be a point that satisfies $Ax = b$, and let $e$ be an error term. Then:

$$
f(x+e) = \dfrac{1}{2}(x+e)^TA(x+e) - b^T(x+e) + c \\
= \dfrac{1}{2}x^TAx + e^TAx + \dfrac{1}{2}e^TAe - b^Tx - b^Te + c \\
= f(x) + \dfrac{1}{2}e^TAe
$$

If $A$ is positive-definite, then the latter term is positive for any $e \neq 0$; therefore $x$ minimizes $f(x)$.


The fact that $f(x)$ is a paraboloid is our best intuition of what it means for a matrix to be positive-definite. If $A$ is negative-definite (the result of negating a positive-definite matrix), the paraboloid opens downward and $f$ has a maximum instead of a minimum. $A$ could be singular, in which case the solution is not unique; the set of solutions is a line or a hyperplane having a uniform value of $f$. If $A$ is none of the above, then $x$ is a saddle point, and techniques like Steepest Descent and CG will likely fail. The values of $b$ and $c$ determine where the minimum point of the paraboloid lies, but do not affect the paraboloid's shape.


#### Steepest Descent Method

Since most optimization problems have no closed-form solution, they are usually tackled with iterative methods: starting at some initial point $x_{(0)}$, take a series of steps $x_{(1)}, x_{(2)}, \dots$ until the stopping condition is satisfied. Steepest Descent works this way.
A **line search** sets the step size: it chooses the $\alpha$ that minimizes $f$ along the update $x_{(i+1)} = x_{(i)} + \alpha r_{(i)}$, where $r_{(i)} = b - Ax_{(i)}$ is the residual at step $i$.
Altogether, the method of Steepest Descent is:

$$
r_{(i)} = b - Ax_{(i)}, \\
\alpha_{(i)} = \dfrac{r_{(i)}^Tr_{(i)}}{r_{(i)}^TAr_{(i)}}, \\
x_{(i+1)} = x_{(i)} + \alpha_{(i)}r_{(i)}
$$

The computational cost of Steepest Descent is dominated by matrix-vector products. There is a mathematical trick that avoids computing one of the matrix-vector products directly; see Equation 13 in [An Introduction to the Conjugate Gradient Method Without the Agonizing Pain](http://www.cs.cmu.edu/~quake-papers/painless-conjugate-gradient.pdf) for detail.
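A minimal sketch of the iteration above (assuming NumPy; function name and the small test system are illustrative), using the standard recursive residual update $r_{(i+1)} = r_{(i)} - \alpha_{(i)}Ar_{(i)}$ so only one matrix-vector product is needed per step:

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=1000):
    x = x0.astype(float).copy()
    r = b - A @ x                      # r_(0) = b - A x_(0)
    for _ in range(max_iter):
        if np.linalg.norm(r) < tol:
            break
        Ar = A @ r                     # the only matrix-vector product per step
        alpha = (r @ r) / (r @ Ar)     # exact line search along r
        x = x + alpha * r
        r = r - alpha * Ar             # recursive residual update
    return x

A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])
print(steepest_descent(A, b, np.zeros(2)))   # approaches np.linalg.solve(A, b)
```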

Convergence is measured by the reduction of the error in the energy norm, and is bounded in terms of the condition number $\kappa = \lambda_{\max}/\lambda_{\min}$ of $A$:

$$
\dfrac{f(x_{(i)})-f(x)}{f(x_{(0)})-f(x)}
= \dfrac{\dfrac{1}{2}e_{(i)}^TAe_{(i)}}{\dfrac{1}{2}e_{(0)}^TAe_{(0)}}
\le \left(\dfrac{\kappa-1}{\kappa+1}\right)^{2i}
$$
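An illustrative check of this bound on the same small system (a sketch only; the per-step loop repeats the Steepest Descent update from above):

```python
import numpy as np

A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])
x_star = np.linalg.solve(A, b)
lo, hi = np.linalg.eigvalsh(A)[[0, -1]]
kappa = hi / lo                                  # condition number lambda_max / lambda_min

x = np.zeros(2)
e0 = x - x_star
for i in range(1, 6):
    r = b - A @ x
    alpha = (r @ r) / (r @ (A @ r))
    x = x + alpha * r
    e = x - x_star
    ratio = (e @ A @ e) / (e0 @ A @ e0)          # = (f(x_i)-f(x)) / (f(x_0)-f(x))
    bound = ((kappa - 1) / (kappa + 1)) ** (2 * i)
    print(f"i={i}  ratio={ratio:.3e}  bound={bound:.3e}  ok={ratio <= bound}")
```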



#### The Method of Conjugate Directions

Steepest Descent often finds itself taking steps in the same direction as earlier steps (recall the zigzag path, which appears because each gradient is orthogonal to the previous one). Conjugate Directions updates each step via:

$$
x_{(i+1)} = x_{(i)} + \alpha_{(i)}d_{(i)}
$$

Here $d_{(0)}, d_{(1)}, \dots, d_{(n-1)}$ is a set of orthogonal *search directions*. The idea is to pick $\alpha_{(i)}$ so that the error $e_{(i+1)} = x_{(i+1)} - x$ is orthogonal to $d_{(i)}$, so that we never need to step in the direction of $d_{(i)}$ again.
However, this requires the solution $x$, which is unknown. The fix is to make the search directions $A$-orthogonal instead of orthogonal. Two vectors $d_{(i)}$ and $d_{(j)}$ are $A$-orthogonal, or *conjugate*, if

$$
d_{(i)}^TAd_{(j)} = 0
$$

Requiring that $e_{(i+1)} = e_{(i)} + \alpha_{(i)}d_{(i)}$ be $A$-orthogonal to $d_{(i)}$, and using $r_{(i)} = -Ae_{(i)}$, the expression for $\alpha_{(i)}$ becomes:

$$
\alpha_{(i)} = -\dfrac{d_{(i)}^TAe_{(i)}}{d_{(i)}^TAd_{(i)}} = \dfrac{d_{(i)}^Tr_{(i)}}{d_{(i)}^TAd_{(i)}}
$$

If the search vector were the residual, then this formula would be identical to Steepest Descent.

**After $n$ iterations, every component of the error term has been cut away, and $e_{(n)} = 0$:**

$$
e_{(i)} = e_{(0)} + \sum_{j=0}^{i-1}\alpha_{(j)}d_{(j)} \\
= \sum_{j=0}^{n-1}\delta_{(j)}d_{(j)} - \sum_{j=0}^{i-1}\delta_{(j)}d_{(j)} \\
= \sum_{j=i}^{n-1}\delta_{(j)}d_{(j)}
$$

Here $\delta_{(j)}$ are the coefficients of the initial error $e_{(0)}$ expanded in the basis of search directions, and it turns out that $\delta_{(j)} = -\alpha_{(j)}$.

The set of $A$-orthogonal search directions $\{d_{(i)}\}$ can be found by a *conjugate Gram-Schmidt process*; see Equation 36 in [An Introduction to the Conjugate Gradient Method Without the Agonizing Pain](http://www.cs.cmu.edu/~quake-papers/painless-conjugate-gradient.pdf) for detail.
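A minimal sketch of the conjugate Gram-Schmidt idea (assuming NumPy; the function name and the choice of the coordinate axes as starting vectors are illustrative):

```python
# Start from n linearly independent vectors u_0, ..., u_{n-1} and subtract from
# each u_i its components along the previously built directions, measured in the
# A-inner product, leaving a set of mutually A-orthogonal directions.
import numpy as np

def conjugate_gram_schmidt(A, U):
    directions = []
    for u in U:
        d = u.astype(float).copy()
        for d_prev in directions:
            beta = (u @ A @ d_prev) / (d_prev @ A @ d_prev)   # A-projection coefficient
            d -= beta * d_prev
        directions.append(d)
    return directions

A = np.array([[3.0, 2.0], [2.0, 6.0]])
d0, d1 = conjugate_gram_schmidt(A, np.eye(2))
print(d0 @ A @ d1)   # ~0: the two directions are A-orthogonal
```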


#### The Method of Conjugate Gradients
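The Conjugate Gradient method is the method of Conjugate Directions in which the search directions are built by conjugating the residuals themselves ($u_{(i)} = r_{(i)}$). The Gram-Schmidt conjugation then collapses to a single term per iteration, so only the most recent search direction needs to be kept; see [An Introduction to the Conjugate Gradient Method Without the Agonizing Pain](http://www.cs.cmu.edu/~quake-papers/painless-conjugate-gradient.pdf) for the derivation. A minimal sketch of the resulting standard iteration (assuming NumPy; names are illustrative):

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10, max_iter=None):
    x = x0.astype(float).copy()
    r = b - A @ x                 # residual, also the initial search direction
    d = r.copy()
    rr = r @ r
    max_iter = max_iter or len(b)
    for _ in range(max_iter):
        if np.sqrt(rr) < tol:
            break
        Ad = A @ d                          # one matrix-vector product per iteration
        alpha = rr / (d @ Ad)               # step length along d
        x = x + alpha * d
        r = r - alpha * Ad                  # recursive residual update
        rr_new = r @ r
        beta = rr_new / rr                  # single surviving Gram-Schmidt constant
        d = r + beta * d                    # new search direction
        rr = rr_new
    return x

A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])
print(conjugate_gradient(A, b, np.zeros(2)))   # matches np.linalg.solve(A, b)
```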

### References

1. [An Introduction to the Conjugate Gradient Method Without the Agonizing Pain](http://www.cs.cmu.edu/~quake-papers/painless-conjugate-gradient.pdf)
2. [Conjugate Gradient Method](http://www.stanford.edu/class/ee364b/lectures/conj_grad_slides.pdf)
3. [Nonlinear Conjugate Gradient Methods](ftp://www.cc.ac.cn/pub/dyh/papers/CGoverview.pdf)
4. [Gram-Schmidt process example](https://www.khanacademy.org/math/linear-algebra/alternate_bases/orthonormal_basis/v/linear-algebra--gram-schmidt-process-example)
5. [Special Symbols](http://www.math.harvard.edu/texman/node21.html#SECTION00084000000000000000)

