Solution algorithms
Newton method
Its practicability depends on the problem size: each iteration requires the solution of the linear system
$$\nabla^2 f(x_k)\, d_k = -\nabla f(x_k),$$
followed by the update $x_{k+1} = x_k + d_k$.
It converges locally quadratically to a critical point $x^*$ under some assumptions ($\nabla^2 f$ is Lipschitz continuous around $x^*$ and $\nabla^2 f(x^*)$ is SPD).
It is possible to make the convergence global under mild assumptions with trust-region techniques; line search is also possible.
If feasible (size, computational cost), this is a method of choice.
It is impracticable for ocean and atmosphere data assimilation problems.
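As an illustration, here is a minimal sketch of the Newton iteration in Python; the test function, tolerance, and starting point are illustrative choices, not taken from these notes.

```python
import numpy as np

def newton(grad, hess, x0, tol=1e-10, max_iter=50):
    """Newton iteration: solve hess(x_k) d_k = -grad(x_k), update x_{k+1} = x_k + d_k."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:       # stop at an approximate critical point
            break
        d = np.linalg.solve(hess(x), -g)  # the linear system solved at each iteration
        x = x + d
    return x

# Illustrative test: f(x) = x1^4 + x1*x2 + (1 + x2)^2
grad = lambda x: np.array([4 * x[0]**3 + x[1], x[0] + 2 * (1 + x[1])])
hess = lambda x: np.array([[12 * x[0]**2, 1.0], [1.0, 2.0]])
print(newton(grad, hess, [0.75, -1.25]))
```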
Quasi-Newton Method
Several variants are available (SR1, PSB, DFP, BFGS, ...).
For BFGS, the matrix $\nabla^2 f(x_k)$ is not computed; instead, its inverse is approximated by a matrix $H_k$ built from the secant equation:
$$H_{k+1} y_k = s_k, \qquad s_k = x_{k+1} - x_k, \quad y_k = \nabla f(x_{k+1}) - \nabla f(x_k).$$
The matrix $H_{k+1}$ is the closest matrix to $H_k$, for some weighted norm, among those satisfying the secant equation.
Updating formula:
$$H_{k+1} = \left(I - \rho_k s_k y_k^T\right) H_k \left(I - \rho_k y_k s_k^T\right) + \rho_k s_k s_k^T, \qquad \rho_k = \frac{1}{y_k^T s_k}.$$
Limited-memory implementation: $H_k$ is kept in factored form and only the last $m$ pairs $(s_i, y_i)$ are taken into account. Examples: L-BFGS, M1QN3.
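The following sketch shows how the limited-memory idea works in practice: the classical L-BFGS two-loop recursion applies $H_k$ to a vector using only the stored pairs $(s_i, y_i)$, without ever forming $H_k$. Variable names and the initial scaling are illustrative assumptions.

```python
import numpy as np

def lbfgs_direction(g, s_list, y_list):
    """Return H_k @ g using the stored pairs s_i = x_{i+1}-x_i, y_i = grad_{i+1}-grad_i."""
    q = g.copy()
    alphas = []
    rhos = [1.0 / (y @ s) for s, y in zip(s_list, y_list)]
    # First loop: backward through the stored pairs (newest to oldest).
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        alpha = rho * (s @ q)
        alphas.append(alpha)
        q -= alpha * y
    # Initial scaling H_k^0 = gamma * I (a common heuristic choice).
    s, y = s_list[-1], y_list[-1]
    q *= (s @ y) / (y @ y)
    # Second loop: forward through the stored pairs (oldest to newest).
    for (s, y, rho), alpha in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        beta = rho * (y @ q)
        q += (alpha - beta) * s
    return q  # the quasi-Newton search direction is then -q

# Tiny usage example: one stored pair, gradient g.
g = np.array([1.0, 2.0])
s_list = [np.array([0.2, -0.1])]
y_list = [np.array([0.4, -0.1])]
print(lbfgs_direction(g, s_list, y_list))
```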
Quasi-Newton method for data assimilation
Gradient computation: for
$$J(x) = \frac{1}{2} \left\| x - x_b \right\|_{B^{-1}}^2 + \frac{1}{2} \sum_{i=0}^{N} \left\| \mathcal{H}_i(\mathcal{M}_i(x)) - y_i \right\|_{R_i^{-1}}^2,$$
one has
$$\nabla J(x) = B^{-1}(x - x_b) + \sum_{i=0}^{N} \mathbf{M}_i^T \mathbf{H}_i^T R_i^{-1} \left( \mathcal{H}_i(\mathcal{M}_i(x)) - y_i \right),$$
where $\mathbf{M}_i$ and $\mathbf{H}_i$ are the Jacobian matrices of $\mathcal{M}_i$ and $\mathcal{H}_i$, evaluated at $x$ and at $\mathcal{M}_i(x)$ respectively.
Computing and storing $\mathbf{M}_i$ and $\mathbf{H}_i$ at all steps is not feasible in large-scale data assimilation.
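To illustrate why the Jacobians need never be formed at every step, here is a hedged sketch of the gradient for a toy problem, assuming (purely for illustration) a linear model $x_{i+1} = M x_i$ and a linear observation operator $H$, so that the Jacobians are the constant matrices $M$ and $H$; an adjoint (backward) sweep accumulates the sum.

```python
import numpy as np

def grad_4dvar(x0, xb, Binv, M, H, Rinv, obs):
    """Gradient of J(x0) = 1/2 |x0-xb|^2_{B^-1} + 1/2 sum_i |H x_i - y_i|^2_{R^-1}."""
    # Forward sweep: propagate the state and store the trajectory.
    xs = [x0]
    for _ in range(len(obs) - 1):
        xs.append(M @ xs[-1])
    # Backward (adjoint) sweep: lam_i = H^T R^-1 (H x_i - y_i) + M^T lam_{i+1}.
    lam = np.zeros_like(x0)
    for x_i, y_i in zip(reversed(xs), reversed(obs)):
        lam = H.T @ (Rinv @ (H @ x_i - y_i)) + M.T @ lam
    return Binv @ (x0 - xb) + lam

# Tiny usage example with a 2-variable state and 3 observation times.
rng = np.random.default_rng(0)
M = np.array([[0.9, 0.1], [0.0, 0.8]]); H = np.eye(2)
Binv = np.eye(2); Rinv = np.eye(2)
xb = np.zeros(2); obs = [rng.standard_normal(2) for _ in range(3)]
print(grad_4dvar(xb.copy(), xb, Binv, M, H, Rinv, obs))
```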
The Gauss-Newton algorithm
For the nonlinear least-squares problem $\min_x \frac{1}{2}\|F(x)\|^2$, the Gauss-Newton algorithm solves a sequence of linear least-squares problems
$$\delta x_k = \arg\min_{\delta x} \frac{1}{2} \left\| F'(x_k)\, \delta x + F(x_k) \right\|^2$$
and updates $x_{k+1} = x_k + \delta x_k$.
It can be seen as a stationary iteration $x_{k+1} = G(x_k)$ with
$$G(x) = x - \left( F'(x)^T F'(x) \right)^{-1} F'(x)^T F(x).$$
It does not require second-order information. The linear least-squares problem can be solved by a direct or an iterative method (truncated Gauss-Newton method).
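A minimal sketch of the resulting iteration, assuming $F$ and its Jacobian are available as callables; numpy's direct least-squares solver stands in for whatever inner solver is used in practice.

```python
import numpy as np

def gauss_newton(F, J, x0, n_iter=20):
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        # Solve the linear least-squares problem min_dx |J(x_k) dx + F(x_k)|.
        dx, *_ = np.linalg.lstsq(J(x), -F(x), rcond=None)
        x = x + dx  # update x_{k+1} = x_k + dx_k
    return x

# Illustrative zero-residual example: F(x) = (x1 - 1, 10*(x2 - x1^2)).
F = lambda x: np.array([x[0] - 1.0, 10.0 * (x[1] - x[0]**2)])
J = lambda x: np.array([[1.0, 0.0], [-20.0 * x[0], 10.0]])
print(gauss_newton(F, J, [-1.2, 1.0]))   # approaches (1, 1)
```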
Local convergence of stationary iterations
Definition:
The process
$$x_{k+1} = G(x_k), \qquad k = 0, 1, \ldots \tag{1}$$
is locally convergent to $x^*$ if there exists $\varepsilon > 0$ such that any sequence $(x_k)$ generated by (1) converges to $x^*$ whenever $\|x_0 - x^*\| \le \varepsilon$. In this case, $x^*$ is called a point of attraction.
If $x^*$ is a point of attraction of (1) and if $G$ is continuous at $x^*$, then $G(x^*) = x^*$, that is, $x^*$ is a fixed point of $G$.
In the case where $x^*$ is a fixed point of $G$ but not a point of attraction of (1), the sequence $(x_k)$ may behave in a variety of ways:
it may diverge, cycle, or even exhibit a chaotic behaviour.
Ostrowski's sufficient convergence condition
Let $A \in \mathbb{R}^{n \times n}$, $b \in \mathbb{R}^n$, and $G(x) = Ax + b$. We denote by $I$ the identity matrix of order $n$.
It is well known that, if $\rho(A) < 1$, the iteration $x_{k+1} = G(x_k)$ converges to $x^* = (I - A)^{-1} b$ for any $x_0$. In particular, $x^*$ is a point of attraction of (1).
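A quick numerical sanity check of the linear case; the matrix and vectors below are arbitrary illustrative choices.

```python
import numpy as np

# When rho(A) < 1, x_{k+1} = A x_k + b converges to x* = (I - A)^{-1} b
# from any starting point x_0.
A = np.array([[0.5, 0.3], [0.0, 0.4]])
b = np.array([1.0, 2.0])
print("rho(A) =", max(abs(np.linalg.eigvals(A))))   # 0.5 < 1

x_star = np.linalg.solve(np.eye(2) - A, b)
x = np.array([100.0, -50.0])                        # far-away start
for _ in range(100):
    x = A @ x + b
print(np.allclose(x, x_star))                       # True
```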
This result generalizes to the case where $G$ is no longer a linear mapping:
Fundamental : Theorem (Ostrowski)
Suppose that $G : D \subset \mathbb{R}^n \to \mathbb{R}^n$ has a fixed point $x^*$ in the interior of $D$, that $G$ is Fréchet differentiable at $x^*$, and that $\rho(G'(x^*)) < 1$.
Then $x^*$ is a point of attraction of the iteration (1).
Asymptotic speed of convergence
Let $x^*$ be a fixed point of $G$. Let $\mathcal{C}(G, x^*)$ be the set of all the sequences $(x_k)$ generated by (1) that converge to $x^*$.
Fundamental : Theorem (asymptotic rate of convergence)
Suppose $G$ and $x^*$ satisfy the assumptions of the Ostrowski theorem. For $(x_k) \in \mathcal{C}(G, x^*)$, one has
$$\limsup_{k \to \infty} \|x_k - x^*\|^{1/k} \le \rho(G'(x^*)).$$
Moreover, for any $\varepsilon > 0$ and any $(x_k) \in \mathcal{C}(G, x^*)$, there exists a norm $\|\cdot\|_\varepsilon$ such that
$$\|x_{k+1} - x^*\|_\varepsilon \le \left( \rho(G'(x^*)) + \varepsilon \right) \|x_k - x^*\|_\varepsilon$$
for $k$ large enough.
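A small numerical illustration of the first bound, using an arbitrary scalar map $G$ with a known $\rho(G'(x^*))$; the quantity $\|x_k - x^*\|^{1/k}$ indeed approaches the spectral radius.

```python
import numpy as np

G = lambda x: 0.6 * x + 0.2 * np.sin(x)   # fixed point x* = 0, G'(0) = 0.8
x, x_star = 1.0, 0.0
for k in range(60):
    x = G(x)
print(abs(x - x_star) ** (1.0 / 60))       # close to rho(G'(x*)) = 0.8
```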
Remarks on the Ostrowski theorem
The quantity $\rho(G'(x^*))$ is referred to as the root convergence factor of the sequences $(x_k)$ generated by (1) converging to $x^*$. It can be shown (Ortega, Rheinboldt) that the root convergence factor does not depend on the norm chosen on $\mathbb{R}^n$.
The asymptotic-rate theorem shows that, if $\rho(G'(x^*)) < 1$, for any sequence $(x_k)$ generated by (1) that converges to $x^*$ and for any $\sigma$ such that $\rho(G'(x^*)) < \sigma < 1$, there exists $k_0$ such that
$$\|x_k - x^*\| \le \sigma^k \quad \text{for all } k \ge k_0.$$
Therefore, the convergence of $(x_k)$ is as rapid as a geometric progression of rate $\sigma$.
Using the norm $\|\cdot\|_\varepsilon$ of the asymptotic-rate theorem, there exists $k_0$ such that
$$\|x_{k+1} - x^*\|_\varepsilon \le \left( \rho(G'(x^*)) + \varepsilon \right) \|x_k - x^*\|_\varepsilon \quad \text{for } k \ge k_0,$$
that is, $(x_k)$ converges linearly to $x^*$.
By the same theorem, for $k$ large enough, the smaller $\rho(G'(x^*))$, the faster the error $\|x_k - x^*\|$ decreases as $k$ increases.
Therefore, if $G$ and $x^*$ satisfy the assumptions of the asymptotic-rate theorem, the quantity $\rho(G'(x^*))$ can be used to measure the asymptotic speed of convergence of the stationary iteration (1).
Local convergence and linear rate of convergence
We denote by $\nabla^2 F_i(x)$ the Hessian matrix of the function $F_i$ at $x$.
Let $f(x) = \frac{1}{2} \|F(x)\|^2$. We define $G$ by
$$G(x) = x - \left( F'(x)^T F'(x) \right)^{-1} F'(x)^T F(x).$$
Obviously, $x^*$ is a fixed point of $G$ if $F'(x^*)^T F(x^*) = \nabla f(x^*) = 0$.
The following local convergence theorem holds:
Fundamental : Theorem
Suppose that $x^*$ is a fixed point of $G$, that $F$ is twice continuously Fréchet differentiable in a neighborhood $V$ of $x^*$, and that $F'(x^*)$ has full rank $n$. If $\rho(G'(x^*)) < 1$, then the Gauss-Newton iteration converges locally to $x^*$.
Proof
A direct computation of $G'(x^*)$ reveals that
$$G'(x^*) = -\left( F'(x^*)^T F'(x^*) \right)^{-1} \sum_{i=1}^{m} F_i(x^*)\, \nabla^2 F_i(x^*),$$
and the Ostrowski theorem applies.
Let us denote by $\mathcal{S}$ the surface in $\mathbb{R}^m$ given by the parametric representation $z = F(x)$, $x \in V$.
The fixed point condition $F'(x^*)^T F(x^*) = 0$ has the geometrical interpretation that follows.
Let $P$ be the point of $\mathcal{S}$ of coordinates $F(x^*)$ and $O$ the origin of the coordinate system.
The vector $\overrightarrow{OP} = F(x^*)$ is orthogonal to the plane tangent through $P$ to the surface $\mathcal{S}$.
Fundamental : Theorem
Suppose that the assumptions of the previous Theorem hold and that $F(x^*)$ is nonzero.
One has
$$\rho(G'(x^*)) = \|F(x^*)\|\, \chi_m,$$
where $\chi_m$ is the maximal principal curvature of the surface $\mathcal{S}$ at the point $P$ with respect to the normal direction $w = -F(x^*)/\|F(x^*)\|$.
Proof
We consider a curve $\gamma$ through $P$ on the surface, given by the parametric representation $t \mapsto F(x(t))$, and let $t^*$ be such that $x(t^*) = x^*$. The intrinsic (arc-length) parameter $s$ on the curve $\gamma$ satisfies
$$ds = \left\| \frac{dF}{dt} \right\| dt,$$
where $\frac{dF}{dt} = F'(x(t))\, \frac{dx}{dt}$. The curvature vector of the curve at $P$ is by definition $\frac{d^2 F}{ds^2}$. Using the chain rule yields
$$\frac{dF}{ds} = F'(x)\, \frac{dx}{ds}.$$
Again, the chain rule enables one to obtain
$$\frac{d^2 F}{ds^2} = \sum_{i=1}^{m} e_i \left( \frac{dx}{ds} \right)^T \nabla^2 F_i(x)\, \frac{dx}{ds} + F'(x)\, \frac{d^2 x}{ds^2},$$
where $e_i$ denotes the $i$-th canonical basis vector of $\mathbb{R}^m$.
The maximal principal curvature of the surface $\mathcal{S}$ at the point $P$ with respect to the normal direction $w$ is by definition the supremum over all the curves on $\mathcal{S}$ passing through $P$ of the scalar product $\left\langle \frac{d^2 F}{ds^2}, w \right\rangle$.
Since $x^*$ is a fixed point of the iterations, $F'(x^*)^T F(x^*) = 0$; therefore
$$\left\langle \frac{d^2 F}{ds^2}, w \right\rangle = -\frac{1}{\|F(x^*)\|}\, u^T \left( \sum_{i=1}^{m} F_i(x^*)\, \nabla^2 F_i(x^*) \right) u,$$
where we have denoted by $u$ the quantity $\frac{dx}{ds}$ at $t^*$, which satisfies $\|F'(x^*)\, u\| = 1$. The searched supremum is
$$\chi_m = \sup_{\|F'(x^*) u\| = 1} \; -\frac{1}{\|F(x^*)\|}\, u^T \left( \sum_{i=1}^{m} F_i(x^*)\, \nabla^2 F_i(x^*) \right) u,$$
and matrix manipulations show that this quantity is equal to $\rho(G'(x^*)) / \|F(x^*)\|$.
Local convergence properties of the Gauss-Newton algorithm
The algorithm is not globally convergent. It is not always locally convergent either: for some problems, a fixed point may be a repelling fixed point.
Consider
$$F(x) = \left( x + 1,\; \lambda x^2 + x - 1 \right)^T, \qquad \lambda < \tfrac{7}{16}.$$
The unique local minimum of $f(x) = \frac{1}{2}\|F(x)\|^2$ is $x^* = 0$.
It is possible to show that $\rho(G'(x^*)) = |\lambda|$.
Therefore, if $\lambda < -1$, $x^*$ is a repelling fixed point.
Example of nonlocal convergence of Gauss-Newton
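The behaviour can be reproduced numerically. The sketch below runs Gauss-Newton on the example $F(x) = (x + 1,\; \lambda x^2 + x - 1)^T$ above for an attracting and a repelling value of $\lambda$; the specific values and the size of the initial perturbation are illustrative choices.

```python
import numpy as np

def gn_step(x, lam):
    F = np.array([x + 1.0, lam * x**2 + x - 1.0])
    J = np.array([1.0, 2.0 * lam * x + 1.0])
    return x - (J @ F) / (J @ J)   # scalar Gauss-Newton step

for lam in (-0.5, -2.0):
    x = 0.1                         # small perturbation of the fixed point x* = 0
    for _ in range(25):
        x = gn_step(x, lam)
    print(f"lambda = {lam}: x_25 = {x:.3e}")
# For lambda = -0.5 the iterates return to 0; for lambda = -2.0 they
# leave any small neighbourhood of 0 and never settle there.
```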
The incremental 4D Var algorithm (Gauss-Newton)
Take $x_0 = x_b$.
For $k = 0, 1, \ldots, K$: solve iteratively the quadratic minimization problem
$$\delta x_k = \arg\min_{\delta x} \; \frac{1}{2} \left\| \delta x + x_k - x_b \right\|_{B^{-1}}^2 + \frac{1}{2} \sum_{i=0}^{N} \left\| \mathbf{H}_i \mathbf{M}_i \, \delta x - d_i^k \right\|_{R_i^{-1}}^2,$$
with $d_i^k = y_i - \mathcal{H}_i(\mathcal{M}_i(x_k))$.
Update $x_{k+1} = x_k + \delta x_k$.
End for
where we set $\mathbf{M}_i = \mathcal{M}_i'(x_k)$ and $\mathbf{H}_i = \mathcal{H}_i'(\mathcal{M}_i(x_k))$.
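To make the loop concrete, here is a hedged toy implementation for a scalar state with a single observation time, assuming (purely for illustration) the nonlinear model $\mathcal{M}(x) = \sin(x)$ and $\mathcal{H} = $ identity; the inner quadratic problem is then solvable in closed form, and all numbers are illustrative.

```python
import numpy as np

def incremental_4dvar(xb, B, R, y, n_outer=10):
    m, m_prime = np.sin, np.cos          # nonlinear model and its Jacobian
    x = xb                               # outer loop starts from the background
    for _ in range(n_outer):
        d = y - m(x)                     # innovation d^k = y - H(M(x_k))
        Mk = m_prime(x)                  # tangent-linear model at x_k
        # Inner problem is quadratic in dx; in 1D it is solved exactly:
        #   min_dx  (dx + x - xb)^2 / (2B) + (Mk*dx - d)^2 / (2R)
        dx = (Mk * d / R - (x - xb) / B) / (1.0 / B + Mk**2 / R)
        x = x + dx                       # outer update x_{k+1} = x_k + dx_k
    return x

print(incremental_4dvar(xb=0.2, B=1.0, R=0.01, y=0.7))
```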