Gradient computation
We have seen that data assimilation leads to optimization problems.
Gradient-based optimization is often required to solve large problems. The alternative is "derivative-free" optimization, which may not be suitable for large problems.
Algorithms to compute the gradient of the 4D Var functional are needed.
Practical computation of the gradients
- Very intuitive: finite differences. Problem: CPU expensive, and the choice of the perturbation step is delicate.
- Direct use of the previous slides: store the matrices of the tangent linear model and of the observation operators and implement the formulas. Problem: memory expensive.
- Operator-based approach: implement the operations $x \mapsto M_k x$ and $x \mapsto M_k^T x$, i.e. apply the operators without ever forming the matrices. This is the right approach for large problems; a sketch is given after this list.
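To make the operator-based approach concrete, here is a minimal NumPy sketch (the toy diffusion model and the names apply_M / apply_MT are illustrative assumptions, not from the slides). It applies a linear model step and its adjoint matrix-free and verifies them with the usual dot-product (adjoint) test.

    import numpy as np

    def apply_M(x, alpha=0.1):
        # One step of a toy linear model (explicit diffusion on a periodic grid),
        # applied matrix-free: the n x n matrix M is never built.
        return x + alpha * (np.roll(x, 1) - 2.0 * x + np.roll(x, -1))

    def apply_MT(y, alpha=0.1):
        # Adjoint operator (this toy M is symmetric, so M^T y = M y); written
        # separately to stress that only the *action* of M^T is needed.
        return y + alpha * (np.roll(y, 1) - 2.0 * y + np.roll(y, -1))

    # Dot-product (adjoint) test: <M x, y> must equal <x, M^T y>.
    rng = np.random.default_rng(0)
    x, y = rng.standard_normal(1000), rng.standard_normal(1000)
    print(np.dot(apply_M(x), y), np.dot(x, apply_MT(y)))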
The problem considered is
$$ \min_{x_0} \; J(x_0) = \frac{1}{2} \sum_{k=0}^{N} \| H_k x_k - y_k \|^2, $$
with $x_{k+1} = M_k(x_k)$, $k = 0, \dots, N-1$ (here $M_k$ advances the state from time $k$ to $k+1$, $H_k$ is the observation operator and $y_k$ are the observations). By differentiating this expression, we get from the chain rule
$$ \nabla J(x_0)^T \delta x_0 = \sum_{k=0}^{N} (H_k x_k - y_k)^T H_k \, \delta x_k, $$
where $\delta x_{k+1} = M_k'(x_k)\, \delta x_k$.
This yields
$$ \nabla J(x_0) = \sum_{k=0}^{N} M_0'(x_0)^T \cdots M_{k-1}'(x_{k-1})^T H_k^T d_k, \qquad d_k = H_k x_k - y_k. $$
Using a Horner-like scheme, we write
$$ \nabla J(x_0) = H_0^T d_0 + M_0'(x_0)^T \Big( H_1^T d_1 + M_1'(x_1)^T \big( H_2^T d_2 + \dots + M_{N-1}'(x_{N-1})^T H_N^T d_N \big) \Big). $$
Therefore, $\nabla J(x_0)$ can be evaluated, for $k = N, \dots, 0$, by the backward recursion
$$ \lambda_N = H_N^T d_N, \qquad \lambda_k = H_k^T d_k + M_k'(x_k)^T \lambda_{k+1}, \quad k = N-1, \dots, 0, $$
and for the gradient of the quadratic problem above we get
$$ \nabla J(x_0) = \lambda_0. $$
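As an illustration of the forward and backward sweeps, here is a minimal NumPy sketch for a toy problem with a linear model and a linear observation operator (all names, dimensions and operators are illustrative assumptions); the adjoint gradient is checked against a finite-difference approximation.

    import numpy as np

    rng = np.random.default_rng(1)
    n, N = 5, 10                                          # state size, number of steps
    M = np.eye(n) + 0.05 * rng.standard_normal((n, n))    # toy linear model M_k = M
    H = rng.standard_normal((2, n))                       # toy observation operator H_k = H
    y = [rng.standard_normal(2) for _ in range(N + 1)]    # observations y_k

    def cost_and_grad(x0):
        # Forward sweep: trajectory x_k and misfits d_k = H x_k - y_k.
        xs = [x0]
        for k in range(N):
            xs.append(M @ xs[-1])
        d = [H @ xs[k] - y[k] for k in range(N + 1)]
        J = 0.5 * sum(dk @ dk for dk in d)
        # Backward sweep (adjoint variables): lam_N = H^T d_N,
        # lam_k = H^T d_k + M^T lam_{k+1}; the gradient is lam_0.
        lam = H.T @ d[N]
        for k in range(N - 1, -1, -1):
            lam = H.T @ d[k] + M.T @ lam
        return J, lam

    # Check the first gradient component against a finite difference.
    x0 = rng.standard_normal(n)
    J, g = cost_and_grad(x0)
    eps = 1e-6
    e0 = np.zeros(n); e0[0] = eps
    print(g[0], (cost_and_grad(x0 + e0)[0] - J) / eps)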
Computation of the gradients
Computing $\nabla J(x_0)$ requires:
1. Computation of the trajectory $x_0, x_1, \dots, x_N$: integration of the dynamical system.
2. Computation of the misfits $d_k = H_k x_k - y_k$.
3. Computation of the so-called adjoint variables $\lambda_k$, by the backward recursion above.
Notes
- For a given quadratic functional $J$, only steps 1. and 2. are needed to compute $J(x_0)$.
- To compute $\nabla J(x_0)$, the evaluation of the adjoint variables $\lambda_k$ is also needed: each step of the backward recursion is an application of the operator representing the adjoint of the tangent linear model.
Operator differentiation. Use programs to compute
- the derivatives of a code,
- the adjoint of a code.
Automatic differentiation takes a program as input and generates the code implementing the derivative (tangent) or the transpose of the tangent (adjoint) as output. It can be used for higher-order differentiation. In terms of CPU time, computing the derivative of a function costs a few times the cost of the function evaluation.
Example: consider a function defined by a few lines of code, whose derivative we want to compute. The direct code is translated line by line into the tangent code (forward mode); an illustrative sketch is given below.
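As an illustrative example (the function $y = \sin(x_1 x_2) + x_1$ and the variable names are assumptions, not taken from the slides), here is a direct code and the corresponding tangent code obtained by differentiating each line in forward mode.

    import numpy as np

    def direct(x1, x2):
        # Direct code: a sequence of elementary operations.
        a = x1 * x2
        b = np.sin(a)
        y = b + x1
        return y

    def tangent(x1, x2, dx1, dx2):
        # Tangent code (forward mode): each line of the direct code is
        # followed by the corresponding derivative line.
        a = x1 * x2
        da = dx1 * x2 + x1 * dx2
        b = np.sin(a)
        db = np.cos(a) * da
        y = b + x1
        dy = db + dx1
        return y, dy

    # Directional derivative in the direction (1, 0): dy/dx1 = x2*cos(x1*x2) + 1.
    print(tangent(0.3, 0.7, 1.0, 0.0))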
Automatic Differentiation (an introduction)
The automatic differentiation tool translates the direct code, routine by routine and line by line, into the tangent code. For the adjoint code, it is convenient to express the operations in terms of matrices: if the tangent of line $k$ of the direct code is represented by a matrix $L_k$ (the Jacobian of that line with respect to the full vector of program variables), the tangent linear operation is $\delta v \leftarrow L_k \, \delta v$, applied in the order of the direct code, and the adjoint operation is $\bar v \leftarrow L_k^T \, \bar v$, applied in the reverse order. The gradient is obtained when the adjoint computation is initialized with $\bar y = 1$ and all the other adjoint variables set to $0$.
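Continuing the same illustrative example (same assumed function $y = \sin(x_1 x_2) + x_1$), the adjoint code runs the direct code once, then propagates the adjoint variables through the lines in reverse order; initializing with $\bar y = 1$ yields the gradient.

    import numpy as np

    def adjoint(x1, x2, ybar=1.0):
        # Forward sweep: run the direct code and keep the intermediate
        # variables needed by the derivative lines.
        a = x1 * x2
        b = np.sin(a)
        y = b + x1
        # Backward sweep: lines of the direct code taken in reverse order,
        # each one transposed (adjoint), starting from ybar = 1 and all
        # other adjoint variables at 0.
        bbar = ybar                # adjoint of: y = b + x1
        x1bar = ybar
        abar = np.cos(a) * bbar    # adjoint of: b = sin(a)
        x1bar += x2 * abar         # adjoint of: a = x1 * x2
        x2bar = x1 * abar
        return y, (x1bar, x2bar)

    # Gradient of y = sin(x1*x2) + x1:
    # dy/dx1 = x2*cos(x1*x2) + 1,  dy/dx2 = x1*cos(x1*x2).
    print(adjoint(0.3, 0.7))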
Direct/reverse mode
When applied to a mapping $F : \mathbb{R}^n \to \mathbb{R}^m$, the direct (tangent) code computes $F'(x)\,v$ (a directional derivative). Obtaining the gradient of a scalar function this way costs at least $n$ times the cost of the function evaluation, since one sweep per input direction is required (see the sketch below).
Applied to a scalar function $f : \mathbb{R}^n \to \mathbb{R}$, the reverse mode gives the gradient and costs a few times the cost of a function evaluation. Applied to a general mapping $F$, it enables the computation of $F'(x)^T w$. The main difficulty associated with the reverse mode is the memory management (storage of a computation tree).
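A minimal sketch of this cost argument, with an assumed toy function $f(x) = \frac{1}{2}\|Ax\|^2$ (names and dimensions are illustrative): forward mode needs one tangent sweep per canonical direction to assemble the gradient, whereas a single hand-written reverse sweep delivers the whole gradient.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 4
    A = rng.standard_normal((3, n))

    def f(x):
        return 0.5 * np.sum((A @ x) ** 2)      # toy scalar function

    def tangent(x, v):
        return (A @ x) @ (A @ v)               # forward mode: f'(x) v

    def gradient_reverse(x):
        # Reverse sweep of f written by hand: rbar = (A x) * 1, xbar = A^T rbar.
        return A.T @ (A @ x)

    x = rng.standard_normal(n)
    # Forward mode: n tangent sweeps, one per canonical direction e_i.
    grad_fwd = np.array([tangent(x, e) for e in np.eye(n)])
    # Reverse mode: a single backward sweep gives the full gradient.
    print(grad_fwd, gradient_reverse(x))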
An important question relates to the sparsity of the Jacobian and of the Hessian, which can be exploited to better organize the computation.
Popular codes are ADIFOR, ADIC, ADOL-C.
These tools are not "magic": round-off errors, discretization errors and pathological "if then else" structures all play a role. The user has to think about differentiability issues and the stability of the computation.