- possibly should go here?
- Needed for some of the optimisation algorithms
- Different ways of calculating derivatives
- Finite Difference, with different orders depending on the number of neighbours used (a central-difference sketch follows this list)
- Difference Stencil
- Collocation Methods: represent the function/data in terms of basis functions whose derivatives are easy to compute (in fact, all numerical differentiation can be viewed this way).
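- As an illustration of the finite-difference approach, a minimal central-difference sketch (a 3-point stencil, second-order accurate); the test function and step size here are arbitrary choices for the example:

```python
import numpy as np

def central_difference(f, x, h=1e-5):
    """Estimate f'(x) with a 3-point central-difference stencil (second-order accurate)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Example: d/dx sin(x) at x = 1.0 should be close to cos(1.0)
print(central_difference(np.sin, 1.0), np.cos(1.0))
```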
- Many problems can be rewritten as an optimisation problem
- e.g.
G x′ = d
- Where G is an (in principle nonlinear) operator, and x′ and d live in (different) vector spaces.
- The dimensions of x′ and d need not be the same.
- I will refer to x′'s vector space as model space and d's as data space (this is just terminology borrowed from a particular application of optimisation).
- If x is a trial solution to the above then we have the residual
r = G x - d
- To solve, we have to find a way of improving our guess for x so that r = 0
- The usual way to do this is with an objective function
φ = |G x - d|_N
- Where $|⋅|_N$ is some appropriate norm
- Now the solution corresponds to finding the minima of φ, i.e. find where
$\frac{δ φ}{δ x} = 0$
- Where the δ denote functional derivatives
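- As a concrete toy case, suppose G is just a rectangular matrix (so model space and data space have different dimensions) and the norm is the squared 2-norm; then φ(x) = |Gx - d|² has the analytic gradient 2 Gᵀ(Gx - d). A minimal numpy sketch (the shapes and names are illustrative assumptions, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
G = rng.normal(size=(20, 5))   # maps model space (dim 5) into data space (dim 20): overdetermined
d = rng.normal(size=20)

def objective(x):
    """phi(x) = |G x - d|^2, the squared 2-norm of the residual."""
    r = G @ x - d
    return r @ r

def gradient(x):
    """Analytic gradient of phi for a linear G: 2 G^T (G x - d)."""
    return 2.0 * G.T @ (G @ x - d)
```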
- In minimising φ we want to make use of gradient information, as this allows for faster convergence
- Can get this numerically
- Alternatively, if φ takes the form of an action then we can obtain this from an Euler-Lagrange equation
- The problem G x = d is quite generic and can include massively over/under-determined systems
- $G$ can also be singular/near-singular
- This can lead to unstable solutions which can be highly non-smooth/contain very large values
- The solution to this is regularisation: we modify the objective function so that it penalises unwanted behaviour in the solution
- e.g.
φ = |G x - d|_N + μ_1 |x|_M + μ_2 |∇ x|_M
- where μ_1 and μ_2 are trade-off parameters
- note that, strictly, the norms are over different vector spaces
- ∇ x is an appropriate gradient of x (if this is defined), e.g. a central difference if x is a 1D array
- We now minimise this new objective function
- The first of these penalises the solver for making x too large
- The second penalises the solver if x isn’t smooth
- As an example, if G were a singular/near-singular matrix then the first term is equivalent to adding a small value to the diagonal of that matrix so it can be inverted without running into floating-point issues
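- For the linear toy case above, a sketch of the regularised objective, assuming squared 2-norms for every term and a simple forward difference for ∇x; the μ_1, μ_2 values are arbitrary placeholders. Minimising the first two terms alone gives the normal equations (GᵀG + μ_1 I) x = Gᵀ d, which is where the "add a small value to the diagonal" picture comes from:

```python
import numpy as np

def regularised_objective(x, G, d, mu1=1e-2, mu2=1e-2):
    """phi = |G x - d|^2 + mu1 |x|^2 + mu2 |grad x|^2 (all squared 2-norms)."""
    r = G @ x - d
    grad_x = np.diff(x)          # forward-difference "gradient" of the 1D model vector
    return r @ r + mu1 * (x @ x) + mu2 * (grad_x @ grad_x)
```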
- There are many methods of minimising φ
- One example is steepest descent, which iteratively steps downhill along the negative gradient (loosely, a multi-dimensional analogue of Newton-Raphson, though it uses only first-derivative information)
- The algorithm is as follows (a code sketch follows this list)
- Calculate the gradient of φ: ∇φ(x_i)
- Compute a new solution candidate x_{i+1} = x_i - h ∇φ(x_i), where h is the step size
- Compute φ(x_{i+1})
- Accept if φ(x_i) > φ(x_{i+1})
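- A minimal sketch of the fixed-step version of this algorithm, assuming φ and ∇φ are supplied as Python callables; the step-halving on rejection and the tolerances are my own illustrative choices, not part of the notes above:

```python
import numpy as np

def steepest_descent(phi, grad_phi, x0, h=1e-2, tol=1e-8, max_iter=1000):
    """Fixed-step steepest descent with the accept/reject test described above."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_phi(x)
        if np.linalg.norm(g) < tol:      # gradient (approximately) zero: stop
            break
        x_new = x - h * g                # step downhill along -grad(phi)
        if phi(x_new) < phi(x):          # accept only if the objective decreased
            x = x_new
        else:
            h *= 0.5                     # otherwise shrink the step and try again
    return x
```

- e.g. `steepest_descent(objective, gradient, np.zeros(5))` with the toy `objective`/`gradient` sketched earlier (assuming those definitions are in scope)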
- This can be improved in several ways
- The size of h can be determined by a line-search algorithm, i.e. a mini 1D optimisation problem to determine h within our multidimensional optimisation problem (see the backtracking sketch after this list)
- The step directions calculated by steepest descent are orthogonal in 'model space'. It would be better if they were orthogonal in 'data space'. A closely related algorithm is the conjugate gradient method, which guarantees orthogonality in data space.
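- One common line-search strategy is simple backtracking with the Armijo sufficient-decrease condition: start from a largish step and shrink it until the objective drops enough. A sketch (the constants rho and c are conventional but arbitrary choices, not from the notes):

```python
import numpy as np

def backtracking_line_search(phi, grad, x, direction, h0=1.0, rho=0.5, c=1e-4, max_iter=50):
    """Shrink the step h until phi(x + h*direction) satisfies the Armijo condition."""
    h = h0
    slope = np.dot(grad, direction)      # directional derivative; negative for a descent direction
    for _ in range(max_iter):
        if phi(x + h * direction) <= phi(x) + c * h * slope:
            break
        h *= rho                         # backtrack: shrink the step
    return h
```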
- TODO
- Function representation
- Clarify Norms/Inner products
- Write out conjugate gradient method