- possibly should go here?
- Needed for some of the optimisation algorithms
- Different ways of calculating derivatives
- Finite Difference, with different orders depending on the number of neighbours used (a central-difference sketch follows this list)
- Difference Stencil
- Collocation Methods: represent the function/data in terms of basis functions whose derivatives are easy to compute (in fact, all numerical differentiation can be viewed this way).
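- As an illustration of the finite-difference approach, a minimal central-difference sketch (a 3-point stencil, second-order accurate); the test function and step size here are arbitrary choices for the example:

```python
import numpy as np

def central_difference(f, x, h=1e-5):
    """Estimate f'(x) with a 3-point central-difference stencil (second-order accurate)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Example: d/dx sin(x) at x = 1.0 should be close to cos(1.0)
print(central_difference(np.sin, 1.0), np.cos(1.0))
```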
- Many problems can be rewritten as an optimisation problem
- e.g.
G x′ = d
- Where G is an (in principle nonlinear) operator, and x′ and d live in (different) vector spaces.
- The dimensions of x′ and d need not be the same.
- I will refer to x′'s vector space as model space and d's as data space (this is just terminology borrowed from a particular application of optimisation).
- If x is a trial solution to the above then we have the residual
r = G x - d
- To solve, we have to find a way of improving our guess for x so that r = 0
- The usual way to do this is with an objective function
φ = |G x - d|_N
- Where $|⋅|_N$ is some appropriate norm
- Now the solution corresponds to finding the minima of φ, i.e. find where
$\frac{δ φ}{δ x} = 0$
- Where the δ denote functional derivatives
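- As a concrete toy case, suppose G is just a rectangular matrix (so model space and data space have different dimensions) and the norm is the squared 2-norm; then φ(x) = |Gx - d|² has the analytic gradient 2 Gᵀ(Gx - d). A minimal numpy sketch (the shapes and names are illustrative assumptions, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
G = rng.normal(size=(20, 5))   # maps model space (dim 5) into data space (dim 20): overdetermined
d = rng.normal(size=20)

def objective(x):
    """phi(x) = |G x - d|^2, the squared 2-norm of the residual."""
    r = G @ x - d
    return r @ r

def gradient(x):
    """Analytic gradient of phi for a linear G: 2 G^T (G x - d)."""
    return 2.0 * G.T @ (G @ x - d)
```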
- In minimising φ we want to make use of gradient information, as this allows for faster convergence
- Can get this numerically
- Alternatively, if φ takes the form of an action then we can obtain this from an Euler-Lagrange equation
- The problem G x = d is quite generic and can include massively over/under-determined systems
- $G$ can also be singular/near-singular
- This can lead to unstable solutions which can be highly non-smooth/contain very large values
- The solution to this is regularisation: we modify the objective function so that it penalises unwanted behaviour in the solution
- e.g.
φ = |G x - d|_N + μ_1 |x|_M + μ_2 |∇ x|_M
- where μ_1 and μ_2 are trade-off parameters
- note that, strictly, the norms are over different vector spaces
- ∇ x is an appropriate gradient of x (if this is defined), e.g. a central difference if x is a 1D array
- We now minimise this new objective function
- The first of these penalises the solver for making x too large
- The second penalises the solver if x isn’t smooth
- As an example, if G were a singular/near-singular matrix then the first term is equivalent to adding a small value to the diagonal of that matrix so it can be inverted without running into floating-point issues
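- For the linear toy case above, a sketch of the regularised objective, assuming squared 2-norms for every term and a simple forward difference for ∇x; the μ_1, μ_2 values are arbitrary placeholders. Minimising the first two terms alone gives the normal equations (GᵀG + μ_1 I) x = Gᵀ d, which is where the "add a small value to the diagonal" picture comes from:

```python
import numpy as np

def regularised_objective(x, G, d, mu1=1e-2, mu2=1e-2):
    """phi = |G x - d|^2 + mu1 |x|^2 + mu2 |grad x|^2 (all squared 2-norms)."""
    r = G @ x - d
    grad_x = np.diff(x)          # forward-difference "gradient" of the 1D model vector
    return r @ r + mu1 * (x @ x) + mu2 * (grad_x @ grad_x)
```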
- There are many methods of minimising φ
- One example is steepest descent, which iteratively steps downhill along the negative gradient (loosely, a multi-dimensional analogue of Newton-Raphson, though it uses only first-derivative information)
- The algorithm is as follows (a code sketch follows this list)
- Calculate the gradient of φ: ∇φ(x_i)
- Compute a new solution candidate x_{i+1} = x_i - h ∇φ(x_i), where h is the step size
- Compute φ(x_{i+1})
- Accept if φ(x_i) > φ(x_{i+1})
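- A minimal sketch of the fixed-step version of this algorithm, assuming φ and ∇φ are supplied as Python callables; the step-halving on rejection and the tolerances are my own illustrative choices, not part of the notes above:

```python
import numpy as np

def steepest_descent(phi, grad_phi, x0, h=1e-2, tol=1e-8, max_iter=1000):
    """Fixed-step steepest descent with the accept/reject test described above."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_phi(x)
        if np.linalg.norm(g) < tol:      # gradient (approximately) zero: stop
            break
        x_new = x - h * g                # step downhill along -grad(phi)
        if phi(x_new) < phi(x):          # accept only if the objective decreased
            x = x_new
        else:
            h *= 0.5                     # otherwise shrink the step and try again
    return x
```

- e.g. `steepest_descent(objective, gradient, np.zeros(5))` with the toy `objective`/`gradient` sketched earlier (assuming those definitions are in scope)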
- This can be improved in several ways
- The size of h can be determined by a line-search algorithm, i.e. a mini 1D optimisation problem to determine h within our multidimensional optimisation problem (see the backtracking sketch after this list)
- The step directions calculated by steepest descent are orthogonal in 'model space'. It would be better if they were orthogonal in 'data space'. A closely related algorithm is the conjugate gradient method, which guarantees orthogonality in data space.
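- One common line-search strategy is simple backtracking with the Armijo sufficient-decrease condition: start from a largish step and shrink it until the objective drops enough. A sketch (the constants rho and c are conventional but arbitrary choices, not from the notes):

```python
import numpy as np

def backtracking_line_search(phi, grad, x, direction, h0=1.0, rho=0.5, c=1e-4, max_iter=50):
    """Shrink the step h until phi(x + h*direction) satisfies the Armijo condition."""
    h = h0
    slope = np.dot(grad, direction)      # directional derivative; negative for a descent direction
    for _ in range(max_iter):
        if phi(x + h * direction) <= phi(x) + c * h * slope:
            break
        h *= rho                         # backtrack: shrink the step
    return h
```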
- TODO
- Function representation
- Clarify Norms/Inner products
- Write out conjugate gradient method