Numerical Optimisation

Representation of Functions

  • Possibly the material on function representation should go here?

Numerical Calculus

  • Needed for some of the optimisation algorithms
  • Different ways of calculating derivatives
  • Finite difference, with different orders of accuracy depending on the number of neighbours used (see the sketch after this list)
  • Difference stencils
  • Collocation methods: represent the function/data in terms of some basis functions whose derivatives are easy to compute (in fact, all numerical differentiation can be viewed this way).
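As a rough illustration (not part of the original notes), the finite-difference formulas above might be coded like this; the test function and step size h are arbitrary choices:

#+BEGIN_SRC python
import numpy as np

def forward_difference(f, x, h=1e-6):
    """First-order accurate derivative estimate using one neighbouring point."""
    return (f(x + h) - f(x)) / h

def central_difference(f, x, h=1e-6):
    """Second-order accurate derivative estimate using the two nearest neighbours."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Example: derivative of sin at 0.3; the exact value is cos(0.3)
print(forward_difference(np.sin, 0.3), central_difference(np.sin, 0.3), np.cos(0.3))
#+END_SRC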

General Problems as Optimisation Problems

  • Many problems can be rewritten as an optimisation problem
  • e.g.

G x = d

  • Where G is an (in principle nonlinear) operator, and x and d live in (different) vector spaces.
  • The dimensions of x and d need not be the same.
  • I will refer to x’s vector space as model space and d’s as data space (this terminology comes from a particular application of optimisation).
  • If x is a trial solution to the above then we have the residual

r = G x - d

  • To solve the problem we have to find a way of improving our guess for x so that r = 0 (a sketch of the residual computation follows below)
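A minimal sketch of the residual computation, assuming a linear operator G represented as a matrix and made-up dimensions for model and data space:

#+BEGIN_SRC python
import numpy as np

# Hypothetical linear operator G: model space (dim 3) -> data space (dim 5)
G = np.random.default_rng(0).normal(size=(5, 3))
d = np.array([1.0, 0.0, 2.0, -1.0, 0.5])   # observed data

x_trial = np.zeros(3)                       # trial solution in model space
r = G @ x_trial - d                         # residual in data space
print(r)
#+END_SRC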

The Objective Function

  • The normal way to do this is with an objective function

φ = |G x - d|_N

  • Where $|⋅|_N$ is some appropriate norm
  • Now the solution corresponds to finding a minimum of φ, i.e. find

    \frac{δ φ}{δ x} = 0

  • Where the δ denote functional derivatives
  • In minimising φ we want to make use of gradient information, as this allows for faster convergence
  • This gradient can be obtained numerically (a sketch follows below)
  • Alternatively, if φ takes the form of an action then we can obtain the gradient from the Euler-Lagrange equations
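As a sketch (not the notes' own code), an objective built from the squared 2-norm of the residual, with its gradient estimated numerically by central differences; G and d are assumed to be a matrix and vector as in the earlier example:

#+BEGIN_SRC python
import numpy as np

def phi(x, G, d):
    """Objective: squared 2-norm of the residual G x - d."""
    r = G @ x - d
    return float(r @ r)

def grad_phi_numerical(x, G, d, h=1e-6):
    """Central finite-difference estimate of the gradient of phi at x."""
    g = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        e = np.zeros_like(x, dtype=float)
        e[i] = h
        g[i] = (phi(x + e, G, d) - phi(x - e, G, d)) / (2.0 * h)
    return g
#+END_SRC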

Regularisation

  • In general G x = d is quite generic and can include massively over/underdetermined systems
  • $G$ can also be singular/near singular
  • This can lead to unstable solutions, which may be highly non-smooth or contain very large values
  • The solution to this is regularisation: we modify the objective function so that it penalises unwanted behaviour in the solution
  • e.g.

    φ = |G x - d|_N + μ_1 |x|_M + μ_2 |∇x|_M

  • where μ_1 and μ_2 are trade-off parameters
  • note that, strictly, the norms are over different vector spaces
  • ∇x is an appropriate gradient of x (if this is defined), e.g. a central difference if x is a 1D array
  • We now minimise this new objective function
  • The first of these penalises the solver for making x too large
  • The second penalises the solver if x isn’t smooth
  • As an example, if G were a singular/near-singular matrix then the first term is equivalent to adding a small value to the diagonal of the corresponding normal-equations matrix (Tikhonov regularisation), so the system can be inverted without running into floating point issues (a sketch of the regularised objective follows this list)
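A minimal sketch of the regularised objective for a 1D model vector, using the 2-norm for every term and np.diff as a crude discrete gradient; the trade-off parameters mu1 and mu2 are illustrative values only:

#+BEGIN_SRC python
import numpy as np

def phi_regularised(x, G, d, mu1=1e-2, mu2=1e-2):
    """Data misfit plus penalties on the size and roughness of the model x."""
    misfit = np.sum((G @ x - d) ** 2)   # |G x - d|^2
    size   = np.sum(x ** 2)             # |x|^2, penalises large models
    rough  = np.sum(np.diff(x) ** 2)    # |grad x|^2, penalises non-smooth models
    return misfit + mu1 * size + mu2 * rough
#+END_SRC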

Methods of Minimising φ

  • There are many methods of minimising φ
  • One example is steepest descent, an iterative gradient-based method (loosely analogous to a multi-dimensional Newton-Raphson iteration)
  • The algorithm is as follows (a runnable sketch appears after this list)
    1. Calculate the gradient of φ: ∇φ(x_i)
    2. Compute a new solution candidate x_{i+1} = x_i - h ∇φ(x_i), where h is the step size
    3. Compute φ(x_{i+1})
    4. Accept the candidate if φ(x_i) > φ(x_{i+1})
  • This can be improved in several ways
    • The size of h can be determined by a line-search algorithm, i.e. a mini 1D optimisation problem to determine h within our multidimensional optimisation problem
    • The step directions calculated by steepest descent are orthogonal in ‘model space’. It would be better if they were orthogonal in ‘data space’. A closely related algorithm is the conjugate gradient method, which guarantees orthogonality in data space.
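A sketch in code of the steepest-descent loop above, with a fixed initial step size, the simple accept/reject rule from the list, and a step-halving safeguard added for robustness (the tolerance and iteration cap are arbitrary):

#+BEGIN_SRC python
import numpy as np

def steepest_descent(phi, grad_phi, x0, h=0.1, tol=1e-8, max_iter=1000):
    """Minimise phi starting from x0 by stepping along the negative gradient."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_phi(x)                 # 1. gradient of phi at the current point
        if np.linalg.norm(g) < tol:     # stop when the gradient is (numerically) zero
            break
        x_new = x - h * g               # 2. candidate solution
        if phi(x_new) < phi(x):         # 3.-4. accept only if the objective decreases
            x = x_new
        else:
            h *= 0.5                    # otherwise shrink the step (not in the original recipe)
    return x

# Example: minimise a simple quadratic with its minimum at (1, 2)
phi_q  = lambda x: (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2
grad_q = lambda x: np.array([2.0 * (x[0] - 1.0), 2.0 * (x[1] - 2.0)])
print(steepest_descent(phi_q, grad_q, np.zeros(2)))
#+END_SRC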

[Figure: optimsiation_method_images.png]

  • TODO
    • Function representation
    • Clarify Norms/Inner products
    • Write out conjugate gradient method