Changes between Version 4 and Version 5 of Working Groups/TAM/Reference Manual/Nemo Tam Basics


Timestamp: 2010-02-22T17:30:27+01:00
Author: avidard
cost function around a background control.
= Tangent and Adjoint coding principle =

The original program {{{P}}}, whatever its size and run time, computes a function ''F'': ''X'' ∈ IR^''m''^ → ''Y'' ∈ IR^''n''^ which is the composition of the elementary functions computed by each run-time instruction. In other words, if {{{P}}} executes a sequence of elementary statements ''I,,k,,'', ''k'' ∈ [1..''p''], then {{{P}}} actually evaluates

''F'' = ''f,,p,,'' ◦ ''f,,p−1,,'' ◦ ··· ◦ ''f,,1,,'' ,

where each ''f,,k,,'' is the function implemented by ''I,,k,,''. Therefore one can apply the chain rule of derivative calculus to get the Jacobian matrix ''F′'', i.e. the partial derivatives of each component of ''Y'' with respect to each component of ''X''. Calling ''X,,0,,'' = ''X'' and ''X,,k,,'' = ''f,,k,,''(''X,,k−1,,'') the successive values of all intermediate variables, i.e. the successive states of the memory throughout execution of {{{P}}}, we get

''F′''(''X'') = ''f′,,p,,''(''X,,p−1,,'') × ''f′,,p−1,,''(''X,,p−2,,'') × ··· × ''f′,,1,,''(''X,,0,,'') .   (1)

The derivatives ''f′,,k,,'' of each elementary instruction are easily built, and must be inserted in the differentiated program so that each of them has the values ''X,,k−1,,'' directly available for use. This process yields analytic derivatives that are exact up to numerical accuracy.
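As a minimal sketch of this insertion of derivative statements (in Python rather than the Fortran of NEMO, and with a made-up two-statement program, so the names here are purely illustrative), each hand-written tangent statement is placed so that it reads the state ''X,,k−1,,'' before the original statement overwrites it:

```python
import numpy as np

# A toy program P with two elementary statements:
#   I1:  y = sin(x)
#   I2:  z = y*y + x
# The tangent statements are interleaved with the original ones so that
# each derivative reads the intermediate state X_{k-1} it needs.
def p_and_tangent(x, xd):
    # Tangent of I1 uses X0 = (x): it runs while x still holds its input value.
    yd = np.cos(x) * xd      # f1'(X0) applied to the direction xd
    y = np.sin(x)            # I1
    # Tangent of I2 uses X1 = (x, y).
    zd = 2.0 * y * yd + xd   # f2'(X1) applied to (xd, yd)
    z = y * y + x            # I2
    return z, zd
```

Here {{{(z, zd)}}} is exactly the pair (''Y'', ''dY'') of equation (2) for this tiny ''P''.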
In practice, two sorts of derivatives are of particular importance in scientific computing: the tangent (or directional) derivatives, and the adjoint (or reverse) derivatives. The tangent derivative is the product

''dY'' = ''F′''(''X'') × ''dX''

of the full Jacobian times a direction ''dX'' in the input space. From equation (1), we find

''dY'' = ''F′''(''X'') × ''dX'' = ''f′,,p,,''(''X,,p−1,,'') × ''f′,,p−1,,''(''X,,p−2,,'') × ··· × ''f′,,1,,''(''X,,0,,'') × ''dX''   (2)

which is most cheaply executed from right to left because matrix × vector products are much cheaper than matrix × matrix products. This is also the most convenient execution order because it uses the intermediate values ''X,,k,,'' in the same order as the program {{{P}}} builds them. On the other hand, the adjoint derivative is the product ''X,,ad,,'' = ''F′^∗^''(''X'') × ''Y,,ad,,'' of the transposed Jacobian times a weight vector ''Y,,ad,,'' in the output space. The resulting ''X,,ad,,'' is the gradient of the dot product (''Y'' · ''Y,,ad,,''). From equation (1), we find

''X,,ad,,'' = ''F′^∗^''(''X'') × ''Y,,ad,,'' = ''f′^∗^,,1,,''(''X,,0,,'') × ··· × ''f′^∗^,,p−1,,''(''X,,p−2,,'') × ''f′^∗^,,p,,''(''X,,p−1,,'') × ''Y,,ad,,''   (3)

which is also most cheaply executed from right to left. However, this uses the intermediate values ''X,,k,,'' in the inverse of their building order in {{{P}}}.
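The two products (2) and (3) can be mimicked with plain matrices. The sketch below uses hypothetical random matrices standing in for the real elementary Jacobians ''f′,,k,,''; it shows the right-to-left matrix × vector evaluation in both modes, and the duality (''Y,,ad,,'' · ''dY'') = (''X,,ad,,'' · ''dX'') that ties tangent and adjoint together:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical elementary Jacobians f_k'(X_{k-1}) for a 3-statement program.
J1 = rng.standard_normal((4, 3))   # f1': IR^3 -> IR^4
J2 = rng.standard_normal((5, 4))   # f2': IR^4 -> IR^5
J3 = rng.standard_normal((2, 5))   # f3': IR^5 -> IR^2

xd = rng.standard_normal(3)        # direction dX in the input space
yb = rng.standard_normal(2)        # weight Y_ad in the output space

# Tangent, equation (2): matrix x vector products from right to left,
# i.e. in the same order as P builds its intermediate states.
yd = J3 @ (J2 @ (J1 @ xd))

# Adjoint, equation (3): transposed Jacobians from right to left, which
# visits the intermediate states in the *inverse* of their building order.
xb = J1.T @ (J2.T @ (J3.T @ yb))

# Duality check: <Y_ad, F'(X) dX> == <F'*(X) Y_ad, dX>.
assert np.isclose(yb @ yd, xb @ xd)
```

This dot-product identity is also the standard sanity test for a hand-coded or generated adjoint against its tangent.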

For specific details about practical adjoint coding, refer to Giering and Kaminski (1998).
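In that spirit, a hand-written adjoint of a made-up two-statement program ''z'' = sin(''x'')^2^ + ''x'' might look as follows (a Python sketch, not NEMO code): a forward sweep runs ''P'' and stores the intermediate values, then a reverse sweep applies the transposed elementary derivatives in the inverse order:

```python
import numpy as np

def p_and_adjoint(x, zb):
    # Forward sweep: run P (I1 then I2) and keep the intermediate
    # state that the reverse sweep will need later.
    y = np.sin(x)              # I1
    z = y * y + x              # I2
    # Reverse sweep: transposed derivatives in the inverse order,
    # each reading the stored state X_{k-1}.
    yb = 2.0 * y * zb          # adjoint of I2 through y
    xb = zb                    # adjoint of I2 through x
    xb = xb + np.cos(x) * yb   # adjoint of I1, uses X0 = x
    return z, xb
```

The returned {{{xb}}} is the gradient ''X,,ad,,'' of equation (3) for this tiny program; note how {{{y}}} and {{{x}}} must survive the forward sweep, which is exactly the storage/recomputation issue of large adjoint codes.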

= Potential issues =