Changes between Version 3 and Version 4 of Working Groups/TAM/Reference Manual/Nemo Tam Basics


Ignore:
Timestamp:
2010-02-22T17:06:20+01:00 (11 years ago)
Author:
avidard
Comment:

Legend:

Unmodified
Added
Removed
Modified
  • Working Groups/TAM/Reference Manual/Nemo Tam Basics

    v3 v4  
    1212 
    1313= Tangent and Adjoint coding techniques =  
    14 the original program {{{P}}}, 
    15 whatever its size and run time, computes a function 
    16 [[Image(ad_1)]] 
    17 which is the composition of the elementary functions 
    18 computed by each run-time instruction. In other words if 
    19 {{{P}}} executes a sequence of elementary statements 
    20 [[Image(ad_2.png)]], then {{{P}}} actually evaluates 
    21 [[Image(ad_3.png)]] 
    22 where each ''f,,k,,'' is the function implemented by ''I,,k,,''. 
    23 Therefore one can apply the chain rule of derivative 
    24 calculus 
    25 to get the Jacobian matrix ''F!''', i.e. the partial 
    26 derivatives of each component of ''Y'' with respect to 
    27 each component of ''X''. Calling ''X,,0,,=X'' and 
    28 ''X,,k,,=f,,k,,(X,,{k-1},,)'' the successive values of all 
    29 intermediate variables, i.e. the successive '''states''' of 
    30 the memory throughout execution of {{{P}}}, we get 
    31  
    32 [[Image(ad_4.png)]] 
    33  
    34 The derivatives ''f!',,k,,'' 
    35 of each elementary instruction are easily built, and must 
    36 be inserted in the differentiated program so 
    37 that each of them has the values ''X,,k-1,,'' directly available 
    38 for use. 
    39 This process yields analytic derivatives, 
    40 that are exact up to numerical accuracy. 
    41  
    42 In practice, two sorts of derivatives are of 
    43 particular importance in scientific computing: the 
    44 tangent (or directional) derivatives, and the 
    45 adjoint (or reverse) derivatives. 
    46  
    47 The tangent derivative is the product 
    48 $\dot{Y} = F'(X) \times \dot{X}$ of the full Jacobian times 
    49 a direction $\dot{X}$ in the input space. 
    50 >From equation above, we find 
    51  
    52 [[Image(ad_5)]] 
    53  
    54 which is most cheaply executed from right to left 
    55 because matrix {{{x}}} vector products are much cheaper 
    56 than matrix {{{x}}} matrix products. 
    57 This is also the most convenient execution order because 
    58 it uses the intermediate values ''X,,k,,'' in the same order 
    59 as the program {{{P}}} builds them. 
    60 On the other hand the adjoint derivative is the product 
    61 [[Image(ad_6)]] of 
    62 the ''transposed'' Jacobian times a weight vector 
    63 $\overline{Y}$ in the output space. The 
    64 resulting $\overline{X}$ is the gradient of the 
    65 dot product $(Y \cdot \overline{Y})$. 
    66 >From equation (\ref{eqchainrule}), we find 
    67 \begin{equation}\label{eqadjmode} 
    68 \overline{X} = F'^{*}(X) \times \overline{Y} = 
    69 f'^{*}_1(X_0) \times \dots \times f'^{*}_{p-1}(X_{p-2}) 
    70 \times f'^{*}_p(X_{p-1}) \times \overline{Y} 
    71 \end{equation} 
    72 which is also most cheaply executed from right to left. 
    73 However, this uses the intermediate values $X_k$ in the 
    74 inverse of their building order in {\tt P}. 
     14the original program P, whatever its size and run time, computes a function F, X∈IRm 􏰄→ Y ∈IRn which is the composition of the elementary functions computed by each run-time instruction. In other words if P executes a sequence of elementary statements Ik,k ∈ [1..p], then P actually evaluates 
     15F =fp ◦fp−1 ◦···◦f1 , 
     16where each fk is the function implemented by Ik. Therefore one can apply the chain rule of derivative calculus to get the Jacobian matrix F′, i.e. the partial derivatives of each component of Y with respect to each component of X. Calling X0 = X and Xk = fk(Xk−1) the successive values of all intermediate variables, i.e. the successive states of the memory throughout execution of P, we get 
     17F′(X)=fp′(Xp−1)×fp′−1(Xp−2)×···×f1′(X0) .       (1) 
     18The derivatives fk′ of each elementary instruction are easily built, and must be inserted in the differentiated program so that each of them has the values Xk−1 directly available for use. This process yields analytic derivatives, that are exact up to numerical accuracy. 
     19In practice, two sorts of derivatives are of particular importance in scien- tific computing: the tangent (or directional) derivatives, and the adjoint (or reverse) derivatives. In particular, tangent and adjoint are the two sorts of derivative programs required for OPA, and TAPENADE provides both. The tangent derivative is the product Y ̇ = F ′(X) × X ̇ of the full Jacobian times a direction X ̇ in the input space. >From equation (1), we find 
     20Y ̇ =F′(X)×X ̇ =fp′(Xp−1)×fp′−1(Xp−2)×···×f1′(X0)×X ̇   (2) 
     21which is most cheaply executed from right to left because matrix×vector prod- ucts are much cheaper than matrix×matrix products. This is also the most convenient execution order because it uses the intermediate values Xk in the same order as the program P builds them. On the other hand the adjoint derivative is the product X = F ′∗(X) × Y of the transposed Jacobian times a 
     22weight vector Y in the output space. The resulting X is the gradient of the 
     23dot product (Y · Y ). >From equation (1), we find X =F′∗(X)×Y =f′∗(X )×···×f′∗  (X      )×f′∗(X )×Y     (3) 
     241 0     p−1 p−2 p p−1 
     25which is also most cheaply executed from right to left. However, this uses the intermediate values Xk in the inverse of their building order in P. 
    7526 
    7627= Potential issues =