@(@\newcommand{\W}[1]{ \; #1 \; }
\newcommand{\R}[1]{ {\rm #1} }
\newcommand{\B}[1]{ {\bf #1} }
\newcommand{\D}[2]{ \frac{\partial #1}{\partial #2} }
\newcommand{\DD}[3]{ \frac{\partial^2 #1}{\partial #2 \partial #3} }
\newcommand{\Dpow}[2]{ \frac{\partial^{#1}}{\partial {#2}^{#1}} }
\newcommand{\dpow}[2]{ \frac{ {\rm d}^{#1}}{{\rm d}\, {#2}^{#1}} }@)@This is cppad-20221105 documentation.
AD Theory for Cholesky Factorization
Reference
See Section 3.6 of Sebastian F. Walter's Ph.D. thesis,
Structured Higher-Order Algorithmic Differentiation
in the Forward and Reverse Mode
with Application in Optimum Experimental Design,
Humboldt-Universität zu Berlin, 2011.
Cholesky Factor
We are given a positive definite symmetric matrix
@(@
A \in \B{R}^{n \times n}
@)@
and a Cholesky factorization
@[@
A = L L^\R{T}
@]@
where @(@
L \in \B{R}^{n \times n}
@)@ is lower triangular.
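As a numerical illustration (a sketch, not part of CppAD itself),
the following assumes Python with NumPy and checks the factorization
for a random positive definite symmetric matrix:

    import numpy as np

    n = 3
    rng = np.random.default_rng(0)
    B = rng.standard_normal((n, n))
    A = B @ B.T + n * np.eye(n)     # positive definite and symmetric
    L = np.linalg.cholesky(A)       # lower triangular factor
    assert np.allclose(A, L @ L.T)  # A = L L^T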
Taylor Coefficient
The matrix @(@
A
@)@ is a function of a scalar argument
@(@
t
@)@.
For @(@
k = 0 , \ldots , K
@)@, we use @(@
A_k
@)@ for the
corresponding Taylor coefficients; i.e.,
@[@
A(t) = o( t^K ) + \sum_{k = 0}^K A_k t^k
@]@
where @(@
o( t^K ) / t^K \rightarrow 0
@)@ as @(@
t \rightarrow 0
@)@.
We use a similar notation for @(@
L(t)
@)@.
Lower Triangular Part
For a square matrix @(@
C
@)@,
@(@
\R{lower} (C)
@)@ is the lower triangular part of @(@
C
@)@ (including the diagonal),
@(@
\R{diag} (C)
@)@ is the diagonal matrix with the same diagonal as
@(@
C
@)@ and
@[@
\R{low} ( C ) = \R{lower} (C) - \frac{1}{2} \R{diag} (C)
@]@
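The following sketch, assuming Python with NumPy, implements these
three operators; the names lower, diag, and low are chosen here to
mirror the notation above:

    import numpy as np

    def lower(C):
        # lower triangular part of C, including the diagonal
        return np.tril(C)

    def diag(C):
        # diagonal matrix with the same diagonal as C
        return np.diag(np.diag(C))

    def low(C):
        # low(C) = lower(C) - (1/2) diag(C)
        return lower(C) - 0.5 * diag(C)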
Forward Mode
For each Taylor coefficient order @(@
k = 0 , \ldots , K
@)@,
the coefficients
@(@
A_k \in \B{R}^{n \times n}
@)@
satisfy the equation
@[@
A_k = \sum_{\ell=0}^k L_\ell L_{k-\ell}^\R{T}
@]@
In the case where @(@
k=0
@)@, this reduces to
@[@
A_0 = L_0 L_0^\R{T}
@]@
The value of @(@
L_0
@)@ can be computed using the Cholesky factorization.
In the case where @(@
k > 0
@)@,
@[@
A_k = L_k L_0^\R{T} + L_0 L_k^\R{T} + B_k
@]@
where
@[@
B_k = \sum_{\ell=1}^{k-1} L_\ell L_{k-\ell}^\R{T}
@]@
Note that @(@
B_k
@)@ is defined in terms of Taylor coefficients
of @(@
L(t)
@)@ that have order less than @(@
k
@)@.
We also note that
@[@
L_0^{-1} ( A_k - B_k ) L_0^\R{-T}
=
L_0^{-1} L_k + L_k^\R{T} L_0^\R{-T}
@]@
The first matrix on the right hand side is lower triangular,
the second is its transpose, hence upper triangular,
and the two have the same diagonal.
It follows that
@[@
L_0^{-1} L_k
=
\R{low} [ L_0^{-1} ( A_k - B_k ) L_0^\R{-T} ]
@]@
@[@
L_k
=
L_0 \R{low} [ L_0^{-1} ( A_k - B_k ) L_0^\R{-T} ]
@]@
This expresses @(@
L_k
@)@ in terms of the
Taylor coefficients of @(@
A(t)
@)@ and the lower order coefficients
of @(@
L(t)
@)@.
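The recursion above can be checked numerically. The following sketch
assumes Python with NumPy; cholesky_forward is a name chosen here for
illustration and is not a CppAD routine:

    import numpy as np

    def low(C):
        return np.tril(C) - 0.5 * np.diag(np.diag(C))

    def cholesky_forward(A):
        # A is the list A_0, ..., A_K; returns the list L_0, ..., L_K
        L = [np.linalg.cholesky(A[0])]
        L0inv = np.linalg.inv(L[0])
        for k in range(1, len(A)):
            B_k = sum(L[l] @ L[k - l].T for l in range(1, k))
            L.append(L[0] @ low(L0inv @ (A[k] - B_k) @ L0inv.T))
        return L

    # random symmetric coefficients with A_0 positive definite
    rng = np.random.default_rng(0)
    n, K = 3, 2
    B = rng.standard_normal((n, n))
    A = [B @ B.T + n * np.eye(n)]
    for k in range(K):
        S = rng.standard_normal((n, n))
        A.append(S + S.T)
    L = cholesky_forward(A)
    # check A_k = sum_{l=0}^k L_l L_{k-l}^T for every order
    for k in range(K + 1):
        assert np.allclose(A[k], sum(L[l] @ L[k - l].T for l in range(k + 1)))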
Lemma 1
We use the notation @(@
\dot{C}
@)@ for the derivative of a matrix
valued function @(@
C(s)
@)@ with respect to a scalar argument @(@
s
@)@.
We use the notation @(@
\bar{S}
@)@ and @(@
\bar{L}
@)@ for the
partial derivatives of a scalar valued function @(@
\bar{F}( S, L)
@)@
with respect to a symmetric matrix @(@
S
@)@ and
a lower triangular matrix @(@
L
@)@.
Define the scalar valued function
@[@
\hat{F}( S ) = \bar{F} [ S , \hat{L} (S) ]
@]@
We use @(@
\hat{S}
@)@ for the total derivative of @(@
\hat{F}
@)@ with
respect to @(@
S
@)@.
Suppose that @(@
\hat{L} ( S )
@)@ is such that
@[@
\dot{L} = L_0 \R{low} ( L_0^{-1} \dot{S} L_0^\R{-T} )
@]@
for any @(@
S(s)
@)@. It follows that
@[@
\hat{S} = \bar{S} + \frac{1}{2} ( M + M^\R{T} )
@]@
where
@[@
M = L_0^\R{-T} \R{low}( L_0^\R{T} \bar{L} )^\R{T} L_0^{-1}
@]@
Proof
@[@
\partial_s \hat{F} [ S(s) , L(s) ]
=
\R{tr} ( \bar{S}^\R{T} \dot{S} )
+
\R{tr} ( \bar{L}^\R{T} \dot{L} )
@]@
@[@
\R{tr} ( \bar{L}^\R{T} \dot{L} )
=
\R{tr} [
\bar{L}^\R{T} L_0
\R{low} ( L_0^{-1} \dot{S} L_0^\R{-T} )
]
@]@
@[@
=
\R{tr} [
\R{low} ( L_0^{-1} \dot{S} L_0^\R{-T} )^\R{T}
L_0^\R{T} \bar{L}
]
@]@
@[@
=
\R{tr} [
L_0^{-1} \dot{S} L_0^\R{-T}
\R{low}( L_0^\R{T} \bar{L} )
]
@]@
@[@
=
\R{tr} [
L_0^\R{-T} \R{low}( L_0^\R{T} \bar{L} ) L_0^{-1} \dot{S}
]
@]@
@[@
\partial_s \hat{F} [ S(s) , L(s) ]
=
\R{tr} ( \bar{S}^\R{T} \dot{S} )
+
\R{tr} [
L_0^\R{-T} \R{low}( L_0^\R{T} \bar{L} ) L_0^{-1} \dot{S}
]
@]@
The third equality uses the identity
@(@
\R{tr} [ \R{low} (X)^\R{T} Y ] = \R{tr} [ X \R{low} (Y) ]
@)@,
which holds because @(@
X = L_0^{-1} \dot{S} L_0^\R{-T}
@)@ is symmetric.
We now consider the @(@
(i, j)
@)@ component function,
for a symmetric matrix @(@
S(s)
@)@,
defined by
@[@
S_{k, \ell} (s) = \left\{ \begin{array}{ll}
s & \R{if} \; k = i \; \R{and} \; \ell = j \\
s & \R{if} \; k = j \; \R{and} \; \ell = i \\
0 & \R{otherwise}
\end{array} \right\}
@]@
Substituting this choice of @(@ S(s) @)@ into the expression for
@(@ \partial_s \hat{F} @)@ above selects the @(@ (i, j) @)@ and
@(@ (j, i) @)@ components of each term.
This shows that the formula in the lemma is correct for
@(@
\hat{S}_{i,j}
@)@ and @(@
\hat{S}_{j,i}
@)@.
This completes the proof because the component @(@
(i, j)
@)@ was arbitrary.
Lemma 2
We use the same assumptions as in Lemma 1 except that the
matrix @(@
S
@)@ is lower triangular (instead of symmetric).
It follows that
@[@
\hat{S} = \bar{S} + \R{lower}(M)
@]@
where
@[@
M = L_0^\R{-T} \R{low}( L_0^\R{T} \bar{L} )^\R{T} L_0^{-1}
@]@
The proof of this lemma is identical to the proof of Lemma 1,
except that the component function is defined by
@[@
S_{k, \ell} (s) = \left\{ \begin{array}{ll}
s & \R{if} \; k = i \; \R{and} \; \ell = j \\
0 & \R{otherwise}
\end{array} \right\}
@]@
Case k = 0
For the case @(@
k = 0
@)@,
@[@
\dot{A}_0
=
\dot{L}_0 L_0^\R{T}
+
L_0 \dot{L}_0^\R{T}
@]@
@[@
L_0^{-1} \dot{A}_0 L_0^\R{-T}
=
L_0^{-1} \dot{L}_0
+
\dot{L}_0^\R{T} L_0^\R{-T}
@]@
@[@
\R{low} ( L_0^{-1} \dot{A}_0 L_0^\R{-T} )
=
L_0^{-1} \dot{L}_0
@]@
@[@
\dot{L}_0
=
L_0 \R{low} ( L_0^{-1} \dot{A}_0 L_0^\R{-T} )
@]@
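This identity can be confirmed with a central finite difference.
The sketch below assumes Python with NumPy and repeats the low
helper defined earlier:

    import numpy as np

    def low(C):
        return np.tril(C) - 0.5 * np.diag(np.diag(C))

    rng = np.random.default_rng(0)
    n = 3
    B = rng.standard_normal((n, n))
    A0 = B @ B.T + n * np.eye(n)
    S = rng.standard_normal((n, n))
    A0dot = S + S.T                 # symmetric direction for A_0(s)
    L0 = np.linalg.cholesky(A0)
    L0inv = np.linalg.inv(L0)
    formula = L0 @ low(L0inv @ A0dot @ L0inv.T)
    eps = 1e-6
    fd = (np.linalg.cholesky(A0 + eps * A0dot)
          - np.linalg.cholesky(A0 - eps * A0dot)) / (2 * eps)
    assert np.allclose(formula, fd, atol=1e-7)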
It follows from Lemma 1 that
@[@
\bar{A}_0 \stackrel{+}{=} \frac{1}{2} ( M + M^\R{T} )
@]@
where
@[@
M = L_0^\R{-T} \R{low} ( L_0^\R{T} \bar{L}_0 )^\R{T} L_0^{-1}
@]@
and the value of @(@
\bar{A}_0
@)@ before (after) this update is the partial before (after)
@(@
L_0
@)@ is removed from the scalar function dependency.
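The following sketch, assuming Python with NumPy, checks this update
for the scalar function @(@ \bar{F} = \R{tr} ( \bar{L}_0^\R{T} L_0 ) @)@
with a fixed lower triangular @(@ \bar{L}_0 @)@ (a choice made here for
illustration only):

    import numpy as np

    def low(C):
        return np.tril(C) - 0.5 * np.diag(np.diag(C))

    rng = np.random.default_rng(1)
    n = 3
    B = rng.standard_normal((n, n))
    A0 = B @ B.T + n * np.eye(n)
    L0bar = np.tril(rng.standard_normal((n, n)))   # partial w.r.t. L_0
    L0 = np.linalg.cholesky(A0)
    L0inv = np.linalg.inv(L0)
    M = L0inv.T @ low(L0.T @ L0bar).T @ L0inv
    A0bar = 0.5 * (M + M.T)
    # finite difference of F(A) = tr(L0bar^T cholesky(A))
    F = lambda A: np.sum(L0bar * np.linalg.cholesky(A))
    S = rng.standard_normal((n, n))
    A0dot = S + S.T
    eps = 1e-6
    fd = (F(A0 + eps * A0dot) - F(A0 - eps * A0dot)) / (2 * eps)
    assert np.isclose(fd, np.sum(A0bar * A0dot), atol=1e-7)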
Case k > 0
In the case where @(@
k > 0
@)@,
@[@
A_k = L_k L_0^\R{T} + L_0 L_k^\R{T} + B_k
@]@
where @(@
B_k
@)@ is defined in terms of Taylor coefficients
of @(@
L(t)
@)@ that have order less than @(@
k
@)@.
It follows that
@[@
\dot{L}_k L_0^\R{T}
+
L_0 \dot{L}_k^\R{T}
=
\dot{A}_k - \dot{B}_k - \dot{L}_0 L_k^\R{T} - L_k \dot{L}_0^\R{T}
@]@
@[@
L_0^{-1} \dot{L}_k
+
\dot{L}_k^\R{T} L_0^\R{-T}
=
L_0^{-1} (
\dot{A}_k - \dot{B}_k - \dot{L}_0 L_k^\R{T} - L_k \dot{L}_0^\R{T}
) L_0^\R{-T}
@]@
@[@
L_0^{-1} \dot{L}_k
=
\R{low} [ L_0^{-1} (
\dot{A}_k - \dot{B}_k - \dot{L}_0 L_k^\R{T} - L_k \dot{L}_0^\R{T}
) L_0^\R{-T} ]
@]@
@[@
\dot{L}_k
=
L_0 \R{low} [ L_0^{-1} (
\dot{A}_k - \dot{B}_k - \dot{L}_0 L_k^\R{T} - L_k \dot{L}_0^\R{T}
) L_0^\R{-T} ]
@]@
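This expression can also be checked numerically, computing each
@(@ \dot{L}_\ell @)@ by a finite difference of the forward recursion.
The sketch below assumes Python with NumPy and repeats the helpers
from the earlier sketches:

    import numpy as np

    def low(C):
        return np.tril(C) - 0.5 * np.diag(np.diag(C))

    def cholesky_forward(A):
        L = [np.linalg.cholesky(A[0])]
        L0inv = np.linalg.inv(L[0])
        for k in range(1, len(A)):
            B_k = sum(L[l] @ L[k - l].T for l in range(1, k))
            L.append(L[0] @ low(L0inv @ (A[k] - B_k) @ L0inv.T))
        return L

    rng = np.random.default_rng(3)
    n, K, k = 3, 2, 2
    B = rng.standard_normal((n, n))
    A, Adot = [B @ B.T + n * np.eye(n)], []
    for j in range(K + 1):
        if j > 0:
            S = rng.standard_normal((n, n))
            A.append(S + S.T)
        S = rng.standard_normal((n, n))
        Adot.append(S + S.T)        # symmetric directions for A_j(s)
    eps = 1e-6
    L = cholesky_forward(A)
    Lp = cholesky_forward([Aj + eps * D for Aj, D in zip(A, Adot)])
    Lm = cholesky_forward([Aj - eps * D for Aj, D in zip(A, Adot)])
    Ldot = [(p - m) / (2 * eps) for p, m in zip(Lp, Lm)]
    Bdot = sum(Ldot[l] @ L[k - l].T + L[l] @ Ldot[k - l].T
               for l in range(1, k))
    L0inv = np.linalg.inv(L[0])
    rhs = L[0] @ low(L0inv @ (Adot[k] - Bdot
          - Ldot[0] @ L[k].T - L[k] @ Ldot[0].T) @ L0inv.T)
    assert np.allclose(Ldot[k], rhs, atol=1e-6)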
The matrix @(@
A_k
@)@ is symmetric; it follows from Lemma 1 that
@[@
\bar{A}_k \stackrel{+}{=} \frac{1}{2} ( M_k + M_k^\R{T} )
@]@
where
@[@
M_k = L_0^\R{-T} \R{low} ( L_0^\R{T} \bar{L}_k )^\R{T} L_0^{-1}
@]@
The matrix @(@
B_k
@)@ is also symmetric and enters through the difference
@(@
A_k - B_k
@)@; hence
@[@
\bar{B}_k = - \; \frac{1}{2} ( M_k + M_k^\R{T} )
@]@
We define the symmetric matrix @(@
C_k (s)
@)@ by
@[@
\dot{C}_k = \dot{L}_0 L_k^\R{T} + L_k \dot{L}_0^\R{T}
@]@
Since @(@
C_k
@)@ enters the expression for @(@
\dot{L}_k
@)@ in the same way as @(@
B_k
@)@, it follows that @(@
\bar{C}_k = \bar{B}_k
@)@.
We remove the dependency on @(@
C_k
@)@ with
@[@
\R{tr}( \bar{C}_k^\R{T} \dot{C}_k )
=
\R{tr}( \bar{B}_k^\R{T} \dot{C}_k )
=
\R{tr}( \bar{B}_k^\R{T} \dot{L}_0 L_k^\R{T} )
+
\R{tr}( \bar{B}_k^\R{T} L_k \dot{L}_0^\R{T} )
@]@
@[@
=
\R{tr}( L_k^\R{T} \bar{B}_k^\R{T} \dot{L}_0 )
+
\R{tr}( L_k^\R{T} \bar{B}_k \dot{L}_0 )
@]@
@[@
=
\R{tr}[ L_k^\R{T} ( \bar{B}_k + \bar{B}_k^\R{T} ) \dot{L}_0 ]
@]@
Thus, removing @(@
C_k
@)@ from the dependency results in the
following update to @(@
\bar{L}_0
@)@:
@[@
\bar{L}_0 \stackrel{+}{=} \R{lower} [ ( \bar{B}_k + \bar{B}_k^\R{T} ) L_k ]
@]@
which, because @(@ \bar{B}_k @)@ is symmetric, is the same as
@[@
\bar{L}_0 \stackrel{+}{=} 2 \; \R{lower} [ \bar{B}_k L_k ]
@]@
We still need to remove @(@
B_k
@)@ from the dependency.
It follows from its definition that
@[@
\dot{B}_k = \sum_{\ell=1}^{k-1}
\dot{L}_\ell L_{k-\ell}^\R{T} + L_\ell \dot{L}_{k-\ell}^\R{T}
@]@
@[@
\R{tr}( \bar{B}_k^\R{T} \dot{B}_k )
=
\sum_{\ell=1}^{k-1}
\R{tr}( \bar{B}_k^\R{T} \dot{L}_\ell L_{k-\ell}^\R{T} )
+
\R{tr}( \bar{B}_k^\R{T} L_\ell \dot{L}_{k-\ell}^\R{T} )
@]@
@[@
=
\sum_{\ell=1}^{k-1}
\R{tr}( L_{k-\ell}^\R{T} \bar{B}_k^\R{T} \dot{L}_\ell )
+
\sum_{\ell=1}^{k-1}
\R{tr}( L_\ell^\R{T} \bar{B}_k \dot{L}_{k-\ell} )
@]@
We now use the fact that @(@
\bar{B}_k
@)@ is symmetric to conclude
@[@
\R{tr}( \bar{B}_k^\R{T} \dot{B}_k )
=
2 \sum_{\ell=1}^{k-1}
\R{tr}( L_{k-\ell}^\R{T} \bar{B}_k^\R{T} \dot{L}_\ell )
@]@
Each of the @(@
\dot{L}_\ell
@)@ matrices is lower triangular.
Thus, removing @(@
B_k
@)@ from the dependency results in the following
update for @(@
\ell = 1 , \ldots , k-1
@)@:
@[@
\bar{L}_\ell
\stackrel{+}{=} 2 \; \R{lower}( \bar{B}_k L_{k-\ell} )
@]@
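Collecting the updates above gives a complete reverse sweep.
The following sketch assumes Python with NumPy; cholesky_reverse is a
name chosen here for illustration, and the result is checked against a
finite difference of a scalar function of the coefficients of
@(@ L(t) @)@:

    import numpy as np

    def low(C):
        return np.tril(C) - 0.5 * np.diag(np.diag(C))

    def cholesky_forward(A):
        L = [np.linalg.cholesky(A[0])]
        L0inv = np.linalg.inv(L[0])
        for k in range(1, len(A)):
            B_k = sum(L[l] @ L[k - l].T for l in range(1, k))
            L.append(L[0] @ low(L0inv @ (A[k] - B_k) @ L0inv.T))
        return L

    def cholesky_reverse(L, Lbar):
        # accumulate Abar_0, ..., Abar_K from Lbar_0, ..., Lbar_K
        Lbar = [m.copy() for m in Lbar]
        Abar = [np.zeros_like(m) for m in L]
        L0inv = np.linalg.inv(L[0])
        for k in range(len(L) - 1, 0, -1):
            M_k = L0inv.T @ low(L[0].T @ Lbar[k]).T @ L0inv
            Abar[k] += 0.5 * (M_k + M_k.T)
            Bbar_k = -0.5 * (M_k + M_k.T)
            Lbar[0] += 2.0 * np.tril(Bbar_k @ L[k])      # remove C_k
            for l in range(1, k):                        # remove B_k
                Lbar[l] += 2.0 * np.tril(Bbar_k @ L[k - l])
        M_0 = L0inv.T @ low(L[0].T @ Lbar[0]).T @ L0inv
        Abar[0] += 0.5 * (M_0 + M_0.T)
        return Abar

    rng = np.random.default_rng(2)
    n, K = 3, 2
    B = rng.standard_normal((n, n))
    A, Adot, W = [B @ B.T + n * np.eye(n)], [], []
    for k in range(K + 1):
        if k > 0:
            S = rng.standard_normal((n, n))
            A.append(S + S.T)
        S = rng.standard_normal((n, n))
        Adot.append(S + S.T)                             # directions
        W.append(np.tril(rng.standard_normal((n, n))))   # Lbar_k
    F = lambda A: sum(np.sum(Wk * Lk)
                      for Wk, Lk in zip(W, cholesky_forward(A)))
    Abar = cholesky_reverse(cholesky_forward(A), W)
    eps = 1e-6
    fd = (F([Ak + eps * D for Ak, D in zip(A, Adot)])
          - F([Ak - eps * D for Ak, D in zip(A, Adot)])) / (2 * eps)
    assert np.isclose(fd, sum(np.sum(Bk * D)
                      for Bk, D in zip(Abar, Adot)), atol=1e-6)

The sweep processes the orders from @(@ K @)@ down to one so that each
@(@ \bar{L}_k @)@ has received all of its updates from higher orders
before it is used.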
Input File: omh/theory/cholesky.omh