
Jacobian matrix of a composite function

S. M. Saeed Damadi

This post continues what I discussed here to clarify what the Jacobian matrix is. We are going to work through an example in which the Jacobian matrix is computed for a composite function built from two functions. This case is instructive because it sheds light on the general case, which can then be handled using what we find here.

Consider $h(x)=f(g(x))$ where $x \in \mathbb{R}^n$, $g:  \mathbb{R}^n \rightarrow  \mathbb{R}^\ell$, and $f:  \mathbb{R}^{\ell} \rightarrow  \mathbb{R}^m$.

Hence, $$h:  \mathbb{R}^{n} \rightarrow  \mathbb{R}^m$$

According to what I explained here, the Jacobian matrix of $h$ is an $m \times n$ matrix.

Also,

$$
J_h(x) = J_f(g(x))J_g(x)
$$ 

where $J_g(x)$ is an $\ell \times n$ matrix and $J_f(g(x))$ is an $m \times \ell$ matrix.

Remark:

  • $J_g(x)$ is an $\ell \times n$ matrix because $g$ takes an $n$-dimensional vector and maps it to an $\ell$-dimensional vector.
  • $J_f(g(x))$ is an $m \times \ell$ matrix because its input $g(x)$ is an $\ell$-dimensional vector and the output of $f$ is an $m$-dimensional vector.
  • Since the number of columns of $J_f(g(x))$ equals the number of rows of $J_g(x)$, they are conformable and the product is an $m \times n$ matrix.
  • $J_g(x)$ is simply the Jacobian matrix of a vector-valued function, which was defined in the other post.
  • To find $J_f(g(x))$, we need $f$ in terms of each coordinate of $g(x)$ so that all the first-order partial derivatives can be taken. If $f$ and $g$ are given separately as two functions, say $f(z)$ and $g(x)$, then to find $J_f(g(x))$ it suffices to find the Jacobian of $f$ with respect to $z$ and then substitute the coordinates of $g(x)$ for the coordinates of $z$. Precisely, we have $J_f(z) \circ g(x)$, that is, $J_f(z)$ evaluated at $z = g(x)$.
  • When $h(x)$ is given explicitly as a function of $x$, we are back to the case where the Jacobian matrix can be found the way we found it for a vector-valued function in the other post.
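
The shape bookkeeping in these remarks can be checked numerically. The following is a minimal sketch, not a definitive implementation: the maps `g` and `f` are hypothetical examples chosen only for illustration (with $n = 3$, $\ell = 2$, $m = 4$), and `numerical_jacobian` and `matmul` are helper names introduced here. It approximates each Jacobian by central differences and compares $J_h(x)$ with the product $J_f(g(x))J_g(x)$.

```python
import math


def numerical_jacobian(func, x, eps=1e-6):
    """Central-difference Jacobian of func at x, returned as a list of rows."""
    rows, cols = len(func(x)), len(x)
    jac = [[0.0] * cols for _ in range(rows)]
    for j in range(cols):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        f_plus, f_minus = func(x_plus), func(x_minus)
        for i in range(rows):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return jac


def matmul(a, b):
    """Product of two conformable matrices stored as lists of rows."""
    assert len(a[0]) == len(b), "columns of a must equal rows of b"
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


# Hypothetical maps for illustration only: g: R^3 -> R^2, f: R^2 -> R^4.
def g(x):
    return [x[0] * x[1], math.sin(x[2])]


def f(z):
    return [z[0] + z[1], z[0] * z[1], z[0] ** 2, math.exp(z[1])]


def h(x):
    return f(g(x))


x0 = [0.5, -1.2, 0.3]                # arbitrary test point
J_g = numerical_jacobian(g, x0)      # l x n = 2 x 3
J_f = numerical_jacobian(f, g(x0))   # m x l = 4 x 2, evaluated at z = g(x0)
J_h = numerical_jacobian(h, x0)      # m x n = 4 x 3
product = matmul(J_f, J_g)           # also 4 x 3

max_gap = max(abs(J_h[i][j] - product[i][j])
              for i in range(4) for j in range(3))
print(len(J_h), len(J_h[0]), max_gap)
```

The printed gap should be tiny (limited by the finite-difference error), which is consistent with $J_h(x) = J_f(g(x))J_g(x)$ for this particular choice of maps.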

In what follows, I will walk you through what I discussed for the above case. Suppose we are given

$f\left(u,v\right)=u^{2}+3v^{2}$,

$g\left(x,y\right)=\begin{bmatrix} e^{x}\cos y  \\ e^{x}\sin y \end{bmatrix} $,

and the goal is to find the Jacobian matrix of the composite function, which we call $h$.

Question: Is $h$ the composition of $f$ with $g$, or of $g$ with $f$?

Answer: To answer this question, we need to think of $f$ and $g$ as two mappings, where $f: \mathbb{R}^2 \rightarrow \mathbb{R}$ and $g: \mathbb{R}^2 \rightarrow \mathbb{R}^2$.

Suppose $h = f \circ g$.

Since $g$ is the innermost function in the composition (the one applied first), the variables of $h$ are the variables of $g$. Therefore, $J_h(x, y) = J_f(g(x,y))J_g(x, y)$. According to the first and second bullets, $J_g(x, y)$ is a $2 \times 2$ matrix and $J_f(g(x,y))$ is a $1 \times 2$ matrix. Hence, the result is a $1 \times 2$ matrix. However, if we flip $f$ and $g$ and consider $h=g \circ f$, we would need

$$J_h(u, v) \stackrel{?}{=} J_g(f(u,v))J_f(u, v)$$

It is easy to see from the first bullet that $J_f(u, v)$ is a $1 \times 2$ matrix, but $J_g(f(u,v))$ is not defined: one would have to take the derivatives of all coordinates of $g$ with respect to all variables of $g$, which would now be the coordinates of $f$; but $f$ is a scalar-valued function, so its output does not have the dimension that the input of $g$ requires. In effect, $h=g \circ f$ is not defined. Hence, by the composite function we mean $h = f \circ g$.
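
The dimension argument can be written as a one-line check. This is only a sketch: `can_compose` is a hypothetical helper introduced here, and each map is represented by nothing more than its (input dimension, output dimension) pair.

```python
def can_compose(outer_dims, inner_dims):
    """Return True if outer(inner(x)) is defined.

    Each argument is an (input_dim, output_dim) pair; the composition is
    defined only when the inner output dimension equals the outer input
    dimension.
    """
    outer_in, _ = outer_dims
    _, inner_out = inner_dims
    return inner_out == outer_in


f_dims = (2, 1)  # f: R^2 -> R
g_dims = (2, 2)  # g: R^2 -> R^2

print(can_compose(f_dims, g_dims))  # f o g: prints True
print(can_compose(g_dims, f_dims))  # g o f: prints False
```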

Let's first compute $J_g(x, y)$. Since $g$ is a vector-valued function of $x$ and $y$, we have

$$
J_g(x, y) = \begin{bmatrix}
\frac{\partial g_1}{\partial x} & \frac{\partial g_1}{\partial y} \\
\frac{\partial g_2}{\partial x} & \frac{\partial g_2}{\partial y} 
\end{bmatrix}=
\begin{bmatrix}
e^x\cos y & -e^x\sin y \\ e^x\sin y & e^x\cos y
\end{bmatrix}$$  
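
As a sanity check on this matrix, the sketch below compares the closed form with a central-difference approximation at one point. The function names, the test point, and the step size `eps` are assumptions made for illustration.

```python
import math


def g(x, y):
    """g(x, y) = (e^x cos y, e^x sin y)."""
    return [math.exp(x) * math.cos(y), math.exp(x) * math.sin(y)]


def jacobian_g(x, y):
    """Closed-form Jacobian of g, copied from the matrix above."""
    ex = math.exp(x)
    return [[ex * math.cos(y), -ex * math.sin(y)],
            [ex * math.sin(y), ex * math.cos(y)]]


def jacobian_g_numeric(x, y, eps=1e-6):
    """Central-difference approximation of the same Jacobian."""
    gx_plus, gx_minus = g(x + eps, y), g(x - eps, y)
    gy_plus, gy_minus = g(x, y + eps), g(x, y - eps)
    return [[(gx_plus[i] - gx_minus[i]) / (2 * eps),
             (gy_plus[i] - gy_minus[i]) / (2 * eps)] for i in range(2)]


x0, y0 = 0.4, 1.1  # arbitrary test point
exact = jacobian_g(x0, y0)
approx = jacobian_g_numeric(x0, y0)
max_gap_g = max(abs(exact[i][j] - approx[i][j])
                for i in range(2) for j in range(2))
print(max_gap_g)  # expected to be very small
```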

Now we compute $J_f(g(x,y))$ using the fifth bullet, which yields
$$
\begin{align}
J_f(g(x,y)) &= J_f(z = (u,v))\circ g(x,y)\\
&=
\begin{bmatrix}
\frac{\partial f}{\partial u} & \frac{\partial f}{\partial v}
\end{bmatrix} \circ g(x,y) \\
&=\begin{bmatrix}
2u & 6v
\end{bmatrix}\circ g(x,y)\\
&=\begin{bmatrix}
2u & 6v
\end{bmatrix}\circ
\begin{bmatrix}
e^{x}\cos y \\ e^{x}\sin y
\end{bmatrix}\\
&=
\begin{bmatrix} 2e^{x}\cos y & 6e^{x}\sin y \end{bmatrix}
\end{align}
$$
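
The substitution step can be mirrored directly in code: differentiate $f$ with respect to $(u, v)$ first, then plug in $(u, v) = g(x, y)$. This is a minimal sketch; the function names and the test point are assumptions made for illustration.

```python
import math


def g(x, y):
    """g(x, y) = (e^x cos y, e^x sin y)."""
    return (math.exp(x) * math.cos(y), math.exp(x) * math.sin(y))


def jacobian_f(u, v):
    """Jacobian (a 1 x 2 row) of f(u, v) = u^2 + 3 v^2 with respect to (u, v)."""
    return [2 * u, 6 * v]


def jacobian_f_at_g(x, y):
    """Substitute (u, v) = g(x, y) into the Jacobian of f."""
    u, v = g(x, y)
    return jacobian_f(u, v)


x0, y0 = 0.4, 1.1  # arbitrary test point
row = jacobian_f_at_g(x0, y0)
# Closed form from the last line of the derivation above.
expected = [2 * math.exp(x0) * math.cos(y0), 6 * math.exp(x0) * math.sin(y0)]
gap_f = max(abs(a - b) for a, b in zip(row, expected))
print(row, gap_f)
```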

Finally,

$$
\begin{align}
J_{h}(x,y) &= J_{f \circ g}(x,y) = J_f(g(x,y)) J_g(x, y)\\
&=
\begin{bmatrix} 2e^{x}\cos y & 6e^{x}\sin y \end{bmatrix}
\begin{bmatrix}
e^x\cos y & -e^x\sin y \\ e^x\sin y & e^x\cos y
\end{bmatrix}\\
&=
\begin{bmatrix}
2e^{2x}\cos^2y + 6e^{2x}\sin^2y & -2e^{2x}\cos y \sin y + 6e^{2x}\sin y\cos y
\end{bmatrix} \\
&=
2e^{2x}
\begin{bmatrix}
1 + 2\sin^2y & 2\sin y\cos y
\end{bmatrix}
\end{align}
$$
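
The final row vector can be verified against the sixth bullet: substituting $g$ into $f$ gives $h(x,y) = e^{2x}\cos^2 y + 3e^{2x}\sin^2 y$, which can be differentiated numerically as an ordinary function of $x$ and $y$. The sketch below does this with central differences; the function names, the test point, and the step size `eps` are assumptions made for illustration.

```python
import math


def h(x, y):
    """h = f o g with f(u, v) = u^2 + 3 v^2 and g(x, y) = (e^x cos y, e^x sin y)."""
    u = math.exp(x) * math.cos(y)
    v = math.exp(x) * math.sin(y)
    return u ** 2 + 3 * v ** 2


def jacobian_h(x, y):
    """Closed form from the derivation: 2 e^{2x} [1 + 2 sin^2 y, 2 sin y cos y]."""
    scale = 2 * math.exp(2 * x)
    return [scale * (1 + 2 * math.sin(y) ** 2),
            scale * 2 * math.sin(y) * math.cos(y)]


def jacobian_h_numeric(x, y, eps=1e-6):
    """Central-difference approximation of [dh/dx, dh/dy]."""
    return [(h(x + eps, y) - h(x - eps, y)) / (2 * eps),
            (h(x, y + eps) - h(x, y - eps)) / (2 * eps)]


x0, y0 = 0.4, 1.1  # arbitrary test point
closed_form = jacobian_h(x0, y0)
numeric = jacobian_h_numeric(x0, y0)
gap_h = max(abs(a - b) for a, b in zip(closed_form, numeric))
print(closed_form, numeric, gap_h)
```

Agreement at one point does not prove the formula, but a large gap would flag an algebra slip in the matrix product.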

Conclusion: We went through all the steps of finding the Jacobian matrix of a composite function consisting of two functions. One step remains: applying these results to the loss function of a neural network.