What is the Jacobian matrix and why do we need it?

S. M. Saeed Damadi December 23, 2019

Derivative of univariate function

To understand what the Jacobian is, we first revisit the derivative of a univariate function, where $f$ maps the real line into the real line, that is, $f: \mathbb{R} \rightarrow \mathbb{R}$. The derivative of $f$, denoted by $f'$, measures how sensitive the function value (the output, $f(x)$) is to a change in its argument (the input, $x$). The question now is how to measure the change of a function whose input is a vector in $\mathbb{R}^n$. From this point on we focus on the change in the value of a function and relate it back to the derivative of a univariate function.

Note: $f'$ is also written $\frac{df}{dx}$, which is read as the derivative of $f$ with respect to $x$. We use $d$ here because $x$ is the only variable with respect to which the derivative can be taken.
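To make "sensitivity to change" concrete, here is a minimal sketch that estimates $f'(x)$ with a central difference. The function `cube`, the helper name `derivative`, and the step size `h` are illustrative assumptions, not part of the original post.

```python
def derivative(f, x, h=1e-6):
    """Central-difference estimate of f'(x); h is an assumed small step."""
    return (f(x + h) - f(x - h)) / (2 * h)


# Illustrative function (an assumption): f(x) = x**3, so f'(x) = 3 * x**2.
def cube(x):
    return x ** 3


approx = derivative(cube, 2.0)   # numerical estimate of f'(2)
exact = 3 * 2.0 ** 2             # analytic value of f'(2)
print(approx, exact)
```

The estimate is the ratio of the change in the output to the change in the input, which is exactly what the derivative measures in the limit of a vanishing step.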

Remark: When $x$ is a single number on the real line rather than a vector, the gradient introduced in the next section reduces to this derivative, as expected.

The gradient of a real-valued function

When $f$ takes a vector $x \in \mathbb{R}^n$, where $x = [x_1, x_2, \cdots,x_n]^{\top}$, and maps it to a number on the real line, we write $f : \mathbb{R}^n \rightarrow \mathbb{R}$. Since we can vary each coordinate of $x$ to change the value of the function, the concept of the gradient comes into the picture. The gradient is a vector of the same length as $x$, and each of its coordinates gives the rate of change of $f$ when the corresponding coordinate of $x$ changes while the others are held fixed. Therefore, we write

\begin{equation}
\nabla f = [\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \cdots, \frac{\partial f}{\partial x_n}]^{\top}
\end{equation}

Here we use the $\nabla$ (nabla) symbol as shorthand for the right-hand side, collecting the changes along all $n$ coordinate directions. The symbol $\partial$ signals that, besides the variable being differentiated (say $x_1$), there are other variables whose change can affect the value of $f$.
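As a rough illustration, the gradient can be approximated by perturbing one coordinate at a time and leaving the others fixed. The function `g`, the helper name `gradient`, and the step size `h` below are assumptions chosen for this sketch; for $g(x) = x_1^2 + 3 x_1 x_2$ the analytic gradient is $[2x_1 + 3x_2,\; 3x_1]^{\top}$.

```python
def gradient(f, x, h=1e-6):
    """Estimate the gradient of f: R^n -> R at x (a list of floats)."""
    grad = []
    for j in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[j] += h    # perturb only coordinate j
        backward[j] -= h
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


# Illustrative function (an assumption): g(x) = x1**2 + 3 * x1 * x2.
def g(x):
    return x[0] ** 2 + 3 * x[0] * x[1]


grad_g = gradient(g, [1.0, 2.0])
print(grad_g)  # analytic gradient at (1, 2) is [8, 3]
```

Each entry of the result answers the question "how much does $f$ change if only this coordinate moves?", which is the role the partial derivative plays in the definition.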

Jacobian of a vector-valued function

Now let's get back to the question that I asked in the page title. You might say, "Well, the Jacobian matrix should be something that connects to the change of a vector-valued function which takes a vector as its input." Yes, correct! But how do we write it?

First notice that we have $f$ where $f: \mathbb{R}^n \rightarrow \mathbb{R}^m$, that is:

$$
f(x)=f(x_1, x_2, \cdots,x_n)= \begin{bmatrix}
f_1(x_1, x_2, \cdots,x_n)\\ f_2(x_1, x_2, \cdots,x_n)\\ \vdots \\ f_m(x_1, x_2, \cdots,x_n)
\end{bmatrix}
$$

As you can see, the value of each function $f_i$, where $i= 1, 2, \cdots, m$, can vary when any one of the $x_j$'s changes. This is where the Jacobian matrix comes into the picture. It is defined as follows:

$$
J_f(x)= \begin{bmatrix}
\frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2}& \cdots& \frac{\partial f_1}{\partial x_n}\\
\frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2}& \cdots& \frac{\partial f_2}{\partial x_n}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial f_m}{\partial x_1} & \frac{\partial f_m}{\partial x_2}& \cdots& \frac{\partial f_m}{\partial x_n}
\end{bmatrix}
=
\begin{bmatrix}
\nabla^{\top} f_1\\
\nabla^{\top} f_2\\
\vdots\\
\nabla^{\top} f_m
\end{bmatrix}
$$

Looking at the $i$-th row of the Jacobian matrix, we see that it is the transpose of the gradient of $f_i$, i.e., $(\nabla f_i)^{\top}$, which I denote by $\nabla^{\top} f_i$.

Remark 1: If $f$ is a scalar-valued function and $x$ is a number on the real line instead of a vector (that is, $m = n = 1$), the Jacobian matrix reduces to the derivative!

Remark 2: The Jacobian matrix is the matrix of all first-order partial derivatives of the function.
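The definition can be checked numerically with a small sketch that builds the $m \times n$ matrix column by column, one input coordinate at a time. The map `F`, the helper name `jacobian`, and the step size `h` are assumptions made for illustration; for $F(x) = [x_1 x_2,\ \sin x_1,\ x_2^2]^{\top}$ the analytic Jacobian has rows $[x_2,\ x_1]$, $[\cos x_1,\ 0]$ and $[0,\ 2x_2]$.

```python
import math


def jacobian(f, x, h=1e-6):
    """Estimate the m-by-n Jacobian of f: R^n -> R^m at x by central differences."""
    m = len(f(x))
    n = len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        forward = list(x)
        backward = list(x)
        forward[j] += h    # perturb only input coordinate j
        backward[j] -= h
        f_plus = f(forward)
        f_minus = f(backward)
        for i in range(m):
            # Entry (i, j) approximates the partial derivative of f_i w.r.t. x_j.
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J


# Illustrative map (an assumption): F: R^2 -> R^3.
def F(x):
    return [x[0] * x[1], math.sin(x[0]), x[1] ** 2]


J = jacobian(F, [1.0, 2.0])
for row in J:
    print(row)  # row i approximates the transposed gradient of F_i
```

Here $n = 2$ and $m = 3$, so the result has three rows and two columns, and each row is the transposed gradient of one output component.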

Why do we need the Jacobian matrix?

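The Jacobian of a composition of functions is the product of the Jacobian matrices of the individual functions. Suppose we have the following function, where $f_1, \cdots, f_n$ now denote the functions being composed rather than the components of a single vector-valued function: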

$$
f(x)= f_1(f_2(\cdots(f_n(x)))) = f_1 \circ f_2 \circ \cdots\circ f_n(x)
$$

The Jacobian matrix of $f$ with respect to $x$ is:

$$
J_f(x) = J_{f_1}\big(f_2 \circ \cdots \circ f_n(x)\big) \cdot J_{f_2}\big(f_3 \circ \cdots \circ f_n(x)\big) \cdots J_{f_n}(x)
$$

However, there is a big caveat here: in order to find the Jacobian matrix of $f$, we need to find all the other Jacobian matrices. We know the Jacobian matrix collects all first-order partial derivatives of a vector-valued function with respect to its variable. But what is the variable of, let's say, $f_1$? The variable of $f_1$ is everything in $f$ except $f_1$ itself, i.e., $f_2(f_3(\cdots(f_n(x)))) = f_2 \circ f_3 \circ \cdots\circ f_n(x)$.

In this post I will go through an example and clarify all those abstract concepts that were discussed in the last paragraph.
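The following is not the worked example promised above but a small numerical check of the product rule for Jacobians, using two composed functions. The maps `inner` and `outer`, the helpers `jacobian` and `matmul`, and the step size `h` are all assumptions introduced for this sketch. The key point it illustrates is that the Jacobian of the outer function is evaluated at the output of the inner one, not at $x$.

```python
import math


def jacobian(f, x, h=1e-6):
    """Estimate the m-by-n Jacobian of f: R^n -> R^m at x by central differences."""
    m = len(f(x))
    n = len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        forward = list(x)
        backward = list(x)
        forward[j] += h
        backward[j] -= h
        f_plus = f(forward)
        f_minus = f(backward)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J


def matmul(A, B):
    """Plain matrix product of two lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


# Illustrative maps (assumptions): inner: R^2 -> R^3, outer: R^3 -> R^2.
def inner(x):
    return [x[0] * x[1], math.sin(x[0]), x[1] ** 2]


def outer(u):
    return [u[0] + u[1] * u[2], u[0] * u[2]]


def composed(x):
    return outer(inner(x))


x = [1.0, 2.0]
direct = jacobian(composed, x)                       # Jacobian of the composition
via_chain_rule = matmul(jacobian(outer, inner(x)),   # outer Jacobian at inner(x)
                        jacobian(inner, x))          # inner Jacobian at x
print(direct)
print(via_chain_rule)
```

The shapes also line up as the formula requires: a $2 \times 3$ matrix times a $3 \times 2$ matrix gives the $2 \times 2$ Jacobian of the composition.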
