What is the Jacobian matrix and why do we need it?

Derivative of univariate function

To understand what the Jacobian is, we first revisit the derivative of a univariate function, where $f$ maps the real line into the real line, that is, $f: \mathbb{R} \rightarrow \mathbb{R}$. The derivative of $f$, denoted by $f'$, measures how sensitive the function value (the output, $f(x)$) is to a change in its argument (the input, $x$). The question now is how to measure the change of a function whose input is a vector in $\mathbb{R}^n$. From this point on, we focus on the change in the value of a function and relate it back to the derivative of a univariate function.

Note: $f'$ is also written as $\frac{df}{dx}$, which is read as the derivative of $f$ with respect to $x$. We use $d$ here because $x$ is the only variable with respect to which a derivative can be taken.
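To make "sensitivity to change" concrete, here is a minimal sketch that estimates a derivative numerically with a central difference. The helper name `derivative`, the step size `h`, and the test function `square` are illustrative assumptions, not part of any library.

```python
def derivative(f, x, h=1e-6):
    """Central-difference estimate of f'(x): the change in output per unit change in input."""
    return (f(x + h) - f(x - h)) / (2 * h)


def square(x):
    # Hypothetical example function f(x) = x^2, whose exact derivative is 2x.
    return x * x


slope = derivative(square, 3.0)
print(slope)  # approximately 6, matching 2 * 3
```

A small nudge of the input near $x = 3$ changes the output about six times as much, which is exactly what $f'(3) = 6$ says.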

Remark: The gradient, introduced next, extends this idea to vector inputs. When $x$ is a real number rather than a vector, the gradient reduces to the derivative, as expected.

The gradient of a real-valued function

When $f$ takes a vector $x \in \mathbb{R}^n$, where $x = [x_1, x_2, \cdots,x_n]^{\top}$, and maps it to a number on the real line, we write $f : \mathbb{R}^n \rightarrow \mathbb{R}$. Since we can vary each coordinate of $x$ separately to change the value of the function, the concept of the gradient comes into the picture. The gradient is a vector of the same length as $x$, and its $i$-th coordinate is the rate of change of $f$ when only the coordinate $x_i$ changes. Therefore, we write

 \begin{equation}
\nabla f = [\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \cdots, \frac{\partial f}{\partial x_n}]^{\top}
\end{equation}

Here we use the $\nabla$ (nabla) symbol as shorthand for the right-hand side, collecting the rates of change along all $n$ coordinate directions. The symbol $\partial$ signals that, besides the variable being differentiated (say $x_1$), there are other variables whose changes can affect the value of $f$.
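The definition above can be checked numerically by nudging one coordinate at a time while holding the others fixed. The sketch below is only an illustration: the helper `gradient` and the example function `f` are assumptions chosen for this post.

```python
def gradient(f, x, h=1e-6):
    """Central-difference estimate of the gradient of f: R^n -> R at the point x."""
    grad = []
    for i in range(len(x)):
        forward, backward = list(x), list(x)
        forward[i] += h   # nudge only coordinate i up
        backward[i] -= h  # nudge only coordinate i down
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


def f(x):
    # Hypothetical example: f(x1, x2) = x1^2 + 3*x1*x2,
    # with exact gradient [2*x1 + 3*x2, 3*x1].
    return x[0] ** 2 + 3.0 * x[0] * x[1]


g = gradient(f, [1.0, 2.0])
print(g)  # approximately [8, 3]
```

The result has one entry per coordinate of $x$, so it has the same length as $x$.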

Jacobian of a vector-valued function

Now let's get back to the question I asked in the page title. You might say, "Well, the Jacobian matrix should be something that describes the change of a vector-valued function that takes a vector as its input." Yes, correct! But how do we write it?

First notice that we have $f$ where $f: \mathbb{R}^n \rightarrow \mathbb{R}^m$, that is:

$$
f(x)=f(x_1, x_2, \cdots,x_n)= \begin{bmatrix}
f_1(x_1, x_2, \cdots,x_n)\\ f_2(x_1, x_2, \cdots,x_n)\\ \vdots \\ f_m(x_1, x_2, \cdots,x_n)
\end{bmatrix}
$$

As you can see, the value of each function $f_i$, where $i= 1, 2, \cdots, m$, can vary when any of the $x_j$'s changes. This is where the Jacobian matrix comes into the picture. It is defined as follows:

$$
J_f(x)= \begin{bmatrix}
\frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2}& \cdots& \frac{\partial f_1}{\partial x_n}\\
\frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2}& \cdots& \frac{\partial f_2}{\partial x_n}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial f_m}{\partial x_1} & \frac{\partial f_m}{\partial x_2}& \cdots& \frac{\partial f_m}{\partial x_n}
\end{bmatrix}
=
\begin{bmatrix}
\nabla^{\top} f_1\\
\nabla^{\top} f_2\\
\vdots\\
\nabla^{\top} f_m\\
\end{bmatrix}
$$

Looking at the $i$-th row of the Jacobian matrix, we see that it is the transpose of the gradient of $f_i$, i.e., $(\nabla f_i)^{\top}$, which I denote by $\nabla^{\top} f_i$.

Remark 1: If $f$ is scalar-valued ($m = 1$) and $x$ is a real number rather than a vector ($n = 1$), the Jacobian matrix reduces to the derivative!

Remark 2: The Jacobian matrix is the matrix of all first-order partial derivatives of the function, with $m$ rows (one per output) and $n$ columns (one per input).
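As a rough numerical illustration of the definition, the sketch below builds the Jacobian column by column: nudging input $x_j$ fills in column $j$. The helper `jacobian` and the example function `f` (mapping $\mathbb{R}^2$ to $\mathbb{R}^3$) are assumptions made for this illustration.

```python
def jacobian(f, x, h=1e-6):
    """Central-difference estimate of the m-by-n Jacobian of f: R^n -> R^m at x."""
    m, n = len(f(x)), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        forward, backward = list(x), list(x)
        forward[j] += h
        backward[j] -= h
        f_plus, f_minus = f(forward), f(backward)
        for i in range(m):
            # Entry (i, j) is the partial derivative of f_i with respect to x_j.
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J


def f(x):
    # Hypothetical example with n = 2 inputs and m = 3 outputs.
    # Exact Jacobian: [[x2, x1], [2*x1, 1], [0, 3]].
    return [x[0] * x[1], x[0] ** 2 + x[1], 3.0 * x[1]]


J = jacobian(f, [1.0, 2.0])
for row in J:
    print(row)  # rows are approximately [2, 1], [2, 1], [0, 3]
```

Each printed row is the transposed gradient of one component $f_i$, in line with the row-wise view of the matrix.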

Why do we need the Jacobian matrix?

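The Jacobian of a composition of functions is the product of the Jacobian matrices of the individual functions. Suppose we have the following function, where $f_1, f_2, \cdots, f_n$ now denote the functions being composed rather than the components of $f$: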

$$
f(x) = f_1(f_2(\cdots f_n(x) \cdots)) = (f_1 \circ f_2 \circ \cdots \circ f_n)(x)
$$

The Jacobian matrix of $f$ with respect to $x$ is:

$$
J_f = J_{f_1} \cdot J_{f_2} \cdots J_{f_n}(x)
$$

However, there is a big caveat here: in order to find the Jacobian matrix of $f$, we need to find all the other Jacobian matrices. We know the Jacobian matrix collects all first-order partial derivatives of a vector-valued function with respect to its variable. But what is the variable of, let's say, $f_1$? The variable of $f_1$ is everything in $f$ except $f_1$ itself, i.e., $f_2(f_3(\cdots f_n(x) \cdots)) = (f_2 \circ f_3 \circ \cdots \circ f_n)(x)$. In other words, $J_{f_1}$ is evaluated at the output of the inner functions, not at $x$ itself.
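The caveat can be seen in a small numerical check with just two functions, $f = f_1 \circ f_2$. The sketch compares the Jacobian of the composition computed directly against the product $J_{f_1}(f_2(x)) \cdot J_{f_2}(x)$. The helpers `jacobian` and `matmul` and the specific choices of `f1` and `f2` are assumptions made for illustration only.

```python
def jacobian(f, x, h=1e-6):
    """Central-difference estimate of the m-by-n Jacobian of f at x."""
    m, n = len(f(x)), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        forward, backward = list(x), list(x)
        forward[j] += h
        backward[j] -= h
        f_plus, f_minus = f(forward), f(backward)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J


def matmul(A, B):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def f2(x):
    # Hypothetical inner function, R^2 -> R^2.
    return [x[0] * x[1], x[0] + x[1]]


def f1(u):
    # Hypothetical outer function, R^2 -> R^3; its variable is u = f2(x).
    return [u[0] ** 2, u[0] * u[1], u[1]]


def f(x):
    return f1(f2(x))


x = [1.0, 2.0]
J_direct = jacobian(f, x)
# Note where each factor is evaluated: J_f1 at f2(x), J_f2 at x.
J_chain = matmul(jacobian(f1, f2(x)), jacobian(f2, x))
print(J_direct)
print(J_chain)  # both approximately [[8, 4], [8, 5], [1, 1]]
```

Evaluating the Jacobian of `f1` at `x` instead of at `f2(x)` would generally give a different, incorrect product, which is the point of the caveat.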

In this post, I will go through an example to clarify the abstract concepts discussed in the last paragraph.