
Jacobian matrix of a composite function

S. M. Saeed Damadi

This post is a continuation of what I discussed here to clarify what the Jacobian matrix is. We are going to see an example in which the Jacobian matrix is applied to a composite function made of two functions. This case is very intuitive, since it will shed light on the general case. Using what we find here, the general case will follow.

Consider h(x) = f(g(x)) where x ∈ R^n, g: R^n → R^k, and f: R^k → R^m.

Hence, h: R^n → R^m.

According to what I explained here, the Jacobian matrix of h is an m×n matrix.

Also,

J_h(x) = J_f(g(x)) J_g(x)

where J_g(x) is a k×n matrix and J_f(g(x)) is an m×k matrix.

Remark:

  • J_g(x) is a k×n matrix because it takes an n-dimensional vector and maps it to a k-dimensional vector.
  • J_f(g(x)) is an m×k matrix because its input g(x) is a k-dimensional vector and the output of f is an m-dimensional vector.
  • Since the number of columns of J_f(g(x)) equals the number of rows of J_g(x), they are conformable and the product is an m×n matrix.
  • J_g(x) is simply the Jacobian matrix of a vector-valued function, which was defined in the other post.
  • To find J_f(g(x)), we need f expressed in terms of each coordinate of g(x) in order to find all the first-order partial derivatives. If f and g are given separately as two functions, say f(z) and g(x), then to find J_f(g(x)) it suffices to find the Jacobian of f with respect to z and substitute the coordinates of g(x) for the coordinates of z. Precisely, we have J_f(z)|_{z=g(x)}.
  • When h(x) is given directly as a function of x, we are back to the case where the Jacobian matrix can be found the way we found it for a vector-valued function in the other post.
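The chain rule and the shape bookkeeping above can be checked numerically. Below is a minimal sketch (the particular f and g here are made-up examples just for this check, not the ones used later in the post): it approximates each Jacobian with forward differences and verifies that J_h(x) ≈ J_f(g(x)) J_g(x), with the expected m×k times k×n shapes.

```python
import numpy as np

def num_jacobian(F, x, eps=1e-6):
    """Forward-difference Jacobian of F at x; rows = outputs, cols = inputs."""
    x = np.asarray(x, dtype=float)
    y0 = np.atleast_1d(F(x))
    J = np.zeros((y0.size, x.size))
    for j in range(x.size):
        dx = np.zeros_like(x)
        dx[j] = eps
        J[:, j] = (np.atleast_1d(F(x + dx)) - y0) / eps
    return J

# Toy example with n = 3, k = 2, m = 1:
g = lambda x: np.array([x[0] * x[1], np.sin(x[2])])   # g: R^3 -> R^2
f = lambda z: np.array([z[0] + z[1] ** 2])            # f: R^2 -> R^1
h = lambda x: f(g(x))                                 # h: R^3 -> R^1

x = np.array([0.5, -1.0, 0.3])
Jg = num_jacobian(g, x)        # 2x3 (k x n)
Jf = num_jacobian(f, g(x))     # 1x2 (m x k)
Jh = num_jacobian(h, x)        # 1x3 (m x n)
print(np.allclose(Jh, Jf @ Jg, atol=1e-4))  # True: chain rule holds
```

The point of the sketch is only the conformability of the product; any smooth f and g with matching dimensions would do.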

In what follows, I will walk you through the above discussion with a concrete example. Suppose we are given

f(u, v) = u^2 + 3v^2,

$$g(x, y) = \begin{bmatrix} e^x \cos y \\ e^x \sin y \end{bmatrix},$$

and the goal is to find the Jacobian matrix of the composite function, which we call h.

Question: Is h the composite of f and g, or of g and f?

Answer: To answer this question, we need to think of f and g as two mappings, where f: R^2 → R and g: R^2 → R^2.

Suppose h = f ∘ g.

Since g is the inner function (the rightmost in h = f ∘ g), the variables of h are the variables of g. Therefore, J_h(x,y) = J_f(g(x,y)) J_g(x,y). According to the first and second bullets, J_f(g(x,y)) is a 1×2 matrix and J_g(x,y) is a 2×2 matrix. Hence, the result is a 1×2 matrix. However, if we flip f and g and consider h = g ∘ f, we should have

J_h(u,v) =? J_g(f(u,v)) J_f(u,v)

It is easy to see from the first bullet that J_f(u,v) is a 1×2 matrix, but J_g(f(u,v)) is not defined, since one should take the derivatives of all coordinates of g with respect to all variables of g, which would now have to be supplied by the output of f; but f is a scalar-valued function and does not output the dimension that the input of g requires. In effect, h = g ∘ f is not defined. Hence, by the composite function, we mean h = f ∘ g.
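To see the mismatch concretely, here is a small numerical sketch of the f and g from this post (the NumPy encoding is mine): f collapses its input to a scalar, so g can no longer read off two coordinates.

```python
import numpy as np

# f: R^2 -> R (scalar output) and g: R^2 -> R^2, as in the post.
f = lambda z: z[0] ** 2 + 3 * z[1] ** 2
g = lambda p: np.array([np.exp(p[0]) * np.cos(p[1]),
                        np.exp(p[0]) * np.sin(p[1])])

z = np.array([0.5, 1.0])
print(f(g(z)))   # f o g: well-defined, a scalar

try:
    g(f(z))      # g o f: f(z) is a scalar, so g cannot index p[0], p[1]
except (IndexError, TypeError) as err:
    print("g o f is not defined:", type(err).__name__)
```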

Let's first compute J_g(x,y). Since g is a vector-valued function of x and y,

$$J_g(x, y) = \begin{bmatrix} \dfrac{\partial g_1}{\partial x} & \dfrac{\partial g_1}{\partial y} \\[4pt] \dfrac{\partial g_2}{\partial x} & \dfrac{\partial g_2}{\partial y} \end{bmatrix} = \begin{bmatrix} e^x \cos y & -e^x \sin y \\ e^x \sin y & e^x \cos y \end{bmatrix}.$$
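As a quick sanity check (a sketch of mine, not part of the original derivation), we can compare this closed-form J_g against a forward-difference approximation at an arbitrary point:

```python
import numpy as np

def g(p):
    x, y = p
    return np.array([np.exp(x) * np.cos(y), np.exp(x) * np.sin(y)])

def Jg(x, y):
    # Closed-form Jacobian of g derived above
    return np.array([[np.exp(x) * np.cos(y), -np.exp(x) * np.sin(y)],
                     [np.exp(x) * np.sin(y),  np.exp(x) * np.cos(y)]])

p, eps = np.array([0.2, 0.7]), 1e-6
# One forward-difference column per input variable
num = np.column_stack([(g(p + eps * e) - g(p)) / eps for e in np.eye(2)])
print(np.allclose(num, Jg(*p), atol=1e-4))  # True
```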

Now we compute J_f(g(x,y)) using the fifth bullet, which yields

$$J_f(g(x,y)) = J_f(z)\Big|_{z=g(x,y)} = \begin{bmatrix} \dfrac{\partial f}{\partial u} & \dfrac{\partial f}{\partial v} \end{bmatrix}\Bigg|_{g(x,y)} = \begin{bmatrix} 2u & 6v \end{bmatrix}\Big|_{g(x,y)} = \begin{bmatrix} 2e^x \cos y & 6e^x \sin y \end{bmatrix}.$$

Finally,

$$J_h(x,y) = J_{f \circ g}(x,y) = J_f(g(x,y))\,J_g(x,y) = \begin{bmatrix} 2e^x \cos y & 6e^x \sin y \end{bmatrix} \begin{bmatrix} e^x \cos y & -e^x \sin y \\ e^x \sin y & e^x \cos y \end{bmatrix}$$

$$= \begin{bmatrix} 2e^{2x}\cos^2 y + 6e^{2x}\sin^2 y & -2e^{2x}\cos y \sin y + 6e^{2x}\sin y \cos y \end{bmatrix} = 2e^{2x}\begin{bmatrix} 1 + 2\sin^2 y & 2\sin y \cos y \end{bmatrix}.$$
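This final 1×2 Jacobian can also be verified end to end. The sketch below (my check, at one arbitrarily chosen point) differentiates h(x, y) = f(g(x, y)) directly with forward differences and compares against the closed-form row vector 2e^{2x}[1 + 2 sin^2 y, 2 sin y cos y]:

```python
import numpy as np

def h(p):
    # h(x, y) = f(g(x, y)) with f(u, v) = u^2 + 3 v^2
    x, y = p
    u, v = np.exp(x) * np.cos(y), np.exp(x) * np.sin(y)
    return u ** 2 + 3 * v ** 2

def Jh(x, y):
    # Closed-form result derived above
    return 2 * np.exp(2 * x) * np.array([1 + 2 * np.sin(y) ** 2,
                                         2 * np.sin(y) * np.cos(y)])

p, eps = np.array([0.3, 1.1]), 1e-6
num = np.array([(h(p + eps * e) - h(p)) / eps for e in np.eye(2)])
print(np.allclose(num, Jh(*p), atol=1e-3))  # True
```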

Conclusion: We went through all the steps of finding the Jacobian matrix of a composite function consisting of two functions. One step is left: applying all these results to the loss function of a neural network.