If a function with two input variables is differentiable, its derivative with respect to one variable is defined as:

\[\frac{\partial f}{\partial x}(x, y) = \lim_{h \to 0} \frac{f(x + h, y) - f(x, y)}{h}\]
This is called a partial derivative and can generally be written, for a function of the variables x_1, …, x_n, as:

\[\frac{\partial f}{\partial x_i} = \lim_{h \to 0} \frac{f(x_1, \dots, x_i + h, \dots, x_n) - f(x_1, \dots, x_n)}{h}\]
Of course, a derivative is not usually computed from this limit definition. It is carried out in the same manner as the derivative of a function of just one variable (see Differential calculus); the only difference is that all other variables in the function are treated as constants, and this is done with respect to each variable in turn. For instance, for the function


and

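As a minimal sketch of how this looks in code, here is a sympy version; the function f(x, y) = x²y + y³ is just a placeholder, not necessarily the example used above:

```python
# Partial derivatives with sympy; f is an assumed placeholder function.
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 * y + y**3

# Differentiate with respect to x, treating y as a constant, and vice versa.
df_dx = sp.diff(f, x)   # 2*x*y
df_dy = sp.diff(f, y)   # x**2 + 3*y**2

print(df_dx, df_dy)
```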
Gradient
All partial derivatives combined into one vector form the gradient, which is usually denoted by the nabla operator ∇:

\[\nabla f = \left(\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \dots, \frac{\partial f}{\partial x_n}\right)^T\]
Directional derivative
The directional derivative should not be confused with the gradient. The directional derivative sums the partial derivatives with respect to each variable, each multiplied by the corresponding component of a unit vector pointing in a certain direction. Its result is a single value.
With the unit vector a, that looks like:

\[\nabla_a f = \nabla f \cdot a = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}\, a_i\]
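A small numeric sketch of this dot-product form; the function, the evaluation point, and the direction are all assumptions:

```python
# Directional derivative as the dot product of the gradient with a unit vector.
import numpy as np

def gradient(v):
    x, y = v
    # Partial derivatives of the placeholder f(x, y) = x**2 * y + y**3.
    return np.array([2 * x * y, x**2 + 3 * y**2])

a = np.array([1.0, 1.0])
a = a / np.linalg.norm(a)          # normalize to a unit vector

point = np.array([1.0, 2.0])
directional = gradient(point) @ a  # a single scalar value
print(directional)
```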
Hessian Matrix
If the function f(x_1, …, x_n) can be differentiated twice and all second partial derivatives are collected, we get the Hessian matrix:

\[H_f = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{pmatrix}\]
The Hessian matrix is symmetric about the main diagonal, since for continuous second derivatives

\[\frac{\partial^2 f}{\partial x_i \partial x_j} = \frac{\partial^2 f}{\partial x_j \partial x_i}\]
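This symmetry is easy to check with sympy's built-in hessian helper; f is once more a placeholder function:

```python
# Build the Hessian of a placeholder function and check its symmetry.
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 * y + y**3

H = sp.hessian(f, (x, y))
print(H)          # Matrix([[2*y, 2*x], [2*x, 6*y]])
print(H == H.T)   # True: the mixed partial derivatives agree
```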
Taylor's theorem
According to Taylor, every sufficiently often differentiable function can be expressed as a polynomial of degree m around a point x_0 plus a remainder term:

\[f(x) = \sum_{k=0}^{m} \frac{f^{(k)}(x_0)}{k!} (x - x_0)^k + R_m(x)\]
where

\[f^{(k)}(x_0) = \left.\frac{d^k f}{dx^k}\right|_{x = x_0}\]

is the k-th derivative of f at the point x_0.
(see Taylor Polynomials)
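A short sketch that builds this polynomial directly from the formula; the choice of exp(x) around x_0 = 0 is arbitrary:

```python
# One-dimensional Taylor polynomial built term by term from the formula above.
import sympy as sp

x = sp.symbols('x')
f = sp.exp(x)

def taylor(f, x, x0, m):
    # sum of f^(k)(x0) / k! * (x - x0)**k for k = 0..m
    return sum(sp.diff(f, x, k).subs(x, x0) / sp.factorial(k) * (x - x0)**k
               for k in range(m + 1))

p = taylor(f, x, 0, 3)
print(sp.expand(p))   # 1 + x + x**2/2 + x**3/6
```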
This formulation can be extended to functions with more than one variable as well.
For a function of two variables x and y, this means we have to replace

\[\frac{f^{(k)}(x_0)}{k!}(x - x_0)^k \quad \text{by} \quad \frac{1}{k!}\left((x - x_0)\frac{\partial}{\partial x} + (y - y_0)\frac{\partial}{\partial y}\right)^k f(x_0, y_0)\]
For instance, for k = 2 that would be (all derivatives evaluated at (x_0, y_0)):

\[\frac{1}{2!}\left((x - x_0)^2 \frac{\partial^2 f}{\partial x^2} + (x - x_0)(y - y_0)\frac{\partial^2 f}{\partial x \partial y} + (y - y_0)(x - x_0)\frac{\partial^2 f}{\partial y \partial x} + (y - y_0)^2 \frac{\partial^2 f}{\partial y^2}\right)\]
Now, as

\[\frac{\partial^2 f}{\partial x \partial y} = \frac{\partial^2 f}{\partial y \partial x}\]
and with the binomial coefficients

\[\binom{k}{i} = \frac{k!}{i!\,(k-i)!}\]
we can write

\[f(x, y) = \sum_{k=0}^{m} \frac{1}{k!} \sum_{i=0}^{k} \binom{k}{i} \frac{\partial^k f}{\partial x^{k-i} \partial y^{i}}(x_0, y_0)\, (x - x_0)^{k-i} (y - y_0)^{i} + R_m\]
with the remainder:

\[R_m = \frac{1}{(m+1)!}\left((x - x_0)\frac{\partial}{\partial x} + (y - y_0)\frac{\partial}{\partial y}\right)^{m+1} f(\xi, \eta)\]

where (ξ, η) is an unknown point on the line segment between (x_0, y_0) and (x, y).
If, for instance, the function

shall be approximated at the point x = 0.3 and y = 0.3 by the Taylor polynomial built at the position x_0 = 0 and y_0 = 0, the first derivatives are:

and the second:

With these the approximation becomes:


whereas the original function yields:

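A sketch of the same procedure with an assumed function f(x, y) = exp(x·y) + sin(x + y) (not the example above): build the second-order Taylor polynomial from the binomial formula and compare it with the exact value at (0.3, 0.3):

```python
# Second-order 2D Taylor polynomial, built from the binomial-coefficient
# formula above. The function f is an assumption for illustration.
import sympy as sp

x, y = sp.symbols('x y')
f = sp.exp(x * y) + sp.sin(x + y)
x0, y0 = 0, 0

def taylor2d(f, m):
    # double sum over k and i with binomial coefficients, as in the formula
    p = 0
    for k in range(m + 1):
        for i in range(k + 1):
            deriv = sp.diff(f, x, k - i, y, i).subs({x: x0, y: y0})
            p += (sp.binomial(k, i) / sp.factorial(k)
                  * deriv * (x - x0)**(k - i) * (y - y0)**i)
    return p

p2 = taylor2d(f, 2)                      # quadratic Taylor polynomial
print(sp.expand(p2))                     # 1 + x + y + x*y for this f
print(p2.subs({x: 0.3, y: 0.3}))         # approximation: 1.69
print(f.subs({x: 0.3, y: 0.3}).evalf())  # exact value, about 1.659
```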
For a function of n variables things become really complicated:

\[f(\mathbf{x}) = \sum_{k=0}^{m} \frac{1}{k!}\left(\sum_{i=1}^{n} (x_i - x_{0,i}) \frac{\partial}{\partial x_i}\right)^k f(\mathbf{x}_0) + R_m\]
Let's just leave it in this compact form.

If m = 2, the Taylor polynomial is the quadratic approximation of f.
With the difference vector

\[\Delta\mathbf{x} = \mathbf{x} - \mathbf{x}_0\]

the gradient of f

\[\nabla f(\mathbf{x}_0)\]

and the Hessian matrix

\[H_f(\mathbf{x}_0)\]

the quadratic approximation can be written as:

\[f(\mathbf{x}) \approx f(\mathbf{x}_0) + \nabla f(\mathbf{x}_0)^T \Delta\mathbf{x} + \frac{1}{2}\, \Delta\mathbf{x}^T H_f(\mathbf{x}_0)\, \Delta\mathbf{x}\]
This formulation is often mentioned in books about machine learning. It looks a bit simpler thanks to a few matrix operations, and it can easily be extended to n variables.

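A numeric sketch of this matrix form, with the same assumed function as in the Taylor example above; gradient and Hessian are hard-coded from its analytic derivatives:

```python
# Quadratic approximation f(x) ~ f(x0) + grad^T d + 0.5 * d^T H d, d = x - x0.
# The function f(x, y) = exp(x*y) + sin(x + y) is an assumed example.
import numpy as np

def f(v):
    x, y = v
    return np.exp(x * y) + np.sin(x + y)

def grad(v):
    x, y = v
    return np.array([y * np.exp(x * y) + np.cos(x + y),
                     x * np.exp(x * y) + np.cos(x + y)])

def hessian(v):
    x, y = v
    e, s = np.exp(x * y), np.sin(x + y)
    return np.array([[y**2 * e - s, e + x * y * e - s],
                     [e + x * y * e - s, x**2 * e - s]])

x0 = np.array([0.0, 0.0])
xp = np.array([0.3, 0.3])
d = xp - x0

quad = f(x0) + grad(x0) @ d + 0.5 * d @ hessian(x0) @ d
print(quad, f(xp))  # 1.69 vs. the exact value, matching the sympy sketch
```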