I’m using a polynomial function here, as I assume this is the most common case.
The number of unknown parameters in the function we are looking for is smaller than the number of reference points (the polynomial has a lower order n than the number of reference points). That leads to an overdetermined system of equations. If we have, for instance, a set of 7 reference points (x and y values) and want to approximate them by a polynomial function of order 3, we can set up 7 equations like
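(note that order here counts the unknown coefficients, as in the code further below, so order 3 means the three parameters $c_0$, $c_1$ and $c_2$):

$$\begin{aligned}
c_0 + c_1x_1 + c_2x_1^2 &= y_1\\
c_0 + c_1x_2 + c_2x_2^2 &= y_2\\
&\;\;\vdots\\
c_0 + c_1x_7 + c_2x_7^2 &= y_7
\end{aligned}$$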
The method of least squares looks for the polynomial parameters c that lead to the smallest possible sum of the squared deviations of these equations.
One approach to solving this problem is the use of the normal equation.
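In matrix notation the 7 equations read $Cx = d$, with the parameter vector x and the right-hand side d holding the y values (the same names as in the code further below). The sum of the squared deviations is

$$S(x) = (Cx-d)^T(Cx-d) = d^Td - 2(C^Td)^Tx + x^TC^TCx$$

With the substitutions $B = C^Td$ and $A = C^TC$ this becomes

$$S(x) = d^Td - 2B^Tx + x^TAx$$

To find the minimum, the derivative of S with respect to each parameter must be 0. The first part $d^Td$ is constant, so its derivative vanishes. The derivative of the second part $2B^Tx = 2(b_1x_1 + b_2x_2 + \dots + b_nx_n)$ with respect to $x_1 \dots x_n$ gives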
$$\frac{\partial}{\partial x}\left(2B^Tx\right) = \begin{pmatrix}2b_1\\2b_2\\\vdots\\2b_n\end{pmatrix} = 2B$$

And that’s 2B.
Now the third part $x^TAx$ is the most complicated one, and so I build it first for a 3 × 3 matrix. As $A = C^TC$, A is symmetric about its main diagonal. So we can write
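$$x^TAx = a_{11}x_1^2 + a_{22}x_2^2 + a_{33}x_3^2 + 2a_{12}x_1x_2 + 2a_{13}x_1x_3 + 2a_{23}x_2x_3$$

and differentiate with respect to $x_1$, $x_2$ and $x_3$:

$$\begin{aligned}
\frac{\partial}{\partial x_1}\,x^TAx &= 2a_{11}x_1 + 2a_{12}x_2 + 2a_{13}x_3\\
\frac{\partial}{\partial x_2}\,x^TAx &= 2a_{12}x_1 + 2a_{22}x_2 + 2a_{23}x_3\\
\frac{\partial}{\partial x_3}\,x^TAx &= 2a_{13}x_1 + 2a_{23}x_2 + 2a_{33}x_3
\end{aligned}$$

That is exactly the vector $2Ax$, and the same holds for any symmetric n × n matrix:

$$\frac{\partial}{\partial x}\,x^TAx = 2Ax$$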
Now we put it all together and get for the entire differentiation

$$\frac{\partial S}{\partial x} = 2Ax - 2B = 0$$

Or, with the substitutions for A and B resolved,

$$C^TCx = C^Td$$
The multiplication by the transposed matrix converts the overdetermined matrix equation into a well-determined system. In my example further above, with 7 reference points and a polynomial of order 3, C has 7 rows and 3 columns. Multiplying it by its transpose makes a 3 × 3 matrix of it, and the right-hand side $C^Td$ becomes a 3-dimensional vector too. This matrix equation can easily be solved by Gaussian elimination.
So the first step is to fill the matrix equation. Into C we have to put the x values of our reference points. The first column of C consists of ones only, as the first parameter we are looking for is $c_0$, which is multiplied by 1. The second column gets $x_1 \dots x_n$, the third column $x_1^2 \dots x_n^2$, and so on.
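For the sample with 7 reference points and order 3, C looks like this:

$$C = \begin{pmatrix}1 & x_1 & x_1^2\\1 & x_2 & x_2^2\\\vdots & \vdots & \vdots\\1 & x_7 & x_7^2\end{pmatrix}$$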
I use a 2-dimensional array as matrix m and one-dimensional arrays for d and x. So
// Fill the matrix equation: d gets the y values of the reference points,
// m gets the powers of the x values (column k holds x^k).
for (i = 0; i < samples; i++)
{
    d[i] = y_k[i];
    for (k = 0; k < order; k++)
    {
        m[i, k] = Math.Pow(x_k[i], k);
    }
}
Now we have to multiply the equation by the transpose of C. The transposed matrix of C is C mirrored at its main diagonal; in the sample, the 7 × 3 matrix C becomes the 3 × 7 matrix $C^T$.
// Multiply by the transpose: y gets C^T * d and a gets C^T * C.
// The transpose is not built explicitly; m[k, i] is read as element (i, k) of C^T.
for (i = 0; i < order; i++)
{
    y[i] = 0.0;
    for (k = 0; k < samples; k++)
        y[i] = y[i] + m[k, i] * d[k];
    for (j = 0; j < order; j++)
    {
        a[j, i] = 0.0;
        for (k = 0; k < samples; k++)
        {
            a[j, i] = a[j, i] + m[k, j] * m[k, i];
        }
    }
}

Afterwards a contains the order × order matrix $C^TC$ and y the right-hand side $C^Td$.
Using a Cholesky decomposition to solve the matrix equation
For a detailed description of the Cholesky decomposition, see Cholesky decomposition.
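As $A = C^TC$ is symmetric and positive definite (for distinct reference points), the Cholesky decomposition $A = LL^T$ applies, followed by one forward and one back substitution. A minimal sketch of such a solver (my own sketch, not the code of the sample project; it assumes the arrays a and y filled above, with n = order):

// Solves a * x = y for a symmetric, positive definite matrix a (n x n)
// by Cholesky decomposition: a = L * L^T.
public static double[] SolveCholesky(double[,] a, double[] y, int n)
{
    double[,] l = new double[n, n];
    // Decomposition: build the lower triangular matrix l.
    for (int i = 0; i < n; i++)
    {
        for (int j = 0; j <= i; j++)
        {
            double sum = a[i, j];
            for (int k = 0; k < j; k++)
                sum -= l[i, k] * l[j, k];
            if (i == j)
                l[i, i] = Math.Sqrt(sum);
            else
                l[i, j] = sum / l[j, j];
        }
    }
    // Forward substitution: solve l * z = y.
    double[] z = new double[n];
    for (int i = 0; i < n; i++)
    {
        double sum = y[i];
        for (int k = 0; k < i; k++)
            sum -= l[i, k] * z[k];
        z[i] = sum / l[i, i];
    }
    // Back substitution: solve l^T * x = z.
    double[] x = new double[n];
    for (int i = n - 1; i >= 0; i--)
    {
        double sum = z[i];
        for (int k = i + 1; k < n; k++)
            sum -= l[k, i] * x[k];
        x[i] = sum / l[i, i];
    }
    return x;
}

Compared to Gaussian elimination this needs roughly half the arithmetic operations and no pivoting.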
Special case n = m
If we have, for instance, 3 supporting points $t_1, t_2, t_3$ with the values $y_1, y_2, y_3$ and search for the polynomial $a_0 + a_1t + a_2t^2$, the method of least squares gives us the 3 equations

$$\begin{aligned}
3a_0 + a_1(t_1+t_2+t_3) + a_2(t_1^2+t_2^2+t_3^2) &= y_1+y_2+y_3\\
a_0(t_1+t_2+t_3) + a_1(t_1^2+t_2^2+t_3^2) + a_2(t_1^3+t_2^3+t_3^3) &= t_1y_1+t_2y_2+t_3y_3\\
a_0(t_1^2+t_2^2+t_3^2) + a_1(t_1^3+t_2^3+t_3^3) + a_2(t_1^4+t_2^4+t_3^4) &= t_1^2y_1+t_2^2y_2+t_3^2y_3
\end{aligned}$$

These do not really look like the equations of the polynomial interpolation

$$a_0 + a_1t_i + a_2t_i^2 = y_i,\qquad i = 1, 2, 3$$

But there is a way to see a relationship. If we just add the 3 interpolation equations together, we get

$$3a_0 + a_1(t_1+t_2+t_3) + a_2(t_1^2+t_2^2+t_3^2) = y_1+y_2+y_3$$

That’s already the first equation of the approximation. Now multiply the first one by $t_1$, the second by $t_2$ and the third by $t_3$ and add them together. We get

$$a_0(t_1+t_2+t_3) + a_1(t_1^2+t_2^2+t_3^2) + a_2(t_1^3+t_2^3+t_3^3) = t_1y_1+t_2y_2+t_3y_3$$

That’s the second one. And finally multiply equation 1 by $t_1^2$, equation 2 by $t_2^2$ and equation 3 by $t_3^2$ and add them all together. That makes

$$a_0(t_1^2+t_2^2+t_3^2) + a_1(t_1^3+t_2^3+t_3^3) + a_2(t_1^4+t_2^4+t_3^4) = t_1^2y_1+t_2^2y_2+t_3^2y_3$$

So all 3 equations of the method of least squares follow from the 3 equations of the polynomial interpolation. That means: if we perform the method of least squares in the special case of searching a polynomial of the same order as the number of supporting points, we get the same parameters $a_0$, $a_1$, $a_2$ as a polynomial interpolation with the same supporting points would give. In this case the method of least squares performs a polynomial interpolation.
For an n × n matrix equation it’s basically the same, only with much more calculation effort, and therefore we leave it at that.
Using orthogonal transformations
Using orthogonal transformations basically means using Givens rotation matrices (see RotationMatrix). The basic idea is that multiplying a vector by an orthogonal matrix does not change the length of the vector, and therefore the matrix equation with the residual vector r

$$Cx - d = r$$

can be transformed by these orthogonal matrices without affecting the sum of the squares of the residua. It took me quite a while to figure out what that means, but now I think I got it: we extend the main matrix C, which in the sample from above is a 7 × 3 matrix, to a square matrix and set the new elements all to 0. In our sample that looks like:
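$$C' = \begin{pmatrix}
1 & x_1 & x_1^2 & 0 & 0 & 0 & 0\\
1 & x_2 & x_2^2 & 0 & 0 & 0 & 0\\
\vdots & \vdots & \vdots & & & & \vdots\\
1 & x_7 & x_7^2 & 0 & 0 & 0 & 0
\end{pmatrix}$$

The parameter vector x is extended by 4 elements to x′ accordingly; their values do not matter, as they are always multiplied by 0.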
With this modified matrix the equation for the residua becomes

$$C'x' - d = r$$
Now the entire equation can be multiplied by rotation matrices to transform C′ into an upper triangular matrix without affecting the sum of the squares of the residua. That’s an important fact.
The transformation runs iteratively: a rotation matrix that eliminates one particular element of C′ is built, the multiplications are carried out, and then the next rotation matrix is built to eliminate the next element of C′, till we have the upper triangular matrix (shown after the code below):
public void RunGivens(int maxOrder)
{
    int i, k;
    // Eliminate the elements below the main diagonal, column by column.
    for (i = 0; i < maxOrder; i++)
    {
        for (k = i + 1; k < maxOrder; k++)
        {
            if (m[k, i] != 0.0)
            {
                CalcRMatrix(i, k);
                MultMatrix(r, i, k);
            }
        }
    }
}
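CalcRMatrix builds the rotation matrix r for the row pair (i, k) (see below), and MultMatrix applies it. MultMatrix itself is not printed here; as a rotation matrix differs from the identity only in the rows and columns p and q, a sketch of what it has to do could look like this (my own sketch, not the original from the sample project; it assumes m, d and r are fields of the class, and whether d is rotated inside MultMatrix or separately is an implementation detail):

// Sketch: applies the rotation r to the rows p and q of m and to the
// elements p and q of d. All other rows stay unchanged, so no full
// matrix multiplication is necessary.
public void MultMatrix(double[,] r, int p, int q)
{
    int j;
    double h;
    for (j = 0; j < m.size; j++)
    {
        h = r[p, p] * m[p, j] + r[p, q] * m[q, j];
        m[q, j] = r[q, p] * m[p, j] + r[q, q] * m[q, j];
        m[p, j] = h;
    }
    // The right-hand side d has to be rotated the same way.
    h = r[p, p] * d[p] + r[p, q] * d[q];
    d[q] = r[q, p] * d[p] + r[q, q] * d[q];
    d[p] = h;
}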
public void CalcRMatrix(int p, int q)
{
    int i, j;
    double radius;
    radius = Math.Sqrt(m[p, p] * m[p, p] + m[q, p] * m[q, p]);
    // Initialise r as the identity matrix.
    for (i = 0; i < m.size; i++)
    {
        for (j = 0; j < m.size; j++)
        {
            r[i, j] = 0.0;
        }
        r[i, i] = 1.0;
    }
    if (radius != 0.0)
    {
        if (m[p, p] == 0.0)
        {
            // Degenerate case: rotate by 90 degrees.
            r[p, p] = 0.0;
            r[p, q] = 1.0;
        }
        else
        {
            // Cosine and sine of the rotation angle.
            r[p, p] = m[p, p] / radius;
            r[p, q] = m[q, p] / radius;
        }
        r[q, q] = r[p, p];
        r[q, p] = -r[p, q];
    }
}
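A quick check shows that this rotation really eliminates the element at position (q, p). With $c = r_{pp} = m_{pp}/radius$ and $s = r_{pq} = m_{qp}/radius$, the new element is

$$m'_{qp} = -s\,m_{pp} + c\,m_{qp} = \frac{-m_{qp}m_{pp} + m_{pp}m_{qp}}{radius} = 0$$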
That builds the upper triangular matrix in m:
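$$\begin{pmatrix}
* & * & * & .. & .. & .. & ..\\
0 & * & * & .. & .. & .. & ..\\
0 & 0 & * & .. & .. & .. & ..\\
0 & 0 & 0 & .. & .. & .. & ..\\
0 & 0 & 0 & 0 & .. & .. & ..\\
0 & 0 & 0 & 0 & 0 & .. & ..\\
0 & 0 & 0 & 0 & 0 & 0 & ..
\end{pmatrix}$$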
I marked all the elements which normally are not 0 in an upper triangular matrix, but which we are not interested in, by “..”, as these elements will be 0 in this case anyway (the columns 4 to 7 of C′ contain only zeros, and a multiplication from the left keeps them at 0).
To build this upper triangular matrix we basically have to let p and q of the rotation matrix run from 1 to N. The vector d has to be multiplied by the same rotation matrices, and in principle the residual vector r too; but as we are not really interested in r, we can leave that out. Finally we have (in theory)

$$QC'x' - Qd = Qr$$

or

$$C''x' - d' = r'$$

with Q as the product of all rotation matrices, $C'' = QC'$, $d' = Qd$ and $r' = Qr$. If we have a look into the transformed matrix C′′ now, we can see that the elements in the rows 4 to 7 are all 0. That means the elements 4 to 7 of r′ are fixed: no matter which parameters we choose, they are determined by the elements 4 to 7 of d′. We are looking for the solution of the matrix equation where the sum of the squares of the elements of r′ is smallest, and that’s the case when the elements 1 to 3 of r′ are 0.
That means we just have to solve the remaining matrix equation

$$\begin{pmatrix}c''_{11} & c''_{12} & c''_{13}\\0 & c''_{22} & c''_{23}\\0 & 0 & c''_{33}\end{pmatrix}\begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix} = \begin{pmatrix}d'_1\\d'_2\\d'_3\end{pmatrix}$$

consisting of the first 3 rows of the transformed system. As the matrix is upper triangular, a simple back substitution does the job:
public void Solve()
{
    int k, l;
    // Back substitution, starting at the last row.
    for (k = maxOrder - 1; k >= 0; k--)
    {
        // Subtract the contributions of the already computed parameters.
        for (l = maxOrder - 1; l > k; l--)
        {
            d[k] = d[k] - x[l] * m[k, l];
        }
        if (m[k, k] != 0)
            x[k] = d[k] / m[k, k];
        else
            x[k] = 0;
    }
}
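Written as a formula, Solve() computes, from the last row upwards,

$$x_k = \frac{d_k - \sum_{l=k+1}^{n-1} m_{k,l}\,x_l}{m_{k,k}},\qquad k = n-1,\dots,0$$

with n = maxOrder. A diagonal element of 0 would mean that the corresponding parameter is not determined by the equations, and it is simply set to 0.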
Some speed improvement
public void RunGivens(int maxP, int maxQ)
{
int i, k;
for (i = 0; i < maxP; i++)
{
for (k = i + 1; k < maxQ; k++)
{
if (m.c[k, i] != 0.0)
{
CalcRMatrix(i, k);
MultMatrix(r, i, k);
}
}
}
}
With maxP set to order and maxQ set to samples (see my sample project). As only the first order columns of C′ contain elements that are not 0, only the elements below the main diagonal in these columns have to be eliminated, and that saves quite some rotations.
Compared to the approach using the normal equation, this is a completely different way to get to the solution. Isn’t that fascinating?
I implemented an online solver that performs an adjustment calculation (curve fitting) by the method of least squares. This solver can be found here
C# Demo Projects Method of the least squares
Java Demo Projects Method of the least squares