Alston Scott Householder was a real genius, I think. The Householder transformation he invented is a masterpiece of mathematics.
The Householder transformation is an orthogonal transformation that eliminates more than one element of a given column of a matrix at once. It builds a transformation matrix

U = I – 2 * ω * ωT

with I as the unit matrix and ω as a direction vector with length 1. It's quite a cool approach to solving a matrix equation, and a very smart one when it comes to the method of least squares for approximating some data points.
Most descriptions say that what the transformation matrix does is basically mirror a vector at the hyperplane which is orthogonal to ω and runs through the origin. In my opinion this can be a bit misleading. At least it misled me in the beginning: the vector is not mirrored on the surface of the hyperplane the way a light beam is reflected by a bathroom mirror. The hyperplane is just a plane, and it is actually the axis of the mirroring. That means the vector is flipped to the other side of this hyperplane. That's an important point, in my opinion.
The orientation of this hyperplane is chosen in such a way that all elements of the resulting vector will be 0 except the first one.
Therefore, as a first step, we have a look at a vector s which is orthogonal to ω. That means it is lying flat on the hyperplane. We multiply it by the Householder matrix and get:

U * s = (I – 2 * ω * ωT) * s = s – 2 * ω * (ωT * s) = s

Remark that ωT * s is 0 because of the orthogonality of ω and s. So the transformation does not change this vector at all.
For the second step we have a look at a vector z which is parallel to ω. It is standing upright on the hyperplane, so it can be substituted by a constant multiplied by ω: z = c * ω

U * z = (I – 2 * ω * ωT) * c * ω = c * ω – 2 * c * ω * (ωT * ω) = –c * ω = –z

with I * ω = ω and ωT * ω = 1. The transformation flips the direction of this vector.
Now any vector x can be regarded as a sum of s and z: x = s + z, and the transformation of this vector x is:

U * x = U * s + U * z = s – z

The s part of this vector, which lies flat on the hyperplane, is not affected, and the z part, which stands upright on the hyperplane, switches its sign.
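This flip-on-the-hyperplane behaviour can be checked with a few lines of code. A minimal sketch (the Reflect helper and the sample vectors are my own, not part of the article's code):

```csharp
using System;

class HouseholderDemo
{
    // Apply U = I - 2 * w * wT to a vector x without building U explicitly:
    // U * x = x - 2 * w * (wT * x)
    static double[] Reflect(double[] w, double[] x)
    {
        double dot = 0;
        for (int i = 0; i < x.Length; i++)
            dot += w[i] * x[i];
        double[] r = new double[x.Length];
        for (int i = 0; i < x.Length; i++)
            r[i] = x[i] - 2 * w[i] * dot;
        return r;
    }

    static void Main()
    {
        // w is a unit vector; the hyperplane is everything orthogonal to it
        double[] w = { 1.0, 0.0 };
        double[] s = { 0.0, 3.0 };  // lies flat on the hyperplane (orthogonal to w)
        double[] z = { 2.0, 0.0 };  // parallel to w (upright on the hyperplane)

        double[] us = Reflect(w, s); // unchanged: (0, 3)
        double[] uz = Reflect(w, z); // sign flipped: (-2, 0)
        Console.WriteLine($"U*s = ({us[0]}, {us[1]})");
        Console.WriteLine($"U*z = ({uz[0]}, {uz[1]})");
    }
}
```

Note that U never has to be built as a full matrix here; applying x – 2 * ω * (ωT * x) directly gives the same result.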
Graphically that looks like this:
With the blue rectangle marking a part of the hyperplane.
(I didn’t mark the coordinate origin. It’s where ω, x and x’ cross)
Now the interesting part is how to get the parameters of ω. For that we start with the first column of the main matrix in the matrix equation.

This column we regard as a vector m1. This vector shall be transformed into a vector γ * e1 in which only the first element is not 0. It consists of the unit vector e1 multiplied by a constant γ. So the equation for the first transformation matrix is

U1 * m1 = γ * e1

and as e1 has the length 1, γ must be the length of m1:

γ = ±sqrt(m11² + m21² + ... + mN1²)
But as a square root always has 2 solutions, γ could be positive as well as negative. So it is taken as positive if m11 is positive and negative if m11 is negative. This is done to avoid cancellation: the quantity m11 + γ appears later on, and with matching signs the two terms add up instead of nearly cancelling each other.
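A tiny sketch of this sign rule (the sample vector and the helper name are made up for illustration):

```csharp
using System;

class GammaDemo
{
    // gamma = length of m1, taken with the sign of the first element m11
    static double SignedGamma(double[] m1)
    {
        double gamma = 0;
        for (int i = 0; i < m1.Length; i++)
            gamma += m1[i] * m1[i];
        gamma = Math.Sqrt(gamma);
        return m1[0] < 0 ? -gamma : gamma;
    }

    static void Main()
    {
        double[] m1 = { 3.0, 4.0 };        // made-up example column
        double gamma = SignedGamma(m1);    // = 5, same sign as m11 = 3

        // m11 + gamma never cancels: 3 + 5 = 8;
        // with the opposite sign it would be 3 - 5 = -2, which is
        // cancellation-prone when m11 and |m1| are nearly equal
        double h1 = m1[0] + gamma;
        Console.WriteLine($"gamma = {gamma}, h1 = {h1}");
    }
}
```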
That was the first step. For the next step it might be helpful to look at the graphical situation in a 2-dimensional space:
To flip the vector x to x' we basically need to find the location of the hyperplane (the blue line here) at which x is mirrored. In a 2-dimensional space that is the angle bisector between x and x'. In a 3-dimensional space it's the same, and in higher-dimensional spaces too, even if we cannot picture that anymore. But in the end we are not interested in the hyperplane itself but in ω, which occurs in the Householder matrix. The vector ω is orthogonal to the hyperplane and points in the same direction as x – x'. As x' should be γ * e1 and x is m1, the direction of ω will be m1 – γ * e1, and as γ can be negative as well, we can just as well say m1 + γ * e1 for the direction.
Now there is a substitution: m1 + γ * e1 = h, and we need to correct the length for ω: h has the right direction but not length 1 yet, so we need to fix that. Here there is an interesting trick. The length of h is:

|h| = sqrt(h1² + h2² + ... + hN²)

and |h|² is the same as

hT * h

the transposed vector of h multiplied by h. Written with m1 + γ * e1 this is

hT * h = (m1 + γ * e1)T * (m1 + γ * e1)

and expanded

hT * h = m1T * m1 + γ * e1T * m1 + γ * m1T * e1 + γ² * e1T * e1

In the special case here we can say

m1T * m1 = γ²,  e1T * e1 = 1,  m1T * e1 = e1T * m1 = m11

and get

hT * h = γ² + 2 * γ * m11 + γ² = 2 * γ * (γ + m11)

and so

|h| = sqrt(2 * γ * (γ + m11))

and with this we get for ω

ω = (m1 + γ * e1) / sqrt(2 * γ * (γ + m11))

as a vector, and the elements

ωk = (mk1 + γ * e1k) / sqrt(2 * γ * (γ + m11))

for k = 1 ... N (with e11 = 1 and e1k = 0 for k > 1).
Now we can put all this into an algorithm to calculate the elements of U = I – 2 * ω * ωT. For a 5 * 5 matrix that looks like:
This matrix multiplied by the matrix M eliminates all the elements in the first column except the element m11:
To eliminate the elements m32, m42 and m52 in the second column, we now have to do the same again using the sub-matrix from m22 to m55.
And so on.
This I put into a C# function:
public void RunHouseholder(int col)
{
    int i, j;
    double lambda = 0;
    double beta;
    double[] w = new double[samples];
    double[,] I = new double[samples, samples];
    // calc parameters: lambda is gamma, the signed length of the column
    for (i = col; i < samples; i++)
    {
        lambda = lambda + m[i, col] * m[i, col];
    }
    lambda = Math.Sqrt(lambda);
    if (m[col, col] < 0)
        lambda = -lambda;   // same sign as m[col, col] to avoid cancellation
    // beta = |h| = sqrt(2 * gamma * (gamma + m11))
    beta = Math.Sqrt(2 * lambda * (lambda + m[col, col]));
    w[col] = (lambda + m[col, col]) / beta;
    for (i = col + 1; i < samples; i++)
        w[i] = m[i, col] / beta;
    // calc matrix parameters: r = I - 2 * w * wT
    for (i = 0; i < samples; i++) // unit matrix
    {
        for (j = 0; j < samples; j++)
        {
            I[i, j] = 0.0;
        }
        I[i, i] = 1.0;
    }
    for (i = 0; i < samples; i++)
    {
        if (i >= col)
        {
            for (j = 0; j < samples; j++)
            {
                if (j >= col)
                    r[i, j] = I[i, j] - 2 * w[i] * w[j];
                else
                    r[i, j] = I[i, j];
            }
        }
        else
        {
            for (j = 0; j < samples; j++)
                r[i, j] = I[i, j];
        }
    }
    MultMatrix(r);
}
With
public void MultMatrix(double[,] m1)
{
    int i, j, k;
    double[,] al = new double[size, size];
    double[] yl = new double[size];
    // al = m1 * m and yl = m1 * y
    for (i = 0; i < size; i++)
    {
        yl[i] = 0.0;
        for (j = 0; j < size; j++)
        {
            yl[i] = yl[i] + (y[j] * m1[i, j]);
            al[i, j] = 0.0;
            for (k = 0; k < size; k++)
            {
                al[i, j] = al[i, j] + (m[k, j] * m1[i, k]);
            }
        }
    }
    // copy the results back into m and y
    for (i = 0; i < size; i++)
    {
        y[i] = yl[i];
        for (j = 0; j < size; j++)
        {
            m[i, j] = al[i, j];
        }
    }
}
for (i = 0; i < MaxOrder - 1; i++)
    RunHouseholder(i);
And the matrix equation C''x = y can be solved by backward substitution.
public double[] Solve()
{
    int k, l;
    double[] x = new double[maxOrder];
    // backward substitution from the last row up
    for (k = maxOrder - 1; k >= 0; k--)
    {
        for (l = maxOrder - 1; l > k; l--)
        {
            y[k] = y[k] - x[l] * m[k, l];
        }
        if (m[k, k] != 0)
            x[k] = y[k] / m[k, k];
        else
            x[k] = 0;
    }
    return x;
}
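To see the whole procedure end to end, here is a compact self-contained sketch that follows the same steps as RunHouseholder, MultMatrix and Solve for a small square system. The class, the method layout and the 2 * 2 example data are my own, not the article's:

```csharp
using System;

class HouseholderSolver
{
    // Solves M * x = y for a square matrix via Householder transformations
    // and backward substitution. U is never built as a full matrix here;
    // each reflection is applied as v - 2 * w * (wT * v).
    static double[] Solve(double[,] m, double[] y)
    {
        int n = y.Length;
        for (int col = 0; col < n - 1; col++)
        {
            // gamma = length of the column from the diagonal down,
            // with the sign of the diagonal element
            double gamma = 0;
            for (int i = col; i < n; i++)
                gamma += m[i, col] * m[i, col];
            gamma = Math.Sqrt(gamma);
            if (m[col, col] < 0) gamma = -gamma;
            if (gamma == 0) continue; // column already eliminated

            // w = (m1 + gamma * e1) / |h|, with |h| = sqrt(2*gamma*(gamma + m11))
            double beta = Math.Sqrt(2 * gamma * (gamma + m[col, col]));
            double[] w = new double[n];
            w[col] = (gamma + m[col, col]) / beta;
            for (int i = col + 1; i < n; i++)
                w[i] = m[i, col] / beta;

            // apply U = I - 2 * w * wT to every column of m and to y
            for (int j = 0; j < n; j++)
            {
                double dot = 0;
                for (int i = col; i < n; i++)
                    dot += w[i] * m[i, j];
                for (int i = col; i < n; i++)
                    m[i, j] -= 2 * w[i] * dot;
            }
            double dy = 0;
            for (int i = col; i < n; i++)
                dy += w[i] * y[i];
            for (int i = col; i < n; i++)
                y[i] -= 2 * w[i] * dy;
        }

        // backward substitution on the now upper triangular system
        double[] x = new double[n];
        for (int k = n - 1; k >= 0; k--)
        {
            double s = y[k];
            for (int l = n - 1; l > k; l--)
                s -= x[l] * m[k, l];
            x[k] = m[k, k] != 0 ? s / m[k, k] : 0;
        }
        return x;
    }

    static void Main()
    {
        double[,] m = { { 2, 1 }, { 1, 3 } };
        double[] y = { 5, 10 };             // the solution is x = (1, 3)
        double[] x = Solve(m, y);
        Console.WriteLine($"x = ({x[0]:F4}, {x[1]:F4})");
    }
}
```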
A beautiful algorithm, isn't it?