QR Factorization

A QR factorization is a factorization of a matrix A into a product A = QR of an orthogonal matrix Q and an upper triangular matrix R. Since it reveals very useful information about A, it is a very important factorization: it is used in solving linear least squares problems and in the eigendecomposition of a matrix, which exposes the structure of the matrix in terms of its eigenvalues and eigenvectors.

Definition

Recall that a matrix Q is an orthogonal matrix if Q^T Q = I. If Q is orthogonal, then ||Qx||_2 = ||x||_2 for all x; that is, Q does not change the length of vectors. The operations that don't change the length of vectors are rotations and reflections, so an orthogonal matrix can be thought of as a map that combines rotations and reflections.

QR factorization. Let A be an m × n real matrix with m ≥ n. Then there is an orthogonal matrix Q and an upper-triangular matrix R so that A = QR. This is called the QR factorization.

When A is a complex matrix, we can still write A = QR, where Q is unitary instead of orthogonal.

For the rest of this chapter, we will assume that A is real, but it's important to know that the QR decomposition works for complex matrices as well.

The QR decomposition is a fundamentally important matrix factorization. It is straightforward to implement, is numerically stable, and provides the basis of several important algorithms. In this lab we explore several ways to produce the QR decomposition and implement a few immediate applications.

The QR decomposition of a matrix A is a factorization

A = QR

where Q has orthonormal columns and R is upper triangular. Every m × n matrix A of rank n ≤ m has a QR decomposition, with two main forms.

- Reduced QR: \hat{Q} is m × n, \hat{R} is n × n. The columns of \hat{Q} form an orthonormal basis for the column space of A (when A has full rank), and \hat{R} is square and upper triangular.

- Full QR: Q is m × m, R is m × n. In this case, the columns \{q_j\}_{j=1}^m of Q form an orthonormal basis for all of F^m, and the last m − n rows of R contain only zeros.

We distinguish between these two forms by writing \hat{Q} and \hat{R} for the reduced decomposition and Q and R for the full decomposition.
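
As a quick sanity check on these two forms, here is a minimal sketch (assuming JAX is available; jnp.linalg.qr supports both modes) comparing the shapes returned by the full and reduced decompositions of a tall matrix:

import jax.numpy as jnp
from jax import random

# a tall 5 x 3 matrix
A = random.normal(random.PRNGKey(0), (5, 3))

# full QR: Q is 5 x 5, R is 5 x 3 with the last two rows equal to zero
Q_full, R_full = jnp.linalg.qr(A, mode='complete')
# reduced QR: Q_hat is 5 x 3, R_hat is 3 x 3
Q_hat, R_hat = jnp.linalg.qr(A, mode='reduced')

print(Q_full.shape, R_full.shape)  # (5, 5) (5, 3)
print(Q_hat.shape, R_hat.shape)    # (5, 3) (3, 3)
print(jnp.allclose(Q_hat @ R_hat, A, atol=1e-5))  # True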

Figure 1. The illustration of QR factorization.

Different forms of the QR factorization, depending on the matrix dimensions, are shown in the following figure:

Figure 2. The illustration of QR factorization.

The QR factorization has many applications:

  1. The QR factorization can be used to solve least-squares problems, that is, problems of the form \min_x ||Ax - b||_2, where A is a tall skinny matrix.

  2. The QR factorization is used as part of the eigenvalue and singular value algorithms. We’ll see these in Chapter 5.

  3. The QR factorization is also used in iterative methods to solve linear systems and compute eigenvalues (e.g., Krylov methods). We'll see these in Chapter 6.

Least-squares problems. To see why the QR decomposition might be useful (and to get a taste of what's to come in section 4.4 below), let's look briefly at the least-squares problem. Let A \in \mathbb{R}^{m \times n}, m \geq n. We'd like to find the x such that Ax is closest to b in Euclidean distance. That is,

x^* = \arg\min_x ||Ax - b||_2

To do this, we'll use the QR factorization (with Q square):

||Ax - b||_2 = ||Q^T (Ax - b)||_2 = ||Q^T (QRx - b)||_2 = ||Rx - Q^T b||_2

Above, we used the fact that for any orthogonal matrix Q and any vector x, we have

||Qx||_2 = ||x||_2

This is because

||Qx||_2^2 = (Qx)^T (Qx) = x^T Q^T Q x = x^T x = ||x||_2^2

Finding the x that minimizes ||Rx - Q^T b||_2 turns out to be much easier than finding the x that minimizes ||QRx - b||_2, because R is triangular.
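
As a preview, here is a minimal sketch of this idea in code, using jnp.linalg.qr (reduced form) together with a triangular solve from jax.scipy.linalg; the random A and b are assumed for illustration only.

import jax.numpy as jnp
from jax import random
from jax.scipy.linalg import solve_triangular

A = random.normal(random.PRNGKey(0), (10, 3))   # tall, skinny matrix
b = random.normal(random.PRNGKey(1), (10,))

# reduced QR: A = Q_hat R_hat with Q_hat (10, 3) and R_hat (3, 3)
Q_hat, R_hat = jnp.linalg.qr(A)
# minimizing ||Rx - Q^T b||_2 reduces to the triangular system R_hat x = Q_hat^T b
x = solve_triangular(R_hat, Q_hat.T @ b, lower=False)

# compare against a reference least-squares solver
x_ref, *_ = jnp.linalg.lstsq(A, b)
print(jnp.allclose(x, x_ref, atol=1e-4))  # True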

Householder reflections

First, let's consider the QR decomposition when Q is square. One of the most reliable methods for calculating the QR factorization in this case is to use Householder reflections. This method is computationally efficient, robust, and accurate. In many ways, the idea behind this approach is similar to the LU factorization. For A = LU, we rewrote the equation as

L^{-1} A = U

That is, we asked the question: is there a lower-triangular matrix that can transform AA into an upper-triangular matrix?

Now we ask: is there an orthogonal matrix that can transform A into an upper-triangular matrix:

Q^T A = R

As before, our goal is to create zeros below the diagonal, and we begin with the first column.

Figure 3. The illustration of Householder reflections, step 1.

Householder reflection. We need to apply an orthogonal transformation Q_1^T that maps the first column to a vector in the direction of e_1 (i.e., the x-axis in two dimensions). Let's write

A = [a_1 | \cdots | a_n]

Then Q_1^T a_1 should be parallel to e_1. Since Q_1 does not change the norm of a_1, we must have Q_1^T a_1 = \pm ||a_1|| e_1, as illustrated below.

Figure 4. The illustration of Householder reflections, step 2.

How should we choose Q_1? One logical choice would be a rotation that maps a_1 parallel to e_1; however, it turns out that rotations in high dimensions are not so easy to set up. Thus, we will instead choose Q_1 to be a reflection that maps a_1 parallel to e_1. The picture looks like this:

Figure 5. The illustration of Householder reflections.

Now that we have an idea of what the reflection should be doing, let’s figure out a mathematical formula to capture it.

To begin, suppose we can find the hyperplane and a vector v orthogonal to it; the reflection is then defined by this vector v. If we have the vector v and its corresponding hyperplane sitting halfway between x and ||x|| e_1, we can compute the component y of x along v and subtract it twice to reflect x across the hyperplane:

y = \frac{v^T x}{v^T v} v

x - 2y = \left(I - 2\,\frac{v v^T}{v^T v}\right) x

Figure 5. The illustration of Householder reflections.

Reflections. Let H be the matrix which represents reflection over the hyperplane orthogonal to some vector v. Then H is given by

H = I - \beta v v^T, \quad \text{with} \quad \beta = \frac{2}{v^T v}

We need to pick v correctly to arrive at our Householder reflections. We need to find a formula for v that will reflect a given vector x onto ||x|| e_1. The following geometric argument shows that

v = x - ||x|| e_1

will do the trick.

Figure 6. The illustration of Householder reflections.

To see this formally, recall that we have

Hx = (I - \beta v v^T) x = x - \beta (v^T x) v

Therefore, requiring Hx = ||x|| e_1, we conclude that

\beta (v^T x) v = x - ||x|| e_1

Since the norm of v does not matter and v should be parallel to x - ||x|| e_1, we can simply pick

v = x - ||x|| e_1

Householder reflections. The Householder reflection that maps x to ||x|| e_1 is given by

H = I - \beta v v^T

where v = x - ||x|| e_1 and \beta = \frac{2}{v^T v}.
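
Here is a quick numerical check of this formula (a small sketch with an assumed vector x; the full implementation comes later in house):

import jax.numpy as jnp

x = jnp.array([2.0, 1.0, 2.0])        # ||x|| = 3
e1 = jnp.array([1.0, 0.0, 0.0])

v = x - jnp.linalg.norm(x) * e1       # v = x - ||x|| e_1
beta = 2.0 / (v @ v)
H = jnp.eye(3) - beta * jnp.outer(v, v)

print(H @ x)     # [3., 0., 0.], i.e. ||x|| e_1
print(H @ H.T)   # the identity (to roundoff): H is orthogonal and symmetric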

Lemma: For x \in \mathbb{R}^n, if v = x + \|x\| e_1 is nonzero, then H_v(x) = -\|x\| e_1, and if v = x - \|x\| e_1 is nonzero, then H_v(x) = \|x\| e_1.

Proof. The argument for the two cases is similar; we show the argument for v = x + \|x\| e_1 \neq 0. With x_1 the first entry of x, we have

\begin{aligned}
v^T v &= \left(x + \|x\| e_1\right)^T \left(x + \|x\| e_1\right) \\
&= x^T x + \|x\| x^T e_1 + \|x\| e_1^T x + \|x\|^2 e_1^T e_1 \\
&= 2\|x\|^2 + 2\|x\| x_1 = 2\|x\|\left(\|x\| + x_1\right),
\end{aligned}

and

\begin{aligned}
v v^T x &= \left(x + \|x\| e_1\right)\left(x + \|x\| e_1\right)^T x \\
&= x x^T x + \|x\| x e_1^T x + \|x\| e_1 x^T x + \|x\|^2 e_1 e_1^T x \\
&= \|x\|^2 x + \|x\| x_1 x + \|x\|^3 e_1 + \|x\|^2 x_1 e_1 \\
&= \|x\| \left(\|x\| x + x_1 x + \|x\|^2 e_1 + \|x\| x_1 e_1\right) \\
&= \|x\| \left(\left(\|x\| + x_1\right) x + \|x\| \left(\|x\| + x_1\right) e_1\right) \\
&= \|x\| \left(\|x\| + x_1\right)\left(x + \|x\| e_1\right).
\end{aligned}

Then

\begin{aligned}
H_v(x) &= x - 2\frac{v v^T x}{v^T v} \\
&= x - 2\frac{\|x\|\left(\|x\| + x_1\right)\left(x + \|x\| e_1\right)}{2\|x\|\left(\|x\| + x_1\right)} \\
&= x - \left(x + \|x\| e_1\right) \\
&= -\|x\| e_1.
\end{aligned}

Roundoff errors. This is a practical issue. We have shown that the first component of v is

v_1 = x_1 - ||x||

while component i \geq 2 is x_i. However, if x_1 > 0 and x_2^2 + \cdots + x_n^2 \ll x_1^2, the subtraction x_1 - ||x|| could lead to large cancellation errors.

Large cancellation error, or catastrophic cancellation, is the phenomenon that subtracting good approximations to two nearby numbers may yield a very bad approximation to the difference of the original numbers.

Remark: we care about relative error when it comes to cancellation; the absolute error might be small, but the relative error can be huge, and it may change the results dramatically once those errors are propagated.

Although this sounds like a remote possibility, in practice it happens very often when we use the QR decomposition in the algorithm to compute eigenvalues (which we cover in the next chapter). To fix this, in that case we can project x onto -e_1; v_1 is then given by

v_1 = x_1 + ||x||

This choice for v_1 does not lead to large roundoff errors when x_1 > 0.
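
Here is a small float32 sketch of the cancellation issue (with values assumed for illustration): when x_1 is close to ||x||, computing x_1 - ||x|| directly loses all significant digits, while either flipping the sign (projecting onto -e_1, as above) or using the algebraically equivalent form -(x_2^2 + \cdots + x_n^2)/(x_1 + ||x||) (as in the Golub & Van Loan variant house2 below) avoids the problem.

import numpy as np

x1 = np.float32(1.0)
tail = np.float32(1e-4)               # x = (1, 1e-4), so ||x|| is roughly 1 + 5e-9
sigma = tail ** 2                     # x_2^2 + ... + x_n^2
x_norm = np.sqrt(x1 ** 2 + sigma)     # rounds to exactly 1.0 in float32

v1_naive = x1 - x_norm                # catastrophic cancellation
v1_stable = -sigma / (x1 + x_norm)    # algebraically identical, no cancellation

print(v1_naive)   # 0.0              (all digits lost)
print(v1_stable)  # about -5e-09     (correct to float32 precision)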

Before we implement the algorithm, let's note one more fact: the two candidate Householder vectors, x - ||x|| e_1 and x + ||x|| e_1, are orthogonal to each other:

(x - ||x|| e_1)^T (x + ||x|| e_1) = 0

There is a special case we have to treat carefully. For a vector x, suppose that

(x[1:n])^T (x[1:n]) = 0,

i.e., every entry of x below the first one is zero. Recall that we choose the Householder vector as

v = \begin{cases} x + ||x|| e_1 & \text{if} \ x_1 > 0 \\ x - ||x|| e_1 & \text{if} \ x_1 < 0 \\ 0 & \text{if} \ x_1 = 0 \end{cases}

In this special case x is already a multiple of e_1, so no reflection is needed; we simply set \beta = 0 (and the code below returns v = x unchanged).

import jax.numpy as jnp
import numpy as np
from jax import random


def house(x: jnp.ndarray):
    """
    Householder projection for the vector x 
    It computes beta and v for the Householder reflection:
    P = I - beta v v^T 

    Parameters
    ----------
    x: jnp.ndarray, shape (n, ) or (n, 1)

    Returns
    ----------
    beta: scalar 
    v: jnp.ndarray, shape (n, 1)
    Note: the algorithm from Golub and Van Loan is slightly different,
    as they normalize v into [1, v_2, ..., v_n];
    see e.g. https://fa.bianp.net/blog/2013/householder-matrices/
    """

    x = jnp.asarray(x, dtype=jnp.float32).reshape(-1, 1)
    v = x.copy()

    # calculate the sigma
    sigma = jnp.linalg.norm(x[1:, 0]) ** 2
    # if sigma == 0, it means x is already aligned with e1
    if sigma == 0:
        beta = 0
        # since x is already aligned with e1 we can just return a copy of it
        return beta, v 
    # calculate ||x||
    x_norm = jnp.sqrt(x[0, 0]**2 + sigma)

    if x[0, 0] > 0:
        # update v 
        v = v.at[0, 0].set(x[0, 0] + x_norm)
    else:
        v = v.at[0, 0].set(x[0, 0] - x_norm)

    beta = 2.0  / (v[0, 0]**2 + sigma)

    return beta, v


key = random.PRNGKey(333)
size = 5
x = random.randint(key, (size, ), -10, 10)
x
# Array([-3,  4, -4,  5, -9], dtype=int32)
beta, v = house(x)
print(beta, '\n', v)
# 0.0054533635 
#  [[-15.124355]
#  [  4.      ]
#  [ -4.      ]
#  [  5.      ]
#  [ -9.      ]]

Now we can verify

\beta (v^T x) = 1, \quad Hx = ||x|| e_1

beta * (v.T @ x)
# Array([0.9999999], dtype=float32)
H = np.eye(size) - beta * np.dot(v, v.T)
print(np.round(H.dot(x) / np.linalg.norm(x), decimals=15))
# [ 1.  0. -0.  0.  0.]
# test the accuracy of house
e1 = jnp.array([1.0, 0, 0, 0])
e1
# Array([1., 0., 0., 0.], dtype=float32)
beta, v = house(2*e1)
print(beta, '\n', v)
# 0 
#  [[2.]
#  [0.]
#  [0.]
#  [0.]]
H = np.eye(4) - beta * np.dot(v, v.T)
print(np.round(H.dot(e1) / np.linalg.norm(e1), decimals=15))
# [1. 0. 0. 0.]

def gvsn(n):
    """
    Generate a random vector with small norm
    """
    x = np.random.normal(size=n)
    x -= (x.mean()/2)
    return x / 1000

r = gvsn(4)
r
# array([ 0.0001777,  0.0003931, -0.0003471,  0.0017381])
r = jnp.array([0.0001777,  0.0003931, -0.0003471,  0.0017381])
np.linalg.norm(r)
# 0.0018241642
# e1 + r (a random vector with a very small norm)
x = e1 + r
x 
# Array([ 1.0001777,  0.0003931, -0.0003471,  0.0017381], dtype=float32)
beta, v = house(x)
print(beta, '\n', v)

# 0.49982107 
#  [[ 2.0003572]
#  [ 0.0003931]
#  [-0.0003471]
#  [ 0.0017381]]
H = np.eye(4) - beta * np.dot(v, v.T)
print(np.round(H.dot(x) / np.linalg.norm(x), decimals=15))

# [-0.9999999  0.        -0.         0.       ]

For x = e_1 + r, where r is a random vector with a very small norm, we have x_1 > 0 and x_2^2 + \cdots + x_n^2 \ll x_1^2. To avoid large cancellation errors, we project x onto -e_1, which gives

\begin{bmatrix} -0.9999999 & 0 & -0 & 0 \end{bmatrix}

In summary, we can follow the Golub and Van Loan version (Algorithm 5.1.1), which computes v_1 in a numerically stable way and normalizes v so that its first entry equals 1:

def house2(x: jnp.ndarray):
    """
    Householder projection for the vector x 
    It computes beta and v for the Householder reflection:
    P = I - beta v v^T 

    Parameters
    ----------
    x: jnp.ndarray, shape (n, ) or (n, 1)

    Returns
    ----------
    beta: scalar 
    v: jnp.ndarray, shape (n, 1)
    Note: the algorithm is based on Golub and Van Loan, Algorithm 5.1.1
    """

    x = jnp.asarray(x, dtype=jnp.float32).reshape(-1, 1)
    v = x.copy()

    # calculate the sigma
    sigma = jnp.linalg.norm(x[1:, 0]) ** 2
    # if sigma == 0, it means x is already aligned with e1
    if sigma == 0:
        beta = 0
    else:
        # calculate ||x||
        x_norm = jnp.sqrt(x[0, 0]**2 + sigma)

        if x[0, 0] <= 0:
            # update v 
            v = v.at[0, 0].set(x[0, 0] - x_norm)
        else:
            temp = - sigma / (x[0, 0] + x_norm)
            v = v.at[0, 0].set(temp)
    
        beta = 2.0 * (v[0, 0] ** 2) / (v[0, 0]**2 + sigma)
        temp = v / v[0, 0]
        v = v.at[:, :].set(temp)
        
    return beta, v


beta, v = house2(x)
print(beta, '\n', v)

# 1.6474086e-06 
#  [[    1.     ]
#  [ -238.57436]
#  [  210.65672]
#  [-1054.8616 ]]

H = np.eye(4) - beta * np.dot(v, v.T)
print(np.round((H @ x) / np.linalg.norm(x), decimals=15))

# [ 0.9999999 -0.         0.        -0.       ]

Note: the algorithm from Golub and Van Loan always projects x onto e_1 (Golub & Van Loan, 2013). In this post, we will simplify the choice

v = \begin{cases} x + ||x|| e_1 & \text{if} \ x_1 > 0 \\ x - ||x|| e_1 & \text{if} \ x_1 < 0 \\ 0 & \text{if} \ x_1 = 0 \end{cases}

into

v = x + \text{sign}(x_1) ||x|| e_1

If we are dealing with a matrix A, then the first Householder vector should be

v^1 = a_1 + \text{sign}(a_{11}) ||a_1|| e_1

Iterate this idea. In a manner similar to the LU factorization, we can apply a series of Householder transformations to progressively reduce A to upper-triangular form, first zeroing entries in the first column, then the second column, and so on. In the end, for A \in \mathbb{R}^{m \times n}, m \geq n, we have

Q_{n-1}^T \cdots Q_1^T A = R

which is equivalent to

A = Q_1 \cdots Q_{n-1} R = QR

This is our QR factorization. Note that Q \in \mathbb{R}^{m \times m}, and R has zeros in its last m − n rows if m > n.

The code that implements this algorithm is given below. In the code, instead of allocating memory for Q and R, we implicitly store the QR decomposition like this:

Figure 7. The illustration of Householder reflections.

Here are the details of our algorithm. At each step, we apply a Householder transformation to A as

HA = (I - \beta v v^T) A = A - \beta v (v^T A)
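
In other words, we never form H explicitly; we apply it as a rank-one update. A small sketch checking that the two computations agree (random data assumed for illustration):

import jax.numpy as jnp
from jax import random

A = random.normal(random.PRNGKey(0), (5, 3))
v = random.normal(random.PRNGKey(1), (5, 1))
beta = 2.0 / (v.T @ v)[0, 0]

H = jnp.eye(5) - beta * (v @ v.T)       # explicit reflector: O(m^2) storage
HA_explicit = H @ A                     # O(m^2 n) work
HA_update = A - beta * v @ (v.T @ A)    # rank-one update: O(m n) work

print(jnp.allclose(HA_explicit, HA_update, atol=1e-5))  # True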

def qr_factorization(A: jnp.ndarray):
    """
    QR factorization with Householder transformation

    Parameters
    ----------
    A: jnp.ndarray shape (m, n), assuming m >= n
        if m < n, we will transpose the matrix

    Returns
    ----------
    R: the factor R, taken from the upper-triangular part
    A: A overwritten with R in its upper-triangular part and
       the Householder vectors v stored below the diagonal
    """

    m, n = A.shape
    A = A.astype(jnp.float32)  # (m, n)

    if m < n:
        A = A.T
        m, n = A.shape

    # loop over columns as we will transform column vector
    for i in range(n):
        beta, v = house(A[i:, i])  # v.shape = (m-i, 1)
        # calculate beta v (v^T A)
        # A[i:, i:].shape = (m-i, n-i)
        # temp.shape = (m-i, n-i)
        temp = beta * v @ (v.T @ A[i:, i:])  
        # update A for R part
        A = A.at[i:, i:].set(A[i:, i:]-temp)
        # update A for Q part
        # saving v (its entries below the first one) in the lower-triangular part of A;
        # with the Golub & Van Loan normalization used in house2, v[0] = 1 and need not be stored
        A = A.at[i+1:, i].set(v[1:, 0])
    
    return jnp.tril(A.T).T, A


A = jnp.array(
    [
        [1, -1, 4],
        [1, 4, -2],
        [1, 4, 2],
        [1, -1, 0]
    ]
)
A

# Array([[ 1, -1,  4],
#        [ 1,  4, -2],
#        [ 1,  4,  2],
#        [ 1, -1,  0]], dtype=int32)

np.linalg.qr(A)

# (array([[-0.5,  0.5, -0.5],
#         [-0.5, -0.5,  0.5],
#         [-0.5, -0.5, -0.5],
#         [-0.5,  0.5,  0.5]]),
#  array([[-2., -3., -2.],
#         [ 0., -5.,  2.],
#         [ 0.,  0., -4.]]))

qr_factorization(A)

# (Array([[-2.       , -3.       , -2.       ],
#         [ 0.       , -5.       ,  1.9999995],
#         [ 0.       ,  0.       , -4.       ],
#         [ 0.       ,  0.       ,  0.       ]], dtype=float32),
#  Array([[-2.       , -3.       , -2.       ],
#         [ 1.       , -5.       ,  1.9999995],
#         [ 1.       ,  3.3333333, -4.       ],
#         [ 1.       , -1.6666667, -3.2000003]], dtype=float32))

A = jnp.array(
    [
        [0.5, 0.903281 ,  1.10219 , 1.09724],
        [0.5, 0.520598 , -0.152935,  -0.767982],
        [0.5,  -0.0205981,  -0.513732,   0.267982],
        [0.5,  -0.403281,    0.231146 , -0.0972388]
    ]
)

R, Av = qr_factorization(A)
R 
# Array([[-1.       , -0.4999999, -0.3333346, -0.2500006],
#        [ 0.       , -0.9999994, -0.666668 , -0.5000008],
#        [ 0.       ,  0.       ,  1.0000012,  0.7500015],
#        [ 0.       ,  0.       ,  0.       , -0.9999991]], dtype=float32)

Q, R = jnp.linalg.qr(A)
R

# Array([[-0.9999999, -0.5      , -0.3333346, -0.2500006],
#        [ 0.       , -0.9999993, -0.6666681, -0.5000008],
#        [ 0.       ,  0.       ,  1.0000014,  0.7500015],
#        [ 0.       ,  0.       ,  0.       , -0.9999993]], dtype=float32)

You may notice that we do not explicitly calculate the Q part in our algorithm. To calculate Q we need a slightly different algorithm. Let's review our formula:

HA = (I - \beta v v^T) A = A - \beta v (v^T A)

where the vector v is constructed by

v = x \pm ||x|| e_1

The sign is important, as it determines which direction (e_1 or -e_1) x will be projected onto.

The scale of v does not matter. Therefore, we can normalize v to be a unit vector, and then

\beta = \frac{2}{v^T v} = 2

def qr_factorization2(A: jnp.ndarray) -> jnp.ndarray:
    """
    QR factorization with Householder transformation

    Parameters
    ----------
    A: jnp.ndarray, shape (m, n), assuming m >= n

    Returns
    ---------
    Q: Q factor, shape (m, m)
    R: R factor, shape (m, n)
    """
    m, n = A.shape
    A = A.astype(jnp.float32)
    R = A.copy()
    Q = jnp.eye(m)

    # construct sign function
    sign = lambda x: 1 if x >= 0 else -1

    # loop over columns
    for i in range(n):
        # construct column vector v from R based on formula
        # v = x + sign(x_1) ||x|| e_1
        # (as R is copy of A now) 
        v = R[i:, i].reshape(-1, 1)
        v_norm = jnp.linalg.norm(v)
        v = v.at[0, 0].set(v[0, 0] + sign(v[0, 0]) * v_norm)
        # normalize v
        v = v.at[:, :].set(v/jnp.linalg.norm(v))

        # update R and Q
        R = R.at[i:, i:].set(R[i:, i:] - 2 * v @ v.T @ R[i:, i:])
        Q = Q.at[i:, :].set(Q[i:, :] - 2 * v @ v.T @ Q[i:, :])

    
    return Q.T, R
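
A quick check of qr_factorization2 (assuming the imports and the function above): Q should be orthogonal and QR should reproduce A.

A = jnp.array(
    [
        [1.0, -1.0, 4.0],
        [1.0, 4.0, -2.0],
        [1.0, 4.0, 2.0],
        [1.0, -1.0, 0.0]
    ]
)

Q, R = qr_factorization2(A)
print(Q.shape, R.shape)                              # (4, 4) (4, 3)
print(jnp.allclose(Q @ R, A, atol=1e-5))             # True
print(jnp.allclose(Q.T @ Q, jnp.eye(4), atol=1e-5))  # True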

Givens rotations

Householder transforms are the big hammer of orthogonal transforms. They are efficient at creating a lot of zeros in a given column. However, consider the problem of computing the QR factorization of a matrix A that looks like this:

Figure 8. The illustration of an upper Hessenberg matrix.

That is, A_{i,j} is zero for all indices such that i > j + 1. (Such a matrix is called an upper Hessenberg matrix.) In each step, we only need to zero out a single entry (the one right below the diagonal), and a Householder transform is overkill.

Since we only need to zero out a single entry, we want a smaller tool that does less work per transformation. It turns out that a 2D rotation is enough for an upper Hessenberg matrix.

Givens rotations. Instead, we need a small screwdriver that is adapted to small jobs.

The problem can be reduced to considering a 2D vector u = (u_1, u_2) and finding a rotation G^T such that the vector becomes aligned with +e_1.

A Givens rotation which rotates u = (u_1, u_2)^T to ||u|| e_1 is the 2 × 2 matrix defined by

G^T = \begin{bmatrix} c & -s \\ s & c \end{bmatrix}, \quad c = \frac{u_1}{||u||_2}, \quad s = \frac{-u_2}{||u||_2}

def givens(u1, u2):
    """
    Givens rotation for upper Hessenberg matrix

    Parameters
    ----------
    u1, u2 are float numbers (jnp.float32)
    
    Returns
    -------
    two float numbers c s 
    """

    if u2 == 0:
        c = 1 
        s = 0
    else:
        if jnp.abs(u2) > jnp.abs(u1):
            # divide by the larger of |u1| and |u2| so that |tau| <= 1;
            # this avoids potential overflow when forming 1 + tau^2
            # (compared with computing u1^2 + u2^2 directly)
            tau = -u1/u2
            s = 1.0 / jnp.sqrt(1.0 + tau * tau)
            c = s * tau
        else:
            tau = -u2/u1
            c = 1.0/jnp.sqrt(1.0 + tau * tau)
            s = c* tau

    return c, s
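
A quick sketch of a single rotation using the givens function above (values assumed for illustration): the second component is zeroed out and the first becomes ±||u||_2; note that this construction does not fix the sign.

u = jnp.array([3.0, 4.0])
c, s = givens(u[0], u[1])
G_T = jnp.array([[c, -s], [s, c]])

print(G_T @ u)             # [-5., 0.]: aligned with e_1 up to sign
print(jnp.linalg.norm(u))  # 5.0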


def givens_transform(A:jnp.ndarray) -> jnp.ndarray:
    for i in range(A.shape[0]-1):
        c, s = givens(A[i, i], A[i+1, i])
        # construct Givens rotation matrix
        G = jnp.array(
            [
                [c, -s],
                [s, c]
            ]
        )
        # apply the Givens rotation to rows i and i+1
        u = A[i:i+2, :].reshape(2, -1)
        A = A.at[i:i+2, :].set(G @ u)

    return jnp.round(A, 6)

key = random.PRNGKey(567)
A = random.uniform(key, (6, 6))
A = jnp.triu(A, -1)
A
# Array([[0.976706 , 0.8106253, 0.781171 , 0.6478064, 0.939659 , 0.7351937],
#        [0.2583095, 0.6409593, 0.1166685, 0.7197964, 0.7315263, 0.0051122],
#        [0.       , 0.0422235, 0.3764414, 0.3792816, 0.9099863, 0.4950393],
#        [0.       , 0.       , 0.2414137, 0.7459104, 0.2407993, 0.0194172],
#        [0.       , 0.       , 0.       , 0.2245039, 0.6656947, 0.5180018],
#        [0.       , 0.       , 0.       , 0.       , 0.9256719, 0.2474174]],      dtype=float32)
givens_transform(A)
# Array([[ 1.010163,  0.947512,  0.785125,  0.810408,  1.095229,  0.712242],
#        [ 0.0001  ,  0.414682, -0.048121,  0.56605 ,  0.557192, -0.131594],
#        [-0.000009,  0.000004,  0.452973,  0.671187,  0.854258,  0.442946],
#        [ 0.000005, -0.000002,  0.000114,  0.51102 ,  0.065007, -0.002208],
#        [ 0.000001, -0.000001,  0.000034,  0.000009, -1.166365, -0.548181],
#        [-0.000002,  0.000001, -0.000044, -0.000011, -0.000349,  0.30846 ]],      dtype=float32)

Gram-Schmidt. Our starting point is the observation that, for every k, the first k columns q_1, \cdots, q_k of Q are an orthonormal basis for the subspace spanned by a_1, \cdots, a_k.

Consider the Gram-Schmidt procedure, with the vectors to be considered in the process as columns of the matrix A. That is,

A = \begin{bmatrix} \begin{array}{c|c|c|c} a_1 & a_2 & \cdots & a_n \end{array} \end{bmatrix}

Then,

\begin{aligned} u_1 = a_1, & \quad e_1 = \frac{u_1}{||u_1||} \\ u_2 = a_2 - \langle a_2, e_1 \rangle e_1, & \quad e_2 = \frac{u_2}{||u_2||} \\ & \vdots \end{aligned}

The resulting QR factorization is

A = \begin{bmatrix} \begin{array}{c|c|c|c} a_1 & a_2 & \cdots & a_n \end{array} \end{bmatrix} = \begin{bmatrix} \begin{array}{c|c|c|c} e_1 & e_2 & \cdots & e_n \end{array} \end{bmatrix} \begin{bmatrix} a_1 \cdot e_1 & a_2 \cdot e_1 & \cdots & a_n \cdot e_1 \\ 0 & a_2 \cdot e_2 & \cdots & a_n \cdot e_2 \\ \vdots & & \ddots & \vdots \end{bmatrix}

With the above formulas, one can compute the QR factorization easily, for example as sketched below:
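
Here is a minimal sketch of the classical Gram-Schmidt procedure based on the formulas above (assuming the JAX imports from earlier); it computes the reduced factorization with Q of shape (m, n) and R of shape (n, n):

def classical_gram_schmidt(A: jnp.ndarray):
    """
    Classical Gram-Schmidt: reduced QR with Q (m, n) and R (n, n)
    """
    m, n = A.shape
    A = A.astype(jnp.float32)
    Q = jnp.zeros((m, n))
    R = jnp.zeros((n, n))

    for j in range(n):
        u = A[:, j]
        for i in range(j):
            # project the original a_j onto the already computed e_i = Q[:, i]
            R = R.at[i, j].set(jnp.dot(Q[:, i], A[:, j]))
            u = u - R[i, j] * Q[:, i]
        # normalize what is left of a_j
        R = R.at[j, j].set(jnp.linalg.norm(u))
        Q = Q.at[:, j].set(u / R[j, j])

    return Q, R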

One can show that this algorithm is not numerically stable: in floating-point arithmetic, the computed columns of Q gradually lose orthogonality. We need to change the order of operations. The resulting algorithm is called the modified Gram-Schmidt (MGS) algorithm.

def modified_gram_schmidt(A: jnp.ndarray, reduced=False) -> jnp.ndarray:
    """
    Modified Gram-Schmidt (MGS)
    """
    m, n = A.shape
    # it is important to work in floating point;
    # the algorithm will not work for integer inputs without the cast
    Q = A.astype(jnp.float32).copy()
    R = jnp.zeros((m, n), dtype=jnp.float32)
    # iterate over columns 
    for i in range(n):
        # calculate the norm of each column 
        # assign it to the diagonal element of R
        R = R.at[i, i].set(jnp.linalg.norm(Q[:, i]))
        # normalize the ith column of Q 
        Q = Q.at[:, i].set(Q[:, i]/R[i, i])
        for j in range(i+1, n):
            # dot production for each element of R
            # fix row i and iterate over columns (upper triangle)
            # check the formula above 
            R = R.at[i, j].set(jnp.dot(Q[:, i], Q[:, j])) # projection coefficient 
            Q = Q.at[:, j].set(Q[:, j] - R[i, j] * Q[:, i]) # construct new axis
    
    if reduced:
        return Q, R[:n, :n]

    return Q, R


def mgs(A: jnp.ndarray):
    """
    A different (explicit-loop) version of modified Gram-Schmidt
    """
    m, n = A.shape
    A = A.astype(jnp.float32)
    R = jnp.zeros((n, n))

    for j in range(n):
        # orthogonalize column j against all previous columns
        for i in range(j):
            for k in range(m):
                temp = A[k, i] * A[k, j]
                R = R.at[i, j].set(R[i, j] + temp)
            for k in range(m):
                temp = A[k, i] * R[i, j]
                A = A.at[k, j].set(A[k, j] - temp)
                
        R = R.at[j, j].set(jnp.linalg.norm(A[:, j]))
        A = A.at[:, j].set(A[:, j] / R[j, j])
        
    return A, R
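
A quick check of the two implementations on a small example (assuming the functions above); both should reproduce A with an orthonormal Q:

A = jnp.array(
    [
        [1.0, -1.0, 4.0],
        [1.0, 4.0, -2.0],
        [1.0, 4.0, 2.0],
        [1.0, -1.0, 0.0]
    ]
)

Q1, R1 = modified_gram_schmidt(A, reduced=True)
Q2, R2 = mgs(A)

print(jnp.allclose(Q1 @ R1, A, atol=1e-5))             # True
print(jnp.allclose(Q2 @ R2, A, atol=1e-5))             # True
print(jnp.allclose(Q1.T @ Q1, jnp.eye(3), atol=1e-4))  # True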

Recap. We've seen three ways to compute the QR decomposition: Householder reflections, Givens rotations, and Gram-Schmidt. Householder reflections are a great approach if A is arbitrary and (close to) square. Givens rotations are useful when A is upper Hessenberg. Gram-Schmidt is a good idea when A is tall and thin, and we want a QR decomposition where Q is also tall and thin.

QR factorization via Householder transformation is usually faster and more accurate than Gram-Schmidt methods.

Upper Hessenberg Form

An upper Hessenberg matrix is a square matrix that is nearly upper triangular, with zeros below the first subdiagonal. Every n × n matrix A can be written A = QHQ^T, where Q is orthogonal and H, called the Hessenberg form of A, is an upper Hessenberg matrix. Putting a matrix in upper Hessenberg form is an important first step in computing its eigenvalues numerically.

This algorithm also uses Householder transformations. To find an orthogonal Q and an upper Hessenberg H such that

A = QHQ^T,

it suffices to find such matrices that satisfy

Q^T A Q = H

Thus, the strategy is to multiply A on the left and right by a series of orthogonal matrices until it is in Hessenberg form.

Using the same Q_k as in the kth step of the Householder QR algorithm introduces n − k zeros in the kth column of A, but multiplying Q_k A on the right by Q_k^T destroys all of those zeros.

Instead, choose a Q_1 that fixes e_1 and reflects the first column of A into the span of e_1 and e_2. The product Q_1 A then leaves the first row of A alone, and the product (Q_1 A) Q_1^T leaves the first column of (Q_1 A) alone.

Figure 9. The illustration of the Hessenberg form.

Continuing the process results in the upper Hessenberg form of A.

Figure 10. The illustration of the Hessenberg form.

This implies that A = Q_1^T Q_2^T Q_3^T H Q_3 Q_2 Q_1, so setting Q = Q_1^T Q_2^T Q_3^T results in the desired factorization A = QHQ^T.

def hessenberg(A: jnp.ndarray) -> jnp.ndarray:
    """
    Transform a matrix A into the Hessenberg form via Householder reflection
    It is very similar to the algorithm of QR with Householder 
    """
    m, n = A.shape
    H = A.copy().astype(jnp.float32)  # cast so the algorithm also works for integer inputs
    Q = jnp.eye(m)
    
    # construct sign function
    sign = lambda x: 1 if x >= 0 else -1
    # instead of iterating over all columns, we only loop up to n-2
    for i in range(n-2):
        # starting from the second row 
        v = H[i+1:, i].reshape(-1, 1)
        v_norm = jnp.linalg.norm(v)
        v = v.at[0, 0].set(v[0, 0] + sign(v[0, 0]) * v_norm)
        # normalize v
        v = v.at[:, :].set(v/jnp.linalg.norm(v))
        
        # update H and Q
        # apply Q_k H (H is copy of A)
        H = H.at[i+1:, i:].set(H[i+1:, i:] - 2 * v @ v.T @ H[i+1:, i:])
        # apply Q_k^T to H, H Q_k^T 
        H = H.at[:, i+1:].set(H[:, i+1:] - 2 * (H[:, i+1:] @ v) @ v.T)
        # calculate Q by Q_kQ_{k-1}
        Q = Q.at[i+1:, :].set(Q[i+1:, :] - 2 * v @ v.T @ Q[i+1:, :])
        
    return H, Q.T


def hessenberg_np(A):
    """
    Hessenberg decomposition 
    """
    m, n = A.shape
    # convert to float64 in case the inputs are integers
    H = A.copy().astype(np.float64)  
    Q = np.eye(m)
    # construct sign function
    sign = lambda x: 1 if x >= 0 else -1
    # iterate over columns 
    for i in range(n-2):
        # get the column and reshape it into column vector 
        # it is important to copy 
        # as matrix is mutable in numpy 
        x = H[i+1:, i].reshape(-1, 1).copy()
        # calculate the norm of the column
        x_norm = np.linalg.norm(x)
        # update the first entry 
        x[0, 0] = x[0, 0] + sign(x[0,0]) * x_norm
        # normalize the vector
        x /= np.linalg.norm(x)

        # update H and Q
        H[i+1:, i:] = H[i+1:, i:] - 2 * x @ x.T @ H[i+1:, i:] 
        H[:, i+1:] = H[:, i+1:] - 2 * (H[:, i+1:] @ x) @ x.T 
        Q[i+1:, :] = Q[i+1:, :] - 2 * x @ x.T @ Q[i+1:, :]

    # transpose Q 
    return H, Q.T
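
A small sketch verifying the reduction (assuming the imports and the JAX version above): A should be recovered as Q H Q^T, and every entry below the first subdiagonal of H should be (numerically) zero.

key = random.PRNGKey(42)
A = random.normal(key, (5, 5))

H, Q = hessenberg(A)

print(jnp.allclose(Q @ H @ Q.T, A, atol=1e-4))        # True
print(jnp.allclose(Q.T @ Q, jnp.eye(5), atol=1e-5))   # True
print(jnp.allclose(jnp.tril(H, -2), 0.0, atol=1e-5))  # True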

All figures in this post are taken from the book by (Darve & Wootters, 2021).

  1. Darve, E., & Wootters, M. (2021). Numerical Linear Algebra with Julia (Vol. 172). SIAM.
  2. Golub, G. H., & Van Loan, C. F. (2013). Matrix Computations. JHU Press.