Drew Barbarello's Blog

The Perspective Projection Matrix

Here, we’re going to derive the glFrustum Perspective Projection matrix, which maps the perspective view volume to the parallel view volume. We start with coordinates in “eye space”, near and far clip plane distances \(n\) and \(f\) (a.k.a. zNear, zFar), and bounds \(t, b, l, r\) for the “general imaging rectangle” on the near clip plane as in Figure 1.

Figure 1: The “general imaging rectangle” on the near clip plane [1]

Then, we’re mapping the imaging rectangle on the near clip plane \(z=-n \;\;(l \leq x \leq r, \; b \leq y \leq t)\) to \(z=-1 \;\; (-1 \leq x, y \leq 1)\), and mapping the corresponding rectangle on the far clip plane (the intersection of the rays from the eye to the bounds of the imaging rectangle with the far clip plane) to \(z=1 \;\; (-1 \leq x, y \leq 1)\) (after homogenizing). See Figure 2

Figure 2: The transformation we would like to find from the perspective view volume to the standard view volume. [2]

Approach

While the derivations at songho.ca and scratchapixel.com are great, they both take incremental approaches, solving for matrix entries one or a few at a time and plugging them in. Instead, I find it easier to follow the derivation if we just pick a handful of points in the preimage and image of the desired transformation and use a linear algebra package (Sympy) to do the tedious work of solving for the transformation matrix.

Given 5 points \(\mathbf{a},\mathbf{b},\mathbf{c},\mathbf{d},\mathbf{e} \in \mathbb{RP}^3\), of which no four are linearly dependent, there is a unique (up to a scalar multiple) projective transformation mapping the points \(\left[\begin{matrix}1\\0\\0\\0\end{matrix}\right],\left[\begin{matrix}0\\1\\0\\0\end{matrix}\right],\left[\begin{matrix}0\\0\\1\\0\end{matrix}\right],\left[\begin{matrix}0\\0\\0\\1\end{matrix}\right],\left[\begin{matrix}1\\1\\1\\1\end{matrix}\right]\) (the “quadrilateral of reference” and the “unit” point) to \(\mathbf{a},\mathbf{b},\mathbf{c},\mathbf{d},\mathbf{e}\). We use an extension of the strategy outlined in section 3.3 of [3] to prove this by construction.

Clearly, for the quadrilateral of reference to be taken to \(\mathbf{a},\mathbf{b},\mathbf{c},\mathbf{d},\mathbf{e}\), the matrix \(\mathbf{A}\) of the transformation must be of the form \[ \left[\begin{matrix}a_{1} u & b_{1} v & c_{1} w & d_{1} x\\a_{2} u & b_{2} v & c_{2} w & d_{2} x\\a_{3} u & b_{3} v & c_{3} w & d_{3} x\\a_{4} u & b_{4} v & c_{4} w & d_{4} x\end{matrix}\right] \] for some non-zero \(u,v,w,x\). We also require that

\[\begin{align*} \left[\begin{matrix}a_{1} u & b_{1} v & c_{1} w & d_{1} x\\a_{2} u & b_{2} v & c_{2} w & d_{2} x\\a_{3} u & b_{3} v & c_{3} w & d_{3} x\\a_{4} u & b_{4} v & c_{4} w & d_{4} x\end{matrix}\right] \left[\begin{matrix}1\\1\\1\\1\end{matrix}\right] &= \left[\begin{matrix}e_{1}\\e_{2}\\e_{3}\\e_{4}\end{matrix}\right] \\ \implies u\left[\begin{matrix}a_{1}\\a_{2}\\a_{3}\\a_{4}\end{matrix}\right] + v \left[\begin{matrix}b_{1}\\b_{2}\\b_{3}\\b_{4}\end{matrix}\right] + w \left[\begin{matrix}c_{1}\\c_{2}\\c_{3}\\c_{4}\end{matrix}\right] + x \left[\begin{matrix}d_{1}\\d_{2}\\d_{3}\\d_{4}\end{matrix}\right] &= \left[\begin{matrix}e_{1}\\e_{2}\\e_{3}\\e_{4}\end{matrix}\right] \end{align*}\]

Since \(\mathbf{a},\mathbf{b},\mathbf{c},\mathbf{d}\) constitute a basis for \(\mathbb{R}^4\), there exist unique \(u,v,w,x\) satisfying the above: the coordinates of \(\mathbf{e}\) in the \(\mathbf{a},\mathbf{b},\mathbf{c},\mathbf{d}\) basis. Furthermore, none of \(u,v,w,x\) will be zero, since no four of \(\mathbf{a},\mathbf{b},\mathbf{c},\mathbf{d},\mathbf{e}\) are linearly dependent. Finally, since the columns of \(\mathbf{A}\) are non-zero multiples of linearly independent vectors, \(\mathbf{A}\) is invertible. Thus the projective transformation \(t: [\mathbf{x}] \mapsto [\mathbf{A}\mathbf{x}]\) exists and is unique.

We now use the theorem above to find the matrix of the transformation taking the perspective view volume to the standard one. We pick 5 points on the perspective view volume, and the 5 points on the standard parallel view volume to which we want them taken, such that for each set of points, no four of them are linearly dependent. We’ll then find the transformations \(t_1: [\mathbf{x}] \mapsto [\mathbf{A}_1\mathbf{x}], t_2: [\mathbf{x}] \mapsto [\mathbf{A}_2\mathbf{x}]\) taking the “quadrilateral of reference” and the “unit” point to each set of points (guaranteed to exist by the theorem). Finally, we’ll be able to compute \(t = t_2 \circ t_1^{-1} : [\mathbf{x}] \mapsto [\mathbf{A}_2\mathbf{A}_1^{-1}\mathbf{x}]\). The matrix \(\mathbf{A}_2\mathbf{A}_1^{-1}\) of this transformation will be the projection matrix.

Derivation

First, we’ll import what we need from Sympy to work this out.

from sympy import ones, Matrix, linsolve, latex, ratsimp, together, symbols
from sympy.abc import n, f, l, r, t, b, u, v, w, x

Next, we’ll create Sympy vectors for the points we’re mapping from/to.

# Points in the perspective view volume in eye space, no four of which
# are linearly dependent (these are all column vectors)
eye_p = Matrix([0, 0, 0, 1])
near_center = Matrix([(l + r)/2, (t+b)/2, -n, 1])
far_top_center = Matrix([f*(l + r)/(2*n), f*t/n, -f, 1])
far_top_left = Matrix([f*l/n, f*t/n, -f, 1])
near_bottom_left = Matrix([l, b, -n, 1])

# Points to which the above are mapped: the eye point is mapped to the
# "ideal" point, the remaining points are mapped to corresponding points in the
# normalized parallel view volume
ideal = Matrix([0, 0, 1, 0])
n_near_center = Matrix([0, 0, -1, 1])
n_far_top_center = Matrix([0, 1, 1, 1])
n_far_top_left = Matrix([-1, 1, 1, 1])
n_near_bottom_left = Matrix([-1,-1,-1, 1])
unit = ones(4,1)

We’ll define a helper matrix \(UVWX\) where every entry in the first column is equal to \(u\), the second column is all \(v\)’s, etc. Then, if we take a matrix with 4 of our reference points as columns, and multiply it with \(UVWX\) elementwise, we’ll get a matrix like \(\mathbf{A}\) above where we just need to solve for \(u, v, w, x\).

UVWX = Matrix([(u,v,w,x) for r in range(4)])
latex(UVWX)

\[ \left[\begin{matrix}u & v & w & x\\u & v & w & x\\u & v & w & x\\u & v & w & x\end{matrix}\right] \]

Now we’ll solve for \(\mathbf{A}_1\) and \(\mathbf{A}_2\).

# Find the mapping taking the "quadrilateral of reference" and unit point to
# the five points in the perspective view volume selected above
A1 = Matrix([eye_p, near_center, far_top_center, far_top_left]).reshape(4,4).T
A1 = A1.multiply_elementwise(UVWX)


# Solve A1*unit = near_bottom_left for u, v, w, x
(u_sol, v_sol, x_sol, w_sol), = linsolve(list(A1*unit - near_bottom_left),
                                         u, v, x, w)
A1 = A1.subs({u: u_sol, v: v_sol, x: x_sol, w: w_sol})

# Find the mapping taking the "quadrilateral of reference" and unit
# point to points on the standard view volume corresponding to the
# selected perspetive view volume points (eye point is taken to the "ideal"
# point)
A2 = Matrix([ideal,
             n_near_center,
             n_far_top_center,
             n_far_top_left]).reshape(4, 4).T
A2 = A2.multiply_elementwise(UVWX)

# Solve A2*unit = n_near_bottom_left for u,v,w,x
(u_sol, v_sol, x_sol, w_sol), = linsolve(list(A2*unit - n_near_bottom_left),
                                         u, v, x, w)
A2 = A2.subs({u: u_sol, v: v_sol, x: x_sol, w: w_sol})

# Finally, find the matrix of the transformation taking the
# first set of points to the second
P = A2 * A1.inv()

Before displaying the resulting matrix \(\mathbf{P}\), I’ll do some simplifications to have Sympy display it in the familiar form

# Make P_{3,2} = -1 by dividing the matrix by -P_{3,2}
c = P[3, 2]
P /= -c
# Simplify the matrix by putting expressions over a common
# denominator, canceling, and combining terms (more explicit/stable
# way to accomplish what sympy.simplify does)
P = P.applyfunc(ratsimp)
P = P.applyfunc(together)
latex(P)

\[ \left[\begin{matrix}\frac{2 n}{- l + r} & 0 & \frac{l + r}{- l + r} & 0\\0 & \frac{2 n}{- b + t} & \frac{b + t}{- b + t} & 0\\0 & 0 & \frac{- f - n}{f - n} & - \frac{2 f n}{f - n}\\0 & 0 & -1 & 0\end{matrix}\right] \]

Note that the first simplification step of dividing the matrix by \(c = -\mathbf{P}_{3,2}\) does not change the transformation because \(\mathbf{P}\) is the matrix of a homogeneous transformation (so all scalar multiples of the matrix are equivalent), and \(1/c\) is defined because \(n\) is nonzero:

\[ c = - \frac{1}{n} \]

Finally, if the imaging rectangle is symmetric about the point \(\left(0, 0, -\frac{n}{f}, 1\right)\) (so \(l = -r\) and \(b = -t\)), then the projection matrix simplifies to

latex(P.subs({l: -r, b:-t}))

\[ \left[\begin{matrix}\frac{n}{r} & 0 & 0 & 0\\0 & \frac{n}{t} & 0 & 0\\0 & 0 & \frac{- f - n}{f - n} & - \frac{2 f n}{f - n}\\0 & 0 & -1 & 0\end{matrix}\right] \]

The Parallel Projection Matrix

To derive the general parallel projection matrix, we’ll follow the same approach as above, just mapping a different view volume to the standard parallel view volume. Given the imaging rectangle on the near plane defined by \(z=-n, \; l \leq x \leq r, \; b \leq y \leq t\), and a vector \(\mathbf{d}\) giving the direction of projection (the “DOP” in Figure 3), the eye space view volume is fully defined.

import itertools

d1, d2, d3 = symbols('d_1:4')
d = Matrix([d1, d2, d3, 0])

# Points on the parallel view volume in eye space, no four of which are
# linearly dependent
near_top_left = Matrix([l, t, -n, 1])
near_bottom_left = Matrix([l, b, -n, 1])
near_top_right = Matrix([r, t, -n, 1])
# d is a unit vector in direction of near plane from far plane
# => <point on near plane> - d * (f-n)/d3 is corresponding point on far plane
far_top_left = near_top_left + -d * ((f-n)/d3)
far_bottom_right = Matrix([r, b, -n, 1]) + -d * ((f-n)/d3)

# Just double checking that no four of the 5 points are linearly
# dependent, in general
assert all([Matrix(s).reshape(4,4).rank() == 4
            for s in itertools.combinations([
                    near_top_left,
                    near_bottom_left,
                    near_top_right,
                    far_top_left,
                    far_bottom_right], 4)])

# Points to which the above are mapped on the
# normalized parallel view volume
n_near_top_left = Matrix([-1, 1, -1, 1])
n_near_bottom_left = Matrix([-1, -1, -1, 1])
n_near_top_right = Matrix([1, 1, -1, 1])
n_far_top_left = Matrix([-1, 1, 1, 1])
n_far_bottom_right = Matrix([1, -1, 1, 1])

# Find the mapping taking the "quadrilateral of reference" and unit point to
# the five points in the parallel view volume selected above
A1 = Matrix([near_top_left,
             near_bottom_left,
             near_top_right,
             far_top_left]).reshape(4,4).T
A1 = A1.multiply_elementwise(UVWX)

# Solve A1*unit = far_bottom_right for u, v, w, x
(u_sol, v_sol, x_sol, w_sol), = linsolve(list(A1*unit - far_bottom_right),
                                         u, v, x, w)
A1 = A1.subs({u: u_sol, v: v_sol, x: x_sol, w: w_sol})

# Find the mapping taking the "quadrilateral of reference" and unit point to
# points on the standard view volume
A2 = Matrix([n_near_top_left,
             n_near_bottom_left,
             n_near_top_right,
             n_far_top_left]).reshape(4,4).T
A2 = A2.multiply_elementwise(UVWX)

# Solve A2*unit = n_far_bottom_left for u,v,w,x
(u_sol, v_sol, x_sol, w_sol), = linsolve(list(A2*unit - n_far_bottom_right),
                                         u, v, x, w)
A2 = A2.subs({u: u_sol, v: v_sol, x: x_sol, w: w_sol})

# Finally, find the matrix of the transformation taking the
# first set of points to the second
P = A2 * A1.inv()
P = P.applyfunc(ratsimp)
P = P.applyfunc(together)
latex(P)

\[ \left[\begin{matrix}\frac{2}{- l + r} & 0 & - \frac{2 d_{1}}{d_{3} \left(- l + r\right)} & \frac{1}{d_{3} \left(- l + r\right)} \left(- 2 d_{1} n - 2 d_{3} l - d_{3} \left(- l + r\right)\right)\\0 & \frac{2}{- b + t} & - \frac{2 d_{2}}{d_{3} \left(- b + t\right)} & \frac{1}{d_{3} \left(- b + t\right)} \left(- 2 b d_{3} - 2 d_{2} n - d_{3} \left(- b + t\right)\right)\\0 & 0 & - \frac{2}{f - n} & \frac{- f - n}{f - n}\\0 & 0 & 0 & 1\end{matrix}\right] \]

In the case where \(\mathbf{d}\) is perpendicular to the image plane \(d_1 = d_2 = 0, \; d_3 = 1\) we get the familiar glOrtho matrix:

latex(P.subs({d1: 0, d2: 0, d3: 1}))

\[ \left[\begin{matrix}\frac{2}{- l + r} & 0 & 0 & \frac{- l - r}{- l + r}\\0 & \frac{2}{- b + t} & 0 & \frac{- b - t}{- b + t}\\0 & 0 & - \frac{2}{f - n} & \frac{- f - n}{f - n}\\0 & 0 & 0 & 1\end{matrix}\right] \]

By using different direction vectors, we can get projection matrices with different angles of projection.

Tokoi, Hoke, Frustum, https://tokoik.github.io/opengl/faq.html
Ahn, Song Ho, Perspective Frustum and Normalized Device Coordinates (NDC), http://www.songho.ca/opengl/gl_projectionmatrix.html
Brannan, D.A.; Esplen, M.F. and Gray, J.J., Geometry, (2011).
Rhodes, Loren, Parallel Projection, http://jcsites.juniata.edu/faculty/rhodes/graphics/viewing.htm

Drew Barbarello's Blog

Deriving Perspective and Parallel Projection Matrices

The Perspective Projection Matrix

Approach

Derivation

The Parallel Projection Matrix

Comments