As a non-professional teacher of computer graphics to at least two people, I’ve often found that a major stumbling block for students is this ominous beast called the matrix. If you don’t stare too closely and simply take a textbook example of “build your projection matrix” and remember rules like “invert-transpose a matrix to transform your normals”, it works fine. But there’s often a lot of difficult-to-explain magic in there.
When I gave a recent internal talk at Endless on basic computer graphics, I took a new approach. Instead of using a rotation matrix and multiplying it out, I simply rotate the points. After all, matrices and matrix multiplication are just a convenient way of writing out a linear system. Instead of:
we instead write:
But the two groups of equations are, in fact, by definition, 100% equivalent. The latter is basically a short-hand way of writing the former.
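As a concrete instance of that equivalence, here is a minimal sketch that assumes a 2D rotation by an angle. The function names `rotate_equations` and `rotate_matrix` are made up for illustration; the first writes the linear system out by hand, the second gathers the same coefficients into a matrix.

```python
import math

def rotate_equations(x, y, angle):
    """Rotate (x, y) by writing the linear system out by hand."""
    new_x = x * math.cos(angle) - y * math.sin(angle)
    new_y = x * math.sin(angle) + y * math.cos(angle)
    return (new_x, new_y)

def rotate_matrix(x, y, angle):
    """The same rotation, with the coefficients gathered into a 2x2 matrix."""
    m = [[math.cos(angle), -math.sin(angle)],
         [math.sin(angle),  math.cos(angle)]]
    # Matrix-vector multiplication: each output is one row dotted with (x, y).
    return tuple(row[0] * x + row[1] * y for row in m)

point = (1.0, 0.0)
quarter_turn = math.pi / 2
by_equations = rotate_equations(*point, quarter_turn)
by_matrix = rotate_matrix(*point, quarter_turn)
```

Both calls compute the same multiplications and additions, so `by_equations` and `by_matrix` agree; the matrix is only a more compact way to store the coefficients.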
To explain the perspective divide, instead of walking through the badly named “w” coordinate and homogeneous coordinate systems, I simply divide by Z.
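A minimal sketch of that divide, assuming a camera that looks down the positive Z axis so that Z is the distance in front of the viewer. The helper name `project_point` is hypothetical.

```python
def project_point(x, y, z):
    """Perspective divide: the larger z is, the closer the point lands to the center."""
    return (x / z, y / z)

# The same (x, y) offset, seen at two different distances.
close_up = project_point(1.0, 1.0, 2.0)
far_away = project_point(1.0, 1.0, 4.0)
```

Doubling the distance halves the projected offset, which is all that "things farther away look smaller" means here.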
But I can’t go a full blog post without explaining the horrendous nature of the projection matrix. I do not like the projection matrix as found in most libraries. You see, while a local rotation matrix is relatively simple once you understand the concept, the projection matrix does its best to obscure and confuse you, mixing three independent concepts into one inseparable ball of mud.
Here is a standard perspective projection matrix generated by gl-matrix:
And here is the part that actually does the perspective projection.
Since all points are divided by w, we simply divide by the Z value. Well, we divide by negative Z. What is the rest of the matrix doing? Well, the values here should be obvious.
It’s the scale. This is used for field-of-view projection. We simply change the scale at which objects appear. The bigger the field-of-view, the smaller the objects end up being.
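To make the scale concrete, here is a sketch that assumes the common choice of 1 / tan(fov / 2) for a vertical field of view, with X additionally divided by the aspect ratio. The names `fov_scale` and `project` are illustrative, and the divide uses negative Z as described above.

```python
import math

def fov_scale(fovy_radians):
    """Scale factor for a vertical field of view: 1 / tan(fovy / 2)."""
    return 1.0 / math.tan(fovy_radians / 2.0)

def project(x, y, z, fovy_radians, aspect):
    f = fov_scale(fovy_radians)
    # Scale first (x also accounts for the aspect ratio), then divide by -z,
    # because the camera is assumed to look down the negative Z axis.
    return (x * (f / aspect) / -z, y * f / -z)

# The same point, five units in front of the camera, under two fields of view.
narrow = project(1.0, 1.0, -5.0, math.radians(45), 1.0)
wide = project(1.0, 1.0, -5.0, math.radians(90), 1.0)
```

The point lands closer to the center under the 90 degree field of view than under the 45 degree one: a bigger field of view gives a smaller scale.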
And this last bit? This is something I have never had explained to me well. Both OpenGL and Direct3D have a depth buffer. The value stored in the depth buffer is “hardcoded” to be taken from the third component of the vector, the Z component. (This is why you should always have XZ as your ground plane, and Y should face towards the sky. Z is hardcoded as depth in both APIs! *ahem*). This component needs to be between 0 and 1.
When we build the projection matrix, we take in two values, the near plane and the far plane. We need to remap values near the near plane to 0, and values near the far plane to 1. Using linear interpolation, we can easily derive:
Eek. Translating this into the projection matrix isn’t easy. After all, we have a constant term, which a matrix of multipliers can’t really express on its own. But wait a minute, all input coordinates are guaranteed to be homogeneous, right? Which means the “w” is guaranteed to be 1. So let’s just borrow the input coordinate’s w!
This, in my opinion, is too clever for its own good. The only reason we care about w in this case isn’t because there’s extra nuance in the already subtly related w and z coordinates. We’re simply stealing w for its 1.
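The trick can be sketched as follows, assuming the linear remap described above. The helper names are hypothetical, and the row is not the exact one a library such as gl-matrix emits, since real projection matrices also account for sign conventions and the later divide by w.

```python
def remap_depth(z, near, far):
    """Linear interpolation: near maps to 0, far maps to 1."""
    return (z - near) / (far - near)

def depth_row(near, far):
    """One matrix row as (x, y, z, w) coefficients; the w slot carries the constant."""
    scale = 1.0 / (far - near)
    offset = -near / (far - near)
    return (0.0, 0.0, scale, offset)

def apply_row(row, point):
    """Dot one matrix row with a homogeneous point."""
    return sum(c * p for c, p in zip(row, point))

near, far = 1.0, 101.0
point = (3.0, 4.0, 26.0, 1.0)  # homogeneous input, so w == 1

direct = remap_depth(point[2], near, far)
via_row = apply_row(depth_row(near, far), point)
```

Splitting (z - near) / (far - near) into z * scale + offset is what makes it fit in a row: the scale multiplies z, and the offset multiplies the borrowed w, which only works because w is 1.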
Personally, I think this is when we should have realized we’re at the limitations of what matrix math can do. After all, we can already write it the simpler way:
v_position.z = (v_position.z - u_nearPlane) / (u_farPlane - u_nearPlane);
And, for extra credit, why do we take the inverse-transpose of our transformation matrix when transforming normals? There are two parts to that question. First, why won’t the regular transformation matrix work? Let’s take a simple object:
When rotating this object, normals transform fine.
When scaling this object, however, normals don’t transform correctly. In fact, when scaling, we need to make sure we scale by the opposite amount.
However, a simple inverse transformation matrix won’t do, because then rotations won’t work. We need to make sure to only invert the part of the matrix that deals with scaling. It turns out that because of how rotations are specified, transposing a rotation-only matrix is exactly the same as inverting it. And transposing a scale-only matrix is a no-op, because transposing flips the numbers on either side of the diagonal, and a scale-only matrix goes right down the center. So, when we invert-transpose a matrix, we’re actually pulling a neat trick to simply invert only the scale of a transformation.
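A small 2D check of this argument, as a sketch with hand-rolled 2x2 helpers (all names are illustrative). It assumes a model matrix that scales X by 2 and then rotates by 30 degrees, and tests whether the transformed normal stays perpendicular to a direction lying along the surface.

```python
import math

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(m):
    return [[m[0][0], m[1][0]], [m[0][1], m[1][1]]]

def inverse(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[ m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det,  m[0][0] / det]]

def apply(m, v):
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

angle = math.radians(30)
rotation = [[math.cos(angle), -math.sin(angle)],
            [math.sin(angle),  math.cos(angle)]]
scale = [[2.0, 0.0], [0.0, 1.0]]
model = mat_mul(rotation, scale)  # scale first, then rotate

tangent = (1.0, 1.0)   # direction lying along the surface
normal = (-1.0, 1.0)   # perpendicular to the tangent

new_tangent = apply(model, tangent)
naive_normal = apply(model, normal)
fixed_normal = apply(transpose(inverse(model)), normal)

naive_dot = dot(new_tangent, naive_normal)  # nonzero: no longer perpendicular
fixed_dot = dot(new_tangent, fixed_normal)  # about zero: still perpendicular
```

With the plain model matrix the dot product comes out nonzero, so the "normal" has tilted away from perpendicular. With the inverse-transpose it stays at zero, up to floating-point error, because the rotation part passes through unchanged and only the scale is inverted.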