Why are 2D vector graphics so much harder than 3D?

There’s a lot of fantastic research into 2D graphics rendering these days. Petr Kobalicek and Fabian Yzerman have been working on Blend2D, one of the fastest and most accurate CPU rasterizers on the market, with a novel JIT approach. Patrick Walton of Mozilla has explored not just one, but three separate approaches in Pathfinder, culminating now in Pathfinder V3. Raph Levien has built a compute-based pipeline based on Gan et al’s ahead-of-its-time 2014 paper on vector textures. Signed distance fields seem to be getting some further development from both Adam Simmons and Sarah Frisken independently.

One might wonder: why is there so much commotion about 2D? It seriously can’t be that much harder than 3D, right? 3D is a whole other dimension! Real-time raytracing is around the corner, with accurate lighting, and yet we can’t manage dinky 2D graphics with solid colors?

To those not well-versed in the details of the modern GPU, it’s a very surprising conclusion! But 2D graphics has plenty of unique constraints that make it a difficult problem to solve, and one that doesn’t lend itself well to parallel approaches. Let’s take a stroll down history lane and trace the path that led us here in the first place, shall we?

The rise of PostScript

In the beginning, there was the plotter. The first graphics devices to interact with computers were “plotters”, which had one or multiple pens and an arm that could move over the paper. Things were drawn by submitting a “pen-down” command, moving the arm in some unique way, possibly curved, and then submitting “pen-up”. HP, manufacturer of some of the earliest plotter printers, used a variant of BASIC called “AGL” on the host computer, which would then send commands to the plotter peripheral itself in another language like HP-GL. During the 1970s, we saw the rise of affordable graphics terminals, starting with the Tektronix 4010. It had a CRT for its display, but don’t be fooled: it was not a pixel display. Tektronix came from the analog oscilloscope industry, and these machines worked by driving the electron beam in a certain path, not in a grid-like order. As such, the Tektronix 4010 didn’t have pixel output. Instead, you sent commands to it with a simple graphing mode that could draw lines but, again, in a pen-up pen-down fashion.

Like a lot of other inventions, this all changed at Xerox PARC. Researchers there were starting to develop a new kind of printer, one that was more computationally expressive than what was seen in plotters. This new printer was based on a small, stack-based Turing-complete language similar to Forth, and they named it… the Interpress! Xerox, obviously, was unable to sell it, so the inventors jumped ship and founded a small, scrappy startup named “Adobe”. They took Interpress with them and tweaked it until it was no longer recognizable as Interpress, and they renamed it PostScript. Besides the cute, Turing-complete stack language it comes with to calculate its shapes, the original PostScript Language Reference lays out an Imaging Model in Chapter 4, near-identical to the APIs we widely see today. Example 4.1 of the manual has a code example which can be translated to HTML5 <canvas> nearly line-by-line.

/box {                  function box() {
    newpath                 ctx.beginPath();
    0 0 moveto              ctx.moveTo(0, 0);
    0 1 lineto              ctx.lineTo(0, 1);
    1 1 lineto              ctx.lineTo(1, 1);
    1 0 lineto              ctx.lineTo(1, 0);
    closepath               ctx.closePath();
} def                   }
                        
gsave                   ctx.save();
72 72 scale             ctx.scale(72, 72);
box fill                box(); ctx.fill();
2 2 translate           ctx.translate(2, 2);
box fill                box(); ctx.fill();
grestore                ctx.restore();

This is not a coincidence.

Apple’s Steve Jobs had met the Interpress engineers on his visit to PARC. Jobs thought that the printing business would be lucrative, and tried to simply buy Adobe at birth. Instead, Adobe countered and eventually sold a five-year license for PostScript to Apple. The third pillar in Jobs’s printing plan was funding a small startup, Aldus, which was making a WYSIWYG app to create PostScript documents, “PageMaker”. In early 1985, Apple released the first PostScript-compliant printer, the Apple LaserWriter. The combination of the point-and-click Macintosh, PageMaker, and the LaserWriter singlehandedly turned the printing industry on its head, giving way to “desktop publishing” and solidifying PostScript’s place in history. The main competition, Hewlett-Packard, would eventually license PostScript for its competing LaserJet series of printers in 1991, after consumer pressure.

PostScript slowly moved from being a printer control language to a file format in and of itself. Clever programmers noticed the underlying PostScript sent to the printers, and started writing PostScript documents by hand, introducing charts and graphs and art to their documents, with the PostScript evaluated for on-screen display. Demand sprang up for graphics outside of the printer! Adobe noticed, and quickly rushed out the Encapsulated PostScript format, which was nothing more than a few specially-formatted PostScript comments to give metadata about the size of the image, and restrictions about using printer-centric commands like “page feed”. That same year, 1985, Adobe started development on “Illustrator”, an application for artists to draw Encapsulated PostScript files through a point-and-click interface. These files could then be placed into word processors, which then created… PostScript documents which were sent to PostScript printers. The whole world was PostScript, and Adobe couldn’t be happier. Microsoft, while working on Windows 1.0, wanted to create its own graphics API for developers, and a primary goal was making it compatible with existing printers so the graphics could be sent to printers as easily as to a screen. This API was eventually released as GDI, a core component used by every engineer during Windows’s meteoric rise to popularity in the 90s. Generations of programmers developing for the Windows platform started to unknowingly equate “2D vector graphics” with the PostScript imaging model, cementing its status as the 2D imaging model.

The only major problem with PostScript was its Turing-completeness — viewing page 86 of a document means first running the script for pages 1-85. And that could be slow. Adobe caught wind of this user complaint, and decided to create a new document format that didn’t have these restrictions, called the “Portable Document Format”, or “PDF” for short. It threw out the programming language — but the graphics technology stayed the same. A quote from the PDF specification, Chapter 2.1, “Imaging Model”:

At the heart of PDF is its ability to describe the appearance of sophisticated graphics and typography. This ability is achieved through the use of the Adobe imaging model, the same high-level, device-independent representation used in the PostScript page description language.

By the time the W3C wanted to develop a 2D graphics markup language for the web, Adobe championed the XML-based PGML, which had the PostScript graphics model front and center.

PGML should encompass the PDF/PostScript imaging model to guarantee a 2D scalable graphics capability that satisfies the needs of both casual users and graphics professionals.

Microsoft’s competing format, VML, was based on GDI, which as we know was based on PostScript. The two competing proposals, both still effectively PostScript, were combined to make up the W3C’s “Scalable Vector Graphics” (“SVG”) technology we know and love today.

Even though it’s old, let’s not pretend that the innovations PostScript brought to the world are anything less than a technological marvel. Apple’s PostScript printer, the LaserWriter, had a CPU twice as powerful as the Macintosh that was controlling it, just to interpret the PostScript and rasterize the vector paths to points on paper. That might seem excessive, but if you were already buying a fancy printer with a laser in it, the expensive CPU on the side doesn’t seem so expensive in comparison. In its first incarnation, PostScript invented a fairly sophisticated imaging model, with all the features that we take for granted today. But the most powerful, wowing feature? Fonts. Fonts were, at the time, drawn by hand with ruler and protractor, and cast onto film, to be printed photochemically. In 1977, Donald Knuth showed the world what could be with his METAFONT system, introduced together with his typesetting application TeX, but it didn’t catch on. It required the user to describe fonts mathematically, using brushes and curves, which wasn’t a skill that most font designers really wanted to learn. And the fancy curves turned into mush at small sizes: the printers of the time did not have dots small enough, so they tended to bleed and blur into each other. Adobe’s PostScript proposed a novel solution to this: an algorithm to “snap” these paths to the coarser grids that printers had. This is known as “grid-fitting”. To prevent the geometry from getting too distorted, they allowed fonts to specify “hints” about what parts of the geometry were the most important, and how much should be preserved.

Adobe’s original business model was to sell this font technology to people that make printers, and sell special recreations of fonts, with added hints, to publishers, which is why Adobe, to this day, sells their versions of Times and Futura. Adobe can do this, by the way, because fonts, or, more formally, “typefaces”, are one of five things explicitly excluded by US Copyright Law, since they were originally designated as “too plain or utilitarian to be creative works”. What is sold and copyrighted instead is the digital program that reproduces the font on the screen. So, to prevent people from copying Adobe’s fonts and adding their own, the Type 1 Font format was originally proprietary to Adobe and contained “font encryption” code. Only Adobe’s PostScript could interpret a Type 1 Font, and only Adobe’s Type 1 Fonts had the custom hinting technology allowing them to be visible at small sizes.

Grid fitting, by the way, was so universally popular that when Microsoft and Apple were tired of paying licensing fees to Adobe, they invented an alternate method for their font file competitor, TrueType. Instead of specifying declarative “hints”, TrueType gives the font author a complete Turing-complete stack language so that the author can control every part of grid-fitting (coincidentally avoiding Adobe’s patents on declarative “hints”). For years, the wars between the Adobe-backed Type 1 and TrueType raged on, with font foundries stuck in the middle, having to provide both formats to their users. Eventually, the industry reached a compromise: OpenType. But rather than actually deciding a winner, they simply plopped both specifications into one file format: Adobe, now in the business of selling Photoshop and Illustrator rather than Type 1 Fonts, removed the encryption bits, gave the format a small amount of spit polish, and released CFF / Type 2 fonts, which were grafted into OpenType wholesale as the cff table. TrueType, on the other hand, got shoved in as glyf and other tables. OpenType, while ugly, seemed to get the job done for users, mostly by war of endurance: just require that all software supports both kinds of fonts, because OpenType requires you to support both kinds of fonts.

Of course, we’re forced to ask: if PostScript hadn’t become popular, what might have happened instead? It’s worth looking at some alternatives. The previously mentioned METAFONT didn’t use filled paths. Instead, Knuth, in typical Knuth fashion, rigorously defines in his paper Mathematical Typography the concept of a curve that is “most pleasing”. You specify a number of points, and some algorithm finds the one correct “most pleasing” curve through them. You can stack these paths on top of each other: define such a path as a “pen”, and then “drag the pen” through some other path. Knuth, a computer scientist at heart, managed to introduce recursion to path stroking. Knuth’s thesis student, John Hobby, designed and implemented algorithms for calculating the “most pleasing curve”, the “flattening” of the nesting of paths, and rasterizing such curves. For more on METAFONT, curves, and the history of font technology in general, I highly recommend the detailed reference of Fonts & Encodings, and the papers of John D. Hobby.

Thankfully, the renewed interest in 2D graphics research means that Knuth and Hobby’s splines are not entirely forgotten. While definitely arcane and non-traditional, they recently made their way into Apple’s iWork Suite where they are now the default spline type.

The rise of triangles

Without getting too deep into the math weeds, at a high level we call approaches like Bezier curves and Hobby splines implicit curves, because they are specified as a mathematical function which generates the curve. They are smooth functions that look good at any resolution and zoom level, good traits for a 2D image designed to be scalable.
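
To make “specified as a mathematical function” concrete, here is a minimal sketch in TypeScript (close to the canvas JavaScript above; the names Point and cubicBezier are purely illustrative, not anyone’s real API): a cubic Bezier curve is just a polynomial you can evaluate at any parameter t, so you can sample it as finely as your output resolution demands.

interface Point { x: number; y: number; }

// Evaluate a cubic Bezier curve at parameter t in [0, 1] using the Bernstein form:
// B(t) = (1-t)^3 P0 + 3(1-t)^2 t P1 + 3(1-t) t^2 P2 + t^3 P3.
function cubicBezier(p0: Point, p1: Point, p2: Point, p3: Point, t: number): Point {
    const u = 1 - t;
    const b0 = u * u * u;
    const b1 = 3 * u * u * t;
    const b2 = 3 * u * t * t;
    const b3 = t * t * t;
    return {
        x: b0 * p0.x + b1 * p1.x + b2 * p2.x + b3 * p3.x,
        y: b0 * p0.y + b1 * p1.y + b2 * p2.y + b3 * p3.y,
    };
}

// Sample the curve as finely as we like: the definition itself has no inherent resolution.
const samples = Array.from({ length: 11 }, (_, i) =>
    cubicBezier({ x: 0, y: 0 }, { x: 0, y: 1 }, { x: 1, y: 1 }, { x: 1, y: 0 }, i / 10));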

2D graphics started with, and maintained forward momentum around, these implicit curves, almost by necessity given their use in modelling human letterforms and glyphs. The hardware and software to compute these paths in real-time was expensive, but since the big industry push for vector graphics came from the printing industry, most of the rest of the existing industrial equipment was already plenty more expensive than the laser printer with the fancy CPU.

3D graphics, however, took a very different route. From the very beginning, the near-universal approach was to use straight-edged polygons, oftentimes marked up and entered into the computer by hand. Not every approach used polygons, though. The 3D equivalent of an implicit curve is an implicit surface, made up of basic geometric primitives like spheres, cylinders and boxes. A perfect sphere with infinite resolution can be represented with a simple equation, so for organic geometry, it was a clear winner over the polygon look of early 3D. MAGI was one of a few companies pushing the limits of implicit surfaces, and combined with some clever artistic use of procedural textures, they won the contract with Disney to design the lightcycle sequences for the 1982 film Tron. Unfortunately, though, that approach quickly fell by the wayside. The number of triangles you could render in a scene was skyrocketing due to research into problems like “hidden surface removal” and faster CPUs, and for complex shapes, it was a lot easier for artists to think about polygons and vertices they could click and drag, rather than use combinations of boxes and cylinders to get the look they wanted.

This is not to say that implicit surfaces weren’t used in the modelling process. Tools like Catmull-Clark subdivision were a ubiquitous industry standard by the early 80s, allowing artists to put a smooth, organic look on otherwise simple geometry. Though Catmull-Clark wasn’t even framed as an “implicit surface” that can be computed with an equation until the early 2000s. Back then, it was seen as an iterative algorithm: a way to subdivide polygons into even more polygons.

Triangles reigned supreme, and so followed the tools used to make 3D content. Up-and-coming artists for video games and CGI films were trained exclusively on polygon mesh modellers like Maya, 3DS Max and Softimage. As the “3D graphics accelerator” came onto the scene in the late ’80s, it was designed to accelerate the existing content out there: triangles. While some early GPU designs like the NVIDIA NV1 had some limited hardware-accelerated curve support, it was buggy and quickly dropped from the product line.

This culture mostly extends into what we see today. The dominant 2D imaging model, PostScript, started with a product that could render curves in “real-time”, while the 3D industry ignored curves as they were difficult to make work, relying on offline solutions to pre-transform curved surfaces into triangles.

Implicit surfaces rise from the dead

But why could implicit curves be computed in real-time in 2D, on a printer, back in the ’80s, while 3D implicit surfaces were still too buggy in the early ’00s? Well, one answer is that Catmull-Clark is a lot more complicated than a Bezier curve. Bezier curves do have 3D surface equivalents, like Bezier patches and B-spline surfaces, and those are computable, but they have the drawback that they limit the ways you can connect your mesh together. Surfaces like Catmull-Clark and NURBS allow for arbitrarily connected meshes to empower artists, but this can lead to polynomials greater than the fourth degree, which tend to have no closed-form solution. Instead, what you get are approximations based on subdividing polygons, like what happens in Pixar’s OpenSubdiv. If someone ever finds an analytic, closed-form solution for root-finding on either Catmull-Clark or NURBS surfaces, Autodesk will certainly pay a lot of money for it. Compared to these, triangles seem a lot nicer: simply compute three linear plane equations and you have yourself an easy test.
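
As a sketch of what that easy test looks like in practice (purely illustrative, again in TypeScript; edgeFunction and pointInTriangle are names I’ve made up): a point is inside a triangle exactly when it lies on the same side of all three edges, which comes down to evaluating three linear equations and checking their signs.

interface Point { x: number; y: number; }

// Signed area of the parallelogram spanned by (b - a) and (p - a); its sign tells
// us which side of the edge a -> b the point p lies on.
function edgeFunction(a: Point, b: Point, p: Point): number {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// A point is inside the triangle when all three edge functions agree in sign
// (the two branches cover both possible winding orders).
function pointInTriangle(p: Point, a: Point, b: Point, c: Point): boolean {
    const w0 = edgeFunction(a, b, p);
    const w1 = edgeFunction(b, c, p);
    const w2 = edgeFunction(c, a, p);
    return (w0 >= 0 && w1 >= 0 && w2 >= 0) || (w0 <= 0 && w1 <= 0 && w2 <= 0);
}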

… but what if we don’t need an exact solution? That’s exactly what graphics developer of incredible renown Íñigo Quílez asked when doing research into implicit surfaces again. The solution? Signed distance fields. Instead of telling you the exact intersection point with a surface, a signed distance field tells you how far away you are from the nearest surface. Analogous to an analytically computed integral vs. Euler integration, if you have the distance to the closest object, you can “march” through the scene, asking how far away you are at any given point and stepping that distance. Such surfaces have seen a brand new life through the demoscene and places like Shadertoy. A twist on the old MAGI approach to modelling brings us incredible gems like Quílez’s Surfer Boy, calculated with infinite precision, as an implicit surface would be. You don’t need to find the algebraic roots of Surfer Boy; you just feel it out as you march through.
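
Here’s a minimal sketch of that marching loop, again in TypeScript and purely illustrative: the whole “scene” is a single unit sphere, and sdSphere, rayMarch and the little vector helpers are hypothetical names, not anyone’s real API. At every step we ask the distance field how far away the nearest surface is, and we can safely advance by exactly that amount without ever stepping through geometry.

type Vec3 = [number, number, number];

const len = (v: Vec3) => Math.hypot(v[0], v[1], v[2]);
const add = (a: Vec3, b: Vec3): Vec3 => [a[0] + b[0], a[1] + b[1], a[2] + b[2]];
const scale = (v: Vec3, s: number): Vec3 => [v[0] * s, v[1] * s, v[2] * s];

// Signed distance to a sphere of the given radius centered at the origin:
// negative inside, zero on the surface, positive outside.
function sdSphere(p: Vec3, radius: number): number {
    return len(p) - radius;
}

// March a ray from `origin` along the unit-length direction `dir`. The distance
// field tells us how far the nearest surface is, so we can step exactly that far
// without skipping over anything.
function rayMarch(origin: Vec3, dir: Vec3, maxSteps = 64, maxDist = 100): number | null {
    let t = 0;
    for (let i = 0; i < maxSteps; i++) {
        const p = add(origin, scale(dir, t));
        const d = sdSphere(p, 1.0);   // the whole "scene" is one unit sphere
        if (d < 1e-4) return t;       // close enough: call it a hit
        t += d;                       // the "march": step by the safe distance
        if (t > maxDist) return null; // wandered past everything
    }
    return null;
}

// A ray starting at z = -3 and looking down +z hits the unit sphere at roughly t = 2.
const hit = rayMarch([0, 0, -3], [0, 0, 1]);

The appeal is that swapping sdSphere for a more elaborate distance function (unions of shapes, twists, procedural noise) leaves the marching loop untouched, which is part of what makes the technique so attractive for organic geometry.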

The difficulty, of course, is that only a legitimate genius like Quílez can create Surfer Boy. There’s no existing tooling for signed distance field geometry; it’s all code. That said, given the exciting resurgence of implicit surfaces for their organic, curved look, there’s now plenty of interest in the technique. Media Molecule’s PS4 game Dreams is a content-creation kit built around combining implicit surfaces, requiring them to tear down and reinvent most of traditional graphics in the process. It’s a promising approach, and the tools are intuitive and fun. Oculus Medium and unbound.io are also putting good research into the problem. It’s definitely a promising look into what the future of 3D graphics and next-generation tools might look like.

But some of these approaches are less adaptable to 2D than you might think. Common 3D game scenes tend to have lush materials and textures but low geometry counts, as many critics and snake-oil salesmen are quick to point out. This means that we can get away with less anti-aliasing, since silhouettes are not as important. Approaches like 4x MSAA might cut the mustard for a lot of games, but for small fonts with solid colors, even 16 fixed sample locations per pixel aren’t enough; you would much rather compute the exact area under the curve for each pixel, giving you as much resolution as you want.
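
To see the difference, here’s a deliberately tiny comparison, again in illustrative TypeScript (exactCoverage and sampledCoverage are my names, and I’m assuming the simplest possible case: one pixel cut by a straight edge that crosses its left and right sides). The exact covered area is a trapezoid with a continuous value, while a fixed 4x4 grid of samples, standing in for a real MSAA pattern, can only ever report one of seventeen coverage levels.

// A single pixel covering [0,1] x [0,1], cut by the straight edge y = m*x + b;
// everything below the edge counts as covered. Assumes the edge crosses the left
// and right sides of the pixel, i.e. 0 <= m*x + b <= 1 for x in [0,1].
function exactCoverage(m: number, b: number): number {
    const yLeft = b;             // edge height at the pixel's left side (x = 0)
    const yRight = m + b;        // edge height at the pixel's right side (x = 1)
    return (yLeft + yRight) / 2; // exact area of the trapezoid under the edge
}

// The same pixel estimated from a fixed 4x4 grid of sample points, a stand-in
// for a 16-sample pattern. Only 17 distinct coverage values are possible.
function sampledCoverage(m: number, b: number, n = 4): number {
    let covered = 0;
    for (let i = 0; i < n; i++) {
        for (let j = 0; j < n; j++) {
            const x = (i + 0.5) / n;
            const y = (j + 0.5) / n;
            if (y < m * x + b) covered++;
        }
    }
    return covered / (n * n);
}

// A nearly flat edge: the exact coverage moves smoothly as the edge shifts,
// while the sampled estimate can only jump in steps of 1/16.
console.log(exactCoverage(0.01, 0.3));   // 0.305
console.log(sampledCoverage(0.01, 0.3)); // 0.25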

Rotating the viewport around in a 3D game causes something similar to saccadic masking as your brain re-adjusts to the new view. For a lot of games, this can help hide artifacts in post-processing effects like temporal antialiasing, which Dreams and unbound.io heavily lean on to get good performance out of their scenes. Conversely, in a typical 2D scene, we don’t have this luxury of perspective, so attempting to use these techniques will make our glyphs and shapes boil and jitter, with those artifacts in full glory. 2D is viewed differently, and the expectations are higher. Stability is important as you zoom, pan, and scroll.

None of these effects are impossible to implement on a GPU, but they do show a radical departure from “3D” content, with different priorities. Ultimately, 2D graphics rendering is hard because it’s about shapes, accurate letterforms and glyphs, rather than materials and lighting; the fill is usually just a solid color. GPUs, through a consequence of history, chose not to focus on real-time implicit geometry like curves, but instead on everything that goes inside them. Maybe in a world where PostScript didn’t win, we would have a 2D imaging model that didn’t have Bezier as a core realtime requirement. And maybe in a world where triangles were replaced with better geometry representations sooner, we would see content creation tools focus on 3D splines, and GPUs that have realtime curves built right into the hardware. It’s always fun to imagine, after all.