Six Years of

I’ve always had a love for the art in video games. Sure, all mediums have some ability to craft worlds from nothing, but none are so realized, so there, as the worlds in video games. It’s also an example of the interplay between “pure art” and computer technology. And in the case of the GameCube/Wii, technology from 1999, even! Yes, smart engineers played a large role in building game engines and tooling, but artists are very rarely appreciated, yet they are the ones responsible for what you see, what you hear, and often how you feel during a specific section. Not only do they model and texture everything, they control where the postprocessing goes, where the focus is pulled in each shot, and tell the programmers how to tweak the lighting on the hair. A good artist is a “problem solver” in much the same way that a good engineer is. They are the ones who decide how the game looks, from shot and scene composition to materials and lighting. The careful eye of a good art director is what turns a world from a pile of models to something truly special. Their work has touched the lives of so many over the past 30 years of video game history., my side project, is a celebration of the incredible work that video game artists have created. It lets you explore some video game maps with a free camera and see things from a new perspective. It’s blown up to moderate popularity now, and as it’s now reaching 6 years of development from me in some ways, I figured I should say a few words about its story and some of the experiences I’ve had making it.

My first commit to an ancient noclip predecessor, bmdview.js, was made on April 11th, 2013. I was inspired by the works of amnoid’s bmdview and my own experiments with the tool. I had never written any OpenGL or graphics code before, but I managed to stumble through it through trial and error and persistence.

This is the first screenshot I can find of bmdview.js with something recognizable being drawn. It’s the Starship Mario from Super Mario Galaxy 2.

After this resounding initial success, I… put it down, apparently. I had a lot of side projects at the time: learning about GIF/JPEG compression, about the X Window System, and doing some game reverse engineering of my own. I revisited it a few times over the years, but a “complete emulation” always seemed out of reach. There were too many bugs and I really barely knew anything about the GameCube’s GPU.

A few years or so later, I was inspired enough to create an Ocarina of Time viewer, based off the work that xdaniel did for OZMAV. I can’t remember too many details; it’s still live and you can go visit it but I did have to un-bit-rot it for this blog post (JS API breakage; sigh). I really enjoyed this, so I continued with a few more tools, until I had the idea to combine them all. First came Super Mario 64 DS in October of 2016, and then zelview.js was added in 20 days after. I don’t have time to recount all of the incredible people who I either based my work on or have contributed to the project directly; the credits page should list most of them.

Keeping Momentum

Keeping a side-project going for this long requires momentum. When I started the project, I decided that I would try as hard as possible to prevent refactors. Refactoring code kills momentum. You do not want to write more code that will be changed by the refactor, so you stop all your progress while you get it done, and of course, as you change the code for this new idea you have, you encounter resistance and difficulty. You also have to change this. Or maybe you found your new idea doesn’t fit as well in all cases, so the resulting code is still as ugly as when you started. And you get discouraged and it’s not fun any more, so you decide not to work on it for a bit. And then “a bit” slowly becomes a year, and you get less guilty about not touching it ever again. It’s a story that’s happened to me before. *cough*.

In response, I optimized the project around me, and my goals. First up, it’s exciting, it’s thrilling to get a game up on screen. There’s nothing like seeing that initial hints of a model come through, even if it’s all contorted. It’s a drive to keep pushing forward, to know that you’re part of the way there. Arranging the code so I can get that first hint of game up on the screen ASAP is a psychological trick, but it’s an effective one. That rush has never gotten old, even after the 20 different games I’ve added.

Second, I’m probably the biggest user of my own site. Exploring the nooks and crannies of some nostalgic game is something I still do every day, despite having done it for 6 years. Whether it’s trying to figure out how they did a specific effect, or getting a new perspective on past nostalgia gone by, I’m always impressed with the ingenuity and inventiveness of game artists. So when I get things working, it’s an opportunity to explore and play. This is fun to work on not just because other people like it, but because I like it. So, when people ask me if I will add so-and-so game, the answer is: probably not, unless you step up to contribute. If I haven’t played it, it makes it harder for me to connect with it, and it’s less fun for me to work on.

Third, I decided that refactors were off-limits. To help with this, I wanted as little “abstractions” and “frameworks” as possible. I should share code when it makes sense, but never be forced to share code when I do not want it. For this, I took an approach inspired by the Linux kernel and built what they call “helpers” — reusable bits and bobs here and there to help cut down on boilerplate and common tasks, but are used on an as-needed basis. If you need custom code, you outgrow the training wheels, from the helpers to your own thing, perhaps by copy/pasting it, and then customizing it. Both tef and Sandi Metz have explored this idea before: code is easier to write than it is to change. When I need to change an idea that did not work out well, I should be able to do it incrementally — port small pieces of the codebase over to the new idea, and keep the old one around. Or just delete ideas that did not work out without large change to the rest of the code.

As I get older, I realize that the common object-oriented languages make it difficult to share code in better ways and too easily lock you into the constraints of your own API. When this API requires a class, it is less momentum to squeeze your own code to extend a base class rather than creating a new file for an interface, putting what you need in there, making your class inherit the interface, and then try to share code and build the helpers. TypeScript’s structurally typed interfaces, where any class can automatically match the interface, makes it easy to define new ones. And simply by not requiring a separate file for an interface, TypeScript makes them a lot easier to create, and so, you end up with more of them. Small changes to how we perceive something as “light” or “heavy” makes a difference for which one we reach for to solve a problem.

I am not saying that these opinions are the absolute right way to build code. There are tradeoffs involved: the amount of copy/paste code is something I can keep in my head, but someone else trying to maintain it might be frustrated by the amount of different places they might have to fix a bug. What I am saying is that this was originally intended as my side-project site, and so the choices I made were designed to keep me happy in my spare time. A team or a business? Those have different goals, and might choose different tradeoffs for their organization and structure. I do think I have had tremendous success with them.

The Web as a Platform

My hobby is breaking the web. I try to push it to its limits, and build experiences that people thought should never have been possible. This is easier than you might think, because there’s a tremendous amount of low-hanging fruit in web applications. There’s a lot that’s been said on performance culture, premature optimization, and new frameworks that emphasize “programmer velocity” at the expense of performance. That said, my experience as a game developer does crowd my judgement a bit. We’re one of the few industries where we are publicly scrutinized over performance metrics by an incessant and very often rude customer base, over numbers and metrics. Anything less than 30fps and you will be lambasted for “poor optimization”. Compare that to business software, where I’m not even sure my GMail client can reach 10fps. I recently discovered that Slack was taking up 10GB of memory.

A poor performance culture is baked into the web platform itself, through not just the frameworks, but the APIs and new language features as well. Garbage collection exists, but its cost is modelled to be free. Pauses caused by the garbage collector are the number one cause of performance issues that I’ve seen. Objects like Promises can cause a lot of GC pressure, but in most JavaScript libraries found on npm they are created regularly, sometimes multiple times per every API call. The iterator protocol is an absurd notion that creates a new object on every iteration because they did not want to add a hasNext method (too Java-y?) or use Python’s StopIteration approach. Its performance impact can be even worse — I’ve measured up to a 10fps drop just because I used a for…of to iterate over 1,000 objects per frame. We need to hold platform designers accountable for decisions that affect performance, and build APIs that take the garbage collector into account. Why can’t I reuse a Promise object in an object pool? Does this interface need a special dictionary, or can I get away without it?

Writing performant code is not hard. Yes, it takes care and attention and creative problem solving, but aren’t those the things that inspired you to the craft of computer science in the first place? You don’t need more layers, you don’t need more dependencies, you just need to sit down, write the code, and profile it.

Oh, by the way. I recommend TypeScript. It’s one of the few things I truly love about modern web development. Understand where it’s polyfilling for you, and definitely don’t always trust the compiler output without verifying it yourself, but it has earned my trust enough over time for it to be worth the somewhat minor pains it adds to the process.

WebGL as a Platform

OpenGL is dreadful. I say this as a former full-time Linux engineer who still has a heart for open-source. It’s antiquated, full of footguns and traps, and you will fall into them. As of v4.4, OpenGL gives you enough ways to avoid all of the bad ideas, but you still need to know which ideas are bad and why. I might do a follow-up post on those eventually. OpenGL ES 3.0, the specification that WebGL2 is based on, unfortunately has very few of those fixes. It’s an API very poorly suited for underpowered mobile devices, which is where it ended up being deployed.

OpenGL optimization, if you do not understand GPUs, can be a lot like reading the tea leaves. But the long and short of it is “don’t do anything that would cause a driver stall, do your memory uploads as close together as you can, and change state as little as possible”. Checking whether a shader had any errors compiling? Well you’ve just killed any threading the driver had. glUniform? Well, that’s data that’s destined for the GPU — you probably want to use a uniform buffer object instead so you can group all the parameters into one giant upload. glEnable(GL_BLEND)? You just recompiled your whole shader on Apple iDevices.

And, of course, this is to say nothing of the quirks that each driver has. OpenGL on Mac has a large number of issues and bugs, and, yes, those extend to the web too. ANGLE also has its own set of bugs, which, yes, I also have a workaround for.

The hope here is for modern graphics APIs to make their way to the web. Thankfully, this is happening with WebGPU. The goal is to cut down on WebGL overhead and provide something closer to the modern APIs like Vulkan and Metal. But this requires building your application in a way that’s more suitable to modern graphics APIs: thinking about things in terms of buffers, pipelines and draw calls. The one large refactor I allowed myself in the project was porting the whole platform to use my WebGPU-esque API, which I currently run on top of WebGL 2. Though, as an example of how exhausting that refactor was; even though I designed it so that I could work on all the games at separate times, with both the legacy and modern codepaths existing live on the site for months, it took me over 5 months to port all the different games, and I also had to remove zelview.js as a casualty. These things are exhausting, and I didn’t have the energy to push it through. I eventually was able to build on my N64 experience when it came time to build Banjo-Kazooie, and made something much better the second time around.

I ended up with a fairly unique renderer design I’m happy about — it’s similar to recent refactors like the one that showed up in Unreal 4.22. Perhaps in the future, I can make a post about modern game engine renderers, the different passes in play, and how they relate to the underlying platform layer, for those interested in writing your own.

The future

My main side project a few years ago was Xplain, an interactive series about the X Windowing System and its role in graphics history. I’m no longer working on Xplain. Some say I have a talent for clear, explanatative writing, but it takes me a long time to craft a story, lay out all the pieces, and structure it in a way to build intuition by layering pieces. I tried to reboot it a few years ago by reframing it as no longer about X11, hoping that would get me in the mood to write again, but no. It’s attached to a legacy I’m not particularly interested in exploring or documenting any further, and the site’s style and framework are not useful. I’m still interested in writing explanatory writing, but it will be here, like my post on Super Mario Sunshine’s water.

noclip is not my day job, it is my side project. It is a labor of love for me. It’s a website I have enjoyed working on for the past 6 years of my life. I don’t know how much longer it’ll continue beyond that — I’m feeling a bit burned out because of the large amount of tech support, bills to pay, and general… expectations? From people? They want all the games. I get it, I really do. I am ecstatic to hear that people like my work and my tool enough to know that they want their game in there too. But at some point, I’m going to need a bit of a break. To help continue and foster research, I decided to fund it myself for a game I’ve long wanted on the site while I take it a bit easier. If this is succesful, I plan to do more of it. I hope to build a community of contributors to the project. Having a UI designer or frontend developer would be nice. If you want to contribute, please join the Discord! Or, at least show your appreciation in the comments. For more interesting video game snippets, I post pretty frequently on my Twitter now. If you made it all the way to the bottom here, thank you. I hope it was at least interesting.

Deconstructing the water effect in Super Mario Sunshine

Note: The demos below require WebGL2 support. If you are running a browser without WebGL 2 support, user “petercooper” on Hacker News has helpfully recorded a video and GIFs and for me.

One of my hobbies is writing model viewers and graphics toys for games. It’s a good mix of my interests in graphics and rendering, in reverse engineering complex engines, and nostalgia for old video games.

I recently extended my WebGL-based game model viewer to add support for some of Nintendo’s GameCube games, including The Legend of Zelda: The Wind Waker and Super Mario Sunshine. The GameCube, for those unaware, had a novel, almost-programmable, but fixed-function GPU. Instead of developers writing shaders, they programmed in a set of texture combiners similar to the methods used in glTexEnv pipelines, but taken to the extreme. For those used to modern programmable GPUs, it can be quite the mindbending experience to think that complex effects can be done with this thing. And yet, 2002 saw the release of Super Mario Sunshine with some really good looking water for its time. Replicated in WebGL below:

This water effect is loaded into your browser directly from the original game files for Delfino Plaza and placed onto a plane. Let’s take a deeper dive into how this was done, shall we?

Texturing the plane

Believe it or not, the effect actually starts out like this:

The effect can be seen as a fairly complex variant on something super old: “texture scrolling”. It’s a bit more complicated than what’s displayed here, but the fundamentals remain the same. Our plane starts life as this scrolling wave-y texture, which provides us some interesting noise to work with. This is then combined with a second layer of the same texture, but this time only scrolling in one dimension.

This gives us an interesting moire pattern which is the basis for how the water appears to bubble and shift around so naturally. You might even notice some “ghostly”-looking alignment when the two textures meet up. This artifact is visible in the original material, but it appears more like an intentional sunbeam or a ray of light scanning across the water. Hiding artifacts like this with clever material design is a large part of game graphics techniques.

Obviously, the texture isn’t black. Instead of the colors being black and white, they’re blended in with the background, giving us something that looks more transparent.

Now we’re getting somewhere. At this point, the second texture layer is also added in twice as much as the first, which makes it looks especially bright, almost “blooming”. This feature will come in handy later to define highlights in our water.

Going back to the original material, it’s a lot more “dynamic”. That is, as we move the camera around, zoom in and out, the texture seems to morph with it. It’s clear when it’s near us, and also fades out in the distance. Now, in a traditional fixed-function pipeline, this sort of effect is impossible. There’s no possible way this material can know the distance from the camera! However, Nintendo uses a clever abuse of a more traditional feature to implement this sort of thing. Let’s talk about what I like to call “mip trick”.

Building a square mip out of a round hole

Mip-mapping is a traditional graphics optimization. You see, when GPUs apply textures, they want the resulting image to be as smooth as possible, and they want to to be as fast as possible. The texture we’re sampling from here is actually only 64×64 pixels in size (yes, it’s true!), and our browser windows tend to be a lot bigger than that. If you zoom in, especially in our last demo, you can “see the pixels”, and also how they blend together and fade in and out, but keep in mind that GPUs have to compute that for every pixel in the resulting image. Looking from above, in this case the texture is magnified, but when looking at it at an angle, as the plane becomes more squashed from perspective in the distance, and the texture on the screen drops to less than 64×64 in size.

When this happens, the texture is said to be “minified”, and the GPU has to read a lot more pixels in our texture to make the resulting image smooth. This is expensive — the GPU wants to read as few pixels as possible. For this reason, we invented “mip-maps”, which are precomputed smaller versions of each image. The GPU can use these images instead when the texture is minified. So, we have 32×32 versions of our texture, and 16×16 versions of our texture, and the GPU can select which one it wants, and even blend across two versions to get the best image quality. Mipmaps are an excellent example of a time/space tradeoff, and an example of build-time content optimizations.

However, you might have noticed, “as the texture becomes minified”. That happens when it becomes smaller on the screen, which… tends to happen when the texture is… farther away. Are you picking up on the hint here? It’s a way to pick out distance from the camera!

What if, instead of using smaller versions of the same texture, we instead use different textures? Nintendo had the same idea. This is what I call the “mip trick”. The wave texture I showed you above isn’t the full story. In practice, here’s the full wave texture, with all of its mipmap levels shown.

In the largest mipmap level (where the texture is closest to the camera), we don’t have any pixels. This basically removes the water effect in a small radius around the camera — letting the water be clear. This both prevents the water material from getting too repetitive, and also helps during gameplay by showing the player the stuff underwater that is closest to them. Clever! The second mipmap level is actually the texture I’ve been using in the demo up until now, and is “medium-strength”.

The third mipmap level is the brightest, which corresponds to that “band” of bright shininess in the middle. This band, I believe, is a clever way of faking enviornment reflections. At that camera distance, you can imagine we’d mostly being the reflection from our skybox when at a 20 degree angle looking into the water, like our clouds. In Sirena Beach, this band is tinted yellow to give the level a beautiful yellow glow that matches the evening sunset.

Let’s try uploading all of these mipmaps now into our demo.

That’s getting us a lot closer! We’re almost there.

As a quick aside, since the algorithm choosing of which mipmap to use for the texture is hardcoded into the GPU, it does mean it isn’t necessarily portable. The GameCube renders in a resolution 640×548, and the mipmaps here are designed for that size. The Dolphin developers noticed this as well — since Dolphin can render in higher resolutions than what the GameCube can handle, this trick can break unless you are careful about it. Thankfully, modern graphics APIs have ways of applying a bias to the mipmap selection. Using your screen resolution and the knowledge of the original 640×548 design of the GameCube, we can calculate this bias and then use that while sampling.

With that out of the way, it’s time for the final touch. Again, believe it or not, there’s only one thing left to turn our last demo into the final product. A simple function (known as the alpha test) tests “how bright” the resulting pixel is, and if it’s between a certain threshold, the pixel is kicked out entirely. In our case, any pixels between 0.13 and 0.92 are simply dropped on the floor.

This gives us the unique “seran wrap” look for the outer bands of the effect, and in the middle, the water is mostly composed of these brighter pixels, and so the higher threshold lets only the really bright pixels shine through, giving us that empty band and those wonderful highlights!

Forgotten Lore

In today’s days of programmable shaders, PBR pipelines, and increased art budgets, these tricks are becoming more and more like forgotten knowledge. Nintendo’s GameCube-era games have, in my admittedly-biased mind, some of the best artwork done in this era. Even though I mentioned “GameCube”, the Wii was effectively the same hardware, and so these same tricks can be found in the Mario Galaxy games, Super Smash Bros. Brawl, and even The Legend of Zelda: Skyward Sword. It’s impressive that GPU technology from 2001 carried Nintendo all the way through 2012, when the Wii U was released.

Good art direction, a liberal amount of creative design, and intricate knowledge of the hardware can make for some fantastic effects under such constraints. For more fun, try figuring out the glass pane effects in Delfino Hotel or the improvements upon the technique used in Super Mario Galaxy.

The code used for every one of these demos is open-source and available on GitHub. Obviously, all credits for the original artwork go to the incredibly talented artists at Nintendo. Huge special thanks to all of my friends working on the Dolphin team.


This post is different from my usual material. Despite the name, I’m not going to talk about actual coding all that much. This post might be classified under “lament”, or maybe “rant”. I talk about problems, reflect on them, and ultimately offer no solutions. As always, opinions are entirely my own, but are definitely influenced by my employer, my friends, my social status, and whatever ad campaign I saw last week, because that’s how opinions work. Please enjoy.

In May of 2016, a small section of the internet was chasing after a mystery. Someone noticed a mysterious symbol had appeared in two different games, of an eye inside a hand. Both of them had been there, lying in plain sight, for half a year. It’s known as an “Alternate Reality Game”, or “ARG”. A sort of invented Da Vinci’s Code, where mysteries and puzzles unlock clues, blurring the lines between fiction and reality. The game usually ends in a marketing message for something else, nothing more than a “be sure to drink your ovaltine”. The allure of a random symbol being placed in a bunch of games, in secret, and having such a long time before being noticed at all is really cool. So, once discovered, off the “game detectives” went, cracking the code and solving the puzzles that lay before them.

Most games were beaten quickly, by simply cracking open the .exe files and the game’s data, often long before the “proper” method of solving it was done. With the exception of one game. It was the earliest of these symbols placed, in fact. The online game Kingdom of Loathing added the symbol in late 2014. It was the last puzzle in the ARG to be solved. Nobody could crack the code, through datamining or otherwise. The correct answer involved noticing certain items in the game could spell out a secret code: “nlry9htdotgif”. It referred to a file on their servers.

Before the community managed to figure it out, the developers hinted at the solution through their podcast. Their choice of words was, to my ears at the time, interesting.

The other games that are involved in this ARG, almost all of them, the thing that people were looking for, just got datamined out of them because they were just Steam games, and we had the advantage of like, well, “this is a web game, so we have always online DRM!” that makes it so you actually have to solve the puzzle.

I don’t use Spotify. I download MP3 files and buy the albums. I wasn’t always like this. I was super excited when Spotify first came to America. I signed up, explored a lot of music, and found an artist I really enjoyed. The next week, the artist was wiped from the service. I canceled my subscription. Paradoxically, as Netflix and Spotify and Steam grow in popularity, there’s less and less content on it. Artist rates are declining, and everybody wants the 30% cut that the platform owners take. Everybody’s launching their own streaming service, and so, this month, It’s Always Sunny in Philadelphia is leaving Netflix. FOX doesn’t need Netflix anymore, since they have Hulu, and they want your money through Hulu Plus. Want to watch Game of Thrones? HBO NOW will cost you $14.99. Crunchyroll, $11.95. YouTube Red, $9.99. Twitch Prime, $10.99. The dangers of a la carte cable TV seem very real.

Several of my more tech-savvy friends are with me. The guys that waited in line for the first iPhone, and were using Netflix when it sent you DVDs through the mail. There’s a gap in their Blu-Ray collection, starting around 2008. But last year, they’re starting to buy things again. It’s nice to actually own media that won’t expire. Yes, it has DRM — the shitty, encrypted kind. But it doesn’t have web DRM. The disc won’t physically expire because the servers don’t want to send you the file anymore. Programmers can always crack the encryption keys with enough exerted effort. While everybody was afraid of Encrypted Media Extensions in the web browser, Netflix and Spotify were off building something far more ridiculous. Cracking an RSA key feels a lot less intimidating to me now.

Netflix is choosing to continue House of Cards without Kevin Spacey. However, it feels entirely plausible that after the massive wave of recent sexual assault scandals in Hollywood, Netflix might reverse its course and delete the show from their servers forever. It’s now forever “out of print.” After all, the Cosby 77 special was never released. This isn’t a new problem: a lot of TV shows have never seen the light of day after their original broadcast date, except maybe on giant tape reels in old storage rooms somewhere. Every old TV show famously has “the lost episode”. Plenty of old movies are missing forever. But those feel to me like matters of negligent archiving. Netflix scorching an entire show, perhaps even because of public pressure from us, the people, feels a lot more deliberate. And maybe you’re OK with that. Separation of the artist and the work is something that’s becoming more and more difficult to grapple with in today’s society, and perhaps we should just light everything by Bill Cosby and Kevin Spacey up in flames. But the only place left to find anything lost through that will be on the hard drives of people that torrented it.

It feels entirely plausible that after sexual assault allegations about Kevin Spacey, House of Cards might just disappear from the world entirely. Netflix pulls the video files from their app, and that’s that.

And of course I can’t write about this without mentioning subscription software. As we transition from desktop software to web services, it’s very rare to find a “pay-once” kind of deal like you used to. Adobe’s Creative Cloud started that trend by pushing their entire suite of apps, including Photoshop, to a monthly subscription, and it was quickly followed up by Autodesk and QuickBooks. If you cancel your subscription, you lose the ability to use the apps entirely. Web DRM was so successful that we’re now using it for standard industry tools.

Gadgets are having the same issues. Companies releasing internet-enabled devices rarely think about the longevity of any of it. Logitech had no empathy for bricking customers’ devices until they were called out. And Sony TVs from five ago can’t run the YouTube app; Google broke their devices. YouTube doesn’t need Sony. It’s more effective for them to move fast and break things, leaving a pile of consumer angst in the wake.

There’s a common saying: “nothing ever gets lost on the internet”. Digital culture is supposed to be the prime time for extremely nitpicky nerds. Everything is recorded, analyzed, copied. As storage, hosting, bandwidth costs go down, more and more things are supposed to be preserved. But this couldn’t be further from the truth. The fundamental idea of the web is that anything can link to anything — people can explore and share and copy with nothing but a URL. But the average “half-life” of a link is two years. This post has 49 links. If you’re reading this in 2019, it’s likely only around 24 of them will actually point where I wanted them to point.

“How much knowledge has been lost because it only exists in a now-reaped imageshack upload embedded in a forum post?”. By 2019, I expect this user’s Twitter profile to have gone private, or deleted entirely, or Twitter changing their URL structure and breaking links everywhere.

Publishing a movie on YouTube is no longer as expensive as publishing a DVD in your local FYE. Costs have gone down. This has enabled an explosive level of amazing creativity and enabled so many projects and endeavors it hasn’t before. Being a musician doesn’t require signing to a label. Upload anything to SoundCloud, YouTube, and Bandcamp and you’re now a musician. Web 2.0, as corny as the term is, is primarily about so-called “user-generated content”.

As a creator, this can be a blessing and a curse. I probably wouldn’t have had a voice 30 years ago, since I barely have anything interesting or original to say. Today, I have a voice, but so do 20,000 other people. Some say we’re in an attention economy: that there’s so much being created, that people are overwhelmed. Yes, there’s now 20,000 more musicians, but the number of people listening stays the same. Your struggle isn’t necessarily to be heard, it’s to be heard for more than five seconds. Google Analytics tells me that the average time reading any one of my posts, the so-called “time on page”, is 37 seconds. 90% of my readers have clicked Back in their browser long before reading this sentence.

I don’t believe in Idiocracy. The population isn’t getting dumber. The population’s IQ (whatever you think about it as a metric for measuring intelligence), has been going up. Plenty of people are still reading and learning — Wikipedia is the fifth most popular site in the world, after all.

What I believe is happening is that our reading is getting less expensive. All of the links I’ve posted here are to free sources, except for one. Do you have a Wall Street Journal account? I don’t. I used one weird trick to bypass it. It’s horrible, and I don’t like that I did it. As a society, we’re not paying for the things we used to. Stuff we totally should be paying for. Prices for entertainment, for news, for media, have nosedived in the past 20 years. Why pay for the Wall Street Journal when someone from Bloomberg or the Huffington Post will summarize the article for me, for free?

Some people are disappointed by the fact BuzzFeed now has a seat at the White House. But perhaps BuzzFeed’s more attention-grabby parts are simply the price we pay to fund its Pulitzer-prize winning journalism.

30 years ago, this article might have been published as an article in a newspaper, its grammar and style thoroughly edited by someone whose job it was to do nothing but that, and we’d both get paid for it. Today, this blog costs me money to host and I don’t make any money from it. Music albums that used to cost $20 now cost $5.99. But in terms of large-scale productions, they cost more than ever. TV shows take millions more than they once did to make: as expectations and fidelity go up, so do production costs. Sets, props, and visual effects need to be crafted more carefully than ever to appeal to high definition TV screens. Gamers seeking thrills demand higher frame rates, bigger polygons, and more pixels. YouTube beats this by offering lower-budget productions. iOS beats this by offering cheaper, “indie” titles.

I now work for a company that makes mobile applications. The price of a mobile application is $0.99. And you can still expect 90% of Android users to pirate it. This is, to say the least, unsustainable. Mobile games need to make money not from app sales, but from in-app purchases fueled by psychology.

Nintendo, the top dog of “triple-A” video games studios, was recently skewered by investors for daring to release a mobile game featuring Mario… for $10. It did not meet their sales predictions. Their newest mobile game, which is free-to-play and features in-app purchases, seems to be fairing a bit better.

On closer inspection though, there’s something funky about those numbers.

Atul Goyal, a senior analyst at Jefferies, told CNBC’s “Squawkbox” that he expected 500 million downloads of the Super Mario Run app on the Apple app store by March 2017.”

But according to analyst Tom Long of BMO Capital Markets, there are 715 million iPhones in use. That gives us two answers: either Tom Long is wrong, or Atul Goyal is. Two out of every three iPhone users is an unreasonable target for a Nintendo game.

A total of 1 billion downloads of the app are expected across operating systems, he added.

I don’t claim to be a senior analyst. But I also don’t claim that 13% of the world’s population will have downloaded a Mario game. This feels to me like an unrealistic growth target. As people pay less and less individually for games, you need to make things up in volume.

The low cost of production, the low cost of consumption, the attention economy, web DRM aren’t new ideas or new problems. We’re going to need to find a way out of this. published a fairly influential article (warning: might be unsuitable for work) on this subject back in 2010. David Wong’s term is “Forced ARTifical Scarcity” (“FARTS” for short. Har har. The article did come out in 2010, after all). His main argument is that we’ve switched mediums: things that were previously paid for by the cost of shipping a physical disk or pieces of paper are now effectively free. Business models built on ratios of supply and demand failed to take into account what would happen when supply is now infinite.

But there’s a crucial mistake hiding in there.

Remember the debut of Sony’s futuristic Matrix-style virtual world, PlayStation Home? There was a striking moment when the guys at Penny Arcade logged in and found themselves in a virtual bowling alley… standing in line. Waiting for a lane to open up. In a virtual world where the bowling alley didn’t actually exist. It’s all just ones and zeros on a server–the bowling lanes should be effectively infinite, but where there should have been thousands of lanes for anybody who wanted one, there was only FARTS.

Servers aren’t free, David. They’re physical things, hooked into a physical wire. They only have so much power and so much capacity. They go down, they overheat, they break, just like any other machine. There’s electricity to pay. This scarcity might be forced, but probably isn’t. Left to their own devices, people will hack and cheat. A badly programmed server might allow you to bowl on someone else’s lane. The same ingenuity that cracks open DRM also shatters fair play. Fixing bugs, applying security updates all take programmers, and money.

The servers go down when the money coming in doesn’t match the money going out.

People tend to think the internet is free and fair, but it’s anything but. I’m not talking simply about net neutrality rules, which do worry me, but about peering and transit. In 2014, this culminated in a public explosion between Netflix, Cogent, and Verizon, and the details are a lot more interesting and subtle than originally meet the eye. Bandwidth is expensive and there are unwritten, long-standing de facto rules about who pays for it. Fiber optic cable is expensive and fragile, costing upwards of $80,000 per mile. The hacker community can dream of a free internet, but unless someone eats that cost it’s not happening.

The Right to Read feels more and more realistic every day. It’s troubling. But I think the reason it feels realistic is because of everything I just described. When free digital copying upends 200 years of economic ideas and stability, the first impulse would be to stop it, or delay it until we can figure out what all of this means. DRM, to me, is an evil, but it’s a necessary and hopefully temporary one. It feels like there’s a growing deluge of water held back by a rickety dam. The people with the money go and rebuild it every 5 years, but it’s not going to hold that much longer. The pressure keeps building until the DRM can’t sustain the raw torrent of mayhem that will break it open. You’re now flooded and half the world’s underwater. Better hope you have a boat.

No, I don’t know what the boat is in this metaphor either.

People look to crowdfunding as a way to solve these problems, but I think people massively underestimate how much money at a raw level it takes to build an actual production. Kickstarter’s own lists of the most funded projects lists three campaigns for the Pebble watch, a company that got bought out by Fitbit this year after running out of money, the COOLEST COOLER, which appears to have gone south, and the OUYA, a games console which is probably best described to a link to the Crappy Games Wiki. OUYA, Inc. was later bought out by RAZER after, well, running out of money. Even the $8 million raised through Kickstarter had to be followed up with $25 million more dollars of private investor money.

$8 million might seem like a lot of money, but it quickly dries up when running an actual production. Next time you see a movie, or play a game, stare closely at the credits. Think about each one of those people there, their salary, and how much they worked on the final product. And then think about the countless uncredited cast and crew, and subcontractors of subcontractors who barely get so much as a Special Thanks.

Upload anything to SoundCloud, YouTube, and Bandcamp and you’re now a musician.

Funny story, that. SoundCloud takes servers and electricity, too. SoundCloud almost went out of business this year, but it was kept alive by investors trying to save the company. In two years, SoundCloud will likely die, because it couldn’t make money to keep the servers running. Or maybe it will get bought by Google as part of an “acqui-hire”. Your prize is your songs, your followers, your playlists all go away, replaced with an email thanking you for taking part in their incredible journey.

Apple’s iTunes Music Store, according to rumors, likely won’t be a music store in the near future. Even Spotify… let me repeat that, Spotify, everyone’s darling music service, can’t figure out how to make money. Hell, YouTube still isn’t profitable, but Google runs it at a loss anyway. The hope is eventually it will pay off.

Bandcamp, which offers premium album downloads and DRM-free content, is profitable.

Perhaps Web DRM isn’t as lucrative as we thought.


If you asked software engineers some of their “least hated” things, you’ll likely hear both UTF-8 and TCP. TCP, despite being 35 years old, is rock-solid, stable infrastructure that we take for granted today; it’s hard to sometimes realize that TCP was man-made, given how well it’s served us. But within every single TCP packet lies a widely misunderstood, esoteric secret.

Look at any diagram or breakdown of the TCP segment header and you’ll notice a 16-bit field called the “Urgent Pointer”. These 16 bits exist in every TCP packet ever sent, but as far as I’m aware, no piece of software understands them correctly.

This widely misunderstood field has caused security issues in multiple products. As far as I’m aware, there is no fully correct documentation on what this field is actually supposed to do. The original RFC 793 actually contradicts itself on the field’s exact value. RFC 1011 and RFC 1122 try to correct the record, but from my reading of the specifications, they seem to also describe the field incorrectly.

What is, exactly, the TCP URG flag? First, let’s try to refer to what RFC 793, the document describing TCP, actually says.

… TCP also provides a means to communicate to the receiver of data that at some point further along in the data stream than the receiver is currently reading there is urgent data. TCP does not attempt to define what the user specifically does upon being notified of pending urgent data, but the general notion is that the receiving process will take action to process the urgent data quickly.

The objective of the TCP urgent mechanism is to allow the sending user to stimulate the receiving user to accept some urgent data and to permit the receiving TCP to indicate to the receiving user when all the currently known urgent data has been received by the user.

From this description, it seems like the idea behind the urgent flag is to send some message, some set of bytes as “urgent data”, and allow the application to know “hey, someone has sent you urgent data”. Perhaps, you might even imagine, it makes sense for the application to read this “urgent data packet” first, as an out-of-band message.

But! TCP is designed to give you two continuous streams of bytes between computers. TCP, at the application layer, has no concept of datagrams or packetized messages in that stream. If there’s no “end of message”, it doesn’t make sense to define the URG packet to be different. This is what the 16-bit Urgent Pointer is used for. The 16-bit Urgent Pointer specifies a future location in the stream where the urgent data ends:

This mechanism permits a point in the data stream to be designated as the end of urgent information.

Wait. Where the urgent data ends? Then where does it begin? Most early operating systems assumed that this implied that there was one byte of urgent data located at the Urgent Pointer, and allowed clients to read it independently of the actual stream of data. This is the history and rationale behind the flag MSG_OOB, part of the Berkley Sockets API. When sending data through a TCP socket, the MSG_OOB flag sets the URG flag and points the Urgent Pointer at the last byte in the buffer. When a packed is received with the URG flag, the kernel buffers and stores the byte at that location. It also signals the receiving process that there is urgent data available with SIGURG. When receiving data with recv(), you can pass MSG_OOB to receive this single byte of otherwise inaccessible out-of-band data. During a normal recv(), this byte is effectively removed from the stream.

This interpretation, despite being used by glibc and even Wikipedia, is wrong based on my reading of the TCP spec. When taking into account the “neverending streams” nature of TCP, a more careful, subtle, and intentional meaning behind these paragraphs is revealed. One made clearer by the next sentence:

Whenever this point is in advance of the receive sequence number (RCV.NXT) at the receiving TCP, that TCP must tell the user to go into “urgent mode”; when the receive sequence number catches up to the urgent pointer, the TCP must tell user to go into “normal mode”…

Confusing vocabulary choices such as “urgent data” implies that there is actual data explicitly tagged as urgent, but this isn’t the case. When a TCP packet is received with an URG flag, all data currently in the socket is now “urgent data”, up until the end pointer. The urgent data waiting for you up ahead isn’t marked explicitly and available out-of-band, it’s just somewhere up ahead and if you parse all the data in the stream super quickly you’ll eventually find it. If you want an explicit marker for what the urgent data actually is, you have to put it in the stream yourself — the notification is just telling you there’s something waiting up ahead.

Put another way, urgency is an attribute of the TCP socket itself, not of a piece of data within that stream.

Unfortunately, several foundational internet protocols, like Telnet, are fooled by this misunderstanding. In Telnet, the idea is that if you have a large amount of data waiting in the buffer for a “runaway process”, it’s hard for your commands to make it through. From the Telnet specification:

To counter this problem, the TELNET “Synch” mechanism is introduced. A Synch signal consists of a TCP Urgent notification, coupled with the TELNET command DATA MARK. The Urgent notification, which is not subject to the flow control pertaining to the TELNET connection, is used to invoke special handling of the data stream by the process which receives it…

… The Synch is sent via the TCP send operation with the Urgent flag set and the [Data Mark] as the last (or only) data octet.

In a TCP world, this idea of course makes no sense. There’s no “last data octet” in a TCP stream, because the stream is continuous and goes on forever.

How did everyone get confused and start misunderstanding the TCP urgent mechanism? My best guess is that the broken behavior is actually more useful than the one suggested by TCP. Even a single octet of out-of-band data can actually signal quite a lot, and it can be more helpful than some “turbo mode” suggestion. Additionally, despite the availability of POSIX functionality like SO_OOBINLINE and sockatmark, there remains no way to reliably test whether the TCP socket is in “urgent mode”, as far as I’m aware. The Berkley sockets API started this misunderstanding and provides no easy way to get the correct behavior.

It’s incredible to think that 35 years of rock-solid protocol has had such an amazing mistake baked into it. You can probably count the number of total TCP packets sent in the trillions, if not more, yet 16 bits are dedicated to a field that nothing more than a handful of software has ever sent.

I don’t know who the Web Audio API is designed for

WebGL is, all things considered, a pretty decent API. It’s not a great API, but that’s just because OpenGL is also not a great API. It gives you raw access to the GPU and is pretty low-level. For those intimidated by something so low-level, there are quite a few higher-level engines like three.js and Unity which are easier to work with. It’s a good API with a tremendous amount of power, and it’s the best portable abstraction we have for a good way to work with the GPU on the web.

HTML5 Canvas is, all things considered, a pretty decent API. It has plenty of warts: lack of colorspace, you can’t directly draw DOM elements to a canvas without awkwardly porting it to an SVG, blurs are strangely hidden from the user into a “shadows” API, and a few other things. But it’s honestly a good abstraction for drawing 2D shapes.

Web Audio, conversely, is an API I do not understand. The scope of Web Audio is hopelessly huge, with features I can’t imagine anybody using, core abstractions that are hopelessly expensive, and basic functionality basically missing. To quote the specification itself: “It is a goal of this specification to include the capabilities found in modern game audio engines as well as some of the mixing, processing, and filtering tasks that are found in modern desktop audio production applications.”

I can’t imagine any game engine or music production app that would want to use any of the advanced features of Web Audio. Something like the DynamicsCompressorNode is practically a joke: basic features from a real compressor are basically missing, and the behavior that is there is underspecified such that I can’t even trust it to sound correct between browsers. More than likely, such filters would be written using asm.js or WebAssembly, or ran as Web Workers due to the rather stateless, input/output nature of DSPs. Math and tight loops like this aren’t hard, and they aren’t rocket science. It’s the only way to ensure correct behavior.

For people that do want to do such things: compute our audio samples and then play it back, well, the APIs make it near impossible to do it in any performant way.

For those new to audio programming, with a traditional sound API, you have a buffer full of samples. The hardware speaker runs through these samples. When the API thinks it is about to run out, it goes to the program and asks for more. This is normally done through a data structure called a “ring buffer” where we have the speakers “chase” the samples the app is writing into the buffer. The gap between the “read pointer” and the “write pointer” speakers is important: too small and the speakers will run out if the system is overloaded, causing crackles and other artifacts, and too high and there’s a noticeable lag in the audio.

There’s also some details like how many of these samples we have per second, or the “sample rate”. These days, there are two commonly used sample rates: 48000Hz, in use by most systems these days, and 44100Hz, which, while a bit of a strange number, rose in popularity due to its use in CD Audio (why 44100Hz for CDDA? Because Sony, one of the organizations involved with the CD, cribbed CDDA from an earlier digital audio project it had lying around, the U-matic tape). It’s common to see the operating system have to convert to a different sample rate, or “resample” audio, at runtime.

Here’s an example of a theoretical, non-Web Audio API, to compute and play a 440Hz sine wave.

const frequency = 440; // 440Hz A note.
 // 1 channel (mono), 44100Hz sample rate
const stream =, 44100);
stream.onfillsamples = function(samples) {
    // The stream needs more samples!
    const startTime = stream.currentTime; // Time in seconds.
    for (var i = 0; i < samples.length; i++) {
        const t = startTime + (i / stream.sampleRate);
        // samples is an Int16Array
        samples[i] = Math.sin(t * frequency) * 0x7FFF;

The above, however, is nearly impossible in the Web Audio API. Here is the closest equivalent I can make.

const frequency = 440;
const ctx = new AudioContext();
// Buffer size of 4096, 0 input channels, 1 output channel.
const scriptProcessorNode = ctx.createScriptProcessorNode(4096, 0, 1);
scriptProcessorNode.onaudioprocess = function(event) {
    const startTime = ctx.currentTime;
    const samples = event.outputBuffer.getChannelData(0);
    for (var i = 0; i < 4096; i++) {
        const t = startTime + (i / ctx.sampleRate);
        // samples is a Float32Array
        samples[i] = Math.sin(t * frequency);
// Route it to the main output.

Seems similar enough, but there are some important distinctions. First, well, this is deprecated. Yep. ScriptProcessorNode has been deprecated in favor of Audio Workers since 2014. Audio Workers, by the way, don’t exist. Before they were ever implemented in any browser, they were replaced by the AudioWorklet API, which doesn’t have any implementation in browsers.

Second, the sample rate is global for the entire context. There is no way to get the browser to resample dynamically generated audio. Despite the browser requiring having fast resample code in C++, this isn’t exposed to the user of ScriptProcessorNode. The sample rate of an AudioContext isn’t defined to be 44100Hz or 48000Hz either, by the way. It’s dependent on not just the browser, but also the operating system and hardware of the device. Connecting to Bluetooth headphones can cause the sample rate of an AudioContext to change, without warning.

So ScriptProcessorNode is a no go. There is, however, an API that lets us provide a differently sampled buffer and have the Web Audio API play it. This, however, isn’t a “pull” approach where the browser fetches samples every once in a while, it’s instead a “push” approach where we play a new buffer of audio every so often. This is known as BufferSourceNode, and it’s what emscripten’s SDL port uses to play audio. (they used to use ScriptProcessorNode but then removed it because it didn’t work good, consistently)

Let’s try using BufferSourceNode to play our sine wave:

const frequency = 440;
const ctx = new AudioContext();
let playTime = ctx.currentTime;
function pumpAudio() {
    // The rough idea here is that we buffer audio roughly a
    // second ahead of schedule and rely on AudioContext's
    // internal timekeeping to keep it gapless. playTime is
    // the time in seconds that our stream is currently
    // buffered to.

    // Buffer up audio for roughly a second in advance.
    while (playTime - ctx.currentTime < 1) {
        // 1 channel, buffer size of 4096, at
        // a 48KHz sampling rate.
        const buffer = ctx.createBuffer(1, 4096, 48000);
        const samples = buffer.getChannelData(0);
        for (let i = 0; i < 4096; i++) {
            const t = playTime + Math.sin(i / 48000);
            samples[i] = Math.sin(t * frequency);

        // Play the buffer at some time in the future.
        const bsn = ctx.createBufferSource();
        bsn.buffer = buffer;
        // When a buffer is done playing, try to queue up
        // some more audio.
        bsn.onended = function() {
        // Advance our expected time.
        // (samples) / (samples per second) = seconds
        playTime += 4096 / 48000;

There’s a few… unfortunate things here. First, we’re basically relying on floating point timekeeping in seconds to keep our playback times consistent and gapless. There is no way to reset an AudioContext’s currentTime short of constructing a new one, so if someone wanted to build a professional Digital Audio Workstation that was alive for days, precision loss from floating point would become a big issue.

Second, and this was also an issue with ScriptProcessorNode, the samples array is full of floats. This is a minor point, but forcing everybody to work with floats is going to be slow. 16 bits is enough for everybody and for an output format it’s more than enough. Integer Arithmetic Units are very fast workers and there’s no huge reason to shun them out of the equation. You can always have code convert from a float to an int16 for the final output, but once something’s in a float, it’s going to be slow forever.

Third, and most importantly, we’re allocating two new objects per audio sample! Each buffer is roughly 85 milliseconds long, so every 85 milliseconds we are allocating two new GC’d objects. This could be mitigated if we could use an existing, large ArrayBuffer that we slice, but we can’t provide our own ArrayBuffer: createBuffer creates one for us, for each channel we request. You might imagine you can createBuffer with a very large size and play only small slices in the BufferSourceNode, but there’s no way to slice an AudioBuffer object, nor is there any way to specify an offset into the corresponding with a AudioBufferSourceNode.

You might imagine the best solution is to simply keep a pool of BufferSourceNode objects and recycle them after they are finished playing, but BufferSourceNode is designed to be a one-time-use-only, fire-and-forget API. The documentation helpfully states that they are “cheap to create” and they “will automatically be garbage-collected at an appropriate time”.

I know I’m fighting an uphill battle here, but a GC is not what we need during realtime audio playback.

Keeping a pool of AudioBuffers seems to work, though in my own test app I still see slow growth to 12MB over time before a major GC wipes, according to the Chrome profiler.

What makes this so much more ironic is that a very similar API was proposed by Mozilla, called the Audio Data API. It’s three functions: setup(), currentSampleOffset(), and writeAudio(). It’s still a push API, not a pull API, but it’s very simple to use, supports resampling at runtime, doesn’t require you to break things up into GC’d buffers, and doesn’t have any.

Specifications and libraries can’t be created in a vacuum. If we instead got the simplest possible interface out there and let people play with it, and then took some of the more slow bits people were implementing in JavaScript (resampling, FFT) and put them in C++, I’m sure we’d see a lot more growth and usage than what we do today. And we’d have actual users for this API, and real-world feedback from users using it in production. But instead, the biggest user of Web Audio right now appears to be emscripten, who obviously won’t care much for any of the graph routing nonsense, and already attempts to work around the horrible APIs themselves.

Can the ridiculous overeagerness of Web Audio be reversed? Can we bring back a simple “play audio” API and bring back the performance gains once we see what happens in the wild? I don’t know, I’m not on these committees, I don’t even work in web development other than fooling around on nights and weekends, and I certainly don’t have the time or patience to follow something like this through.

But I would really, really like to see it happen.

Introduction to HTML Components

HTML Components (HTC), introduced in Internet Explorer 5.5, offers a powerful new way to author interactive Web pages. Using standard DHTML, JScript and CSS knowledge, you can define custom behaviors on elements using the “behavior” attribute. Let’s create a behavior for a simple kind of “image roll-over” effect. For instance, save the following as “”:

<PUBLIC:ATTACH EVENT="onmouseover" ONEVENT="rollon()" />
<PUBLIC:ATTACH EVENT="onmouseout" ONEVENT="rollout()" />
tmpsrc = element.src;
function rollon() {
    element.src = tmpsrc + "_rollon.gif"
function rollout() {
    element.src = tmpsrc + ".gif";

This creates a simple HTML Component Behavior that swaps the image’s source when the user rolls over and rolls off of the mentioned image. You can “attach” such a behavior to any element using the CSS attribute, “behavior”.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<IMG STYLE="behavior: url(" SRC="logo">

The benefit of HTML Components is that we can apply them to any element through simple CSS selectors. For instance:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
.RollImg {
  behavior: url(;
<IMG CLASS="RollImg" SRC="logo">
<IMG CLASS="RollImg" SRC="home">
<IMG CLASS="RollImg" SRC="about">
<IMG CLASS="RollImg" SRC="contact">

This allows us to reuse them without having to copy/paste code. Wonderful! This is known as an Attached Behavior, since it is directly attached to an element. Once you’ve mastered these basic Attached Behaviors, we can move onto something a bit more fancy, Element Behaviors. With Element Behaviors, you can create custom element types and create custom programmable interfaces, allowing us to build a library of custom components, reusable between pages and projects. Like before, Element Behaviors consist of an HTML Component, but now we have to specify our component in <PUBLIC:COMPONENT>.

<PUBLIC:ATTACH EVENT="onmouseover" ONEVENT="rollon()" />
<PUBLIC:ATTACH EVENT="onmouseout" ONEVENT="rollout()" />
<img id="imgtag" />
img = document.all['imgtag'];
function rollon() {
    img.src = element.basesrc + "_rollon.gif";
function rollout() {
    img.src = element.basesrc + ".gif";

I’ll get to the implementation of ROLLIMG in a bit, but first, to use a custom element, we use the special <?IMPORT> tag which allows us to import a custom element into an XML namespace.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

The ROLLIMG fully encapsulates the behavior, freeing the user of having to “know” what kind of element to use the Attached Behavior on! The implementation of the Custom Element Behavior might seem a bit complex, but it’s quite simple. When Internet Explorer parses a Custom Element, it synchronously creates a new HTML Component from this “template” and binds it to the instance. We also have two “magic global variables” here: “element” and “document”. Each instance of this HTML Component gets its own document, the children of which are reflowed to go inside the custom element. “element” refers to the custom element tag in the outer document which embeds the custom element. Additionally, since each custom element has its own document root, that means that it has its own script context, and its own set of global variables.

We can also set up properties as an API for the document author to use when they use our custom element.

Here, we use an img tag as a “template” of sorts, add it to our custom element’s document root.

After IE puts it together, the combined DOM sort of looks like this:

    <IMG ID="imgtag" SRC="logo.gif">

    <IMG ID="imgtag" SRC="home.gif">


Unfortunately, this has one final flaw. Due to the natural cascading nature of CSS Stylesheets, such “implementation details” will leak through. For instance, if someone adds a <STYLE>IMG { background-color: red; }</STYLE>, this will affect our content. While this can sometimes be a good thing if you want to develop a styleable component, it often results in undesirable effects. Thankfully, Internet Explorer 5.5 adds a new feature, named “Viewlink”, which encapsulates not just the implementation of your HTML Component, but the document as well. “Viewlink” differs from a regular component in that instead of adding things as children of our element, we instead can provide a document fragment which the browser will “attach” to our custom element in a private, encapsulated manner. The simplest way to do this is to just use our HTML Component’s document root.

<PUBLIC:ATTACH EVENT="onmouseover" ONEVENT="rollon()" />
<PUBLIC:ATTACH EVENT="onmouseout" ONEVENT="rollout()" />
<img id="imgtag" />
defaults.viewLink = document;
var img = document.all['imgtag'];
function rollon() {
    img.src = element.basesrc + "_rollon.gif";
function rollout() {
    img.src = element.basesrc + ".gif";

Using the “defaults.viewLink” property, we can set our HTML Component’s private document fragment as our viewLink, rendering the children but without adding them as children of our element. Perfect encapsulation.

*cough* OK, obviously it’s 2017 and Internet Explorer 5.5 isn’t relevant anymore. But if you’re a Web developer, this should have given you some pause for thought. The modern Web Components pillars: Templates, Custom Elements, Shadow DOM, and Imports, were all features originally in IE5, released in 1999.

Now, it “looks outdated”: uppercase instead of lowercase tags, the “on”s everywhere in the event names, but that’s really just a slight change of accent. Shake off the initial feeling that it’s cruft, and the actual meat is all there, and it’s mostly the same. Sure, there’s magic XML tags instead of JavaScript APIs, and magic globals instead of callback functions, but that’s nothing more than a slight change of dialect. IE says tomato, Chrome says tomato.

Now, it’s likely you’ve never heard of HTML Components at all. And, perhaps shockingly, a quick search at the time of this article’s publishing shows nobody else does at all.

Why did IE5’s HTML Components never quite catch on? Despite what you might think, it’s not because of a lack of open standards. Reminder, a decent amount of the web API, today, started from Internet Explorer’s DHTML initiative. contenteditable, XMLHttpRequest, innerHTML were all carefully, meticulously reverse-engineered from Internet Explorer. Internet Explorer was the dominant platform for websites — practically nobody designed or even tested websites for Opera or Netscape. I can remember designing websites that used IE-specific features like DirectX filters to flip images horizontally, or the VML

And it’s not because of a lack of evangelism or documentation. Microsoft was trying to push DHTML and HTML Components hard. Despite the content being nearly 20 years old at this point, documentation on HTML Components and viewLink is surprisingly well-kept, with diagrams and images, sample links and all, archived without any broken links. Microsoft’s librarians deserve fantastic credit on that one.

For any browser or web developer, please go read the DHTML Dude columns. Take a look at the breadth of APIs available on display, and go look at some example components on display. Take a look at the persistence API, or dynamic expression properties. Besides the much-hyped-but-dated-in-retrospect XML data binding tech, it all seems relatively modern. Web fonts? IE4. CSS gradients? IE5.5. Vector graphics? VML (which, in my opinion, is a more sensible standard than SVG, but that’s for another day.)

So, again I ask, why did this never catch on? I’m sure there are a variety of complex factors, probably none of which are technical reasons. Despite our lists of “engineering best practices” and “blub paradoxes“, computer engineering has, and always will be, dominated by fads and marketing and corporate politics.

The more important question is a bigger one: Why am I the first one to point this out? Searching for “HTML Components” and “Viewlink” leads to very little discussion about them online, past roughly 2004. Microsoft surely must have been involved in the Web Components Working Group. Was this discussed at all?

Pop culture and fads pop in and fade out over the years. Just a few years ago, web communities were excited about Object.observe before React proved it unnecessary. Before node.js’s take on “isomorphic JavaScript” was solidified, heck, even before v8cgi / teajs, an early JavaScript-as-a-Server project, another bizarre web framework known as Aptana Jaxer was doing it in a much more direct way.

History is important. It’s easier to point and laugh and ignore outdated technology like Internet Explorer. But tech, so far, has an uncanny ability to keep repeating itself. How can we do a better job paying attention to things that happened before us, rather than assuming it was all bad?

New Xplain: Basic 2D Rasterization

Hi. I just published a new Xplain article about basic 2D rasterization. Since I left Endless and the Linux world behind, I haven’t felt as motivated to document the details of the X11 Window System, but I still feel very motivated to teach the basics and foundations of graphics and other systems. Xplain seems to be my place for interactive demo explanations, so on there it goes.

Take care.

Take care.

Today was my last day at Endless.

Most of you know me for my Linux, GNOME, and free software work. It might be shocking or surprising for you guys to know that I’m choosing, willingly!, to go on and be one of the nameless faces working on commercial software.


  • At Endless, my last year was almost exclusively spent working on proprietary software. And I was happier.
  • I’m typing this in Visual Studio Code, running on Windows 10. I haven’t run any variant of Linux on my main desktop computer for almost 5 years.
  • I took a pay cut for the new position.

I’ll post about my experiences working on Linux and open-source software professionally soon. After that, this blog will die. I’ll still keep it up and running, but I won’t be posting any more.

Take care.


I spend a lot of time explaining the Linux Graphics Stack to various people online. One of the biggest things I’ve come across is that people have a hard time differentiating between certain acronyms like “DRI”, “DRM” and “KMS”, and where they fit in the Linux kernel, in Xorg, and in Wayland. We’re not the best at naming things, and sometimes we choose the wrong name. But still, let’s go over what these mean, and where they (should) be used.

You see, a long time ago, Linux developers had a bunch of shiny new GPUs and wanted to render 3D graphics on them. We already had an OpenGL implementation that could do software rendering, called mesa. We had some limited drivers that could do hardware rendering in the X server. We just needed to glue it all together: implement hardware support in Mesa, and then put the two together with some duct tape.

So a group of developers much, much older than I am started the “Direct Rendering Infrastructure” project, or “DRI” for short. This project would add functionality and glue it all together. So, the obvious choice when naming a piece of glue technology like this is to give it the name “DRI”, right?

Well, we ended up with a large number of unrelated things all effectively named “DRI”. It’s double the fun when new versions of these components come around, e.g. “DRI2” can either refer to a driver model inside mesa, or an extension to the X server.

Yikes. So let’s try to untangle this a bit. Code was added to primarily three places in the DRI project: the mesa OpenGL implementation, the Xorg server, and the Linux kernel. The code does these three things: In order to get graphics on-screen, mesa needs to allocate a buffer, tell the kernel to render into it, and then pass that buffer over to the X Server, which will then display that buffer on the screen.

The code that was added to the kernel was in the form of a module called the “Direct Rendering Manager” subsystem, or “DRM”. The “DRM” subsystem takes care of controlling the GPU hardware, since userspace does not have the permissions to poke at the raw driver directly. Userspace uses these kernel devices by opening them through a path in “/dev/dri”, like “/dev/dri/card0”. Unfortunately, through historical accident, the device nodes had “DRI” in them, but we cannot change it for backwards-compatibility reasons.

The code that was added to mesa, to allocate and then submit commands to render inside those buffers, was a new driver model. As mentioned, there are two versions of this mesa-internal driver model. The differences aren’t too important. If you’ve ever looked inside /usr/lib/dri/ to see /usr/lib/dri/ and such, this is the DRI that’s being named here. It’s telling you that these libraries are mesa drivers that support the DRI driver model.

The third bit, the code that was added to the X server, which was code to allocate, swap, and render to these buffers, is a protocol extension known as DRI. There are multiple versions of it: DRI1, DRI2 and DRI3. Basically, mesa uses these protocol extensions to supply its buffers to the X server so it can show them on screen when it wants to.

It can be extraordinarily confusing when both meanings of DRI are in a single piece of code, like can be found in mesa. Here, we see a piece of helper code for the DRI2 driver model API that helps implement a piece of the code to work with the DRI3 protocol extension, so we end up with both “DRI2” and “DRI3” in our code.

Additionally, to cut down on the shared amount of code between our X server and our mesa driver when dealing with buffer management, we implemented a simple userspace library to help us out, and we called it “libdrm”. It is mostly a set of wrappers around the kernel’s DRM API, but it can have more complex behavior for more complex kinds of buffer management.

The DRM kernel API also has another, separate API inside it, sometimes known as “DRM mode”, and sometimes known as “KMS”, in order to configure and control display controllers. Display controllers don’t render things, they just take a buffer and show it on an output like an HDMI TV or a laptop panel. Perhaps we should have given it a different name and split it out even further. But the DRM mode API is another name for the KMS API. There is some work ongoing to split out the KMS API from the generic DRM API, so that we have two separate devices nodes for them: “render nodes” and “KMS nodes”.

You can also sometimes see the word “DRM” used in other contexts in userspace APIs as well, usually referring to buffer sharing. As a simple example, in order to pass buffers between Wayland clients and Wayland compositors, the mesa implementation of this uses a secret internal Wayland protocol known as wl_drm. This protocol is eerily similar to DRI3, actually, which goes to show that sometimes we can’t decide on what something should be named ourselves.

Why I’m excited for Vulkan

I’ve stopped posting here because, in some sense, I felt I had to be professional. I have a lot of half-written drafts I never felt were good enough to publish. Since a lot of eyes were on me, I only posted when I felt I had something I was really proud to share. For anyone who has met me in real-life, you know I can talk a lot about a lot of things, and more than anything else, I’m excited to teach and share. I felt stifled by having a platform to say a lot, and only feeling I could say something really complete and polished, even though I have a lot I want to say.

So expect half-written thoughts on things from here on out, a lot more frequently. I’ll still try to keep it technical and interesting to my audience.

What’s Vulkan

In order to program GPUs, we have a few APIs: Direct3D and OpenGL are the most popular ones, currently. OpenGL has the advantage of being implemented independently by most vendors, and is generally platform-agnostic. The OpenGL API and specification is managed by the standards organization Khronos. Note that in closed environments, you can find many others. Apple has Metal for their own set of PVR-based GPUs. In the game console space, Sony had libgcm on the PS3, GNM on the PS4, and Nintendo has the GX API for the Gamecube and Wii, and GX2 for the Wii U. Since it wasn’t expected that GPUs were swappable by consumers like on the PC platform, these APIs were extremely low-level.

OpenGL was originally started back in the mid-80s as a library called Graphics Layer, or “GL”, for SGI’s internal use on their own hardware and systems. They then released it as a product, “IRIS GL”, allowing customers to render graphics on SGI workstations. As a strategic move by SGI, SGI allowed third-parties to implement the API and opened up the specifications, transferring it from “IRIS GL” to “OpenGL”.

In the 30+ years since GL was started, computing has grown a lot, and OpenGL’s model has grown outdated. Vulkan is the first attempt at a cross-platform, vendor-neutral low-level graphics API. Low-level APIs are similar to what has been seen in the console space for close to a decade, offering higher levels of performance, but instead of tying itself to a GPU vendor, it allows any vendor to implement it for its own hardware.


People have already written a lot about why Vulkan is exciting. It has a lower overhead on the CPU, leading to much improved performance, especially on CPU-constrained platform like mobile. Instead of being a global implicit state machine, it’s very explicit, allowing for better multithreaded performance.

These are all true, and they’re all good things that people should be excited for. But I’m not going to write about any of these. Instead, I’m going to talk about a more important point which I don’t think has been written about much: the GPU vendor cannot cheat.

You see, there’s been an awkward development in high-level graphics APIs over the last few years. During the early 2000s, the two major GPU vendors, ATI and NVIDIA, effectively had an arms race. They noticed that certain programs and games were behaving “foolishly”.

The code for a game might look like this:

// Clear to black.

// Start drawing triangles.
glVertex3f(-1, -1, 0);
glVertex3f(-1, 1, 0);
glVertex3f( 1, 1, 0);
// ...

(I’m writing in OpenGL, because that’s the API I know, but Direct3D mirrors a very similar API, and has a similar problem)

The vendors noticed that games were clearing the entire screen to black when they really didn’t need to. So they started figuring out whether the game “really” needed to clear the screen, by simply setting a flag that the game wanted a clear, and then not doing it if the triangles painted over it.

Vendors shipped these updated drivers which had better performance. In a perfect world, these tricks would simply improve performance. But competition is a nasty thing, and once one competitor starts playing dirty, you have to follow along to compete.

As another example, the driver vendors noticed that games uploaded textures they didn’t always use. So the drivers started to only upload textures when games actually drew them.

But uploading textures isn’t cheap. When a new texture first appears in a game, it would stall a little bit. And customers got mad at the game developers for having “unoptimized” games, when it was really the vendor’s fault for not implementing the API correctly! Gamers praised the driver vendor for making everything fast, without realizing that performance is a trade-off.

So game developers found another trick: they would draw rectangles with each texture once while the level loaded, to trick the driver into actually uploading the texture. This is the sort of “folklore knowledge” that tends to be passed around from game development company to game development company, that just sort of exists within the industry. This isn’t really documented anywhere, since it’s not a feature of the API, it’s just secret knowledge about how OpenGL really works in practice.

Bigger game developers know all of these, and they tend to have support contracts with the driver vendors who help them solve issues. I’ve heard several examples from game developers where they were told to draw 67 triangles at a time instead of 64 triangles. And that speeds up NVIDIA, but the magic number might be 62 on AMD. Most game engines that I know of, when using “OpenGL in practice”, actually have different paths depending on the OpenGL vendor in use.

I could go on. NVIDIA has broken Chromium because it patched out the “localtime” function. The Dolphin project has hit bugs because having an executable named “Dolphin.exe”. We were told by an NVIDIA employee that there was a similar internal testing tool that used the API wrong, and they simply patched it up themselves. A very popular post briefly touched on “how much game developers get wrong” from an NVIDIA-biased perspective, but having talked to these developers, they’re often told to remove such calls for performance, or because it causes strange behavior because of driver heuristics. It’s common industry knowledge that most drivers ship with hand-compiled or optimized forms of shaders used in popular games as well.

You might have heard of tricks like “AZDO”, or “approaching zero driver overhead”. Basically, since game developers were asking for a slimmer, simpler OpenGL, NVIDIA added a number of extensions to their driver to support more modern GPU usage. The general consensus across the industry was a resounding sigh.

A major issue in shipping GLSL shaders in games is that since there is no conformance test suite for GLSL, different drivers accept different variants of GLSL. For a simple example, see page 85 Glyphy slides for examples of complex shaders in action.

NVIDIA has cemented themselves as the “king of video games” simply by having the most tricks. Since game developers optimize for NVIDIA first, they have an entire empire built around being dishonest. The general impression among most gamers is that Intel and AMD drivers are written by buffoons who don’t know how to program their way out of a paper bag. OpenGL is hard to get right, and NVIDIA has millions of lines of code invested in that. The Dolphin Project even concludes that NVIDIA’s OpenGL implementation is the only one to really work.

How does one get out of that?


In early 2013, AMD released the Mantle API, a cross-platform, low-overhead API to program GPUs. They then donated this specification to the Khronos OpenGL committee, and waited. At the same time, AMD worked with Microsoft engineers to design a low-overhead Direct3D 12 API, primarily for the next version of the Xbox, in response to Sony’s success with libgcm.

A year later, the “gl-next” effort was announced and started. The committee, composed of game developers and mobile vendors, quickly hacked through the specification, rounding off the corners. Everyone was excited, but more than anything else, game developers were happy to have a comfortable API that didn’t feel like they were wrestling with the driver. Mobile developers were happy that they had a model that mapped very well to their hardware.

Microsoft got word about gl-next, and quickly followed with Direct3D 12. Another year passed, and the gl-next API was renamed to “Vulkan”.

I have been told through the grape vine that NVIDIA was not very happy with this — they didn’t want to lose the millions they invested in their driver, and their marketing and technical edge, but they couldn’t go against momentum.

Pulling a political coup wasn’t easy — it was tried in the mid-2000s as “OpenGL 3.0”, but since there were less graphics vendors in the day, and since game developers were not allowed as Khronos members, NVIDIA was able to wield enough power to maintain the status quo.


Those of you who have seen the Vulkan API (and there are plenty of details on the open web, even if the specs are currently behind NDA), you know that there isn’t any equivalent to glClear or similar. The designs of Vulkan are that you control a modern GPU from start to finish. You control all of these steps, you control what gets scheduled and when.

The games industry has had a term called “dev-to-triangle time” when describing API complexity and difficulty: take an experienced programmer, put him in a room with a brand new SDK he’s never used before, and wait until he gets a single triangle up on the screen. How long does it take?

I’ve always heard the PS2 as having two weeks to a month of dev-to-triangle time, but according to a recent Sony engineer, it was around 3 to 6 months (I think that’s exaggerated, personally). The PS2 made you wrestle with two vector coprocessors, the VU0 and VU1, the Graphics Synthesizer, which ran the equivalent of today’s pixel shaders, along with a dedicated floating-point unit. Getting an engine up on the PS2 required writing code for these four devices, and then writing a process to pass data from one to the other and plug them all together. It’s sort of like you’re writing a driver!

The upside, of course, was that once you put in this required effort, expanding the engine is fairly easy, and you have a fairly good understanding of how everything works and where the boundaries are.

Direct3D and OpenGL, once you wrestle out a few driver issues, consistently has one to two days. The downside, of course, is that complex actions require complex techniques like draw call batching and using atlases to prevent texture switches, or the more complex AZDO techniques detailed above. Some of these can involve a major restructure of engine code. So the subtleties of high-level APIs are only discovered late in development.

Vulkan chooses to opt for the PS2-like approach: game developers are in charge of building command buffers, submitting them to the GPU, waiting on fences, and swapping the front and back buffers and submitting them to the window system themselves.

This means that the driver layer is fairly thin. An ImgTec engineer mentioned that dev-to-triangle time on Vulkan was likely two weeks to a month.

But what you get in return is all that you got on the PS2, and in particular, you get something that hasn’t been possible on the PC so far: accountability. Since the layer is so thin, there’s no place for the driver vendor to cheat. The graphics performance of games is as much as what the developer puts into it. For once, the people gamers often blame — the game developer — will actually be at fault.