Tell me if this sounds familiar. You want to write a renderer. OpenGL seems beginner-friendly, and there’s lots of tutorials for you to get started, but you also know it’s deprecated, kind of dead, and mocked; not for “real graphics programmers”, whatever that means. Its replacement, Vulkan, seems daunting and scary to learn, tutorials seem to spin for hours without seeing a single pixel on the screen, and by the time you’re done, you’re no closer to making that low-latency twitchy multiplayer FPS game you’ve always wanted to make.
You might have heard things like “the boilerplate is mostly front-loaded. Sure, it takes 5,000 lines of code to draw one triangle, but then it only takes 5,100 lines of code to draw a giant city full of animated NPCs with flying cars and HDR and bloom and god rays and …”
Except, if you’ve tried it, especially having learned from a tutorial, that’s really not quite true. Technically, there might still only be 5,100 lines of code, but you need to change them in scary and brittle ways.
Now, I’ll be clear: as a graphics programmer, I like the newer graphics APIs like Vulkan (and Metal and Direct3D 12). But they are definitely not easy to learn. Some of this is just flat-out missing documentation about how to use them: the authoritative sources of graphics API documentation are delivered as sterile and descriptive info-dumps about what the implementation is supposed to do. But the biggest missing piece, in my mind, is that there’s a level of structural planning above just the graphics APIs, ways to think about rendering and graphics that are bigger than just glEnable() and vkCmdCopyBuffer(). The biggest difference between something like OpenGL and a modern API like Vulkan is that OpenGL is forgiving, letting you stumble around until you realize how to write a renderer, while Vulkan makes you have a blueprint from the start. The messy boilerplate of modern APIs like Vulkan only becomes manageable when you come in with this plan from the start.
You might be wondering, if this boilerplate is going to be the same every time, why not put it in the API? And that’s a great question; future updates to Vulkan and Direct3D 12 are still trying to figure out exactly where to draw this line. Programmers might say they want low-level control for the best performance, but be scared and overwhelmed by what that looks like in practice. That said, while the rough shape of this plan might look similar, the details of how everything is implemented change. There will be 5,000 lines of boilerplate, but the exact nature and structure of that boilerplate will depend massively on the specific details of your project and what goals you want to meet.
Traditionally, building up this intuition comes with a lot of practice, and often a lot of institutional knowledge as well, the kind you get by working on renderers built by people before you. I’ve worked on many renderers for shipping games, and I’m hoping this post will give you some insight into how I think about building a renderer, and what the different pieces are. While I might link specifics of the individual APIs, this post is intended more as a collection of high-level structural ideas you can use across any API, modern or not.
At the end of the day, there are really only a handful of fundamental things any graphics platform API has to do: you need to be able to give some data to the GPU, you need to be able to render out to various render targets, and you need to make some draw calls. Let’s tackle these in reverse order.
Pedant’s Note: There are really only so many words to go around, and like many fields, we sometimes use generic terms in multiple different contexts. Be aware of that, and know that when I talk about a “render pass”, it might have nothing to do with some other renderer’s idea of what a “pass” might be.
Draw Calls
Let’s start with maybe the simplest thing here, though it might be unobvious if you’re new to graphics programming: the importance of draw calls. OpenGL is commonly described as a giant state machine, and there are elements of its API design that are like that, but one pretty big key is that the state only applies when you make a draw call*. In modern OpenGL, a draw call is an API call like glDrawArrays or glDrawElements, or some of the fancier or more absurd variants. Also, for the purposes of this post, compute dispatches fall under the “draw call” banner; they just have smaller amounts of state.
* For OpenGL users, I recommend checking out the footnote at the bottom of this post.
Binding textures, vertex arrays, uniforms, and shaders, and setting things like blend state: none of this has any effect until you go to make a draw call. As such, one of our first goals is to think about “what draw calls are we making, and what state would we like them to have”. For instance, if I know I want to have a cube with a transparent texture on it, I know I’ll need to make a draw call with this state bound:
- Vertex bindings to the cube mesh
- Some bindings to the transparent texture
- A shader that samples the texture and returns that as the color
- Blend state to turn the object transparent
- Depth-stencil state to make sure the object doesn’t write to the depth buffer
By thinking about what state the draw call needs to be in, we can separate out the concerns of it independently of any sort of surrounding code for the renderer or the passes. There’s a few different ways this can manifest into code, but one common way is to just literally build a structure for a draw call. This approach was taken when Epic Games rewrote their drawing pipeline for Unreal Engine 4.22 from “a complicated, template mess” (their words) to something they felt was far more flexible. Different engines might have different levels of granularity for what they put in their draw call structure, and there are plenty of tradeoffs here. For instance, do you pre-bake your PSOs, or remain flexible and cache them in the backend? How granular are your uniform updates? While bgfx exposes an imperative-like API, internally it’s implemented with a similar-looking “draw call struct”, though it definitely has different answers to these questions.
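To make the “draw call struct” idea a little more concrete, here is a minimal C++ sketch of the transparent cube from the list above. Every name in it (DrawCall, BlendMode, DepthState, the handle types, makeTransparentCubeDraw) is hypothetical and invented for illustration; it is not how Unreal, bgfx, or any real API spells things, just one plausible shape.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical opaque handles; a real engine would wrap API objects here.
using MeshHandle = std::uint32_t;
using TextureHandle = std::uint32_t;
using ShaderHandle = std::uint32_t;

enum class BlendMode { Opaque, AlphaBlend };

struct DepthState {
    bool testEnabled = true;
    bool writeEnabled = true;
};

// Everything one draw needs, gathered into one plain value.
struct DrawCall {
    MeshHandle mesh = 0;
    TextureHandle texture = 0;
    ShaderHandle shader = 0;
    BlendMode blend = BlendMode::Opaque;
    DepthState depth;
};

// Builds the draw call for the transparent cube described above.
inline DrawCall makeTransparentCubeDraw(MeshHandle cube, TextureHandle tex,
                                        ShaderHandle texturedShader) {
    // In this sketch, 0 means "no resource".
    assert(cube != 0 && tex != 0 && texturedShader != 0);
    DrawCall dc;
    dc.mesh = cube;                    // vertex bindings to the cube mesh
    dc.texture = tex;                  // the transparent texture
    dc.shader = texturedShader;        // samples the texture, returns the color
    dc.blend = BlendMode::AlphaBlend;  // blend state to make it transparent
    dc.depth.writeEnabled = false;     // still depth-tested, but writes no depth
    return dc;
}
```

Because the result is just a value, it can be stored, sorted, or inspected long before anything touches the GPU, which is most of the point.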
That said, you don’t have to directly implement draw call structs, and many do not. But at the very least, there should be a goal to separate out the gameplay-level structures from the rendering structures; the two might not even have a 1:1 mapping. A very common problem we need to solve in graphics is transparency sorting. If I have two transparent cubes on screen, then the draw call that’s made later is the one that renders on top; this means that we need to look at both cubes, and then sort them by distance from the camera in order for the result to be roughly correct. It might be tempting to sort the list of Cube objects directly, but then, what if we want to place something inside the Cube? Maybe the Cube is actually some sort of floating crystal with our princess inside. We now have a tree where the Cube has a Princess inside it, and trying to sort this tree quickly becomes a pain. A flat list of things to render that you can sort tends to be a better idea.
Not to mention, it’s common, and expected, for a single “object” to require multiple draw calls in the same frame. For instance, we might have an object for a single window, but that window might have its frame made out of a wood material going into the opaque pass, while the glass pane goes into the transparent pass. Both of these are going to want different textures, and different states for blend and depth/stencil, and possibly different shaders, too, so it should be obvious that this kind of a mesh is going to require at least two draw calls. To add in shadows, you’ll also need to render our little window object to the shadow map as well, totalling up three draw calls.
A very easy beginner’s mistake that I see is an architecture I call “void Player::render() renderers”, as they tend to have render methods directly attached to gameplay-level objects that set up some state, maybe bind a shader and some uniforms, and make their draw calls inline. When trying to extend this to support multiple draw calls, they tend to fall down, because now you need to call the render() method once when drawing it in opaque mode, and then again when drawing it in transparent mode, and then again as well when adding more advanced rendering features.
To avoid this, design for it up front: build an architecture that makes it easy for an object like Player to output potentially many different draw calls, and to “route” these draw calls to different places, too. So, perhaps instead of one giant list of draw calls, you might make many different lists for different purposes. You probably want to sort your opaque objects differently than your transparent objects, and you almost certainly want lists for other things like the different shadow map passes.
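Here is one way that routing might look, as a small C++ sketch using the window example from earlier. The names (DrawLists, Window, emitDraws, sortLists) and the mesh ids are made up for illustration. The back-to-front transparent sort follows the discussion above; sorting opaque draws front to back is my own assumption of a common choice, not something every renderer does.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct DrawCall {
    int meshId = 0;          // hypothetical mesh identifier
    float viewDepth = 0.0f;  // distance from the camera
};

// One flat list per purpose instead of a tree of gameplay objects.
struct DrawLists {
    std::vector<DrawCall> opaque;
    std::vector<DrawCall> transparent;
    std::vector<DrawCall> shadow;
};

// A gameplay-level object. It never touches the GPU; it only describes draws.
struct Window {
    float viewDepth;
    void emitDraws(DrawLists& out) const {
        assert(viewDepth >= 0.0f);
        out.opaque.push_back({1, viewDepth});       // wooden frame
        out.transparent.push_back({2, viewDepth});  // glass pane
        out.shadow.push_back({1, viewDepth});       // frame into the shadow map
    }
};

inline void sortLists(DrawLists& lists) {
    // Opaque: front to back (assumed here), so nearer surfaces land first.
    std::sort(lists.opaque.begin(), lists.opaque.end(),
              [](const DrawCall& a, const DrawCall& b) { return a.viewDepth < b.viewDepth; });
    // Transparent: back to front, so later (nearer) draws blend on top.
    std::sort(lists.transparent.begin(), lists.transparent.end(),
              [](const DrawCall& a, const DrawCall& b) { return a.viewDepth > b.viewDepth; });
}
```

Note that Window produces three draw calls from one object, and none of the sorting code needs to know that windows exist.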
But now that we’ve made our draw calls, what do we do with them?
Render Passes
So far, we’ve been talking about drawing some triangles to the screen with some textures and shaders. And that kind of renderer can get you quite far. But you might have seen that in recent years, rendering contains a lot more stuff than just rendering triangles. This coincided with the rise of techniques like deferred shading and advanced post-processing on consumer GPUs.
OpenGL hides this fact from you a little bit, but when you render onto “the screen”, you’re just rendering onto a texture. There’s no magic there, it’s just a normal old texture, and your desktop environment is in charge of taking that texture, and combining it with the textures for all the other windows on screen to form the user interface. The little window previews in Alt-Tab are using the same exact kinds of textures that are used on the transparent cube, but the textures are just from other windows instead. Due to an unfortunate API limitation of OpenGL, you can’t actually access this texture, but it’s still there, and it’s just like any other texture.
The API design of Direct3D 11 makes this a little bit more obvious, by not giving the user a way to render directly to the screen without first creating and binding a render target texture to draw to. And a render target texture is just a texture with a special flag attached.
Additionally, there’s a feature on modern GPUs called “multiple render targets”, or MRT, for short. Shaders and draw calls don’t have to output to only one render target texture at a time, they can actually output to up to 8 different render target textures at a time. The shader has syntax to output a color to render target 0, and a different color to render target 1, and so on, all the way up to render target 7. Most games don’t need that many render target textures, though; 4 is probably the most you’ll see in practice.
With this in mind, at the lowest level, a render pass is a collection of draw calls all writing to the same set of render target textures, that all execute around roughly the same time. For instance, to implement post-processing, you might first have a pass that renders your game scene to a render target texture, and then bind that texture as an input to a different draw call which applies a shader to it. In simple cases, these shaders can be things like color correction, vignettes, and film grain.
There’s one more restriction though, which is that we often need to feed the results of one pass into another. It’s generally not possible to render into a render target texture at the same time we have it bound for reading in the shader, as this would create a feedback loop, so either we switch to a different render target entirely, or we often need to make a copy for ourselves.
A more complex example would be an effect like bloom. Bloom effects often work by repeatedly scaling down the input image to blur it, and adding them back together to achieve a Gaussian-like filter. Since each “step” in the scaling down here renders to a different render target (as the render target is getting smaller), each step would be its own pass.
Deferred shading pipelines become even more complicated here. In this case, the render target textures outputted by each pass are not just colors to be tweaked by a post-process pipeline, but specially crafted series of textures that allow future passes in the renderer to apply lighting and shading, shadows, decals, and all sorts of different core rendering techniques. If this is unfamiliar to you, my video on deferred shading in Breath of the Wild might be instructive here, but you can also find many different graphics breakdowns on games using deferred shading in the wild, one example being this breakdown of the indie hit Slime Rancher.
As more and more deferred-like approaches take over rendering, render passes and render pass management have started to become one of the central things to think about when designing a renderer. When all your render passes are put together, they form a graph: one pass creates the render target that’s consumed by a different pass. When I think about render passes, I think about the dataflow across the entire frame; passes produce and consume render targets, and inside of each pass is a list of draw calls. And as renderers grow to hundreds of passes, keeping track of passes manually can be tricky. By using the insight that the dataflow in our frame is an acyclic graph, we start to end up with new tools like Frostbite’s FrameGraph, which uses graph theory to try to simplify the pain of writing a lot of code to set up different passes and hook them together. This idea has been so popular that it’s actually quite standard and commonplace to see across many different kinds of renderers now, and there are several implementations of the concept, like AMD’s Render Pipeline Shaders, which even adds a little DSL for you to specify all the passes in your frame.
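To show what “passes form a graph” can mean in code, here is a deliberately tiny C++ sketch that orders passes so every producer runs before its consumers. It is not Frostbite’s FrameGraph or AMD’s Render Pipeline Shaders; the Pass struct, the schedule function, and identifying render targets by string name are all assumptions made for illustration. It also assumes the graph is acyclic and ignores everything a real render graph would add, like resource lifetimes and culling of unused passes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Pass {
    std::string name;
    std::vector<std::string> reads;   // render targets sampled as textures
    std::vector<std::string> writes;  // render targets rendered into
};

static bool contains(const std::vector<std::string>& v, const std::string& s) {
    for (const auto& x : v) if (x == s) return true;
    return false;
}

// Returns pass indices in an order where producers come before consumers.
inline std::vector<std::size_t> schedule(const std::vector<Pass>& passes) {
    const std::size_t n = passes.size();
    std::vector<std::vector<std::size_t>> consumers(n);
    std::vector<int> pending(n, 0);  // unfinished producers per pass
    for (std::size_t p = 0; p < n; ++p) {
        for (std::size_t c = 0; c < n; ++c) {
            if (p == c) continue;
            for (const auto& target : passes[p].writes) {
                if (contains(passes[c].reads, target)) {
                    consumers[p].push_back(c);  // one edge per (producer, consumer)
                    ++pending[c];
                    break;
                }
            }
        }
    }
    std::vector<std::size_t> order, ready;
    for (std::size_t i = 0; i < n; ++i) if (pending[i] == 0) ready.push_back(i);
    while (!ready.empty()) {
        const std::size_t p = ready.back();
        ready.pop_back();
        order.push_back(p);
        for (std::size_t c : consumers[p])
            if (--pending[c] == 0) ready.push_back(c);
    }
    assert(order.size() == n);  // fewer would mean a cycle, which is not handled
    return order;
}
```

With passes declared as “tonemap reads bloom and scene”, “bloom reads scene”, and “scene reads nothing”, in that order, this returns scene, bloom, tonemap. The declaration order stops mattering, which is a large part of why the approach scales.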
In older APIs like Direct3D 11 and OpenGL, passes were implicit — you could switch render targets whenever you wanted, and the driver was expected to juggle things for you. But for this new breed of APIs, like Direct3D 12, Vulkan, and Metal, render passes became a first-class concept. Direct3D 12 has the simplest API design here, with BeginRenderPass just taking a number of render targets to render to, though Direct3D 12’s render passes are actually optional. Vulkan has a whole complicated way of specifying render passes, but it also has a newer extension that basically implements the Direct3D 12 API instead.
The main impetus for adding support for render passes directly to modern graphics APIs was better support for mobile GPUs, because mobile GPUs are built around something called “tile memory”, and use a very different algorithm to render compared to desktop GPUs. I’d rather not go into a big info-dump here on mobile GPUs, as I want to try to keep this post about the high-level concepts, but I will recommend ARM’s Mali GPU Training series if you would like to learn more about mobile GPU architecture, and in particular, the Frame Construction video is fairly relevant for what we’re talking about here. I especially love their use of graphviz to visualize the passes in a frame.
But even besides areas like post-processing, multiple render passes are also helpful when drawing our 3D scene as well. Much like with draw calls, it’s also common and expected for a single object to end up in multiple passes in the same frame as well. A common technique in rendering is what’s called a “depth prepass” or “Z-prepass”. The idea is if rendering a full material for the object is expensive, we can prevent a lot of overdraw and wasted pixels by first rendering out a much simpler version of the object that will only write to the depth buffer, although this is only for the parts of the object that are opaque. As you increase the number of objects you want to support, you might want to have an object render into both the depth prepass and the main color pass.
Not to mention, the opaque parts of your object might also want to go into other special passes, like our aforementioned shadow map passes. And even just for the case of our window from before, while the opaque and transparent draw calls do both use the same render targets, it’s pretty common to split these into different passes as well, since we probably want to run other post-processing passes like ambient occlusion or depth-of-field after we render all the opaque draw calls, but before we render the transparent draw calls.
As games keep becoming bigger and bigger, it’s also common to see different parts of the scene render at different resolutions as a performance tactic. You might want to render opaque objects at full-resolution, render transparent objects at half-resolution, and then render UI back at full-resolution again. The order and dataflow of passes is something you really want to keep flexible; it’s common for this order to change over the course of development, or even sometimes dynamically at runtime.
Proper render pass management is one of the biggest differences between a renderer written by someone who’s just starting out, and a renderer designed with years of experience. A big stumbling block I’ve often seen in people who are starting out is trying to retrofit shadow map passes into their renderer. Shadow map passes are special, because the camera is in a different spot, you’re not writing to any color textures, you might want to use a different shader, and usually some of the rasterizer state of the draw call has to change to add on things like depth biases. A void Player::render()-style renderer is going to struggle with this. It has no holistic or central concept of different passes, since it’s only thinking about things at the object level. It’s pretty common to accidentally couple your rendering code to the assumptions of the opaque pass, and when you’re trying to add in shadow maps, this is your first exposure to multiple passes, and you probably aren’t thinking about them yet, nor how passes might connect together.
For one extra hint here, most engines that want to draw the same object into multiple passes usually attach some metadata to the concept of the pass, like the location and settings of the camera, what kind of pass it is (shadow map vs. depth prepass vs. opaque vs. transparent), how and where it should submit its draw calls, and other miscellaneous state, like any special state it should apply. This gives the code making the draw calls the right context to know whether it should draw anything in this pass, and how it should do it, and which shaders it should use.
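As a rough sketch of that hint, here is what per-pass metadata might look like in C++. The types and functions (PassContext, Material, shouldDraw, pickShader) are hypothetical, and the specific rules, such as transparent materials not casting shadows, are simplifying assumptions of mine rather than anything universal.

```cpp
#include <cassert>

enum class PassKind { ShadowMap, DepthPrepass, Opaque, Transparent };
enum class ShaderVariant { DepthOnly, FullMaterial };

// Hypothetical per-pass metadata handed to the code that emits draw calls.
struct PassContext {
    PassKind kind = PassKind::Opaque;
    float cameraPosition[3] = {0.0f, 0.0f, 0.0f};
    float depthBias = 0.0f;  // typically non-zero only for shadow map passes
};

struct Material {
    bool transparent = false;
    bool castsShadows = true;
};

// Should this material emit a draw call into this pass at all?
inline bool shouldDraw(const Material& m, const PassContext& pass) {
    switch (pass.kind) {
        case PassKind::ShadowMap:    return m.castsShadows && !m.transparent;
        case PassKind::DepthPrepass: return !m.transparent;
        case PassKind::Opaque:       return !m.transparent;
        case PassKind::Transparent:  return m.transparent;
    }
    assert(false && "unhandled PassKind");
    return false;
}

// Depth-only passes can use a much cheaper shader than the full material.
inline ShaderVariant pickShader(const PassContext& pass) {
    const bool depthOnly = pass.kind == PassKind::ShadowMap ||
                           pass.kind == PassKind::DepthPrepass;
    return depthOnly ? ShaderVariant::DepthOnly : ShaderVariant::FullMaterial;
}
```

The object-level code only asks questions of the PassContext it is given, so adding a new kind of pass later means touching these two functions rather than every object’s rendering code.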
Render Passes and Synchronization
GPUs are designed to go fast. Really fast. A lot of different tricks are used to make GPUs go fast, but the main one is parallelism; it is possible for GPUs to render more than one triangle at once. They can also render more than one draw call at once, and in some very extreme cases, GPUs can render multiple passes in parallel, too. However, that causes a problem for us, if one pass is used to feed into another. How does the GPU know that it’s allowed to run two passes in parallel? In older APIs like Direct3D 11 and OpenGL, the graphics driver was responsible for noticing that one pass’s render target becomes the next pass’s texture, and telling the GPU not to run them in parallel. Similarly, we have the same problem the other way, too: if the next pass after that starts using that texture again as a render target, then we also need to make sure not to overlap there, too. This requires tracking every single texture and render target used in every single draw call during a pass, and comparing them with previous passes, a process known as “automatic hazard tracking”. As with any automagic process, it can and does go wrong: false positives are not uncommon, meaning that two passes that could overlap in theory, don’t in practice, because the driver was too cautious during its hazard tracking.
As an analogy here, imagine some sort of “automatic” parallelizing system on the CPU that guaranteed that any two jobs touching the same chunk of memory wouldn’t overlap. Tracking the “current owners” of every single possible byte address is clearly wasteful, so we have to pick a bigger granularity. The granularity here determines the probability of false positives: if we pick something like 8MB chunks, then any two jobs that just so happen to access memory in the same 8MB chunk would be treated as hazardous, even if their memory accesses weren’t overlapping. This is the kind of false dependency that we would like to prevent.
Not to mention, newer features like multi-threaded rendering, multi-queue, indirect drawing, and bindless resource access make it much more difficult to do automatic hazard tracking. These are increasingly becoming the building blocks of a modern renderer as we move to a GPU-driven world, and as a result, Direct3D 12 and Vulkan decided to remove automatic hazard tracking in favor of making you spell out the dependencies yourself. It’s worth noting that Metal actually kept a limited form of automatic hazard tracking which you can use if you want; however, it does limit your ability to use these newer rendering features, and also brings back the potential false positives.
Rather than directly specifying that certain passes can or can’t overlap, the designs of Vulkan, Direct3D 12 and Metal all use a system known as “barriers” on individual resources. Before you can use a texture as a render target, you first must place a barrier telling it “I want to use this as a render target now”; and when you want to change it back to being used as a texture, you must place another barrier saying “use me as a texture”. This design also solves some other problems. Another way that GPUs go fast is by having a lot of caches, and in order to “push” a texture along to the next stage, you really want to clear such caches. Another related problem is known as “image layouts”; rendering into a texture might want to lay out the texture data differently in memory than reading from it later inside a shader, as the memory access patterns can be different. So Vulkan has the concept of “image layouts”, which tell the driver what rough mode it should be in, with options like COLOR_ATTACHMENT_OPTIMAL (color attachment being the Vulkan name for “render target”), SHADER_READ_ONLY_OPTIMAL (for when reading from it in a shader), and so on.
Specifying the proper barriers can be tricky to do, and some naive implementations might “over-barrier”, which can cause performance issues. Since render passes and synchronization are so directly intertwined, it’s extremely common to integrate this kind of tracking directly into the design of the render graph. In fact, one of the original motivations for Frostbite’s FrameGraph mechanism was to make these barriers a lot simpler and less error-prone. The team behind AMD’s Render Pipeline Shaders mentioned that switching games to their framework reduced over-barrier errors, and improved GPU performance overall. For some very intricate details on barriers and what they’re doing at the hardware level, I recommend checking out MJP’s series on Breaking Down Barriers.
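One simple way to avoid the most obvious over-barriering is to remember the last declared usage of each resource and only emit a barrier when it changes. The C++ sketch below shows just that bookkeeping. BarrierTracker, Usage, and Barrier are hypothetical names, resources are plain integers, and no real API call is made; in a real renderer, each recorded Barrier would be translated into the API’s own barrier command, with far more detail than the three states assumed here.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

enum class Usage { Undefined, RenderTarget, ShaderRead };

struct Barrier {
    int resource;
    Usage before;
    Usage after;
};

class BarrierTracker {
public:
    // Declare how a resource is about to be used.
    // A barrier is recorded only when the usage actually changes.
    void use(int resource, Usage wanted) {
        assert(wanted != Usage::Undefined);
        Usage& current = state_[resource];  // new entries start as Undefined
        if (current != wanted) {
            pending_.push_back({resource, current, wanted});
            current = wanted;
        }
    }

    // Hand back the barriers to record before the next pass, and reset.
    std::vector<Barrier> flush() {
        std::vector<Barrier> out;
        out.swap(pending_);
        return out;
    }

private:
    std::unordered_map<int, Usage> state_;
    std::vector<Barrier> pending_;
};
```

A render graph is a natural home for this, since it already knows which targets each pass reads and writes, and can call use() for each one before recording the pass.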
Data, Uploads, and Synchronization
The last major thing that I think about when I think about a renderer is data management. Part of this is “boring” stuff like allocation: in order for a texture to exist on the GPU, it needs to be somewhere in the GPU’s memory. Previously, this sort of data allocation was very simple — in OpenGL or Direct3D 11, you would ask the driver for a texture, tell it how big you wanted it, and what kinds of pixels were inside, and it would give you back a resource, which you could then supply with data. This same process happens with all kinds of data: not just textures, but also vertex buffers for storing vertex data, index buffers, and so on.
This seems simple, but the details matter greatly here. In the old APIs, you could upload new data between draw calls, and it was all just supposed to work. You could upload a mesh to a vertex buffer, make a draw call, and then upload different data to the same vertex buffer, and everything would render just fine. Now, GPUs are pretty impressive pieces of hardware, but they’re not magic. If the draw calls are supposed to be running at the same time, how can we overwrite the data in a buffer while it’s in use? In practice, there’s three possible answers:
- The draws, actually, do not overlap. The GPU will wait for the first draw call to be done, then it updates the buffer, then runs the second draw call.
- The driver makes a copy of the buffer, and then uploads your new data to it. The first draw call gets the original buffer, the second draw call gets the copy.
- There’s some other wacky hardware magic in there to have “versions” of the buffer data across different draw calls.
The third answer does sound nice, and it’s indeed what some older graphics hardware used to do, but those days are long gone; as the number of vertices and polygons and textures and amount of data increased, it wasn’t worth it. In practice, drivers use some combination of the first two solutions, using heuristics to decide which to use at any given time. In OpenGL, when updating your buffer data with glBufferData, the idea was that GL_STATIC_DRAW would get you option #1 and stall the GPU, while using GL_DYNAMIC_DRAW would get you the silent copy behavior, a technique known as “buffer renaming”. In practice, I think a lot of OpenGL drivers just stalled between draw calls, leading to the somewhat superstitious practice of “buffer orphaning”, or replacing the buffer entirely between draw calls, effectively forcing a buffer rename. This optimization stretches back a long way, though. As early as Direct3D 8, buffer renaming was an officially supported optimization for updating vertex buffers.
This problem gets even thornier when considering OpenGL’s uniforms (known as constants in Direct3D). While it’s a bit of an odd case to update a single vertex buffer between draws, these uniforms often contain details like the position and rotation of the object you’re drawing, and are expected to change with every single draw call. Under the hood, they’re actually implemented as giant buffers of memory on the GPU, just like vertex buffers, a feature which later versions of OpenGL expose more directly. On the other hand, Direct3D 11 switched away from exposing the individual values, making the user upload constant buffers. For games with a lot of draw calls, these small buffer changes can add up quickly, and to implement this efficiently, buffer renaming is a must. So, when updating a constant buffer, if the driver detects the buffer is already in use, it allocates a new chunk of memory, copies the old buffer into the new buffer, and then applies your updates.
Ultimately, while conceptually simple, the model requires considerable effort to use optimally in practice, and many flags and special paths had to be added to both OpenGL and Direct3D 11 to keep things running smoothly. Despite the API seeming simple, a large amount of heuristics and effort were needed behind the scenes, both of which were often unpredictable and unreliable for game developers.
Following the pattern so far, you might be able to guess what the more modern graphics APIs chose to do to clean up this mess. You are now in charge of allocating your own buffers, uploading to them, and crucially, you aren’t allowed to update them between draw calls anymore. If you have 100 draws that all use a different uniform value, you are in charge of allocating the 100 copies of that buffer that are needed and uploading to them yourself; though there is a small 128-byte area of constants that can be swapped out (“push constants” in Vulkan, “root constants” in Direct3D 12), usually just as a way to supply an index or two that points into a bigger chunk of buffer parameters.
In fact, not only are you not allowed to upload data between draw calls, you also need to make sure that if the buffer is currently in use by the GPU, then you shouldn’t touch it, or you risk bad things happening. Knowing when something is in use by the GPU or not might seem a bit involved, but in practice, there’s a very simple solution. The GPU can write an integer to a special location in memory every time it finishes some work, called a “fence” (Direct3D 12) or “timeline semaphore” (Vulkan), and the CPU can read this integer back. So, every frame, the CPU increments its “current frame” index, then after it submits all the passes to the GPU, it tells the GPU to write that frame index to the fence. If you know that the last time you used this buffer was on frame 63, then once the GPU writes a fence value greater than or equal to 63, you know it’s done with the buffer, and you can finally reuse it.
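The frame-index bookkeeping described above can be sketched on the CPU side like this. FakeFence is a stand-in I made up for the integer the GPU writes (a real Direct3D 12 fence or Vulkan timeline semaphore would be queried through its own API), and BufferRecycler and its methods are likewise hypothetical. The sketch assumes buffers are retired in frame order.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Stand-in for the value a real fence or timeline semaphore reports.
struct FakeFence {
    std::uint64_t completedValue = 0;
};

struct RetiredBuffer {
    int bufferId;
    std::uint64_t lastUsedFrame;
};

class BufferRecycler {
public:
    // Record that `bufferId` was last used by work submitted on `frame`.
    void retire(int bufferId, std::uint64_t frame) {
        // This sketch relies on frames arriving in non-decreasing order.
        assert(retired_.empty() || retired_.back().lastUsedFrame <= frame);
        retired_.push_back({bufferId, frame});
    }

    // Returns a buffer the GPU is finished with, or -1 if none is safe yet.
    int tryAcquire(const FakeFence& fence) {
        if (retired_.empty() || retired_.front().lastUsedFrame > fence.completedValue)
            return -1;
        const int id = retired_.front().bufferId;
        retired_.pop_front();
        return id;
    }

private:
    std::deque<RetiredBuffer> retired_;
};
```

Using the numbers from the example: a buffer retired on frame 63 is not handed out while the fence reads 62, and becomes available as soon as it reads 63 or more.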
For some more details of the tradeoffs involved in fast, efficient, buffer management, I recommend reading through MJP’s article on GPU Memory Pools, which also links to some other great articles and resources. Additionally, if you want a Direct3D 11-style experience, AMD has fantastic open-source memory allocation libraries for both Direct3D 12 and Vulkan.
My one recommendation to help get a grip on this is that you try to architect your renderer in a way that you can do pretty much all of your data uploads at the beginning of the frame. For textures and vertex data, this isn’t too tricky, but uniform data can be trickier, since it often comes glued together with the draw calls. One common way to help with this is to build out a linear allocator; effectively, this is just a growable array of memory. When your renderer is walking the list of objects to generate draw calls for, it receives a pointer to the linear allocator too, and the code for rendering out each object allocates a chunk of memory, saves off the offset, and writes the uniform data it wants into the buffer. When it comes time to submit to the GPU, it is relatively easy to upload the whole buffer first thing, and use Vulkan’s dynamic uniform buffer offsets to send the right region of data for each draw call; Direct3D 12 and Metal have similar offsets in their APIs when you bind the buffer to the draw call.
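A linear allocator along these lines can be very small. The C++ sketch below is one possible version; LinearAllocator and its methods are hypothetical names, and the alignment is passed in by the caller because the required offset alignment comes from the API and device (in Vulkan, for example, from the minUniformBufferOffsetAlignment device limit). The 256 used in the usage note is only an example value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

class LinearAllocator {
public:
    explicit LinearAllocator(std::size_t alignment) : alignment_(alignment) {
        // The rounding trick below requires a power-of-two alignment.
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    }

    // Copies `size` bytes in and returns the byte offset to use for this draw.
    std::size_t write(const void* data, std::size_t size) {
        const std::size_t offset = (bytes_.size() + alignment_ - 1) & ~(alignment_ - 1);
        bytes_.resize(offset + size);  // padding bytes are zero-filled
        if (size != 0) std::memcpy(bytes_.data() + offset, data, size);
        return offset;
    }

    // The whole frame's uniform data, ready to upload in one go.
    const std::vector<unsigned char>& bytes() const { return bytes_; }

    // Call at the start of each frame, once the previous contents are uploaded.
    void reset() { bytes_.clear(); }

private:
    std::size_t alignment_;
    std::vector<unsigned char> bytes_;
};
```

With an alignment of 256, writing two 16-byte uniform structs returns offsets 0 and 256. Each draw call stores only its offset, and the single upload of bytes() happens once at the start of submission.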
Other stuff I didn’t cover
Modern graphics is quite complicated, and I only scratched the surface of how all these parts interact. “Platform” work like this is often a full-time job (actually, often multiple these days!), and it’s not uncommon to rewrite a graphics platform layer over time. But I hope this gave you some high-level overview for how I think about rendering, and how I map it to all the different primitives provided by both old-school graphics APIs like Direct3D 11 and OpenGL, and also modern APIs like Vulkan, Direct3D 12, and Metal.
I didn’t really cover anything about the high-level techniques, like how to do lighting and shading at a large scale, or how to practically write deferred rendering or post-processing, or how to build a material system. Partly because this is something that should carry over without too much modification between OpenGL and Vulkan, but also because I think there’s a wealth of other fantastic resources out there for that sort of thing. But I’ll consider writing some more if there’s enough interest…
I also didn’t cover anything about PSOs because I feel I already covered that well enough in my last article, but I’ll give one more practical tip: don’t be too ashamed of doing things at the last minute. For instance, if you use draw call structs, you might choose to put state like blend state, rasterizer state, and which shader you’re using as loose-leaf objects in your draw call struct. Most engines I am aware of do something I’ve seen called “hash-n-cache” where you then use that to build a PSO description, and use that to look it up in a hash-map. If you already have a copy of it, you can use it directly, but if not, you can create one on the fly, and then cache it for later uses. The idea is that the states will not change too much from frame to frame, so once you’re past the initial few frames of creating PSOs, pretty much everything from there should be on the hot path. The hash-n-cache technique, and caching in general, can get you surprisingly far towards getting something working right now. That said, in the extreme case, it can run into scalability issues, requiring detailed solutions.
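For a feel of what hash-n-cache might look like, here is a small C++ sketch. PsoDesc, PsoDescHash, PsoCache, and the integer PsoHandle are all hypothetical; the three fields in the description are a tiny stand-in for the much larger state a real PSO covers, and createPso just counts calls where a real engine would make the expensive API call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>

// A deliberately tiny stand-in for a full pipeline state description.
struct PsoDesc {
    std::uint32_t shaderId;
    std::uint32_t blendMode;
    std::uint32_t depthWrite;
    bool operator==(const PsoDesc& o) const {
        return shaderId == o.shaderId && blendMode == o.blendMode &&
               depthWrite == o.depthWrite;
    }
};

struct PsoDescHash {
    std::size_t operator()(const PsoDesc& d) const {
        // Simple field-by-field combine; good enough for a sketch.
        std::size_t h = std::hash<std::uint32_t>{}(d.shaderId);
        h = h * 31 + std::hash<std::uint32_t>{}(d.blendMode);
        h = h * 31 + std::hash<std::uint32_t>{}(d.depthWrite);
        return h;
    }
};

using PsoHandle = int;  // hypothetical; a real engine stores an API object

class PsoCache {
public:
    // Look the description up; create and remember the PSO on a miss.
    PsoHandle get(const PsoDesc& desc) {
        auto it = cache_.find(desc);
        if (it != cache_.end()) return it->second;  // hot path after warm-up
        const PsoHandle created = createPso(desc);
        cache_.emplace(desc, created);
        return created;
    }
    int creations() const { return creations_; }

private:
    // Stand-in for the slow API call that compiles a pipeline.
    PsoHandle createPso(const PsoDesc&) { return ++creations_; }

    std::unordered_map<PsoDesc, PsoHandle, PsoDescHash> cache_;
    int creations_ = 0;
};
```

Asking for the same description twice returns the same handle and only pays for creation once, which is the whole trick.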
* A quick rant about OpenGL’s bind-to-modify system
Above, I said “the state only applies when you make a draw call”. This is not strictly true in OpenGL. Due to 30 years of legacy and unfortunate API design, doing anything with most objects in OpenGL first requires binding them. Want to upload new texture data? glBindTexture. Want to upload new vertex data? glBindBuffer. Some really roundabout API design also resulted in the whole glBindBuffer, glVertexAttribPointer dance, where the currently bound array buffer is “locked in” when you call glVertexAttribPointer. To be clear, this is not a constraint because of how GPUs work; this is a constraint because OpenGL consistently lacked foresight when designing its original APIs. The original OpenGL only had support for a single texture at a time, and glBindTexture was added when it was realized that having more than one would be a good idea. Similarly, glVertexAttribPointer originally took a pointer into the CPU’s memory space (that’s why the last argument has to awkwardly be cast to a void*!), and glBindBuffer was added later once it was realized that having a handle to a chunk of GPU memory was useful.
So, in OpenGL, binding is really used for two purposes: the first is binding the object so you can modify, query, or just manipulate it in some way, and the second is to actually bind it to be used in the draw call. The second is all we should be using binding for. In modern desktop GL, there is a feature called Direct State Access, or DSA, which bit the bullet and introduced a large number of new entrypoints to eliminate the dependency on bind-to-modify style APIs, but this feature never made it to OpenGL ES or WebGL, and there are also platforms in wide deployment that still only support up to OpenGL 3.1.
The bind-to-modify API design ends up confusing a lot of people about how graphics and GPUs work, and also leads to a lot of cases where state is correctly set up for a draw call, but “accidentally” changed at the last minute. For instance, I’ve had difficult-to-find bugs in my OpenGL applications because I called glBindBuffer to reupload some data while I had a VAO bound at the time. If you are using OpenGL, I highly suggest using Direct State Access if you can. While not perfect, it helps eliminate a large class of errors. If it’s not available on your platform, I recommend writing your own state tracking layer on top that helps give you “DSA-like” functionality, and helps track and look out for any ambiently bound state that it knows might be affected by certain kinds of modify operations. It doesn’t have to be a huge heavyweight layer, it just has to watch out for and untangle some of the trickier state-related hazards.