Shellshock will happen again

Posted on October 5, 2014 by Jasper St. Pierre

As usual, I’m a month late, the big Bash bug known as Shellshock has come and gone, and the world was confused as to why this ever happened in the first place. It’s been fixed for a few weeks now. The questions have started: Why has nobody spotted this earlier? Can we can prevent it? Are the maintainers overworked and underfunded? Should we donate to the FSF? Should we switch to another shell by default? Can we ever trust bash again?

During the whole thing, there’s a big piece of evidence that I didn’t see anybody point out. And I think it helps answer all of these questions. So here it is.

I present to you the upstream git log for bash: http://git.savannah.gnu.org/cgit/bash.git/log/

Every programmer who has just clicked that link is now filled with disgust and disappointment.

It’s all crystal clear now: Nobody would have spotted this earlier. No, we can’t really prevent it. No, the maintainers aren’t overworked and underfunded. No, we shouldn’t donate to the FSF. Perhaps we should switch to another shell. No, we cannot trust bash. Not until a serious change in its management comes along.

For those of you who aren’t programmers, you might be staring at that page, not quite understanding what it all means. And that’s OK. Let me help explain it to you.

There’s a saying in the open-source development community: “With enough eyeballs, all bugs are shallow”. I don’t believe in it as strongly as I used to, but I think there’s some truth to it. It can be found in other disciplines as well: in science, it’s known as “peer-review”, where all papers and discoveries should be rigorously double-checked by peers to make sure you didn’t make any mistakes. In other sorts of writing, the person reviewing it is the editor. Basically, “have somebody else double-check your work”.

The issue with this, though, is that you need enough eyeballs to double-check your work. And while it was assumed before that all open-source software had enough eyeballs, that doesn’t seem to be the case. That said, you can certainly design and maintain a software project in certain ways to attract enough eyeballs. And unfortunately, bash isn’t doing these things.

First of all, you can see that there’s zero eyeballs on the code: one person, Chet Ramey, is writing the code, and nobody double-checks it. Because there’s only one developer, we might assume that there’s no big motivation to do code cleanups or try to make it accessible to anybody other than Chet, since nobody is working on it. And that’s true. This makes the eyeballs wander elsewhere.

But, in fact, that isn’t even true: Florian Weimer of the Red Hat Security Team has developed multiple fixes for the Shellshock bug, but his work was included in bash uncredited. Developers really need to be credited for their work. This makes the eyeballs wander elsewhere.

The code isn’t really that actively developed. Down the bottom of that page, we see dates and times that are from 2012. It seems like nobody actually cares about this code anymore, and nobody is really trying to fix bugs and make it modern. This makes the eyeballs wander elsewhere.

There are no detailed descriptions of what changed between versions, and Which commits in that log are ones that are serious and fixed CVEs, and which might just fix minor documentation bugs? It’s impossible to tell. This makes the eyeballs wander elsewhere.

And even with the corresponding code change, it can be difficult to tell whether a specific commit is an important security fix, a new feature, or a minor bug fix. There’s no explanation in the commit message for why this change was mode. or any sort of changelog, which makes it hard for people redistributing and patching bash to know what fixes are important, and which aren’t. The eyeballs will wander elsewhere.

In comparison, look at the commit log for the Linux kernel. There’s a large number of different people contributing, and all of them explain what changes they make and why they’re making them. To use a recent example (at the time of this writing), this NFS change describes in perfect detail why the change was made (for compatibility with Solaris hosts), and includes a link to a bug report with further information from debugging. As a result, even though bash is more commonly used and included in more things than the Linux kernel itself, the Linux kernel has more eyeballs and more developers.

So, what should we do? How should we fix this situation? I don’t really know. Moving to a new shell isn’t really a solution, and neither is a fork of bash. The best case scenario would be for bash to indeed change its development practices to be more like the Linux kernel, and adopt a thriving community of its own. I don’t have enough power or motivation to enact such a change. I can only hope that I can convince enough people of the right way to maintain a project.

Perhaps, Chet, if you’re out there, do you want to talk about it? This is a discussion we really should be having about the future of your project, and your responsibilities.

Hanging up the hat

Posted on August 21, 2014 by Jasper St. Pierre

Hello. It’s been quite a while. I’ve been meaning to post for a while, but I’ve been too busy trying to get GNOME 3.14 finished up, with Wayland all done for you. I also fixed the last stability issue in GNOME, and now both X11 and Wayland are stable as a rock. If you’ve ever had GNOME freeze up on you when switching windows or Alt-Tabbing, well, that’s fixed, and that was actually the same bug that was crashing Wayland. This was my big hesitation in shipping Wayland, and with that out of the way, I’m really happy. Please try out GNOME 3.13.90 on Wayland and let me know how it goes.

I promise to post a few more Xplain articles before the end of the year, and I have another blog post coming up about GPU rendering that you guys are going to enjoy. Promise. Even though, well…

Changes

I have a new job. Next Tuesday, the 26th, is my final day at Red Hat, and after that I’m going to be starting at Endless Mobile. Working at Red Hat has been a wonderful, life-changing experience, and it was a really hard decision to leave the incredible team that made GNOME what it is today. Thank you, each and every one of you. All of you are incredible, and I hope I keep working with you every single day. It would be an absolute shame if I wasn’t.

Endless Mobile is a fantastic new startup that is focused on shipping GNOME to real end users all across the world, and that’s too exciting of an opportunity to pass by. We, the GNOME community, can really improve the lives of people in developing countries. Let’s make it happen.

I’ll still be around on IRC, mailing lists, reddit, blogging, all the usual places. I’m not planning on leaving the GNOME community. If you have any questions at all, feel free to ask.

—

Cheers,
Jasper

xdg-shell

Posted on June 13, 2014 by Jasper St. Pierre

Wayland 1.5 is released. It’s a pretty exciting release, with plenty of features, but the most exciting thing about it is that we can begin work on Wayland 1.6!

… No, I’m serious. Wayland 1.6’s release schedule matches up pretty well with GNOME’s. Wayland 1.6 will be released in the coming weeks before GNOME 3.14, the first version of GNOME with full Wayland support out of the box.

Since development is opening again, we can resume work on xdg-shell, the new desktop shell protocol to replace wl_shell. I alongside Kristian Hoegsberg have been prototyping and implementing this in toolkits and Wayland compositors. We’re extremely happy with our current revision of the bare-bones protocol, so it’s at this point that we want to start evangelizing and outreaching to other communities to make sure that everybody can use it. We’ve been working closely with and taking input from the Wayland community. That means that we’ve been working with the Qt/KDE and Enlightenment/EFL Wayland teams, but anybody who isn’t paying close attention to the Wayland community is out of the loop. This needs to change.

Ironically, as the main Wayland developer for GNOME, I haven’t talked too much about the Wayland protocol. My only two posts on Wayland were a user post about the exciting new features, and one about the legacy X11 backwards compatibility mode, XWayland.

Let’s start with a crash course in Wayland protocols.

wl_surface

As odd as it sounds, Wayland doesn’t have a built-in way to get something like a desktop window system, with draggable, resizable windows. As a next-generation display server, Wayland’s protocol is meant to be a bit more generic than that. Wayland can already be found on mobile devices as part of SailfishOS through the hard work of Jolla and other companies. Engineers at Toyota and Jaguar/Land Rover use Wayland for media centers in cars, as part of a custom Linux distribution called GENIVI. I’m also told that LG’s webOS as used in its smart TVs are investigating Wayland for a display server as well. Dragging and resizing tiny windows from on a phone, or inside a car, or on a TV just isn’t going to be a great experience. Wayland was designed, from the start, to be flexible enough to support a wide variety of use cases.

However, that doesn’t mean that Wayland is all custom protocols: there’s a common denominator between all of these cases. Wayland has a core protocol object called a wl_surface on which clients can show some pixels for output, and retrieve various kinds of input on. This is similar to the concept of X11’s “windows”, which I explain in Xplain. However, the wl_surface isn’t simply a subregion of the overall front buffer. Instead of owning parts of the screen, Wayland clients instead create their own pixel buffers, draw to them, and then “attach” them to the wl_surface, causing a new pixel buffer to be displayed. The wl_surface concept is fairly versatile, and is used any time we need a “live surface” to play around with. For instance, the mouse cursor is done simply by providing the Wayland compositor with a wl_surface. The same thing is done for drag-and-drop icons as well.

An interesting aside is that the model taken by Wayland with wl_surface can actually require less copies and be more efficient than X11 with modern systems. More and more GPUs have more interesting and fancy hardware at scanout time. With the rise of low-power phones that require rich graphics, we’re seeing a resurgence in fixed-function alpha blending and compositing hardware when doing scanout, similar to what game consoles like the NES and SNES had (but they called them “sprites“). X11’s model of a giant front buffer that apps draw to means that we must copy all contents to the front buffer eventually from the CPU, while Wayland’s model means that applications can simply hand us their pixel buffers, and we can choose to show it as an overlay, which removes any copy. And if an application is full-screen, we can simply tell the GPU to scan out from that application’s buffer directly, instead of having to copy.

xdg-shell

OK, so I’ve talked about wl_surface. How does this relate to xdg-shell? Since a wl_surface can be used for lots of different purposes, like cursors, simply creating the wl_surface and attaching a buffer doesn’t put it on the screen. Instead, first, we need to let the Wayland compositor know that this wl_surface is intended to be a desktop-style window that can be dragged and resized around. It should appear in Alt-Tab, and clicking on it should give it keyboard focus, etc.

Wayland’s approach here is a bit odd, but to give a wl_surface a role, we construct a new wrapper object which has all of our desktop-level protocol functions, and then hand it the wl_surface. In this case, the protocol that we use to create this role is known as “xdg-shell”, and the wrapper object is known as an “xdg_surface”. The name is a reference to the FreeDesktop Group, an open mailing list where cross-desktop standards are discussed between all the different desktops. For historical reasons, it’s abbreviated as “XDG”. Members from the XDG community have all been contributing to xdg-shell.

The approach of a low-level structure with a high-level role is actually fairly similar to the approach taken in X11. X11 simply provides a data structure called a “window”, as I explained in Xplain: a tool that you can use to construct your interface by pushing pixels here, and getting input there. An external process called a “window manager” turns this window from a simple region of the front buffer into a window with a title and icon that the user can move around, resize, minimize and maximize with keyboard shortcuts and a taskbar. The window manager and the client applications both agree to cooperate and follow a series of complex standards like the ICCCM and EWMH that allow you to provide this “role”. Though I’ve never actually worked on any environments other than traditional desktops, I’d imagine that in more special-case environments, different protocols are used instead, and the concept of the window manager is completely dropped.

X11 has no easy, simple way of creating protocol extensions. Something as simple as a new request or a new event requires a bunch of byte-marshalling code in client libraries, extra support code added in the server, in toolkits, and a new set of APIs. It’s a pain, trust me. Instead, X11 does provide a generic way to send events to clients, and a series of key/value pairs on windows called “properties”, so standards like these often use the generic mechanisms rather than building an actual new protocol, since the effort is better spent elsewhere. It’s an unfortunate way that X11 was developed.

Wayland makes it remarkably easy to create a new protocol extension involving new objects and custom methods. You write up a simple XML description of your protocol, and an automatic tool, wayland-scanner, generates server-side and client-side marshalling code for you. All that you need to do now is write the implementation side of things. On the client, that means creating the object and calling methods on it. Because it’s so easy to write custom extensions in Wayland, we haven’t even bothered creating a generic property or event mechanism. Things with a structure allow us a lot more stability and rigidity.

wl_shell_surface

Long-time users or developers of Wayland might notice this sounds similar to an older protocol known as wl_shell or wl_shell_surface. The intuition is correct: xdg-shell is a direct replacement for wl_shell. wl_shell_surface had a number of frustrating limitations, and due to its inclusion in the Wayland 1.0 core, it is harder to change and make better. As Fred Brooks told us, “write one to throw away”.

xdg-shell can be seen as a replacement for wl_shell_surface, and it solves a number of fundamental issues and race conditions which I’d prefer not to go into here (but if you ask nicely in the comments, I might oblige!), but I guess you’ll have to trust me when I say that they were highly visible user bugs and frustrations about some weird lagginess or flickering when using Wayland. We’re happy that these are gone.

A call to arms

The last remaining ticket item I have to work in xdg-shell is related to “window geometry”, as a way of communicating where the user’s concept of the edge of the window is. This requires significant reworking of the code in weston, mutter, and GTK+. After that, it will serve the needs of GTK+ and GNOME perfectly.

Does it serve the needs of your desktop? The last thing we want to do is to solidify the xdg-shell protocol, only to find that a few months later, it doesn’t quite work right for tiling WMs or for EFL or KDE applications. We’re always experimenting with things like this to make sure everything can work, but really, we’re only so many people, and others testing it out and porting their toolkits and compositors over can’t ever hurt.

So, any and all help is appreicated! If you have any feedback on xdg-shell so far, or need any help understanding this, feel free to post about it to the Wayland mailing list or poke me on IRC (#wayland on Freenode, my nick in there is Jasper).

As always, if anybody has any questions, no matter how dumb or stupid, please let me know! Comments are open, and I always try to reply.

Xplain: Advanced Window Techniques

Posted on May 14, 2014 by Jasper St. Pierre

Hey! I promised I’d try to blog once a month, and it’s getting overdue now. I just went ahead and published a new article in the Xplain series: Advanced Window Techniques.

It contains information on the window tree, and its (initial) use in building complex systems. I wanted to write more about window managers and wrap up the window tree before moving onto other stuff, but this article was getting a bit long already, so I just cut it in two. I know a lot of people wanted to see the WM part of things, and I’m sorry. I’ll try and get the next article out by the end of June.

As before, I’d prefer all feedback to be emailed to me directly at my email, jstpierre@mecheye.net, but I’ll leave comments enabled this blog post as well. You guys had some excellent feedback on the first article, so a big thanks goes out to everyone who asked questions, sent in typo fixes, and told me they loved it! You guys rock!

Xwayland

Posted on April 6, 2014 by Jasper St. Pierre

Last week I wrote about Wayland in 3.12 and promised that I’d be writing again soon. I honestly didn’t expect it to be so soon!

But first, a quick notice. Some people let me know that they were having issues with running Wayland on top of F20 with the GNOME 3.12 COPR. I’ve been testing on rawhide, and since it worked fine for me, I thought the same would be true for the GNOME 3.12 COPR. It seems this isn’t the case. I tried last night to get a system to test with and failed. I’m going to continue to investigate, but I first have to get a system up and running to test with. That may take some time.

Sorry that this happened. I know about it, though, and I’ll get to the bottom of this one way or another. And hey, maybe it will be magically solved by…

A new Xwayland

Last night something very, very exciting happened. Xwayland landed in the X server. I’m super thrilled to see this land; I honestly thought it would be at least another year before we’d see it upstream. Keep in mind, it’s been the works for three years now.

So, why did it succeed so fast? To put it simply, Xwayland has been completely rearchitected to be leaner, cleaner, faster, and better than ever before. It’s not done yet; direct rendering (e.g. games using OpenGL) and by extension 2D acceleration aren’t supported yet, but it’s in the pipeline.

I also talked about this somewhat in the last blog post, and in The Linux Graphics Stack, but since it’s the result of a fairly recent development, let’s dive in.

The new architecture

Traditionally, in the Xorg stack, even within the FOSS graphics stack, there are a number of different moving parts.

The X server codebase is large, but it’s somewhat decently structured. It houses several different X servers for different purposes. The one you’re used to and is the one you log into is Xorg, and it’s in the hw/xfree86 directory. It’s named like that for legacy reasons. There’s also Xnest and Xephyr, which implement a nested testing environment. Then there’s the platform adaptions like hw/xwin and xquartz, which are Win32 and OS X servers which are designed to be seamless: the X11 windows that are popped up look and behave like any other window on your system.

There’s plenty of code that can be shared across all the different servers. If somebody presses a key on their keyboard, the code to calculate the key press event, do keysym translation, and then send it to the right application should be shared between all the different servers. And it is. This code is shared in a part of the source tree called Device-Independent X, or “DIX” for short. A lot of common functionality related to implementing the protocol are done in here.

The different servers, conversely, are named “Device-Dependent X”s, or “DDX”en, and that’s what the hw/ directory path means. They hook into the DIX layer by installing various function pointers in different structs, and exporting various public functions. The architecture isn’t 100% clean; there’s mistakes here and there, but for a codebase that’s over 30 years old, it’s fairly modern.

Since the Xorg server is what most users have been running on, it’s the biggest and most active DDX codebase by far. It has a large module system to have hardware-specific video and input drivers loaded into it. Input is a whole other topic, so let’s just talk about video drivers today. These video drivers have names like xf86-video-intel, and plug directly into Xorg in the same way: they install function pointers in various structs that override default functionality with something hardware-specific.

(Sometimes we call the xf86- drivers themselves the “DDX”en. Technically, these are the parts of the Xorg codebase that actually deal with device-dependent things. But really, the nomenclature is just there as a shorthand. Most of us work on the Xorg server, not in Xwin, so we say “DDX” instead of “xf86 video driver”, because we’re lazy. To be correct, though, the DDX is the server binary, e.g. Xorg, and its corresponding directory, e.g. hw/xfree86.)

What do these video drivers actually do? They have two main responsibilities: managing modesetting and doing accelerated rendering.

Modesetting is the responsibility of setting the buffer to display on every monitor. This is one of those things that you would think would be simple and standardized quite a long time ago, but for a few reasons that never happened. The only two standards here are the VESA BIOS Extensions, and its replacement, the UEFI Graphics Output Protocol. Unfortunately, both of these aren’t powerful enough for the features we need to build a competitive display server, like an event for when the monitor has vblanked, or flexible support for hardware overlays. Instead, we have a set of hardware-specific implementations in the kernel, along with a userspace API. This is known as “KMS”.

The first responsibility has now been killed. We can simply use KMS as a hardware-independent modesetting API. It isn’t perfect, of course, but it’s usable. This is what the xf86-video-modesetting driver does, for instance, and you can get a somewhat-credible X server and running that way.

So now we have a pixel buffer being displayed on the monitor. How do we get the pixels into the pixel buffer? While we could do this with software rendering with a library like pixman or cairo, it’s a lot better if we can use the GPU to its fullest extent.

Unfortunately, there’s no industry standard API for accelerated 2D graphics, and there likely never will be. There’s plenty of options: in the web stack we have Flash, CSS, SVG, VML, <canvas>, PostScript, PDF. On the desktop side we have GDI, Direct2D, Quartz 2D, cairo, Skia, AGG, and plenty more. The one attempt to have a hardware-accelerated 2D rendering standard is OpenVG, which ended in disaster. NVIDIA is pushing a more flexible approach that is integrated with 3D geometry better: NV_path_rendering.

Because of the lack of an industry standard, we created our own: the X RENDER extension. It supplies a set of high-level 2D rendering operations to applications. Video drivers often implement these with hardware fast paths. Whenever you hear talk about EXA, UXA or SNA, this is all they’re talking about: complex, sophisticated implementations of RENDER.

As we get newer and newer hardware up and running under Linux, and as CPUs are getting faster and faster, it’s getting less important to write a fast RENDER implementation in your custom video driver.

We also do have an industry standard for generic hardware-accelerated rendering: OpenGL. Wouldn’t it be nice if we could take the hardware-accelerated OpenGL stack we created, and use that to create a credible RENDER implementation? And that’s exactly what the glamor project is about: an accelerated RENDER implementation that works for any piece of hardware, simply by hoisting it on top of OpenGL.

So now, the two responsibilities of an X video driver have been moved to other places in the stack. Modesetting has been moved to the kernel with KMS. Accelerated 2D rendering has been pushed to use the 3D stack we already have in place anyway. Both of these are reusable components that don’t need custom drivers, and we can reuse in Wayland and Xwayland. And that’s exactly what we’re going to do.

So, let’s add another DDX to our list. We arrive at hw/xwayland. This acts like Xwin or Xquartz by proxying all windows through the Wayland protocol. It’s almost impressive how small the code is now. Seriously, compare that to hw/xfree86.

It’s also faster and leaner. A large part of Xorg is stuff related to display servers: reading events from raw devices, modesetting, VT switching and handling. The old Xwayland played tricks with the Xorg codebase to try and get it to stop doing those things. Now we have a simple, clean way to get it to stop doing those things: to never run the code in the first place!

In the old model, things like modesetting were done in the video driver, unfortunately. In the old model, we simply patched Xorg with a special magical mode to tell video drivers not to do anything too tricky. For instance, the xf86-video-intel driver had a special branch for Xwayland support. For generic hardware support, we wrote a generic, unaccelerated driver that stubbed out most of the functions we needed. With the new approach, we don’t need to patch anything at all.

Unfortunately, there are some gaps in this plan. James Jones from NVIDIA recently let us know they were expecting to use their video driver in Xwayland for backwards-compatibility with legacy applications. A few of us had a private chat afterwards about how we can move along with here. We’re still forming a plan, and I promise I’ll tell you guys about them when they’re more solidified.It’s exciting to hear that NVIDIA is on board!

And while I can’t imagine that custom xf86-video-* drivers are ever going to go away completely, I think it’s plausible that the xf86-video-modesetting video driver could add glamor support, and the rest of the FOSS DDXes die out in favor.

OK, so what does this mean for me as a user?

The new version of Xwayland is hardware-independent. In Fedora, we only built xf86-video-intel with Wayland support. While there was a generic video driver, xf86-video-wayland, we never built it in Fedora, and that meant that you couldn’t try out Wayland on non-Intel GPUs. This was a Fedora bug, not a fundamental issue with Wayland or a GNOME bug as I’ve seen some try to claim.

It is true, however, that we mostly test on Intel graphics. Most engineers I know of that develop GNOME run on Lenovo ThinkPads, and those tend to have Intel chips inside.

Now, this is all fixed, and Xwayland can work on all hardware regardless of which video drivers are built or installed. And for the record, the xf86-video-wayland driver is now considered legacy. We hope to ship these as updates in the F20 COPR, but we’re still working out the logistics of packaging all this.

I’m still working hard on Wayland support everywhere I go, and I’m not going to slow down. Questions, comments, everything welcome. I hope the next update can come as quickly!

Wayland in 3.12, and beyond

Posted on March 29, 2014 by Jasper St. Pierre

First, an apology. At the end of the last post introducing Xplain, I promised I’d have a new article out once a month. I had a lot of fun working on it, but Xplain has taken a backseat to real work. It’s on one of those “indefinite hiatuses” you hear about which are really just codenames for “eh, I should finish it, but I’ve got so much to do and staring quizzingly at the XKB protocol specification is not the best use of my spare time. I’m just going to let it die.” Yeah…

Sorry.

Ahem. “Real work?”

Wayland

For most of GNOME’s development, I’ve mainly been working on GNOME Shell and mutter, our desktop shell and window manager systems. I’ve always had somewhat of an interest in Wayland and graphics plumbing. So when we started to ramp up our Wayland efforts in GNOME, I jumped at the opportunity to work on it.

Over the course of the past nine months, I’ve been busy full-time on our Wayland support in GNOME. I haven’t spoken about Wayland much except in footnotes of Xplain and The Linux Graphics Stack. I’ve mostly tried to stay out of the political battles surrounding display server technologies and instead tried to focus on working hard building something awesome.

So that’s why I’m really excited about GNOME 3.12. We built something awesome.

For the first time, we now have an easily accessible Wayland session for testing. And now that I’ve broken my vow of silence, I’m going to try to blog a lot more about the progress we’ve made for Wayland in GNOME.

Sounds exciting! Now how do I test it‽

You want to try out Wayland? Awesome! Note, though, that it’s still very a preview, so lots of stuff won’t work. That said, lots of extensive testing is the only way we’re going to be on top of all the bugs and all the crashers. Your help testing and filing bugs will make GNOME on Wayland a success!

Wayland session support has been added to gdm. Once you have a distribution that ships GNOME 3.12, it should be fairly simple to try a Wayland session by simply selecting the new GNOME on Wayland option at the login screen.

Wayland GDM Integration

It’s supposed to behave mostly like a regular GNOME session! If there’s any crashes, hangs, rendering artifacts, behavior differences, missing features, etc. you encouter, please, please file a bug. I personally look at every single piece of bugmail I get. Even if it seems obvious and dumb, file it. It takes us two seconds to mark a bug as a duplicate, so don’t feel bad about reporting too many bugs.

You might notice that the transition from gdm to Wayland isn’t perfect: the screen goes black, and there’s a blinking cursor for a few seconds when you log in. This is, at least for now, intentional. With the Linux APIs we’re given, at least as far as I’m aware, there isn’t a way to prevent this console showing up without also locking up your machine entirely if something goes wrong and nothing starts: the switch that controls whether the console should be shown is also the same switch that prevents the default VT switching keybindings from working.

For 3.14, we’ll be using logind to do session switching, so hopefully a lot of these issues we’re having with VTs will just go away.

But I tried Wayland during 3.10 and it crashed, burned my house down, and killed my pet Bengal tiger Leonard! And there’s no remote display support! Wayland sucks!

If GNOME 3.10’s Wayland preview was crashing for you beforehand, please give it another shot. I worked very hard making the Wayland port in GNOME 3.12 fairly stable, and I’ve used it for half a day without being able to make it crash. Obviously, I’m not perfect, so if your session goes down or does something funny, again, please file a bug and I promise I’ll take care of it.

Also, there is remote display support.

What we’ve done

There’s been a full six months of effort in between GNOME 3.10 and GNOME 3.12 in order to make it polished and complete. There really isn’t a major list of features I can point you to as a reason Wayland is better in 3.12. Here’s an incomplete list of things we’ve done, though:

We’ve been working on a more flexible, desktop-specific Wayland protocol, xdg-shell, built to replace the old wl_shell protocol. This new protocol provides atomic state change events, so that you don’t ever see a flicker when maximizing or full-screening a window. This protocol is still under heavy development, and we have plenty of ideas about how to make it better and more featureful than ever. We hope to have a first version of it stabilized for other application developers to port to midway through the GNOME 3.14 release cycle.

We’ve pushed ahead on GTK+’s Wayland backend to make it stronger than ever. It’s been surprisingly smooth: Wayland really is a fairly well-designed protocol that’s a good fit for a modern application toolkit.

Plenty of small bugs since 3.10 have been fixed: windows no longer flicker when they resize. Typing and input method support has gotten a lot stronger. Xwayland windows no longer hang or freeze when you drag them around. There’s no longer an “invisible box” around windows when you drag them around; they line up to the edges of the screen. Keybindings now work. Menus finally stay on top and don’t get occasionally stuck. Submenus are placed in the right spot. I could go on. It’s all the little things that account for 90% of the frustration and crashes.

Why Wayland matters

Our dedication towards Wayland has pushed us to build a cleaner architecture overall. What used to be a proliferation of X-specific video and input drivers is mostly culminating in centralized, standardized code. For input, we have libinput, which we’re using from Weston, mutter, and Xorg as well. What used to be a collection of chipset-specific video plugins for doing accelerated rendering have now been replaced by glamor, a credible chipset-independent acceleration architecture. What used to be large monolithic components heavily tied to Xorg and the Xorg input and video architectures have now been split out into separate, easily-reusable libraries with separate, easily-maintainable codebases. New, experimental features can be prototyped faster than ever before.

David Herrmann’s work on logind session support mentioned above was originally envisioned for Wayland, but it turned out to be such a great idea that my colleague Hans de Goede ran with the idea and integrated them into the X server, allowing us to run the X server without root privileges, a huge win for security.

Our work on GTK+ is also very much applicable to Win32, Quartz, and Broadway backends as well. A lot of the bugs that we’ve had to fix were bugs on other platforms. GTK+, somewhat unfortunately, was primarily developed as an X11 toolkit and that means our APIs often match up to X11, rather than a more modern system. When redesigning our APIs to support Wayland, we often find that X11 is the special case, and the new code is more portable and flexible than before.

The spoils of Wayland reach far and wide.

What we’re doing now

It can’t stop here.

For 3.14, we’re going to continue with Wayland development, and we’re all hoping to give a solid Wayland session that can be used for real work. No promises; six months is not a lot of time, and we have plenty to get done. Your testing will make it go faster, so if you get the opportunity, please start finding those bugs!

GTK+ is still missing lots of features in the Wayland backend. Most notably, we’re still lagging behind on drag-and-drop and clipboard support. Unfortunately, the APIs we currently provide to developers for these things are old and rusty and aren’t that great. A set of new APIs to make clipboard and drag-and-drop support more sane in GTK+ has been in the works since 2011, and Wayland is a great motivation to finish it off.

mutter’s codebase is almost 20 years at this point, and it’s somewhat surprising that it still has the same code structure as it did when Havoc Pennington started it as an X11 window manager. We’ve bolted additional features and functionality on: an X11 compositor here, a display server there, a Wayland protocol implementation sure why not, a nested testing mode, it goes on. I’ve already started hacking on a large-scale refactoring of mutter, “NWO”, that will restructure the entire program to help fix small nits in behavior and make it easier to add new features.

Right now, my current project and goal is that by the end of April, I’m hoping to ship mutter as a Wayland compositor without any X11 code running at all.

It’s pretty exciting stuff! I’m really looking forward to sharing more and more of our progress over the next few months. I’ll heading to San Fransisco in a few weeks for the West Coast Hackfest where I expect a lot of Wayland hacking will be done. Feel free to stop by if you’re in the area; we’d love to see you there!

As always, questions about our plans, or Wayland tech / Linux graphics tech in general are welcome. Feel free to leave a comment.

Also, excuse me while I get back to watching The Genius. You should go watch that too.

Xplain

Posted on December 28, 2013 by Jasper St. Pierre

Merry Christmas, folks! I hope everything is well for you. It’s been a while, hasn’t it…

So, last year, you might remember that I wrote The Linux Graphics Stack. I’ve since wanted to do a followup for that, as more of a detailed history lesson on X11, but there’s a lot to explain. I’ve been working on it off and on since January, and I got burned out on it.

Last month, I was thinking about finally publishing it, so I opened it up, and read the words “last year, I wrote an article on The Linux Graphics Stack”. Obviously, I needed to publish it before it went out of date, and that was enough motivation to just get it done. I’ve been working hard on this for the last month, doing research, coding demos, and writing prose. It’s a lot of fun, and quite rewarding when you sit down and write a bunch of code to let the user drag windows around in your mock-Xlib equivalent, and it works first-try.

As of now, I only have a page or so of text about very basic window behavior, but my eventual plan is to go over the window tree, how WMs work, COMPOSITE, RENDER, focus management, input grabs, selections, everything.

Because I wrote interactive demos so that people can play around, I need to have a bit more control over the layout and content, so I chose not to do this as a series of blog posts, but instead as a series of articles that I’m hosting these over on GitHub. Don’t worry about missing anything, though. Whenever I publish a new article, I’ll write a blog post about it over here.

So, without further ado:

(For those of you on RSS readers that strip images, the link to the articles are this way)

If you have any questions, I’d prefer if you email me at jstpierre@mecheye.net. I’d really like a margin comment system like Medium or Gawker have, but I couldn’t find a free service like Disqus that integrates into my articles like that. If anybody knows of any, please shoot me an email and I’ll try to investigate it.

I plan on releasing at least one of these a month. I’m currently on track to finishing the next article for the third week of January, about the window tree and how WMs work. If you have any ideas for what you’d like me to write about for February’s article, again, please shoot me an email. I guess I should also take this time to announce that I’ve spent some time to create a work-oriented Twitter for the GNOME/Linux audience over at @JasperGNOME.

Let’s hope I write more blog posts in the coming year. Besides this, I plan on doing a lot more regular blogging about new GNOME features, and Wayland progress (oh yeah, I should talk about that at some point, shouldn’t I?). See you around!

Barriers

Posted on March 19, 2013 by Jasper St. Pierre

GNOME 3.8 just entered hard code freeze, and this is a release I’m very optimistic about. A little while ago, we all went to some place in Brussels to talk about GNOME, the GNOME Developer Experience, and where GNOME would be going forward. This release sets the stage for everything exciting to come in the near future, like new developer tools (some of which I hacked on), a newer and better Glade, etc. If you’ve tried writing a GNOME application before and was frustrated, we want your feedback. File bugs in Bugzilla, talk to people on IRC, and email the GNOME mailing lists. Please believe me when I say I sympathize with every one of you.

Anyway. I’m also excited because GNOME 3.8 is great for users, too. There’s lots of exciting new features. Mattias Clasen already blogged about some of them over there, but there’s one specifically that I want to mention, well, because I wrote it, and I’m proud of it.

Pressure Barriers!

If you’ve used GNOME 3.6, you may have been annoyed by the message tray behavior: it doesn’t pop up when you want it to, and pops up when you don’t want it to. I always felt that this hurt GNOME 3.6 more than it helped, and there are a large number of bugs related to changing the timeout, or disabling it altogether. The truth is, this was never the plan. It was a last-minute compromise that came about because we couldn’t implement the design we wanted, because it required changes to the X server.

In GNOME 3.8, we’ve implemented those changes, upstream, in the X server, so any other desktop environment and program can use them as well. Now, to open the message tray, you simply need to press against the bottom of the screen for a bit of time to open the message tray. The Android ripoff visualizer bar is something I added for the screenshot, but I’m looking into packaging it as a GNOME Shell Extension — it’s a fair bit noisy, so we’re not going to include it by default.

It was a fun project, and I learned a lot in the process writing code for the X server. Peter Hutterer has a blog post about the technical details over here. Big thanks to Peter for helping me out every step of the way, along with Christopher Halse Rogers of Canonical for doing a set of prototype patches used in Unity.

For those of you who read this blog for tech articles like The Linux Graphics Stack, I’m sorry for a boring GNOME update. I do have a giant article planned, though. I don’t want to reveal my entire hand yet, though.

Oh, and thanks to Red Hat and the GNOME Foundation for sponsoring me to go to the DX hackfest. You’re all pretty awesome people.

Update: Made the visualizer into an extension over here.

Bytecode

Posted on December 9, 2012 by Jasper St. Pierre

What is the most commonly used bytecode language in the world? Java (JVM Bytecode)? .NET (CLI)? Flash (AVM1/AVM2)? Nope. There’s a few that you use every day, simply by turning on your computer, or tablet, or even phone. You don’t even have to start an application or visit a webpage.

ACPI

The most obvious is the large, gargantuan specification known as “ACPI”. The “Advanced Configuration and Power Interface” specification lives up to its name, with the most recent specification being a mammoth document that weighs in at almost 1000 pages. And yes, operating systems are expected to implement this. The entire thing. The bytecode part is hidden deep, but it’s seen in chapter 20, under “ACPI Machine Language”, describing a semi-register VM with all the usuals: Add, Subtract, Multiply, Divide, standard inequalities and equalities, but then throws in other fun things like ToHexString and Mid (substring). Look even further and you’ll see a full object model, system properties, as well as an asynchronous signal mechanism so that devices are notified about when those system properties change.

Most devices, of course, have a requirement of nothing less than a full implementation of ACPI, so of course all this code is implemented in your kernel, running at early boot. It parallels the complexity of a full JavaScript environment with its type system and system bindings, with the program code supplied directly over the wire from any device you plug in. Because the specification is so complex, an OS-independent reference implementation was created by Intel, and this is the implementation that’s used in the Linux kernel, the BSDs (including Mac OS X), and the fun toy ReactOS, HaikuOS kernels. I don’t know if it’s used by Windows or not. Since the specification’s got Microsoft’s name on it, I assume their implementation was created long before ACPICA.

Fonts

After that, want to have a graphical boot loader? Simply rendering an OpenType font (well, only OpenType fonts with CFF glyphs, but the complexities of the OpenType font format is a subject for another day) requires parsing the Type 2 Glyph Format, which indeed involves a custom bytecode format to establish glyphs. This one’s even more interesting: it’s a real stack-based interpreter, and it even has a “random” opcode to make random glyphs at runtime. I can’t imagine this ever be useful, but it’s there, and it’s implemented by FreeType, so I can only assume it’s used by some fonts from in the real world. This bytecode interpreter also contained at one time a stack overflow vulnerability which was what jailbroke the iPhone in JailbreakMe.com v2.0, with the OTF file being loaded by Apple’s custom PDF viewer.

This glyph language is based on and is a stripped down version of PostScript. Actual PostScript involves a full turing-complete register/stack-based hybrid virtual machine based on Forth. The drawbacks of this system (looping forever, interpreting the entire script to draw a specific page because of complex state) were the major motivations for the PDF format — while based on PostScript, it doesn’t have much shared document state, and doesn’t allow any arbitrary flow control operations. In this model, someone (even an automated program) could easily verify that a graphic was encapsulated, not doing different things depending on input, and that it terminated at some point.

And, of course, since fonts are complicated, and OpenType is complicated, OpenType also includes all of TrueType, which includes a bytecode-based hinting model to ensure that your fonts look correct at all resolutions. I won’t ramble on about it, but here’s the FreeType implementation. I don’t know of anything interesting happening to this. Seems there was a CVE for it at one time.

To get this article to display on screen, it’s very likely that thousands of these tiny little microprograms ran, once for each glyph shape in each font.

Packet filtering

Further on, if you want to capture a network packet with tcpdump or libpcap (or one of its users like Wireshark), it’s being filtered through the Berkeley Packet Filter, a custom register-based bytecode. The performance impact of this at one time was too large for people debugging network issues, so a simple JIT compiler was put into the kernel, under an experimental sysctl flag.

As a piece of historical interest, an earlier version of the BPF code was part of the code claimed to be infringing part of the SCO lawsuits (page 15), but was actually part of BSD4.3 code that was copied to the Linux kernel. The original BSD code was eventually replaced with the current architecture, known as the Linux Socket Filter, in Linux 2.2 (which I can’t easily link to, as there’s no public repository of the Linux kernel code with pre-git history, as far as I know).

What about it?

The popularity of bytecode as a general and flexible solution to problems is alluring, but it’s not without its complexities and faults, with such major security implications (an entire iPhone jailbreak from incorrect stack overflow checking!) and insane implementation requirements (so much that we only have one major implementation of ACPI used across all OSes that we can check).

The four examples also bring out something interesting: the wildly different approaches that can be taken to a bytecode language. In the case of ACPI, it’s an interesting take on what I can only imagine is scope creep on an originally declarative table specification, bringing it to the mess today. The Type 1 Glyph and TrueType Hinting languages are basic stack-based interpreters, showing their PostScript heritage. And BPF is a register-based interpreter, which ends up with a relatively odd register-based language that can really only do simple operations.

Note, though, that all of these implementations above have had security issues in their implementations, with numerous CVEs for each one, because bytecode interpreter implementations are hard to get right. So, to other hackers: do you know of any other low-level, esoteric custom bytecode specifications like these? And to spec writers: did you really need that flexibility?

The Linux Graphics Stack

Posted on June 16, 2012 by Jasper St. Pierre

137

This is an introductory overview post for the Linux Graphics Stack, and how it currently all fits together. I initially wrote it for myself after having conversations with people like Owen Taylor, Ray Strode and Adam Jackson about this stack. I had to go back to them every month or so and learn the stuff from the ground up all over again, as I had forgotten every single piece. I asked them for a good high-level overview document so I could stop bothering them. They didn’t know of any. I started this one. It has been reviewed by Adam Jackson and David Airlie, both of whom work on this exact stack.

Also, I want to point out that a large amount of this stack applies only to the free software drivers. That means that a lot of what you read here may not apply for the AMD Catalyst and NVIDIA proprietary drivers. They may have their own implementations of OpenGL, or an internal fork of Mesa. I’m describing the stack that comes with the free radeon, nouveau and Intel drivers.

If you have any questions, or if at any point things were unclear, or if I’m horribly, horribly wrong about something, or if I just botched a sentence so that it’s incomprehensible, please ask me or let me know in the comments section below.

To start us off, I’m going to paste the entire big stack right here, to let you get a broad overview of where every piece fits in the stack. If you do not understand this right away, don’t be scared. Feel free to refer to this throughout the post. Here’s a handy link.

So, to be precise, there are two different paths, depending on the type of rendering you’re doing.

3D rendering with OpenGL

Your program starts up, using “OpenGL” to draw.
A library, “Mesa”, implements the OpenGL API. It uses card-specific drivers to translate the API into a hardware-specific form. If the driver uses Gallium internally, there’s a shared component that turns the OpenGL API into a common intermediate representation, TGSI. The API is passed through Gallium, and all the device-specific driver does is translate from TGSI into hardware commands,
libdrm uses special secret card-specific ioctls to talk to the Linux kernel
The Linux kernel, having special permissions, can allocate memory on and for the card.
Back out at the Mesa level, Mesa uses DRI2 to talk to Xorg to make sure that buffer flips and window positions, etc. are synchronized.

2D rendering with cairo

Your program starts up, using cairo to draw.
You draw some circles using a gradient. cairo decomposes the circles into trapezoids, and sends these trapezoids and gradients to the X server using the XRender extension. In the case where the X server doesn’t support the XRender extension, cairo draws locally using libpixman, and uses some other method to send the rendered pixmap to the X server.
The X server acknowledges the XRender request. Xorg can use multiple specialized drivers to do the drawing.
1. In a software fallback case, or in the case the graphics driver isn’t up to the task, Xorg will use pixman to do the actual drawing, similar to how cairo does it in its case.
2. In a hardware-accelerated case, the Xorg driver will speak libdrm to the kernel, and send textures and commands to the card in the same way.

As to how Xorg gets things on the screen, Xorg itself will set up a framebuffer to draw into using KMS and card-specific drivers.

X Window System, X11, Xlib, Xorg

X11 isn’t just related to graphics; it has an event delivery system, the concept of properties attached to windows, and more. Lots of other non-graphical things are built on top of it (clipboard, drag and drop). Only listed in here for completeness, and as an introduction. I’ll try and post about the entire X Window System, X11 and all its strange design decisions later.

X11: The wire protocol used by the X Window System
Xlib: The reference implementation of the client side of the system, and a host of tons of other utilities to manage windows on the X Window System. Used by toolkits with support for X, like GTK+ and Qt. Vanishingly rare to see in applications today.
XCB: Sometimes quoted as an alternative to Xlib. It implements a large amount of the X11 protocol. Its API is at a much lower level than Xlib, such that Xlib is actually built on XCB nowadays. Only mentioning it because it’s another acronym.
Xorg: The reference implementation of the server side of the system.

I’ll try to be very careful to label something correctly. If I say “the X server”, I’m talking about a generic X server: it may be Xorg, it may be Apple’s X server implementation, it may be Kdrive. We don’t know. If I say “X11”, or the “X Window System”, I’m talking about the design of the protocol or system overall. If I say “Xorg”, that means that it’s an implementation detail of Xorg, which is the most used X server, and may not apply to any other X servers at all. If I ever say “X” by itself, it’s a bug.

X11, the protocol, was designed to be extensible, which means that support for new features can be added without creating a new protocol and breaking existing old clients. As an example, xeyes and oclock get their fancy shapes because of the Shape Extension, which provide support for non-rectangular windows. If you’re curious how this magic functionality can just appear out of nowhere, the answer is that it doesn’t: support for the extension has to be added to both the server and client before it can be used. There is functionality in the core protocol itself so that clients can ask the server what extensions it has support for, so it knows what functionality it can or cannot use.

X11 was also designed to be “network transparent”. Most importantly it means that we cannot rely on the X server and any X client being on the same machine, so any talking between the two must go over the network. In actuality, a modern desktop environment does not work out of the box in this scenario, as a lot of inter-process communication goes through systems other than X11, like DBus. Over the network, it’s quite chatty, and results in a lot of traffic. When the server and client are on the same machine, instead of going over the network, they’ll go over a UNIX socket, and the kernel doesn’t have to copy data around.

We’ll get back to the X Window System and its numerous extensions in a little bit.

cairo

cairo is a drawing library used either by applications like Firefox directly, or through libraries like GTK+, to draw vector shapes. GTK+3’s drawing model is built entirely on cairo. If you’ve ever used an HTML5 <canvas>, cairo implements practically the same API. Although <canvas> was originally developed by Apple, the vector drawing model is well-known, as it’s the PostScript vector drawing model, which has found support in other vector graphics technologies and standards such as PDF, Flash, SVG, Direct2D, Quartz 2D, OpenVG, and lots more than I can possibly give an exhaustive list for.

cairo has support to draw to X11 surfaces through the Xlib backend.

cairo has been used in toolkits like GTK+. Functionality was added in GTK+ 2 to optionally use cairo in GTK+ 2.8. The drawing model in GTK+3 requires cairo.

XRender Extension

X11 has a special extension, XRender, which adds support for anti-aliased drawing primitives (X11’s existing graphics were aliased), gradients, matrix transforms and more. The original intention was that drivers could have specialized accelerated code paths for doing specific drawing. Unfortunately, it seems to be the case that software rasterization is just as fast, for unintuitive reasons. Oh well. XRender deals in aligned trapezoids – rectangles with an optional slant to the left and right edges. Carl Worth and Keith Packard came up with a “fast” software rasterization method for trapezoids. Trapezoids are also super easy to decompose into two triangles, to align ourselves with fast hardware rendering. Cairo includes a wonderful show-traps utility that might give you a bit of insight into the tessellation into trapezoids it does.

Here’s a simple red circle we drew. This is decomposed into two sets of trapezoids – one for the stroke, and one for the fill. Since the diagrams that show-traps gives you by default aren’t very enlightening, I hacked up the utility to give each trapezoid a unique color. Here’s the set of trapezoids for the black stroke.

Psychedelic.

pixman

Both the X server and cairo need to do pixel level manipulation at some point. cairo and Xorg had separate implementations of things like basic rasterization algorithms, pixel-level access for certain kinds of buffers (ARGB32, RGB24, RGB565), gradients, matrices, and a lot more. Now, both the X server and cairo share a low-level library called pixman, which handles these things for it. pixman is not supposed to be a public API, nor is it a drawing API. It’s not really any API at all; it’s just a solution to some code duplication between various parts.

OpenGL, Mesa, gallium

Now comes the fun part: modern hardware acceleration. I assume everybody already knows what OpenGL is. It’s not a library, there will never be one set of sources to a libGL.so. Each vendor is supposed to provide its own libGL.so. NVIDIA provides its own implementation of OpenGL and ships its own libGL.so, based on its implementations for Windows and OS X.

If you are running open-source drivers, your libGL.so implementation probably comes from Mesa. Mesa is many things, but one of the major things it provides that it is most famous for is its OpenGL implementation. It is an open-source implementation of the OpenGL API. Mesa itself has multiple backends for which it provides support. It has three CPU-based implementations: swrast (outdated and old, do not use it), softpipe (slow), llvmpipe (potentially fast). Mesa also has hardware-specific drivers. Intel supports Mesa and has built a number of drivers for their chipsets which are shipped inside Mesa. The radeon and nouveau drivers are also supported in Mesa, but are built on a different architecture: gallium.

gallium isn’t really anything magical. It’s a set of components to make implementing drivers a lot easier. The idea is that there are state trackers that implement some form of API (OpenGL, GLSL, Direct3D), transform that state to an intermediate representation (known as Tungsten Graphics Shader Infrastructure, or TGSI), and then the backends take that intermediate representation and convert it to the operations that would be consumed by the hardware itself.

Sadly, the Intel drivers don’t use Gallium. My coworkers tell me it’s because the Intel driver developers do not like having a layer between Mesa and their driver.

A quick aside: more acronyms

Since there’s a lot of confusing acronyms to cover, and I don’t want to give them each an H3 and write a paragraph for each, I’m just going to list them here. Most of them don’t really matter in today’s world, they’re just an easy reference so you don’t get confused.

GLES: OpenGL has several profiles for different form factors. GLES is one of them, standing for “GL Embedded System” or “GL Embedded Subset”, depending on who you ask. It’s the latest approach to target the embedded market. The iPhone supports GLES 2.0.
GLX: OpenGL has no concept of platform and window systems on its own. Thus, bindings are needed to translate between the differences of OpenGL and something like X11. For instance, putting an OpenGL scene inside an X11 window. GLX is this glue.
WGL: See above, but replace “X11” with “Windows”, that is, that one Microsoft Operating System.
EGL: EGL and GLES are often confused. EGL is a new platform-agnostic API, developed by Khronos Group (the same group that develops and standardizes OpenGL) that provides facilities to get an OpenGL scene up and running on a platform. Like OpenGL, this is vendor-implemented; it’s an alternative to bindings like WGL/GLX and not just a library on top of them, like GLUT.
fglrx: fglrx is former name for AMD’s proprietary Xorg OpenGL driver, now known as “Catalyst”. It stands for “FireGL and Radeon for X”. Since it is a proprietary driver, it has its own implementation of libGL.so. I do not know if it is based on Mesa. I’m only mentioning it because it’s sometimes confused with generic technology like AIGLX or GLX, due to the appearance of the letters “GL” and “X”.
DIX, DDX: The X graphics parts of Xorg is made up of two major parts, DIX, the “Driver Independent X” subsystem, and DDX, the “Driver Dependent X” subsystem. When we talk about an Xorg driver, the more technically accurate term is a DDX driver.

Xorg Drivers, DRM, DRI

A little while back I mentioned that Xorg has the ability to do accelerated rendering, based on specific pieces of hardware. I’ll also say that this is not implemented by translating from X11 drawing commands into OpenGL calls. If the drivers are implemented in Mesa land, how can this work without making Xorg dependent on Mesa?

The answer was to create a new piece of infrastructure to be shared between Mesa and Xorg. Mesa implements the OpenGL parts, Xorg implements the X11 drawing parts, and they both convert to a set of card-specific commands. These commands are then uploaded to the kernel, using something called the “Direct Rendering Manager”, or DRM. libdrm uses a set of generic, private ioctls with the kernel to allocate things on the card, and stuffs the commands and textures and things it needs in there. This ioctl interface comes in two forms: Intel’s GEM, and Tungsten Graphics’s TTM. There is no good distinction between them; they both do the same thing, they’re just different competing implementation. Historically, GEM was designed and proudly announced to be a simpler alternative to TTM, but over time, it has quietly grown to about the same complexity as TTM. Welp.

This means that when you run something like glxgears, it loads Mesa. Mesa itself loads libdrm, and that talks to the kernel driver directly using GEM/TTM. Yes, glxgears talks to the kernel driver directly to show you some spinning gears, and sternly remind you of the benchmarking contents of said utility.

If you poke in ls /usr/lib64/libdrm_*, you’ll note that there are hardware-specific drivers. For cases when GEM/TTM aren’t enough, the Mesa and X server drivers will have a set of private ioctls to talk to the kernel, which are encapsulated in here. libdrm itself doesn’t actually load these.

The X server needs to know what’s happening here, though, so it can do things like synchronization. This synchronization between your glxgears, the kernel, and the X server is called DRI, or more accurately, DRI2. “DRI” stands for “Direct Rendering Infrastructure”, but it’s sort of a strange acronym. “DRI” refers to both the project that glued Mesa and Xorg together (introducing DRM and a bunch of the things I talk about in this article), as well as the DRI protocol and library. DRI 1 wasn’t really that good, so we threw it out and replaced it with DRI 2.

KMS

As a sort of aside, let’s say you’re working on a new X server, or maybe you want to show graphics on a VT without using an X server. How do you do it? You have to configure the actual hardware to be able to put up graphics. Inside of libdrm and the kernel, there’s a special subsystem that does exactly that, called KMS, which stands for “Kernel Mode Setting”. Again, through a set of ioctls, it’s possible to set up a graphics mode, map a framebuffer, and so on, to display directly on a TTY. Before, there were (and still are) hardware-specific ioctls, so a shared library called libkms was created to give a shared API. Eventually, there was a new API at the kernel level, literally called the “dumb ioctls”. With the new dumb ioctls in place, it is recommended to use those and not libkms.

While it’s very low-level, it’s entirely possible to do. Plymouth, the boot splash screen integrated into modern distributions, is a good example of a very simple application that does this to set up graphics without relying on an X server.

The “Expose” model, Redirection, TFP, Compositing, AIGLX

I’ve used the term “compositing window manager” without really describing what it means to composite, or what a window manager does. You see, back in the 80s when the X Window System was designed on UNIX systems, a lot of random other companies like HP, Digital Equipment Corp., Sun Microsystems, SGI, were also developing products based on the X Window System. X11 intentionally didn’t mandate any basic policy on how windows were to be controlled, and delegated that responsibility to a separate process called the “window manager”.

As an example, the popular environment at the time, CDE, had a system called “focus follows mouse”, which focused windows when the user moved their mouse over the window itself. This is different from the more popular model that Windows and Mac OS X use by default, “click to focus”.

As window managers started to become more and more complex, documents started to appear describing interoperability between the many different environments. It, too, wouldn’t really mandate any policy either like “Click to Focus” either.

Additionally, back in the 80s, many systems did not have much memory. They could not store the entire pixel contents of all window contents. Windows and X11 solve this issue in the same way: each X11 window should be lossy. Namely, a program will be notified that a window has been “exposed”.

Imagine a set of windows like this. Now let’s say the user drags GIMP away:

The area in the dark gray has been exposed. An ExposeEvent will get sent to the program that owns the window, and it will have to redraw contents. This is why hung programs in versions of Windows or Linux would go blank after you dragged a window over them. Consider the fact that in Windows, the desktop itself is just another program without any special privileges, and can hang like all the others, and you have one hell of a bug report.

So, now that machines have as much memory as they do, we have the opportunity to make X11 windows lossless, by a mechanism called redirection. When we redirect a window, the X server will create backing pixmaps for each window, instead of drawing directly to the backbuffer. This means that the window will be hidden entirely. Something else has to take the opportunity to display the pixels in the memory buffer.

The composite extension allows a compositing window manager, or “compositor”, to set up something called the Composite Overlay Window, or COW. The compositor owns the COW, and can paint to it. When you run Compiz or GNOME Shell, these programs are using OpenGL to display the redirected windows on the screen. The X server can give them the window contents by a GL extension, “Texture from Pixmap“, or TFP. It lets an OpenGL program use an X11 Pixmap as if it were an OpenGL Texture.

Compositing window managers don’t have to use TFP or OpenGL, per se, it’s just the easiest way to do so. They could, if they wanted to, draw the window pixmaps onto the COW normally. I’ve been told that kwin4 uses Qt directly to composite windows.

Compositing window managers grab the pixmap from the X server using TFP, and then render it on to the OpenGL scene at just the right place, giving the illusion that the thing that you think you’re clicking on is the actual X11 window. It may sound silly to describe this as an illusion, but play around with GNOME Shell and adjust the size or position of the window actors (Enter global.get_window_actors().forEach(function(w) { w.scale_x = w.scale_y = 0.5; }) in the looking glass), and you’ll quickly see the illusion break down in front of you, as you realize that when you click, you’re poking straight through the video player, and to the actual window underneath (Change 0.5 to 1.0 in the above snippet to revert the behavior).

With this information, let’s explain one more acronym: AIGLX. AIGLX stands for “Accelerated Indirect GLX”. As X11 is a networked protocol, this means that OpenGL should have to work over the network. When OpenGL is being used over the network, this is called an “indirect context”, versus a “direct context” where things are on the same machine. The network protocol used in an “indirect context” is fairly incomplete and unstable.

To understand the design decision behind AIGLX, you have to understand the problem that it was trying to solve: making compositing window managers like Compiz fast. While NVIDIA’s proprietary driver had kernel-level memory management through a custom interface, the open-source stack at this point hadn’t achieved that yet. Pulling a window texture from the X server into the graphics hardware would have meant that it would have to be copied every time the window was updated. Slow. As such, AIGLX was a temporary hack to implement OpenGL in software, preventing the copy into hardware acceleration. As the scene that compositors like Compiz used wasn’t very complex, it worked well enough.

Despite all the fanfare and Phoronix articles, AIGLX hasn’t been used realistically for a while, as we now have the entire DRI stack which can be used to implement TFP without a copy.

As you can imagine, copying (or more accurately, sampling) the window texture contents so that it can be painted as an OpenGL texture requires copying data. As such, most window managers have a feature to turn redirection off for a window that’s full-screen. It may sound a bit silly to describe this as unredirection, as that’s the initial state a window is in. But while it may be the initial state for a window, with our modern Linux desktops, it’s hardly the usual state. The logic here is that if a window would be covering the COW anyway and compositing features aren’t necessary, it can safely be unredirected. This feature is designed to give high performance to programs like games, which need to run with high performance at 60 frames per second.

Wayland

As you can see, we’ve split out quite a large bit of the original infrastructure from X’s initial monolithic behavior. This isn’t the only place where we’ve tore down the monolithic parts of X: a lot of the input device handling has moved into the kernel with evdev, and things like device hotplug support has been moved back into udev.

A large reason that the X Window System stuck around for now was just because it was a lot of effort to replace it. With Xorg stripped down from what it initially was, and with a large amount of functionality required for a modern desktop environment provided solely by extensions, well, how can I say this, X is overdue.

Enter Wayland. Wayland reuses a lot of existing infrastructure that we’ve built up. One of the most controversial things about it is that it lacks any sort of network transparency or drawing protocol. X’s network transparency falls flat in modern times. A large amount of Linux features are hosted in places like DBus, not X, and it’s a shame to see things like Drag and Drop and Clipboard support being by and large hacks with the X Window System solely for network support.

Wayland can use almost the entire stack as detailed above to get a framebuffer on your monitor up and running. Wayland still has a protocol, but it’s based around UNIX sockets and local resources. The biggest drastic change is that there is no /usr/bin/wayland binary running like there is a /usr/bin/Xorg. Instead, Wayland follows the modern desktop’s advice and moves all of this into the window manager process instead. These window managers, more accurately called “compositors” in Wayland terms, are actually in charge of pulling events from the kernel with a system like evdev, setting up a frame buffer using KMS and DRM, and displaying windows on the screen with whatever drawing stack they want, including OpenGL. While this may sound like a lot of code, since these subsystems have moved elsewhere, code to do all of these things would probably be on the order of 2000-3000 SLOC. Consider that the portion of mutter just to implement a sane window focus and stacking policy and synchronize it with the X server is around 4000-5000 SLOC, and maybe you’ll understand my fascination a bit more.

While Wayland does have a library that implementations of both clients and compositors probably should use, it’s simply a reference implementation of the specified Wayland protocol. Somebody could write a Wayland compositor entirely in Python or Ruby and implement the protocol in pure Python, without the help of a libwayland.

Wayland clients talk to the compositor and request a buffer. The compositor will hand them back a buffer that they can draw into, using OpenGL, cairo, or whatever. The compositor is at the discretion to do whatever it wants with that buffer – display it normally because it’s awesome, set it on fire because the application is being annoying, or spin it on a cube because we need more YouTube videos of Linux cube spinning.

The compositor is also in control of input and event handling. If you tried out the thing with setting the scale of the windows in GNOME Shell above, you may have been confused at first, and then figured out that your mouse corresponded to the untransformed window. This is because we weren’t *actually* affecting the X11 window itself, just changing how it gets displayed. The X server keeps track of where windows are, and it’s up to the compositing window manager to display them where the X server thinks it is, otherwise, confusion happens.

Since a Wayland compositor is in charge of reading from evdev and giving windows events, it probably has a much better idea of where a window is, and can do the transformations internally, meaning that not only can we spin windows on a cube temporarily, we will be able to interact with windows on a cube.

Summary

I still hear a lot today that Xorg is very monolithic in its implementation. While this is very true, it’s less true than it was a long while ago. This isn’t due to the incompetence on the part of Xorg developers, a large amount of this is due to baggage that we just have to support, like the hardware-accelerated XRender protocol, or going back even further, non-anti-aliased drawing commands like XPolyFill. While it’s very apparent that X is going to go away in favor of Wayland some time soon, I want to make it clear that a lot of this is happening with the acknowledgement and help of the Xorg and desktop developers. They’re not stubborn, and they’re not incompetent. Hell, for dealing with and implementing a 30-year-old protocol plus history, they’re doing an excellent job, especially with the new architecture.

I also want to give a big shout-out to everybody who worked on the stuff I mentioned in this article, and also want to give my personal thanks to Owen Taylor, Ray Strode and Adam Jackson for being extremely patient and answering all my dumb questions, and to Dave Airlie and Adam Jackson for helping me technically review this article.

While I went into some level of detail in each of these pieces, there’s a lot more here where you could study in a lot more detail if it interests you. To pick just a few examples, you could study the geometry algorithms and theories that cairo exploits to convert arbitrary shapes to trapezoids. Or maybe the fast software rendering algorithm for trapezoids by Carl Worth and Keith Packard and investigate why it’s fast. Consider looking at the design of DRI2, and how it differs from DRI1. Or maybe you’re interested in the hardware itself, and looking at graphics card architecture, and looking at the data sheets to see how you would program one. And if you want to help out in any of these areas, I assume all the projects listed above would be more than happy to have contributions.

I’m planning on writing more of these in the future. A large amount of the stacks used in the Linux and GNOME community today don’t have a good overview document detailing them from a very high level.