“DRI”

I spend a lot of time explaining the Linux Graphics Stack to various people online. One of the biggest things I’ve come across is that people have a hard time differentiating between certain acronyms like “DRI”, “DRM” and “KMS”, and where they fit in the Linux kernel, in Xorg, and in Wayland. We’re not the best at naming things, and sometimes we choose the wrong name. But still, let’s go over what these mean, and where they (should) be used.

You see, a long time ago, Linux developers had a bunch of shiny new GPUs and wanted to render 3D graphics on them. We already had an OpenGL implementation that could do software rendering, called mesa. We had some limited drivers that could do hardware rendering in the X server. We just needed to glue it all together: implement hardware support in mesa, and then put the two together with some duct tape.

So a group of developers much, much older than I am started the “Direct Rendering Infrastructure” project, or “DRI” for short. This project would add functionality and glue it all together. So, the obvious choice when naming a piece of glue technology like this is to give it the name “DRI”, right?

Well, we ended up with a large number of unrelated things all effectively named “DRI”. It’s double the fun when new versions of these components come around, e.g. “DRI2” can either refer to a driver model inside mesa, or an extension to the X server.

Yikes. So let’s try to untangle this a bit. Code was added to primarily three places in the DRI project: the mesa OpenGL implementation, the Xorg server, and the Linux kernel. The code does these three things: In order to get graphics on-screen, mesa needs to allocate a buffer, tell the kernel to render into it, and then pass that buffer over to the X Server, which will then display that buffer on the screen.

The code that was added to the kernel took the form of a new subsystem called the “Direct Rendering Manager”, or “DRM”. The DRM subsystem takes care of controlling the GPU hardware, since userspace does not have the permissions to poke at the raw hardware directly. Userspace uses these kernel devices by opening them through a path under “/dev/dri”, like “/dev/dri/card0”. Unfortunately, through historical accident, the device nodes ended up with “DRI” in their name, and we cannot change that now for backwards-compatibility reasons.
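
As a tiny illustration, here's roughly what talking to that device node looks like from userspace. This is only a sketch (error handling is trimmed, and the drm.h include path can vary between systems); it opens the node and asks the kernel which DRM driver sits behind it:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <drm/drm.h>

int main(void)
{
    /* Open the first DRM device node; this requires the right permissions,
     * usually granted via the "video" group or by the session manager. */
    int fd = open("/dev/dri/card0", O_RDWR);
    if (fd < 0)
        return 1;

    /* DRM_IOCTL_VERSION fills in the name of the kernel driver behind
     * this node, e.g. "i915" or "radeon". */
    char name[64] = { 0 };
    struct drm_version version = { 0 };
    version.name = name;
    version.name_len = sizeof(name) - 1;
    ioctl(fd, DRM_IOCTL_VERSION, &version);

    printf("driver: %s\n", name);
    close(fd);
    return 0;
}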

The code that was added to mesa, to allocate and then submit commands to render inside those buffers, was a new driver model. As mentioned, there are two versions of this mesa-internal driver model. The differences aren’t too important. If you’ve ever looked inside /usr/lib/dri/ to see /usr/lib/dri/i915_dri.so and such, this is the DRI that’s being named here. It’s telling you that these libraries are mesa drivers that support the DRI driver model.

The third bit, the code that was added to the X server, which was code to allocate, swap, and render to these buffers, is a protocol extension known as DRI. There are multiple versions of it: DRI1, DRI2 and DRI3. Basically, mesa uses these protocol extensions to supply its buffers to the X server so it can show them on screen when it wants to.

It can be extraordinarily confusing when both meanings of DRI are in a single piece of code, like can be found in mesa. Here, we see a piece of helper code for the DRI2 driver model API that helps implement a piece of the code to work with the DRI3 protocol extension, so we end up with both “DRI2” and “DRI3” in our code.

Additionally, to cut down on the amount of code duplicated between our X server and our mesa driver when dealing with buffer management, we implemented a simple userspace library to help us out, and we called it “libdrm”. It is mostly a set of thin wrappers around the kernel’s DRM API, but it can have more complex behavior for more complex kinds of buffer management.
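
For instance, here's a sketch of the same “which driver is this?” question from above, asked through libdrm instead of a raw ioctl; drmGetVersion() is one of those thin wrappers:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <xf86drm.h>  /* libdrm */

int main(void)
{
    int fd = open("/dev/dri/card0", O_RDWR);
    if (fd < 0)
        return 1;

    /* A thin wrapper around the DRM_IOCTL_VERSION ioctl shown earlier;
     * libdrm takes care of allocating the string buffers for us. */
    drmVersionPtr version = drmGetVersion(fd);
    if (version) {
        printf("driver: %s (%d.%d.%d)\n", version->name,
               version->version_major, version->version_minor,
               version->version_patchlevel);
        drmFreeVersion(version);
    }

    close(fd);
    return 0;
}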

The DRM kernel API also has another, separate API inside it, sometimes known as “DRM mode” and sometimes known as “KMS”, which is used to configure and control display controllers. Display controllers don’t render things; they just take a buffer and show it on an output like an HDMI TV or a laptop panel. Perhaps we should have given it a different name and split it out even further, but “DRM mode” and “KMS” are two names for the same API. There is some ongoing work to split the KMS API out from the generic DRM API, so that we have two separate device nodes for them: “render nodes” and “KMS nodes”.
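
To make the split concrete, here's a sketch of what poking at the KMS (“DRM mode”) side of libdrm looks like; it only enumerates the display controller's resources, no rendering involved:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <xf86drm.h>
#include <xf86drmMode.h>  /* the KMS ("DRM mode") half of libdrm */

int main(void)
{
    int fd = open("/dev/dri/card0", O_RDWR);
    if (fd < 0)
        return 1;

    /* Enumerate the display controller's resources: CRTCs, and connectors
     * such as HDMI ports and laptop panels. */
    drmModeRes *resources = drmModeGetResources(fd);
    if (resources) {
        printf("%d connectors, %d CRTCs\n",
               resources->count_connectors, resources->count_crtcs);
        drmModeFreeResources(resources);
    }

    close(fd);
    return 0;
}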

You can also sometimes see the word “DRM” used in other contexts in userspace APIs as well, usually referring to buffer sharing. As a simple example, in order to pass buffers between Wayland clients and Wayland compositors, the mesa’s implementation of this uses a secret internal Wayland protocol known as wl_drm. This protocol is eerily similar to DRI3, actually, which goes to show that sometimes we can’t decide on what something should be named ourselves.

Why I’m excited for Vulkan

I’ve stopped posting here because, in some sense, I felt I had to be professional. I have a lot of half-written drafts I never felt were good enough to publish. Since a lot of eyes were on me, I only posted when I had something I was really proud to share. Anyone who has met me in real life knows I can talk a lot about a lot of things, and more than anything else, I’m excited to teach and share. I felt stifled by having a platform to say a lot, yet only feeling able to say things that were really complete and polished, even though I have a lot I want to say.

So expect half-written thoughts on things from here on out, a lot more frequently. I’ll still try to keep it technical and interesting to my audience.

What’s Vulkan

In order to program GPUs, we have a few APIs: Direct3D and OpenGL are the most popular ones currently. OpenGL has the advantage of being implemented independently by most vendors, and is generally platform-agnostic. The OpenGL API and specification are managed by the standards organization Khronos. Note that in closed environments, you can find many others. Apple has Metal for their own set of PVR-based GPUs. In the game console space, Sony had libgcm on the PS3 and GNM on the PS4, and Nintendo has the GX API for the GameCube and Wii, and GX2 for the Wii U. Since consumers aren’t expected to swap out the GPU like on the PC platform, these APIs could be extremely low-level.

OpenGL was originally started back in the mid-80s as a library called Graphics Layer, or “GL”, for SGI’s internal use on their own hardware and systems. They then released it as a product, “IRIS GL”, allowing customers to render graphics on SGI workstations. As a strategic move, SGI then allowed third parties to implement the API and opened up the specifications, transforming “IRIS GL” into “OpenGL”.

In the 30+ years since GL was started, computing has grown a lot, and OpenGL’s model has grown outdated. Vulkan is the first attempt at a cross-platform, vendor-neutral, low-level graphics API. Low-level APIs like this have been seen in the console space for close to a decade, offering higher levels of performance; but instead of being tied to a single GPU vendor, Vulkan lets any vendor implement it for its own hardware.

Dishonesty

People have already written a lot about why Vulkan is exciting. It has a lower overhead on the CPU, leading to much improved performance, especially on CPU-constrained platforms like mobile. Instead of being a global implicit state machine, it’s very explicit, allowing for better multithreaded performance.

These are all true, and they’re all good things that people should be excited for. But I’m not going to write about any of these. Instead, I’m going to talk about a more important point which I don’t think has been written about much: the GPU vendor cannot cheat.

You see, there’s been an awkward development in high-level graphics APIs over the last few years. During the early 2000s, the two major GPU vendors, ATI and NVIDIA, effectively had an arms race. They noticed that certain programs and games were behaving “foolishly”.

The code for a game might look like this:


// Clear to black.
glClearColor(0.0f, 0.0f, 0.0f, 1.0f);
glClear(GL_COLOR_BUFFER_BIT);

// Start drawing triangles.
glBegin(GL_TRIANGLES);
glVertex3f(-1, -1, 0);
glVertex3f(-1,  1, 0);
glVertex3f( 1,  1, 0);
// ...
glEnd();

(I’m writing in OpenGL because that’s the API I know, but Direct3D has a very similar API, and a similar problem.)

The vendors noticed that games were clearing the entire screen to black when they really didn’t need to. So they started figuring out whether the game “really” needed the clear: instead of clearing immediately, the driver simply set a flag that a clear was requested, and then skipped it if the triangles drawn afterwards painted over the whole screen anyway.

Vendors shipped these updated drivers which had better performance. In a perfect world, these tricks would simply improve performance. But competition is a nasty thing, and once one competitor starts playing dirty, you have to follow along to compete.

As another example, the driver vendors noticed that games uploaded textures they didn’t always use. So the drivers started to only upload textures when games actually drew them.

But uploading textures isn’t cheap. When a new texture first appeared in a game, the game would stall a little bit. And customers got mad at the game developers for having “unoptimized” games, when it was really the vendor’s fault for not implementing the API correctly! Gamers praised the driver vendors for making everything fast, without realizing that performance is a trade-off.

So game developers found another trick: they would draw a rectangle with each texture once while the level loaded, to trick the driver into actually uploading the texture. This is the sort of “folklore knowledge” that gets passed around from game development company to game development company, knowledge that just sort of exists within the industry. It isn’t really documented anywhere, since it’s not a feature of the API; it’s just secret knowledge about how OpenGL really works in practice.
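
The trick itself is tiny. Something along these lines, in the old-style OpenGL of the era, gets run once per texture on the loading screen purely to force the upload (a sketch; the helper name is mine):

#include <GL/gl.h>

/* Draw one throwaway quad with the given texture so the driver uploads it
 * now, during the loading screen, instead of stalling mid-gameplay. */
static void warm_texture(GLuint texture)
{
    glEnable(GL_TEXTURE_2D);
    glBindTexture(GL_TEXTURE_2D, texture);

    glBegin(GL_QUADS);
    glTexCoord2f(0, 0); glVertex2f(0, 0);
    glTexCoord2f(1, 0); glVertex2f(1, 0);
    glTexCoord2f(1, 1); glVertex2f(1, 1);
    glTexCoord2f(0, 1); glVertex2f(0, 1);
    glEnd();
}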

Bigger game developers know all of these tricks, and they tend to have support contracts with the driver vendors to help them solve issues. I’ve heard several examples from game developers who were told to draw 67 triangles at a time instead of 64, because that happens to be faster on NVIDIA hardware, while the magic number might be 62 on AMD. Most game engines that I know of, when using “OpenGL in practice”, actually have different code paths depending on the OpenGL vendor in use.

I could go on. NVIDIA has broken Chromium because its driver patched out the “localtime” function. The Dolphin project has hit bugs simply for having an executable named “Dolphin.exe”. We were told by an NVIDIA employee that there was a similar internal testing tool that used the API wrong, and they simply patched around it themselves. A very popular post briefly touched on “how much game developers get wrong” from an NVIDIA-biased perspective, but having talked to these developers, they’re often told to remove such calls for performance, or because they cause strange behavior due to driver heuristics. It’s also common industry knowledge that most drivers ship with hand-compiled or hand-optimized versions of the shaders used in popular games.

You might have heard of tricks like “AZDO”, or “approaching zero driver overhead”. Basically, since game developers were asking for a slimmer, simpler OpenGL, NVIDIA added a number of extensions to their driver to support more modern GPU usage. The general consensus across the industry was a resounding sigh.

A major issue in shipping GLSL shaders in games is that, since there is no conformance test suite for GLSL, different drivers accept different variants of GLSL. For a simple example of complex shaders in action, see page 85 of the Glyphy slides.

NVIDIA has cemented themselves as the “king of video games” simply by having the most tricks. Since game developers optimize for NVIDIA first, they have an entire empire built around being dishonest. The general impression among most gamers is that Intel and AMD drivers are written by buffoons who don’t know how to program their way out of a paper bag. OpenGL is hard to get right, and NVIDIA has millions of lines of code invested in that. The Dolphin Project even concludes that NVIDIA’s OpenGL implementation is the only one to really work.

How does one get out of that?

Honesty

In 2013, AMD released the Mantle API, a cross-platform, low-overhead API to program GPUs. They then donated this specification to the Khronos OpenGL committee, and waited. At the same time, AMD worked with Microsoft engineers to design a low-overhead Direct3D 12 API, primarily for the next version of the Xbox, in response to Sony’s success with libgcm.

A year later, the “gl-next” effort was announced and started. The committee, composed of game developers and mobile vendors, quickly hacked through the specification, rounding off the corners. Everyone was excited, but more than anything else, game developers were happy to have a comfortable API that didn’t feel like they were wrestling with the driver. Mobile developers were happy that they had a model that mapped very well to their hardware.

Microsoft got word about gl-next, and quickly followed with Direct3D 12. Another year passed, and the gl-next API was renamed to “Vulkan”.

I have been told through the grapevine that NVIDIA was not very happy with this — they didn’t want to lose the millions they had invested in their driver, or their marketing and technical edge, but they couldn’t go against the momentum.

Pulling off a political coup like this wasn’t easy — it was tried in the mid-2000s with “OpenGL 3.0”, but since there were fewer graphics vendors at the time, and since game developers were not allowed in as Khronos members, NVIDIA was able to wield enough power to maintain the status quo.

Accountability

Those of you who have seen the Vulkan API (and there are plenty of details on the open web, even if the specs are currently behind an NDA) know that there isn’t any equivalent to glClear or similar. Vulkan is designed so that you control a modern GPU from start to finish: you control all of these steps, you control what gets scheduled and when.

The games industry has had a term called “dev-to-triangle time” when describing API complexity and difficulty: take an experienced programmer, put him in a room with a brand new SDK he’s never used before, and wait until he gets a single triangle up on the screen. How long does it take?

I’ve always heard the PS2 described as having two weeks to a month of dev-to-triangle time, but according to a Sony engineer speaking recently, it was around 3 to 6 months (I think that’s exaggerated, personally). The PS2 made you wrestle with two vector coprocessors, the VU0 and VU1, the Graphics Synthesizer, which ran the equivalent of today’s pixel shaders, and a dedicated floating-point unit. Getting an engine up on the PS2 required writing code for these four devices, and then writing a process to pass data from one to the other and plug them all together. It’s sort of like writing a driver!

The upside, of course, was that once you put in this required effort, expanding the engine is fairly easy, and you have a fairly good understanding of how everything works and where the boundaries are.

Direct3D and OpenGL, once you wrestle out a few driver issues, consistently have a dev-to-triangle time of one to two days. The downside, of course, is that complex scenes require complex techniques like draw call batching and texture atlases to avoid texture switches, or the more involved AZDO techniques mentioned above. Some of these can involve a major restructuring of engine code, so the subtleties of high-level APIs are only discovered late in development.

Vulkan chooses to opt for the PS2-like approach: game developers are in charge of building command buffers, submitting them to the GPU, waiting on fences, and swapping the front and back buffers and submitting them to the window system themselves.
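
As a rough sketch of what that looks like in C, here's the submit-and-wait step on its own, assuming a VkDevice, VkQueue, and recorded VkCommandBuffer already exist (creating those is where most of the verbosity lives):

#include <stdint.h>
#include <vulkan/vulkan.h>

/* Submit one pre-recorded command buffer and block until the GPU finishes
 * it. Device, queue and command buffer setup are omitted. */
static void submit_and_wait(VkDevice device, VkQueue queue, VkCommandBuffer cmd)
{
    VkFenceCreateInfo fence_info = {
        .sType = VK_STRUCTURE_TYPE_FENCE_CREATE_INFO,
    };
    VkFence fence;
    vkCreateFence(device, &fence_info, NULL, &fence);

    VkSubmitInfo submit = {
        .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
        .commandBufferCount = 1,
        .pCommandBuffers = &cmd,
    };

    /* The application, not the driver, decides when work is handed to the
     * GPU and when to wait for it to complete. */
    vkQueueSubmit(queue, 1, &submit, fence);
    vkWaitForFences(device, 1, &fence, VK_TRUE, UINT64_MAX);

    vkDestroyFence(device, fence, NULL);
}

Every one of those calls is the application's explicit responsibility; the driver never decides any of it for you.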

This means that the driver layer is fairly thin. An ImgTec engineer mentioned that dev-to-triangle time on Vulkan was likely two weeks to a month.

But what you get in return is everything you got on the PS2, and in particular, something that hasn’t been possible on the PC so far: accountability. Since the layer is so thin, there’s no place for the driver vendor to cheat. The graphics performance of a game now reflects what the developer puts into it. For once, the people gamers often blame — the game developer — will actually be at fault.

Xplain: Regional Geometry

*cough* *cough* Is this thing still on?

I don’t write much here anymore, partly because I don’t see it as a platform where I have much voice or volume, and also because the things I most want to write about don’t fit in this blog thematically.

But a few years ago when I first released Xplain, I promised everyone that when I updated my Xplain series, since it didn’t naturally have an RSS feed, I’d write something here instead. I have released a new article on Xplain, and as such, I’m here to fill up your feed reader with a link telling you to go look elsewhere.

I’m particularly happy with the way this article came out, and for those of you still watching this space, I’d really appreciate it if you read it. Thank you.

Xplain: Regional Geometry

Endless

Six months ago, I left Red Hat to join a small little company on the other side of the country to help them launch a product based on GNOME. I haven’t had much to say in that time, but rest assured, I’ve been very busy.

Today, it has all become real. The small team here has built something amazing. For the next 30 days, you have the opportunity to own one. To help seed sales and build awareness, we’ve launched a Kickstarter for our product.

Endless

We have much more planned for release, including a site for developers, but we’re swamped with responding to the Kickstarter today. Our source code is available on GitHub.

If you have any questions, feel free to leave a comment, or contact us through Kickstarter. I’m one of the people responding to Kickstarter directly.

Thank you.

Why Package Managers are not my Ideal Software Distribution Mechanism

Those who have spoken to me know that I’m not a big fan of packages for shipping software. Once upon a time, I was wowed that I could simply emerge blender and have a full 3D modelling suite running in a few minutes, without the fuss of wizards, unchecking boxes, or reading the README. But today, iOS and Android have redefined the app installation experience, and packages seem like a step backwards.

I’m not alone in this. If you’ve seen the recent conversations about the systemd team’s proposal for shipping Linux software differently, those are products of the same lunchtime conversations and gripes on IRC.

My goal here is to explain the problems we’ve seen, map out some goals for a new solution to supersede packages, and open up an avenue for discussion about this.

As a user

Dealing with packages as a normal user can be really frustrating. Just last week I was trying to upgrade my system when Debian decided to stop in the middle and ask me which of two sshd configuration files I wanted to keep. I left it like that and went to lunch, and when I got back I accidentally hit the power strip with my feet. After much cursing, I eventually had to reinstall the OS from scratch.

It should never be possible to completely hose your OS by turning it off during normal operation, and I should be able to upgrade my OS without having the computer ask me incomprehensible questions I don’t understand.

And on my Fedora laptop, I can’t upgrade my system because Blender was built against an older libjpeg than the rest of my system. It gave me some error about conflicting packages and then aborted. And today, as I’m writing this, I’m on an old, insecure Fedora installation because upgrading it takes too much manual effort.

Today’s package managers do not see the OS as separate from the applications that make it up: all packages are just combined to create one giant filesystem tree. This scheme works great when you have a bunch of open-source apps you can rebuild at every ABI break, but it’s not great when trying to build a world-class OS.

It’s also partially because package installations aren’t reproducible: installing package A and then package B does not guarantee the same filesystem tree as installing package B, then A.

Packages are effectively composed of three parts: metadata about the package (its name, version, dependencies, and plenty of other information), a bunch of files to place in the filesystem tree (known as the “payload”), and a set of scripts to run when installing, uninstalling and upgrading the package (known as the “triggers”). It’s because of these scripts that packages are dangerous.

It would be great if developers could ship their apps directly to users. But, unfortunately, packaging gets in the way. The typical way to do things is to package up the source code, and then let community members who are interested make their own package for their favorite “distribution”. Each distribution usually has its own package format, build system, different payloads and triggers, leading to a frustrating fragmentation problem for both users and developers.

The developers of Chromium, for instance, don’t allow any bugs to be reported against any builds but their official ones, since they can’t be sure what patches the community has made. And in some cases, the community has patched a lot. (Side note: I find it personally disappointing that a great app, Chromium, isn’t shipped in Fedora because of disagreements over how the app is developed. Fedora should stand for freedom and the user’s choice to use whatever apps they want, not try to force its engineering practices on the world.)

As a developer

That said, packages are amazing when doing development. Want to read PNGs? apt-get install libpng-dev. Want a database? Instead of hunting around for SQLite binaries, just yum install 'pkg-config(sqlite3)'.

Paired with pkg-config, the ease of use has made development packages quite possibly the most attractive development environment out there today. In fact, other projects like node’s npm, Ruby’s gems, and Python’s pip have stolen the idea of packages and made it their own. Even Microsoft has endorsed NuGet as the easiest way of developing great apps on top of their .NET platform.
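
As a small illustration of why this feels so good, here's a sketch of a C program depending on SQLite; the only “integration” work is a single pkg-config invocation in the build command (the file names here are made up):

/* Build with:
 *   cc demo.c $(pkg-config --cflags --libs sqlite3) -o demo
 */
#include <stdio.h>
#include <sqlite3.h>

int main(void)
{
    /* Open (or create) a database file using whatever SQLite the
     * development package provided. */
    sqlite3 *db;
    if (sqlite3_open("demo.db", &db) != SQLITE_OK)
        return 1;

    printf("linked against SQLite %s\n", sqlite3_libversion());
    sqlite3_close(db);
    return 0;
}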

Development packages solve a lot of the typical problems. These libraries are uploaded directly by their developers, and are typically installed per-project, not globally across the entire system, meaning I can have one app built against an older SQLite, and another built against something more modern. Upgrading these packages doesn’t run arbitrary scripts as root; it just unpacks new files in a certain location.

I’ve also been doing a lot of development recently on my ThinkPad and my home computer, both equipped with SSDs without a lot of disk space. While I’d happily welcome HP’s memristors hitting shelves and providing data storage in sizes and speeds better than today’s SSDs, I think it’s worth thinking about how to provide a great experience for those not fortunate enough to be able to waste another gig on duplicated libraries.

Working towards a solution

With all of this in mind, we can start working on a solution that solves these problems and meets these goals. As such, you might have seen different things trickle out of the community here. The amazing Colin Walters was the first to actually do anything about this when he built OSTree, which allows fully atomic system upgrades. You can never get your system into a hosed state with it.

At Endless Mobile, we want to ship a great OS that upgrades automatically, without ever breaking if the power gets cut or if the user unplugs it from the wall. We’ve been using OSTree successfully in production, and we’ve never seen a failed upgrade in the wild. It would be great to see the same applied to applications.

As mentioned, we’ve also seen some work starting on the app experience. Lennart Poettering started working on Sandboxed Applications for GNOME back in 2013, and work has been steadily progressing, both on building kdbus for sandboxed IPC, and on a more concrete proposal for how this experience will look and fit together.

Reading closely, you might pick up that I, personally, am not entirely happy with this approach, since there are no development packages, along with a number of other minor technical criticisms, but I haven’t really talked to Lennart or the rest of the team building it about that yet.

Disclaimer

I also know that this is controversial. Wars have been fought over package management systems and distributions, and it’s very off-putting for someone who just wants to develop software for our platform and our OS.

Package managers aren’t magic, they’re a set of well-understood technical tools, with tradeoffs and limitations like every other system out there. I hope we can move past our differences, recognize issues in existing technology, and build something great together.

As always, these opinions are my own. I do not speak for anybody mentioned in this article, anybody else in the GNOME community, the opinion of GNOME in general, and I certainly don’t speak for either my current employer or my former employer.

Please feel free to express opinions in the comments, for or against, however strong, as I’m honestly trying to open an avenue of discussion. However, I will not tolerate comments that make personal attacks on anybody. My blog is not the place for that.

Xplain: Adding Transparency

The next article in my “Xplain” series is now complete and has been published: “Adding Transparency”. It’s an explanation of how exactly we added transparent windows to the X server, explaining the COMPOSITE X extension, along with other things like RENDER and TFP, together with live demos.

Any and all feedback welcome. I’m having a lot of fun doing these, and I recently got some downtime at work, so the next one might come even quicker than expected.

XNG: GIFs, but better, and also magical

It might seem like the GIF format is the best we’ll ever see in terms of simple animations. It’s a quite interesting format, but it doesn’t come without its downsides: quite old LZW-based compression, a limited color palette, and no support for using old image data in new locations.

Two competing specifications for animations were developed: APNG and MNG. The two camps have fought wildly and we’ve never gotten a resolution, and different browsers support different formats. So, for the widest range of compatibility, we have just been using GIF… until now.

I have developed a new image format which I’m calling “XNG”, which doesn’t have any of these restrictions, and has the possibility to support more complex features, and works in existing browsers today. It doesn’t require any new features like <canvas> or <video> or any JavaScript libraries at all. In fact, it works without any JavaScript enabled at all. I’ve tested it in both Firefox and Chrome, and it works quite well in either. Just embed it like any other image, e.g. <img src="myanimation.xng">.

It’s magic.

Have a few examples:

I’ve been looking for other examples as well. If you have any cool videos you’d like to see made into XNGs, write a comment and I’ll try to convert them. I wrote all of these XNG files out by hand.

Over the next few days, I’ll talk a bit more about XNG. I hope all you hackers out there look into it and notice what I’m doing: I think there’s certainly a lot of unexplored ideas in what I’ve developed. We can push this envelope further.

EDIT: Yes, guys, I see all your comments. Sorry, I’ve been busy with other stuff, and haven’t gotten a chance to moderate all of them. I wasn’t ever able to reproduce the bug in Firefox about the image hanging, but Mario Klingemann found a neat trick to get Firefox to behave, and I’ve applied it to all three XNGs above.

Shellshock will happen again

As usual, I’m a month late: the big Bash bug known as Shellshock has come and gone, and the world was confused as to why this ever happened in the first place. It’s been fixed for a few weeks now. The questions have started: Why did nobody spot this earlier? Can we prevent it? Are the maintainers overworked and underfunded? Should we donate to the FSF? Should we switch to another shell by default? Can we ever trust bash again?

During the whole thing, there’s a big piece of evidence that I didn’t see anybody point out. And I think it helps answer all of these questions. So here it is.

I present to you the upstream git log for bash: http://git.savannah.gnu.org/cgit/bash.git/log/

Every programmer who has just clicked that link is now filled with disgust and disappointment.

It’s all crystal clear now: Nobody would have spotted this earlier. No, we can’t really prevent it. No, the maintainers aren’t overworked and underfunded. No, we shouldn’t donate to the FSF. Perhaps we should switch to another shell. No, we cannot trust bash. Not until a serious change in its management comes along.

For those of you who aren’t programmers, you might be staring at that page, not quite understanding what it all means. And that’s OK. Let me help explain it to you.

There’s a saying in the open-source development community: “With enough eyeballs, all bugs are shallow”. I don’t believe in it as strongly as I used to, but I think there’s some truth to it. It can be found in other disciplines as well: in science, it’s known as “peer-review”, where all papers and discoveries should be rigorously double-checked by peers to make sure you didn’t make any mistakes. In other sorts of writing, the person reviewing it is the editor. Basically, “have somebody else double-check your work”.

The issue with this, though, is that you need enough eyeballs to double-check your work. And while it was assumed before that all open-source software had enough eyeballs, that doesn’t seem to be the case. That said, you can certainly design and maintain a software project in certain ways to attract enough eyeballs. And unfortunately, bash isn’t doing these things.

First of all, you can see that there are zero eyeballs on the code: one person, Chet Ramey, is writing the code, and nobody double-checks it. Because there’s only one developer, we might assume that there’s no big motivation to do code cleanups or to make the code accessible to anybody other than Chet, since nobody else is working on it. And that’s true. This makes the eyeballs wander elsewhere.

But, in fact, even that isn’t true: Florian Weimer of the Red Hat Security Team developed multiple fixes for the Shellshock bug, but his work was included in bash uncredited. Developers really need to be credited for their work. This makes the eyeballs wander elsewhere.

The code isn’t really that actively developed, either. At the bottom of that page, we see dates and times from 2012. It seems like nobody actually cares about this code anymore, and nobody is really trying to fix bugs and modernize it. This makes the eyeballs wander elsewhere.

There are no detailed descriptions of what changed between versions. Which commits in that log fixed serious CVEs, and which just fix minor documentation bugs? It’s impossible to tell. This makes the eyeballs wander elsewhere.

And even with the corresponding code change in front of you, it can be difficult to tell whether a specific commit is an important security fix, a new feature, or a minor bug fix. There’s no explanation in the commit message for why a change was made, and no changelog, which makes it hard for people redistributing and patching bash to know which fixes are important and which aren’t. The eyeballs will wander elsewhere.

In comparison, look at the commit log for the Linux kernel. There’s a large number of different people contributing, and all of them explain what changes they make and why they’re making them. To use a recent example (at the time of this writing), this NFS change describes in perfect detail why the change was made (for compatibility with Solaris hosts), and includes a link to a bug report with further information from debugging. As a result, even though bash is more commonly used and included in more things than the Linux kernel itself, the Linux kernel has more eyeballs and more developers.

So, what should we do? How should we fix this situation? I don’t really know. Moving to a new shell isn’t really a solution, and neither is a fork of bash. The best case scenario would be for bash to indeed change its development practices to be more like the Linux kernel, and adopt a thriving community of its own. I don’t have enough power or motivation to enact such a change. I can only hope that I can convince enough people of the right way to maintain a project.

Perhaps, Chet, if you’re out there, do you want to talk about it? This is a discussion we really should be having about the future of your project, and your responsibilities.

Hanging up the hat

Hello. It’s been quite a while. I’ve been meaning to post for a while, but I’ve been too busy trying to get GNOME 3.14 finished up, with Wayland all done for you. I also fixed the last stability issue in GNOME, and now both X11 and Wayland are stable as a rock. If you’ve ever had GNOME freeze up on you when switching windows or Alt-Tabbing, well, that’s fixed, and that was actually the same bug that was crashing Wayland. This was my big hesitation in shipping Wayland, and with that out of the way, I’m really happy. Please try out GNOME 3.13.90 on Wayland and let me know how it goes.

I promise to post a few more Xplain articles before the end of the year, and I have another blog post coming up about GPU rendering that you guys are going to enjoy. Promise. Even though, well…

Changes

I have a new job. Next Tuesday, the 26th, is my final day at Red Hat, and after that I’m going to be starting at Endless Mobile. Working at Red Hat has been a wonderful, life-changing experience, and it was a really hard decision to leave the incredible team that made GNOME what it is today. Thank you, each and every one of you. All of you are incredible, and I hope I keep working with you every single day. It would be an absolute shame if I didn’t.

Endless Mobile is a fantastic new startup that is focused on shipping GNOME to real end users all across the world, and that’s too exciting of an opportunity to pass by. We, the GNOME community, can really improve the lives of people in developing countries. Let’s make it happen.

I’ll still be around on IRC, mailing lists, reddit, blogging, all the usual places. I’m not planning on leaving the GNOME community. If you have any questions at all, feel free to ask.

Cheers,
  Jasper

xdg-shell

Wayland 1.5 is released. It’s a pretty exciting release, with plenty of features, but the most exciting thing about it is that we can begin work on Wayland 1.6!

… No, I’m serious. Wayland 1.6’s release schedule matches up pretty well with GNOME’s. Wayland 1.6 will be released in the coming weeks before GNOME 3.14, the first version of GNOME with full Wayland support out of the box.

Since development is opening up again, we can resume work on xdg-shell, the new desktop shell protocol to replace wl_shell. Kristian Hoegsberg and I have been prototyping and implementing it in toolkits and Wayland compositors. We’re extremely happy with the current revision of the bare-bones protocol, so at this point we want to start evangelizing and reaching out to other communities to make sure that everybody can use it. We’ve been working closely with and taking input from the Wayland community. That means we’ve been working with the Qt/KDE and Enlightenment/EFL Wayland teams, but anybody who isn’t paying close attention to the Wayland community is out of the loop. This needs to change.

Ironically, as the main Wayland developer for GNOME, I haven’t talked too much about the Wayland protocol. My only two posts on Wayland were a user post about the exciting new features, and one about the legacy X11 backwards compatibility mode, XWayland.

Let’s start with a crash course in Wayland protocols.

wl_surface

As odd as it sounds, Wayland doesn’t have a built-in way to get something like a desktop window system, with draggable, resizable windows. As a next-generation display server, Wayland’s protocol is meant to be a bit more generic than that. Wayland can already be found on mobile devices as part of SailfishOS, through the hard work of Jolla and other companies. Engineers at Toyota and Jaguar/Land Rover use Wayland for media centers in cars, as part of a custom Linux distribution called GENIVI. I’m also told that LG’s webOS, as used in its smart TVs, is investigating Wayland as a display server as well. Dragging and resizing tiny windows on a phone, or inside a car, or on a TV just isn’t going to be a great experience. Wayland was designed, from the start, to be flexible enough to support a wide variety of use cases.

However, that doesn’t mean that Wayland is all custom protocols: there’s a common denominator between all of these cases. Wayland has a core protocol object called a wl_surface, on which clients can show some pixels for output, and through which they can receive various kinds of input. This is similar to the concept of X11’s “windows”, which I explain in Xplain. However, the wl_surface isn’t simply a subregion of the overall front buffer. Instead of owning parts of the screen, Wayland clients create their own pixel buffers, draw to them, and then “attach” them to the wl_surface, causing a new pixel buffer to be displayed. The wl_surface concept is fairly versatile, and is used any time we need a “live surface” to play around with. For instance, the mouse cursor is provided simply by handing the Wayland compositor a wl_surface. The same thing is done for drag-and-drop icons as well.
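
Here's a minimal sketch of that attach model from the client side in C. It assumes a wl_compositor has already been bound from the registry and a wl_buffer has already been created (say, from a wl_shm pool), and it shows only the core protocol calls:

#include <wayland-client.h>

/* Sketch: display some already-drawn pixels on a new wl_surface.
 * The compositor and buffer are assumed to have been set up earlier. */
static struct wl_surface *show_pixels(struct wl_compositor *compositor,
                                      struct wl_buffer *buffer,
                                      int width, int height)
{
    struct wl_surface *surface = wl_compositor_create_surface(compositor);

    /* The client owns its own pixel buffer and "attaches" it; on commit,
     * the compositor is free to display the new contents. */
    wl_surface_attach(surface, buffer, 0, 0);
    wl_surface_damage(surface, 0, 0, width, height);
    wl_surface_commit(surface);

    return surface;
}

On its own this still won't get you a desktop window, which is exactly the “role” business the next section is about.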

An interesting aside is that the model taken by Wayland with wl_surface can actually require fewer copies and be more efficient than X11 on modern systems. More and more GPUs have more interesting and fancy hardware at scanout time. With the rise of low-power phones that require rich graphics, we’re seeing a resurgence in fixed-function alpha blending and compositing hardware when doing scanout, similar to what game consoles like the NES and SNES had (but they called them “sprites”). X11’s model of a giant front buffer that apps draw to means that we must eventually copy all contents to the front buffer from the CPU, while Wayland’s model means that applications can simply hand us their pixel buffers, and we can choose to show them as overlays, which removes any copy. And if an application is full-screen, we can simply tell the GPU to scan out from that application’s buffer directly, instead of having to copy.

xdg-shell

OK, so I’ve talked about wl_surface. How does this relate to xdg-shell? Since a wl_surface can be used for lots of different purposes, like cursors, simply creating the wl_surface and attaching a buffer doesn’t put it on the screen. Instead, first, we need to let the Wayland compositor know that this wl_surface is intended to be a desktop-style window that can be dragged and resized around. It should appear in Alt-Tab, and clicking on it should give it keyboard focus, etc.

Wayland’s approach here is a bit odd, but to give a wl_surface a role, we construct a new wrapper object which has all of our desktop-level protocol functions, and then hand it the wl_surface. In this case, the protocol that we use to create this role is known as “xdg-shell”, and the wrapper object is known as an “xdg_surface”. The name is a reference to the FreeDesktop Group, an open mailing list where cross-desktop standards are discussed between all the different desktops. For historical reasons, it’s abbreviated as “XDG”. Members from the XDG community have all been contributing to xdg-shell.
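
Concretely, the client-side wrapping looks roughly like this sketch. It assumes the xdg_shell global has already been bound from the registry, and the names come from the generated xdg-shell headers of this era; since the protocol is still unstable, they may shift a little:

#include <wayland-client.h>
#include "xdg-shell-client-protocol.h"  /* generated by wayland-scanner */

/* Give a bare wl_surface the "desktop window" role by wrapping it in an
 * xdg_surface; desktop-level requests then go through the wrapper. */
static struct xdg_surface *make_toplevel(struct xdg_shell *shell,
                                         struct wl_surface *surface)
{
    struct xdg_surface *xdg_surface =
        xdg_shell_get_xdg_surface(shell, surface);

    /* The window can now be dragged, resized, and shows up in Alt-Tab. */
    xdg_surface_set_title(xdg_surface, "Hello, xdg-shell");

    return xdg_surface;
}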

The approach of a low-level structure with a high-level role is actually fairly similar to the approach taken in X11. X11 simply provides a data structure called a “window”, as I explained in Xplain: a tool that you can use to construct your interface by pushing pixels here, and getting input there. An external process called a “window manager” turns this window from a simple region of the front buffer into a window with a title and icon that the user can move around, resize, minimize and maximize with keyboard shortcuts and a taskbar. The window manager and the client applications both agree to cooperate and follow a series of complex standards like the ICCCM and EWMH that allow you to provide this “role”. Though I’ve never actually worked on any environments other than traditional desktops, I’d imagine that in more special-case environments, different protocols are used instead, and the concept of the window manager is completely dropped.

X11 has no easy, simple way of creating protocol extensions. Something as simple as a new request or a new event requires a bunch of byte-marshalling code in client libraries, extra support code added in the server, in toolkits, and a new set of APIs. It’s a pain, trust me. Instead, X11 does provide a generic way to send events to clients, and a series of key/value pairs on windows called “properties”, so standards like these often use the generic mechanisms rather than building an actual new protocol, since the effort is better spent elsewhere. It’s an unfortunate way that X11 was developed.

Wayland makes it remarkably easy to create a new protocol extension involving new objects and custom methods. You write up a simple XML description of your protocol, and an automatic tool, wayland-scanner, generates server-side and client-side marshalling code for you. All that you need to do now is write the implementation side of things. On the client, that means creating the objects and calling methods on them. Because it’s so easy to write custom extensions in Wayland, we haven’t even bothered creating a generic property or event mechanism; having real structure gives us a lot more stability and rigidity.

wl_shell_surface

Long-time users or developers of Wayland might notice this sounds similar to an older protocol known as wl_shell or wl_shell_surface. The intuition is correct: xdg-shell is a direct replacement for wl_shell. wl_shell_surface had a number of frustrating limitations, and due to its inclusion in the Wayland 1.0 core, it is harder to change and make better. As Fred Brooks told us, “write one to throw away”.

xdg-shell can be seen as a replacement for wl_shell_surface, and it solves a number of fundamental issues and race conditions which I’d prefer not to go into here (but if you ask nicely in the comments, I might oblige!). I guess you’ll have to trust me when I say that they were behind highly visible user bugs and frustrations, like weird lagginess or flickering when using Wayland. We’re happy that these are gone.

A call to arms

The last remaining ticket item I have to work on in xdg-shell is related to “window geometry”, a way of communicating where the user’s concept of the edge of the window is. This requires significant reworking of the code in weston, mutter, and GTK+. After that, it will serve the needs of GTK+ and GNOME perfectly.

Does it serve the needs of your desktop? The last thing we want to do is to solidify the xdg-shell protocol, only to find that a few months later, it doesn’t quite work right for tiling WMs or for EFL or KDE applications. We’re always experimenting with things like this to make sure everything can work, but really, we’re only so many people, and others testing it out and porting their toolkits and compositors over can’t ever hurt.

So, any and all help is appreciated! If you have any feedback on xdg-shell so far, or need any help understanding this, feel free to post about it to the Wayland mailing list or poke me on IRC (#wayland on Freenode; my nick in there is Jasper).

As always, if anybody has any questions, no matter how dumb or stupid, please let me know! Comments are open, and I always try to reply.