Bytecode

What is the most commonly used bytecode language in the world? Java (JVM Bytecode)? .NET (CLI)? Flash (AVM1/AVM2)? Nope. There’s a few that you use every day, simply by turning on your computer, or tablet, or even phone. You don’t even have to start an application or visit a webpage.

ACPI

The most obvious is the large, gargantuan specification known as “ACPI”. The “Advanced Configuration and Power Interface” specification lives up to its name, with the most recent specification being a mammoth document that weighs in at almost 1000 pages. And yes, operating systems are expected to implement this. The entire thing. The bytecode part is hidden deep, but it’s seen in chapter 20, under “ACPI Machine Language”, describing a semi-register VM with all the usuals: Add, Subtract, Multiply, Divide, standard inequalities and equalities, but then throws in other fun things like ToHexString and Mid (substring). Look even further and you’ll see a full object model, system properties, as well as an asynchronous signal mechanism so that devices are notified about when those system properties change.

Most devices, of course, have a requirement of nothing less than a full implementation of ACPI, so of course all this code is implemented in your kernel, running at early boot. It parallels the complexity of a full JavaScript environment with its type system and system bindings, with the program code supplied directly over the wire from any device you plug in. Because the specification is so complex, an OS-independent reference implementation was created by Intel, and this is the implementation that’s used in the Linux kernel, the BSDs (including Mac OS X), and the fun toy ReactOS, HaikuOS kernels. I don’t know if it’s used by Windows or not. Since the specification’s got Microsoft’s name on it, I assume their implementation was created long before ACPICA.

Fonts

After that, want to have a graphical boot loader? Simply rendering an OpenType font (well, only OpenType fonts with CFF glyphs, but the complexities of the OpenType font format is a subject for another day) requires parsing the Type 2 Glyph Format, which indeed involves a custom bytecode format to establish glyphs. This one’s even more interesting: it’s a real stack-based interpreter, and it even has a “random” opcode to make random glyphs at runtime. I can’t imagine this ever be useful, but it’s there, and it’s implemented by FreeType, so I can only assume it’s used by some fonts from in the real world. This bytecode interpreter also contained at one time a stack overflow vulnerability which was what jailbroke the iPhone in JailbreakMe.com v2.0, with the OTF file being loaded by Apple’s custom PDF viewer.

This glyph language is based on and is a stripped down version of PostScript. Actual PostScript involves a full turing-complete register/stack-based hybrid virtual machine based on Forth. The drawbacks of this system (looping forever, interpreting the entire script to draw a specific page because of complex state) were the major motivations for the PDF format — while based on PostScript, it doesn’t have much shared document state, and doesn’t allow any arbitrary flow control operations. In this model, someone (even an automated program) could easily verify that a graphic was encapsulated, not doing different things depending on input, and that it terminated at some point.

And, of course, since fonts are complicated, and OpenType is complicated, OpenType also includes all of TrueType, which includes a bytecode-based hinting model to ensure that your fonts look correct at all resolutions. I won’t ramble on about it, but here’s the FreeType implementation. I don’t know of anything interesting happening to this. Seems there was a CVE for it at one time.

To get this article to display on screen, it’s very likely that thousands of these tiny little microprograms ran, once for each glyph shape in each font.

Packet filtering

Further on, if you want to capture a network packet with tcpdump or libpcap (or one of its users like Wireshark), it’s being filtered through the Berkeley Packet Filter, a custom register-based bytecode. The performance impact of this at one time was too large for people debugging network issues, so a simple JIT compiler was put into the kernel, under an experimental sysctl flag.

As a piece of historical interest, an earlier version of the BPF code was part of the code claimed to be infringing part of the SCO lawsuits (page 15), but was actually part of BSD4.3 code that was copied to the Linux kernel. The original BSD code was eventually replaced with the current architecture, known as the Linux Socket Filter, in Linux 2.2 (which I can’t easily link to, as there’s no public repository of the Linux kernel code with pre-git history, as far as I know).

What about it?

The popularity of bytecode as a general and flexible solution to problems is alluring, but it’s not without its complexities and faults, with such major security implications (an entire iPhone jailbreak from incorrect stack overflow checking!) and insane implementation requirements (so much that we only have one major implementation of ACPI used across all OSes that we can check).

The four examples also bring out something interesting: the wildly different approaches that can be taken to a bytecode language. In the case of ACPI, it’s an interesting take on what I can only imagine is scope creep on an originally declarative table specification, bringing it to the mess today. The Type 1 Glyph and TrueType Hinting languages are basic stack-based interpreters, showing their PostScript heritage. And BPF is a register-based interpreter, which ends up with a relatively odd register-based language that can really only do simple operations.

Note, though, that all of these implementations above have had security issues in their implementations, with numerous CVEs for each one, because bytecode interpreter implementations are hard to get right. So, to other hackers: do you know of any other low-level, esoteric custom bytecode specifications like these? And to spec writers: did you really need that flexibility?

The Linux Graphics Stack

This is an introductory overview post for the Linux Graphics Stack, and how it currently all fits together. I initially wrote it for myself after having conversations with people like Owen Taylor, Ray Strode and Adam Jackson about this stack. I had to go back to them every month or so and learn the stuff from the ground up all over again, as I had forgotten every single piece. I asked them for a good high-level overview document so I could stop bothering them. They didn’t know of any. I started this one. It has been reviewed by Adam Jackson and David Airlie, both of whom work on this exact stack.

Also, I want to point out that a large amount of this stack applies only to the free software drivers. That means that a lot of what you read here may not apply for the AMD Catalyst and NVIDIA proprietary drivers. They may have their own implementations of OpenGL, or an internal fork of Mesa. I’m describing the stack that comes with the free radeon, nouveau and Intel drivers.

If you have any questions, or if at any point things were unclear, or if I’m horribly, horribly wrong about something, or if I just botched a sentence so that it’s incomprehensible, please ask me or let me know in the comments section below.

To start us off, I’m going to paste the entire big stack right here, to let you get a broad overview of where every piece fits in the stack. If you do not understand this right away, don’t be scared. Feel free to refer to this throughout the post. Here’s a handy link.

So, to be precise, there are two different paths, depending on the type of rendering you’re doing.

3D rendering with OpenGL
  1. Your program starts up, using “OpenGL” to draw.
  2. A library, “Mesa”, implements the OpenGL API. It uses card-specific drivers to translate the API into a hardware-specific form. If the driver uses Gallium internally, there’s a shared component that turns the OpenGL API into a common intermediate representation, TGSI. The API is passed through Gallium, and all the device-specific driver does is translate from TGSI into hardware commands,
  3. libdrm uses special secret card-specific ioctls to talk to the Linux kernel
  4. The Linux kernel, having special permissions, can allocate memory on and for the card.
  5. Back out at the Mesa level, Mesa uses DRI2 to talk to Xorg to make sure that buffer flips and window positions, etc. are synchronized.
2D rendering with cairo
  1. Your program starts up, using cairo to draw.
  2. You draw some circles using a gradient. cairo decomposes the circles into trapezoids, and sends these trapezoids and gradients to the X server using the XRender extension. In the case where the X server doesn’t support the XRender extension, cairo draws locally using libpixman, and uses some other method to send the rendered pixmap to the X server.
  3. The X server acknowledges the XRender request. Xorg can use multiple specialized drivers to do the drawing.
    1. In a software fallback case, or in the case the graphics driver isn’t up to the task, Xorg will use pixman to do the actual drawing, similar to how cairo does it in its case.
    2. In a hardware-accelerated case, the Xorg driver will speak libdrm to the kernel, and send textures and commands to the card in the same way.

As to how Xorg gets things on the screen, Xorg itself will set up a framebuffer to draw into using KMS and card-specific drivers.

X Window System, X11, Xlib, Xorg

X11 isn’t just related to graphics; it has an event delivery system, the concept of properties attached to windows, and more. Lots of other non-graphical things are built on top of it (clipboard, drag and drop). Only listed in here for completeness, and as an introduction. I’ll try and post about the entire X Window System, X11 and all its strange design decisions later.

X11
The wire protocol used by the X Window System
Xlib
The reference implementation of the client side of the system, and a host of tons of other utilities to manage windows on the X Window System. Used by toolkits with support for X, like GTK+ and Qt. Vanishingly rare to see in applications today.
XCB
Sometimes quoted as an alternative to Xlib. It implements a large amount of the X11 protocol. Its API is at a much lower level than Xlib, such that Xlib is actually built on XCB nowadays. Only mentioning it because it’s another acronym.
Xorg
The reference implementation of the server side of the system.

I’ll try to be very careful to label something correctly. If I say “the X server”, I’m talking about a generic X server: it may be Xorg, it may be Apple’s X server implementation, it may be Kdrive. We don’t know. If I say “X11”, or the “X Window System”, I’m talking about the design of the protocol or system overall. If I say “Xorg”, that means that it’s an implementation detail of Xorg, which is the most used X server, and may not apply to any other X servers at all. If I ever say “X” by itself, it’s a bug.

X11, the protocol, was designed to be extensible, which means that support for new features can be added without creating a new protocol and breaking existing old clients. As an example, xeyes and oclock get their fancy shapes because of the Shape Extension, which provide support for non-rectangular windows. If you’re curious how this magic functionality can just appear out of nowhere, the answer is that it doesn’t: support for the extension has to be added to both the server and client before it can be used. There is functionality in the core protocol itself so that clients can ask the server what extensions it has support for, so it knows what functionality it can or cannot use.

X11 was also designed to be “network transparent”. Most importantly it means that we cannot rely on the X server and any X client being on the same machine, so any talking between the two must go over the network. In actuality, a modern desktop environment does not work out of the box in this scenario, as a lot of inter-process communication goes through systems other than X11, like DBus. Over the network, it’s quite chatty, and results in a lot of traffic. When the server and client are on the same machine, instead of going over the network, they’ll go over a UNIX socket, and the kernel doesn’t have to copy data around.

We’ll get back to the X Window System and its numerous extensions in a little bit.

cairo

cairo is a drawing library used either by applications like Firefox directly, or through libraries like GTK+, to draw vector shapes. GTK+3’s drawing model is built entirely on cairo. If you’ve ever used an HTML5 <canvas>, cairo implements practically the same API. Although <canvas> was originally developed by Apple, the vector drawing model is well-known, as it’s the PostScript vector drawing model, which has found support in other vector graphics technologies and standards such as PDF, Flash, SVG, Direct2D, Quartz 2D, OpenVG, and lots more than I can possibly give an exhaustive list for.

cairo has support to draw to X11 surfaces through the Xlib backend.

cairo has been used in toolkits like GTK+. Functionality was added in GTK+ 2 to optionally use cairo in GTK+ 2.8. The drawing model in GTK+3 requires cairo.

XRender Extension

X11 has a special extension, XRender, which adds support for anti-aliased drawing primitives (X11’s existing graphics were aliased), gradients, matrix transforms and more. The original intention was that drivers could have specialized accelerated code paths for doing specific drawing. Unfortunately, it seems to be the case that software rasterization is just as fast, for unintuitive reasons. Oh well. XRender deals in aligned trapezoids – rectangles with an optional slant to the left and right edges. Carl Worth and Keith Packard came up with a “fast” software rasterization method for trapezoids. Trapezoids are also super easy to decompose into two triangles, to align ourselves with fast hardware rendering. Cairo includes a wonderful show-traps utility that might give you a bit of insight into the tessellation into trapezoids it does.

Here’s a simple red circle we drew. This is decomposed into two sets of trapezoids – one for the stroke, and one for the fill. Since the diagrams that show-traps gives you by default aren’t very enlightening, I hacked up the utility to give each trapezoid a unique color. Here’s the set of trapezoids for the black stroke.

Psychedelic.

pixman

Both the X server and cairo need to do pixel level manipulation at some point. cairo and Xorg had separate implementations of things like basic rasterization algorithms, pixel-level access for certain kinds of buffers (ARGB32, RGB24, RGB565), gradients, matrices, and a lot more. Now, both the X server and cairo share a low-level library called pixman, which handles these things for it. pixman is not supposed to be a public API, nor is it a drawing API. It’s not really any API at all; it’s just a solution to some code duplication between various parts.

OpenGL, Mesa, gallium

Now comes the fun part: modern hardware acceleration. I assume everybody already knows what OpenGL is. It’s not a library, there will never be one set of sources to a libGL.so. Each vendor is supposed to provide its own libGL.so. NVIDIA provides its own implementation of OpenGL and ships its own libGL.so, based on its implementations for Windows and OS X.

If you are running open-source drivers, your libGL.so implementation probably comes from Mesa. Mesa is many things, but one of the major things it provides that it is most famous for is its OpenGL implementation. It is an open-source implementation of the OpenGL API. Mesa itself has multiple backends for which it provides support. It has three CPU-based implementations: swrast (outdated and old, do not use it), softpipe (slow), llvmpipe (potentially fast). Mesa also has hardware-specific drivers. Intel supports Mesa and has built a number of drivers for their chipsets which are shipped inside Mesa. The radeon and nouveau drivers are also supported in Mesa, but are built on a different architecture: gallium.

gallium isn’t really anything magical. It’s a set of components to make implementing drivers a lot easier. The idea is that there are state trackers that implement some form of API (OpenGL, GLSL, Direct3D), transform that state to an intermediate representation (known as Tungsten Graphics Shader Infrastructure, or TGSI), and then the backends take that intermediate representation and convert it to the operations that would be consumed by the hardware itself.

Sadly, the Intel drivers don’t use Gallium. My coworkers tell me it’s because the Intel driver developers do not like having a layer between Mesa and their driver.

A quick aside: more acronyms

Since there’s a lot of confusing acronyms to cover, and I don’t want to give them each an H3 and write a paragraph for each, I’m just going to list them here. Most of them don’t really matter in today’s world, they’re just an easy reference so you don’t get confused.

GLES
OpenGL has several profiles for different form factors. GLES is one of them, standing for “GL Embedded System” or “GL Embedded Subset”, depending on who you ask. It’s the latest approach to target the embedded market. The iPhone supports GLES 2.0.
GLX
OpenGL has no concept of platform and window systems on its own. Thus, bindings are needed to translate between the differences of OpenGL and something like X11. For instance, putting an OpenGL scene inside an X11 window. GLX is this glue.
WGL
See above, but replace “X11” with “Windows”, that is, that one Microsoft Operating System.
EGL
EGL and GLES are often confused. EGL is a new platform-agnostic API, developed by Khronos Group (the same group that develops and standardizes OpenGL) that provides facilities to get an OpenGL scene up and running on a platform. Like OpenGL, this is vendor-implemented; it’s an alternative to bindings like WGL/GLX and not just a library on top of them, like GLUT.
fglrx
fglrx is former name for AMD’s proprietary Xorg OpenGL driver, now known as “Catalyst”. It stands for “FireGL and Radeon for X”. Since it is a proprietary driver, it has its own implementation of libGL.so. I do not know if it is based on Mesa. I’m only mentioning it because it’s sometimes confused with generic technology like AIGLX or GLX, due to the appearance of the letters “GL” and “X”.
DIX, DDX
The X graphics parts of Xorg is made up of two major parts, DIX, the “Driver Independent X” subsystem, and DDX, the “Driver Dependent X” subsystem. When we talk about an Xorg driver, the more technically accurate term is a DDX driver.

Xorg Drivers, DRM, DRI

A little while back I mentioned that Xorg has the ability to do accelerated rendering, based on specific pieces of hardware. I’ll also say that this is not implemented by translating from X11 drawing commands into OpenGL calls. If the drivers are implemented in Mesa land, how can this work without making Xorg dependent on Mesa?

The answer was to create a new piece of infrastructure to be shared between Mesa and Xorg. Mesa implements the OpenGL parts, Xorg implements the X11 drawing parts, and they both convert to a set of card-specific commands. These commands are then uploaded to the kernel, using something called the “Direct Rendering Manager”, or DRM. libdrm uses a set of generic, private ioctls with the kernel to allocate things on the card, and stuffs the commands and textures and things it needs in there. This ioctl interface comes in two forms: Intel’s GEM, and Tungsten Graphics’s TTM. There is no good distinction between them; they both do the same thing, they’re just different competing implementation. Historically, GEM was designed and proudly announced to be a simpler alternative to TTM, but over time, it has quietly grown to about the same complexity as TTM. Welp.

This means that when you run something like glxgears, it loads Mesa. Mesa itself loads libdrm, and that talks to the kernel driver directly using GEM/TTM. Yes, glxgears talks to the kernel driver directly to show you some spinning gears, and sternly remind you of the benchmarking contents of said utility.

If you poke in ls /usr/lib64/libdrm_*, you’ll note that there are hardware-specific drivers. For cases when GEM/TTM aren’t enough, the Mesa and X server drivers will have a set of private ioctls to talk to the kernel, which are encapsulated in here. libdrm itself doesn’t actually load these.

The X server needs to know what’s happening here, though, so it can do things like synchronization. This synchronization between your glxgears, the kernel, and the X server is called DRI, or more accurately, DRI2. “DRI” stands for “Direct Rendering Infrastructure”, but it’s sort of a strange acronym. “DRI” refers to both the project that glued Mesa and Xorg together (introducing DRM and a bunch of the things I talk about in this article), as well as the DRI protocol and library. DRI 1 wasn’t really that good, so we threw it out and replaced it with DRI 2.

KMS

As a sort of aside, let’s say you’re working on a new X server, or maybe you want to show graphics on a VT without using an X server. How do you do it? You have to configure the actual hardware to be able to put up graphics. Inside of libdrm and the kernel, there’s a special subsystem that does exactly that, called KMS, which stands for “Kernel Mode Setting”. Again, through a set of ioctls, it’s possible to set up a graphics mode, map a framebuffer, and so on, to display directly on a TTY. Before, there were (and still are) hardware-specific ioctls, so a shared library called libkms was created to give a shared API. Eventually, there was a new API at the kernel level, literally called the “dumb ioctls”. With the new dumb ioctls in place, it is recommended to use those and not libkms.

While it’s very low-level, it’s entirely possible to do. Plymouth, the boot splash screen integrated into modern distributions, is a good example of a very simple application that does this to set up graphics without relying on an X server.

The “Expose” model, Redirection, TFP, Compositing, AIGLX

I’ve used the term “compositing window manager” without really describing what it means to composite, or what a window manager does. You see, back in the 80s when the X Window System was designed on UNIX systems, a lot of random other companies like HP, Digital Equipment Corp., Sun Microsystems, SGI, were also developing products based on the X Window System. X11 intentionally didn’t mandate any basic policy on how windows were to be controlled, and delegated that responsibility to a separate process called the “window manager”.

As an example, the popular environment at the time, CDE, had a system called “focus follows mouse”, which focused windows when the user moved their mouse over the window itself. This is different from the more popular model that Windows and Mac OS X use by default, “click to focus”.

As window managers started to become more and more complex, documents started to appear describing interoperability between the many different environments. It, too, wouldn’t really mandate any policy either like “Click to Focus” either.

Additionally, back in the 80s, many systems did not have much memory. They could not store the entire pixel contents of all window contents. Windows and X11 solve this issue in the same way: each X11 window should be lossy. Namely, a program will be notified that a window has been “exposed”.

Imagine a set of windows like this. Now let’s say the user drags GIMP away:

The area in the dark gray has been exposed. An ExposeEvent will get sent to the program that owns the window, and it will have to redraw contents. This is why hung programs in versions of Windows or Linux would go blank after you dragged a window over them. Consider the fact that in Windows, the desktop itself is just another program without any special privileges, and can hang like all the others, and you have one hell of a bug report.

So, now that machines have as much memory as they do, we have the opportunity to make X11 windows lossless, by a mechanism called redirection. When we redirect a window, the X server will create backing pixmaps for each window, instead of drawing directly to the backbuffer. This means that the window will be hidden entirely. Something else has to take the opportunity to display the pixels in the memory buffer.

The composite extension allows a compositing window manager, or “compositor”, to set up something called the Composite Overlay Window, or COW. The compositor owns the COW, and can paint to it. When you run Compiz or GNOME Shell, these programs are using OpenGL to display the redirected windows on the screen. The X server can give them the window contents by a GL extension, “Texture from Pixmap“, or TFP. It lets an OpenGL program use an X11 Pixmap as if it were an OpenGL Texture.

Compositing window managers don’t have to use TFP or OpenGL, per se, it’s just the easiest way to do so. They could, if they wanted to, draw the window pixmaps onto the COW normally. I’ve been told that kwin4 uses Qt directly to composite windows.

Compositing window managers grab the pixmap from the X server using TFP, and then render it on to the OpenGL scene at just the right place, giving the illusion that the thing that you think you’re clicking on is the actual X11 window. It may sound silly to describe this as an illusion, but play around with GNOME Shell and adjust the size or position of the window actors (Enter global.get_window_actors().forEach(function(w) { w.scale_x = w.scale_y = 0.5; }) in the looking glass), and you’ll quickly see the illusion break down in front of you, as you realize that when you click, you’re poking straight through the video player, and to the actual window underneath (Change 0.5 to 1.0 in the above snippet to revert the behavior).

With this information, let’s explain one more acronym: AIGLX. AIGLX stands for “Accelerated Indirect GLX”. As X11 is a networked protocol, this means that OpenGL should have to work over the network. When OpenGL is being used over the network, this is called an “indirect context”, versus a “direct context” where things are on the same machine. The network protocol used in an “indirect context” is fairly incomplete and unstable.

To understand the design decision behind AIGLX, you have to understand the problem that it was trying to solve: making compositing window managers like Compiz fast. While NVIDIA’s proprietary driver had kernel-level memory management through a custom interface, the open-source stack at this point hadn’t achieved that yet. Pulling a window texture from the X server into the graphics hardware would have meant that it would have to be copied every time the window was updated. Slow. As such, AIGLX was a temporary hack to implement OpenGL in software, preventing the copy into hardware acceleration. As the scene that compositors like Compiz used wasn’t very complex, it worked well enough.

Despite all the fanfare and Phoronix articles, AIGLX hasn’t been used realistically for a while, as we now have the entire DRI stack which can be used to implement TFP without a copy.

As you can imagine, copying (or more accurately, sampling) the window texture contents so that it can be painted as an OpenGL texture requires copying data. As such, most window managers have a feature to turn redirection off for a window that’s full-screen. It may sound a bit silly to describe this as unredirection, as that’s the initial state a window is in. But while it may be the initial state for a window, with our modern Linux desktops, it’s hardly the usual state. The logic here is that if a window would be covering the COW anyway and compositing features aren’t necessary, it can safely be unredirected. This feature is designed to give high performance to programs like games, which need to run with high performance at 60 frames per second.

Wayland

As you can see, we’ve split out quite a large bit of the original infrastructure from X’s initial monolithic behavior. This isn’t the only place where we’ve tore down the monolithic parts of X: a lot of the input device handling has moved into the kernel with evdev, and things like device hotplug support has been moved back into udev.

A large reason that the X Window System stuck around for now was just because it was a lot of effort to replace it. With Xorg stripped down from what it initially was, and with a large amount of functionality required for a modern desktop environment provided solely by extensions, well, how can I say this, X is overdue.

Enter Wayland. Wayland reuses a lot of existing infrastructure that we’ve built up. One of the most controversial things about it is that it lacks any sort of network transparency or drawing protocol. X’s network transparency falls flat in modern times. A large amount of Linux features are hosted in places like DBus, not X, and it’s a shame to see things like Drag and Drop and Clipboard support being by and large hacks with the X Window System solely for network support.

Wayland can use almost the entire stack as detailed above to get a framebuffer on your monitor up and running. Wayland still has a protocol, but it’s based around UNIX sockets and local resources. The biggest drastic change is that there is no /usr/bin/wayland binary running like there is a /usr/bin/Xorg. Instead, Wayland follows the modern desktop’s advice and moves all of this into the window manager process instead. These window managers, more accurately called “compositors” in Wayland terms, are actually in charge of pulling events from the kernel with a system like evdev, setting up a frame buffer using KMS and DRM, and displaying windows on the screen with whatever drawing stack they want, including OpenGL. While this may sound like a lot of code, since these subsystems have moved elsewhere, code to do all of these things would probably be on the order of 2000-3000 SLOC. Consider that the portion of mutter just to implement a sane window focus and stacking policy and synchronize it with the X server is around 4000-5000 SLOC, and maybe you’ll understand my fascination a bit more.

While Wayland does have a library that implementations of both clients and compositors probably should use, it’s simply a reference implementation of the specified Wayland protocol. Somebody could write a Wayland compositor entirely in Python or Ruby and implement the protocol in pure Python, without the help of a libwayland.

Wayland clients talk to the compositor and request a buffer. The compositor will hand them back a buffer that they can draw into, using OpenGL, cairo, or whatever. The compositor is at the discretion to do whatever it wants with that buffer – display it normally because it’s awesome, set it on fire because the application is being annoying, or spin it on a cube because we need more YouTube videos of Linux cube spinning.

The compositor is also in control of input and event handling. If you tried out the thing with setting the scale of the windows in GNOME Shell above, you may have been confused at first, and then figured out that your mouse corresponded to the untransformed window. This is because we weren’t *actually* affecting the X11 window itself, just changing how it gets displayed. The X server keeps track of where windows are, and it’s up to the compositing window manager to display them where the X server thinks it is, otherwise, confusion happens.

Since a Wayland compositor is in charge of reading from evdev and giving windows events, it probably has a much better idea of where a window is, and can do the transformations internally, meaning that not only can we spin windows on a cube temporarily, we will be able to interact with windows on a cube.

Summary

I still hear a lot today that Xorg is very monolithic in its implementation. While this is very true, it’s less true than it was a long while ago. This isn’t due to the incompetence on the part of Xorg developers, a large amount of this is due to baggage that we just have to support, like the hardware-accelerated XRender protocol, or going back even further, non-anti-aliased drawing commands like XPolyFill. While it’s very apparent that X is going to go away in favor of Wayland some time soon, I want to make it clear that a lot of this is happening with the acknowledgement and help of the Xorg and desktop developers. They’re not stubborn, and they’re not incompetent. Hell, for dealing with and implementing a 30-year-old protocol plus history, they’re doing an excellent job, especially with the new architecture.

I also want to give a big shout-out to everybody who worked on the stuff I mentioned in this article, and also want to give my personal thanks to Owen Taylor, Ray Strode and Adam Jackson for being extremely patient and answering all my dumb questions, and to Dave Airlie and Adam Jackson for helping me technically review this article.

While I went into some level of detail in each of these pieces, there’s a lot more here where you could study in a lot more detail if it interests you. To pick just a few examples, you could study the geometry algorithms and theories that cairo exploits to convert arbitrary shapes to trapezoids. Or maybe the fast software rendering algorithm for trapezoids by Carl Worth and Keith Packard and investigate why it’s fast. Consider looking at the design of DRI2, and how it differs from DRI1. Or maybe you’re interested in the hardware itself, and looking at graphics card architecture, and looking at the data sheets to see how you would program one. And if you want to help out in any of these areas, I assume all the projects listed above would be more than happy to have contributions.

I’m planning on writing more of these in the future. A large amount of the stacks used in the Linux and GNOME community today don’t have a good overview document detailing them from a very high level.

Requirements and tips for getting your GNOME Shell Extension approved

If you’ve written and submitted a GNOME Shell Extension, you’ve probably run across a reviewer saying that you shouldn’t do something, not even knowing what you shouldn’t have done beforehand. I’m sorry, the Shell team (including myself) hasn’t really done a good job documenting what a good GNOME Shell Extension should do, and we reviewers constantly run across the same mistakes in review.

I’m going to hopefully change this for the better by documenting most of what us reviewers look for when reviewing code. This isn’t meant to be an exhaustive list – even if you do everything in here, we still may reject your extension for other reasons. If it’s a code reason and it’s a common one, I’ll update the list.

This is mostly about GNOME-specific gotchas. Please attempt to write straightforward and clean code. Unravelling spaghetti is not fun for any of the extension reviewer team.

Don’t do anything major in init()

init() is there to have anything that needs to happen at first-run, like binding text domains. Do not make any UI modifications or setup any callbacks or anything in init(). Your extension will break, and I will reject it. Do any and all modifications in enable().

Undo all UI modifications in disable()

If you make any UI modifications in enable(), you need to undo them in disable(). So, if you add an icon to the top bar when your extension is enabled, you need to remove it. Simple. Failure to do so will be an immediate rejection.

Remove all possible callback sources

If you’re writing an extension, you must disconnect all signals and remove all Mainloop sources. Failure to do so will be an immediate rejection. So, if you connect to a signal using global.display.connect('window-created', onWindowCreated);, you need to disconnect it:

let _windowCreatedId;

function enable() {
    _windowCreatedId = global.display.connect('window-created', onWindowCreated);
}

function disable() {
    global.display.disconnect(_windowCreatedId);
}

The same goes for any timeouts or intervals added with the Mainloop module. Use Mainloop.source_remove to make sure your source is removed.

Extensions targeted for GNOME 3.2 cannot use GSettings

It’s unfortuate, but true. Extensions that target GNOME 3.2 cannot use GSettings when uploaded to the repository. It will crash the Shell. There are new APIs in GNOME 3.4 that allow us to work around this. The technique used most often is to have a bunch of const statements at the top of the shell that the user can modify.

A quick aside, a libsoup gotcha

function buttonPressed() {
    // This code will crash the Shell, do not use it!
    let session = new Soup.SessionAsync();
    let message = Soup.form_request_new_for_hash('GET', 'http://example.com', {'beans': 'sauce'});
    session.queue_message(message, function() {
        log('soup is so easy!');
    });
}

While we really shouldn’t be crashing here, it’s a hard issue to fix. The technical reason is that when the session gets garbage collected, it calls back into all callbacks to let them know that the request is cancelled. Because the session is in destruction, this will crash gjs when it tries to find the object to pass to the callback.

A quick fix is to prevent the session from being garbage collected. Move it outside the function to get the correct behaviour.

const session = new Soup.SessionAsync();

// This makes the session work under a proxy. The funky syntax here
// is required because of another libsoup quirk, where there's a gobject
// property called 'add-feature', designed as a construct property for
// C convenience.
Soup.Session.prototype.add_feature.call(session, new Soup.ProxyResolverDefault());

function buttonPressed() {
    let message = Soup.form_request_new_for_hash('GET', 'http://example.com', {'beans': 'sauce'});
    session.queue_message(message, function() {
        log('soup is so easy!');
    });
}

And now, these aren’t required, but it helps me tremendously when reading extensions. Following these tips will likely get your extension approved faster.

Consistent and appropriate indentation

If you indent your code properly and consistently, I don’t have to figure out which brace corresponds to what. Makes it easier for me to read code. It also helps if you follow the GNOME Shell’s indentation and brace style (4-space indents, braces on the same line), but I don’t want to start a flamewar in the comments section. If you like another indentation and brace style, use it. The important thing is that you are consistent.

Removing dead and commented out code

If you know that some code you have is dead, remove it. If you’re using a modern VCS like git, you should be able to get it back at any point. Don’t leave a bunch of commented code in there for me to digest through.

More Extension API breaks/improvements

Since GNOME Shell Extensions launched, we’ve been overwhelmed at the response from the community. You guys are awesome, and I’m constantly impressed at what things you guys are doing with the Shell. I’ve been a bit behind on reviewing extensions and adding requested features to the website (I still need to get started on search and investigating some pretty big bugs), and I apologize. I recently landed a bunch of features to the review system to make it easier for me to review your extensions, such as rewriting the diff system. Hopefully things will go a lot faster now.

When Owen Taylor first suggested the project, we worked out some basic sketches and built an overview for the site, but we didn’t want to change the extension API until we knew what extensions were going to need or want to do. As I’ve said before, I felt that we needed a co-operative live enable/disable system for asthetic reasons: the Shell isn’t the fastest thing at restarting, and a user having to spend ten or fifteen seconds to revert your system to the previous state after deciding that an extension sucks makes for a pretty poor experience. That was the only API break that I inserted into the extension system for 3.2, and it was purely to make sure that your extensions were presented in the best way possible.

Multi-file extensions

The extension system in the shell really wasn’t designed for extensions that had multiple files. The many people that built the system had no idea that this clever hack would be commonplace. While I hate to break API, I’d say it’s for the better. Introducing the “extension object”. Previous to now, we would construct a meta object from your metadata.json, install some random things into it and pass it to you, the extension, through the init method. Over time, we inserted more and more on the metadata object where it was this jumbled bag of things that the extension system added, and things that were in the metadata.json file. While adding a few more keys for the prefs tool below, I finally decided it was worth a redesign.

Here’s an example of the new system in action:

const Gio = imports.gi.Gio;

let extension = imports.misc.extensionUtils.getCurrentExtension();
let convenience = extension.imports.convenience;

Gio.app_info_launch_default_for_uri(extension.dir.get_uri(), global.create_app_launch_context());

let metadata = extension.metadata;
Gio.app_info_launch_default_for_uri(metadata.url, global.create_app_launch_context());

function init() {}
function enable() {}
function disable() {}

Of course, you shouldn’t do these things outside of any user interaction, but a concise, imperfect example is better than one that misses the point by building an StButton, sticking it somewhere on enable(), and then removing it on disable().

You might notice that you don’t get the extension object passed to you – you go out and grab it. If you’re curious as to how we know what extension you are, there’s some clever trickery involved.

What’s on this object? A bunch of stuff that used to be on the metadata object: state, type, dir, path, etc. When we introduce new APIs specifically designed for the ease of extension, this is that place we will put them. metadata is now a pristine and untampered representation of your metadata.json file, and we’re going to keep it that way.

Preferences

One thing that we don’t really have an area for in the current 3.2 API is preferences. Unfortunately, GSettings was incompatible with extensions installed into your home directory (like extensions installed from GNOME Shell Extensions do). We couldn’t get this fixed in time for 3.2, but Ryan Lortie spent a lot of his personal time fixing this, and I thank him graciously. We now have a usable API for settings from extensions. It requires a bit more code, as you can see in gnome-shell-extensions’s convenience.js.

I spend a lot of my time looking at the code for extensions. Extensions have been shipping their own Python or gjs scripts that stick up a GTK+ dialog. Someitimes they built a configuration UI inside the extension with St. Sometimes they installed a .desktop file so that they would show up in the Applications section of the Shell. Sometimes they would insert a button to launch the script, or maybe they made users right-click on their item in the toolbar.

When Owen and I sat down to discuss this, we both decided we need some consistent way for extensions to be configured. Introducing the “GNOME Shell Extension Preferences” tool. It’s a new entry point to your extension – prefs.js. Here’s a simple prefs.js file adding support to the Alternate Tab extension.

... and here's a screenshot!

I didn’t make it too fancy, and it’s entirely possible that Giovanni may want to make a better UI.

Your entry point is a function labeled buildPrefsWidget(), and it should return some sort of GTK+ widget. Whatever you return from there will get inserted into preferences widget.

The combobox is for switching between extensions, and it’s ours. Otherwise, go crazy… the world below is your canvas. As usual, try to make it useful and pretty, but the reviewers on the extensions repo can’t and won’t discriminate against ugly UIs. There is still a policy for this, though: the widgets that you put in the preferences UI have to be related to preferences (I think I’ll allow version strings and other things though), and you should not walk up the GTK+ widget tree beyond your preferences widget (to adjust the combobox or manipulate any UI that is not yours). I’m looking forward to the neat things that you guys do with this!

“But how do I launch it?”

Oh, right. The tool is marked NoDisplay, so it won’t show up in the Applications menu. This is intentional. I’m currently working on a patch to SweetTooth so you can launch it directly from the website, provided the browser-plugin is installed correctly.

“Something broke, and I don’t think it’s my fault!”

I’m human. I make mistakes. You can let me know I was or am being a moron by filing a bug for GNOME Shell, or filing a bug for SweetTooth. While I try to respond to everyone’s mail and read all blog comments, I regularly check my task list on these two things, so this is a better system for me.

“How do I try it out?”

Providing everything goes well, GNOME Shell 3.3.5 should be released tonight, which should have all this new awesomeness. If you want to test with the new browser-plugin stuff, you’ll need to copy the browser plugin from gnome-shell/browser-plugin/.libs/libgnome-shell-browser-plugin.so to ~/.mozilla/plugins, and then restart your browser.

GJS Improvements

Myself, Giovanni Campagna as well as Colin Walters have all been working hard trying to make GJS somewhat of a competitor to PyGObject, being a full introspection stack for the GNOME Desktop Environment. Rather than give you a bunch of history, let me just give you a quick taste. There’s much more to the landing than this, such as implementing your own signals, properties, as well as implementing interfaces, but it will take me a few days to come up with an exciting example that fully showcases the power that you now have.

const Lang = imports.lang;

const Cogl = imports.gi.Cogl;
const Clutter = imports.gi.Clutter;
const Gio = imports.gi.Gio;

const MyClutterActor = new Lang.Class({
    Name: 'MyClutterActor',
    Extends: Clutter.Actor,

    vfunc_get_preferred_width: function(actor, forHeight) {
        return [100, 100];
    },

    vfunc_get_preferred_height: function(actor, forWidth) {
        return [100, 100];
    },

    vfunc_paint: function(actor) {
        let alloc = this.get_allocation_box();
        Cogl.set_source_color4ub(255, 0, 0, 255);
        Cogl.rectangle(alloc.x1, alloc.y1, alloc.x2, alloc.y2);
    }
});

const MyClutterEffect = new Lang.Class({
    Name: 'MyClutterEffect',
    Extends: Clutter.DeformEffect,

    vfunc_deform_vertex: function(effect, width, height, vertex) {
        vertex.x += Math.random() * 20 - 10;
        vertex.y += Math.random() * 20 - 10;
    }
});

let actor = new MyClutterActor();
let stage = new Clutter.Stage();
actor.animatev(Clutter.AnimationMode.EASE_IN_OUT_CUBIC, 2000, ['x', 'y'], [200, 200]);
actor.add_effect(new MyClutterEffect());
stage.add_actor(actor);
stage.show_all();

Clutter.main();

We’ll do it live!

Live extension enabling and disabling has landed! This allows users to do this!

I could do that all day. It’s quite fun.

But there’s one little problem. Extension developers? We need to talk.

“What did I do?”

Nothing bad, I promise. You just have a new responsibility: in case the user doesn’t like the extension, you need to undo all the changes that you ever made to the shell. Make it look like your code never touched the place. Otherwise: that video above? The giant illusion will drop to the ground and crack into a thousand pieces. Like that China I broke last weekend.

“Uh, OK. How?”

If your extension looked like this before, you need to make it look like either this, or this.

Put simply: main(), uh, how do I say this? main() has been, uh, let go. Sure. Don’t worry though, he’s been replaced with init(), enable() and disable(). They’re just as cool. I promise. The in-depth obitua — uh, termination notice can be read here. init()‘s best friend, the “helper” object couldn’t make it for this trip. He’ll be here next year.

There’s three simple rules to take into account:

  • There used to be one required method: main(). This has been removed. It has been replaced with a new hook, init(). You should use init() to initialize any state. Do not modify the Shell inside of this function.
  • The init() may return an object, called the “state object”. If you do not return an object, the module is considered the state object. This object is supposed to have two new methods: enable() and disable(). These should modify Shell in the appropriate way. For lack of better vocabulary, extensions which use the module (return nothing from init()) as their state object are called “stateless” objects, and those which don’t are “stateful”.
  • init() is guaranteed to be called at most once per shell session. enable() and disable() may be called at any time. If your extension is enabled at the start of the session, enable() will be called. If your extension is disabled at the start of the session, disable() will not be called, and there is no guarantee that init() will be called.

Additionally, keep in mind that Shell extensions are versioned, so only extensions that are targeted for 3.2 should port to the new API. Some extensions that want to support both 3.0 and 3.2. Fortunately there is an easy way to do exactly that. Just run init(), and then enable():

function main() { init(); enable(); }

… Stateful extensions just need to run enable() on their state object.

function main() { init().enable(); }

Oh, and one last thing. Thank you. You don’t have to believe it, but I’m doing this for you. I’m trying to make it easier for users to safely try out your awesome work. There’s more awesome stuff for you guys on the way. I promise.

One is the loneliest border…

Today is an exciting day for GNOME 3.0! Why? Invisible borders have landed! What does this mean for you? One pixel borders are no longer! No more pixel hunting! The last remnants of the cursor accuracy kingdom have been tore down! Pointer precision, you are —

“What?”

Let me draw you a picture.

Motivation

In GNOME3 before today, if I wanted to conveniently resize a window, I had to hunt for one-pixel borders. I would finally get my mouse positioned right on the border, and then carefully click…

and then miss… So, it’s not surprising people are upset about this. Now, you can go outside the actual visible part of the window, and still resize it!

Now, you have as much space in the world to resize your windows: we extend the resizable area to outside of the actual window. Additionally, the mount of area that is available to resize is a user preference, and completely customizable! That is, the green area in the picture below is completely customizable by a gconf setting.

Implementation

The way I did this was by making every X window a little larger, and use Owen’s existing shaping code to hide the excess area around the borders. This means that toolkits that use the parent window’s size to find their visible extents are now going to break. Use the _NET_FRAME_EXTENTS X atom as a replacement.

Oh, and since I’m using Owen’s existing shaping code, and that uses an 8-bit mask to hide the shaped region, this made it quite a bit easier to add a feature that people have been begging for for a little while: antialiased borders. This hasn’t landed quite yet, but it should be coming soon enough!

Next time, I’ll talk about SweetTooth some more!

Shell-like Switch Widget

A recent SweetTooth screenshot

I’ve been tinkering with web development a lot lately as part of my work with SweetTooth, the GNOME Shell extensions repository I’m working on for GNOME.

At the suggestion of Vinicius Depizzol, who’s been doing a wonderful job redesigning the GNOME web sites, I tried my hand at creating a Shell-like switch widget to allow a user to click a button to turn extensions on and off. Using some precious combination of JS and CSS, I’ve finally made a switch widget I think looks nice. Go ahead: grab it, click on it, fling it around, try and break it! It should just feel “right”.

Figuring out whether the widget is activated and setting the appropriate style classes, the drag logic, and figuring out where to put the switch handle are the only responsibilities of the JS code: everything else is done with CSS, using a combination of border-radius, position: absolute, floats. The triple-line on the grab handle is “drawn” with CSS and manipulates the border styling properties to do its bidding. The animations to fade the background-color and slider position are all CSS3 transitions.

There’s still one unsolved problem, which is the flash and fade of the background color when loading the page, and as much as I tried, I couldn’t solve it. It’s a fallout from CSS thinking it has transitioned from one class to another, but in actuality, it’s just jQuery adding a style class after the node has been constructed. Feel free to yell at me I’m being stupid.

The unnecessary details of my adventure

As usual, the hard part in concocting something like this is compatibility. The web truly isn’t meant for something pixel-perfect like this, and it shows. Browsers like IE, where compatibility is usually a pain, weren’t involved at all here (I haven’t bothered to test them — again, yell at me). Through poking people on IRC, I found that OS X has different font metrics for the “ON” and “OFF” labels

(A mini rant to the CSS WG: why can’t I create a new flow root/block formatting context to contain floats by asking for one, instead having to rely on the side effects of other properties?)

Originally, the DOM looked like this image to the left, if you forget about the switch handle. That is, there were two sub elements that were absolutely positioned inside the switch: the switch was hardcoded to a 64px width. Here, everything worked fine and the code was relatively simple.

The problem was that on some browsers and platforms, the text metrics are inconsistent. Unfortunately, the HTML specification gives no guarantee about text measurement, explicitly stating it to be agent-specific. I could deal with this, but ti would be nice if CSS had relative units for glyph widths or JS let you get at some arbitrary measurements of text. <canvas> lets you get at a TextMetrics object, but the only current attribute there is width. For instance, even though I included a web font, WebKit on Mac (which uses CoreGraphics) looked terrible.

Just to make me more irritated, a normal font-weight made it work perfectly, and was almost pixel-perfect to the way bold looked on every other browser/OS. “Go eat a ham”, said Apple. So, obviously, I can’t rely on things which aren’t meant to be relied on.

Welp. So, using some complicated trickery using width: 50%, floats, float containment, I finally got a DOM I wanted, and should hopefully be a bit more flexible. This one should make the box stretch in accordance to font weights and such: float containment means that the parent will expand to the container, and width: 50% means that the children will expand to fill half their container. Perfect.

I simultaneously love and hate web development because of things like this.