Wayland 1.5 is released. It’s a pretty exciting release, with plenty of features, but the most exciting thing about it is that we can begin work on Wayland 1.6!

… No, I’m serious. Wayland 1.6’s release schedule matches up pretty well with GNOME’s. Wayland 1.6 will be released in the coming weeks before GNOME 3.14, the first version of GNOME with full Wayland support out of the box.

Since development is opening again, we can resume work on xdg-shell, the new desktop shell protocol that replaces wl_shell. Kristian Hoegsberg and I have been prototyping and implementing it in toolkits and Wayland compositors. We’re extremely happy with our current revision of the bare-bones protocol, so now we want to start evangelizing it and reaching out to other communities to make sure that everybody can use it. We’ve been working closely with and taking input from the Wayland community, which means the Qt/KDE and Enlightenment/EFL Wayland teams, but anybody who isn’t paying close attention to that community is out of the loop. This needs to change.

Ironically, as the main Wayland developer for GNOME, I haven’t talked too much about the Wayland protocol. My only two posts on Wayland were a user post about the exciting new features, and one about the legacy X11 backwards compatibility mode, XWayland.

Let’s start with a crash course in Wayland protocols.


As odd as it sounds, Wayland doesn’t have a built-in way to get something like a desktop window system, with draggable, resizable windows. As a next-generation display server, Wayland’s protocol is meant to be a bit more generic than that. Wayland can already be found on mobile devices as part of SailfishOS through the hard work of Jolla and other companies. Engineers at Toyota and Jaguar/Land Rover use Wayland for media centers in cars, as part of a custom Linux distribution called GENIVI. I’m also told that LG’s webOS, as used in its smart TVs, is investigating Wayland as a display server as well. Dragging and resizing tiny windows on a phone, or inside a car, or on a TV just isn’t going to be a great experience. Wayland was designed, from the start, to be flexible enough to support a wide variety of use cases.

However, that doesn’t mean that Wayland is all custom protocols: there’s a common denominator between all of these cases. Wayland has a core protocol object called a wl_surface, on which clients can show some pixels for output and receive various kinds of input. This is similar to the concept of X11’s “windows”, which I explain in Xplain. However, the wl_surface isn’t simply a subregion of the overall front buffer. Instead of owning parts of the screen, Wayland clients create their own pixel buffers, draw to them, and then “attach” them to the wl_surface, causing a new pixel buffer to be displayed. The wl_surface concept is fairly versatile, and is used any time we need a “live surface” to play around with. For instance, the mouse cursor is done simply by providing the Wayland compositor with a wl_surface. The same thing is done for drag-and-drop icons as well.
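
To make that concrete, here’s a heavily abbreviated sketch of what wl_surface looks like in the wayland.xml protocol description (the real interface has many more requests and events than shown here):

```xml
<!-- Trimmed sketch of wl_surface from wayland.xml. The attached
     buffer only becomes the surface's contents when the client
     commits the surface. -->
<interface name="wl_surface" version="3">
  <request name="attach">
    <arg name="buffer" type="object" interface="wl_buffer" allow-null="true"/>
    <arg name="x" type="int"/>
    <arg name="y" type="int"/>
  </request>
  <request name="commit"/>
  <request name="destroy" type="destructor"/>
</interface>
```

A client draws into its own wl_buffer, attaches it, and then commits; nothing takes effect until the commit, which is what makes surface updates atomic.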

An interesting aside is that the model taken by Wayland with wl_surface can actually require fewer copies and be more efficient than X11 on modern systems. More and more GPUs have interesting, fancy hardware at scanout time. With the rise of low-power phones that require rich graphics, we’re seeing a resurgence in fixed-function alpha blending and compositing hardware at scanout, similar to what game consoles like the NES and SNES had (though they called them “sprites”). X11’s model of a giant front buffer that apps draw into means that we must eventually copy all contents to the front buffer from the CPU, while Wayland’s model means that applications can simply hand us their pixel buffers, and we can choose to show one as a hardware overlay, which removes the copy entirely. And if an application is full-screen, we can simply tell the GPU to scan out from that application’s buffer directly, instead of having to copy.


OK, so I’ve talked about wl_surface. How does this relate to xdg-shell? Since a wl_surface can be used for lots of different purposes, like cursors, simply creating a wl_surface and attaching a buffer doesn’t put it on the screen. Instead, we first need to let the Wayland compositor know that this wl_surface is intended to be a desktop-style window that can be dragged around and resized. It should appear in Alt-Tab, clicking on it should give it keyboard focus, and so on.

Wayland’s approach here is a bit odd, but to give a wl_surface a role, we construct a new wrapper object which has all of our desktop-level protocol functions, and then hand it the wl_surface. In this case, the protocol that we use to create this role is known as “xdg-shell”, and the wrapper object is known as an “xdg_surface”. The name is a reference to the FreeDesktop Group, an open mailing list where cross-desktop standards are discussed between all the different desktops. For historical reasons, it’s abbreviated as “XDG”. Members from the XDG community have all been contributing to xdg-shell.
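
In protocol terms, assigning the role is just a request on the xdg_shell global that takes the wl_surface and hands back the wrapper object. Roughly, in the xdg-shell XML (abbreviated from the version in the Weston repository):

```xml
<interface name="xdg_shell" version="1">
  <!-- Create an xdg_surface role object for an existing wl_surface. -->
  <request name="get_xdg_surface">
    <arg name="id" type="new_id" interface="xdg_surface"/>
    <arg name="surface" type="object" interface="wl_surface"/>
  </request>
</interface>
```

From then on, desktop-level operations like moving, resizing and maximizing are requests on the returned xdg_surface, not on the wl_surface itself.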

The approach of a low-level structure with a high-level role is actually fairly similar to the approach taken in X11. X11 simply provides a data structure called a “window”, as I explained in Xplain: a tool that you can use to construct your interface by pushing pixels here, and getting input there. An external process called a “window manager” turns this window from a simple region of the front buffer into a window with a title and icon that the user can move around, resize, minimize and maximize with keyboard shortcuts and a taskbar. The window manager and the client applications both agree to cooperate and follow a series of complex standards like the ICCCM and EWMH that allow you to provide this “role”. Though I’ve never actually worked on any environments other than traditional desktops, I’d imagine that in more special-case environments, different protocols are used instead, and the concept of the window manager is completely dropped.

X11 has no easy, simple way of creating protocol extensions. Something as simple as a new request or a new event requires a bunch of byte-marshalling code in client libraries, extra support code in the server and in toolkits, and a new set of APIs. It’s a pain, trust me. Instead, X11 provides a generic way to send events to clients, and a series of key/value pairs on windows called “properties”, so standards like these often use the generic mechanisms rather than building an actual new protocol, since the effort is better spent elsewhere. It’s an unfortunate consequence of how X11 was developed.

Wayland makes it remarkably easy to create a new protocol extension involving new objects and custom methods. You write up a simple XML description of your protocol, and an automatic tool, wayland-scanner, generates server-side and client-side marshalling code for you. All that you need to do is write the implementation side of things. On the client, that means creating the object and calling methods on it. Because it’s so easy to write custom extensions in Wayland, we haven’t even bothered creating a generic property or event mechanism. Having real structure gives us a lot more stability and rigor.
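
To illustrate, a complete (if useless) extension could look like this; every name here is hypothetical, invented for the example:

```xml
<protocol name="example">
  <interface name="example_counter" version="1">
    <!-- Ask the compositor to bump the counter. -->
    <request name="increment"/>
    <!-- The compositor reports the new value. -->
    <event name="value">
      <arg name="count" type="uint"/>
    </event>
  </interface>
</protocol>
```

Run that file through wayland-scanner’s client-header, server-header and code modes, and you get ready-made marshalling stubs for both sides.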


Long-time users or developers of Wayland might notice this sounds similar to an older protocol known as wl_shell or wl_shell_surface. The intuition is correct: xdg-shell is a direct replacement for wl_shell. wl_shell_surface had a number of frustrating limitations, and due to its inclusion in the Wayland 1.0 core, it is harder to change and improve. As Fred Brooks told us, “plan to throw one away; you will, anyhow”.

xdg-shell solves a number of fundamental issues and race conditions in wl_shell_surface which I’d prefer not to go into here (but if you ask nicely in the comments, I might oblige!). I guess you’ll have to trust me when I say that they were behind highly visible user bugs: weird lagginess and flickering when using Wayland. We’re happy that these are gone.

A call to arms

The last remaining ticket item I have to work on in xdg-shell is “window geometry”: a way of communicating where the user’s concept of the edge of the window is. This requires significant reworking of the code in weston, mutter, and GTK+. After that, it will serve the needs of GTK+ and GNOME perfectly.

Does it serve the needs of your desktop? The last thing we want to do is solidify the xdg-shell protocol, only to find a few months later that it doesn’t quite work right for tiling WMs, or for EFL or KDE applications. We’re always experimenting with things like this to make sure everything can work, but really, we’re only so many people, and having others test it out and port their toolkits and compositors over can’t hurt.

So, any and all help is appreciated! If you have any feedback on xdg-shell so far, or need any help understanding it, feel free to post to the Wayland mailing list or poke me on IRC (#wayland on Freenode; my nick there is Jasper).

As always, if anybody has any questions, no matter how dumb or stupid, please let me know! Comments are open, and I always try to reply.

26 thoughts on “xdg-shell”

  1. Hi Jasper,

    Do you know why Wayland didn’t elect to bind itself more tightly to EGL so as to avoid the impedance mismatch? The only reason that makes sense to me is if you expect Wayland to run on platforms other than EGL, or you expect EGL to be supplanted soonish. Otherwise the overhead doesn’t seem worth it.
    Also, can you explain the relationship between offscreen buffers and wl_surface? You said apps request buffers, draw to them THEN attach them to wl_surface. It’s at that point that Wayland knows the app is finished drawing? If that’s the case why the need for both the offscreen buffer and wl_surface? Is wl_surface immutable? Can the app request multiple wl_surface for flipping or is that done via the offscreen buffer?


    • We want to provide the opportunity for simple clients to not have to boot up EGL, or to be used in cases where a GL stack would be slow (see: Raspberry Pi). For these, we have “SHM buffers”, where instead of GL surfaces, applications write to a formatted buffer in shared memory. For a video player, these could even be YUV buffers which would be passed directly to a hardware overlay. You can imagine that with more special cases, we could have other buffer types (Google/Netflix have a wl_tpm_buffer in which the buffer contents are decoded by a special hardware module).
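
      As a sketch of how small the SHM path is, here’s a heavily abbreviated excerpt of wl_shm from wayland.xml:

      ```xml
      <interface name="wl_shm" version="1">
        <!-- The client passes a shared-memory file descriptor and
             its size; the pool carves wl_buffers out of it. -->
        <request name="create_pool">
          <arg name="id" type="new_id" interface="wl_shm_pool"/>
          <arg name="fd" type="fd"/>
          <arg name="size" type="int"/>
        </request>
        <!-- Advertises a pixel format the compositor supports. -->
        <event name="format">
          <arg name="format" type="uint"/>
        </event>
      </interface>
      ```

      The client mmaps the same fd, draws pixels into it, creates wl_buffers from the pool, and attaches those to a wl_surface like any other buffer.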

      wl_surface is more like a generic abstraction over a surface that can retrieve input and output, and a grab bag of assorted things (e.g. “set_input_region” which lets you set a region for input, “frame” which lets the compositor and client do cooperative redraw throttling). As I said, it can be used for a cursor as well, and it’s also part of the subsurface mechanism.

      wl_surface only appears onscreen once you attach a role like with xdg_shell.get_xdg_surface, or with wl_pointer.set_cursor.

      An app really shouldn’t create multiple wl_surfaces for flipping; it should instead create multiple wl_buffers, and alternate which one it attaches each frame.

      It’s designed so that each wl_surface that has a role is like its own window. If you requested multiple wl_surfaces, the app would be constantly showing and hiding like it was a new window. The xdg_surface is attached 1:1 to the wl_surface.

      • Great example with the rpi. I’d completely forgotten about its rather unusual architecture. The other examples were helpful as well.
        Also, thanks for the clarification with regard to the surface/buffer relationship.

    • Right now, since I’m sort of in charge of the xdg-shell protocol, I can keep tabs on all new development, and know when it has to be copied to all the repositories and be updated.

      Eventually, we will want to make this a shared protocol in the Wayland core. Until then, we’re just going to consider the one in the Weston repository the “master” one.

  2. Hi,
    will there be a way to share surfaces between different processes?
    Some of the technologies for this on OS X are the most interesting thing to happen in the realm of VJing and graphics generally for years.

    • No. Surfaces are resources, and resources are always client-private in the protocol. What kinds of things does resource sharing allow on OS X?

        • For this case, what the WebKit guys are doing is building a subcompositor. You pretend you’re a Wayland compositor: you open your own socket, have your own children, and then present a composited buffer back up to the host compositor. You can even do this zero-copy using subsurfaces when talking to the host. It’s very different from the screensharing use case, which is “I want to access the pixel buffers of other arbitrary clients”.

      • Well, I don’t know about OS X, but I’m thinking something along the lines of a Twitch.tv client. “Hey, get the buffer of that process there and stream it”.
        I think this is possible on winblows. :-/

        • This is the “screenshooter” case where you want to access the pixel buffers of other arbitrary clients. This has been talked about in depth on the mailing list, and Wayland even has a private screenshooter protocol for this ( http://cgit.freedesktop.org/wayland/weston/tree/protocol/screenshooter.xml )

          What we want to make sure is that we get the user design of this right. We don’t want to allow clients to just be able to take screenshots of the screen without the user noticing, since that’s a giant security hazard. However, there is no reason it can’t be done, technically.
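
          If you’re curious, the interface itself is tiny; roughly (abbreviated from the file linked above):

          ```xml
          <interface name="screenshooter" version="1">
            <!-- Ask the compositor to copy an output's contents into a
                 client-provided buffer. -->
            <request name="shoot">
              <arg name="output" type="object" interface="wl_output"/>
              <arg name="buffer" type="object" interface="wl_buffer"/>
            </request>
            <!-- Fired when the copy is finished. -->
            <event name="done"/>
          </interface>
          ```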

          • You’re right on that; the user has to know what’s happening in these cases. Just one more question: does the screenshooter work for “moving images” (games and the like)? Say I have a process that uses OpenGL to render itself. Will this protocol be able to keep up at 60/120fps and record successive screenshots? Basically, can it take 120 pictures a second? ;)

  3. How would a screen annotator work under this system? (Basically, something that lets you switch between interacting with the desktop normally and drawing tools, with the resulting drawing Always On Top.)

    It could request a surface the size of the screen (! Forgot about dual monitors !) and start drawing there, but how would it pass on the input to the windows below it when in normal mode? I know that Wayland has input confidentiality and integrity, so how would that work?

    • Screen annotators would have to be provided by a special kind of privileged client, or by the compositor itself.

  4. Hi Jasper,

    I know it’s a bit off-topic, but do you know if there is any progress to make desktop OpenGL available for Wayland without X installed? As far as I know it’s currently impossible, because libGL contains some X-specific code for context creation and thus depends on X (as mentioned in the Wayland FAQ).

    • We now use full GL in Wayland and have for a while; the FAQ is simply out of date. The way your distribution builds libGL might mean that it includes the X11 platform and thus links to X11, but you should be able to build Mesa without that; I did see some work from Intel about this a while ago! You might be entering unexplored territory, and things might break, though.

  5. Great post Jasper. I have a few questions because I am a curious fox.
    How does a compositor detect whether an application is full-screen? And how are applications that run at a lower screen resolution than the compositor handled?

  6. How is multiple display support? For example, have we considered multiple full-screen apps (like an editor and a movie player), or being able to work on one display while another is in full screen (e.g. an active browser or text editor while a movie plays full-screen on a secondary display)?

    A lot of systems were initially designed around the concept of one full-screen app, and only on the primary display, and it’s a bit constrictive these days.

    Also, would this be the best place to consider the needs of tiling WMs? It’s possible that arbitrarily sectioning off pieces of a large display (like a 4K display) and treating them as different displays could simplify tiling work.

    • The set_fullscreen request takes as an argument the monitor the app wants to be fullscreen on. Wayland was designed from the very start to be multi-monitor-aware. I’m not aware of any protocol or API issues that would stop a great multi-monitor experience.
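
      Sketched as protocol XML, such a request can be described along these lines (a sketch, not the exact file; the allow-null is one way a protocol can let the client leave the choice of output to the compositor):

      ```xml
      <request name="set_fullscreen">
        <arg name="output" type="object" interface="wl_output" allow-null="true"/>
      </request>
      ```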

      Right now, we consider tiling to be a special case of maximized windows, not full-screen windows. Compositors can choose on their own whether they want one or the other, though.

  7. Since you’re working on window geometry now, what happens if you have two screens connected with different DPI settings, e.g. a hidpi laptop screen and an external 90dpi monitor? If you drag a window from one screen to the other, is the client notified of the dpi change? What happens if a window half-overlaps both screens?

      • Wow, it seems like you have thought of every eventuality. This is awesome!

        Is it possible for a client to provide a buffer per monitor for a single surface? (This would enable correct subpixel font rendering on multiple monitors with different subpixel orders.)

        • Unfortunately, not right now. Surfaces submit one buffer to be displayed everywhere. With how prevalent Hi-DPI displays are and the inability to do proper subpixel antialiasing with GL, I don’t think subpixel antialiasing is a great use case to optimize for. You’re welcome to bring it up on the wayland-devel mailing list, though.

          • Subpixel rendering on hidpi monitors makes a large difference. It essentially triples the apparent resolution (e.g. 6000dpi instead of 200dpi) *without* the rainbow artifacts that plague lodpi monitors.

            As long as the client can query the subpixel format of each monitor, then we can perform subpixel AA on the client side without any involvement of Wayland. Client-side font libraries will have to get a little bit smarter, which is a tractable problem.

            Note that you *can* perform correct subpixel AA for text in OpenGL (I’ve been doing that for close to a decade now). You cannot use plain GPU MSAA for this, but, again, this is possible and desirable (depending on the application.)
