Managing focus in the shadow DOM

Update: I wrote a more recent follow-up to this post.

One of the trickiest things about the shadow DOM is that it subverts web developers’ expectations about how the DOM works. In the normal rules of the game, document.querySelectorAll('*') grabs all the elements in the DOM. With the shadow DOM, though, it doesn’t work that way: shadow elements are encapsulated.

Other classic DOM APIs, such as element.children and element.parentElement, are similarly unable to traverse shadow boundaries. Instead, you have to use more esoteric APIs like element.shadowRoot and getRootNode(), which didn’t exist before shadow DOM came onto the scene.

In practice, this means that a lot of JavaScript libraries designed for the pre-shadow DOM era might not work well if you’re using web components. This comes up more often than you might think.

The problem

For example, sometimes you want to iterate through all the tabbable elements on a page. Maybe you’re doing this because you want to build a focus trap for a modal dialog, or because you’re implementing arrow key navigation for KaiOS devices.

Now, without doing anything, elements inside of the shadow DOM are already focusable or tabbable just like any other element on the page. For instance, with my own emoji-picker-element, you can tab through its <input>, <button>s, etc.

When implementing a focus trap or arrow key navigation, we want to preserve this existing behavior. So the first challenge is to emulate whatever the browser normally does when you press Tab or Shift+Tab. In this case, shadow DOM makes things a bit more complicated because you can’t just use a straightforward querySelectorAll() (or other pre-shadow DOM iteration techniques) to find all the tabbable elements.

Pedantic note: an element can be focusable but not tabbable. For instance, when using tabindex="-1", an element can be focused when clicking, but not when tabbing through the page.
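
To make the distinction concrete, here is a minimal sketch of how one might classify an element. The classifyFocus name is hypothetical, and the check is deliberately rough: it assumes the tabIndex property, the tabindex attribute, and disabled are enough, and it ignores visibility, inert, and other factors a real browser takes into account.

```javascript
// Rough classification only. Assumes `element` exposes tabIndex, disabled,
// and hasAttribute() like a DOM element. Ignores visibility and `inert`.
function classifyFocus(element) {
  if (element.disabled) {
    return 'none';
  }
  if (element.tabIndex >= 0) {
    return 'tabbable'; // reachable with Tab, and also by clicking
  }
  if (element.hasAttribute('tabindex')) {
    return 'focusable'; // e.g. tabindex="-1": click or focus() only
  }
  return 'none';
}
```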

While researching this, I found that a lot of off-the-shelf JavaScript libraries for focus management don’t handle the shadow DOM properly. For example, focusable provides a query selector string that you’re intended to use like so:

import focusable from 'focusable' // a CSS selector string, not a function

// Matches only elements outside of shadow roots
document.querySelectorAll(focusable)

Unfortunately, this can’t reach inside the shadow DOM, so it won’t work for something like emoji-picker-element. Bummer.

To be fair to focusable, though, many other libraries in the same category (focus traps, “get all tabbable elements,” accessible dialogs, etc.) have the same problem. So in this post, I’d like to explain what these libraries would need to do to support shadow DOM.

The solution

I’ve written a couple of JavaScript packages that deal with shadow DOM: kagekiri, which implements querySelectorAll() in a way that can traverse shadow boundaries, and arrow-key-navigation, which makes the left and right keys change focus.

To understand how these libraries work, let’s first understand the problem we’re trying to solve. In a non-shadow DOM context, what does this do?

document.querySelectorAll('*')

If you answered “grab all the elements in the DOM,” you’re absolutely right. But more importantly: what order are the elements returned in? It turns out that they’re returned in document order, i.e. a depth-first (pre-order) tree traversal, which is crucial because this is the same order the browser follows when the user presses Tab (and, in reverse, Shift+Tab) to change focus. (Let’s ignore positive tabindex values for the moment, which are an anti-pattern anyway.)

In the case of shadow DOM, we want to maintain this depth-first order, while also piercing into the shadow DOM for any shadow roots we encounter. Essentially, we want to pretend that the shadow DOM doesn’t exist.

There are a few ways you can do this. In kagekiri, I implemented a depth-first search myself, whereas in arrow-key-navigation, I used a TreeWalker, which is a somewhat obscure API that traverses elements in depth-first order. Either way, the main insight is that you need a way to enumerate a node’s “shadow children” as well as its actual children (which can be mixed together in the case of slotted elements). You also need to be able to run the reverse logic: finding the “light” parent of a shadow tree. And of course, this has to be recursive, since shadow roots can be nested within other shadow roots.
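
As a rough illustration of the "shadow children" idea, here is a minimal sketch of a depth-first walk that pierces open shadow roots. It is not the actual code from kagekiri or arrow-key-navigation; the names getFlatChildren and walkFlatTree are hypothetical, and closed shadow roots are simply not handled.

```javascript
// Children of `element` as they would appear if shadow DOM didn't exist.
// Assumes open shadow roots only (element.shadowRoot is null for closed ones).
function getFlatChildren(element) {
  if (element.shadowRoot) {
    // Light children only show up where a <slot> pulls them in.
    return Array.from(element.shadowRoot.children);
  }
  if (element.localName === 'slot') {
    const assigned = element.assignedElements({ flatten: true });
    if (assigned.length > 0) {
      return assigned;
    }
    // Nothing assigned: fall through to the slot's own fallback content.
  }
  return Array.from(element.children);
}

// Depth-first, pre-order traversal of every descendant of `root`.
function* walkFlatTree(root) {
  for (const child of getFlatChildren(root)) {
    yield child;
    yield* walkFlatTree(child);
  }
}

// Possible usage in a browser:
//   const all = Array.from(walkFlatTree(document.body));
```

A generator keeps the traversal lazy, so a caller looking for the first tabbable element can stop early instead of enumerating the whole page.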

Rather than bore you with the details, suffice it to say that you need roughly a dozen lines of code, both for enumerating an element’s children and finding an element’s parent. In the non-shadow DOM world, these would be equivalent to a simple element.children and element.parentElement respectively.
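
For a sense of what the "parent" half might look like, here is a minimal sketch. The getFlatParent name is hypothetical, it only understands open shadow roots, and it is an illustration rather than the code from either library.

```javascript
const DOCUMENT_FRAGMENT_NODE = 11; // same value as Node.DOCUMENT_FRAGMENT_NODE

// Parent of `element` as if shadow boundaries didn't exist.
function getFlatParent(element) {
  // A slotted element effectively lives inside the <slot> it is assigned to.
  if (element.assignedSlot) {
    return element.assignedSlot;
  }
  const parent = element.parentNode;
  // A ShadowRoot is a DocumentFragment with a `host`. Checking nodeType avoids
  // mistaking an <a> parent for one, since <a> has a (string) `host` property.
  if (parent && parent.nodeType === DOCUMENT_FRAGMENT_NODE && parent.host) {
    return parent.host;
  }
  return element.parentElement || null;
}
```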

Why the browser should handle this

Here’s the thing: I don’t particularly want to explain every line of code required for this exercise. I just want to impress upon you that this is a lot of heavy lifting for something that should probably be exposed as a web API. It feels silly that the browser knows perfectly well which element it would focus if I typed Tab or Shift+Tab, but as a web developer I have to reverse-engineer this behavior.

You might say that I’m missing the whole point of shadow DOM: after all, encapsulation is one of its major selling points. But I’d counter that a lot of folks are using shadow DOM because it’s the only way to get native CSS encapsulation (similar to “scoped” CSS in frameworks like Vue and Svelte), not necessarily DOM API encapsulation. So the fact that it breaks querySelectorAll() is a downside rather than an upside.

Here’s a sketch of my dream API:

element.getNextTabbableElement()
element.getPreviousTabbableElement()

Perhaps, like getRootNode(), these APIs could also offer an option for whether or not you want to pierce the shadow boundary. In any case, an API like this would obviate the need for the hacks described in this post.
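
Until something like this exists, userland code has to approximate it. As a rough sketch, assuming you have already collected the tabbable elements in tab order (for example with a shadow-piercing depth-first traversal), a next/previous lookup with wrap-around might look like this. getAdjacentTabbable is a hypothetical helper, not a browser API.

```javascript
// `tabbables` is assumed to be an array of elements already in tab order.
// Returns the neighbor of `current`, wrapping around at either end.
function getAdjacentTabbable(tabbables, current, backwards = false) {
  if (tabbables.length === 0) {
    return null;
  }
  const index = tabbables.indexOf(current);
  if (index === -1) {
    // Focus is outside the list: enter at the appropriate end.
    return backwards ? tabbables[tabbables.length - 1] : tabbables[0];
  }
  const step = backwards ? -1 : 1;
  return tabbables[(index + step + tabbables.length) % tabbables.length];
}
```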

I’d argue that browsers should provide such an API not only because of shadow DOM, but also because of built-in elements like <video> and <audio>. These behave like closed shadow roots, in that they contain tabbable elements (i.e. the pause/play/track controls), but you can’t reach inside to manipulate them.

Screenshot of the GNOME Web (WebKit) browser on an MDN video element demo page, showing the dev tools open with closed user agent shadow content for the controls of the video

WebKit’s developer tools helpfully show the video controls as “shadow content (user agent).” You can look, but you can’t touch!

As far as I know, there’s no way to implement a WAI-ARIA compliant modal dialog with a standard <video controls> or <audio controls> inside. Instead, you would have to build your own audio/video player from scratch.

Brief aside: dialog element

There is the native <dialog> element now implemented in Chrome, and it does come with a built-in focus trap if you use showModal(). And this focus trap actually handles shadow DOM correctly, including closed shadow roots like <video controls>!

Unfortunately, though, it doesn’t quite follow the WAI-ARIA guidelines. The problems are that 1) closing the dialog doesn’t return focus to the previously focused element in the document, and 2) the focus trap doesn’t “cycle” through tabbable elements in the modal – instead, focus escapes to the browser chrome itself.

The first issue is irksome but not impossible to solve: you just have to listen for dialog open/close events and keep track of document.activeElement. It’s even possible to patch the correct behavior onto the native <dialog> element. (Shadow DOM, of course, makes this more complicated because activeElement can be nested inside shadow roots; i.e., you have to keep drilling into document.activeElement.shadowRoot.activeElement, and so on.)
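
The "drilling" step can be sketched in a few lines. getDeepActiveElement is a hypothetical name, and this only works for open shadow roots, since shadowRoot is null on hosts with closed ones.

```javascript
// Follow activeElement through nested open shadow roots until we reach
// the element that actually has focus.
function getDeepActiveElement(root) {
  let active = root.activeElement;
  while (active && active.shadowRoot && active.shadowRoot.activeElement) {
    active = active.shadowRoot.activeElement;
  }
  return active;
}

// Possible usage in a browser, before opening a dialog:
//   const previouslyFocused = getDeepActiveElement(document);
//   ...later, when the dialog closes:
//   previouslyFocused.focus();
```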

As for the second issue, it might not be considered a dealbreaker – at least the focus is trapped, even if it’s not completely compliant with WAI-ARIA. But it’s still disappointing that we can’t just use the <dialog> element as-is and get a fully accessible modal dialog, per the standard definition of “accessible.”

Update: After publishing this post, Chris Coyier clued me in to the inert attribute. Although it’s not shipped in any browser yet, I did write a demo of building a modal dialog with this API. After testing in Chrome and Firefox with the right flags enabled, though, it looks like the behavior is similar to <dialog> – focus is correctly trapped, but escapes to the browser chrome itself.

Second update: After an informal poll of users of assistive technologies, the consensus seems to be that having focus escape to the browser chrome is not ideal, but not a show-stopper as long as you can Shift+Tab to get back into the dialog. So it looks like, once inert or <dialog> is more widely available in browsers, using one of them will be the only way to deal with <video controls> and <audio controls> in a focus trap.

Last update (I promise!): Native <dialog> also seems to be the only way to have the Esc key dismiss the modal while focus is inside the <video>/<audio> controls.

Conclusion

Handling focus inside of the shadow DOM is not easy. Managing focus in the DOM has never been particularly easy (see the source code for any accessible dialog component for an example), and shadow DOM just makes things that much trickier by complicating a basic routine like DOM traversal.

Normally, DOM traversal is the kind of straightforward exercise you’d expect to see in a web dev job interview. But once you throw shadow DOM into the mix, I’d expect most working web developers to be unable to come up with the correct algorithm off the tops of their heads. (I know I can’t, and I’ve written it twice.)

As I’ve said in a previous post, though, I think it’s still early days for web components and shadow DOM. Blog posts like this are my attempt to sketch out the current set of problems and working solutions, and to try to point toward better solutions. Hopefully the ecosystem and browser APIs will eventually adapt to support shadow DOM and focus management more broadly.

More discussion about native <dialog> and Tab behavior can be found in this issue.

Thanks to Thomas Steiner and Sam Thorogood for feedback on a draft of this post.

3 responses to this post.

  1. Thanks for the big writeup. We’ve indeed run into big problems with focus trapping for shadowDOM-encapsulated dialogs. Another big problem we have is form encapsulation: the shadowDOM boundary breaks a lot of native functionality. For example, if input fields are shadowDOM-encapsulated, the browser does not recognize a surrounding form and hence it will never do a form-submit on enter of the last field (or show “Go” in a soft-keyboard instead of “next”). It shows that there is still a lot to be ironed out before it truly becomes an integrated part of webpages.
    Encapsulation is great, but in the end, we are still looking at a “web page” with one shared scope and state; we have to deal with that.

    • Yep, there are other accessibility problems with shadow DOM too, such as not being able to associate IDs across the shadow boundary. For instance, say you have two elements in different shadow trees – you can’t use for/aria-labelledby/aria-describedby/aria-activedescendant/etc to associate the two, because element IDs are unique to each shadow tree.

      My understanding is that Accessibility Object Model should solve this, because you can programmatically describe the relationship between elements. I’m not sure if it handles your form scenario though.

  2. […] dialog element in HTML, but unfortunately it has massive accessibility issues. With the Shadow DOM, managing focus isn’t easy either. We can use the inert attribute to remove, and then restore the ability of interactive […]
