Accidentally Quadratic Constant Folding

Neal Gafter writes:


I just fixed a bug in the implementation of constant folding in the C# and VB.Net compilers. They used to require O(n^2) space. Now they require O(n) space.

The naive approach to implementing string concatenation for an expression such as "S1" + "S2" + "S3" is to recursively visit the left-hand and right-hand operands and compute their constant string values, and then concatenate these strings to produce the constant string for the concatenation. It seems obviously correct and the only reasonable way to do it. But it has a serious problem.

The concatenation operation is left associative, so for a long sequence of n concatenations, each of size k, we would produce intermediate results of size k, 2k, 3k, and so on up to nk. The total amount of space needed to represent this collection of strings is O(k n^2). For moderately-sized values of n, this would cause the compilers to run out of memory and crash. This is not merely a theoretical problem, as we had been getting customer reports of this problem about quarterly for the past 7-8 years.

Fixing the problem was fairly straightforward, using a technique I learned from the Cedar/Mesa compiler in the early ‘80s. Rather than representing string constants in the compilers using strings, they are now represented using a data structure called Ropes. Concatenating two ropes requires only a small constant amount of memory.

Any compiler for a language that supports constant folding of string concatenation will run into this problem unless there is a similarly low-space representation of two concatenated strings. This is a technique that should be in the bag of tricks of any compiler writer.
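
To make the idea concrete, here’s a minimal sketch of a rope in Python; it’s purely illustrative, not the representation the C# and VB.Net compilers actually use:

class Rope:
    """A minimal rope: a binary tree of string fragments.

    Concatenating two ropes allocates one small node instead of copying
    both operands, so folding n concatenations needs O(n) space for the
    intermediate results rather than O(k n^2).
    """

    def __init__(self, left, right):
        self.left = left           # each side is a str or another Rope
        self.right = right

    def __add__(self, other):
        return Rope(self, other)   # O(1) time and space per concatenation

    def __str__(self):
        # Materialize the full string once, only when it is finally needed.
        parts, stack = [], [self]
        while stack:
            node = stack.pop()
            if isinstance(node, Rope):
                stack.append(node.right)   # pushed first, popped second
                stack.append(node.left)
            else:
                parts.append(node)
        return "".join(parts)

folded = Rope("S1", "S2") + "S3" + "S4"
assert str(folded) == "S1S2S3S4"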

accidentally quadratic submission compilers ropes strings

`godoc` struct rendering

godoc is Go’s tool for extracting doc comments from source code and rendering HTML documentation (loosely analogous to Java’s “javadoc”).

godoc processes Go struct definitions in two passes. First, it walks the AST recursively, rendering documentation and declarations into a textual output buffer containing an HTML fragment. The walker for this pass is shared with the rest of godoc’s documentation rendering (so that e.g. top-level definitions and struct fields share the bulk of their rendering code).

Then, it performs a second pass, modifying the HTML to add anchors to field names, so that e.g. https://golang.org/pkg/net/http/#Response.Header links specifically to the Header field in the Response struct.

Unfortunately, godoc previously implemented this with nested loops over the struct fields and over the rendered text:

for _, f := range st.Fields.List {
    foreachLine(buf.Bytes(), func(line []byte) {
        …
    })
}

Since each struct field added at least one line to the output, this turned into an O(n²) loop.

The fix walks the two lists sequentially, first building up a map of the names that need links, and then walking the text in a single pass, looking for lines that match any of the names.
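
Schematically, the fixed version looks something like the following Python sketch (not godoc’s actual Go code; the anchor markup and helpers here are purely illustrative):

def add_field_anchors(field_names, rendered_lines):
    """One pass to build the set of names, one pass over the rendered text."""
    wanted = set(field_names)                # O(1) membership tests
    out = []
    for line in rendered_lines:
        stripped = line.strip()
        name = stripped.split(" ", 1)[0] if stripped else ""
        if name in wanted:
            line = '<span id="%s"></span>%s' % (name, line)   # illustrative anchor
            wanted.discard(name)
        out.append(line)
    return out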

accidentally quadratic godoc documentation text

mercurial changegroup application

Mercurial (sometimes called “hg”, the name of its command-line tool, which is in turn named after the chemical symbol for mercury) is a distributed version-control system.

When applying a “change group” (a group of changes applied as a unit, e.g. in a single hg push), Mercurial tracks the heads before and after the changeset. Previously, it stored the “oldheads” list of pre-changeset heads as a Python list, and computed the newly-created heads post-changeset with:

newheads = [h for h in repo.heads() if h not in oldheads]

Since h not in oldheads is O(n) on a list, a large repository would incur an O(n²) cost with respect to the number of heads in the repository.

What’s perhaps remarkable about this change is the size of the fix, which I think is the smallest diff in Accidentally Quadratic’s history: Wrapping a simple set(…) around the instantiation of the list.
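
In sketch form (this isn’t Mercurial’s exact diff), the change amounts to:

# Before: oldheads was a plain list, so each membership test scanned it, O(n).
oldheads = repo.heads()

# After: wrap the instantiation in set(), making each membership test O(1).
oldheads = set(repo.heads())

newheads = [h for h in repo.heads() if h not in oldheads]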

In my judgment this fix is notable for a few reasons:

  • It’s a testament to Python’s expressiveness and consistency of APIs, that swapping a list out for a set can very often be a 5-character change.
  • It’s a vote in favor of all languages having a readily-available set type in their standard libraries alongside lists and mapping types, so that fixes of this type are easy, and also hopefully less-often necessary in the first place.
  • My first point notwithstanding, it’s an argument against having a polymorphic in or contains method that silently degrades to O(n) behavior on lists or similar containers. Wherever practical, asymptotics should be clear from the call site!
accidentally quadratic hg membership list set

Elasticsearch IndicesQuery

Elasticsearch is a distributed search engine. One can store “documents” within “indices”, which are collections of documents. One common pattern for storing time-based data is to use one index per day. E.g.: tweets-2017-01-01, tweets-2017-01-02 and so on. This makes having many indices decently common, and it also makes it common to search across many indices in a single search.

Elasticsearch has a query type called IndicesQuery, which lets you (among other things) optimise your query path by knowing which index to search in. For example, say that you have one index pattern for tweets (tweets-YYYY-MM-DD) and one for Reddit posts (reddit-YYYY-MM-DD). Say that someone searches for “hashtag: kittens OR subreddit: awww”. You know that you don’t have to execute the hashtag query in the Reddit indices and that you do not have to execute the subreddit query in the tweet indices. This is done by specifying that the hashtag query should only be executed in indices that match tweets-*.

When executing the query, Elasticsearch will consider each shard (of an index) separately. Usually one has 2-5 shards per index. For each shard, it will check if the index name matches the pattern given by the IndicesQuery. This is where we find our performance bug.

The algorithm in the 1.X and 2.X versions of ES works by first expanding a pattern into the list of all matching indices – tweets-* becomes ["tweets-2017-01-01", "tweets-2017-01-02", …]. Then, for each index being considered, it checks for membership in that list:

protected boolean matchesIndices(String currentIndex, String... indices) {
    final String[] concreteIndices = indexNameExpressionResolver.concreteIndices(
        clusterService.state(), IndicesOptions.lenientExpandOpen(), indices);
    for (String index : concreteIndices) {
        if (Regex.simpleMatch(index, currentIndex)) {
            return true;
        }
    }
    return false;
}

(Here, indices is a list of patterns, and the concreteIndices() method applies the wildcard internally to the list of all indices)

This means that the time complexity per shard is O(n), where n is the number of indices in your cluster. Since we consider at least one shard per index, the overall search complexity then goes to O(n^2). Furthermore, this expansion (and Elasticsearch’s Regex.simpleMatch pattern matcher) generates a lot of garbage, stressing the garbage collector and making the process even slower.

(For those of you arriving here via Russ’s blog, you’ll be chagrined to note that ES implements an exponential-time backtracking glob matcher, although that fact isn’t implicated in the bug in question.)

We used this query in our not-too-shabby production cluster. At the time, it had a total of 1152 cores and we searched over roughly 150 TB of data in about 8500 indices. We discovered that we spent almost half of our CPU time in the Regex.simpleMatch method. We patched the algorithm to instead directly check whether the current index matches any of the specified patterns, making it O(m), where m is the number of index patterns specified in the query (we usually had 2-3). On top of the saved CPU, this also cut the time we spent in garbage collection pauses to about a third, thanks to fewer String allocations.
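
Roughly, the patched check has the following shape (a Python sketch with illustrative names, not the actual Java):

from fnmatch import fnmatch

def matches_indices(current_index, patterns):
    """Match the shard's index directly against the query's patterns.

    O(m) in the number of patterns in the query, with no expansion of each
    pattern into every concrete index in the cluster.
    """
    return any(fnmatch(current_index, pattern) for pattern in patterns)

# matches_indices("tweets-2017-01-01", ["tweets-*"])  -> True
# matches_indices("reddit-2017-01-01", ["tweets-*"])  -> False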

This seems to have been fixed in the ‘Great Query Rewrite’ for Elasticsearch 5. I do not know whether this was explicitly targeted as a performance fix, whether it happened by accident, or whether someone just thought ‘oh, this might be slow’.


This post was contributed by Anton Hägerstrand.

accidentally quadratic submission elasticsearch wut java pattern matching

vim TAGS lookup

A tags file is a classic UNIX format for storing an index of where symbols are defined in a source tree, for ease of finding a symbol definition from within an editor. Both vim and emacs have native support for loading TAGS files and using them to jump to symbol definitions.

When performing tags lookups, vim deduplicates results, to suppress identical matches (perhaps in case you’ve concatenated multiple tags files? I’ll admit to not fully understanding the intent here).

However, historically, it stored the matches in an array, which meant that looking up tags was accidentally quadratic in the number of returned matches. Presumably this escaped notice in part because usually most symbols have only a few definitions, but it was a problem for programmatic use, or in some edge cases of manual search or pathological code bases.

The fix, as it often is on this blog, was to store the results in a hash table, providing duplicate-merging for free.
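
It’s the same trick as in the Mercurial fix above; in Python terms, roughly:

def merge_matches(matches):
    """Deduplicate tag matches via a hash table, preserving first-seen order."""
    return list(dict.fromkeys(matches))   # one O(1) insertion per match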

Interestingly, some earlier developer had noticed that this code could be slow, but instead of fixing it, had added code to check for ^C and abort the entire match, so as to return use of the editor to the user! We may never know if that author didn’t realize there was a relatively-easy fast solution, or just thought it was too much work in C (as of the fix, vim did have a built-in hash table type, but it needed to be extended to support non-NULL-terminated strings).

Thanks to James McCoy for fixing this one as well as bringing it to my attention.

accidentally quadratic vim tags c hash table

Capistrano server definition

Capistrano is a server automation and deployment tool written in Ruby. It provides a Ruby DSL for defining deployments and other operations on a set of servers, and executing those flows.

A Capistrano environment defines a number of servers, each of which has zero or more “roles”, which help define which of your rules should be executed on them. Servers are distinguished by their hostname and ssh port; Two servers with the same hostname and ssh port are considered to be the same server.

Capistrano allows you to define servers using either the role helper, which attaches a role to a server (defining the server if needed), or the more explicit server method. If you define multiple roles on the same server (as in this example in the docs):

role :app, %w{example.com}
role :web, %w{example.com}
role :db,  %w{example.com}

Then Capistrano identifies that those servers are the same server, and merges the roles into a single Server:

[10] pry(main)> Capistrano::Configuration.env.servers
=> #<Capistrano::Configuration::Servers:0x00000001763eb8
 @servers=
  [#<Capistrano::Configuration::Server:0x00000001763c88
    @hostname="example.com",
    @keys=[],
    @local=false,
    @port=nil,
    @properties=#<Capistrano::Configuration::Server::Properties:0x00000001763940 @properties={}, @roles=#<Set: {:db, :app, :web}>>,
    @user=nil>]>

However, until recently, this merging and deduplication was performed by walking the list of all registered servers, comparing hostname and port. Obviously, this meant that each new server definition had to scan all previous servers, for a classic O(n²) bug.

The patch resolved the issue in the obvious way, by storing servers in a hash keyed by the (hostname, port) pair.
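
In outline, the keyed approach looks something like this Python sketch (made-up names, not Capistrano’s Ruby internals):

class ServerRegistry:
    """Merge role definitions for servers keyed by (hostname, port)."""

    def __init__(self):
        self._servers = {}   # (hostname, port) -> set of roles

    def add_role(self, hostname, port, role):
        # O(1) lookup instead of scanning every previously defined server.
        self._servers.setdefault((hostname, port), set()).add(role)

registry = ServerRegistry()
for role in ("app", "web", "db"):
    registry.add_role("example.com", None, role)
# -> a single entry for ("example.com", None) with roles {"app", "web", "db"}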

Thanks to Daniel Benamy for finding, fixing, and bringing this one to my attention.

accidentally quadratic capistrano ruby devops

Ruby `reject!`

The reject! method in Ruby selectively removes elements from an array in-place, based on a provided predicate (in the form of a block).

Between Ruby 1.9.3 and Ruby 2.3, reject! was accidentally quadratic.

The underlying bug was fairly straightforward. Every time the provided predicate returned true, the code immediately deleted that element:

if (RTEST(rb_yield(v))) {        /* the block returned a truthy value */
    rb_ary_delete_at(ary, i);    /* O(n): shifts every later element down */
    result = ary;
}

In an array, deleting an index necessitates shifting back all the following elements, and so is an O(n) operation; If your predicate rejects a fixed fraction of elements, the total runtime is thus O(n²).

The bug was fixed by keeping a count of accepted elements, moving each accepted element into its proper final position as the array is scanned, and truncating the array at the end.
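
In outline, the compaction strategy looks like this (a Python sketch, leaving aside the early-exit consistency concerns discussed below):

def reject_in_place(ary, pred):
    """Remove elements matching pred from ary, in place, in linear time."""
    kept = 0
    for item in ary:
        if not pred(item):
            ary[kept] = item   # move each kept element into its final slot
            kept += 1
    del ary[kept:]             # one truncation instead of repeated deletions

nums = list(range(10))
reject_in_place(nums, lambda x: x % 2 == 0)
assert nums == [1, 3, 5, 7, 9]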

This bug is fairly straightforward, but the part I find most interesting is why it was introduced. The code used to be linear, but it regressed in response to bug #2545, which concerned the behavior when the block passed to reject! executed a break or otherwise exited early. Because reject! is in-place, any partial modifications it makes are still visible after an early exit, and reject! was leaving the array in a nonsensical state. The obvious fix was to ensure that the array was always in a consistent state, which is what resulted in the “delete every time” behavior.

I find this interesting as a cautionary tale of how several of Ruby’s features (here, ubiquitous mutability, blocks, and nonlocal exits) interact to create surprising edge cases that need to be addressed, and how addressing those edge cases can easily result in yet more problems (here, quadratic performance). In my mind, I’d just rather not have a reject! at all, and callers who need to mutate an array in-place can figure out how to do so safely with respect to their own use cases.

(Thanks to Russell Davis for bringing this one to my attention).

accidentally quadratic ruby reject!

Chrome Server-Sent Event Parsing

Server-sent events are a standard for web servers to stream a sequence of events to a browser over a single HTTP connection. You can view them as a simpler, unidirectional, alternative to websockets (that doesn’t require support from any middleboxes), or as an optimization over one-event-per-request longpolling.

The wire format for an event consists of a number of newline-separated key-value pairs, in a simple key: value format. For instance, the documentation provides the example event:

event: userconnect
data: {"username": "bobby", "time": "02:33:48"}

Until recently, Chrome’s parser for this format worked strictly line-at-a-time, maintaining no additional state. When it received data from the server, it appended the data to its buffer, and then attempted to parse key-value pairs off the front of the buffer:

void EventSource::didReceiveData(const char* data, unsigned length)
{
    // Append the newly-decoded bytes, then re-parse the buffer from the start.
    append(m_receiveBuf, m_decoder->decode(data, length));
    parseEventStream();
}

If it didn’t find a complete event (e.g. missing newline), the parser just stopped, waiting for the next batch of data to arrive.

This works well as long as each line arrives in a relatively small number of packets. However, if you have a very long line (e.g. a 16MB data field), it will (of necessity) be chunked into many smaller packets on the wire, which will end up getting passed to didReceiveData separately. In response to each packet, the parser will scan the entire line to the end, note the lack of newline, and then abort. Thus, we will do N/chunk-size scans of our line, each one scanning more and more bytes, resulting in an overall quadratic blowup!

The fix makes the parser more stateful, so that it keeps track of its progress in the parse across each new packet, and avoids scanning the buffer O(N) times.
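
As a rough illustration of the stateful approach (a Python sketch, not Blink’s actual parser):

class LineBuffer:
    """Remember how far we have already searched for a newline, so no byte
    of a partial line is ever scanned more than once."""

    def __init__(self, on_line):
        self._buf = bytearray()
        self._scanned = 0        # bytes already checked for '\n'
        self._on_line = on_line  # callback invoked with each complete line

    def feed(self, chunk):
        self._buf.extend(chunk)
        nl = self._buf.find(b"\n", self._scanned)
        while nl != -1:
            self._on_line(bytes(self._buf[:nl]))
            del self._buf[:nl + 1]
            nl = self._buf.find(b"\n")
        # Whatever remains is a partial line; resume the search after it next time.
        self._scanned = len(self._buf)

parser = LineBuffer(print)
parser.feed(b"data: {\"username\": \"bobby\"")   # partial line: not rescanned later
parser.feed(b", \"time\": \"02:33:48\"}\n")      # completes the line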

(bug submitted by @whyareallthesenamestakenalready)

accidentally quadratic chrome server-sent-events strings parsing

Rust hash iteration+reinsertion

It was recently discovered that some surprising operations on Rust’s standard hash table types could go quadratic.

Perhaps the simplest illustration is this snippet from a comment, here simplified even further:

use std::collections::hash_set::HashSet;

fn main() {
    println!("populating...");

    let mut one = HashSet::new();
    for i in 1..5000000 {
        one.insert(i);
    }
    println!("cloning...");

    let mut two = HashSet::new();
    for v in one {
        two.insert(v);
    }
}

In this example, the first loop (populating one) finishes fairly quickly, as you would expect, but the second loop (copying into two) takes much, much longer.

I enjoy this bug for at least two reasons: One, it’s fun technically, allowing us to take a brief deep dive into hash-table implementation – something most of us are normally able to treat as a solved problem – and two, it’s a great example of subtle quadratic behavior in a system specifically designed by smart and well-informed developers to avoid the possibility of accidental quadratic behavior!

Robin-Hood hashing

To understand the bug, we’re going to need to understand the hash table strategy used by Rust’s standard hash tables. A major challenge in any general-purpose hash table implementation is how to handle hash collisions; If you have a table with N buckets, eventually you’ll get two elements with the same value of hashcode%N, and what do you do?

Rust uses Robin Hood hashing, a relatively old technique that’s recently received new attention. Robin Hood hashing is a variation on good old open addressing with linear probing, with a small twist.

First, let’s refresh our memory:

In hash tables, “open addressing” refers to the technique of, upon encountering a collision, somehow selecting an alternate location in the hash table. It’s most commonly contrasted with “chaining”, in which you store multiple entries in a single hash bucket, typically using a linked list.

Linear probing, in particular, means that you select alternate buckets by scanning forward from the natural bucket. If you try to insert an element with hash code H into a table with N buckets, and bucket H%N is full, you try (H+1)%N, (H+2)%N and so on, until you find an empty bucket. On lookup, you start at H%N and scan forward until you find your element or an empty bucket.

Open addressing with linear probing is arguably the simplest possible general-purpose hash table implementation, and it can work fine (assuming a good hash function) as long as you don’t let your load factor get too high (reminder: the load factor is the number of elements stored in the table, divided by the number of buckets in the hash table. A load factor of 0.5 means that half the buckets are in use). If your load factor gets too high, then elements can end up extremely far from their natural hash bucket, necessitating a lot of scanning on inserts and lookups.

Robin Hood hashing improves on linear probing with a simple trick: When you’re scanning forward from H%N, look at each element you encounter. If the inserting element is further from its “natural” bucket than the element in the bucket you’re considering, then swap the new element and the element in that bucket, and continue scanning with the other element.

Let’s consider an example. Imagine this is a subset of a hash table; I’ve labeled each bucket below it with its index, and inside the bucket I’ve placed hashcode%N, what I’ve been calling the “natural” index of an element:

+---+---+---+---+---+
| 0 | 0 | 2 | 1 |   | ...
+---+---+---+---+---+
  0   1   2   3   4

Now suppose we want to insert an element with hash%N == 1. We start at bucket 1, and find it full. The element in that bucket is at index 1 and wants to be at index 0, so it’s further from home than our new element is. We keep looking. At index 2 we find an element with natural index of 2. It is a distance 0 from home, and we’re a distance 1, which is further. So we place our new element at index 2, and pick up whatever element used to be there, and continue. Eventually, we end up with:

+---+---+---+---+---+
| 0 | 0 | 1 | 1 | 2 | ...
+---+---+---+---+---+
  0   1   2   3   4

The net result of this process is to reduce the variance of how far out-of-position elements are. Reducing variance means we have much more predictable insert and lookup times, which is strongly desirable. When we employ robin hood hashing, we can safely load a table up to a load factor of 90% or even 95%. We can even prove (as the author of the original paper did) fairly tight bounds on the expected maximum probe distance.
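
The insertion rule can be sketched compactly (a Python toy that ignores deletion, duplicate keys, and resizing):

def robin_hood_insert(buckets, key):
    """Insert key into an open-addressed table using Robin Hood probing.

    buckets is a list with None marking empty slots; hash(key) % len(buckets)
    is the key's "natural" bucket. Assumes the table is not full.
    """
    n = len(buckets)
    i = hash(key) % n
    dist = 0                       # how far we have probed from home
    while buckets[i] is not None:
        # How far is the current occupant from its own natural bucket?
        occupant_dist = (i - hash(buckets[i]) % n) % n
        if occupant_dist < dist:
            # The occupant is closer to home than we are: take its bucket
            # and carry it forward instead ("rob the rich").
            buckets[i], key = key, buckets[i]
            dist = occupant_dist
        i = (i + 1) % n
        dist += 1
    buckets[i] = key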

The Problem

Rust’s problem arises when you iterate over one table and insert the resulting entries into another table. The problem occurs because iteration over a hash table proceeds by walking the backing array of the hash table directly. This results in yielding entries approximately in hash order, which turns out to be fatal when combined with the other elements of Rust’s implementation.

The final detail we need to know in order to understand the bug is Rust’s table-sizing algorithm. Rust uses power-of-two sized tables (a common technique, since it means you can use bitmasks instead of more-expensive modulus), and grows tables when they reach a 90% load factor.

Now, let’s consider what happens when we execute the above example. For simplicity, we’ll pretend that two starts out with a table size of half of one’s. In reality it will start out extremely small and grow through powers of two, but the half-size case exhibits the necessary behavior and is simpler for illustration. We can modify the above example to initialize two as let mut two = HashSet::with_capacity(one.capacity()/2); to verify that this still exhibits quadratic behavior.

So, let’s consider what happens as we start copying one into a two half its size. Things proceed uneventfully for the first half of the table; Because two has half as many buckets as one, hash%two.raw_capacity() == hash%one.raw_capacity() for the first half of the table, and elements are inserted into two in approximately the same indexes as they had in one. Because elements have been reordered slightly in one by the insertion process we occasionally have to search a little, but never far, because most of the space to the right is still free. So, as we go, the picture looks something like (where Xs are filled buckets, and dots are empty):

one:    |XX.XX.XXXX.XXXXXX.XXX..XXX|
                 ^-- cursor
two:    |X.XXXX.XX....|

However, eventually we reach the midpoint of one’s table. At this point, indexes in two’s half-sized table wrap around, and we start searching for openings beginning near the start of two once again.

At this point, two’s load factor is approximately equal to that of one; it has half as many buckets, and approximately half as many elements. Because Rust resizes tables at a 90% load factor, we know one’s load factor to be between 45% and 90%. If we suppose by way of example that one is at a 70% load factor, two can absorb another 20% of its capacity past one’s half-way point before it resizes.

During the insertion of that additional 20% of two’s capacity (constituting something like 10% / 70% ≈ 14% of one’s elements), we run into trouble.

We begin inserting at the beginning of two. two is 70% full, so as we insert we search to the right until we find an empty bucket.

However, “immediately to the right” is also where our future inserts will land, since we’re walking one in bucket order, and so inserting into two in approximately bucket order as well. And so, as we go, the position at which we attempt an insertion marches forward at a rate of one bucket per bucket in one; But the insertions all “stack up”, pushing the distance-until-empty forward at greater than one bucket per bucket, since we must also contend with the 70% of buckets already full. The situation rapidly looks something more like:

one:    |XX.XX.XXXX.XXXXXX.XXX..XXX|
                           ^-- cursor
two:    |XXXXXXX.X.....|
            ^-- cursor/2

With each element we insert, we must search further and further to the right to find the empty spaces in two. This is, essentially, a classic triangular pattern, and definitely quadratic.

This pattern continues until we fill up two to 90% capacity; At that point, two will be resized, and performance will go back to something reasonable. However, even though the quadratic regime covers only a small fraction of the middle of the loop, it’s an approximately-constant fraction, and so the quadratic behavior is still quadratic.

We can demonstrate this by augmenting the above program to measure the cumulative time to insert N elements into two. Doing so shows the exact behavior I described – efficient performance for the first half, a quadratic explosion, and then a return to efficiency.

If we don’t preallocate capacity/2 elements, the picture is essentially similar, except that the quadratic breakdown happens prior to each resize of the underlying table.

The Fix

Rust is working around the issue by reseeding the underlying SipHash hashing algorithm on a per-table basis. This solves the problem very thoroughly, by ensuring that the order of elements in two different tables has no correlation to each other, restoring all the nice assumptions about independence that make Robin Hood hashing work. They’re also speculating about a more fundamental fix to the hash table implementation, so that users who drop in a different hash algorithm are not again vulnerable. One proposal is that hash tables ought to retain their insertion order, since this bug is much less likely to happen if tables do not leak their hash order.

Concluding Thoughts

This was a subtle bug. Rust’s hash tables were very carefully thought out by experienced developers well-versed in the literature and practice of deliberate quadratic attacks on hash tables. They selected a table design and a hash function (SipHash), and developed an implementation specifically to avoid such anomalies. But quadratics are sneaky creatures, and unexpected correlations can bite any algorithm based on statistical independence.

For my part, this was one of the harder bugs I’ve written up to understand. When the ticket was initially linked to me, I had to read it several times and read up on robin hood hashing and try several approaches to developing intuitions about the system before it began to make sense to me. But even then, when I sat down to write this post, I realized I had oversimplified it in my head and my explanation was wrong! I had to resort to a lot of experimentation with the above test case before I was confident I understood the dynamics.

Surprisingly to me, the specific dynamics of Robin Hood hashing end up being relatively unimportant here; I believe that vanilla linear probing would exhibit similar behaviors. The key effect of Robin Hood hashing is just that it gives you confidence and/or hubris to push a table to 90% capacity, which greatly exacerbates the problem.

Definitely a keeper of a bug.

accidentally quadratic rust hash table robin hood hashing

golang `text/template` parsing

Go’s text/template package implements a templating system for Go. Like many templating systems, when it loads a template it parses it into an AST, which can then be walked to render the template against one or more inputs.

Nodes in the template AST hold the line number from which that node was parsed, primarily for better error-reporting.

In go1.7.3 and earlier, as the AST is built, the line number field of each node is filled in by counting the newlines since the start of the file.

Of course, this means that for an N-line file containing roughly a template directive every few lines, we have to count through O(N²) newlines, making template parsing quadratic in file line count!

go1.8 will update the lexer to count the line number as it lexes and sees newlines, restoring linear-time behavior.
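
The difference between the two approaches is easy to see in miniature (a Python sketch, not Go’s actual lexer):

# go1.7.3-style: recompute each node's line number from scratch.
def line_of(source, pos):
    return source.count("\n", 0, pos) + 1        # O(pos) work per node

# go1.8-style: have the lexer track the line as it consumes the input.
class Lexer:
    def __init__(self, source):
        self.source = source
        self.pos = 0
        self.line = 1

    def advance(self, n):
        # Count only the newlines in the n characters being consumed.
        self.line += self.source.count("\n", self.pos, self.pos + n)
        self.pos += n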

(Thanks to @derivativeburke for bringing this one to my attention)

accidentally quadratic golang text/template parse lex line numbers