Uncategorized

Solving the express ‘array of routes’ problem

Today on hacker news people have been discussing an issue Netflix had with a bug in their route reload code using Express. Now in express, routes aren’t meant to be reloaded (and there are plenty of options for restarting without stopping accepting connections – I’ve even written a simple one myself), but they hacked it in anyway, and every reload resulted in the processing of routes slowing down.

The discussion on hacker news didn’t focus on the actual problem Netflix had, more on the issue that express uses a simple (recursive) loop through an array of routes to find the appropriate route for the request, continuing on to the next matching route if next() is called.

While this isn’t a problem for our systems at work, some people in the discussion and elsewhere said there’s no way to solve this without breaking the semantics of how express currently works (routes have a defined order, and multiple routes can match a single request). Suggestions of a path-based DFA were quickly rebutted.

However that’s a naive way to solve the problem.

V8 has a very fast regular expression engine that compiles regular expressions to x86 machine code. This could be exploited to implement the current routing logic using a large single regular expression (actually N regular expressions, where N is the number of routes, but I’ll come to that).

The implementation is very simple:

  • Take each route (which can be a string, a simplified regexp, or a full regexp) and convert it to a regexp (presumably the router already does this internally because /foo/:id has to match any value for :id).
  • Combine the regexps with bracket matches and alternation: ((re1)|(re2)|(re3)|(re4)|(re5))
  • Generate N regexps, depending on number of routes: ((re1)|(re2)|(re3)|(re4)|(re5)) and ((re2)|(re3)|(re4)|(re5)) and ((re3)| (re4)|(re5)) and ((re4)|(re5)) and ((re5))
  • When a request comes in, check the first regexp, if it matches, you know which alternation matched based on the return from RegExp.prototype.exec(), from there you pick the route function based on an index in an array of route functions.
  • Alternating regexps match in order so this preserves the Express semantics.
  • If next() is called, you go to the next regexp in the list (e.g. ((re2)|(re3)|(re4)|(re5))), and run that. Use the same algorithm (with index + 1) to find the appropriate route function.

This solves the performance problem because the V8 regexp engine finds the appropriate route extremely quickly, and keeps the Express ordering and multiple routes semantics in-tact.

I haven’t proved this is any faster yet, I’m just assuming. I suspect not much for small numbers of routes, and I don’t know if the JS engine has size limits that would prevent it working on large numbers of routes.

Also, this is just an idea I had – I welcome people telling me why it couldn’t work as perhaps my understanding of the Express internals isn’t as strong as I hoped.

Standard

2 thoughts on “Solving the express ‘array of routes’ problem

  1. One slight correction – after a match, instead of going to the next regexp in the list, you go to the next regexp that doesn’t contain the previous match. So for example:

    ((re1)|(re2)|(re3)|(re4)|(re5)) – if re3 matches, your next test should be ((re4)|(re5)).

    Very clever solution, it would be interesting to see what type of gains this would yield.

    • That is what I meant – “the list” being the list of combined regular expressions.

      Your suggestion of “the next regexp that doesn’t contain the previous match” doesn’t work – the alternation can contain the same regexp multiple times (that’s how you implement multiple routes matching the same path).

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s