Today on hacker news people have been discussing an issue Netflix had with a bug in their route reload code using Express. Now in express, routes aren’t meant to be reloaded (and there are plenty of options for restarting without stopping accepting connections – I’ve even written a simple one myself), but they hacked it in anyway, and every reload resulted in the processing of routes slowing down.
The discussion on hacker news didn’t focus on the actual problem Netflix had, more on the issue that express uses a simple (recursive) loop through an array of routes to find the appropriate route for the request, continuing on to the next matching route if
next() is called.
While this isn’t a problem for our systems at work, some people in the discussion and elsewhere said there’s no way to solve this without breaking the semantics of how express currently works (routes have a defined order, and multiple routes can match a single request). Suggestions of a path-based DFA were quickly rebutted.
However that’s a naive way to solve the problem.
V8 has a very fast regular expression engine that compiles regular expressions to x86 machine code. This could be exploited to implement the current routing logic using a large single regular expression (actually N regular expressions, where N is the number of routes, but I’ll come to that).
The implementation is very simple:
- Take each route (which can be a string, a simplified regexp, or a full regexp) and convert it to a regexp (presumably the router already does this internally because
/foo/:idhas to match any value for
- Combine the regexps with bracket matches and alternation:
- Generate N regexps, depending on number of routes:
- When a request comes in, check the first regexp, if it matches, you know which alternation matched based on the return from
RegExp.prototype.exec(), from there you pick the route function based on an index in an array of route functions.
- Alternating regexps match in order so this preserves the Express semantics.
next()is called, you go to the next regexp in the list (e.g.
((re2)|(re3)|(re4)|(re5))), and run that. Use the same algorithm (with index + 1) to find the appropriate route function.
This solves the performance problem because the V8 regexp engine finds the appropriate route extremely quickly, and keeps the Express ordering and multiple routes semantics in-tact.
I haven’t proved this is any faster yet, I’m just assuming. I suspect not much for small numbers of routes, and I don’t know if the JS engine has size limits that would prevent it working on large numbers of routes.
Also, this is just an idea I had – I welcome people telling me why it couldn’t work as perhaps my understanding of the Express internals isn’t as strong as I hoped.