A (RegExp) Catastrophe

Have you ever come across the term “Catastrophic Backtracking” in relation to regular expressions? Well I came face-to-face with it’s fury the other day while researching a bug in an old CMS module which I was trying my best to fix.

It started out with an outrageous claim from a QA tester that he could crash Chrome by typing in a large number of characters in to a text box. I tested it out and to my surprise the problem was worse than he realised and it all stemmed from a particular regular expression being used to validate an email address. It looked like this:


Looks pretty harmless?? But combine it with a test like so:


Whoooop browser bye bye. In fact you could simplify it to the following test as we found on the stackoverflow thread:


So what’s going on here? Well for the full explanation provided by Tim take a look at the answer, but to give you a quick summary. (d+)* in essence can be fulfilled by the following matches (each group separated by spaces):

dddd d
ddd dd
dd ddd
d dddd
ddd d d
dd d dd
d d ddd
dd dd d
d dd dd
d ddd d
d d d dd
d d dd d
d dd d d
dd d d d
d d d d d

This essentially means that there are an exponential number of possibilities dependant on the length of the string, and at a character length of only 40 you have a possible 549,755,813,888 combinations which takes time to go through and causes the browser to hang.

So keep an eye out for the following combinations in your regular expressions where d can be a single character or (more likely) the result of another block:


Read more information on Catastrophic Backtracking.

Sidenote: As a lot of people made quite clear in the discussion on Stack Overflow, it isn’t recommended to use a regular expression to test the full email. It is much safer and efficient to break it up in to it’s sections and validate each individually.