Recently, while spending time in the .NET forums, I have come across an inordinate number of questions about regular expressions, and these often spark the usual religious argument between their proponents and opponents. One of the more widespread stances is that regular expressions are bad, and that they lead to unnecessarily unreadable and, in inexperienced hands, inefficient code. While there is some truth to this, regular expressions are no worse than the `goto` statement, the `if` statement, or any other construct for that matter: used correctly, they can save time and enhance readability.
The thing to remember is that regular expressions are a field unto themselves, and that it takes some expertise to use them well, as well as to judge when to use them at all. Two of the main misuses of regular expressions that I've encountered are:
- Discovering their usefulness for a particular task, and then using them everywhere, just because you can.
- Attempting to use regular expressions to parse structured text, such as a programming language, or XML.
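To illustrate the second misuse, here is a minimal sketch in C# that reads attribute values from a small XML fragment with a real parser instead of a pattern. The class `XmlTitles`, the element name `book` and the attribute name `title` are made up for this example; `XDocument` is the LINQ to XML type from `System.Xml.Linq`.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class XmlTitles
{
    // Returns the title attribute of every <book> element, in document order.
    // Elements without a title attribute are skipped.
    public static string[] ReadTitles(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        return doc.Descendants("book")
                  .Select(b => (string)b.Attribute("title"))
                  .Where(t => t != null)
                  .ToArray();
    }
}
```

A pattern such as `title="([^"]*)"` would look shorter, but it would also pick up a `book` element that sits inside an XML comment, and it would miss attributes written with single quotes. The parser handles both cases without any extra work.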
To use regular expressions correctly, I follow these rules of thumb:
- Keep the regular expressions short; don't let them span more than one line. But don't compress them just to make them fit on one line, either. Keep them readable.
- Use them only when necessary: if a concise, readable and efficient alternative to regular expressions exists, use that instead.
- Use named capture groups where possible, to give semantics to the expressions.
- Write comments regarding the expression, and include some examples of what it is expected to match and to reject.
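As a rough sketch of how these rules can look together in C#, the example below keeps the pattern on one line, names its capture groups, and documents what it should match and reject. The class `IsoDate` and the date format are chosen only for illustration; `\A` and `\z` are used instead of `^` and `$` because in .NET `$` also matches just before a trailing newline.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class IsoDate
{
    // Matches a date written as YYYY-MM-DD and nothing else.
    //   Accepts: "2009-03-17", "1999-12-31"
    //   Rejects: "2009-3-17" (one-digit month), "17-03-2009", "2009-03-17\n"
    // Only the shape is checked, so "2009-13-45" also matches;
    // range checks belong in ordinary code, not in the pattern.
    private static readonly Regex Pattern = new Regex(
        @"\A(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})\z");

    public static bool TryParse(string text, out int year, out int month, out int day)
    {
        year = month = day = 0;
        Match m = Pattern.Match(text);
        if (!m.Success) return false;

        // Named groups say what each part means; no counting of parentheses.
        year = int.Parse(m.Groups["year"].Value, CultureInfo.InvariantCulture);
        month = int.Parse(m.Groups["month"].Value, CultureInfo.InvariantCulture);
        day = int.Parse(m.Groups["day"].Value, CultureInfo.InvariantCulture);
        return true;
    }
}
```

The examples in the comment double as test cases, which makes it easier to change the pattern later without guessing at its intent. By the same rules, a simpler check, such as whether a string begins with a fixed prefix, is better served by `string.StartsWith` than by a pattern.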