Regular expressions that work “everywhere”
- Programming
- Developer Tools
- Standards
- Open Source
The post argues that despite the mess of regex dialects, there is still a practical core you can use “everywhere,” including basics like character classes, anchors, repetition, grouping, alternation, and some escapes. Readers mostly agreed with the goal but questioned the confidence.
The big correction was that syntax support is only half the problem; defaults matter just as much. GNU grep and sed use POSIX Basic Regular Expressions (BRE) by default, so `+`, `?`, `|`, and parentheses are literal characters unless you pass `-E` or write them as `\+`, `\?`, `\|`, `\(` and `\)` (the first three are GNU extensions, not POSIX). BSD and macOS sed diverge further, especially around shorthand classes like `\w` and `\s` and word boundaries like `\b`.
Several people pushed the discussion past syntax into semantics. POSIX and Perl-style engines can match the same pattern differently: POSIX specifies the leftmost-longest match, while Perl-style backtracking engines try alternatives in order and quantifiers greedily and return the first match that succeeds. Matching `a|ab` against `ab` yields `ab` under POSIX but only `a` under Perl, so a pattern that is accepted everywhere can still produce different matches. That made the practical lesson stricter than the post's: portability is not a list of tokens but a combination of dialect, defaults, and matching rules.
The most useful additions were pointers to attempts to standardize or fence off the problem, from POSIX BRE and RFC 9485’s I-Regexp to JSON Schema’s recommended subset and Russ Cox’s writing on regex engines and their behavior. Another recurring complaint was that software and docs routinely say “supports regex” without naming the dialect, which is fine for language-internal code but terrible for user-facing configuration.
A few side threads branched into tooling pain, such as nested escaping in shells, Python raw strings, and Emacs, plus the idea that regexes would have been easier to live with had they evolved as a structured, composable language instead of a pile of mini-DSLs.
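To make the BRE default concrete, here is a minimal Python sketch that rewrites a simple POSIX ERE (the syntax `grep -E` expects) into the spelling GNU `grep` and `sed` expect without `-E`. The helper `ere_to_gnu_bre` is hypothetical. It assumes GNU's `\+`, `\?`, and `\|` extensions, and it deliberately ignores edge cases such as `^` or `$` in mid-pattern, a leading `*`, and back-references.

```python
# Characters that are operators in ERE but must be backslash-escaped to act as
# operators in GNU BRE (\+, \? and \| are GNU extensions; \( \) \{ \} are POSIX).
_SWAP = set("+?|(){}")


def ere_to_gnu_bre(ere: str) -> str:
    """Rewrite a simple ERE so GNU grep/sed read it the same way without -E.

    Hypothetical helper for illustration only: it does not handle ^ or $ in
    the middle of a pattern, a leading '*', or back-references.
    """
    out = []
    i, n = 0, len(ere)
    while i < n:
        c = ere[i]
        if c == "[":
            # Copy a bracket expression verbatim; its contents mean the same
            # thing in BRE and ERE.
            j = i + 1
            if j < n and ere[j] == "^":
                j += 1
            if j < n and ere[j] == "]":
                j += 1  # a leading ']' is a literal member, not the end
            while j < n and ere[j] != "]":
                if ere[j] == "[" and j + 1 < n and ere[j + 1] in ":.=":
                    # Skip [:class:], [.coll.] and [=equiv=] as one unit.
                    close = ere.find(ere[j + 1] + "]", j + 2)
                    j = close + 2 if close != -1 else n
                else:
                    j += 1
            out.append(ere[i : j + 1])
            i = j + 1
        elif c == "\\" and i + 1 < n:
            nxt = ere[i + 1]
            # An escaped operator in ERE is a literal, which BRE writes bare.
            out.append(nxt if nxt in _SWAP else c + nxt)
            i += 2
        elif c in _SWAP:
            out.append("\\" + c)  # an ERE operator needs a backslash in BRE
            i += 1
        else:
            out.append(c)
            i += 1
    return "".join(out)
```

For example, `ere_to_gnu_bre("a+(b|c)?")` returns `a\+\(b\|c\)\?`, which GNU `grep` without `-E` reads the way `grep -E` reads the original. Bracket expressions are copied unchanged because their contents mean the same thing in both syntaxes.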
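The `a|ab` difference can be reproduced in Python, whose `re` module is a Perl-style backtracking engine. The sketch below compares its leftmost-first answer with a brute-force emulation of POSIX leftmost-longest matching. Both function names are hypothetical, and the emulation assumes simple patterns without end anchors, word boundaries, or lookaround, because `endpos` makes the engine treat the slice end as the end of the string.

```python
import re


def leftmost_first(pattern: str, text: str):
    """Span of the match Python's backtracking (Perl-style) engine returns."""
    m = re.search(pattern, text)
    return m.span() if m else None


def leftmost_longest(pattern: str, text: str):
    """Span of the POSIX-style leftmost-longest match, found by brute force.

    Assumes a simple pattern with no end anchors, word boundaries, or
    lookaround, since endpos makes text[:end] look like the whole string.
    """
    compiled = re.compile(pattern)
    for start in range(len(text) + 1):
        # Try the longest candidate first, so the first hit is the longest.
        for end in range(len(text), start - 1, -1):
            if compiled.fullmatch(text, start, end):
                return (start, end)
    return None


print(leftmost_first("a|ab", "xab"))    # (1, 2): first alternative wins
print(leftmost_longest("a|ab", "xab"))  # (1, 3): longest match wins
```

The emulation tries every end position for every start, so it is only a demonstration; a real POSIX engine computes the leftmost-longest match directly. It also reports only the overall span and ignores POSIX's separate rules for submatches.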
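The escaping pain from the side threads is easy to reproduce in Python, where the string-literal parser and the regex engine each interpret backslashes. This small example shows why raw strings are the usual advice: in an ordinary literal, `\b` never reaches the regex engine as a word boundary.

```python
import re

# In an ordinary string literal, "\b" is already the backspace character
# (U+0008) before the regex engine ever sees it.
plain = "\bcat"
# A raw string keeps the backslash, so the engine receives "\" and "b"
# and reads them as a word boundary.
raw = r"\bcat"

print(repr(plain))                      # '\x08cat'
print(repr(raw))                        # '\\bcat'
print(re.search(plain, "a cat"))        # None: looks for a literal backspace
print(re.search(raw, "a cat").span())   # (2, 5)

# Without a raw string, every backslash meant for the regex has to be doubled.
assert "\\bcat" == raw
```

A shell adds one more layer on top of this, which is how patterns end up with four backslashes where the engine wanted one.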
If your product, docs, or config accepts regexes, name the exact dialect and matching behavior instead of just saying “regex.” For anything that must run across tools, test against the actual engines you care about and stick to a deliberately tiny subset.
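One way to stick to a deliberately tiny subset is to lint patterns before they reach any engine. The Python sketch below is a hypothetical checker, not an implementation of I-Regexp or any other standard. It flags a few constructs that either fall outside POSIX ERE or mean something different there: shorthand escapes such as `\w` and `\b`, back-references, `(?` extension groups, lazy quantifiers, and backslashes inside bracket expressions (which POSIX treats as literal characters).

```python
# Hypothetical project policy: escapes that differ across grep/sed variants,
# POSIX, and Perl-style engines, so the linter rejects them.
_RISKY_ESCAPES = set("wWsSdDbB<>")


def _bracket_end(p: str, i: int) -> int:
    """Index of the ']' closing the bracket expression that opens at p[i]."""
    j = i + 1
    if j < len(p) and p[j] == "^":
        j += 1
    if j < len(p) and p[j] == "]":
        j += 1  # a leading ']' is a literal member
    while j < len(p) and p[j] != "]":
        if p[j] == "[" and j + 1 < len(p) and p[j + 1] in ":.=":
            close = p.find(p[j + 1] + "]", j + 2)  # skip [:alpha:] etc.
            j = close + 2 if close != -1 else len(p)
        else:
            j += 1
    return j  # len(p) if the bracket is never closed


def portability_problems(pattern: str) -> list[str]:
    """List constructs outside a small, hypothetical portable subset."""
    problems: list[str] = []
    i, n = 0, len(pattern)
    after_quantifier = False  # was the previous token *, +, ? or }?
    while i < n:
        c = pattern[i]
        if c == "\\" and i + 1 < n:
            nxt = pattern[i + 1]
            if nxt in _RISKY_ESCAPES:
                problems.append(f"escape \\{nxt} at {i}")
            elif nxt in "123456789":
                problems.append(f"back-reference \\{nxt} at {i}")
            after_quantifier = False
            i += 2
        elif c == "[":
            end = _bracket_end(pattern, i)
            # POSIX treats '\' inside brackets as a literal; Perl does not.
            if "\\" in pattern[i:end]:
                problems.append(f"backslash inside bracket expression at {i}")
            after_quantifier = False
            i = end + 1
        elif pattern.startswith("(?", i):
            problems.append(f"'(?' extension group at {i}")
            after_quantifier = False
            i += 2
        elif c == "?" and after_quantifier:
            problems.append(f"lazy quantifier at {i}")
            after_quantifier = False
            i += 1
        else:
            after_quantifier = c in "*+?}"
            i += 1
    return problems


print(portability_problems("^[a-z]+(-[a-z0-9]+)*$"))  # []
print(portability_problems("(?:ab)+?"))
```

An empty result only means the pattern avoids these particular constructs; it still needs to be tested against each engine you actually ship with.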
- johndcook.com
- Discuss on HN