PHP 8.2: No-capture modifier (/n) support in preg_* functions
PHP 8.2 adds support for the /n (no capture) modifier to the preg_* family of functions. When the /n modifier is used in a regular expression, unnamed groups do not capture; only named groups do.
Capturing groups and named capture groups
In regular expressions, the () meta characters denote a capturing group. The text matched by the expression inside the parentheses is captured and returned.
preg_match('/\w-(\d+)-\w/', 'a-123-b', $matches);
var_dump($matches);
array(2) {
[0]=> string(7) "a-123-b"
[1]=> string(3) "123"
}
In the snippet above, $matches[1] contains the match for the expression within the parentheses: \d+. This is helpful in extracting individual substrings from a given string.
PCRE (and in turn PHP) supports named capture groups, which return captured values by name:
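As a further illustration, a pattern can contain several capturing groups, which are numbered from left to right. The following is a minimal sketch; the input string and variable names are made up for this example.

```php
<?php
// Hypothetical input; the three capturing groups are numbered 1 to 3, left to right.
$matched = preg_match('/(\d{4})-(\d{2})-(\d{2})/', 'Released on 2022-12-08.', $matches);

if ($matched === 1) {
    // $matches[0] is the full match; $matches[1] to $matches[3] are the groups.
    [$full, $year, $month, $day] = $matches;
    echo "$year / $month / $day", PHP_EOL; // 2022 / 12 / 08
}
```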
- preg_match('/\w-(\d+)-\w/', 'a-123-b', $matches);
+ preg_match('/\w-(?P<num>\d+)-\w/', 'a-123-b', $matches);
var_dump($matches);
array(3) {
[0]=> string(7) "a-123-b"
+ ["num"]=> string(3) "123"
[1]=> string(3) "123"
}
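Because a named group is returned under both its name and its numeric index, code that only wants the named values may filter out the integer keys. One possible way to do that, shown here as a sketch rather than a recommended pattern, is array_filter() in key mode:

```php
<?php
preg_match('/\w-(?P<num>\d+)-\w/', 'a-123-b', $matches);

// Named groups appear twice: once under the name and once under a numeric index.
// Keeping only the string keys leaves just the named captures.
$named = array_filter($matches, 'is_string', ARRAY_FILTER_USE_KEY);

var_dump($named); // only the "num" => "123" entry remains
```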
With the non-capturing group syntax ((?:expr)), it is possible to mark a group as non-capturing. Text matched by a non-capturing group is not returned.
- preg_match('/\w-(\d+)-\w/', 'a-123-b', $matches);
+ preg_match('/\w-(?:\d+)-\w/', 'a-123-b', $matches);
var_dump($matches);
array(1) {
[0]=> string(7) "a-123-b"
- [1]=> string(3) "123"
}
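Non-capturing groups are typically useful when parentheses are needed only for grouping, such as around an alternation. The sketch below uses a made-up file name to show a pattern that mixes one capturing group with one non-capturing group:

```php
<?php
// (?:jpe?g|png) groups the alternation without adding an entry to $matches;
// only the file name in the first (capturing) group is returned.
preg_match('/^(\w+)\.(?:jpe?g|png)$/', 'photo.jpeg', $matches);

var_dump($matches);
// $matches[0] is "photo.jpeg" and $matches[1] is "photo"; there is no $matches[2].
```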
New /n modifier
When the /n modifier is used, groups written with plain () meta-characters no longer capture; only named capture groups do. This is essentially the same as marking each unnamed capturing group as non-capturing.
preg_match('/(\w)-(?P<num>\d+)-(\w)/', 'a-123-b', $matches);
The expression above has three capturing groups: (\w), (?P<num>\d+), and (\w). By default, PHP captures all groups, which results in the following $matches value:
array(5) {
[0]=> string(7) "a-123-b"
[1]=> string(1) "a"
["num"]=> string(3) "123"
[2]=> string(3) "123"
[3]=> string(1) "b"
}
With the /n modifier, only named-capture groups capture:
- preg_match('/(\w)-(?P<num>\d+)-(\w)/', 'a-123-b', $matches);
+ preg_match('/(\w)-(?P<num>\d+)-(\w)/n', 'a-123-b', $matches);
var_dump($matches);
array(3) {
[0]=> string(7) "a-123-b"
["num"]=> string(3) "123"
[1]=> string(3) "123"
}
The result is identical to marking each group as non-capturing:
- preg_match('/(\w)-(?P<num>\d+)-(\w)/', 'a-123-b', $matches);
+ preg_match('/(?:\w)-(?P<num>\d+)-(?:\w)/', 'a-123-b', $matches);
var_dump($matches);
array(3) {
[0]=> string(7) "a-123-b"
["num"]=> string(3) "123"
[1]=> string(3) "123"
}
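The equivalence can be checked directly by comparing the two $matches arrays. This is a small sketch that assumes it runs on PHP 8.2 or later; the variable names are chosen only for the example.

```php
<?php
// Requires PHP 8.2+; on older versions the /n pattern fails to compile.
if (PHP_VERSION_ID >= 80200) {
    preg_match('/(\w)-(?P<num>\d+)-(\w)/n', 'a-123-b', $withModifier);
    preg_match('/(?:\w)-(?P<num>\d+)-(?:\w)/', 'a-123-b', $withNonCapturing);

    // Same keys, same values, same order.
    var_dump($withModifier === $withNonCapturing); // bool(true)
}
```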
The advantage of the /n modifier is that it simplifies complex regular expressions that contain multiple groups when only a small subset of them needs to be extracted. Instead of marking each unnecessary group as non-capturing, it is now possible to make all groups non-capturing with a single modifier, and cherry-pick the groups that must capture by naming them.
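To illustrate, consider a hypothetical version-string pattern (not taken from any library) in which five groups exist only for structure and three named groups hold the values of interest. With /n, none of the structural groups needs the (?:...) prefix:

```php
<?php
// Hypothetical pattern: five unnamed groups are purely structural,
// and the three named groups are the values to extract.
$pattern = '/^v?(?P<major>\d+)(\.(?P<minor>\d+)(\.(?P<patch>\d+))?)?(-(alpha|beta|rc)(\.\d+)?)?$/n';

// Requires PHP 8.2+ because of the /n modifier.
if (PHP_VERSION_ID >= 80200 && preg_match($pattern, 'v8.2.0-rc.1', $matches) === 1) {
    echo "{$matches['major']}.{$matches['minor']}.{$matches['patch']}", PHP_EOL; // 8.2.0
}
```

Without the modifier, the same readability would require rewriting each of the five structural groups as a non-capturing group.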
Backward Compatibility Impact
The /n modifier is only supported in PHP 8.2 and later. Attempting to use it in older PHP versions results in a PHP warning, and the call fails (preg_match returns false, for example):
preg_match('/(foo)/n', 'foo', $matches);
Warning: preg_match(): Unknown modifier 'n' in ... on line ...
It is not possible to back-port support for the /n modifier to older PHP versions. However, the same effect can be achieved by marking each group as non-capturing. Non-capturing groups are supported in all PHP versions, even PHP versions prior to the PCRE2 migration:
- preg_match('/(foo)/n', 'foo', $matches);
+ preg_match('/(?:foo)/', 'foo', $matches);
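Code that must run on both older and newer PHP versions could, as one option, select the pattern at run time. The snippet below is only a sketch of that idea, reusing the example pattern from earlier in this article:

```php
<?php
// Use the /n pattern on PHP 8.2+, and an explicit non-capturing
// equivalent on older versions.
$pattern = PHP_VERSION_ID >= 80200
    ? '/(\w)-(?P<num>\d+)-(\w)/n'
    : '/(?:\w)-(?P<num>\d+)-(?:\w)/';

preg_match($pattern, 'a-123-b', $matches);
echo $matches['num'], PHP_EOL; // 123
```

In practice, a code base that still supports PHP versions older than 8.2 would likely be simpler if it used the non-capturing form everywhere, since that form behaves the same on all versions.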