PHP 8.2: No-capture modifier (/n) support in preg_* functions
PHP 8.2 adds support for the /n (no capture) modifier to the preg_* family of functions. When a regular expression uses the /n modifier, unnamed groups do not capture; only named capture groups do.
Capturing groups and named capture groups
In regular expressions, the () metacharacters denote a capturing group. Whatever the expression inside the parentheses matches is captured and returned.
preg_match('/\w-(\d+)-\w/', 'a-123-b', $matches);
var_dump($matches);
array(2) {
[0]=> string(7) "a-123-b"
[1]=> string(3) "123"
}
In the snippet above, $matches[1] contains the match for the expression within the parentheses: \d+. This is helpful for extracting individual parts of a given string.
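As a small illustration of extracting several parts at once, the sketch below pulls the components out of a version-like string. The subject 'PHP 8.2.1' and the variable names are illustrative assumptions; the point is that groups are numbered left to right and $matches[0] always holds the full match.

```php
<?php
// Sketch: each pair of parentheses is a capturing group, numbered from left
// to right by the position of its opening parenthesis.
// The subject string is an illustrative input, not taken from the article.
$subject = 'PHP 8.2.1';

if (preg_match('/(\d+)\.(\d+)\.(\d+)/', $subject, $matches) === 1) {
    // Skip index 0 (the full match) and destructure the three groups.
    [, $major, $minor, $patch] = $matches;
    echo "major=$major, minor=$minor, patch=$patch\n"; // major=8, minor=2, patch=1
}
```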
PCRE (and in turn PHP) supports named capture groups, which return captured values by name in addition to their numeric index:
- preg_match('/\w-(\d+)-\w/', 'a-123-b', $matches);
+ preg_match('/\w-(?P<num>\d+)-\w/', 'a-123-b', $matches);
var_dump($matches);
array(3) {
[0]=> string(7) "a-123-b"
+ ["num"]=> string(3) "123"
[1]=> string(3) "123"
}
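PCRE also accepts the (?<name>...) spelling for named groups, alongside (?P<name>...). A minimal sketch, assuming an illustrative file name as the subject, showing that each named group is reachable both by name and by number:

```php
<?php
// Sketch: (?<name>...) and (?P<name>...) both define named groups in PCRE.
// Each named group appears in $matches twice: under its name and its number.
// 'backup-2024-05.tar' is an illustrative subject.
preg_match('/(?<year>\d{4})-(?<month>\d{2})/', 'backup-2024-05.tar', $matches);

echo $matches['year'], ' ', $matches['month'], "\n"; // 2024 05
var_dump($matches[1] === $matches['year']);           // bool(true)
```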
With the non-capturing group syntax ((?:expr)), it is possible to mark a group as non-capturing. Whatever a non-capturing group matches is not returned.
- preg_match('/\w-(\d+)-\w/', 'a-123-b', $matches);
+ preg_match('/\w-(?:\d+)-\w/', 'a-123-b', $matches);
var_dump($matches);
array(1) {
[0]=> string(7) "a-123-b"
- [1]=> string(3) "123"
}
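Non-capturing groups are typically needed when alternation or a quantifier requires grouping but the grouped value itself is not wanted. A minimal sketch, assuming an illustrative URL as input, where only the host name is captured:

```php
<?php
// Sketch: the scheme alternation needs a group, but its value is not needed,
// so it is written as (?:...). Only the host name is captured.
// The URL is an illustrative input.
preg_match('~^(?:https?|ftp)://([^/]+)~', 'https://example.com/path', $matches);

echo count($matches), ' ', $matches[1], "\n"; // 2 example.com
```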
Writing better Regular Expressions in PHP How to write more readable, self-explanatory, and effective regular expressions in PHP.
New /n modifier
When the /n modifier is used, groups written with the () metacharacters no longer capture, except for named capture groups. This is essentially the same as marking each unnamed capturing group as non-capturing.
preg_match('/(\w)-(?P<num>\d+)-(\w)/', 'a-123-b', $matches);
The expression above has three capturing groups: (\w), (?P<num>\d+), and (\w). By default, all of them capture, which results in the following $matches value:
array(5) {
[0]=> string(7) "a-123-b"
[1]=> string(1) "a"
["num"]=> string(3) "123"
[2]=> string(3) "123"
[3]=> string(1) "b"
}
With the /n modifier, only named capture groups capture:
- preg_match('/(\w)-(?P<num>\d+)-(\w)/', 'a-123-b', $matches);
+ preg_match('/(\w)-(?P<num>\d+)-(\w)/n', 'a-123-b', $matches);
var_dump($matches);
array(3) {
[0]=> string(7) "a-123-b"
["num"]=> string(3) "123"
[1]=> string(3) "123"
}
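The output above shows that the named group is still numbered (it appears as [1]). One consequence worth checking, sketched here under the assumption of PHP 8.2+ and an illustrative subject string: numeric references in a preg_replace() replacement string shift, so "$1" refers to the named group once /n is added.

```php
<?php
// Requires PHP 8.2+ for the /n modifier.
// Under /n, "num" is the only capturing group, so it is also group 1,
// and "$1" in the replacement string refers to it.
$subject = 'a-123-b c-456-d'; // illustrative input

$result = preg_replace('/(\w)-(?P<num>\d+)-(\w)/n', '[$1]', $subject);
echo $result, "\n"; // [123] [456]

// Without /n, group 1 is the first (\w), so "$1" refers to the letter.
$legacy = preg_replace('/(\w)-(?P<num>\d+)-(\w)/', '[$1]', $subject);
echo $legacy, "\n"; // [a] [c]
```

When adding /n to an existing pattern, any $1, $2, ... references in replacement strings therefore need reviewing.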
The result is identical to marking each group as non-capturing:
- preg_match('/(\w)-(?P<num>\d+)-(\w)/', 'a-123-b', $matches);
+ preg_match('/(?:\w)-(?P<num>\d+)-(?:\w)/', 'a-123-b', $matches);
var_dump($matches);
array(3) {
[0]=> string(7) "a-123-b"
["num"]=> string(3) "123"
[1]=> string(3) "123"
}
The advantage of the /n modifier is that it simplifies complex regular expressions that contain multiple groups when only a small subset of them needs to be extracted. Instead of marking each unnecessary group as non-capturing, it is now possible to make all groups non-capturing and cherry-pick the groups that must capture by naming them.
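A minimal sketch of that cherry-picking, assuming PHP 8.2+ and a hypothetical HTTP request line as input. The method and protocol-version groups exist only for alternation, so /n keeps them from capturing, and the named path group is the only value extracted besides the full match.

```php
<?php
// Requires PHP 8.2+ for the /n modifier.
// The request line and the pattern are illustrative, not a complete
// HTTP request-line parser.
$requestLine = 'GET /index.php HTTP/1.1';

// (GET|...) and (1\.[01]|2) are needed for alternation only; under /n they
// do not capture. Only the named group "path" is returned.
$pattern = '~^(GET|POST|PUT|DELETE) (?<path>\S+) HTTP/(1\.[01]|2)$~n';

if (preg_match($pattern, $requestLine, $matches) === 1) {
    echo $matches['path'], "\n"; // /index.php
}
// Without /n, the same result needs (?:GET|POST|PUT|DELETE) and
// (?:1\.[01]|2) written out explicitly.
```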
Backward Compatibility Impact
The /n modifier is only supported in PHP 8.2 and later. Using it in older PHP versions results in a PHP warning, and the preg_* functions fail instead of returning a result: preg_match, for example, returns false, while preg_replace returns null:
preg_match('/(foo)/n', 'foo', $matches);
Warning: preg_match(): Unknown modifier 'n' in ... on line ...
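Because a compile failure returns false rather than 0, a plain truthiness check cannot tell "no match" apart from "broken pattern". A minimal sketch of a strict check; since the /n modifier only fails on older versions, it uses a deliberately malformed pattern (a missing closing parenthesis) as a stand-in that fails to compile on every PHP version.

```php
<?php
// Sketch: distinguish "no match" (0) from "pattern failed to compile" (false).
// '/(foo/' is a deliberately broken stand-in for an unsupported pattern;
// @ suppresses the compile warning for this demonstration only.
$result = @preg_match('/(foo/', 'foo', $matches);

if ($result === false) {
    echo "Pattern could not be compiled\n";
} elseif ($result === 0) {
    echo "No match\n";
} else {
    echo "Matched\n";
}
```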
It is not possible to back-port support for the /n modifier to older PHP versions. However, the same effect can be achieved by marking each group as non-capturing. Non-capturing groups are supported in all PHP versions, including versions prior to the PCRE2 migration in PHP 7.3:
- preg_match('/(foo)/n', 'foo', $matches);
+ preg_match('/(?:foo)/', 'foo', $matches);