PHP 8.2: No-capture modifier (/n) support in preg_* functions

Version: 8.2
Type: New Feature

PHP 8.2 adds support for the /n (no capture) modifier to the preg_* family of functions. When the /n modifier is used in a regular expression, groups that are not named do not capture.

Capturing groups and named capture groups

In regular expressions, the () meta-characters denote a capturing group. The text matched by the expression inside the parentheses is captured and returned.

preg_match('/\w-(\d+)-\w/', 'a-123-b', $matches);
var_dump($matches);
array(2) {
  [0]=> string(7) "a-123-b"
  [1]=> string(3) "123"
}

In the snippet above, $matches[1] contains the match for the expression within the parentheses: \d+. This is helpful for extracting individual parts of a given string.


PCRE (and in turn PHP) supports named capture groups, which return captured values by name as well as by numeric position:

- preg_match('/\w-(\d+)-\w/', 'a-123-b', $matches);
+ preg_match('/\w-(?P<num>\d+)-\w/', 'a-123-b', $matches);
  var_dump($matches);
array(3) {
  [0]=> string(7) "a-123-b"
+ ["num"]=> string(3) "123"
  [1]=> string(3) "123"
}
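
As a small illustration of named groups, the sketch below pulls the parts of a date out of a string by name. The sample string and the group names (year, month, day) are made up for this example.

```php
<?php
// Hypothetical input and group names, chosen for illustration only.
$subject = 'Released on 2022-12-08';

$pattern = '/(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})/';

if (preg_match($pattern, $subject, $matches) === 1) {
    // Each named group is available under its name and its numeric position.
    $year  = $matches['year'];   // same value as $matches[1]
    $month = $matches['month'];  // same value as $matches[2]
    $day   = $matches['day'];    // same value as $matches[3]

    echo "$year / $month / $day", PHP_EOL;
}
```

Reading values by name keeps the calling code stable even if more groups are later added to the pattern and the numeric positions shift.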

With the non-capturing group syntax, (?:expr), it is possible to mark a group as non-capturing. The text matched by a non-capturing group is not returned.

- preg_match('/\w-(\d+)-\w/', 'a-123-b', $matches);
+ preg_match('/\w-(?:\d+)-\w/', 'a-123-b', $matches);
  var_dump($matches);
array(1) {
  [0]=> string(7) "a-123-b"
- [1]=> string(3) "123"
}

New /n modifier

When the /n modifier is used, all groups (with () meta-characters) no longer capture, except for the named capture groups. This is essentially the same as marking each non-named capturing group as non-capturing.

preg_match('/(\w)-(?P<num>\d+)-(\w)/', 'a-123-b', $matches);

The expression above has three capturing groups: (\w), (?P<num>\d+), and (\w). By default, PHP captures all of them, which results in the following $matches value:

array(5) {
  [0]=> string(7) "a-123-b"
  [1]=> string(1) "a"
  ["num"]=> string(3) "123"
  [2]=> string(3) "123"
  [3]=> string(1) "b"
}

With the /n modifier, only named-capture groups capture:

- preg_match('/(\w)-(?P<num>\d+)-(\w)/', 'a-123-b', $matches);
+ preg_match('/(\w)-(?P<num>\d+)-(\w)/n', 'a-123-b', $matches);
  var_dump($matches);
array(3) {
  [0]=> string(7) "a-123-b"
  ["num"]=> string(3) "123"
  [1]=> string(3) "123"
}

The result is identical to marking each group as non-capturing:

- preg_match('/(\w)-(?P<num>\d+)-(\w)/', 'a-123-b', $matches);
+ preg_match('/(?:\w)-(?P<num>\d+)-(?:\w)/', 'a-123-b', $matches);
  var_dump($matches);
array(3) {
  [0]=> string(7) "a-123-b"
  ["num"]=> string(3) "123"
  [1]=> string(3) "123"
}

The advantage of the /n modifier is that it simplifies complex regular expressions that contain multiple groups when only a small subset of them needs to be extracted. Instead of marking each unnecessary group as non-capturing, it is now possible to make all groups non-capturing, and cherry-pick the groups that must capture by naming them.
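
To make the benefit concrete, here is a rough sketch that parses a log-style line. The log format, the pattern, and the group names (level, code) are assumptions made for this example, and the snippet needs PHP 8.2 or later because it relies on /n.

```php
<?php
// Requires PHP 8.2+ (uses the /n modifier). The log format is hypothetical.
$line = '[2022-12-08 10:15:00] app.ERROR: code=500 retry=3';

// Plain (...) groups are used freely for grouping and alternation.
// With /n, only the named groups "level" and "code" are captured.
$pattern = '/^\[(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2})\] (\w+)\.'
    . '(?P<level>DEBUG|INFO|ERROR): code=(?P<code>\d+)( retry=(\d+))?$/n';

preg_match($pattern, $line, $matches);

echo $matches['level'], ' ', $matches['code'], PHP_EOL;
```

Without /n, the same pattern would also return the date, time, channel, and retry groups, and each of them would need a ?: prefix to be suppressed.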

Backward Compatibility Impact

The /n modifier is only supported in PHP 8.2 and later. Attempting to use it in older PHP versions results in a PHP warning, and the preg_* call fails (for example, preg_match returns false and preg_replace returns null):

preg_match('/(foo)/n', 'foo', $matches);
Warning: preg_match(): Unknown modifier 'n' in ... on line ...

It is not possible to back-port support for the /n modifier to older PHP versions. However, the same effect can be achieved by marking each group as non-capturing. Non-capturing groups are supported in all PHP versions, even PHP versions prior to the PCRE2 migration:

- preg_match('/(foo)/n', 'foo', $matches);
+ preg_match('/(?:foo)/', 'foo', $matches);
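
For code that must run on both sides of PHP 8.2, one possible approach is to pick the pattern at runtime. The sketch below is only an illustration: the variable names are hypothetical, and it checks the PHP_VERSION_ID constant to decide whether /n is available.

```php
<?php
// Illustrative only: choose between /n and explicit non-capturing groups.
$pattern = PHP_VERSION_ID >= 80200
    ? '/(\w)-(?P<num>\d+)-(\w)/n'       // PHP 8.2+: /n disables unnamed captures
    : '/(?:\w)-(?P<num>\d+)-(?:\w)/';   // older PHP: mark each group with ?:

preg_match($pattern, 'a-123-b', $matches);

// Either pattern yields the keys 0, "num", and 1.
var_dump($matches);
```

In practice, a code base that still supports older PHP versions can simply keep the (?:...) form everywhere, since it behaves the same on PHP 8.2.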

Implementation