PHP 8.2: Locale-independent case conversion

Version: 8.2
Type: Change

In PHP 8.2, functions that provide case conversion and case-insensitive operations only consider the ASCII character range.

When a locale is set, it changes how the underlying C library handles strings, including case conversion and case-insensitive comparisons. Prior to PHP 8.0, PHP inherited the system locale, which was often unexpected and made PHP applications behave unpredictably on certain locales. PHP 8.0 and later no longer inherit the system locale, but calling setlocale with a custom locale can still bring back these side effects, especially in case folding.


In PHP 8.2 and later, PHP's internal case conversion functions are locale-independent. This affects the following functions:

  • strtolower
  • strtoupper
  • lcfirst
  • ucfirst
  • ucwords
  • stristr
  • stripos
  • strripos
  • str_ireplace

All of the functions above only perform case conversion and comparisons in the ASCII character range.
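
A minimal sketch of what "ASCII range only" means in practice, assuming PHP 8.2 or later and a UTF-8 encoded source file. Bytes outside the ASCII range pass through untouched, so accented letters are neither converted nor matched case-insensitively:

```php
<?php
// Assumes PHP 8.2+ and a UTF-8 encoded source file.
// Only ASCII letters A-Z / a-z are converted; multibyte UTF-8
// sequences such as "é" (0xC3 0xA9) are left exactly as they are.
$upper = strtoupper('café');   // "CAFé"
$lower = strtolower('ÉCOLE');  // "École"

// Case-insensitive search ignores non-ASCII letters as well:
// "Ä" (0xC3 0x84) and "ä" (0xC3 0xA4) are different bytes.
$pos = stripos('Ärger', 'ä');  // false

var_dump($upper, $lower, $pos);
```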

PHP 8.0 changed how the default locale is set and no longer inherits the system locale. Unless an application explicitly calls setlocale with a value other than "C", this change in PHP 8.2 should not have any effect on it.
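
To check whether an application is affected, the current LC_CTYPE setting can be queried at runtime. The sketch below relies on setlocale returning the current setting, without changing it, when the locale argument is "0"; the helper name usesCustomCtypeLocale is hypothetical:

```php
<?php
// Hypothetical helper: reports whether LC_CTYPE differs from the
// default "C" locale that PHP 8.0+ starts with.
function usesCustomCtypeLocale(): bool
{
    // A locale of "0" returns the current setting without changing it.
    $current = setlocale(LC_CTYPE, '0');

    // "POSIX" is an alias of the "C" locale.
    return $current !== false && $current !== 'C' && $current !== 'POSIX';
}

var_dump(usesCustomCtypeLocale()); // false unless setlocale() was called
```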

For example, when the locale was set to tr_TR, PHP versions older than 8.2 returned a dotted İ (LATIN CAPITAL LETTER I WITH DOT ABOVE) as the uppercase form of the ASCII letter i:

setlocale(LC_ALL, 'tr_TR');
echo strtoupper('i'); // İ

In PHP 8.2, this behavior is fixed, and the current locale has no impact on case conversions or comparisons:

setlocale(LC_ALL, 'tr_TR');
echo strtoupper('i'); // I
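
The same applies to the case-insensitive functions in the list above. A short sketch assuming PHP 8.2 or later: the ASCII letters I and i always match each other, whatever locale is requested. The tr_TR locale may not be installed on every system; setlocale then returns false, which makes no difference on PHP 8.2:

```php
<?php
// Assumes PHP 8.2+. tr_TR may not be installed; that is irrelevant here.
setlocale(LC_ALL, 'tr_TR');

// ASCII "I" and "i" are treated as the same letter regardless of locale.
$lower    = strtolower('TITLE');              // "title"
$pos      = stripos('TITLE', 'i');            // 1
$replaced = str_ireplace('i', '1', 'IMPORT'); // "1MPORT"

// Restore the default locale afterwards.
setlocale(LC_ALL, 'C');

var_dump($lower, $pos, $replaced);
```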


Backwards Compatibility Impact

PHP applications that do not call setlocale to switch to an alternative locale should not see any change in behavior.

PHP 8.0 and later no longer inherit the system locale. Overriding it with setlocale is almost always a bad idea, and can cause side effects because the locale is set per process, not per request or thread.
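
If some locale-dependent code really must run under a specific locale, one way to limit the side effects is to switch the locale only around that call and restore the previous value afterwards. A minimal sketch; the helper name withLocale is hypothetical, and the locale is still process-wide while the callback runs:

```php
<?php
// Hypothetical helper: runs $callback with LC_ALL temporarily switched
// to $locale, then restores whatever locale was active before.
function withLocale(string $locale, callable $callback): mixed
{
    // "0" queries the current LC_ALL setting without changing it.
    $previous = setlocale(LC_ALL, '0');

    if (setlocale(LC_ALL, $locale) === false) {
        throw new RuntimeException("Locale '{$locale}' is not available");
    }

    try {
        return $callback();
    } finally {
        // Restore the previous locale even if the callback throws.
        if ($previous !== false) {
            setlocale(LC_ALL, $previous);
        }
    }
}

// sprintf's %f is locale-aware, so it is a simple thing to scope.
$formatted = withLocale('C', fn () => sprintf('%.1f', 1.5)); // "1.5"
```

The try/finally block makes sure the original locale comes back even when the callback fails, so the override does not outlive the call that needed it.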

Applications that need to reliably convert character case across different languages should consider the functionality provided by the intl, mbstring, or iconv extensions.
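
As an example, the mbstring extension works on whole Unicode characters (UTF-8 by default) rather than on single bytes, so it handles the cases that the ASCII-only functions leave untouched. A short sketch, assuming the mbstring extension is installed:

```php
<?php
// Requires the mbstring extension; strings are UTF-8.
$upper = mb_strtoupper('café');                        // "CAFÉ"
$lower = mb_strtolower('ÉCOLE');                       // "école"
$pos   = mb_stripos('Ärger', 'ä');                     // 0
$title = mb_convert_case('élan vital', MB_CASE_TITLE); // "Élan Vital"

var_dump($upper, $lower, $pos, $title);
```

Note that mbstring applies Unicode's default, language-independent case mappings, so it does not apply language-specific rules such as the Turkish dotted İ either.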

