PHP 8.2: utf8_encode and utf8_decode functions deprecated
The utf8_encode and utf8_decode functions, despite their names, convert strings between the ISO-8859-1 (also known as "Latin 1") and UTF-8 encodings. These functions do not attempt to detect the actual character encoding of a given text, and always convert between ISO-8859-1 and UTF-8, even if the source text is not encoded in ISO-8859-1.
Although PHP includes the utf8_encode and utf8_decode functions in its standard library, these functions cannot be used to detect and convert other character encodings such as Windows-1252, UTF-16, and UTF-32 to UTF-8. Passing arbitrary text to the utf8_encode function is prone to bugs that do not raise any warnings or errors but may lead to undesired results.
Some frequent examples of bugs include:
- The Euro sign (€, UTF-8 byte sequence \xE2\x82\xAC), when passed to the utf8_encode function as utf8_encode("€"), results in garbled (also called "Mojibake") text output of â¬.
- The German Eszett character (ß, UTF-8 byte sequence \xC3\x9F), when passed through utf8_encode("ß"), results in Ã.
Neither of the examples above emits any warnings or errors, although the resulting text is wrong.
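The two cases above can be reproduced byte by byte. The sketch below assumes the mbstring extension is available and uses mb_convert_encoding with ISO-8859-1 as the source encoding, which performs the same byte-to-character mapping as utf8_encode without triggering the deprecation notice. The variable names are illustrative only.

```php
<?php
// UTF-8 bytes of "€" and "ß", written out explicitly so the example
// does not depend on the encoding of this source file.
$euro   = "\xE2\x82\xAC";
$eszett = "\xC3\x9F";

// Treat every input byte as one ISO-8859-1 character and re-encode it
// as UTF-8. This is what utf8_encode does, regardless of the real encoding.
$euroMangled   = mb_convert_encoding($euro, 'UTF-8', 'ISO-8859-1');
$eszettMangled = mb_convert_encoding($eszett, 'UTF-8', 'ISO-8859-1');

echo bin2hex($euroMangled), "\n";   // c3a2c282c2ac (three characters instead of one)
echo bin2hex($eszettMangled), "\n"; // c383c29f (two characters instead of one)
```

Each original byte has become its own two-byte UTF-8 character, which is why the output is longer than the input and renders as Mojibake.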
Because of the misleading function names, the lack of error messages and warnings, and the lack of support for character encodings other than ISO-8859-1, the utf8_encode and utf8_decode functions are deprecated in PHP 8.2.
Using the utf8_encode and utf8_decode functions emits a deprecation notice in PHP 8.2, and the functions will be removed in PHP 9.0.
utf8_encode('foo');
utf8_decode('foo');
Function utf8_encode() is deprecated in ... on line ...
Function utf8_decode() is deprecated in ... on line ...
Replacements for the deprecated functions
The utf8_encode function converts an ISO-8859-1 encoded string to UTF-8. Most utf8_encode calls in legacy PHP applications are an additional safeguard intended to turn any potentially malformed text into UTF-8, but as shown in the examples above, using this function often produces undesired outcomes rather than fixing malformed text.
Similarly, calling the utf8_decode function on a string converts that string to the ISO-8859-1 character encoding. The majority of web applications, web sites, and text formats in fact expect UTF-8 encoded text and not ISO-8859-1.
It might be ideal to reevaluate the need for utf8_encode and utf8_decode function calls prior to replacing them, because more often than not, these function calls are not required, and only result in undesired outcomes.
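When a utf8_encode call exists only as a safeguard, one option is to convert only input that is not already valid UTF-8. The following is a minimal sketch, not a drop-in replacement: ensure_utf8 is a hypothetical helper name, it assumes the mbstring extension is loaded, and it assumes that any input that is not valid UTF-8 is ISO-8859-1, which must be verified for the data in question.

```php
<?php
/**
 * Hypothetical helper: return the string unchanged if it is already
 * valid UTF-8, otherwise convert it under the ASSUMPTION that it is
 * ISO-8859-1. Requires the mbstring extension.
 */
function ensure_utf8(string $s): string
{
    if (mb_check_encoding($s, 'UTF-8')) {
        // Already valid UTF-8: converting again would produce Mojibake.
        return $s;
    }

    return mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1');
}

echo bin2hex(ensure_utf8("\xE2\x82\xAC")), "\n"; // e282ac (valid UTF-8 "€", left untouched)
echo bin2hex(ensure_utf8("\xDF")), "\n";         // c39f (Latin-1 "ß", converted)
```

This avoids the double-encoding shown earlier, at the cost of misreading the rare ISO-8859-1 text that happens to form a valid UTF-8 byte sequence.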
PHP does not bundle multi-byte character encoding functions in its core, but the mbstring, intl, and iconv extensions provide robust and accurate functionality to detect and convert character encodings. Both mbstring and iconv are core extensions, and mbstring in particular is widely used in modern PHP applications and can be polyfilled as well.
Replacements for utf8_encode
If the actual use case of an existing utf8_encode function call is to convert a known ISO-8859-1 string to UTF-8, it is possible to use the iconv, intl, or mbstring extensions to properly convert the encoding. Alternatively, it is possible to convert code points directly to a UTF-8 string in user-land PHP, albeit with a small performance penalty.
When the intended use case of utf8_encode is to automatically detect the character encoding and convert it to UTF-8 (even though the function never detected character encodings in the first place), the replacement is to detect the character encoding first, and then convert it to UTF-8.
| | ISO-8859-1 to UTF-8 | Any encoding to UTF-8 |
|---|---|---|
| PHP Standard Functions | ISO-8859-1 to UTF-8 using Standard PHP Functions | N/A |
| With mbstring | ISO-8859-1 to UTF-8 using mbstring | Any encoding to UTF-8 using mbstring |
| With intl | ISO-8859-1 to UTF-8 using intl | N/A |
| With iconv | ISO-8859-1 to UTF-8 using iconv | N/A |
ISO-8859-1 to UTF-8 using Standard PHP Functions
The symfony/polyfill-php72 library provides a PHP function that mimics the utf8_encode functionality using standard PHP functions. For better readability and to convey the meaning of the function, it is renamed to iso8859_1_to_utf8 in the example below.
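function iso8859_1_to_utf8(string $s): string {
    // Double the buffer: the input is read from the second half while
    // the (at most twice as long) output is written from the start.
    $s .= $s;
    $len = \strlen($s);

    for ($i = $len >> 1, $j = 0; $i < $len; ++$i, ++$j) {
        switch (true) {
            // ASCII bytes are copied unchanged.
            case $s[$i] < "\x80": $s[$j] = $s[$i]; break;
            // 0x80-0xBF become 0xC2 followed by the same byte.
            case $s[$i] < "\xC0": $s[$j] = "\xC2"; $s[++$j] = $s[$i]; break;
            // 0xC0-0xFF become 0xC3 followed by the byte minus 0x40.
            default: $s[$j] = "\xC3"; $s[++$j] = \chr(\ord($s[$i]) - 64); break;
        }
    }

    return substr($s, 0, $j);
}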
With the function above declared in application code, it is now possible to replace all utf8_encode calls with the new iso8859_1_to_utf8 function to avoid the deprecation notice:
- utf8_encode($string);
+ iso8859_1_to_utf8($string);
ISO-8859-1 to UTF-8 using mbstring
The mbstring extension, one of the most widely used optional PHP extensions, provides a cleaner and more straightforward approach to convert ISO-8859-1 encoded strings to UTF-8. This can be used to replace the utf8_encode function deprecated in PHP 8.2.
- utf8_encode($string);
+ mb_convert_encoding($string, 'UTF-8', 'ISO-8859-1');
Any encoding to UTF-8 using mbstring
When the actual character encoding of the input text is unknown, forcing PHP to detect it can lead to erroneous results. However, it is possible to make a reasonable guess of the source character encoding and convert it to UTF-8 using the mbstring extension.
- utf8_encode($string);
+ mb_convert_encoding($string, 'UTF-8', mb_list_encodings());
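Detection tends to be more predictable when the candidate list is limited to the encodings the application realistically receives. The sketch below is one possible approach, assuming the mbstring extension: to_utf8_guess is a hypothetical helper, and the default candidate list of UTF-8 and ISO-8859-1 is an assumption to adjust per application. It calls mb_detect_encoding in strict mode and converts only after a candidate has matched.

```php
<?php
/**
 * Hypothetical helper: guess the source encoding from a short list of
 * candidates, then convert to UTF-8. The candidate list is an
 * ASSUMPTION about the incoming data; detection remains a heuristic.
 */
function to_utf8_guess(string $s, array $candidates = ['UTF-8', 'ISO-8859-1']): string
{
    // Strict mode rejects candidates for which the bytes are invalid.
    $from = mb_detect_encoding($s, $candidates, true);

    if ($from === false) {
        throw new InvalidArgumentException('No candidate encoding matches the input.');
    }

    return $from === 'UTF-8' ? $s : mb_convert_encoding($s, 'UTF-8', $from);
}

echo bin2hex(to_utf8_guess("\xDF")), "\n"; // c39f (a lone 0xDF is not valid UTF-8)
```

Because every byte sequence is valid ISO-8859-1, keeping it in the list means detection never fails, but it also means ambiguous input may still be guessed wrongly.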
ISO-8859-1 to UTF-8 using intl
The UConverter class in the intl extension also provides a way to convert text from one character encoding to another. Its function signature is similar to that of its mbstring counterpart. Using UConverter::transcode, it is possible to replicate the utf8_encode functionality:
- utf8_encode($string);
+ UConverter::transcode($string, 'UTF8', 'ISO-8859-1');
ISO-8859-1 to UTF-8 using iconv
Applications that can use the iconv extension can replace the utf8_encode function with the iconv function:
- utf8_encode($string);
+ iconv('ISO-8859-1', 'UTF-8', $string);
Replacements for utf8_decode
The utf8_decode function converts a UTF-8 encoded string to ISO-8859-1. With the utf8_decode function deprecated, it is possible to replicate this functionality using PHP standard functions, the mbstring extension, the intl extension, or the iconv extension.
| | UTF-8 to ISO-8859-1 |
|---|---|
| PHP Standard Functions | UTF-8 to ISO-8859-1 using Standard PHP Functions |
| With mbstring | UTF-8 to ISO-8859-1 using mbstring |
| With intl | UTF-8 to ISO-8859-1 using intl |
| With iconv | UTF-8 to ISO-8859-1 using iconv |
UTF-8 to ISO-8859-1 using Standard PHP Functions
Similar to the utf8_encode polyfill, the symfony/polyfill-php72 library provides a PHP function that mimics the utf8_decode functionality:
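function utf8_to_iso8859_1(string $string): string {
    $s = (string) $string;
    $len = \strlen($s);

    for ($i = 0, $j = 0; $i < $len; ++$i, ++$j) {
        // The high nibble of the lead byte tells how long the sequence is.
        switch ($s[$i] & "\xF0") {
            // Two-byte sequence: decode it, keep it only if it fits in one byte.
            case "\xC0":
            case "\xD0":
                $c = (\ord($s[$i] & "\x1F") << 6) | \ord($s[++$i] & "\x3F");
                $s[$j] = $c < 256 ? \chr($c) : '?';
                break;

            // Four-byte sequence: skip one extra byte, then handle as three-byte.
            case "\xF0":
                ++$i;
                // no break

            // Three-byte sequence: never representable in ISO-8859-1.
            case "\xE0":
                $s[$j] = '?';
                $i += 2;
                break;

            // ASCII (and stray bytes) are copied unchanged.
            default:
                $s[$j] = $s[$i];
        }
    }

    return substr($s, 0, $j);
}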
With the function above included, it is now possible to replace utf8_decode calls with the new utf8_to_iso8859_1 function:
- utf8_decode($string);
+ utf8_to_iso8859_1($string);
UTF-8 to ISO-8859-1 using mbstring
Using mbstring, the following example replaces the deprecated utf8_decode function with mb_convert_encoding:
- utf8_decode($string);
+ mb_convert_encoding($string, 'ISO-8859-1', 'UTF-8');
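ISO-8859-1 can represent only 256 characters, so any replacement has to decide what happens to the rest. As a small illustration, assuming the mbstring extension with its default substitute character (a question mark, which can be changed with mb_substitute_character), characters outside ISO-8859-1 are replaced much as utf8_decode replaced them:

```php
<?php
// UTF-8 bytes for "ß" (representable in ISO-8859-1) followed by
// "€" (not representable in ISO-8859-1).
$utf8 = "\xC3\x9F\xE2\x82\xAC";

$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

// "ß" becomes the single byte 0xDF; "€" becomes the substitute "?" (0x3F).
echo bin2hex($latin1), "\n"; // df3f
```

The conversion is lossy: once a character has been replaced by the substitute, the original text cannot be recovered from the ISO-8859-1 string.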
UTF-8 to ISO-8859-1 using intl
With the help of UConverter::transcode in the intl extension, the following example shows a utf8_decode replacement:
- utf8_decode($string);
+ UConverter::transcode($string, 'ISO-8859-1', 'UTF8', ['to_subst' => '?']);
UTF-8 to ISO-8859-1 using iconv
The iconv function can also be used to mimic and replace the utf8_decode functionality to avoid the utf8_decode deprecation in PHP 8.2:
- utf8_decode($string);
+ iconv('UTF-8', 'ISO-8859-1', $string);
Backwards Compatibility Impact
The utf8_encode and utf8_decode functions are sometimes used in legacy PHP applications and in applications that process incoming data and files with various character encodings. These functions are deprecated in PHP 8.2 and will be removed in PHP 9.0, because they are misleadingly named and are prone to unexpected and undesired results that emit no warnings or errors.
In PHP 8.2 and later, using these functions results in a deprecation notice each time the functions are called.
The utf8_encode and utf8_decode functions are to be removed in PHP 9.0.
A large number of applications use these functions without being aware that they only work with ISO-8859-1 as the source character encoding and nothing else. The ideal fix for the deprecation may well be to find out why these functions are used in the first place, and to determine whether they are absolutely necessary.
Depending on the availability of PHP extensions and the willingness to use a somewhat slower PHP implementation, it is possible to replace utf8_encode and utf8_decode function calls with one of the alternatives described above.
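For code that has to run where the set of installed extensions is not known in advance, the choice can be made at run time. The following is a rough sketch rather than a recommended implementation: latin1_to_utf8 is a hypothetical name, the order of preference is arbitrary, and the final branch is a plain user-land loop written for this example instead of the polyfill code.

```php
<?php
/**
 * Hypothetical helper: convert ISO-8859-1 to UTF-8 with whichever
 * mechanism is available, falling back to plain PHP.
 */
function latin1_to_utf8(string $s): string
{
    if (function_exists('mb_convert_encoding')) {
        $out = mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1');
        if (is_string($out)) {
            return $out;
        }
    }

    if (function_exists('iconv')) {
        $out = iconv('ISO-8859-1', 'UTF-8', $s);
        if (is_string($out)) {
            return $out;
        }
    }

    if (class_exists('UConverter')) {
        $out = UConverter::transcode($s, 'UTF-8', 'ISO-8859-1');
        if (is_string($out)) {
            return $out;
        }
    }

    // User-land fallback: each byte is one code point below U+0100.
    $out = '';
    $len = strlen($s);
    for ($i = 0; $i < $len; ++$i) {
        $byte = ord($s[$i]);
        $out .= $byte < 0x80
            ? $s[$i]                                              // ASCII, unchanged
            : chr(0xC0 | ($byte >> 6)) . chr(0x80 | ($byte & 0x3F)); // two-byte UTF-8
    }

    return $out;
}

echo bin2hex(latin1_to_utf8("\xDF")), "\n"; // c39f
```

Wrapping the conversion in a single application-level function also gives one place to revisit later, should it turn out that the conversion was never needed.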
Related Changes
- PHP 8.2: Mbstring: Base64, Uuencode, QPrint, and HTML Entity encodings are deprecated
- PHP 8.2: ${var} string interpolation deprecated