PHP 8.2: Mbstring: Base64, Uuencode, QPrint, and HTML Entity encodings are deprecated

Version: 8.2
Type: Deprecation

PHP's Multi-Byte Strings extension (mbstring) adds functionality to manipulate PHP strings that contain multi-byte characters, such as characters from Asian scripts, Emojis, and thousands of other characters that do not fit into a single byte.

The extension supports converting to and from several character encodings such as UTF-8/16/32 and ISO-8859-1. Mbstring also supports a few other encodings: Base64, Quoted-Printable, Uuencode, and HTML Entities. These four do not fit with the rest of the text encodings, in that they do not process sequences of bytes that make up Unicode code points, but rather operate on raw bytes.

Further, PHP core already provides separate, dedicated functions to encode/decode Base64, Quoted-Printable, Uuencode, and HTML Entities.

In PHP 8.2, using the Mbstring extension to encode/decode strings to Base64, Quoted-Printable, Uuencode, and HTML Entities is deprecated.

The following labeled encodings are affected. The encoding labels are case-insensitive.

  • BASE64
  • UUENCODE
  • HTML-ENTITIES
  • html (alias of HTML-ENTITIES)
  • Quoted-Printable
  • qprint (alias of Quoted-Printable)

Using the Mbstring extension to encode or decode any of these encodings emits a PHP deprecation notice. In PHP 9.0, support for these encodings will be dropped.
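When auditing a code base for affected calls, it can help to check encoding labels against the list above. The following is a minimal sketch; isDeprecatedMbEncoding is a hypothetical helper name, not a function provided by PHP or Mbstring, and it only knows the six labels listed in this article.

```php
<?php
// Hypothetical helper (not part of PHP or Mbstring): reports whether an
// encoding label is one of the six labels listed in this article.
function isDeprecatedMbEncoding(string $label): bool
{
    // Stored in lowercase because the labels are case-insensitive.
    static $deprecated = [
        'base64',
        'uuencode',
        'html-entities',
        'html',
        'quoted-printable',
        'qprint',
    ];

    return in_array(strtolower($label), $deprecated, true);
}

var_dump(isDeprecatedMbEncoding('Base64')); // bool(true)
var_dump(isDeprecatedMbEncoding('UTF-8'));  // bool(false)
```

A check like this could be placed in front of code paths that accept an encoding label from configuration or user input.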

mb_detect_encoding and mb_convert_encoding Changes

In PHP 8.2 and later, the mb_detect_encoding function no longer detects the encodings listed above. Further, the mb_convert_encoding function will not attempt to detect the text encoding as one of the deprecated encodings.

Prior to this change, the mb_detect_encoding function could fall back to returning encodings such as UUENCODE even if the text was not encoded that way.
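One way to keep detection results predictable is to pass an explicit list of candidate text encodings. This is a sketch only; the input string and the candidate list are assumptions chosen for illustration.

```php
<?php
// Hypothetical input; in practice this would come from a file or a request.
$text = 'test';

// An explicit candidate list with strict mode enabled, so the result
// does not depend on the configured detection order.
$encoding = mb_detect_encoding($text, ['UTF-8', 'ISO-8859-1'], true);

var_dump($encoding);
```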

Replacing the Deprecated Functionality

PHP already provides built-in functions that support encoding/decoding Base64, Quoted-Printable, Uuencode, and HTML Entities.

Replacing the Mbstring conversion calls with these dedicated functions avoids the deprecation notice. As an added benefit, the replacement functions are built into PHP core, so they do not require the Mbstring extension.

Base64 encoding/decoding

Attempting to encode a string into Base64 using Mbstring functions results in a deprecation notice:

mb_convert_encoding('test', 'base64');
Deprecated: mb_convert_encoding(): Handling Base64 via mbstring is deprecated; use base64_encode/base64_decode instead in ... on line ...

To avoid the deprecation notice, replace the Mbstring calls that encode/decode Base64 with the base64_encode() and base64_decode() functions:

- $base64Encoded = mb_convert_encoding('test', 'Base64');
+ $base64Encoded = base64_encode('test');
- $str = mb_convert_encoding($base64Encoded, 'UTF-8', 'Base64');
+ $str = base64_decode($base64Encoded);
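As a minimal round-trip sketch using only PHP core functions: the strict-mode argument to base64_decode() is an optional addition not shown in the replacement above, and it makes the function return false when the input contains characters outside the Base64 alphabet.

```php
<?php
// Round trip with PHP core's Base64 functions; Mbstring is not required.
$original = 'test';
$encoded  = base64_encode($original);      // "dGVzdA=="
$decoded  = base64_decode($encoded, true); // second argument enables strict mode

if ($decoded === false) {
    // Strict mode rejects input with characters outside the Base64 alphabet.
    throw new InvalidArgumentException('Input is not valid Base64.');
}

var_dump($decoded === $original); // bool(true)
```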

HTML Entities encoding/decoding

The Mbstring extension provides an encoding named HTML-Entities, which can be used to encode/decode strings to HTML entities. This functionality is similar to PHP core's htmlentities and html_entity_decode functions.

$str = '¢ <script>';
$entities = mb_convert_encoding($str, 'HTML-Entities', 'UTF-8');
// "&cent; <script>"
Deprecated: mb_convert_encoding(): Handling HTML entities via mbstring is deprecated; use htmlspecialchars, htmlentities, or mb_encode_numericentity/mb_decode_numericentity instead in ... on line ...

Mbstring-provided HTML entity conversion can be replaced with the htmlentities function, which encodes all characters that have HTML entity equivalents.

  $str = '¢ <script>';
- $entities = mb_convert_encoding($str, 'HTML-Entities', 'UTF-8');
- // "&cent; <script>"
+ $entities = htmlentities($str);
+ // "&cent; &lt;script&gt;"

Note that the HTML-Entities encoding in Mbstring does not encode the '"<>& characters, which are exactly the characters that the htmlspecialchars function encodes. Mbstring's HTML-Entities output can be replicated verbatim by decoding those five characters again after calling htmlentities(). However, leaving these special characters unencoded and sending them to a browser can result in cross-site scripting vulnerabilities.

Additionally, it might be necessary to convert non-UTF-8 strings to UTF-8 before passing them to the htmlentities function.

  $fromEncoding = 'ISO-8859-1';
- $encodedStr = mb_convert_encoding($str, 'HTML-Entities', $fromEncoding);
+ $encodedStr = mb_convert_encoding($str, 'UTF-8', $fromEncoding);
+ $encodedStr = htmlentities($encodedStr);

Although not recommended, if the '"<>& characters must not be encoded, this behavior can be achieved with an additional htmlspecialchars_decode call:

  $fromEncoding = 'ISO-8859-1';
- $encodedStr = mb_convert_encoding($str, 'HTML-Entities', $fromEncoding);
+ $encodedStr = mb_convert_encoding($str, 'UTF-8', $fromEncoding);
+ $encodedStr = htmlentities($encodedStr);
+ $encodedStr = htmlspecialchars_decode($encodedStr);
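The two-step replacement can be wrapped in a small function. The sketch below is illustrative only: toHtmlEntities is a hypothetical helper name, and the flags and charset are passed explicitly as an assumption so that the result does not depend on the PHP version's defaults.

```php
<?php
// Hypothetical helper: converts $str from $fromEncoding to UTF-8, then
// encodes it with htmlentities(). Unlike Mbstring's HTML-Entities
// encoding, this also encodes the ' " < > and & characters.
function toHtmlEntities(string $str, string $fromEncoding = 'UTF-8'): string
{
    if (strcasecmp($fromEncoding, 'UTF-8') !== 0) {
        // Converting between text encodings is not deprecated.
        $str = mb_convert_encoding($str, 'UTF-8', $fromEncoding);
    }

    return htmlentities($str, ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401, 'UTF-8');
}

echo toHtmlEntities('¢ <script>'), "\n";             // &cent; &lt;script&gt;
echo toHtmlEntities("\xA2 <b>", 'ISO-8859-1'), "\n"; // &cent; &lt;b&gt;
```

The second call passes the ISO-8859-1 byte for the cent sign, which is converted to UTF-8 before the entities are encoded.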

Uuencode/decode

Mbstring's UUENCODE encoding performs uuencoding, and using it results in a deprecation notice in PHP 8.2 and later.

mb_convert_encoding('test', 'UUENCODE');
Deprecated: mb_convert_encoding(): Handling Uuencode via mbstring is deprecated; use convert_uuencode/convert_uudecode instead in ... on line ...

The Uuencode/decode functionality of Mbstring's UUENCODE encoding can be replaced with PHP core's built-in convert_uuencode() and convert_uudecode() functions.

- $encoded = mb_convert_encoding('test', 'UUENCODE');
+ $encoded = convert_uuencode('test');
- $decoded = mb_convert_encoding($encoded, 'UTF-8', 'UUENCODE');
+ $decoded = convert_uudecode($encoded);
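A minimal round-trip sketch with the core functions, assuming nothing beyond PHP core itself:

```php
<?php
// Round trip with PHP core's uuencode functions; Mbstring is not required.
$original = 'test';
$encoded  = convert_uuencode($original);
$decoded  = convert_uudecode($encoded);

var_dump($decoded === $original); // bool(true)
```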

Now might be a good time to reevaluate the decision of using uuencode in the first place. Base64, for example, is a more robust alternative to encode binary data into a string.

Quoted-Printable encoding/decoding

Similar to other deprecated encodings, using Mbstring for Quoted-Printable encoding results in a deprecation notice in PHP 8.2.

$encoded = mb_convert_encoding('a=b', 'Quoted-Printable');
Deprecated: mb_convert_encoding(): Handling QPrint via mbstring is deprecated; use quoted_printable_encode/quoted_printable_decode instead in ... on line ...

PHP core already supports Quoted-Printable encoding/decoding, and replacing the Mbstring functionality with PHP core's quoted_printable_encode() and quoted_printable_decode() functions avoids the deprecation notice while retaining the output.

- $encoded = mb_convert_encoding('a=b', 'Quoted-Printable');
+ $encoded = quoted_printable_encode('a=b');
- $decoded = mb_convert_encoding('a=3Db', 'UTF-8', 'Quoted-Printable');
+ $decoded = quoted_printable_decode('a=3Db');
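A short round-trip sketch with the core functions. The second input is an assumed example, added to show how bytes outside printable ASCII are escaped.

```php
<?php
// Round trip with PHP core's Quoted-Printable functions.
$encoded = quoted_printable_encode('a=b');      // "a=3Db"
$decoded = quoted_printable_decode($encoded);   // "a=b"

// Bytes outside printable ASCII are escaped as =XX hex pairs.
// "\xC3\xA9" is the UTF-8 byte sequence for "é".
$accented = quoted_printable_encode("caf\xC3\xA9"); // "caf=C3=A9"

var_dump($encoded, $decoded, $accented);
```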

Backwards Compatibility Impact

In PHP 8.2, using the Mbstring extension to encode/decode strings to/from the HTML Entities, Base64, Uuencode, and Quoted-Printable encodings results in a deprecation notice.

The deprecation notice will not be emitted more than once per request, which can help reduce the noise in the likely scenario that the functions are called within loops.

Note that the deprecation notice is not limited to mb_convert_encoding. It can be emitted anywhere Mbstring encounters a now-deprecated character encoding. For example:

mb_strlen('test', 'BASE64');
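To locate such calls during testing, deprecation notices can be collected with a custom error handler. This is a rough sketch rather than a recommended setup: the $deprecations variable is a hypothetical name, and the handler only sees notices raised while it is installed.

```php
<?php
// Collect deprecation messages instead of printing them.
$deprecations = [];

set_error_handler(
    function (int $errno, string $errstr, string $errfile, int $errline) use (&$deprecations): bool {
        $deprecations[] = sprintf('%s in %s on line %d', $errstr, $errfile, $errline);
        return true; // Mark the notice as handled.
    },
    E_DEPRECATED // Only deprecation notices reach this handler.
);

// Code under review goes here. Per the article, this call is expected
// to raise a deprecation notice on PHP 8.2; older versions run silently.
if (function_exists('mb_strlen')) {
    mb_strlen('test', 'BASE64');
}

restore_error_handler();

print_r($deprecations);
```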

All deprecated encodings will be removed in PHP 9.0.


Implementation