PHP 8.4: New mb_ucfirst and mb_lcfirst functions

Version: 8.4
Type: New Feature

PHP provides the ucfirst and lcfirst functions to change the first character of a given string to uppercase or lowercase.

The mbstring extension provides multi-byte safe functions for the majority of PHP's standard string functions. However, prior to PHP 8.4, it did not provide multi-byte safe counterparts for ucfirst and lcfirst.

In PHP 8.4, the mbstring extension adds the mb_ucfirst and mb_lcfirst functions as multi-byte safe alternatives to ucfirst and lcfirst.

Titlecase vs UPPERcase

The Unicode standard defines a list of character mappings that work differently in uppercase and titlecase.

For example, the lowercase character ǌ (U+01CC - Latin Small Letter Nj) is changed to Ǌ (U+01CA - Latin Capital Letter Nj) in uppercase, and to ǋ (U+01CB - Latin Capital Letter N with Small Letter J) in titlecase. Another example is the German Eszett character (ß - U+00DF), whose uppercase is SS, while its titlecase is Ss.
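Both mappings can be observed with mb_convert_case, which is available before PHP 8.4 as well. The following is a minimal sketch, assuming PHP 7.3 or later (where mbstring applies full case mapping), the mbstring extension, and UTF-8 strings; the variable names are only illustrative.

```php
<?php
// Assumes PHP 7.3+ with the mbstring extension; all strings are UTF-8.

$nj = "\u{01CC}"; // Latin Small Letter Nj

// Uppercase and titlecase map the same character to two different code points.
$upper = mb_convert_case($nj, MB_CASE_UPPER, 'UTF-8'); // U+01CA
$title = mb_convert_case($nj, MB_CASE_TITLE, 'UTF-8'); // U+01CB

printf(
    "upper: U+%04X, title: U+%04X\n",
    mb_ord($upper, 'UTF-8'),
    mb_ord($title, 'UTF-8')
);

// The Eszett expands to two characters in both mappings.
$eszett = "\u{00DF}";
$eszettUpper = mb_convert_case($eszett, MB_CASE_UPPER, 'UTF-8'); // SS
$eszettTitle = mb_convert_case($eszett, MB_CASE_TITLE, 'UTF-8'); // Ss

echo $eszettUpper, ' ', $eszettTitle, "\n";
```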

See the Unicode FAQ and the Unicode Derived Core Properties for code points with the Changes_When_Titlecased (CWT) property.

New mb_ucfirst and mb_lcfirst functions

The new mb_ucfirst and mb_lcfirst functions change the case of the first character of a given string to uppercase or lowercase in a multi-byte safe manner.

Like the rest of the mb_* functions, mb_ucfirst and mb_lcfirst accept ?string $encoding = null as the last parameter. The first parameter of both functions is the string whose case needs to be changed.
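As an illustration of the $encoding parameter, the sketch below passes an ISO-8859-1 string and a UTF-8 string to mb_ucfirst. It assumes PHP 8.4 or later with the mbstring extension; the sample word and variable names are only illustrative.

```php
<?php
// Assumes PHP 8.4+ with the mbstring extension.

// "école" encoded as ISO-8859-1: the first character is the single byte 0xE9.
$latin1 = "\xE9cole";

// The encoding tells mbstring how to decode and re-encode the first character.
$latin1Result = mb_ucfirst($latin1, 'ISO-8859-1');

// With UTF-8 input, the encoding can be passed explicitly. When it is null,
// the value of mb_internal_encoding() is used instead.
$utf8Result = mb_ucfirst("\u{00E9}cole", 'UTF-8');

echo bin2hex($latin1Result), "\n"; // hex dump, since the bytes are not UTF-8
echo $utf8Result, "\n";
```

Passing the wrong encoding for the input bytes makes mbstring decode the first character incorrectly, so the encoding should match how the string is actually stored.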

Note that multi-byte case conversions can change the byte size (strlen() output) as well as the character length (mb_strlen() output) of the values. For example:

  • The lowercase character of the Kelvin sign (K - U+212A, taking 3 bytes) is k (U+006B, taking 1 byte).
  • The Eszett character (ß) maps to Ss in titlecase and SS in uppercase. The byte size remains 2 bytes, but the length (mb_strlen()) changes from 1 to 2.

This may affect functionality that validates the string length and size, such as a database index size limit.
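The size changes described above can be checked with strlen() and mb_strlen(). This is a minimal sketch, assuming PHP 7.3 or later with the mbstring extension and UTF-8 strings. It uses mb_convert_case so that it also runs on versions before PHP 8.4, and byte_and_char_size is a hypothetical helper introduced only for this example.

```php
<?php
// Assumes PHP 7.3+ with the mbstring extension; all strings are UTF-8.

// Hypothetical helper: reports the byte size and character length of a string.
function byte_and_char_size(string $value): array {
    return [
        'bytes' => strlen($value),
        'chars' => mb_strlen($value, 'UTF-8'),
    ];
}

$kelvin = "\u{212A}"; // Kelvin sign
$kelvinLower = mb_convert_case($kelvin, MB_CASE_LOWER, 'UTF-8');

$eszett = "\u{00DF}"; // Eszett
$eszettTitle = mb_convert_case($eszett, MB_CASE_TITLE, 'UTF-8');

print_r(byte_and_char_size($kelvin));      // 3 bytes, 1 character
print_r(byte_and_char_size($kelvinLower)); // 1 byte, 1 character
print_r(byte_and_char_size($eszett));      // 2 bytes, 1 character
print_r(byte_and_char_size($eszettTitle)); // 2 bytes, 2 characters
```

A length or size check against a limit is therefore safer when it runs on the converted value rather than on the original input.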


mb_ucfirst Function

The mb_ucfirst function converts the first character of the given string to titlecase. The rest of the string remains unchanged, even if it is in uppercase. Unlike the ucfirst function, mb_ucfirst supports multi-byte characters and applies Unicode case conversion rules to them.

/**
 * Make a string's first character uppercase multi-byte safely.
 */
function mb_ucfirst(string $string, ?string $encoding = null): string {}

Usage examples

mb_ucfirst('test'); // Test
mb_ucfirst('TEST'); // TEST - unchanged
mb_ucfirst('tEst'); // TEst
mb_ucfirst('TEst'); // TEst - unchanged
mb_ucfirst('łámał'); // Łámał
mb_ucfirst("\u{01CA}"); // "\u{01CB}"
mb_ucfirst("💓🙈"); // "💓🙈" - unchanged
mb_ucfirst("ß"); // "Ss" - Only the first S uppercase.

mb_lcfirst Function

Similar to the mb_ucfirst function, the mb_lcfirst function changes the first character of the given string to lowercase. Unlike the lcfirst function, mb_lcfirst can change multi-byte characters.

/**
 * Make a string's first character lowercase multi-byte safely.
 */
function mb_lcfirst(string $string, ?string $encoding = null): string {}

Usage examples

mb_lcfirst('test'); // test - unchanged
mb_lcfirst('TEST'); // tEST
mb_lcfirst('tEst'); // tEst - unchanged
mb_lcfirst('TEst'); // tEst
mb_lcfirst('Łámał'); // łámał
mb_lcfirst("\u{01CA}"); // "\u{01CC}"
mb_lcfirst("ß"); // "ß" - unchanged

PHP Polyfills

These functions can be trivially implemented in user-land PHP:

/**
 * Make a string's first character uppercase multi-byte safely.
 */
function mb_ucfirst(string $string, ?string $encoding = null): string {
    $firstChar = mb_substr($string, 0, 1, $encoding);
    $firstChar = mb_convert_case($firstChar, MB_CASE_TITLE, $encoding);

    return $firstChar . mb_substr($string, 1, null, $encoding);
}

/**
 * Make a string's first character lowercase multi-byte safely.
 */
function mb_lcfirst(string $string, ?string $encoding = null): string {
    $firstChar = mb_substr($string, 0, 1, $encoding);
    $firstChar = mb_convert_case($firstChar, MB_CASE_LOWER, $encoding);

    return $firstChar . mb_substr($string, 1, null, $encoding);
}

The above implementation can also be installed as a Composer package:

composer require polyfills/mb-ucfirst-lcfirst

Backward Compatibility Impact

The two new functions, mb_ucfirst and mb_lcfirst, are declared in the global namespace. Unless there is an existing function with the same name in the global namespace, this change has no backward compatibility impact.

Further, the new functions can be trivially implemented in user-land PHP.
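One way to keep a user-land declaration compatible with PHP 8.4 is to guard it with function_exists(), so that the native functions take precedence and no redeclaration error occurs. The sketch below assumes PHP 7.4 or later with the mbstring extension. It mirrors the polyfill shown earlier, with one addition that is an assumption of this example: a null encoding is replaced with mb_internal_encoding() so the code also behaves on versions where the mb_* functions do not accept null.

```php
<?php
// Assumes PHP 7.4+ with the mbstring extension. On PHP 8.4+ the native
// functions already exist, so both declarations below are skipped.

if (!function_exists('mb_ucfirst')) {
    function mb_ucfirst(string $string, ?string $encoding = null): string {
        // Assumption of this sketch: fall back to the internal encoding.
        $encoding ??= mb_internal_encoding();
        $firstChar = mb_substr($string, 0, 1, $encoding);

        return mb_convert_case($firstChar, MB_CASE_TITLE, $encoding)
            . mb_substr($string, 1, null, $encoding);
    }
}

if (!function_exists('mb_lcfirst')) {
    function mb_lcfirst(string $string, ?string $encoding = null): string {
        $encoding ??= mb_internal_encoding();
        $firstChar = mb_substr($string, 0, 1, $encoding);

        return mb_convert_case($firstChar, MB_CASE_LOWER, $encoding)
            . mb_substr($string, 1, null, $encoding);
    }
}

// The calls work the same way whether the native or user-land version is used.
echo mb_ucfirst('łámał', 'UTF-8'), "\n";
echo mb_lcfirst('Łámał', 'UTF-8'), "\n";
```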

