PHP 8.4: New mb_ucfirst and mb_lcfirst functions
PHP provides ucfirst and lcfirst functions to change the first character of a given string to uppercase or lowercase.
The mbstring extension provides multi-byte safe functions for the majority of PHP's standard string functions. However, prior to PHP 8.4, the mbstring extension did not provide multi-byte safe counterparts for the ucfirst and lcfirst functions.
In PHP 8.4, the mbstring extension adds mb_ucfirst and mb_lcfirst functions as multi-byte safe alternatives to the ucfirst and lcfirst functions.
Titlecase vs UPPERcase
The Unicode standard defines a list of character mappings that work differently in uppercase and titlecase.
For example, the lowercase character ǌ (U+01CC - Latin Small Letter Nj) is changed to Ǌ (U+01CA - Latin Capital Letter Nj) in uppercase, and to ǋ (U+01CB - Latin Capital Letter N with Small Letter J) in titlecase. Another example is the German Eszett character (ß - U+00DF), whose uppercase is SS, while its titlecase is Ss.
See Unicode FAQ and Unicode Derived Code Properties for code-points with the Changes_When_Titlecased (CWT) property.
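The difference between the two mappings can be observed with the existing mb_convert_case function and its MB_CASE_UPPER and MB_CASE_TITLE modes. The following is a minimal sketch, assuming PHP 7.3 or later (where mbstring applies full case mappings) and UTF-8 input; the variable names are only illustrative:

```php
<?php
// Latin Small Letter Nj (U+01CC): uppercase and titlecase map to different code points.
$nj      = "\u{01CC}";
$njUpper = mb_convert_case($nj, MB_CASE_UPPER, 'UTF-8');
$njTitle = mb_convert_case($nj, MB_CASE_TITLE, 'UTF-8');

var_dump($njUpper === "\u{01CA}"); // bool(true)
var_dump($njTitle === "\u{01CB}"); // bool(true)

// German Eszett (U+00DF): a single character expands to two characters.
$eszett      = "\u{00DF}";
$eszettUpper = mb_convert_case($eszett, MB_CASE_UPPER, 'UTF-8');
$eszettTitle = mb_convert_case($eszett, MB_CASE_TITLE, 'UTF-8');

var_dump($eszettUpper); // string(2) "SS"
var_dump($eszettTitle); // string(2) "Ss"
```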
New mb_ucfirst and mb_lcfirst functions
The new mb_ucfirst and mb_lcfirst functions provide multi-byte safe ways to change the case of the first character of a given string to uppercase or lowercase.
Similar to the rest of the mb_* functions, mb_ucfirst and mb_lcfirst also accept ?string $encoding = null as the last parameter. The first parameter of both functions is the string whose case needs to be changed.
Note that multi-byte case conversions can change the byte-size (strlen() output) as well as the length (mb_strlen() output) of the values. For example:
- The lowercase character of the Kelvin sign (K - U+212A, taking 3 bytes) is k (U+006B, taking 1 byte).
- The Eszett character (ß) maps to Ss (titlecase) and SS (uppercase). The byte-size remains 2 bytes, but in this case, the length (mb_strlen()) changes from 1 to 2.
This may affect functionality that validates the string length and size, such as a database index size limit.
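Both effects can be checked with functions that already exist in mbstring. This is a minimal sketch, assuming PHP 7.3 or later and UTF-8 strings; it uses mb_strtolower and mb_strtoupper rather than the new functions so that it also runs on versions before PHP 8.4:

```php
<?php
// Kelvin sign (U+212A) lowercases to the ASCII letter "k": the byte-size shrinks.
$kelvin      = "\u{212A}";
$kelvinLower = mb_strtolower($kelvin, 'UTF-8');
printf("%d -> %d bytes\n", strlen($kelvin), strlen($kelvinLower)); // 3 -> 1 bytes

// Eszett (U+00DF) uppercases to "SS": same byte-size, but one more character.
$eszett      = "\u{00DF}";
$eszettUpper = mb_strtoupper($eszett, 'UTF-8');
printf("%d -> %d bytes\n", strlen($eszett), strlen($eszettUpper)); // 2 -> 2 bytes
printf(
    "%d -> %d characters\n",
    mb_strlen($eszett, 'UTF-8'),
    mb_strlen($eszettUpper, 'UTF-8')
); // 1 -> 2 characters
```

A length check performed before the case conversion may therefore no longer hold afterwards, so it is safer to validate the converted value.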
mb_ucfirst Function
The mb_ucfirst function converts the first character of the given string to titlecase. The rest of the string remains unchanged, even if it is in uppercase. The difference from the ucfirst function is that mb_ucfirst supports multi-byte characters, and thus follows Unicode case conversion rules.
/**
* Make a string's first character uppercase multi-byte safely.
**/
function mb_ucfirst(string $string, ?string $encoding = null): string {}
Usage examples
mb_ucfirst('test'); // Test
mb_ucfirst('TEST'); // TEST - unchanged
mb_ucfirst('tEst'); // TEst
mb_ucfirst('TEst'); // TEst - unchanged
mb_ucfirst('łámał'); // Łámał
mb_ucfirst("\u{01CA}"); // "\u{01CB}"
mb_ucfirst("💓🙈"); // "💓🙈" - unchanged
mb_ucfirst("ß"); // "Ss" - Only the first S uppercase.
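To make the contrast with ucfirst concrete, the sketch below runs both functions on the same multi-byte input. It assumes PHP 8.4 or later, where mb_ucfirst exists natively, and UTF-8 strings; the variable names are only illustrative:

```php
<?php
// Requires PHP 8.4+, where mb_ucfirst is available natively.
$word = 'łámał';

// ucfirst only converts ASCII letters (since PHP 8.2), so the two-byte "ł" is left as-is.
$byteBased = ucfirst($word);

// mb_ucfirst decodes the first character and applies its titlecase mapping.
$multiByte = mb_ucfirst($word, 'UTF-8');

var_dump($byteBased === 'łámał'); // bool(true)
var_dump($multiByte === 'Łámał'); // bool(true)
```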
mb_lcfirst Function
Similar to the mb_ucfirst function, the mb_lcfirst function changes the first character of the given string to lowercase. Unlike the lcfirst function, mb_lcfirst can change multi-byte characters.
/**
* Make a string's first character lowercase multi-byte safely.
**/
function mb_lcfirst(string $string, ?string $encoding = null): string {}
Usage examples
mb_lcfirst('test'); // test - unchanged
mb_lcfirst('TEST'); // tEST
mb_lcfirst('tEst'); // tEst - unchanged
mb_lcfirst('TEst'); // tEst
mb_lcfirst('Łámał'); // łámał
mb_lcfirst("\u{01CA}"); // "\u{01CC}"
mb_lcfirst("ß"); // "ß" - unchanged
PHP Polyfills
These functions can be trivially implemented in user-land PHP:
/**
 * Make a string's first character uppercase multi-byte safely.
 */
function mb_ucfirst(string $string, ?string $encoding = null): string {
    $firstChar = mb_substr($string, 0, 1, $encoding);
    $firstChar = mb_convert_case($firstChar, MB_CASE_TITLE, $encoding);
    return $firstChar . mb_substr($string, 1, null, $encoding);
}

/**
 * Make a string's first character lowercase multi-byte safely.
 */
function mb_lcfirst(string $string, ?string $encoding = null): string {
    $firstChar = mb_substr($string, 0, 1, $encoding);
    $firstChar = mb_convert_case($firstChar, MB_CASE_LOWER, $encoding);
    return $firstChar . mb_substr($string, 1, null, $encoding);
}
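When the same code base has to run on both older versions and PHP 8.4, declaring these functions unconditionally would fail with a "Cannot redeclare" error wherever the native functions exist. One possible way to avoid that is to guard each declaration with function_exists. This is a minimal sketch of that pattern, assuming PHP 8.0 or later (where the mbstring encoding parameters accept null), and not necessarily how any published polyfill package is structured:

```php
<?php
// Declare the polyfills only when the native PHP 8.4 functions are missing.
if (!function_exists('mb_ucfirst')) {
    function mb_ucfirst(string $string, ?string $encoding = null): string {
        $firstChar = mb_substr($string, 0, 1, $encoding);
        $firstChar = mb_convert_case($firstChar, MB_CASE_TITLE, $encoding);
        return $firstChar . mb_substr($string, 1, null, $encoding);
    }
}

if (!function_exists('mb_lcfirst')) {
    function mb_lcfirst(string $string, ?string $encoding = null): string {
        $firstChar = mb_substr($string, 0, 1, $encoding);
        $firstChar = mb_convert_case($firstChar, MB_CASE_LOWER, $encoding);
        return $firstChar . mb_substr($string, 1, null, $encoding);
    }
}

// Same calls work whether the native or the user-land version is in use.
$upperFirst = mb_ucfirst('łámał', 'UTF-8');
$lowerFirst = mb_lcfirst('Łámał', 'UTF-8');

var_dump($upperFirst === 'Łámał'); // bool(true)
var_dump($lowerFirst === 'łámał'); // bool(true)
```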
The above implementation can also be installed as a Composer package:
composer require polyfills/mb-ucfirst-lcfirst
Backward Compatibility Impact
The two new functions, mb_ucfirst and mb_lcfirst, are declared in the global namespace. Unless there is an existing function with the same name in the global namespace, this change has no backward compatibility impact.
Further, the new functions can be trivially implemented in user-land PHP.