Character escape sequences and numeric notations in PHP

Published on 2021-02-28


Many modern programming languages offer several ways to use characters such as plain English Latin letters, numbers, symbols, emojis, and special characters such as a new line or a tab.

Most characters can simply be typed on a keyboard and used in a PHP snippet as-is. For example, $string = "" is a perfectly valid string in PHP, and $num = 42 is a perfectly valid number. Multi-byte characters (characters that take more than one byte to store) are valid as well: $emoji = "😻".

PHP, like many other programming languages, supports several character escape sequences for characters that cannot be typed on a standard keyboard, cannot be represented in text form (such as invisible or control characters), or are otherwise hard to read. PHP recognizes these escape sequences and replaces them with the characters they stand for.

For numbers, PHP supports standard decimal notation, but it also accepts binary, octal, hexadecimal, and scientific notation. Depending on the context, these notations can make code clearer and more readable.

Double Quotes and Heredoc

In PHP, double-quoted strings ("string") and Heredoc strings (see below) support both character escape sequences and variable interpolation.

Variable interpolation means that when a variable appears inside a double-quoted string or a Heredoc, PHP replaces it with the variable's value.

$name = 'John';
echo "Hi $name"; // "Hi John"

$name = 'John';
echo <<<HEREDOC
Hi $name
HEREDOC;
// "Hi John"

Alternatively, and preferably, interpolated variables can be wrapped in curly braces, which makes them easier to spot:

$name = 'John';
echo "Hi {$name}"; // "Hi John"
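Curly braces do more than improve readability: they mark exactly where the variable expression ends. The sketch below (the variable names are illustrative only) shows a case where the braces are required, and a case where they allow an array element to be interpolated.

```php
<?php
// The braces end the variable name before the trailing "s".
$fruit = 'apple';
$plural = "Two {$fruit}s"; // "Two apples"
// Without braces, "Two $fruits" would look up a (non-existent) $fruits variable.

// Braces also allow array elements with quoted keys inside a string.
$user = ['name' => 'John'];
$greeting = "Hi {$user['name']}"; // "Hi John"

echo $plural, PHP_EOL, $greeting, PHP_EOL;
```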

Single-quoted strings ('string') and Nowdoc syntax do not interpolate variables:

$name = 'John';
echo 'Hi $name'; // "Hi $name"

$name = 'John';
echo <<<'NOWDOC'
Hi $name
NOWDOC;
// "Hi $name"

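Only double-quoted strings and Heredoc support the character escape sequences covered below. Single-quoted strings recognize only \' and \\, and Nowdoc does not process any escape sequences.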
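A quick way to see the difference between the quoting styles is to compare string lengths. In this minimal sketch, \n is one character in a double-quoted string, but stays as two characters (a backslash and the letter n) in a single-quoted string and in a Nowdoc.

```php
<?php
// In double quotes, \n becomes a single line-feed character.
$double = "a\nb";
// In single quotes, \n stays as a backslash followed by "n".
$single = 'a\nb';
// Nowdoc processes no escape sequences at all.
$nowdoc = <<<'NOWDOC'
a\nb
NOWDOC;

var_dump(strlen($double));     // int(3)
var_dump(strlen($single));     // int(4)
var_dump($single === $nowdoc); // bool(true)
```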

Character Escaping

Because PHP interprets and interpolates special characters inside double-quoted string literals and heredoc string literals, the backslash sign (\) is used as an "escape character".

For example, using \$name instead of $name prevents PHP from interpolating the $name variable.

$name = 'John';
echo "Hi \$name"; // "Hi $name"

Using two backslashes (\\) produces a literal backslash, so the backslash no longer escapes the $ that follows it, and the variable is interpolated as usual:

$name = 'John';
echo "Hi \\$name"; // "Hi \John"

PHP supports several escape sequences for special characters. In the examples above, \$ is an escape sequence: it suppresses interpolation and makes PHP output a literal $ character.

Tab Characters: \t and \v

Perhaps the simplest one is the tab character. A literal tab character can be used inside a string, but \t makes it obvious that a tab, rather than spaces, is intended. Using \t instead of a literal tab also prevents IDEs from automatically converting the tab to spaces.

echo "Foo\tBar";
Foo Bar

\v is a vertical tab. On terminals that support it, a vertical tab moves the cursor down one line without returning to the start of the line, so each word below starts in the column where the previous one ended:

echo "Foo\vBar\vBaz";

New Lines: \r and \n

\r ("Carriage Return") and \n ("Line Feed") are new-line characters.

Historically, different systems settled on different new-line sequences: \r, \n, or, in the case of Windows, \r\n. Linux and macOS use the line-feed character (\n) by default, while Windows uses \r\n (a carriage return followed by a line feed). Classic Mac OS (before Mac OS X) used \r.

PHP has a PHP_EOL constant that holds the new-line sequence of the system PHP is running on.

echo "Left\nLeft\nRight\nRight";
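Because text can arrive with any of these line endings, a common approach is to normalize them before splitting into lines. The following is a minimal sketch of one way to do it with str_replace(); the sample text is made up for illustration.

```php
<?php
// Text mixing Windows (\r\n), classic Mac OS (\r), and Unix (\n) line endings.
$text = "one\r\ntwo\rthree\nfour";

// Replace \r\n first, so its \r is not turned into an extra line break.
$normalized = str_replace(["\r\n", "\r"], "\n", $text);

$lines = explode("\n", $normalized); // ['one', 'two', 'three', 'four']

// PHP_EOL is "\n" on Linux/macOS and "\r\n" on Windows.
echo implode(PHP_EOL, $lines), PHP_EOL;
```

The order of the search array matters: str_replace() applies the replacements one after another, so handling \r\n before \r keeps each Windows line ending as a single break.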

Escape character: \e

The escape character is often used to send ANSI escape sequences to a terminal. For example, \e followed by [32m tells the terminal to switch the text color to green, [33m switches it to yellow, and [0m resets the formatting.

echo "\e[32mGreen text\e[0m \e[33mYellow text\e[0m";

If the snippet above runs in a terminal that supports ANSI escape sequences, the terminal interprets them and colors the text:

(Screenshot: PHP escape character usage in a terminal)
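When colors are used in several places, wrapping the sequence in a small function avoids forgetting the reset code. The ansi_color() function below is a hypothetical helper, not part of PHP; it only combines the \e[...m codes described above (32 for green, 33 for yellow, 0 for reset).

```php
<?php
/**
 * Hypothetical helper: wraps text in an ANSI color code and resets afterwards.
 * 32 = green, 33 = yellow; \e[0m resets the formatting.
 */
function ansi_color(string $text, int $code): string
{
    return "\e[{$code}m{$text}\e[0m";
}

$line = ansi_color('Green text', 32) . ' ' . ansi_color('Yellow text', 33);
echo $line, PHP_EOL;
```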

Form-Feed character: \f

The form-feed character is an ASCII control character for page breaks. A printer may eject the current page and continue at the top of the next one. When \f is sent to a display terminal, it may clear the screen, although most terminal emulators do not do this.

Octal ASCII Character Escaping Sequences

PHP supports octal escape sequences, which produce the ASCII character (byte) with the given octal value.

For example, the ASCII code of P is 80 in decimal, which is 120 in octal (1 × 64 + 2 × 8 + 0 = 80).

An Octal character escape sequence can be used for the P character:

echo "\120";

It is in fact possible to represent any basic ASCII character with this notation:

echo "\120\110\120\56\127\141\164\143\150";

Any sequence of one to three octal digits from \0 to \377 (decimal 255) is interpreted as an octal character escape sequence.
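To find the octal escape for a given character, ord() returns its byte value and decoct() converts that value to an octal string. This sketch checks that the escape written in source code produces the original character.

```php
<?php
// ord() gives the byte value (80), decoct() its octal form ("120").
$char  = 'P';
$octal = decoct(ord($char));

// The same escape written in source code produces the same character.
$fromEscape = "\120";

echo "\\{$octal}", PHP_EOL;      // prints \120
var_dump($fromEscape === $char); // bool(true)
var_dump(ord("\377"));           // int(255), the largest single byte
```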

Note that extended ASCII values (128 through 255) are not valid UTF-8 on their own. For example, \200 (decimal 128, hex 80) produces the single byte 0x80, which is not a valid UTF-8 sequence. PHP accepts such escape sequences and outputs the raw byte, but the result is an invalid character in a UTF-8 context.
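One way to observe this is with a PCRE pattern using the u (UTF-8) modifier, which refuses subjects that are not valid UTF-8. The sketch below uses preg_match('//u', ...) as a validity check; this is a common idiom rather than a dedicated validation API (mb_check_encoding() is an alternative if the mbstring extension is installed).

```php
<?php
// "\200" is a single byte with value 128 (0x80).
$byte = "\200";
var_dump(strlen($byte), ord($byte)); // int(1) int(128)

// With the /u modifier, preg_match() returns false for invalid UTF-8 input.
var_dump(preg_match('//u', $byte));  // bool(false)
var_dump(preg_match('//u', "\120")); // int(1): "P" is valid UTF-8
```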

Hexadecimal ASCII Character Escaping Sequences

Similar to the Octal character escaping sequences, PHP also allows Hexadecimal numbers in a character escaping sequence with the \x prefix.

Only a single byte is allowed, so the valid range is \x0 to \xFF. The same UTF-8 restriction applies, though: only values up to \x7F produce valid UTF-8 characters on their own.

Hexadecimal digits are case-insensitive (\xAF is equal to \xaf and \xaF).

The ASCII code of P is 80 in decimal, which is 50 in hexadecimal:

echo "\x50";

The same "PHP.Watch" example can be made with Hex character escaping sequences:

echo "\x50\x48\x50\x2E\x57\x61\x74\x63\x68";
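The hexadecimal and octal notations describe the same bytes, so the two "PHP.Watch" examples produce identical strings. This sketch also uses dechex() to find the hex escape for a character.

```php
<?php
// dechex() gives the hexadecimal form of a byte value: ord('P') is 80, i.e. "50".
$hex = dechex(ord('P'));

// Hex and octal escapes for the same bytes produce identical strings.
$viaHex   = "\x50\x48\x50\x2E\x57\x61\x74\x63\x68";
$viaOctal = "\120\110\120\56\127\141\164\143\150";

var_dump($viaHex === $viaOctal);   // bool(true)
var_dump($viaHex === 'PHP.Watch'); // bool(true)
var_dump("\xAF" === "\xaf");       // bool(true): hex digits are case-insensitive
```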

Unicode Character Escaping Sequences

Since PHP 7.0, any Unicode character can be written with the \u prefix followed by the hexadecimal code point inside curly braces. PHP replaces the sequence with the UTF-8 encoding of that code point.

echo "\u{1F418} - \u{50}\u{48}\u{50}\u{2E}\u{57}\u{61}\u{74}\u{63}\u{68}";
🐘 - PHP.Watch

PHP throws a parse error if the code point is greater than 10FFFF:

echo "\u{10FFFF1}";
Invalid UTF-8 codepoint escape sequence: Codepoint too large on line ...

The 10FFFF upper limit exists because the Unicode code space, and UTF-8 as it is specified today, spans U+0000 to U+10FFFF.
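Because \u{} emits UTF-8, the number of bytes in the resulting string depends on the code point. This sketch uses strlen() (which counts bytes, not characters) and bin2hex() to show the raw UTF-8 bytes.

```php
<?php
// Each \u{...} escape is replaced by the UTF-8 bytes of that code point.
$p        = "\u{50}";    // U+0050 "P": 1 byte
$euro     = "\u{20AC}";  // U+20AC euro sign: 3 bytes
$elephant = "\u{1F418}"; // U+1F418 elephant emoji: 4 bytes

// strlen() counts bytes, not characters.
var_dump(strlen($p), strlen($euro), strlen($elephant)); // int(1) int(3) int(4)

// bin2hex() shows the raw UTF-8 bytes.
echo bin2hex($euro), PHP_EOL;     // e282ac
echo bin2hex($elephant), PHP_EOL; // f09f9098
```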

A prior version of this article incorrectly mentioned the upper limit as FFFFF, instead of the now corrected 10FFFF. Thanks to Sara Golemon for pointing it out.

The \u{} Unicode notation can be used as an escape sequence for any character. Here are some examples:

| Character | Code point (Dec) | Code point (Hex) | Unicode escape sequence |
|---|---|---|---|
| A | 65 | 41 | "\u{41}" |
| B | 66 | 42 | "\u{42}" |
| $ | 36 | 24 | "\u{24}" |
| € (euro sign) | 8364 | 20AC | "\u{20AC}" |
| \n (line feed) | 10 | A | "\u{A}" |
| \r (carriage return) | 13 | D | "\u{D}" |
| \t (horizontal tab) | 9 | 9 | "\u{9}" |
| \v (vertical tab) | 11 | B | "\u{B}" |
| \e (escape) | 27 | 1B | "\u{1B}" |
| \f (form-feed) | 12 | C | "\u{C}" |
| 🐘 | 128024 | 1F418 | "\u{1F418}" |
| අ | 3461 | D85 | "\u{D85}" |
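Several rows of the table can be checked directly: each \u{} escape is identical to the literal character or to the named escape sequence covered earlier. A minimal sketch, assuming the file is saved as UTF-8 so that the literal € and 🐘 match their escapes:

```php
<?php
// Pairs of [Unicode escape, equivalent literal or named escape].
$pairs = [
    ["\u{41}", 'A'],
    ["\u{24}", '$'],
    ["\u{20AC}", '€'],
    ["\u{A}", "\n"],
    ["\u{D}", "\r"],
    ["\u{9}", "\t"],
    ["\u{B}", "\v"],
    ["\u{1B}", "\e"],
    ["\u{C}", "\f"],
    ["\u{1F418}", '🐘'],
];

foreach ($pairs as [$escaped, $expected]) {
    var_dump($escaped === $expected); // bool(true) for every pair
}
```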

Binary Strings

Several years ago, PHP 5.2.1 introduced a new string syntax called "binary strings". It was merely a syntax, and it was meant as a forward-compatibility improvement for the upcoming PHP 6.

The syntax is a b prefix before a single- or double-quoted string, which marks it as a binary string:

echo b'Foo';

The is_binary, is_unicode, and is_buffer functions were intended to tell binary and Unicode strings apart in PHP 6, but PHP 6 was abandoned and these functions never made it into PHP 7. The binary string syntax, however, survived into PHP 7 and continues to work in PHP 8.

Binary string syntax has no special functionality, and serves no purpose other than being a historical remnant, a hair-pulling interview question, a tidbit for PHP articles, and confusing fellow PHP developers.

This syntax is not deprecated, nor planned to be removed in a future PHP version.
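The lack of special behavior is easy to confirm: a b-prefixed string is an ordinary PHP string, and in double quotes escape sequences and interpolation still apply. A minimal sketch:

```php
<?php
// The b prefix is accepted by the parser but produces an ordinary string.
$binary = b'Foo';
$plain  = 'Foo';

var_dump($binary === $plain); // bool(true)
var_dump(gettype($binary));   // string(6) "string"

// With double quotes, escape sequences and interpolation still work.
$name = 'John';
var_dump(b"Hi {$name}\n" === "Hi John\n"); // bool(true)
```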

Numeric Notations

When using numeric literals in a PHP script, PHP expects decimal values by default. However, PHP also allows other numeric notations such as binary numbers, octal numbers, hexadecimal numbers, and scientific notation.

Since PHP 7.4, underscores can also be used as numeric separators to make long numbers easier to read.
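Numeric separators are purely visual: the parser drops the underscores, so the resulting values are identical. A short sketch (requires PHP 7.4 or later) with decimal, float, and hexadecimal literals:

```php
<?php
// Underscores between digits are ignored by the parser.
$million = 1_000_000;
$price   = 1_234.56;      // also valid in float literals
$mask    = 0xFF_FF_00_00; // and in hex, binary, and octal literals

var_dump($million === 1000000); // bool(true)
var_dump($price === 1234.56);   // bool(true)
var_dump($mask === 0xFFFF0000); // bool(true)

// Separators must sit between two digits: 1__000 or 1000_ is a parse error.
```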

Binary Numeric Notation

Any numeric literal that starts with prefix 0b will be considered a binary number.

$number_binary  = 0b101010;

With underscore numeric separators, it is possible to use underscores for better readability.

$number_binary = 0b10_1010;
0b101010  === 42; // true
0b10_1010 === 42; // true
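The 0b prefix only applies to literals in source code. For strings at runtime, bindec() and decbin() convert between binary strings and integers, and sprintf() with %b can zero-pad the binary form. A minimal sketch:

```php
<?php
$number_binary = 0b10_1010;

// Runtime counterparts for strings: bindec() parses, decbin() formats.
var_dump(bindec('101010'));       // int(42)
var_dump(decbin($number_binary)); // string(6) "101010"

// sprintf() can zero-pad the binary form to a fixed width.
var_dump(sprintf('%08b', 42));    // string(8) "00101010"
```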

Octal Numeric Notation

PHP accepts octal numeric values with the 0 prefix. From PHP 8.1, PHP also supports explicit Octal numeric notation with 0O and 0o prefixes.

$number_octal = 052; // === Decimal 42
$number_octal = 0o52; // === Decimal 42
$number_octal = 0O52; // === Decimal 42
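The implicit 0 prefix is a common source of surprises, because a leading zero silently switches the literal to octal. The explicit 0o prefix makes the intent visible. This sketch requires PHP 8.1 or later for the 0o literal, and also shows octdec() and decoct() for runtime strings.

```php
<?php
// A leading zero switches to octal, which can surprise readers.
var_dump(010);          // int(8), not 10
var_dump(0o10 === 010); // bool(true); the 0o prefix needs PHP 8.1+

// Runtime counterparts for strings: octdec() parses, decoct() formats.
var_dump(octdec('52')); // int(42)
var_dump(decoct(42));   // string(2) "52"
```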

Hexadecimal Numeric Notation

Hexadecimal numbers use the 0x or 0X prefix.

$number_hex = 0x2A; // === Decimal 42
$number_hex = 0X2A; // === Decimal 42
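Hexadecimal literals are often used for byte values and bit masks. The sketch below shows the runtime helpers hexdec() and dechex(), plus a hypothetical example that extracts the green channel from an RGB color stored as 0xRRGGBB.

```php
<?php
var_dump(0x2A === 0x2a); // bool(true): hex digits are case-insensitive
var_dump(hexdec('2A'));  // int(42)
var_dump(dechex(42));    // string(2) "2a"

// Hypothetical example: the green channel of a 0xRRGGBB color.
$rgb   = 0x33CC99;
$green = ($rgb >> 8) & 0xFF; // 0xCC, i.e. 204
var_dump($green);            // int(204)
```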

Scientific Numeric Notation

PHP also supports scientific "E notation" for float values.

$number_float = 42E1; // float(420)

42E1 is equivalent to 42 * 10^1 (42 times 10 to the power of 1), which is 420.0. Note that in many programming languages (including PHP), the ^ operator is used for XOR, while ** is used for exponentiation.

Scientific notation is most helpful for very small or very large numbers:

$planck_constant    = 6.62607004E-34;
$avogadros_constant = 6.022140857E+23;
3.844E5 === 384400.0;      // true
3.844E5 === 3.844 * 10**5; // true
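One detail worth keeping in mind: a literal in E notation is always a float, even when its value is a whole number. That matters for strict comparisons, as this sketch shows.

```php
<?php
// E notation always produces a float, even for whole numbers.
var_dump(42E1);         // float(420)
var_dump(1E3 === 1000); // bool(false): float compared with int
var_dump(1E3 == 1000);  // bool(true): loose comparison compares the values
var_dump(1.5e-3);       // float(0.0015); a lowercase e works too
```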

PHP supports various forms of character escaping sequences and numeric notations. It recently added explicit Octal numeric notation with 0O/0o prefixes (PHP 8.1), and underscore numeric separators (PHP 7.4).

Note that these escape sequences and numeric notations are only interpreted in PHP source code, not in user input. For example, casting an underscore-separated numeric string to an integer does not apply PHP's numeric separator feature; the conversion stops at the first underscore:

var_dump((int) "2_34_5");
// int(2)

var_dump((int) "0xabcd");
// int(0)

Similarly, character escape sequences in user input are not evaluated. For example, if a user submits \43 in a form field, the value is used as-is (a backslash followed by 4 and 3) and is not interpreted as an octal escape sequence. In contrast, $str = "\43" in a PHP source file yields "#", because the escape sequence is evaluated when the code is parsed.
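When such input really needs to be interpreted, the conversion has to be done explicitly. This is a minimal sketch using the same sample values as above, treated as hypothetical user input; decoding escapes from untrusted input should only be done when that is the intended behavior.

```php
<?php
// Hypothetical user-provided values (e.g. from a form); PHP treats them as plain text.
$separated = '2_34_5';
$hex       = '0xabcd';
$escaped   = '\43'; // three characters: a backslash, "4", and "3"

// Strip the separators before casting.
$fromSeparated = (int) str_replace('_', '', $separated);
// intval() with base 16 parses hex digits; a leading 0x is accepted.
$fromHex = intval($hex, 16);
// stripcslashes() decodes C-style escapes, including octal ones.
$fromEscape = stripcslashes($escaped);

var_dump($fromSeparated, $fromHex, $fromEscape); // int(2345) int(43981) string(1) "#"
```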
