PHP 8.0: New PhpToken Tokenizer class

Version: 8.0
Type: New Feature

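The token_get_all() function from the tokenizer extension parses the given PHP source code and returns an array of tokens. Each element is either an array (token ID, text, line number) or a plain one-character string, which makes the return value awkward to work with.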
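To illustrate why the mixed return shape is inconvenient, here is a minimal sketch that walks the result of token_get_all(). The variable names ($legacyTokens, $described) are chosen for this example only.

```php
<?php
// token_get_all() mixes two shapes: arrays for named tokens,
// and bare strings for single-character tokens.
$legacyTokens = token_get_all('<?php echo "Hello world"; ?>');

$described = [];
foreach ($legacyTokens as $token) {
    if (is_array($token)) {
        // [0] => token ID, [1] => token text, [2] => line number
        $described[] = token_name($token[0]) . ': ' . $token[1];
    } else {
        // Single-character tokens such as ";" arrive as plain strings.
        $described[] = 'char: ' . $token;
    }
}

print_r($described);
```

Every consumer has to repeat the is_array() check, which is the boilerplate the object-oriented API removes.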

PHP 8.0 comes with a new class PhpToken that provides an object-oriented approach to tokenizer results.

$code = '<?php echo "Hello world"; ?>';
$tokens = PhpToken::tokenize($code);

Prior to PHP 8.0 rc4, the PhpToken::tokenize() method was called PhpToken::getAll().

The return value is an array of PhpToken objects, which provide methods and public properties for retrieving information about each token.
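As a short sketch of how the returned objects can be consumed, the following loop prints each token and then rebuilds the source from the token texts. It assumes PHP 8.0 or later; the output format is arbitrary.

```php
<?php
$code = '<?php echo "Hello world"; ?>';
$tokens = PhpToken::tokenize($code);

foreach ($tokens as $token) {
    // Every element is a PhpToken, so no is_array() check is needed.
    printf(
        "%-28s line %d, pos %2d: %s\n",
        $token->getTokenName(),
        $token->line,
        $token->pos,
        json_encode($token->text)
    );
}

// PhpToken implements Stringable, so implode() concatenates the token texts.
$rebuilt = implode('', $tokens);
var_dump($rebuilt === $code);
```

Because each token carries its exact text, joining the tokens reproduces the original source string.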

PhpToken class synopsis

class PhpToken implements Stringable {

    /**
     * One of the T_* constants, or an integer < 256 representing a single-char token.
     */
    public int $id;

    /** The textual content of the token. */
    public string $text;

    /** The starting line number (1-based) of the token. */
    public int $line;

    /** The starting position (0-based) in the tokenized string. */
    public int $pos;

    /** @return static[] */
    public static function tokenize(string $code, int $flags = 0): array {}

    final public function __construct(int $id, string $text, int $line = -1, int $pos = -1) {}

    /**  
     * Whether the token has the given ID, the given text,
     * or has an ID/text part of the given array.
     */
    public function is(int|string|array $kind): bool {}

    /** Whether this token would be ignored by the PHP parser. */
    public function isIgnorable(): bool {}

    /** Get the name of the token. */
    public function getTokenName(): ?string {}

    /** Returns $text property */
    public function __toString(): string {}
}
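Since tokenize() is documented as returning static[], calling it on a subclass should yield instances of that subclass. The sketch below relies on this; KeywordAwareToken and isOutputKeyword() are hypothetical names invented for the illustration, not part of PHP.

```php
<?php
// Hypothetical subclass, for illustration only.
// The constructor is final in PhpToken, so only methods are added here.
class KeywordAwareToken extends PhpToken
{
    /** Whether this token is one of the output keywords echo or print. */
    public function isOutputKeyword(): bool
    {
        return $this->is([T_ECHO, T_PRINT]);
    }
}

// tokenize() is called on the subclass, so the array holds KeywordAwareToken objects.
$tokens = KeywordAwareToken::tokenize('<?php echo 1;');

foreach ($tokens as $token) {
    if ($token->isOutputKeyword()) {
        echo "Found output keyword: {$token->text}\n";
    }
}
```

This keeps project-specific helpers on the token objects themselves instead of in separate utility functions.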

Related Changes from the snippet

Each returned PhpToken object will contain public properties $token->id, $token->text, $token->line, and $token->pos to retrieve information about the token.

  • $token->id: One of the T_* constants, or an integer < 256 representing a single-char token.
  • $token->text: The textual content of the token.
  • $token->line: The starting line number (1-based) of the token.
  • $token->pos: The starting position (0-based) in the tokenized string.
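The properties listed above can be read directly. As a minimal sketch, the example below locates a variable token and uses $pos to slice the same text back out of the source; the sample code string and the variable names are assumptions made for the example.

```php
<?php
$code = "<?php\n\$answer = 42;\n";
$tokens = PhpToken::tokenize($code);

$variable = null;
$assignment = null;
foreach ($tokens as $token) {
    if ($token->id === T_VARIABLE) {
        $variable = $token;
    }
    // Single-char tokens use the character's ordinal value as their ID.
    if ($token->id === ord('=')) {
        $assignment = $token;
    }
}

// $pos is the 0-based offset of the token within $code.
$slice = substr($code, $variable->pos, strlen($variable->text));

echo "Variable {$variable->text} starts on line {$variable->line} at offset {$variable->pos}\n";
```

The open tag here spans the first line including its newline, so the variable token is reported on the second line.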

The methods $token->is(), $token->getTokenName(), $token->isIgnorable(), and $token->__toString() can return additional information as well.

  • $token->is(int|string|array $kind): bool: Returns whether the token matches a given token ID (integer), a given token text (string), or any entry in an array of IDs and texts.
  • $token->getTokenName(): Returns the name of the token: the T_* constant name for known tokens, the character itself for single-char tokens, or null for unknown IDs.
  • $token->isIgnorable(): Whether this token would be ignored by the PHP parser.
  • $token->__toString(): Returns the $token->text value.
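The methods above can be combined as in the following sketch, which drops ignorable tokens and then inspects the rest. The sample source string and the $significant variable are assumptions made for the example.

```php
<?php
$tokens = PhpToken::tokenize('<?php /* note */ echo "Hi";');

// Keep only tokens the parser would not ignore
// (this skips the open tag, comments, and whitespace).
$significant = [];
foreach ($tokens as $token) {
    if (!$token->isIgnorable()) {
        $significant[] = $token;
    }
}

foreach ($significant as $token) {
    // is() accepts a token ID, a token text, or an array mixing both.
    $kind = $token->is([T_ECHO, T_PRINT]) ? 'output keyword' : 'other';

    // The (string) cast calls __toString(), which returns the text property.
    printf("%s (%s): %s\n", $token->getTokenName(), $kind, (string) $token);
}
```

Passing an array to is() avoids chaining several comparisons when a token may be one of a few alternatives.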

Backwards Compatibility Impact

PhpToken is a new class added in PHP 8.0. Unless user-land PHP code already declares a class named PhpToken, there should not be any upgrade issues.

The functionality can be backported to older PHP versions as well. See phpwatch/phptoken-polyfill.

