How to dump and inspect PHP OPCodes
PHP is an interpreted programming language (with cached intermediary forms and JIT), and at its core is the Zend Engine. It executes a PHP program, and uses "Opcodes" as its execution unit.
An Opcode is a complete PHP VM instruction that may or may not have operands. Each Opcode represents a single, simple unit of execution, and there is a finite set of such Opcodes.
Consider a Hello-World snippet similar to the one below:
echo "Hello World";
This PHP code, when turned into Opcodes that the Zend Engine can consume, uses the ECHO Opcode with the string ("Hello World") as its operand.
Using extensions that hook into the Zend Engine, or using debuggers such as phpdbg, it is possible to dump a list of Opcodes for a given PHP snippet/file.
The list of dumped Opcodes can then be inspected to compare performance differences and other internal information. Although this sounds like a rather mundane form of debugging, it can help work out the Zend engine internals for a given PHP snippet/file easily, without having to compile PHP from source, or having to use a full debugger.
Listing the Opcodes for a snippet is very helpful to quickly grasp the lower layer of the snippet. It can also reveal certain performance caveats and overlooked areas.
echo "Hello World";
The PHP snippet above produces an Opcode list like this:
0000 ECHO string("Hello World")
0001 RETURN int(1)
PHP has an internal optimization that combines multiple echo calls into a single ECHO Opcode, and the post-optimization Opcode list reveals this:
echo "Hello World";
echo "Foo";
echo "Bar";
0000 ECHO string("Hello WorldFooBar")
0001 RETURN int(1)
Dumping Opcodes
PHP's Opcache extension, the bundled phpdbg debugger, and the Vulcan Logic Dumper (VLD) PECL extension provide easy ways to dump all the Opcodes for a given snippet or a file.
Regardless of how the Opcodes are retrieved, they all contain the Opcode name and zero or more operands the Opcode uses.
Dump OPCodes using OPCache extension
PHP's bundled OPCache extension has an INI directive that prints the OPCodes. The output is quite simple, and it can show the OPCodes both before and after the optimizations. The OPCache extension must be enabled for this to work; on the command line, that also means setting opcache.enable_cli=1.
opcache.opt_debug_level accepts a hex value to configure the OPCode output. Leave it unset, or set it to 0, to disable the output.
opcache.opt_debug_level=0x10000: Output OPCodes before optimizations.
opcache.opt_debug_level=0x20000: Output OPCodes after optimizations.
opcache.opt_debug_level=0x40000: Output OPCodes with the control-flow graph (CFG).
opcache.opt_debug_level=0x200000: Output OPCodes in Static Single Assignment (SSA) form.
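As a convenience, the values above can be wrapped in a small helper that builds the command line. This is only a sketch: the DEBUG_LEVELS constant, its key names, and the buildDumpCommand() function are hypothetical names made up for this example, and only the hex values come from the list above.

```php
<?php
// Hypothetical lookup table; the keys are illustrative labels, not PHP settings.
const DEBUG_LEVELS = [
    'before_optimizer' => 0x10000,
    'after_optimizer'  => 0x20000,
    'cfg'              => 0x40000,
    'ssa'              => 0x200000,
];

/**
 * Build a "php -d opcache.opt_debug_level=..." command for one named level.
 */
function buildDumpCommand(string $level, string $file): string
{
    if (!array_key_exists($level, DEBUG_LEVELS)) {
        throw new InvalidArgumentException("Unknown level: $level");
    }

    // "%x" renders the integer as lowercase hex, e.g. 0x20000.
    return sprintf(
        'php -d opcache.opt_debug_level=0x%x %s',
        DEBUG_LEVELS[$level],
        escapeshellarg($file) // quote the file name for the shell
    );
}

echo buildDumpCommand('after_optimizer', 'test.php'), PHP_EOL;
```

The helper only builds the string; running it is left to the shell, and the file name is quoted according to the platform's shell rules.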
php -d opcache.opt_debug_level=0x10000 test.php
$_main:
; (lines=4, args=0, vars=0, tmps=0)
; (before optimizer)
; test.php:1-4
; return [] RANGE[0..0]
0000 ECHO string("Hello World")
0001 ECHO string("Foo")
0002 ECHO string("Bar")
0003 RETURN int(1)
With opcache.opt_debug_level=0x20000, it is possible to see the OPCodes after the optimizations.
php -d opcache.opt_debug_level=0x20000 test.php
$_main:
; (lines=2, args=0, vars=0, tmps=0, ssa_vars=0, no_loops)
; (before dfa pass)
; test.php:1-4
; return [long] RANGE[1..1]
BB0:
; start exit lines=[0-1]
; level=0
0000 ECHO string("Hello WorldFooBar")
0001 RETURN int(1)
Dump OPCodes using phpdbg
phpdbg is a lightweight PHP debugger that supports code coverage, step debugging, and printing OPCodes. phpdbg is a part of the PHP project, and available since PHP 5.4.
On most Linux software repositories, phpdbg is available with package names such as php-phpdbg or php8.0-phpdbg.
For Windows systems, the phpdbg.exe executable is often bundled alongside the php.exe executable in the same directory.
phpdbg -p* test.php
function name: (null)
L1-5 {main}() test.php - 0x41b465e0 + 2 ops
L4 #0 ECHO "Hello WorldFooBar"
L5 #1 RETURN<-1> 1
[Script ended normally]
Dump OPCodes using Vulcan Logic Dumper (VLD)
The VLD PECL extension from Derick Rethans was one of the very first projects to provide OPCode dump features, and it is still maintained today.
The extension must be installed first by using the pre-compiled binaries, or by compiling it from source.
php -d vld.active=1 test.php
Finding entry points
Branch analysis from position: 0
1 jumps found. (Code = 62) Position 1 = -2
filename: test.php
function name: (null)
number of ops: 2
compiled vars: none
line #* E I O op fetch ext return operands
-------------------------------------------------------------------
4 0 E > ECHO 'Hello+WorldFooBar'
5 1 > RETURN 1
branch: # 0; line: 4- 5; sop: 0; eop: 1; out0: -2
path #1: 0,
Inspecting OPCodes
Excellent article by Nikita Popov with an in-depth look at PHP's Virtual Machine and OPCodes
With the list of OPCodes in hand, it is now possible to take a look at some of the internal details that may not be immediately obvious. The internal optimizations can also reveal that some common wisdom may not hold true today.
At the moment, there are over 200 OPCodes in the Zend Engine. The full list and their definitions can be found in the PHP source.
The following examples use the OPCache's OPCode output.
The OPCode output is grouped by code units, which makes them easier to inspect:
greeting('PHP.Watch');
function greeting(string $name): void {
echo "Hello $name";
}
php -d opcache.opt_debug_level=0x20000 test.php
$_main:
; (lines=4, args=0, vars=0, tmps=0)
; (after optimizer)
; test.php:1-5
0000 INIT_FCALL 1 112 string("greeting")
0001 SEND_VAL string("PHP.Watch") 1
0002 DO_UCALL
0003 RETURN int(1)
greeting:
; (lines=4, args=1, vars=1, tmps=1)
; (after optimizer)
; test.php:3-5
0000 CV0($name) = RECV 1
0001 T1 = FAST_CONCAT string("Hello ") CV0($name)
0002 ECHO T1
0003 RETURN null
The $_main section contains the main executable code, akin to the main() function in C or Rust.
Each section ($_main and greeting in this example) starts with additional information that provides context.
; (lines=4, args=0, vars=0, tmps=1)
lines=4: There are 4 OPCodes.
args=0: Number of arguments the code unit takes.
vars=0: Number of variables in the code unit.
tmps=1: Number of temporary variables in the code unit.
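When working with many dumps, it can help to read this header line programmatically. The function below is a minimal sketch; parseCodeUnitHeader() is a hypothetical helper written for this article, not part of OPCache, and it only assumes the "; (key=value, ...)" shape shown above.

```php
<?php
/**
 * Parse a header such as "; (lines=4, args=0, vars=0, tmps=1)" into an array.
 * Entries without "=" (for example "no_loops") are stored as boolean true.
 * Note that a line like "; (before optimizer)" has the same shape and would
 * come back as a single flag.
 */
function parseCodeUnitHeader(string $line): array
{
    // Capture everything between the parentheses after the leading ";".
    if (!preg_match('/^;\s*\((.*)\)\s*$/', trim($line), $m)) {
        return [];
    }

    $info = [];
    foreach (explode(',', $m[1]) as $part) {
        $part = trim($part);
        if ($part === '') {
            continue;
        }
        if (strpos($part, '=') !== false) {
            [$key, $value] = explode('=', $part, 2);
            // Numeric values become integers; anything else stays a string.
            $info[$key] = ctype_digit($value) ? (int) $value : $value;
        } else {
            $info[$part] = true;
        }
    }

    return $info;
}

print_r(parseCodeUnitHeader('; (lines=4, args=0, vars=0, tmps=1)'));
```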
; (before optimizer)
This line shows the level of debugging information. This can range from before optimizer to after optimizer, before block pass, before dfa pass, and after dfa pass, with several other passes in between. The level of debug information is controlled with the opcache.opt_debug_level INI setting.
; test.php:1-5
This line shows the file path and the range of lines of the given code block. In this example, the $_main block is in the test.php file, from lines 1 to 5.
The lines are numbered from 1.
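After the meta information comes the list of OPCodes for each code block. In the OPCache output, each OPCode is prefixed with its position, starting from 0000.
The position is followed by the OPCode name, and then any operands for the OPCode. When an OPCode produces a result, the target (such as T1) and an equals sign appear before the OPCode name.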
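The line layout described above is regular enough to split with a regular expression. The following is a rough sketch under the assumption that every line looks like the OPCache samples in this article; parseOpcodeLine() is a hypothetical helper, and it keeps the operands as one raw string because string operands may contain spaces.

```php
<?php
/**
 * Split one OPCache dump line into position, result target, OPCode name,
 * and the raw operand text. Returns null when the line is not an OPCode line.
 */
function parseOpcodeLine(string $line): ?array
{
    // position, optional "<target> = ", OPCode name, optional operands
    $pattern = '/^(\d{4})\s+(?:(\S+)\s+=\s+)?([A-Z][A-Z0-9_]*)(?:\s+(.*))?$/';
    if (!preg_match($pattern, trim($line), $m)) {
        return null;
    }

    return [
        'position' => (int) $m[1],
        'result'   => $m[2] !== '' ? $m[2] : null, // e.g. "T1" or "CV0($name)"
        'opcode'   => $m[3],
        'operands' => $m[4] ?? '',                 // left unparsed on purpose
    ];
}

print_r(parseOpcodeLine('0001 T1 = FAST_CONCAT string("Hello ") CV0($name)'));
```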
Explaining the use of each OPCode is not in the scope of this article, but most of the OPCodes are documented and are often self-explanatory.
In this example at the first block:
0000 INIT_FCALL 1 112 string("greeting")
0001 SEND_VAL string("PHP.Watch") 1
0002 DO_UCALL
0003 RETURN int(1)
This set of OPCodes initializes a function call (INIT_FCALL) to greeting, and sends the string value PHP.Watch to it. After the call (DO_UCALL), the code block RETURNs the integer value 1.
In the second block:
0000 CV0($name) = RECV 1
0001 T1 = FAST_CONCAT string("Hello ") CV0($name)
0002 ECHO T1
0003 RETURN null
The RECV OPCode creates the variable $name and assigns the first argument of the function to it.
Next, a FAST_CONCAT OPCode concatenates "Hello " with the variable $name. Notice how the Zend Engine turned a double-quoted variable interpolation into a FAST_CONCAT OPCode. The resulting value is stored in T1, and then used by the ECHO OPCode.
Finally, the code block RETURNs null back to the caller.
OPCodes can also contain jumps to other positions within the same block.
for ($i=0; $i<5; $i++) {
echo "Hello";
}
The code within the for loop, and the loop condition ($i<5), mean that the execution will "jump" from one position to another, which is visible in the OPCodes:
0000 ASSIGN CV0($i) int(0)
0001 JMP 0004
0002 ECHO string("Hello")
0003 PRE_INC CV0($i)
0004 T1 = IS_SMALLER CV0($i) int(5)
0005 JMPNZ T1 0002
0006 RETURN int(1)
The JMP 0004 OPCode means the execution jumps to position 0004, which evaluates the loop condition ($i<5). The result of the condition is then stored in T1, and the execution continues to 0005.
Position 0005 contains a JMPNZ OPCode, which jumps to position 0002 if the value in T1 is not zero.
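To make the jumps concrete, the seven OPCodes above can be stepped through with a toy interpreter. This is only an illustration of the control flow and is not how the Zend Engine is implemented; runLoopOpcodes() and the array layout of each instruction are assumptions made for this sketch.

```php
<?php
/**
 * Toy interpreter for the for-loop OPCodes listed above.
 * Returns everything the ECHO OPCodes would have printed.
 */
function runLoopOpcodes(): string
{
    $ops = [
        ['ASSIGN', 'i', 0],     // 0000 ASSIGN CV0($i) int(0)
        ['JMP', 4],             // 0001 JMP 0004
        ['ECHO', 'Hello'],      // 0002 ECHO string("Hello")
        ['PRE_INC', 'i'],       // 0003 PRE_INC CV0($i)
        ['IS_SMALLER', 'i', 5], // 0004 T1 = IS_SMALLER CV0($i) int(5)
        ['JMPNZ', 2],           // 0005 JMPNZ T1 0002
        ['RETURN', 1],          // 0006 RETURN int(1)
    ];

    $vars = [];   // compiled variables (CV), keyed by name
    $t1 = false;  // the single temporary, T1
    $out = '';    // collected ECHO output
    $pos = 0;     // current position in the OPCode list

    while (true) {
        $op = $ops[$pos];
        switch ($op[0]) {
            case 'ASSIGN':     $vars[$op[1]] = $op[2]; $pos++; break;
            case 'JMP':        $pos = $op[1]; break;
            case 'ECHO':       $out .= $op[1]; $pos++; break;
            case 'PRE_INC':    $vars[$op[1]]++; $pos++; break;
            case 'IS_SMALLER': $t1 = $vars[$op[1]] < $op[2]; $pos++; break;
            case 'JMPNZ':      $pos = $t1 ? $op[1] : $pos + 1; break;
            case 'RETURN':     return $out;
        }
    }
}

echo runLoopOpcodes(), PHP_EOL;
```

Following $pos by hand shows the pattern: the loop body at 0002 and 0003 is skipped on entry, and it is only reached through the JMPNZ at 0005 for as long as the comparison at 0004 holds.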
Comparing the OPCodes before and after the optimizer can reveal the improvements it makes, and how to write code that benefits from them.
For example, PHP eliminates certain if code blocks when it can determine ahead of time that their conditions will never be met.
if (1 === 2) {
echo "Test";
}
The if condition (1 === 2) would never be true, so PHP can optimize this snippet to completely eliminate this block.
Before optimizer:
php -d opcache.opt_debug_level=0x10000 test.php
0000 JMPZ bool(false) 0002
0001 ECHO string("Test")
0002 RETURN int(1)
After optimizer:
php -d opcache.opt_debug_level=0x20000 test.php
0000 RETURN int(1)
PHP can work out more optimization patterns and eliminate OPCodes as well:
if (PHP_VERSION_ID < 80000) {
libxml_disable_entity_loader(true);
}
The PHP_VERSION_ID constant refers to the PHP version ID, and it does not change for a given PHP setup. The OPCode optimizer can eliminate this block if the current PHP version does not fulfill the condition of this if block, which results in an OPCode list like this:
0000 RETURN int(1)
However, if this code block is inside a namespace, the optimizer cannot apply this optimization because it is possible for the code to declare a PHP_VERSION_ID constant within that namespace.
namespace Foo;
if (PHP_VERSION_ID < 80000) {
libxml_disable_entity_loader(true);
}
0000 T1 = FETCH_CONSTANT (unqualified-in-namespace) string("Foo\PHP_VERSION_ID")
0001 T0 = IS_SMALLER T1 int(80000)
0002 JMPZ T0 0006
0003 INIT_NS_FCALL_BY_NAME 1 string("Foo\libxml_disable_entity_loader")
0004 SEND_VAL_EX bool(true) 1
0005 DO_FCALL_BY_NAME
0006 RETURN int(1)
Notice that positions 0000 and 0003 attempt to resolve the PHP_VERSION_ID constant and the libxml_disable_entity_loader function within the current namespace Foo. This is not optimal because the optimizer cannot safely optimize this further.
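The namespace fallback that blocks this optimization can be reproduced in plain PHP. The sketch below is illustrative only: it declares its own Foo\PHP_VERSION_ID constant with a made-up value of 1 to show that an unqualified name inside a namespace may resolve to something other than the global constant.

```php
<?php
namespace Foo;

// Hypothetical shadowing constant, declared only for this illustration.
const PHP_VERSION_ID = 1;

// Unqualified name: PHP looks for Foo\PHP_VERSION_ID first,
// and only falls back to the global constant if it does not exist.
$unqualified = PHP_VERSION_ID;

// Fully qualified name: always the global constant provided by PHP itself.
$qualified = \PHP_VERSION_ID;

var_dump($unqualified === 1);          // the namespaced constant is used
var_dump($qualified === $unqualified); // the global version ID is a different value
```

Because such a declaration could live in any file of the namespace, the value of the unqualified name is only known at run time.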
namespace Foo;
if (\PHP_VERSION_ID < 80000) {
\libxml_disable_entity_loader(true);
}
With the \ prefix, the optimizer knows not to resolve the constant and function names within the current namespace, which allows it to use better OPCodes:
0000 JMPZ bool(false) 0004
0001 INIT_FCALL 1 96 string("libxml_disable_entity_loader")
0002 SEND_VAL bool(true) 1
0003 DO_FCALL_BY_NAME
0004 RETURN int(1)
In optimized form:
0000 RETURN int(1)
PHP provides various ways to retrieve its internal OPCodes for a given snippet/file. The bundled OPCache extension, phpdbg, and the VLD extension can dump the OPCodes with multiple optimization levels, entry points, and jump points.
OPCodes help inspect the lower-level constructs of a given PHP code, and can help apply tweaks to assist the optimizations.