PHP JIT in Depth
One of the most important new features in PHP 8.0 is the Just-In-Time (JIT) compiler. JIT can improve performance by compiling the whole, or just the frequently called parts, of a PHP application into CPU machine code and executing that code directly, bypassing the Zend VM and its overhead.
JIT is a hybrid of the traditional interpreters and Ahead-Of-Time (AOT) compilers. The hybrid model brings the pros and cons of both approaches, and a finely tuned application can outweigh the cons of JIT.
PHP's JIT implementation is the result of Dmitry Stogov's tremendous efforts, spanning several years of discussions, implementations, and tests.
PHP JIT: The Basics. For PHP 8.0's JIT overview and configuration options, see PHP 8.0: JIT. This post is about benchmarks, how JIT works, and ideal configuration options.
Most of the PHP applications accept HTTP requests, retrieve and process data from a database, and return a result. More often than not, the important performance bottlenecks are with IO: reading data from disk, writing, and network requests.
PHP 8.0 introduces JIT as the next step in improving the performance of PHP applications, but it also adds a significant barrier to debugging, because some parts of the application might be cached as CPU machine code, which standard PHP debuggers cannot work with. The PHP 8.0 JIT pull request adds well over 50,000 new lines to the PHP code base, and PHP core developers other than those working on JIT might not be well-versed in it.
PHP VM
PHP code, once processed (tokenize, parse, build AST, and build opcodes), is run on the Zend Virtual Machine. Similar to Java and JavaScript, the virtual machine abstracts the hardware side of the application, which makes it possible to "run" PHP source code without compiling it first.
The Opcache extension can store the opcodes in shared memory, to skip the repetitive tokenize/parse/opcode steps.
PHP already includes several optimizations such as dead code elimination at Opcode level, but it was not possible to perform optimizations beyond the virtual machine level, because at that point, the code is interpreted by the virtual machine, as opposed to compiling it.
Handing Off to Other Applications
PHP already has several integrations through which it invokes code that is already compiled.
The GD extension might be the first one that comes to mind. If PHP were to manipulate images at the vector or bitmap level in PHP code, it would be very slow due to the additional layer of the virtual machine. The GD extension, which invokes compiled binaries, can make use of advanced CPU instructions to perform the same actions.
PHP 7.4 introduced the Foreign Function Interface (FFI), also the work of Dmitry Stogov, which provides a unified interface to invoke functions in arbitrary compiled libraries without having to develop a PHP extension. Thanks to FFI, it is possible to integrate traditionally compiled languages such as C and Rust with PHP.
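As a rough illustration of what such an FFI call can look like, the sketch below calls the C library's abs() function from PHP. The helper name c_abs() is hypothetical, and the library name "libc.so.6" is an assumption that typically holds only on glibc-based Linux systems; the function returns null when FFI or that library is unavailable.

```php
<?php
declare(strict_types=1);

/**
 * Call the C library's abs() through FFI, or return null when FFI
 * or the assumed library name is not available on this system.
 * Hypothetical helper, for illustration only.
 */
function c_abs(int $value): ?int
{
    if (!extension_loaded('ffi')) {
        return null; // FFI extension is not loaded.
    }

    try {
        // "libc.so.6" is an assumption; adjust the name for other platforms.
        $libc = FFI::cdef('int abs(int j);', 'libc.so.6');
        return $libc->abs($value);
    } catch (\Throwable $e) {
        return null; // Library not found, or FFI restricted by the ffi.enable setting.
    }
}

var_dump(c_abs(-42));
```

The work happens in compiled C code; PHP only passes the argument in and reads the result back.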
Compiling PHP Code
The natural next step of reaching as close as possible to the CPU is skipping the virtual machine, and that is what JIT is.
Just-In-Time compilation is a feature that JavaScript successfully adopted many years ago with the V8 engine. Other languages implement a JIT in one way or another too. The biggest advantage is that the source code still does not need to be pre-compiled: with a shared cache of compiled machine code, the runtime can execute a piece of code as compiled machine code, compile it for later use, or execute it without JIT.
LLVM
LLVM is a popular compiler tool-set that is used to build the compilers for a majority of AOT languages today.
LLVM's targets include x86, x86-64, and several others, including graphics processors, WebAssembly, ARM, etc.
PHP considered using LLVM, but it was not very fruitful because its compilation speed was not favorable.
DynASM
DynASM, from the LuaJIT project, was much faster for PHP's JIT. Its support for target CPU instruction sets is limited compared to LLVM, but it supports the x86 and x86-64 instruction sets, the most common ones for a server-side programming language such as PHP.
PHP 8.0's JIT implementation uses DynASM for its code generation. PHP's JIT is bound by the limitations of DynASM for target processor architectures.
How PHP JIT Works
PHP JIT is implemented as a part of Opcache. This keeps JIT separated from the PHP engine.
The three components of JIT store the compiled code, inspect the running code, and seamlessly invoke it either through the virtual machine or directly from the machine code stored in the buffer.
Buffer
The JIT buffer is where the compiled CPU machine code is stored. PHP provides a configuration option (the opcache.jit_buffer_size INI setting) to control how much memory should be allocated for the JIT buffer.
Triggers
Triggers in Opcache are responsible for invoking the compiled machine code when execution encounters a given code structure. These triggers can be a function call entry, a loop, etc.
Tracer
The JIT tracer inspects the code before, after, or during its execution, and determines which code is "hot", that is, which structures are worth compiling with JIT.
The tracer can compile code while it is being run, once a given code structure reaches a threshold, which is also configurable.
Tracing JIT and Function JIT
PHP 8.0 adds two modes of JIT operation. This is further customizable, but the most prominent types of JIT functionality are aliased as function and tracing.
Function JIT
Function JIT mode is the simpler of the two. It JIT compiles a whole function, without tracing for frequently used code structures such as loops inside a function. It still supports profiling for frequently used functions, and can trigger a JIT compile, or execution of the compiled machine code, before, after, or during the execution of an application request.
Tracing JIT
Tracing JIT, which is selected by default in PHP 8.0, tries to identify the frequently used parts of the code and selectively compiles those structures for the best balance of compilation time and memory usage. Not all programming languages support tracing JIT compilers, but PHP supports tracing JIT right from its first JIT release.
There are several configuration options that enable further tweaking of how a hot code structure is determined, such as the number of function calls, the number of iterations of a loop structure, etc.
Profiling and Optimizing
JIT can inspect, profile, and optimize the code as it is being run. PHP JIT offers granular control over the thresholds and triggers that decide how many invocations make a piece of code a worthy candidate for JIT compilation into machine code, after which the newly compiled code is used. Subsequent requests can make use of the compiled code too, if it is present in the buffer.
PHP's JIT implementation allows fine-tuning of when JIT should be used (when the script is loaded, after the first run, or during the execution), what should be compiled (the whole function, or individual code structures), and how the optimizations should be made (use of AVX instructions, use of CPU registers, etc.).
JIT-friendly code
JIT benefits heavily when it can offload as much as possible to native CPU registers and instructions. PHP is a weakly typed language, which makes it difficult to infer the type of a variable, and requires more analysis of the variable life-cycle because the type of a variable might change at a later point in the same code structure.
Strictly typed code, and functions with scalar types can help JIT to infer types and make use of CPU registers and specialized instructions where possible. For example, a pure function (that has no side-effects), with strict types enabled and with parameter and return types might make a perfect candidate:
declare(strict_types=1);
function sum(float $a, float $b): float {
    return $a + $b;
}
When PHP cannot infer the types, it might not be able to make the best use of the JIT optimizations.
Some of the improvements in PHP 7, in fact, come from such type-based optimizations, which let PHP eliminate dead code and improve reference counting. This means more strictly typed code gives PHP more opportunities to optimize code at the Opcache level, and also at the JIT level.
Applications that are IO-bound, such as the ones that extensively use a database, DNS queries, file read/write operations, FTP, sockets, etc., might not see a noticeable difference because, more often than not, the IO operations are themselves the bottleneck of such applications.
Basic JIT Configuration
By default, JIT is enabled (opcache.jit defaults to tracing), but it is effectively turned off because the buffer size defaults to 0.
The simplest setup is to set a buffer size for JIT, and JIT will use the sensible defaults it comes with.
opcache.enable=1
opcache.enable_cli=1
opcache.jit_buffer_size=256M
This allocates 256 MB for the JIT buffer, and enables JIT on CLI applications as well.
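To check whether a configuration like this actually took effect, one option is to read the JIT section that opcache_get_status() reports in PHP 8.0. The sketch below is a minimal example; jit_summary() is a hypothetical helper name, and it returns null when Opcache or its JIT information is not available.

```php
<?php
declare(strict_types=1);

/**
 * Summarize the JIT section of opcache_get_status(), or return null
 * when Opcache (or its JIT section) is not available.
 * Hypothetical helper, for illustration only.
 */
function jit_summary(): ?array
{
    if (!function_exists('opcache_get_status')) {
        return null; // Opcache extension is not loaded.
    }

    $status = opcache_get_status(false); // false: skip per-script details
    if (!is_array($status) || !isset($status['jit'])) {
        return null; // Opcache disabled for this SAPI, or no JIT information.
    }

    $jit = $status['jit'];

    return [
        'enabled'     => (bool) ($jit['enabled'] ?? false),
        'buffer_size' => (int) ($jit['buffer_size'] ?? 0), // bytes allocated
        'buffer_free' => (int) ($jit['buffer_free'] ?? 0), // bytes still unused
    ];
}

var_dump(jit_summary());
```

A buffer_free value that stays close to zero on a busy server may indicate that the buffer is too small for the application.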
The opcache.jit directive allows fine-tuning of the JIT functionality.
opcache.jit=tracing
opcache.jit is a somewhat complicated configuration value. It accepts disable, on, off, tracing, function, and a 4-digit value (not a bit-mask) made of 4 different flags, in order.
disable: Completely disables the JIT feature at start-up time; it cannot be enabled at run-time.
off: Disabled, but it is possible to enable JIT at run-time.
on: Enables tracing mode.
tracing: An alias to the granular configuration 1254.
function: An alias to the granular configuration 1205.
PHP JIT accepts tracing or function as an easy configuration value that represents a combination of flags.
In addition to the tracing and function aliases, the opcache.jit directive accepts a 4-digit configuration value as well, which can further configure the JIT behavior.
The 4-digit configuration value is in the form of CRTO, where each position allows a single digit value for the flag designated by the letter.
JIT Flags
The opcache.jit directive accepts a 4-digit value to control the JIT behavior, in the form of CRTO, and accepts the following values for the C, R, T, and O positions.
CPU-specific Optimization Flags
0: Disable CPU-specific optimization.
1: Enable use of AVX, if the CPU supports it.
Register Allocation
0: Don't perform register allocation.
1: Perform block-local register allocation.
2: Perform global register allocation.
Trigger
0: Compile all functions on script load.
1: Compile all functions on first execution.
2: Profile first request and compile the hottest functions afterwards.
3: Profile on the fly and compile hot functions.
4: Currently unused.
5: Use tracing JIT. Profile on the fly and compile traces for hot code segments.
Optimization Level
0: No JIT.
1: Minimal JIT (call standard VM handlers).
2: Inline VM handlers.
3: Use type inference.
4: Use call graph.
5: Optimize whole script.
The option 4 under Triggers (T=4) did not make it to the final version of the JIT implementation. It was meant to trigger JIT on functions declared with a @jit DocBlock comment attribute. This is now unused.
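To make the CRTO layout concrete, the sketch below splits a 4-digit value into its four positions. decode_jit_flags() is a hypothetical helper written for this illustration and is not part of PHP; the array keys are descriptive names chosen here.

```php
<?php
declare(strict_types=1);

/**
 * Split a 4-digit opcache.jit value (CRTO) into its four flags.
 * Hypothetical helper for illustration; not part of PHP.
 */
function decode_jit_flags(string $value): array
{
    if (!preg_match('/^[0-9]{4}$/', $value)) {
        throw new InvalidArgumentException('Expected exactly four digits, e.g. "1254".');
    }

    // Each character is one flag, in the order C, R, T, O.
    [$c, $r, $t, $o] = array_map('intval', str_split($value));

    return [
        'cpu_optimization'    => $c, // C: 0 or 1
        'register_allocation' => $r, // R: 0 to 2
        'trigger'             => $t, // T: 0 to 5 (4 is unused)
        'optimization_level'  => $o, // O: 0 to 5
    ];
}

print_r(decode_jit_flags('1254')); // the value the article lists for "tracing"
```

Because the value is positional rather than a bit-mask, each digit can be read independently of the others.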
Both function and tracing JIT configurations make use of CPU instruction sets and CPU register allocation to make the most of CPU capabilities (C=1, R=2).
opcache.jit=function
function is an alias to C=1, R=2, T=0, O=5.
The function configuration differs in that it is eager: it compiles the whole script as soon as possible. It is a more presumptuous and bold approach, akin to loading PHP files into Opcache ahead of time with the preloading feature in PHP 7.4.
opcache.jit=tracing
tracing is an alias to C=1, R=2, T=5, O=4.
With tracing enabled, JIT can be more granular and pick code segments within a function to compile. Ideal candidates would be looping structures, and functions that are called frequently.
This is the default configuration, as it can provide a better balance between the performance benefits and the compilation overhead.
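Written out with the 4-digit form, the tracing alias corresponds to the following setting, which can serve as a starting point when changing a single flag:

```ini
; Same as opcache.jit=tracing (C=1, R=2, T=5, O=4)
opcache.jit=1254
```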
JIT tracing functionality (T=2, 3, or 5) allows further tuning as to how many invocations it takes for a function to be marked as hot, and then eventually JIT compiled.
Directive | Description | Default value |
---|---|---|
opcache.jit_hot_loop | After how many iterations a loop is considered hot. | 64 |
opcache.jit_hot_func | After how many calls a function is considered hot. | 127 |
opcache.jit_hot_return | After how many returns a return is considered hot. | 8 |
opcache.jit_hot_side_exit | After how many exits a side exit is considered hot. | 8 |
The default values are likely suitable for almost all applications. Lowering them reduces the thresholds, which results in more code structures being compiled.
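For reference, the defaults from the table look like this when written out as INI settings; they are shown only as a template for experimenting, not as recommended changes:

```ini
; Default thresholds, made explicit so they are easy to adjust
opcache.jit_hot_loop=64
opcache.jit_hot_func=127
opcache.jit_hot_return=8
opcache.jit_hot_side_exit=8
```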
Ideal JIT Configuration
More JIT compiled code does not necessarily mean a faster application (as seen in the web application benchmarks below). The compilation overhead, coupled with a small buffer, can make applications slower because of the time spent on JIT compilation.
The opcache.jit value is better left untouched (the default is tracing), as it already provides a good balance of CPU utilization, memory, and keeping track of which code structures are compiled.
JIT will not bring any meaningful performance benefits to heavily IO-bound applications. The majority of web applications today are in fact IO-heavy, where JIT will not make a difference, let alone a positive one.
For the buffer size, take care not to set it too small, which can waste JIT compiled code and result in frequent re-compilations. Too large a buffer can be overkill too. A value of 50-100% of the current Opcache shared memory used for opcodes might be the ideal value for opcache.jit_buffer_size.
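The 50-100% rule of thumb can be turned into a quick calculation. The sketch below is one way to do it, assuming the Opcache shared memory is taken from the opcache.memory_consumption setting (expressed in megabytes, with 128 as PHP's default); suggest_jit_buffer_mb() is a hypothetical helper name.

```php
<?php
declare(strict_types=1);

/**
 * Suggest a JIT buffer range as 50-100% of the Opcache shared memory.
 * $opcacheMemoryMb is the opcache.memory_consumption value in megabytes.
 * Hypothetical helper, for illustration only.
 */
function suggest_jit_buffer_mb(int $opcacheMemoryMb): array
{
    if ($opcacheMemoryMb <= 0) {
        throw new InvalidArgumentException('Opcache memory must be a positive number of megabytes.');
    }

    return [
        'min_mb' => intdiv($opcacheMemoryMb + 1, 2), // 50%, rounded up
        'max_mb' => $opcacheMemoryMb,                // 100%
    ];
}

// Fall back to 128 (PHP's default) when the setting cannot be read.
$configured = (int) ini_get('opcache.memory_consumption');
$range = suggest_jit_buffer_mb($configured > 0 ? $configured : 128);

printf("opcache.jit_buffer_size between %dM and %dM\n", $range['min_mb'], $range['max_mb']);
```

The result is only a starting range; the right value still depends on how much code ends up being compiled.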
JIT Benchmarks
All the tests below were done on an 8-core, 16-thread x86-64 system. The tests, however, never use integers that require 64-bit registers, to keep the tests more relevant to x86 CPUs.
PHP Script Benchmark
The PHP source includes two benchmark scripts that test various PHP functionality. The micro_bench.php and bench.php files were put to the test on the PHP 8.0 branch (which contains a few bug fixes made since the PHP 8.0.0 release).
The first test was done with Opcache completely turned off, and the second one with JIT turned off, but opcache enabled.
Both JIT modes bring substantial performance gains, with tracing mode being a little ahead.
This benchmark hardly represents a real-life PHP application. The repetitive calls to the same function, and the simple, side-effect-free nature of the tested code, give JIT an advantage.
PHP Fibonacci Benchmark
A simple Fibonacci function to calculate the 42nd number in the Fibonacci sequence.
A Fibonacci calculation is all about recurring function calls, and it does not tell the full story of a real-life PHP application either, unless the application is of course a Fibonacci calculator.
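The exact benchmark script is not reproduced in this article. A typical version of such a test, assuming a naive recursive implementation and a smaller input than 42 so that it finishes quickly, might look like this:

```php
<?php
declare(strict_types=1);

// Naive recursive Fibonacci: deliberately slow, dominated by function calls.
function fib(int $n): int
{
    return $n < 2 ? $n : fib($n - 1) + fib($n - 2);
}

$n = 30; // smaller than the article's 42 so the sketch finishes quickly

$start = hrtime(true);                        // monotonic clock, in nanoseconds
$result = fib($n);
$elapsedMs = (hrtime(true) - $start) / 1e6;   // nanoseconds to milliseconds

printf("fib(%d) = %d in %.1f ms\n", $n, $result, $elapsedMs);
```

Typed scalar parameters and the absence of IO are what make this kind of code a favorable case for JIT.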
Fibonacci: PHP vs Other Languages
The same Fibonacci(42) test was run against compiled languages (such as Go, Rust, and C), and against Node.js, which has a JIT too.
PHP 8.0's JIT does not attempt all the optimizations that AOT compiled languages can perform. It does, however, bring a substantial performance boost, with still more room to improve.
Web Application Benchmark
It is difficult to predict the impact of JIT because JIT highly depends on the underlying workload. Most of the examples below are the hello-world examples of web frameworks, that do not necessarily represent real-world usage due to the various plugins and caching systems involved.
Applications that make use of a database connection will likely have their biggest bottleneck at the database queries. In a web server test that measures requests per second, the TLS, HTTP, and FPM overheads might far outweigh the performance difference JIT makes.
Laravel (8.4.4) and Symfony (demo 1.6.3, using 5.1.8 components) with their skeleton applications were tested on the same hardware, in the same scenarios as the previous benchmarks. Both applications were served by the built-in PHP web server, and benchmarked using Apache Bench (ab) with a concurrency of 5 and 100 requests. The results are the average of 5 tests.
Neither application received a noticeable benefit, and in Laravel the performance was ~2% worse with JIT, likely because the compilation overhead outweighed the gains.
Benchmark Each Application
To find out whether there are real performance benefits, each application needs to be benchmarked to measure whether JIT makes a noticeable difference.
CLI applications, especially CPU-intensive ones, will likely gain substantial performance improvements.
Network and file-intensive applications such as Composer and PHPUnit will likely not see a performance gain, as they do not benefit much from the machine code improvements. Throw in more SSD/RAM capacity and bandwidth for better results.
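One simple way to take such a measurement is to time a representative piece of work several times and compare the medians with JIT on and off. The sketch below assumes a CPU-bound placeholder workload; median_runtime_ms() is a hypothetical helper, and the workload should be replaced with a code path from the real application.

```php
<?php
declare(strict_types=1);

/**
 * Run $work $runs times and return the median wall-clock time in milliseconds.
 * Hypothetical helper, for illustration only.
 */
function median_runtime_ms(callable $work, int $runs = 5): float
{
    if ($runs < 1) {
        throw new InvalidArgumentException('At least one run is required.');
    }

    $timings = [];
    for ($i = 0; $i < $runs; $i++) {
        $start = hrtime(true);                        // nanoseconds
        $work();
        $timings[] = (hrtime(true) - $start) / 1e6;   // milliseconds
    }

    sort($timings);

    return $timings[intdiv($runs, 2)]; // middle element (upper median for even counts)
}

// Placeholder CPU-bound workload; replace with a representative code path.
$workload = static function (): void {
    $sum = 0.0;
    for ($i = 1; $i <= 200000; $i++) {
        $sum += sqrt($i);
    }
};

printf("median: %.2f ms\n", median_runtime_ms($workload));
```

Running the same script once with `-d opcache.enable_cli=1 -d opcache.jit_buffer_size=0` and once with a non-zero buffer size gives a first impression of the difference on the CLI; web workloads need to be measured under their own server setup.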
Closing Thoughts
JIT is a great step in making PHP perform faster and make use of the capabilities of the underlying hardware. It is the result of many years of effort, and it already shows substantial improvements in computationally intensive workloads.
There is still room for PHP's JIT to improve, and it will likely only get better from this point forward.
A huge thanks to Dmitry Stogov and Nikita Popov for their amazing work on JIT. Nikita also kindly and quickly reviewed the first portion of this article, the part on how JIT works. Thank you ❤🙏🏼.