OPCache: Direct execution opcode file without php source code file #6146

chopins · 2020-09-16T16:18:08Z

Function change:

opcache_compile_file(string $file, string $opcode_file = null): bool

Add parameter $opcode_file.
If $opcode_file is given, the function prepends <?phpo to the opcode file and saves it to $opcode_file.
If $opcode_file is null, behavior is the same as the current state.
The <?phpo tag is prepended to the opcode file only when opcache_compile_file() is called with the second argument.
If an opcode file starts with <?phpo, opcache can execute it without the PHP source file.
opcache.file_cache_only must be enabled and opcache.file_cache must be set.

OPCache change:

For an $opcode_file that starts with <?phpo, the following checks are skipped:

accel_system_id
validate_timestamps

Add `opcache.allow_direct_exec_opcode` configuration option

1. If set to 0 (the default), behavior is the same as the current state: no opcode file can be executed directly.
2. If set to 1, an opcode file is executed directly, without the PHP source file, only when it starts with <?phpo.
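
The conditions described so far can be sketched as a single check. This is only an illustration of the rules as stated in this description, not the patch's C implementation: `can_direct_exec` is a hypothetical helper, and the settings are passed in as an array because `opcache.allow_direct_exec_opcode` is a proposed option that a stock PHP build does not define.

```php
<?php
// Hypothetical pre-flight check for the conditions in this proposal.
// Settings are passed in as an array, because opcache.allow_direct_exec_opcode
// is a proposed option that a stock PHP build does not define.
function can_direct_exec(array $ini, string $firstBytes): bool
{
    // Treat "1", "on", "true", "yes" as enabled, anything else as disabled.
    $enabled = fn (string $key): bool =>
        filter_var($ini[$key] ?? '0', FILTER_VALIDATE_BOOLEAN);

    return $enabled('opcache.allow_direct_exec_opcode')
        && $enabled('opcache.file_cache_only')
        && ($ini['opcache.file_cache'] ?? '') !== ''
        && strncmp($firstBytes, '<?phpo', 6) === 0;
}

// Example settings; the cache path is a placeholder.
$ini = [
    'opcache.allow_direct_exec_opcode' => '1',
    'opcache.file_cache_only'          => '1',
    'opcache.file_cache'               => '/tmp/opcache',
];
$ok = can_direct_exec($ini, '<?phpo80000OPCACHE');
```

With the default value of 0 for the proposed option, the same call returns false regardless of the file's first bytes.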

Add `opcache.prohibit_different_version_opcode` option

If set to 1 (the default), executing an opcode file from a different version is prohibited.
If set to 0, an opcode file from a different version reports an E_WARNING message.

Notice:

If the code directory tree changes, PHP magic constants associated with a path will become invalid.

Example

Compile the PHP file myphp.php with code similar to the following:

opcache_compile_file('myphp.php',  'myphp.bin.php');

Then you can run `php myphp.bin.php` or `include 'myphp.bin.php'` without myphp.php.
myphp.bin.php will look similar to:

<?phpo{phpversionid}OPCACHE575d367cc725713f6f170910d6e9ee5e-------BINARY CONTENT OF OPCODE----
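
To make the header layout concrete, here is a minimal sketch of a parser for it. It assumes that `{phpversionid}` is a decimal integer in the style of `PHP_VERSION_ID` and that the 32 hex characters after `OPCACHE` are the system id; the description does not spell out either detail, and `parse_phpo_header` is a hypothetical name.

```php
<?php
// Hypothetical helper: parse the header layout shown in the example above.
// Assumes {phpversionid} is a decimal integer such as PHP_VERSION_ID and that
// the 32 hex characters after "OPCACHE" are the system id.
function parse_phpo_header(string $bytes): ?array
{
    if (!preg_match('/\A<\?phpo(\d+)OPCACHE([0-9a-f]{32})/', $bytes, $m)) {
        return null; // not a file in the proposed format
    }
    return ['version_id' => (int) $m[1], 'system_id' => $m[2]];
}

// Sample input: version id 80000 is made up; the hex string is the one
// from the example above, followed by two placeholder binary bytes.
$header = parse_phpo_header(
    '<?phpo80000OPCACHE575d367cc725713f6f170910d6e9ee5e' . "\x00\x01"
);
```

A loader could compare `version_id` against its own `PHP_VERSION_ID` before deciding whether to refuse the file or emit a warning.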

Implementation

Compile to file:

Current opcache store-to-file flow:
php--->load code--->compile to opcode--->store to cache system directory-->same path file
With the optional path added:
php--->load code--->compile to opcode--->save to cache system directory-->same path file-->copy cache file to the specified path

Opcache execution process:

Current opcache execution:

[php]-->[find cache in cache system directory]-->[found] -->[exec opcode]
                                                                 \--->[not found]--> [exec php source]

With the optional path added:

[php/phpo]--->[is phpo, <?phpo exist]--->[load the <?phpo file] ---->[exec opcode]
          \---->[not phpo] ---> [find cache in cache system directory]-->[found] -->[exec opcode]
                                                   \------------>[not found] --> [exec php source]-->[auto cache opcode]

thg2k · 2020-09-21T11:51:48Z

Any chance we can have something like <?ophp instead of <?pho? I really don't like it but I really like the feature

chopins · 2020-09-22T05:10:42Z

The <?pho tag is only an opcode file flag, and PHP does not parse it.
It is modeled on Python's pyo and the <?php tag, which is why I use it.

nikic · 2020-09-23T10:24:37Z

I'd suggest starting a discussion on the PHP internals mailing list for this feature.

chopins · 2020-09-25T14:57:40Z

@nikic
I have not joined the PHP internals mailing list,
so please forward this feature.

dtakken · 2020-09-30T09:51:34Z

> @nikic
> I have not joined the PHP internals mailing list,
> so please forward this feature.

Anyone is welcome to join: https://www.php.net/mailing-lists.php

brzuchal · 2020-10-01T10:25:35Z

IMHO the string OPCACHE is a sufficient magic number (magic bytes, hex 4F 50 43 41 43 48 45).

List of file signatures
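
For readers who want to check the byte values, a small sketch follows. The constant and function names are assumptions made for illustration; only the string `OPCACHE` and its hex bytes come from the comment above.

```php
<?php
// Illustrative constant; the name is an assumption made for this sketch.
const OPCACHE_MAGIC = 'OPCACHE';

// Render the magic string as space-separated uppercase hex bytes.
function magic_as_hex(string $magic): string
{
    return strtoupper(implode(' ', str_split(bin2hex($magic), 2)));
}

// Check whether $bytes carries the magic string at $offset.
function has_magic(string $bytes, int $offset = 0): bool
{
    return substr($bytes, $offset, strlen(OPCACHE_MAGIC)) === OPCACHE_MAGIC;
}
```

Checking for the magic at offset 0 would identify a file without needing an extra tag in front of it, which is the point being made here.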

mvorisek · 2020-10-01T11:18:39Z

Are opcodes designed to be cross-version compatible?

chopins · 2020-10-01T14:51:31Z

@mvorisek No, they are not cross-version compatible.
Opcode files are simply cache files that can be moved anywhere.
For example, Java and .NET are version dependent.

bozhinov · 2020-12-12T16:57:01Z

@Girgias I was not referring to the speed but to the FFI. And you completely missed my point about having to sell the actual source code. It is not about the obfuscation but protecting what's yours. It is all readable if you ask me.

@IMSoP I will edit this one here because your latest statement seems like a good closure to this discussion.
But for the record, copying and pasting reversed, optimized code is not the same as copying the actual code.
I guess you work for Zend and we have the opcode cache thanks to you guys, but we struggle to get even minor requests in: stuff that will make it even more useful than it currently is. There are a total of six exported functions according to the docs, and I'm sure there are a lot more in there we can leverage.

@TysonAndre, thank you sir for spending the time to explain it in simple terms and examples.
I get there are quirks to writing a portable application but the OS and the hardcoded paths is something we (the php coders that don't work on the interpreter) can actually deal with as long as it is documented. We can't do C/C++ but can read docs and write unit tests.

The only points I totally understand are:

It is fairly new
It will cause problems that a regular PHP coder can't trace, which will lead to numerous bugs being reported and dev time being wasted.

I will back off now.

IMSoP · 2020-12-12T17:05:13Z

> And you completely missed my point about having to sell the actual source code. it is not about the obfuscation but protecting what's yours. It is all readable if you ask me.

If you're talking about licensing implications, then what you're selling is the legal permission to use it in certain ways. That's no different whether the code you're distributing is PHP, opcodes, or your own custom programming language. For that matter, it's no different whether you're distributing C source code or native x86 machine code.

As far as I'm aware, there's no legal definition of "the actual source code" that would make any difference whatsoever.

IMSoP · 2020-12-12T17:12:50Z

There seem to still be some misconceptions about what opcodes are, and therefore what this feature would achieve.

Firstly, PHP opcodes are not native machine code.

To execute PHP opcodes, you need to run the Zend Engine. As far as I know, the only other things that can do anything with opcodes are debugging tools, most of which are compiled on top of Zend Engine anyway.

So opcodes do not make embedding or linking to PHP from other languages or programs any easier. Whether you have PHP source code or opcodes, you will need the Zend Engine to execute them.

Secondly, PHP opcodes are not equivalent to Java bytecode or .Net CIL.

The JVM and .Net runtime were explicitly designed for portability - "compile once, run anywhere". They define a standardised intermediate language, with strong guarantees about compatibility between versions and environments. To repeat: these are not incidental features, they are at the very heart of the design of these technologies.

The Zend Engine in general, and OpCache in particular, has almost the opposite aim: its job is to make code run fast on the current environment. It can and will generate different representations based on factors like:

Changes added in the latest build of PHP
Extensions currently loaded
Location of the file being compiled (e.g. the __FILE__ and __DIR__ constant folding mentioned by Tyson above)
CPU architecture
Operating system

So opcodes are not, and never will be, a good way to distribute code to multiple targets. If you put opcodes in a PHAR file, that PHAR file is going to be useless to 99.999% of other PHP users, whose environment won't match yours.

Finally, opcodes are not a good way to obfuscate PHP.

Like portability, obfuscation is not a design aim, and it wouldn't make sense to compromise on other aims for that purpose. If you want to obfuscate PHP code, a standalone tool that operates on the PHP source code will be better in almost every way:

It can be distributed and installed independently of PHP. (You might think built-in tools are better, but they mean slower updates, and a more awkward install process.)
It will be portable across all systems, and as many versions of PHP as you want.
It can rename classes, functions, etc which you mark as internal only, by analysing the whole project at once (opcache operates one file at a time).
It can be configured to strip out things which would normally be available in reflection, like docblock comments.
It can, if you want, use obfuscation tricks that have a performance penalty.

TysonAndre · 2020-12-12T18:31:27Z

Smaller bundles: I'd like to see some data, but my suspicion is that the actual "compression" of compiling scripts to opcodes is negligible (not counting the stripping of comments, which can be accomplished with much simpler minifiers)

Additionally, this might increase download sizes for end users because multiple versions may end up being distributed in the same package depending on the supported platforms

(e.g. include foo.php81_linux.opcache, foo.php82_linux.opcache, foo.php81_windows32.opcache, foo.php82_windows64.opcache, etc., for all combinations of 32-bit/64-bit, OS, etc. Or maybe publishers would drop 32-bit support.)

> I don't get to strip all the comments and remove all references to internal dev stuff before I ship it.

Open source tools to automate minifying code already exist - e.g. https://github.com/box-project/box/blob/master/doc/configuration.md#compactors-compactors for phars, and probably many others.

Stripping out tokens of kind T_COMMENT and T_DOC_COMMENT from https://www.php.net/token_get_all would do the same thing,
and standalone tools can be built on parsers such as https://packagist.org/packages/nikic/php-parser (e.g. renaming variables before converting back to PHP source code, in functions without dynamic variable access, $$, or closure uses; or rendering without all/most comments, e.g. everything except @license comments), or by extending tools that already use those parsers.
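
A minimal sketch of the token-based approach, using only `token_get_all` and the `T_COMMENT` / `T_DOC_COMMENT` constants from PHP's tokenizer extension. The function name `strip_comments` is made up for this example, and the choice to replace each comment with whitespace (rather than nothing) is a design decision of the sketch, not something prescribed above.

```php
<?php
// Minimal sketch: drop T_COMMENT and T_DOC_COMMENT tokens and re-emit the rest.
// Requires the tokenizer extension, which is enabled in most PHP builds.
function strip_comments(string $source): string
{
    $out = '';
    foreach (token_get_all($source) as $token) {
        if (!is_array($token)) {
            $out .= $token; // single-character tokens such as ';' or '{'
            continue;
        }
        [$id, $text] = $token;
        if ($id === T_COMMENT || $id === T_DOC_COMMENT) {
            // Keep the comment's line breaks so line numbers stay stable;
            // otherwise emit one space so neighbouring tokens do not fuse.
            $out .= str_repeat("\n", substr_count($text, "\n")) ?: ' ';
            continue;
        }
        $out .= $text;
    }
    return $out;
}

$src = "<?php\n/** doc */\nfunction f() { return 1; } // trailing\n";
$clean = strip_comments($src);
```

Keeping a `@license` docblock, as suggested above, would only need one extra condition on `$text` before the comment is dropped.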

TysonAndre · 2020-12-12T18:35:25Z

> To some extent, the security of code deployed on public servers is protected, such as shared hosts

There are also security downsides: this makes it much easier for malicious users to obfuscate code (to evade antivirus products, intrusion detection software, etc.) when exploiting vulnerabilities.

It also allows attackers to manually craft files with opcodes that would access invalid memory locations or cause PHP to execute C functions in unexpected ways. (Currently, the correctness of opcache is the only thing that ensures that the PHP VM doesn't read or write out-of-bounds memory locations; I don't expect that to change.)

This RFC would make it easier to bypass obstacles/protections the administrator of public servers had in place, such as disable_functions, protecting directories, system_id being harder to guess for someone without an account on the server in question, etc.

EDIT: Oh, allow_direct_exec_opcode is disabled by default, only public servers where the administrator enabled the setting (e.g. if they used a product relying on this) would be affected.

https://www.gosecure.net/blog/2016/04/27/binary-webshell-through-opcache-in-php-7/ is one example of that, and of the current measures (the system id) that make it harder for an attacker to overwrite opcache's opcodes.
It also mentions a tool for visualizing the opcodes from the compiled binary files https://www.gosecure.net/blog/2016/05/26/detecting-hidden-backdoors-in-php-opcache/

(I've never used this tool and I'm unfamiliar with that blog/company, this was the first google result for vulnerabilities with opcache.file_cache)

bozhinov · 2020-12-12T19:01:10Z

@TysonAndre I guess you did not visit the github of that page where there is a py script to scrape the system_id
Also check how it evolved since I posted the initial solution here in 2017: link to php bugs

TysonAndre · 2020-12-12T19:53:31Z

> @TysonAndre I guess you did not visit the github of that page where there is a py script to scrape the system_id

The link was intended to give background for readers unfamiliar with php's opcode cache and what kind of security drawbacks could be expected from systems running opcodes distributed with php source libraries.

If an attacker who doesn't have an account on your system can run Python scripts (as the same user as the web server), you already have an issue.

No, but https://github.com/GoSecure/php7-opcache-override/blob/master/system_id_scraper.py is interesting: it's more predictable than I would have guessed without looking at the code. I would have hoped that immutable system information such as hardware ids could be used if available, but I guess the main purpose is just to avoid conflicting with incompatible PHP opcode versions. Hardware ids probably wouldn't work well with Docker or virtualization anyway.

> Also check how it evolved since I posted the initial solution here in 2017: link to php bugs

For that bug report, I don't think php's maintainers would have interest in exposing functionality that makes it easier to exploit; if a researcher really wanted it, they could publish an external PHP module duplicating the function in opcache to help other security researchers/hobbyists.

But the system_id isn't really related to my concern about this RFC - "The RFC allows attackers to manually craft files with opcodes that would access invalid memory locations or cause php to execute c functions in unexpected ways (on systems where allow_direct_exec_opcode is enabled)"

bozhinov · 2020-12-12T20:13:21Z

The goal back then was to have the opcode cache outside of the htroot and 0-byte PHP files in the htroot. So how is the attacker getting access to the opcode cache? The attack scenario assumes you can upload a file outside of the htroot.
I guess you are right though. Putting the pho files in the htroot is a poor choice in terms of security.

TysonAndre · 2020-12-12T20:37:29Z

> An 'unassembler' for opcodes can be written in a weekend (trust me on this, I've done it). Neither this, nor any similar approach will accomplish this motivation.

Agreed.

https://github.com/GoSecure/php7-opcache-override/blob/master/analysis_tools/opcache_disassembler.py exists, as an example (it targets an older PHP minor version; struct layout and opcodes change in every minor version).

PHP itself comes with the facilities to dump opcodes while compiling through https://www.php.net/manual/en/opcache.configuration.php#ini.opcache.opt_debug_level - I imagine the ability to dump opcodes loaded from a file would additionally be added (and would be simple to add) if it became necessary to understand user-submitted bug reports.

And opcode editors/decompilers would be created and improved upon if obfuscation became common

IMSoP · 2020-12-12T21:41:49Z

> I guess you are right though. Putting the pho files in the htroot is a poor choice in terms of security

I'm not sure the concept of an "htroot" has much meaning in a lot of modern applications. Dynamic content is rewritten to run a single "front controller" script, and static content is served from a separate directory; the two could be anywhere on disk. Any ".php" files that somehow get into the static content directory can be configured to go nowhere near PHP, and just serve a 403 Forbidden or 404 Not Found. That has nothing to do with ".php" vs ".pho", just application design and server configuration using the tools we already have.

chopins · 2020-12-13T12:08:15Z

The protection of opcode files is that they are not easily modified by everyone (including those with limited computer knowledge).
It only deters honest people.
Binding to the PHP version will avoid compatibility problems.

TysonAndre · 2020-12-13T15:50:14Z

> The protection of opcode files is that they are not easily modified by everyone (including those with limited computer knowledge).

If you mean obfuscation, I'd recommend using a php to php obfuscator as in #6146 (comment)

If you mean security, then if a single person publishes a "phpo" opcode file that can be used as an exploit or proof of concept (or tool to generate those files), anyone with limited computer knowledge could just copy the exploit, or change that for their purposes. The effect would thankfully be limited due to the system ini setting only affecting people using libraries that would use that functionality, but it's a concern if it is widely used.
(e.g. bypassing the php VM sandbox by arbitrary pointer reads/writes to leak raw memory (e.g. user-submitted data or app secrets such as passwords from other accounts on shared hosting), bypassing the php VM sandbox to exploit a running web server, etc)

(Currently, the correctness of the files that are generated by opcache is the only thing that ensures that the PHP VM doesn't read or write out-of-bounds memory locations, since the PHP interpreter doesn't check opcodes that are loaded. I don't expect that to change.)

Java bytecode has a verifier, and PHP doesn't - there are no plans I know of to add that to php and there are far, far more developers working on Java than on PHP's opcache - https://www.oracle.com/java/technologies/security-in-java.html

> What about the concept of a "hostile compiler"? Although the Java compiler ensures that Java source code doesn't violate the safety rules, when an application such as the HotJava Browser imports a code fragment from anywhere, it doesn't actually know if code fragments follow Java language rules for safety: the code may not have been produced by a known-to-be trustworthy Java compiler. In such a case, how is the Java run-time system on your machine to trust the incoming bytecode stream? The answer is simple: the Java run-time system doesn't trust the incoming code, but subjects it to bytecode verification.
>
> The tests range from simple verification that the format of a code fragment is correct, to passing each code fragment through a simple theorem prover to establish that it plays by the rules:
>
> - it doesn't forge pointers,
> - it doesn't violate access restrictions,
> - it accesses objects as what they are (for example, InputStream objects are always used as InputStreams and never as anything else).

sgolemon · 2020-12-14T17:14:26Z

> Hi Sara, looked you up like a total creeper would.

Might I also recommend looking up prior requests for this feature? The motivations for it have been the same, and the reasons why it's not a good idea haven't changed either.

> Owning the actual code (which is what the client gets when you sell your code) is not the same as owning something that can be reversed back to code (not the actual one)

Agreed, however (as I already said), there are much simpler ways to do this which have much fewer compatibility implications.

> I don't get to strip all the comments and remove all references to internal dev stuff before I ship it. It is not a big thing but then I have to store that version as well.

Let me get this straight.

You can't run a quick script at library bundling time to minimize/obfuscate your code.
You can run a quick script at library bundling time to serialize your code (without obfuscation) to bytecode.

Does that describe what you just said?

> You put those opcodes in a phar archive and you get a JAR - and Java must be doing something right, right ?

This one is going to blow your mind. JAR files can be unassembled too. Five seconds of googling found me a handy web-based tool that I need zero technical skill to use.

> You have a FFI interface now - so maybe we can use PHP for more than just web? It is fast enough. I m thinking Python and they have PYC

This is a non sequitur. Yes, you can use PHP for non-web tasks. That has nothing to do with the argument at hand.

Nothing.

bozhinov · 2020-12-15T02:01:49Z

OK, so we agree on something and you did get my sample use case right. Thank you.
I'm going to shut up now, as it would seem we are getting this implemented, and I'm grateful.
FYI, I use a Java decompiler (or two) at least once a week.

ramsey · 2021-06-09T21:51:21Z

Can someone summarize the outcome of the discussions here and on the mailing list? It's unclear to me where things ended up. Is this in or out?

IMSoP · 2021-06-09T21:57:14Z

@ramsey The RFC was voted on and unanimously rejected: https://wiki.php.net/rfc/direct-execution-opcode#vote

The most succinct summary is probably the one Sara Golemon put in the voting thread on the mailing list:

> Voted no because, as stated during discussion, this is brittle, provides a false sense of security, and doesn't fix any problem.

nikic · 2021-06-10T10:39:09Z

Closing this per above comments (RFC has been declined).

acicali · 2021-08-06T20:53:16Z

Seems like this would primarily benefit someone who wants to hide malware on a compromised server.

hegoku · 2023-03-28T13:21:16Z

Why was this RFC not approved? We need this feature!

IMSoP · 2023-03-28T18:00:36Z

@hegoku There is a summary of why it was rejected two comments above yours, and you scrolled past most of the detailed discussion right here on this page.

Feel free to also read the mailing list archive of the RFC discussion thread and vote announcement thread, which basically cover the same reasons.

chopins added 12 commits September 2, 2020 12:20

Merge pull request #1 from php/master

e3f29b4

sync master

Merge pull request #2 from php/master

f97a395

merge

Merge pull request #3 from php/master

9850245

sync

Merge pull request #4 from php/master

728d914

sync

add <?pho support

b5f86a2

fix unlink bug

92ec6b1

add opcache.allow_direct_exec_opcode option

004b3da

remove opcache code support <?pho

ba5a880

fix bug

e8a2785

add tests

a6e2c52

update

acae2a1

fix bug

699c89e

chopins changed the title ~~Adding <?pho to opcache supports making it run directly without the need for a PHP source file.~~ OPCache: Direct execution exec opcode file without php source code file Sep 18, 2020

chopins changed the title ~~OPCache: Direct execution exec opcode file without php source code file~~ OPCache: Direct execution opcode file without php source code file Sep 18, 2020

chopins added 3 commits September 20, 2020 21:55

Merge pull request #5 from php/master

ee2ba03

sync master

fix conflicting

3bde3c9

fix conflicting

132be8b

Merge pull request #6 from php/master

ee22aa4

sync master

vdelau reviewed Oct 1, 2020

ext/opcache/tests/opcode_store_specified_file.phpt (outdated, resolved)

vdelau reviewed Oct 1, 2020

ext/opcache/zend_accelerator_module.c (outdated, resolved)

chopins added 2 commits October 1, 2020 22:57

fix text

e6e0c33

merge

f8534c5

TysonAndre reviewed Dec 12, 2020

ext/opcache/zend_file_cache.c (resolved)

TysonAndre reviewed Dec 12, 2020

ext/opcache/zend_file_cache.c (outdated, resolved)

TysonAndre reviewed Dec 12, 2020

ext/opcache/zend_file_cache.c (resolved)

TysonAndre reviewed Dec 12, 2020

ext/opcache/zend_file_cache.c (resolved)

chopins added 3 commits December 17, 2020 17:39

Merge pull request #9 from php/master

725f0ef

sync

close fd at failure

0e82b95

Merge branch 'master' into add-pho-support

c44d191

ramsey added the Feature label Jun 9, 2021

nikic closed this Jun 10, 2021

pronskiy mentioned this pull request Nov 5, 2023

Prebuild opcache into the PHAR box-project/box#1128

Open
