OPCache: Direct execution opcode file without php source code file #6146

Closed
wants to merge 26 commits into from


Contributor

@chopins chopins commented Sep 16, 2020

Function change:

opcache_compile_file(string $file, string $opcode_file = null): bool
  • Add the parameter $opcode_file.
  • If $opcode_file is given, the function prepends <?phpo to the compiled opcode file and saves it to $opcode_file.
  • If $opcode_file is null, behavior is the same as today.
  • The <?phpo tag is prepended to the opcode file only when opcache_compile_file() is called with the second argument.
  • If an opcode file starts with <?phpo, opcache can execute it without the PHP source file.
  • opcache.file_cache_only must be enabled and opcache.file_cache must be set.

OPCache change:

For an $opcode_file that starts with <?phpo, the following checks are skipped:

  1. accel_system_id
  2. validate_timestamps

Add the opcache.allow_direct_exec_opcode configuration option:

  1. If set to 0 (the default), behavior is the same as today: no opcode file can be executed directly.
  2. If set to 1, an opcode file is executed directly, without the PHP source file, only when it starts with <?phpo.

Add the opcache.prohibit_different_version_opcode option:

  1. If set to 1 (the default), executing an opcode file from a different PHP version is prohibited.
  2. If set to 0, an opcode file from a different PHP version reports an E_WARNING message.

Notice:

If the code directory tree changes, PHP magic constants associated with a path (such as __FILE__ and __DIR__) become invalid.

Example

Compile the PHP file myphp.php with code similar to the following:

opcache_compile_file('myphp.php',  'myphp.bin.php');

You can then run php myphp.bin.php or include 'myphp.bin.php' without myphp.php being present.
myphp.bin.php will look similar to:

<?phpo{phpversionid}OPCACHE575d367cc725713f6f170910d6e9ee5e-------BINARY CONTENT OF OPCODE----
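
As a rough illustration of the header layout above and of the opcache.prohibit_different_version_opcode rule, here is a userland sketch. It is not the C code from this patch. The function names parse_phpo_header and may_execute are hypothetical, and the sketch assumes that {phpversionid} is written as decimal digits (like PHP_VERSION_ID), followed by OPCACHE and a 32-character hex id. It also assumes that execution continues after the warning when the option is 0.

```php
<?php
// Hypothetical parser for the "<?phpo{phpversionid}OPCACHE{id}" layout.
// Assumption: the version id is decimal digits and the id is 32 hex chars.
function parse_phpo_header(string $bytes): ?array
{
    if (!preg_match('/\A<\?phpo(\d+)OPCACHE([0-9a-f]{32})/', $bytes, $m)) {
        return null; // not a <?phpo opcode file
    }
    return ['version_id' => (int) $m[1], 'system_id' => $m[2]];
}

// Hypothetical model of opcache.prohibit_different_version_opcode.
function may_execute(array $header, int $runtimeVersionId, bool $prohibit): bool
{
    if ($header['version_id'] === $runtimeVersionId) {
        return true;
    }
    if ($prohibit) {
        return false; // option set to 1: refuse other versions
    }
    // Option set to 0: warn (assumed to continue afterwards).
    trigger_error('opcode file was built by a different PHP version', E_USER_WARNING);
    return true;
}
```

A caller would read the first bytes of the file, call parse_phpo_header(), and fall back to the normal cache lookup when it returns null.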

implementation

compile to file:

  1. Current opcache store-to-file flow:
    php-->load code-->compile to opcode-->store to cache system directory-->same path file
  2. With the optional path added:
    php-->load code-->compile to opcode-->store to cache system directory-->same path file-->copy cache file to the specified path

opcache exec process:

  1. Current opcache exec:
[php]-->[find cache in cache system directory]-->[found]-->[exec opcode]
                                              \-->[not found]-->[exec php source]
  2. With the optional path added:
[php/phpo]-->[is phpo, <?phpo exists]-->[load the <?phpo file]-->[exec opcode]
          \-->[not phpo]-->[find cache in cache system directory]-->[found]-->[exec opcode]
                                                                 \-->[not found]-->[exec php source]-->[auto cache opcode]

@chopins chopins changed the title Adding <?pho to opcache supports making it run directly without the need for a PHP source file. OPCache: Direct execution exec opcode file without php source code file Sep 18, 2020
@chopins chopins changed the title OPCache: Direct execution exec opcode file without php source code file OPCache: Direct execution opcode file without php source code file Sep 18, 2020
@thg2k
Contributor

thg2k commented Sep 21, 2020

Any chance we can have something like <?ophp instead of <?pho? I really don't like it but I really like the feature

@chopins
Contributor Author

chopins commented Sep 22, 2020

The <?pho tag is only a flag marking an opcode file; PHP does not parse it.
It is modeled on Python's pyo and on the <?php tag, which is why I use it.

@nikic
Member

nikic commented Sep 23, 2020

I'd suggest starting a discussion on the PHP internals mailing list for this feature.

@chopins
Contributor Author

chopins commented Sep 25, 2020

@nikic
I have not joined the PHP internals mailing list,
so please forward this feature.

@dtakken

dtakken commented Sep 30, 2020

@nikic
I have not joined the PHP internals mailing list,
so please forward this feature.

Anyone is welcome to join: https://www.php.net/mailing-lists.php

@brzuchal
Contributor

brzuchal commented Oct 1, 2020

IMHO the string OPCACHE is a sufficient magic number / magic bytes of hex 4F 50 43 41 43 48 45.

List of file signatures
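
A small sketch of what such a magic-bytes check could look like in userland PHP. The constant OPCACHE_MAGIC and the function has_opcache_magic are hypothetical names used for illustration only; opcache itself performs its checks in C.

```php
<?php
// Hypothetical names; this only illustrates a magic-bytes comparison.
const OPCACHE_MAGIC = 'OPCACHE';

// Returns true when $bytes carries the magic string at $offset.
function has_opcache_magic(string $bytes, int $offset = 0): bool
{
    return substr($bytes, $offset, strlen(OPCACHE_MAGIC)) === OPCACHE_MAGIC;
}

// Show the magic string as space-separated hex pairs.
echo strtoupper(implode(' ', str_split(bin2hex(OPCACHE_MAGIC), 2))), PHP_EOL;
// prints: 4F 50 43 41 43 48 45
```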

@mvorisek
Contributor

mvorisek commented Oct 1, 2020

Are opcodes designed to be compatible across versions?

@chopins
Contributor Author

chopins commented Oct 1, 2020

@mvorisek No, they are not compatible across versions.
Opcode files are simply cache files that can be moved anywhere.
For example, Java and .NET are also version dependent.

@bozhinov

bozhinov commented Dec 12, 2020

@Girgias I was not referring to the speed but to the FFI. And you completely missed my point about having to sell the actual source code. it is not about the obfuscation but protecting what's yours. It is all readable if you ask me.

@IMSoP I will edit this one here cause your latest statement seems like a good closure to this discussion.
but for the record copy and pasting reversed optimized code is not the same as copying the actual code
I guess you work for Zend and we have the opcode cache thanks to you guys, but we struggle to get even minor requests in,
stuff that would make it even more useful than it currently is. There are a total of six exported functions according to the docs and I'm sure there are a lot more in there we can leverage.

@TysonAndre, thank you sir for spending the time to explain it in simple terms and examples.
I get there are quirks to writing a portable application but the OS and the hardcoded paths is something we (the php coders that don't work on the interpreter) can actually deal with as long as it is documented. We can't do C/C++ but can read docs and write unit tests.

The only points I totally understand are:

  • It is fairly new
  • It will cause problems that a regular PHP coder can't trace, which will lead to numerous bugs being reported and dev time being wasted.

I will back off now.

@IMSoP
Contributor

IMSoP commented Dec 12, 2020

And you completely missed my point about having to sell the actual source code. it is not about the obfuscation but protecting what's yours. It is all readable if you ask me.

If you're talking about licensing implications, then what you're selling is the legal permission to use it in certain ways. That's no different whether the code you're distributing is PHP, opcodes, or your own custom programming language. For that matter, it's no different whether you're distributing C source code or native x86 machine code.

As far as I'm aware, there's no legal definition of "the actual source code" that would make any difference whatsoever.

@IMSoP
Contributor

IMSoP commented Dec 12, 2020

There seem to still be some misconceptions about what opcodes are, and therefore what this feature would achieve.


Firstly, PHP opcodes are not native machine code.

To execute PHP opcodes, you need to run the Zend Engine. As far as I know, the only other things that can do anything with opcodes are debugging tools, most of which are compiled on top of Zend Engine anyway.

So opcodes do not make embedding or linking to PHP from other languages or programs any easier. Whether you have PHP source code or opcodes, you will need the Zend Engine to execute them.


Secondly, PHP opcodes are not equivalent to Java bytecode or .Net CIL.

The JVM and .Net runtime were explicitly designed for portability - "compile once, run anywhere". They define a standardised intermediate language, with strong guarantees about compatibility between versions and environments. To repeat: these are not incidental features, they are at the very heart of the design of these technologies.

The Zend Engine in general, and OpCache in particular, has almost the opposite aim: its job is to make code run fast on the current environment. It can and will generate different representations based on factors like:

  • Changes added in the latest build of PHP
  • Extensions currently loaded
  • Location of the file being compiled (e.g. the __FILE__ and __DIR__ constant folding mentioned by Tyson above)
  • CPU architecture
  • Operating system

So opcodes are not, and never will be, a good way to distribute code to multiple targets. If you put opcodes in a PHAR file, that PHAR file is going to be useless to 99.999% of other PHP users, whose environment won't match yours.


Finally, opcodes are not a good way to obfuscate PHP.

Like portability, obfuscation is not a design aim, and it wouldn't make sense to compromise on other aims for that purpose. If you want to obfuscate PHP code, a standalone tool that operates on the PHP source code will be better in almost every way:

  • It can be distributed and installed independently of PHP. (You might think built-in tools are better, but they mean slower updates, and a more awkward install process.)
  • It will be portable across all systems, and as many versions of PHP as you want.
  • It can rename classes, functions, etc which you mark as internal only, by analysing the whole project at once (opcache operates one file at a time).
  • It can be configured to strip out things which would normally be available in reflection, like docblock comments.
  • It can, if you want, use obfuscation tricks that have a performance penalty.

@TysonAndre
Contributor

Smaller bundles: I'd like to see some data, but my suspicion is that the actual "compression" of compiling scripts to opcodes is negligible (not counting the stripping of comments, which can be accomplished with much simpler minifiers)

Additionally, this might increase download sizes for end users because multiple versions may end up being distributed in the same package depending on the supported platforms

(e.g. include foo.php81_linux.opcache, foo.php82_linux.opcache, foo.php81_windows32.opcache, foo.php82_windows64.opcache, etc.,
for all combinations of 32-bit/64-bit, OS, etc. Or maybe publishers would drop 32-bit support.)

I don't get to strip all the comments and remove all references to internal dev stuff before I ship it.

Open source tools to automate minifying code already exist - e.g. https://github.com/box-project/box/blob/master/doc/configuration.md#compactors-compactors for phars, and probably many others.

Stripping out tokens of kind T_COMMENT and T_DOC_COMMENT from https://www.php.net/token_get_all would do the same thing,
and standalone tools can be built on parsers such as https://packagist.org/packages/nikic/php-parser (e.g. renaming variables before converting back to php source code (in functions without dynamic variable access, $$, or closure uses), rendering without all/most comments (e.g. everything except @license comments), etc.) or by extending tools that already use those parsers.
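
A minimal sketch of the token_get_all approach, assuming a hypothetical helper named strip_comments. It drops T_COMMENT and T_DOC_COMMENT tokens but keeps any comment mentioning @license, and it is only an illustration, not a replacement for an existing compactor.

```php
<?php
// Hypothetical helper: remove comments and docblocks from PHP source text,
// keeping comments that contain "@license".
function strip_comments(string $source): string
{
    $out = '';
    foreach (token_get_all($source) as $token) {
        if (!is_array($token)) {
            $out .= $token; // single-character tokens such as ";" or "{"
            continue;
        }
        [$id, $text] = $token;
        $isComment = $id === T_COMMENT || $id === T_DOC_COMMENT;
        if ($isComment && strpos($text, '@license') === false) {
            // Before PHP 8.0 a "//" comment token includes its newline;
            // keep that newline so the following line is not joined.
            if (substr($text, -1) === "\n") {
                $out .= "\n";
            }
            continue;
        }
        $out .= $text;
    }
    return $out;
}
```

Running this over each file at bundling time yields source that still works on every PHP version and platform the original supported.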

@TysonAndre
Contributor

TysonAndre commented Dec 12, 2020

To some extent, the security of code deployed on public servers is protected, such as shared hosts

There are also security downsides: this makes it much easier for malicious users to obfuscate code (to evade antivirus products, intrusion detection software, etc.) when exploiting vulnerabilities.

It also allows attackers to manually craft files with opcodes that would access invalid memory locations or cause PHP to execute C functions in unexpected ways. (Currently, the correctness of opcache is the only thing that ensures that the PHP VM doesn't read or write out-of-bounds memory locations; I don't expect that to change.)

  • This RFC would make it easier to bypass obstacles/protections the administrator of public servers had in place, such as disable_functions, protecting directories, system_id being harder to guess for someone without an account on the server in question, etc.

    EDIT: Oh, allow_direct_exec_opcode is disabled by default, only public servers where the administrator enabled the setting (e.g. if they used a product relying on this) would be affected.

https://www.gosecure.net/blog/2016/04/27/binary-webshell-through-opcache-in-php-7/ is one example of that and the current things that make it harder for an attacker to overwrite opcache's opcodes (system id)
It also mentions a tool for visualizing the opcodes from the compiled binary files https://www.gosecure.net/blog/2016/05/26/detecting-hidden-backdoors-in-php-opcache/

(I've never used this tool and I'm unfamiliar with that blog/company, this was the first google result for vulnerabilities with opcache.file_cache)

@bozhinov

bozhinov commented Dec 12, 2020

@TysonAndre I guess you did not visit the github of that page where there is a py script to scrape the system_id
Also check how it evolved since I posted the initial solution here in 2017: link to php bugs

@TysonAndre
Contributor

@TysonAndre I guess you did not visit the github of that page where there is a py script to scrape the system_id

The link was intended to give background for readers unfamiliar with php's opcode cache and what kind of security drawbacks could be expected from systems running opcodes distributed with php source libraries.

If the attacker that doesn't have an account on your system can run python scripts (as the same user as the web server), you already have an issue.

No, but https://github.com/GoSecure/php7-opcache-override/blob/master/system_id_scraper.py is interesting; it's more predictable than I would have guessed without looking at the code. I would have hoped that immutable system information such as hardware ids could be used if available, but I guess the main purpose is just to avoid conflicting with incompatible php opcode versions. Hardware ids probably wouldn't work well with docker or virtualization anyway.

Also check how it evolved since I posted the initial solution here in 2017: link to php bugs

For that bug report, I don't think php's maintainers would have interest in exposing functionality that makes it easier to exploit; if a researcher really wanted it, they could publish an external PHP module duplicating the function in opcache to help other security researchers/hobbyists.

But the system_id isn't really related to my concern about this RFC - "The RFC allows attackers to manually craft files with opcodes that would access invalid memory locations or cause php to execute c functions in unexpected ways (on systems where allow_direct_exec_opcode is enabled)"

@bozhinov

bozhinov commented Dec 12, 2020

The goal back then was to have the opcode cache outside of the htroot and 0 byte php files in the htroot. So how is the attacker getting access to the opcode cache? The attack scenario assumes you can upload a file outside of the htroot.
I guess you are right though. Putting the pho files in the htroot is a poor choice in terms of security.

@TysonAndre
Contributor

An 'unassembler' for opcodes can be written in a weekend (trust me on this, I've done it). Neither this, nor any similar approach will accomplish this motivation.

Agreed.

https://github.com/GoSecure/php7-opcache-override/blob/master/analysis_tools/opcache_disassembler.py exists, for example (it targets an older php minor version; struct layout and opcodes change in every minor version).

PHP itself comes with the facilities to dump opcodes while compiling through https://www.php.net/manual/en/opcache.configuration.php#ini.opcache.opt_debug_level - I imagine the ability to dump opcodes loaded from a file would additionally be added (and would be simple to add) if it became necessary to understand user-submitted bug reports.

And opcode editors/decompilers would be created and improved upon if obfuscation became common

@IMSoP
Contributor

IMSoP commented Dec 12, 2020

I guess you are right though. Putting the pho files in the htroot is a poor choice in terms of security

I'm not sure the concept of an "htroot" has much meaning in a lot of modern applications. Dynamic content is rewritten to run a single "front controller" script, and static content is served from a separate directory; the two could be anywhere on disk. Any ".php" files that somehow get into the static content directory can be configured to go nowhere near PHP, and just serve a 403 Forbidden or 404 Not Found. That has nothing to do with ".php" vs ".pho", just application design and server configuration using the tools we already have.

@chopins
Contributor Author

chopins commented Dec 13, 2020

  • The protection of opcode files is that they are not easily modified by everyone (including those with limited computer knowledge).
  • Only prevent the good man.
  • Bind php version will avoid compatibility problems.

@TysonAndre
Contributor

The protection of opcode files is that they are not easily modified by everyone (including those with limited computer knowledge).

If you mean obfuscation, I'd recommend using a php to php obfuscator as in #6146 (comment)

If you mean security, then if a single person publishes a "phpo" opcode file that can be used as an exploit or proof of concept (or tool to generate those files), anyone with limited computer knowledge could just copy the exploit, or change that for their purposes. The effect would thankfully be limited due to the system ini setting only affecting people using libraries that would use that functionality, but it's a concern if it is widely used.
(e.g. bypassing the php VM sandbox by arbitrary pointer reads/writes to leak raw memory (e.g. user-submitted data or app secrets such as passwords from other accounts on shared hosting), bypassing the php VM sandbox to exploit a running web server, etc)

  • (Currently, the correctness of the files that are generated by opcache is the only thing that ensures that the PHP VM doesn't read or write out-of-bounds memory locations, because the php interpreter doesn't check opcodes that are loaded. I don't expect that to change.)

Java bytecode has a verifier, and PHP doesn't - there are no plans I know of to add that to php and there are far, far more developers working on Java than on PHP's opcache - https://www.oracle.com/java/technologies/security-in-java.html

What about the concept of a "hostile compiler"? Although the Java compiler ensures that Java source code doesn't violate the safety rules, when an application such as the HotJava Browser imports a code fragment from anywhere, it doesn't actually know if code fragments follow Java language rules for safety: the code may not have been produced by a known-to-be trustworthy Java compiler. In such a case, how is the Java run-time system on your machine to trust the incoming bytecode stream? The answer is simple: the Java run-time system doesn't trust the incoming code, but subjects it to bytecode verification.

The tests range from simple verification that the format of a code fragment is correct, to passing each code fragment through a simple theorem prover to establish that it plays by the rules:

  • it doesn't forge pointers,
  • it doesn't violate access restrictions,
  • it accesses objects as what they are (for example, InputStream objects are always used as InputStreams and never as anything else).

@sgolemon
Contributor

sgolemon commented Dec 14, 2020

Hi Sara, looked you up like a total creeper would.

Might I also recommend looking up prior requests for this feature? The motivations for it have been the same, and the reasons why it's not a good idea haven't changed either.

  • Owning the actual code (which is what the client gets when you sell your code) is not the same as owning something that can be reversed back to code (not the actual one)

Agreed, however (as I already said), there are much simpler ways to do this which have much fewer compatibility implications.

  • I don't get to strip all the comments and remove all references to internal dev stuff before I ship it. It is not a big thing but then I have to store that version as well.

Let me get this straight.

  • You can't run a quick script at library bundling time to minimize/obfuscate your code.
  • You can run a quick script at library bundling time to serialize your code (without obfuscation) to bytecode.

Does that describe what you just said?

  • You put those opcodes in a phar archive and you get a JAR - and Java must be doing something right, right ?

This one is going to blow your mind. JAR files can be unassembled too. Five seconds of googling found me a handy web-based tool that I need zero technical skill to use.

  • You have a FFI interface now - so maybe we can use PHP for more than just web? It is fast enough. I m thinking Python and they have PYC

This is a non sequitur. Yes, you can use PHP for non-web tasks. That has nothing to do with the argument at hand.

Nothing.

@bozhinov

OK, so we agree on something and you did get my sample use case right. Thank you.
I'm going to shut up now as it would seem we are getting this implemented, and I'm grateful.
FYI, I use a Java decompiler (or two) at least once a week.

@ramsey ramsey added the Feature label Jun 9, 2021
@ramsey
Member

ramsey commented Jun 9, 2021

Can someone summarize the outcome of the discussions here and on the mailing list? It's unclear to me where things ended up. Is this in or out?

@IMSoP
Contributor

IMSoP commented Jun 9, 2021

@ramsey The RFC was voted on and unanimously rejected: https://wiki.php.net/rfc/direct-execution-opcode#vote

The most succinct summary is probably the one Sara Golemon put in the voting thread on the mailing list:

Voted no because, as stated during discussion, this is brittle, provides a false sense of security, and doesn't fix any problem.

@nikic
Member

nikic commented Jun 10, 2021

Closing this per above comments (RFC has been declined).

@nikic nikic closed this Jun 10, 2021
@acicali

acicali commented Aug 6, 2021

Seems like this would primarily benefit someone who wants to hide malware on a compromised server.

@hegoku

hegoku commented Mar 28, 2023

Why was this RFC not approved? We need this feature!

@IMSoP
Contributor

IMSoP commented Mar 28, 2023

@hegoku There is a summary of why it was rejected two comments above yours, and you scrolled past most of the detailed discussion right here on this page.

Feel free to also read the mailing list archive of the RFC discussion thread and vote announcement thread, which basically cover the same reasons.
