Wednesday, 13 April 2016

Breaking Badly


If you're not running PHP 7 already, you are either crazy, or else your unit tests rely on software that I wrote for PHP 5 ... uopz.

uopz is a runtime hacking extension of the runkit and scary stuff genre.

When I first wrote uopz, PHP 5 was almost in a state of equilibrium. There were minor changes affecting the kind of stuff those extensions do, but by the time 5.5 came the platform was more or less stable.

When PHP 7 came along, not being able to run our unit tests was a major roadblock. Badoo and many other large projects had the same problem.

While there were tickets open requesting support for PHP 7, I omitted to answer the tickets in favour of working on the code. I wrongly assumed that if you were waiting for updates, you would be watching closely. I also omitted to tag any issues in any commits I was making, simply because I'm bad at git.

Various tweets and tickets went unanswered ... sorry about that, but ... code and ... me.

For many months, you could compile uopz for PHP 7 and it would "work", but it was terribly unstable.

I have my fingers in many pies and maintain many things; uopz was not the only roadblock and not the most important thing to do. It did receive attention ... but I will admit, not enough attention.

More precisely, not enough of the right kind of attention. Every time I came to work on uopz I was focused on making it do exactly the same things it did before, in exactly the same way.

Some of the stuff uopz does is semi-ordinary; it even calls Zend API in a lot of cases. However, copying functions (in the bitwise+instruction by instruction sense), manipulating the global function table, and class function tables, is not ordinary; this is what makes uopz or runkit scary, and useful.

PHP 7 is vastly different to PHP 5 internally; the VM is a much more complicated place to try and get work done.

If we think about a year from now, or two years from now:
  • What happens when Zend has a JIT ? 
  • What happens if Opcache makes class entries immutable, and so shares them ?
  • What happens if Opcache makes function entries immutable and re-entrant, and so shares them ?
I do not know the answers to those questions, but they are good questions.

When these thoughts are communicated clearly, it might be obvious that uopz cannot keep working in the same way and still be stable or forward compatible.

Function Mockery v5 

 

You do not need to delete, rename, or otherwise modify functions, or function tables.

The purpose of allowing you to delete, or rename, a function was to allow you to create another one in its place.

The purpose of allowing you to create a new function in the place of the deleted function, was so that you could define new behaviour.

This is complicated by the fact that your new function may need to invoke the original function with certain parameters.

Almost certainly, at some time the original function will be explicitly restored, possibly after a group of tests (tearDownAfterClass perhaps).

Even if restoration is not performed explicitly, at request shutdown everything must be restored - you cannot leave a user function in the function table of an internal class, nor can you delete internal function entries earlier than the engine is expecting.

Simply deleting the function is not an option; you have to keep it, and ideally provide a way to copy it into a closure in userland.

That's a rather roundabout way of doing things, don't you think ?

Here's a better way:
function uopz_set_return(string class, string method, mixed value [, bool execute = false]) : bool;
function uopz_set_return(string function, mixed value [, bool execute = false]) : bool;

This new API does not modify the function table; instead, it intercepts the execution of an existing function and allows you to set a return value.

The return value can be any variable, or a Closure to be executed in place of the original function, but still without modifying any function tables:
uopz_set_return('strlen', function(string $string) : int {
 return strlen($string) * 2;
}, true);

var_dump(strlen("four"));
The code above yields:
int(8)
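The closure form is not required when a fixed value will do. The following is a minimal sketch, assuming the uopz extension is loaded: fetch_rate is a hypothetical function invented for illustration, and uopz_unset_return is assumed to be available in your build as the counterpart that removes the return value again (check the README for your version):

```php
<?php
// Hypothetical function under test; stands in for any function you want to mock.
function fetch_rate() : float {
 return 1.5;
}

// No execute flag: the value itself is returned instead of running fetch_rate().
uopz_set_return('fetch_rate', 2.0);
$mocked = fetch_rate();

// Assumption: uopz_unset_return() exists in your uopz build and removes the override.
uopz_unset_return('fetch_rate');
$restored = fetch_rate();

var_dump($mocked, $restored);
```

Because the function table is never touched, undoing the override is a matter of dropping the stored return value; there is no original function to copy back into place.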
In some cases, you do not want to modify the behaviour of the original function, but rather modify some state or perform some other action upon entry to a particular function:
uopz_set_hook('strlen', function(string $string) {
 echo "Expect: int(4)\n";
});

var_dump(strlen("four"));
The code above yields:
Expect: int(4)
int(4)
Hook and return closures are bound to the current scope at runtime:
class Foo {
 private $bar = true;

 public function qux() {
  return false;
 }
}

uopz_set_return(Foo::class, 'qux', function() {
 return $this->bar;
}, true);

$foo = new Foo();

var_dump($foo->qux());
The code above yields:
bool(true)
Setting hooks and returns should have most use cases covered; still, there are times when you need to add a non-existent function:
function uopz_function_add(string class, string method, Closure handler [, int flags = ZEND_ACC_PUBLIC]);
function uopz_function_add(string function, Closure handler [, int flags]);
This new API allows you to do that. It is similar to uopz_function, but it can only add functions; it will not replace existing ones.

I was reluctant to allow adding functions at all: it makes everything slower, because it means we have to disable function entry caching. All of the uses we have left now are questionable; I suspect the same is always true.

The majority of the time, you were only adding a function because there wasn't a better way ... use the better way :)

Class Mockery v5

 

Allowing userland code to override opcode handlers was always bat shit crazy!

In PHP 5, to provide a mock class at test time, you had to overload ZEND_NEW and change the name of the class.

Not only is that crazy, it's bad.

If you happen to be running tests that create 1000 objects of a particular kind, but could use the same object, you are wasting the resources consumed by the creation of 999 objects. In test suites where we have tens of thousands of tests, this can have a dramatic effect.

In PHP 7 we have anonymous classes, which allow us to have a rather beautiful API:
interface IFoo {
 public function bar() : bool;
}

function consumer(IFoo $foo) : bool {
 return $foo->bar();
}

uopz_set_mock(Foo::class, new class implements IFoo {
 public function bar() : bool {
  return true;
 }
});

var_dump(consumer(new Foo()));
The code above yields:
bool(true)
uopz_set_mock can also accept a class name as the second parameter; the following code will behave identically to the code above:
interface IFoo {
 public function bar() : bool;
}

function consumer(IFoo $foo) : bool {
 return $foo->bar();
}

class Mock implements IFoo {
 public function bar() : bool {
  return true;
 }
}

uopz_set_mock(Foo::class, Mock::class);

var_dump(consumer(new Foo())); 
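In a test suite you will usually want to remove a mock again once a group of tests has finished. The following is a sketch under assumptions, not a definitive recipe: it requires the uopz extension, it gives Foo a hypothetical real implementation so there is something to fall back to, and it assumes uopz_unset_mock is available in your build as the counterpart to uopz_set_mock:

```php
<?php
interface IFoo {
 public function bar() : bool;
}

// Hypothetical real implementation that the tests want to replace.
class Foo implements IFoo {
 public function bar() : bool {
  return false;
 }
}

function consumer(IFoo $foo) : bool {
 return $foo->bar();
}

// While the mock is set, new Foo() is expected to hand back the anonymous object.
uopz_set_mock(Foo::class, new class implements IFoo {
 public function bar() : bool {
  return true;
 }
});
$mocked = consumer(new Foo());

// Assumption: uopz_unset_mock() exists in your uopz build and removes the mock.
uopz_unset_mock(Foo::class);
$real = consumer(new Foo());

var_dump($mocked, $real);
```

A natural place for the unset call would be a tearDown or tearDownAfterClass method, so that later tests see the real class.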
Anonymous classes have superseded the role of uopz_compose, which used to allow composition of classes at runtime, in a rather awkward way. While uopz_compose was a nice toy, we are looking for stability, and forward compatibility, which are guaranteed if we are relying on language features.

Sorry 


I broke BC, and I feel bad about that ... my pain is eased by the thought that I provided a more stable, superior API to work with, that has a chance of being forward compatible with whatever Zend does next.

I also feel bad that I haven't had time to update the documentation for uopz yet; for now the README is the documentation.

If anyone feels like helping with documentation, that would be much appreciated.

Happy testing :)