Fig 1. A sign post to the future (unverified).
The last few months have been a hectic time in the life of a PHP extension developer: PHP7 approaches fast, with the first Release Candidate already out in the wild.
PHP7 probably feels like the future to you, but it's my here and now ... I wonder if that makes me a time traveller ... I digress ...
When it became apparent that the changes to PHP7 were rather significant for the kind of extensions I like to write, I was worried: all the extensions I maintain, I'm able to maintain based on years of experience with PHP5, which I just didn't have for PHP7.
So I set to reading as much as I could; source code, blog posts and help from other internals contributors helped me stumble my way to familiarity, and with great pride I made an announcement on Twitter.
Outwardly I was happy to have succeeded in my task to make pthreads PHP7 compatible; inwardly, however, I struggled. I couldn't bring myself to tag a release, it just didn't feel right.
After a few days of reading and re-reading every line of pthreads while postponing tagging a release, I realized that I wanted to rewrite most of pthreads; compatibility is not enough, worthiness is a better thing to strive for.
I always wanted to be the kind of person who could say "up side your head" or "he/she got game" without it being awkward for everyone present; alas, I am a programmer, with a beard. Humour me, read the next sentence in the knowledge that it fulfils a lifelong dream.
I think I'm justified in saying, with regard to PHP7 in particular, that she has without a shadow of a doubt, got game.
A change in attitude brought about PHP7; PHP5 matured with the attitude that if it works it's good enough. Most of the changes for PHP7 were made because of the realization that it was not good enough, and no matter what kind of disruption fixing it would bring, it would be worth it.
With my adjusted attitude, I set to work. I tore out the bad bits and rewrote or removed them until I had what we're going to call pthreads v3.
Removed Things
Before you shout about breaking code, I'm totally justified in removing some stuff because it's bad or stupid. I don't feel guilty about breaking bad or stupid code when I'm providing a superior path.
I want pthreads not only to be useful, but as far as possible I want it to be easy to write good code; some stuff just provided a direct path to crappy code.
Threaded::from
Not only were there bugs in the implementation, this was simply superseded by PHP7 support for anonymous classes, nice and simple.
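A minimal sketch of what replaces Threaded::from, assuming a thread-safe (ZTS) PHP7 build with pthreads loaded. The property name result is only illustrative and is not part of pthreads.

```php
<?php
// Assumes PHP7 (ZTS) with the pthreads extension loaded.
// An anonymous class stands in for what Threaded::from used to build.
$thread = new class extends Thread {
    public function run() {
        // "result" is an illustrative property name, not a pthreads API.
        $this->result = 6 * 7;
    }
};

$thread->start();
$thread->join();

// Scalars written by the thread are readable here once it has joined.
var_dump($thread->result);
```

The class body lives right where it is used, which is the convenience Threaded::from was trying to provide.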
Mutex and Cond
I mentioned as often as I could that these were never intended for wide use; they were present at the very beginning to help me develop and debug the extension.
Using mutex and condition variables directly in PHP is dangerous: You do not have enough control over execution to use them safely.
If you lock a mutex and then for whatever reason (perhaps an unexpected exception changes the code path, or a fatal engine error occurs) you do not unlock it, the next context to call lock will deadlock.
They were never required; there are synchronization methods built into Threaded objects.
Threaded::isWaiting
There is something seriously wrong with code designed to detect whether an object is waiting for notification; any such code is written misunderstanding synchronization.
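Rather than asking whether another context is waiting, the usual pattern is to guard the wait with a shared condition that is read and written inside Threaded::synchronized. The sketch below follows the shape of the example in the PHP manual. The class name My and the property done are illustrative, and it assumes a ZTS build with pthreads loaded.

```php
<?php
// Assumes PHP (ZTS) with the pthreads extension loaded.
class My extends Thread {
    public function run() {
        $this->synchronized(function ($thread) {
            // Wait only if the condition has not been met yet, so a
            // notification sent before we got here is never lost.
            if (!$thread->done) {
                $thread->wait();
            }
        }, $this);
    }
}

$my = new My();
$my->start();

$my->synchronized(function ($thread) {
    // Change the condition and notify while holding the same monitor.
    $thread->done = true;
    $thread->notify();
}, $my);

$joined = $my->join();
var_dump($joined);
```

Because the flag is only touched inside the synchronized block, neither side ever needs to know whether the other is currently waiting.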
Threaded::lock and Threaded::unlock
This amounts to using mutex directly in userland, and presents the same kinds of dangers.
Synchronization of an object's properties or state should only be achieved with Threaded::synchronized.
In pthreads v2, Threaded::synchronized only synchronized state, not the properties table, so lock and unlock seemed reasonable.
Method Modifiers - Special Behaviour
In pthreads v2, a protected method could only be executed by one context at a time and a private method could only be executed inside the thread context.
That seemed useful; however, it was unreliable because of the way functions are cached in the core and called by various extensions in the wild.
Removing this special behaviour allows Zend to cache function pointers normally, making calls to Threaded methods a bit quicker and less complex.
Threaded::synchronized is now suitable for implementing exclusion reliably.
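As a sketch of exclusion without the old modifier behaviour, an ordinary public method can wrap its body in Threaded::synchronized. The classes Counter and Incrementer and the property value are hypothetical names made up for illustration; the code assumes a ZTS build with pthreads loaded.

```php
<?php
// Assumes PHP (ZTS) with the pthreads extension loaded.
// Counter, Incrementer and "value" are illustrative names only.
class Counter extends Threaded {
    public function __construct() {
        $this->value = 0;
    }

    public function increment() {
        // Only one context at a time may run this block for this object,
        // so the read and the write below cannot interleave.
        $this->synchronized(function ($self) {
            $self->value = $self->value + 1;
        }, $this);
    }
}

class Incrementer extends Thread {
    public function __construct(Counter $counter) {
        $this->counter = $counter;
    }

    public function run() {
        for ($i = 0; $i < 1000; $i++) {
            $this->counter->increment();
        }
    }
}

$counter = new Counter();
$threads = [];
for ($i = 0; $i < 4; $i++) {
    $threads[$i] = new Incrementer($counter);
    $threads[$i]->start();
}
foreach ($threads as $thread) {
    $thread->join();
}

var_dump($counter->value);
```

Without the synchronized block, two threads could read the same value and one of the increments would be lost.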
Improved Things
I've tried to look over pthreads with the same kind of critical attitude that brought PHP7 into existence. This has resulted in many improvements and much nicer code, the inner workings of pthreads are even starting to look simple.
Serialization Hack for Threaded Objects
In any version of PHP, objects cannot be shared among many contexts; normal objects are just serialized because it's the safest (and only) thing you can reasonably do.
In pthreads v2, Threaded objects were not serialized: each context with a reference to a Threaded object had a distinct physical object in its own object store, but they shared a physical thread safe property store and some other stuff (state etc). This was facilitated by a hack to the serialization handlers that passed around, I'm ashamed to say, the physical address of Threaded objects as a string.
This is obviously crap, so I removed that completely. You can now serialize a Threaded object just like any other in pthreads v3.
Threaded Object Iteration
As mentioned, Threaded objects don't use a normal property store, so iteration needs implementing for these objects.
In pthreads v2, when you begin iterating over a Threaded object the entire custom property store is converted to a normal HashTable of zvals for Zend.
This is not so bad for tiny objects, but it is bad for complicated or big objects, or loops where you intend to break out early.
In pthreads v3, when you begin iteration over a Threaded object, just the keys are copied for the iterator and each value in the property store is converted JIT.
This keeps memory usage down and is much more suitable in the general case.
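The difference between the two strategies can be imitated in plain PHP with generators. This is only an analogy, not the pthreads internals: the functions convert, iterateEager and iterateLazy are hypothetical, and unserialize stands in for the cost of turning a stored value into a zval.

```php
<?php
// Plain PHP analogy; no extension required. All names are hypothetical.

// Stand-in for converting one stored value, counting each conversion.
function convert($raw, stdClass $stats) {
    $stats->conversions++;
    return unserialize($raw);
}

// v2-like: convert the entire store first, then iterate the copy.
function iterateEager(array $store, stdClass $stats) {
    $copy = [];
    foreach ($store as $key => $raw) {
        $copy[$key] = convert($raw, $stats);
    }
    foreach ($copy as $key => $value) {
        yield $key => $value;
    }
}

// v3-like: copy only the keys, convert each value when it is reached.
function iterateLazy(array $store, stdClass $stats) {
    foreach (array_keys($store) as $key) {
        yield $key => convert($store[$key], $stats);
    }
}

$store = array_map('serialize', ['a' => 1, 'b' => [2, 3], 'c' => 'four']);

$eager = new stdClass;
$eager->conversions = 0;
foreach (iterateEager($store, $eager) as $key => $value) {
    break; // leave the loop after the first element
}

$lazy = new stdClass;
$lazy->conversions = 0;
foreach (iterateLazy($store, $lazy) as $key => $value) {
    break; // leave the loop after the first element
}

printf("eager: %d conversions, lazy: %d conversions\n",
    $eager->conversions, $lazy->conversions);
```

Breaking out early costs the eager version a conversion for every element, while the lazy version pays only for the element it actually reached.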
Threaded Object Monitor
I've pinched the term Monitor from Java. I'm sure you can find a detailed explanation if you're interested in the fine detail, suffice to say for now that a monitor is a mutex, condition, and state, that are used to implement critical sections - where only one thread may travel that path at a time.
In pthreads v2, there was no uniform monitor, there were separate structures, with separate locks as well as a lock on the object itself.
This was pretty terrible of me. Indefensible.
In pthreads v3 there is one monitor for each object; not only does that make the internals easier to understand and debug, it makes synchronization much more powerful and predictable in userland.
Worker and Pool
In pthreads v2, using a Worker directly was a tricky business: the programmer needed to retain a reference to any object that may be executed, and if they didn't, faults would follow swiftly.
This was in part solved by the provided Pool implementation, which maintained references for the programmer, but because PHP (everything is a hashtable) it did so by consuming a lot of memory that was difficult to keep track of and control.
The Pool::collect method allowed the programmer to forcibly free memory in a long running process: you pass Pool::collect a function that accepts a Threaded object and returns true if the engine can free that object. All objects are passed to the collector regardless of whether they have been executed or not.
In pthreads v3, Pool::collect has been moved to Worker::collect, and it's now Worker that maintains references to objects on the stack; this means that you no longer need to retain references to objects that may be executed.
Worker::collect is more efficient because you only traverse a list of garbage; in addition, the structure is no longer a HashTable but a custom structure, fit for purpose.
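A rough sketch of how this might look in userland, assuming the Worker::collect signature documented for released pthreads v3 (a collector closure that returns true when a task may be freed, and an integer count of tasks still to be collected); the pre-release code described here may differ. The class Job and the property done are hypothetical.

```php
<?php
// Assumes PHP7 (ZTS) with pthreads v3 loaded. Job and "done" are
// illustrative names; the collect() behaviour is assumed from the manual.
class Job extends Threaded {
    public function run() {
        // ... real work would go here ...
        $this->done = true;
    }
}

$worker = new Worker();
$worker->start();

for ($i = 0; $i < 4; $i++) {
    // No reference is kept here; the Worker holds on to stacked objects.
    $worker->stack(new Job());
}

$collected = 0;
// Keep collecting until the worker reports nothing left to collect.
while ($worker->collect(function (Job $job) use (&$collected) {
    $free = (bool) $job->done;
    if ($free) {
        $collected++;
    }
    return $free; // true tells the worker this object can be freed
})) {
    usleep(1000);
}

$worker->shutdown();
var_dump($collected);
```

In a long running process the collect call would typically sit inside the main loop rather than run once before shutdown.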
What's Next?
As of today, I haven't tagged a release. I'm still testing all these changes with the help of a few dedicated users; consider this an invitation to join in the testing.
I'm pretty happy with the shape of things now, a release will be tagged soon, and I can move on to the next extension.
I'm going to go back to code now ...