Saturday, 1 May 2021

Worthiness



My first commit to pthreads was on August 28th 2012. While the initial implementation was written rapidly, it had been on my mind daily - because of Java - for maybe a few years. I've spent almost a decade trying to show that threads in PHP are possible, and that despite what everyone says, PHP was an excellent candidate for threads precisely because it's shared nothing. I've convinced very few people in that time.

When it became obvious that PHP 8 was going to ship a JIT, allowing PHP to execute on the CPU directly, pthreads was an unwieldy beast, beset by many bad decisions I made during its development. Those decisions made it impossible to consider it a candidate for inclusion in php-src.

I set about writing a new API: I dropped the Java inspired OO model and came up with a CSP model - à la Go. I called it parallel and had every intention of proposing it for inclusion in PHP 8. I went to great lengths to make sure it was compatible with the JIT, including arguing the case for the JIT to even have thread safety support initially.
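For anyone who hasn't met CSP before, here is a minimal sketch of the idea in Go itself, since that is where the inspiration came from. It is not parallel's API, and the names squareWorker and sumSquares are made up for illustration. The point is only that the two sides share nothing and talk exclusively over channels.

```go
package main

import "fmt"

// squareWorker receives numbers on in, sends their squares on out,
// and closes out once in has been closed and drained.
func squareWorker(in <-chan int, out chan<- int) {
	for n := range in {
		out <- n * n
	}
	close(out)
}

// sumSquares (a hypothetical helper) hands the squaring to a separate
// goroutine and communicates with it only through channels.
func sumSquares(nums []int) int {
	in := make(chan int)
	out := make(chan int)
	go squareWorker(in, out)

	// Feed the worker from its own goroutine so that this one is
	// free to receive results while the input is still being sent.
	go func() {
		for _, n := range nums {
			in <- n
		}
		close(in)
	}()

	total := 0
	for sq := range out {
		total += sq
	}
	return total
}

func main() {
	fmt.Println(sumSquares([]int{1, 2, 3, 4})) // prints 30
}
```

Nothing here is locked or shared: a value is either on one side of a channel or the other, which is the property that made this model a natural fit for a shared nothing language.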

The first time I saw PHP executing in parallel, directly on the CPU, it was such a reward, and I was super excited for the future.

Parallel concurrency is not the only kid on the block, and it's not even the kid that most PHP programmers are familiar with.

The domain of web things is mostly covered by the umbrella of IO - most web apps are IO machines, doing many database and API calls, all requiring lots of socket programming. In this domain of IO, the kind of concurrency that scales and is useful is asynchronous concurrency.

Without diagrams (which you may find on my blog somewhere) and without going into too much detail about the differences between asynchronous and parallel concurrency: asynchronous concurrency is the thing that allows curl_multi, a thing we're all familiar with, to work.

You might ask: "Why bother to give parallel concurrency any attention at all if asynchronous concurrency is what people find useful?"

PHP claims to be a general purpose scripting language, and asynchronous concurrency has one single use (IO); it's not general purpose in any sense of the word. If your code is doing anything that is CPU bound, you do need parallel concurrency to take advantage of the hardware you have. Now, because of the way PHP is deployed (as part of a stack of software) and the way it scales (adding machines), it's true that PHP is spread out over your hardware quite nicely, and when the hardware becomes overloaded, we just add more hardware. Nevertheless, I viewed parallel concurrency as a way to expand the horizons of PHP, to help it be the general purpose language it claims to be.

So ... that's the past covered ...

The Future

It wasn't very long ago that I was writing optimistically about the future ... my optimism has vanished, and here's why ...

Fibers were recently merged into php-src. Fibers help to achieve asynchronous concurrency. They are a kind of green thread - that's to say user space threads that are scheduled cooperatively by the user's code. They are not useful for parallel concurrency, at all.

They are complicated though ... They give yet another way for the programmer (or the maintainer of the library they are using) to implement asynchronous concurrency in their applications.

Fibers are squarely aimed not at the general user, but at the maintainers of frameworks and libraries. It's highly unlikely you will be using Fibers yourself.

Despite what the RFC for Fibers claimed about its compatibility with parallel, it is not compatible with parallel.

Parallel implements CSP using normal threading primitives. The moment you try to mix CSP, Fibers, and Parallel, you will either crash, deadlock, or your head will simply overheat, and you will die.

Okay, you may not die ... 

Stepping back from the issues of concurrency and looking at the bigger picture for a moment ... PHP is supposed to be simple; that is what has given it endurance.

When I say simple, it's important to point out that I don't just mean that it's simple for the programmer that uses PHP, but also for the programmers that maintain it: PHP is not maintained by 50 people with degrees coming out the wazoo, working in nice air conditioned offices, and being compensated to the tune of hundreds of thousands of dollars a year. With the exception of two people (give or take two over the years), PHP is maintained by uncompensated volunteers. Whether they have degrees or not (I don't) doesn't really matter, because they have real lives, and don't have the time to research and learn things that are not of immediate concern in their normal jobs.

In recent years PHP has become a very complicated thing. There are now parts of the source code that are only really understood by a few people, which puts PHP in a precarious position.

Fibers are just the most recent complicated chunk of code that most of the people that voted it in don't understand, and could not write, which we are all now burdened with.

Back to concurrency ... It's true that you could write a version of parallel that worked with fibers; you could re-implement CSP to be compatible with both fibers and threads. It would be another huge chunk of code that nobody really understands. If it was merged, it would be merged on the back of the thoughts of one or two people, and understood by just as many.

I love PHP, it's given me the best parts of my career so far. I don't want to do it harm, and developing parallel into the kind of thing that would be required to achieve full integration in the form of an M:N threading model, or even minimal support for fibers, would do PHP harm. It would be yet another chunk of code that not enough people understand. It would make understanding PHP code in 5 years' time a whole bunch harder.

I can see no future for parallel concurrency in PHP today ...

It should be on the record that I think merging fibers into core was premature, and a mistake: it has no user base, and hardly any prospective user base. It's a complicated thing that doesn't actually bring anything new to the table. With the advent of the JIT, PHP really was shaping up to be a general purpose language. Fibers are a block in the road, proclaiming that asynchronous concurrency - which, again, you could already achieve - is still what is important, and that despite the ability to execute instructions on the CPU, your code is only worthy of one core.

The other model has won, it's goodbye from parallel PHP ... 

Of course, it's not goodbye from me ... I'll put my efforts into understanding more of the language we now have.

Peace out, Phomies.