Tuesday, 4 May 2021

Avoiding Busses

 

Fig 1. A Bus

It's always been the case that there are certain parts of PHP source code that only a few people understand. The Karma system used to help us determine where a contributor could commit code in the source tree; if you had /Zend karma, you had a clue about Zend. Among those people with /Zend karma, some people understood more than others.

This was a perfectly sustainable way of developing the language, because while /Zend is complicated, it's written in a language that everybody working on a C project understands. In principle, we can take people who know a little C and turn them into a /Zend-karma-worthy workhorse for PHP, able to produce patches, and fixes, and features. Indeed, we have done, and are still doing that in the incubator that is Stack Overflow chat.

Many moons have passed ... What do you think the bus factor of PHP is today?

2

Maybe as few as two people would have to wake up this morning and decide they want to do something different with their lives in order for the PHP project to lack the expertise and resources to move it forward in its current form, and at its current pace.

Just focus on that number for a few seconds ... think of the number of people whose livelihoods depend on PHP, the number of mortgages, car payments, school fees, entire payrolls ...

It's the scariest number 2 I have ever seen.

The Two

Everybody who follows the development of PHP knows who these two people are. 

They are Dmitry Stogov, and Nikita Popov.

I don't want to toot my own horn, but I want to make a couple of things clear: I consider myself an asset to the project, I spend a lot of time on PHP, my employers are gracious enough to allow me to do that, but I still do have a normal job to do, not to mention a life. 

Most contributors don't get to spend much, if any, work time on PHP; they are doing it in their spare time - the time they have available for reading endless code is limited, because they have real lives.

There are many people that, like myself, are assets to the project, and if we lost them, we would suffer.

But the difference between myself and Nikita or Dmitry cannot be overstated.

I've been watching Nikita since he started to become involved in PHP; it takes about 10 seconds to realize that you're in the presence of someone who is not only highly qualified - although he was still finishing his education when he became involved - but also highly skilled. Simply, a brilliant mind.

Dmitry, who has been around for much longer, is another brilliant mind type. He's written stuff that I struggle to model in my head, although I understand the language he's using. It's extremely frustrating to know people like this exist. To list the things that Dmitry has done for the project would be boring to read, and take up too much of your life. Suffice to say that the cleverest parts of PHP tend to have Dmitry's name at the top, including the JIT. Dmitry's name is often followed by Nikita's ...

The JIT

The JIT itself has a bus factor of 1. Nikita understands much more of it today than when it was merged, but Dmitry is the person that works on the JIT. Working on a JIT requires a special skill set that you only really develop when you have been working (with high focus) on JITs or very closely related tech (compilers, assembly, etc) for years. At this point Dmitry is the only person who has been doing that, and we don't want Nikita to change his focus.

You might think, and we were all sold, that this isn't really a problem because the JIT is self contained in an extension, and can be removed or disabled.

Well, it can't. The moment we merged it, it became a core part of the thing we call PHP. PHP has a JIT, and there's no possible future where we can just remove it.

As an illustration of just how complicated the JIT is: recently there has been some (very interesting to watch) work going on to bring support for the JIT to arm64. This was proposed and initially implemented by engineers from Arm. While Dmitry and the Arm engineers have been working on the branch, I've seen the Arm engineers struggle to understand it and even make mistakes: minor mistakes, which only Dmitry can spot. Nevertheless, these people could not be better qualified to do what they are doing. The JIT is so hard to understand and get right that you can be at the very top of your game, in exactly the right field, and still get it wrong.

Porting to a new platform is almost certainly the most complex sort of work that you can do on the JIT, and the number of times this will happen is obviously limited; it's a special case. I mention it only to illustrate how complex the JIT is; we don't have to worry that we can't port to new platforms.

When it comes to the JIT, we just have to accept that the skillset required here is rare, and we'll be lucky if its bus factor ever rises above 1.

I'm hopeful that it could: In one possible future, there are so many contributors being paid to work on PHP, that it may give leave to those who are paid full time to focus entirely on the JIT.

We should also recognize that working in close proximity to the JIT, as Nikita and other contributors are, might never equip you to the level where you could, say, give it new features or fix very complicated bugs.

The Rest

Nikita has always had a high impact on the project, but since he was employed precisely to impact the project, his output is quite remarkable. There's barely a minute of the day where he is not reviewing a thing, writing a thing, fixing a thing, or planning to write a thing, review a thing, or fix a thing. This is obviously great for the project.

There are also several other contributors whose output is high considering they're mostly doing it in their own spare time, and we're all grateful for every minute they spend.

For whatever reasons, many of the people that I still think of as having /Zend karma have gone away, or their output has reduced to almost nothing. I can say from personal experience that it's been a difficult few years to stay relevant, first with NG, then the JIT ... so maybe that explains some of it.

What we've learned since Nikita was employed, is that this is the pace we need ... If he went away now, I doubt if all of the other contributors combined could pick up the slack that would be left. 

This is how I arrive at the number 2.

So, What?

That number, 2. This is not an acceptable bus factor for a project the size and importance of PHP. 

With every passing RFC, the project gets a little more complicated, and we gain no more ability to maintain that complexity.

So, two things:

Think Differently

I think we ought to approach proposals a little differently in future. 

The overall complexity of the project has grown considerably in the last ten years, and we're all behaving as if we have a feature starved language, trying to cram in just as much syntax and feature additions as we were ten years ago.

We have to look at things in light of the bus factor, which is, at the moment, too low.

We have to look at things in light of the complexity we've already added to the language, some of it needlessly.

In the past, I think most people voted on the basis of what was good for them and their projects. At this point, this is irresponsible. There are not enough people bothering to vote for this to work any more.

I think voters are now obliged to vote on the basis of what is actually best for the project, with an eye on the future but also in light of the past, in light of the current bus factor, and not based on what they think is good for them.

It's not that we shouldn't have new features, it's that we should weigh the advantage of each feature against the disadvantages we currently face, and try not to fix our thinking in the current moment, and around our current concerns.

I think it's also important that we either abstain from voting on things we don't understand and don't have time to research in order to understand, or, having done the research, vote against it on the basis that we can't understand it.

I think the tendency to think you're doing the best thing, even if you don't fully understand it, is pernicious, and has proved itself as such.

If you're a voter reading this, you already know I don't have any special powers to convince you of any of this. These are just my thoughts, they might not even make any sense to you ... and nobody has to listen to them ...

So really, just one thing:

Help

It is of the utmost importance that we build our developer base. If you have any knowledge, even cursory, of PHP, or maybe you can write in C and have a willingness to learn, please approach whoever pays your salary and see if you can get some time, that you are compensated for, to help the project your business relies on.

We can raise that bus factor, but even with as many contributors as we have working in their spare time, it doesn't buy us enough focus to get above 2. We don't necessarily need people who work full time on PHP, although that'd be nice. But we need focus we can rely on, that we know will be there next month, next year, and that focus needs to be paid for by the companies that rely on PHP.

I'll just say 2 again ... 2 ...

Peace out, Phomies

Saturday, 1 May 2021

Worthiness



My first commit to pthreads was August 28th 2012. While the initial implementation was written rapidly, it had been on my mind daily - because of Java - for maybe a few years. Almost a decade I've spent trying to show that threads in PHP are possible, that despite what everyone says, PHP was an excellent candidate for threads precisely because it's shared nothing. I've convinced very few people in that time.

When it was obvious that PHP 8 was going to deploy a JIT, allowing PHP to execute on the CPU directly, pthreads was an unwieldy beast, beset by many bad decisions I made during its development that made it impossible to consider it a candidate for inclusion in php-src.

I set about writing a new API: I dropped the Java inspired OO model and came up with a CSP model - à la Go. I called it parallel and had every intention of proposing to include it in PHP 8. I went to great lengths to make sure it was compatible with the JIT, including arguing the case for the JIT to even have thread safety support initially.

The first time I saw PHP executing in parallel, directly on the CPU, it was such a reward and I was super excited for the future.
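For anyone who hasn't met CSP, here is a minimal sketch of the idea in Go, the language I borrowed the model from. This is not parallel's API; the names squareWorker and squares are made up for illustration. The point is only that tasks share nothing and talk to each other exclusively by passing values over channels.

```go
package main

import "fmt"

// squareWorker is a hypothetical task: it reads numbers from in,
// writes their squares to out, and closes out when in is drained.
// It touches no shared state; channels are its only contact with
// the rest of the program.
func squareWorker(in <-chan int, out chan<- int) {
	for n := range in {
		out <- n * n
	}
	close(out)
}

// squares feeds nums to a single worker and collects the replies.
// With one worker and unbuffered channels, results arrive in order.
func squares(nums []int) []int {
	in := make(chan int)
	out := make(chan int)

	go squareWorker(in, out)

	// Feed the worker from a separate goroutine so that this
	// function is free to receive results at the same time.
	go func() {
		for _, n := range nums {
			in <- n
		}
		close(in)
	}()

	results := make([]int, 0, len(nums))
	for v := range out {
		results = append(results, v)
	}
	return results
}

func main() {
	fmt.Println(squares([]int{1, 2, 3})) // prints [1 4 9]
}
```

The shape is the thing to notice: a task, a channel in, a channel out, and nothing else crossing the boundary.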

Parallel concurrency is not the only kid on the block, and it's not even the kid that most PHP programmers are familiar with.

The domain of web things is mostly covered by the umbrella of IO - most web apps are IO machines, doing many database and API calls, all requiring lots of socket programming. In this domain of IO, the kind of concurrency that scales and is useful is asynchronous concurrency.

Without diagrams (which you may find on my blog somewhere), and without going into too much detail about the differences between asynchronous and parallel concurrency: asynchronous concurrency is the thing that allows curl_multi, a thing we're all familiar with, to work.

You might ask: "Why bother to give parallel concurrency any attention at all if asynchronous concurrency is what people find useful?"

PHP claims to be a general purpose scripting language, and asynchronous concurrency has one single use (IO); it's not general purpose in any sense of the word. If your code is doing anything that is CPU bound, you do need parallel concurrency to take advantage of the hardware you have. Now, because of the way PHP is deployed (as part of a stack of software) and the way it scales (adding machines) it's true that PHP is spread out over your hardware quite nicely, and when the hardware becomes overloaded, we just add more hardware. Nevertheless, I viewed parallel concurrency as a way to expand the horizons of PHP, to help it be the general purpose language it claims to be.
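To make "CPU bound" concrete, here is a rough sketch, again in Go rather than PHP, of the kind of work asynchronous concurrency cannot speed up. The functions sumSquares and parallelSumSquares are hypothetical names of my own; the sketch assumes nothing beyond Go's standard library. There is no IO to wait on here, so the only way to go faster is to run the arithmetic on more than one core at once.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// sumSquares adds i*i for every i in [lo, hi). Pure CPU work.
func sumSquares(lo, hi int) int {
	total := 0
	for i := lo; i < hi; i++ {
		total += i * i
	}
	return total
}

// parallelSumSquares splits [0, n) into one chunk per worker and
// sums the chunks on separate goroutines, which the Go runtime may
// schedule on separate cores. Each worker writes only to its own
// slot of partial, so no locking is needed.
func parallelSumSquares(n, workers int) int {
	if workers < 1 {
		workers = 1
	}
	partial := make([]int, workers)
	chunk := (n + workers - 1) / workers // ceiling division

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		lo := w * chunk
		hi := lo + chunk
		if hi > n {
			hi = n
		}
		if lo >= hi {
			continue // nothing left for this worker
		}
		wg.Add(1)
		go func(w, lo, hi int) {
			defer wg.Done()
			partial[w] = sumSquares(lo, hi)
		}(w, lo, hi)
	}
	wg.Wait()

	total := 0
	for _, p := range partial {
		total += p
	}
	return total
}

func main() {
	// Same answer as the sequential version, computed across cores.
	fmt.Println(parallelSumSquares(1000, runtime.NumCPU()) == sumSquares(0, 1000))
}
```

An event loop can interleave a thousand socket reads on one core, but it cannot make this loop finish any sooner.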

So ... that's the past covered ...

The Future

It wasn't very long ago that I was writing optimistically about the future ... my optimism has vanished, and here's why ...

Fibers were recently merged into php-src. Fibers help to achieve asynchronous concurrency; they are a kind of green thread - that's to say user space threads that are scheduled cooperatively by the user's code. They are not useful for parallel concurrency, at all.

They are complicated though ... They give yet another way for the programmer (or the maintainer of the library they are using) to implement asynchronous concurrency in their applications.

Fibers are squarely aimed not at the general user, but at the maintainers of frameworks and libraries; it's highly unlikely you will be using Fibers yourself.

Despite what the RFC for Fibers claimed about its compatibility with parallel, it is not compatible with parallel.

Parallel implements CSP using normal threading primitives. The moment you try to mix CSP, Fibers, and Parallel, you will either crash, deadlock, or your head will simply overheat, and you will die.

Okay, you may not die ... 

Stepping back from the issues of concurrency and looking at the bigger picture for a moment ... PHP is supposed to be simple, that is what has given it endurance. 

When I say simple, it's important to point out that I don't just mean that it's simple for the programmer that uses PHP, but also for the programmers that maintain it: PHP is not maintained by 50 people with degrees coming out the wazoo, working in nice air conditioned offices, and being compensated to the tune of hundreds of thousands of dollars a year. With the exception of two people (varying by roughly ±2 over the years), PHP is maintained by uncompensated volunteers. Whether they have degrees or not (I don't), doesn't really matter because they have real lives, and don't have the time to research and learn things that are not of immediate concern in their normal jobs.

In recent years PHP has become a very complicated thing; there are now parts of the source code that are only really understood by a few people, which puts PHP in a precarious position.

Fibers are just the most recent complicated chunk of code that most of the people that voted it in don't understand, and could not write, which we are all now burdened with.

Back to concurrency ... It's true that you could write a version of parallel that worked with fibers, you could re-implement CSP to be compatible with both fibers and threads. It would be another huge chunk of code that nobody really understands, that if it was merged, it would be merged on the back of the thoughts of one or two people and understood by just as many.

I love PHP, it's given me the best parts of my career so far. I don't want to do it harm: Developing parallel into the kind of thing that would be required to achieve full integration in the form of a M:N threading model, or even minimal support for fibers, would do PHP harm. It would be yet another chunk of code that not enough people understand. It would make understanding PHP code in 5 years' time a whole bunch harder.

I can see no future for parallel concurrency in PHP today ...

It should be on the record that I think merging fibers into core was a premature mistake - it has no user base, and hardly any prospective user base. It's a complicated thing that doesn't actually bring anything new to the table. With the advent of JIT, PHP really was shaping up to be a general purpose language. Fibers are a block in the road, proclaiming that asynchronous concurrency - which again, you could already achieve - is still what is important, and despite the ability to execute instructions on the CPU, your code is only worthy of one core.

The other model has won, it's goodbye from parallel PHP ... 

Of course, it's not goodbye from me ... I'll put my efforts into understanding more of the language we now have.

Peace out, Phomies.