Tuesday, 23 November 2021

2 - 1

Fig 1: The PHP Foundation

I wrote some time ago about the Bus Factor of PHP, the number I came up with was 2.

Just after I wrote about this, JetBrains reached out to me and others and we discussed the idea of starting a foundation. From the very start it has been about raising money to employ people, to bring the bus factor up to a normal level.

Quick diversionary note about bus factor calculations: The number 2 surprises some people; they think all the work going on far exceeds the work of two people, but calculating the bus factor is not about the sum of the total work going on. It is about how much knowledge is wrapped up in any individual; it's about "key players", and the effect it has when key players leave.
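To make that distinction concrete, here is a minimal sketch in Python. It assumes one common way of counting: the bus factor is the fewest departures that leave some area with nobody who understands it. The area names and contributor names are hypothetical, and this is not necessarily how the number 2 was arrived at.

```python
def bus_factor(knowledge: dict[str, set[str]]) -> int:
    """Fewest departures that leave at least one area with nobody who understands it."""
    # An area is lost once everyone who understands it has gone, so the
    # most thinly covered area sets the number for the whole project.
    # (Raises ValueError if no areas are given at all.)
    return min(len(experts) for experts in knowledge.values())


# Hypothetical data: six contributors, but only two of them know the engine.
areas = {
    "engine": {"ann", "bob"},
    "extensions": {"ann", "cat", "dan", "eve"},
    "docs": {"cat", "dan", "eve", "fay"},
}

print(bus_factor(areas))  # 2
```

Six people are doing work here, yet the answer is 2, because the count follows the thinnest area of knowledge rather than the total amount of work.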

To say that 2 - 1 is not 1 would be quite insulting, not unlike PHP, but nevertheless insulting. 

Nikita parts with PHP as the bearer of a lot of knowledge, a large portion of that knowledge is not distributed among other current contributors. The very wide ranging deeply intimate knowledge that Nikita has is there for the taking, the only barrier to most people with the appetite is time: They have to work a job, look after their pet fish, tell their children not to draw on the walls, and other such normal life activities.

This morning, our bus factor is fast approaching one: While Nikita will not likely disappear into thin air and is at least temporarily available, eventually we should expect his output in terms of lines of code to reduce to nil.

The impact that Nikita has had on the lives of developers may be obvious; what may not be obvious is that for internals Nikita raised the bar for the rest of us, not only in what he has achieved but in every aspect of how he has conducted himself. I should declare with everyone else that I admire him, immensely. I look forward to mentoring whoever comes next to maintain the same level of thoroughness, thoughtfulness, and hopefully help to enable and inspire them to be as brilliant.

Let's talk about money

The number one source of financial support for most of the history of PHP has been Zend. Their commitment to PHP has been unwavering, they still employ Dmitry to work on performance (which means mostly opcache and the JIT). In the past they have employed more than one person to work on core.

Other companies have also allowed employees to work on core, I won't try to list them for fear of making a mistake and leaving someone off the list. 

Whenever a company allows an employee to work on PHP it's a big win for all of us, the company included.

Let's talk about influence

Zend have no special powers to influence the direction of the language as a whole, and no special powers to unilaterally make decisions, although on solely technical questions regarding implementation details they have earned something like a veto: if Dmitry says something is harmful (and he cannot reduce or advise how to reduce that harm), technically flawed, or unfeasible, we are all going to listen to him. In the same breath, anyone may make similar arguments. It's just that operationally we look to Zend for those arguments, as a matter of fact.

To say they have not influenced the direction of the language as a whole would just not be true. Indeed, they have: Many parts of the language and its internals have been shaped by the fact that Zend pushed for them, enabled by their budget and dedicated engineer or engineers.

It's also true that in the relatively short time that Nikita was with JetBrains, they too had some kind of influence, to say they didn't would be to say there has been no difference between Nikita's output before and during his employment with JetBrains.

Everyone has been sticking to the rules (the RFC process) since its introduction. Nevertheless it is quite obvious that if you buy the time of dedicated individuals with knowledge and the appetite for it, you may push the language forward in a way that is obvious, that everyone feels. 

Let's talk about the future

The Foundation represents a new way to push the language forward. It provides us the mechanism by which to raise the bus factor, so that we never face the problems we face today, and have faced in the past.

There is nobody (that I know of) that is waiting to step into Nikita's shoes. However, the foundation gives us a way to recruit willing contributors, to pay for their education and development as internals engineers, to form a group of such engineers.

Although the details are unclear at this moment, what is clear is that at some time in the not so distant future there will be a group of stakeholders and a group of dedicated engineers working together, listening to each other's concerns, deciding between them, albeit indirectly, how to push forward.

This is extremely exciting.

Let's talk about details

I'm sure many readers will want me to expand on the possible interactions between internals and The Foundation.

The Foundation and internals are separate entities, as separate as JetBrains and internals.

The Foundation are empowered to hire whomever they choose to work on whatever they want to work on, but that does not imply that the work must be accepted at RFC time.

The Foundation have 6 months to formalise their working procedures; during that time conversations will be held with internals, and the level of interaction, which may be anywhere from none to some, will be decided by normal means (RFC).

Let's talk about applying

I speak now to all past and current contributors ... 

We are not looking to replace Nikita, nor necessarily your current employer. What we are looking for is individuals with the prerequisite knowledge to achieve initial goals which you are free to decide.

This is as low as we can set the bar, and we will work with you to raise that bar: The Foundation has built in a group of fellows that are experienced in all aspects of PHP and its development. We will make ourselves available to mentor anyone the Foundation engage, you'll be supported by us in particular but also the entire internals community.

The Foundation represents an excellent opportunity for budding internals developers and experienced developers alike, an opportunity that we have never been able to offer before.

Take the opportunity !
 
For more info about The PHP Foundation, including how to apply: See the JetBrains announcement
 
Special thanks go out to everyone that has been involved including all founding sponsors, but in particular Roman Pronskiy and others at JetBrains who have worked very hard to make this possible in a very short time.
 
Peace out, phomies :)

Thursday, 22 July 2021

Docs Not Included

 

Fig 1. Some Docs

One of the best assets we have is the extensive userland documentation for PHP. The documentation is maintained by several teams of developers working on one or more translations of the raw docbook format: A huge amount of effort is expended on the maintenance of userland documentation.

Unfortunately, the source code of PHP is not documented in the same way, in fact, it's barely documented at all.

The topic of how to get started as a beginner comes up rather a lot. 

The best answer depends on your experience: if you are already a confident C programmer, you're not going to need that much help; with a good source search tool and a little bit of research, you're probably going to be able to do what you wanted to do without any help.

Those less experienced, typically PHP-first C-last programmers, less confident, or more friendly people looking to do something with PHP should find someone to mentor or help them. 

If you show up in room 11 on Stack Overflow and wait five minutes, someone from internals will be along shortly on almost any day of the week.

That's bizarre, why don't you document PHP internals ?

The fact is that by the time you are in a position to document anything, you are far more useful doing other things than you are wasting your time writing documentation for people that will soon be able to write documentation (or not) for themselves.

In addition, complete documentation for internals is a never-ending task and constantly moving target: We try to make the most out of our chances to break ABI, there's almost a constant stream of work going on in some areas - for example, at the moment the JIT, a few weeks ago Fibers.

There is one pretty good attempt at internals documentation on Internals Book, but it took a team of people to put it together and it's perpetually out of date.

This means that code is de facto documentation for PHP: we don't put effort into documenting PHP because we are working on the code, and don't and won't have resources to do anything else. Anything you need to find out is to be found by understanding code. If you can't understand, come and get some help.

I recently expressed this in a chat room to someone who I personally spent many many hours trying to help, only for them to say this in public:

Which is honestly a chicken-shit, cowardly, irresponsible, immature response. Code that isn't documented is bad code. Period. Good naming and such is *part* of documentation, but it is not all of documentation.


When I tried to defend the position that code is documentation for PHP, my response was voted down.

I'm not saying that this is an ideal situation to be in, but it is in fact the situation we are in. 

I don't view it as a huge problem either: Those people that don't need comments every few lines in source to find their way around are not going to be helped by littering source code with information already contained in implementation. Those people that feel like they would be helped by such information are likely to still need someone to hold their hand while they're learning, and we're more than willing to do that hand holding.

While we don't have the resources to document internals, very many of us make ourselves available to help newbies, this we do have the resources for, even if nobody is being paid to make themselves available.

People > Docs

In any case, people are much better than docs. 

Documentation cannot reword itself, explain itself, it can't correct inevitable explanatory mistakes, show examples, tell you about history, or do anything that will really help you to understand if reading the code itself doesn't help you to understand.

That's all I have to say about that right now ...

Peace out phomies :)

Wednesday, 30 June 2021

Only Complete Applications

Fig 1. PFA RFC


Today the vote will close on Partial Function Application, and the feature has been refused.

It was less effort to write a debugger for PHP than it was to make partial function application work! 

The debugger, a few of us wrote in a few days. Partial application took up many weeks of my life, including most of the night time.

It actually ran me into the ground. 

I'm not a very good communicator. I like blogs because I can perform an editorial process, re-arrange my thoughts and move toward the perfect words. 

But, other humans, in general, baffle me. I can't tell how other people think, and what they know, so I can't understand what they are saying or what they need me to say a lot of the time. When you ask me a question, even if I definitely know the answer, I'll spend some time paralyzed by anxiety, to some degree caused by all my previous failures to communicate with humans properly.

A lot of questions were being asked of me the whole time, and this took a toll ... simply put, I spent quite a lot of this time anxious, exhausted, and sad.

In addition, I became physically sick toward the end with suspected Covid. But, I tested negative, had a 24 (maybe 20) hour break and more or less carried on at the same pace.

It is a huge understatement to say I put a lot of effort into this thing ...

In that time, we had at least two different full implementations of the idea. 

The first implementation had only one placeholder, and the complexity of the implementation at this point was quite low. However, it resulted in semantics that apparently didn't make sense to anyone. The main gripe was that the placeholder's semantics depended on its position in a list of arguments.

Any implementation of this is so involved with details of the engine that bike shedding inevitably ensued.

Rather than looking around at other languages, we decided we needed multiple placeholder symbols, support for partial application of the new operator, out of order application by supporting named arguments, and even named placeholders - essentially changing the feature into something other than partial application - that's actually very definitely function (API) redefinition.

Between the first and last implementation, I attempted to move toward the semantics people wanted, while retaining as much simplicity in the implementation as possible.

At one point, I had an implementation that supported all of the crazy things people were bike shedding about, including re-ordering named parameters (necessary for named placeholder support). 

What became clear in this time is that we needed to define the semantics in such a way that limits complexity.

We dropped named placeholders, and settled on two symbols with easy to understand semantics. While we're left with semantics and rules that you can write in a few lines, it only limits complexity, you are still left with something complicated.

Then we get to the last implementation, which I stayed awake for more than 30 hours, while sick, to write.

I made some glaring (to some people) omissions, but overall we had a solid implementation with easy to understand semantics, and soon after, the vote started.

Why did this fail ?

If you asked me why it failed, I would have to say bike shedding is somewhat to blame.

Read the next sentence with the logical bit of your brain:

People who don't know how to implement something are not well equipped to decide how that something should work.

This seems to be an obvious truth, but may come across as elitist ... I don't actually care. 

Elitism is good - You want an elite doctor to perform your eye surgery, you want elite scientists doing the research that will save the world. I'm all for elitism ...

I'm not saying we shouldn't listen to feedback - I did, remember, even when it was detrimental to the implementation, not to mention detrimental to my physical and mental health.

I know that the people in the bike shed, making suggestions, making complaints, in some cases explicitly bike shedding ("I know this is bike shedding, but"), they think they are helping the conversation along by talking about things they understand, while ignoring all the stuff they don't understand outside the bike shed. 

I'm willing to admit that sometimes they do move the conversation along, but I think it's by accident; the conversation would have moved along anyway, possibly faster if they hadn't intervened.

Here, bike shedding resulted in a lot of wasted time, that's a fact.

The other reason is complexity. There are two distinct kinds of complexity here:
  • language complexity - what do people that are writing PHP have to know in order to understand code containing partial application ?
  • implementation complexity - what do people that are maintaining, debugging, or developing the engine have to know ?
When it comes to language complexity, this is mostly determined by semantics. Once we landed on semantics we could explain in a few sentences, we had reduced that as much as we could.

When it comes to internal, implementation complexity ...

Why is this complicated ?

We can all knock up a class in 10 minutes that performs something that looks a bit like partial application, we can do that in userland.

What it won't be is partial application: You cannot do the things we do internally from userland; you don't have the ability to rebuild prototypes (in any sensible way), manipulate the stack, interact with the GC in certain ways, and a list of a million other things.
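As an illustration of the ten-minute version, here is a minimal sketch, written in Python rather than PHP purely to keep it short. The names `HOLE` and `Partial` are hypothetical and have nothing to do with the RFC's syntax or implementation.

```python
class _Placeholder:
    """Marks an argument slot that a later call must fill."""


HOLE = _Placeholder()  # hypothetical placeholder symbol


class Partial:
    """Userland look-alike: remembers some arguments and fills the holes later."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args

    def __call__(self, *late):
        holes = sum(1 for arg in self.args if arg is HOLE)
        if len(late) != holes:
            raise TypeError(f"expected {holes} argument(s), got {len(late)}")
        remaining = iter(late)
        # Substitute late arguments into the holes, left to right.
        filled = [next(remaining) if arg is HOLE else arg for arg in self.args]
        return self.func(*filled)


def clamp(value, low, high):
    return max(low, min(high, value))


clamp_percent = Partial(clamp, HOLE, 0, 100)
print(clamp_percent(150))  # 100
```

It works, but it is only a wrapper object with a call method: nothing in it rebuilds a prototype for the remaining parameters, and applying it again just stacks another wrapper on top.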

The fact is that any proper implementation of partial application is inherently complicated.

Some of that complexity is due to the semantics you choose for placeholder symbols (even when reduced as much as possible), and some is due to the interactions between the engine and this new kind of object, created by interrupting a call where the engine does not expect this kind of interruption.

If we're going to have a proper implementation of partial application, that retains type information, is efficient (cumulative, as partial application is meant to be), and has semantics that are useful and easy to understand, then the implementation carries with it complexity that cannot be reduced.

Was the right decision made ?

Yes.

Although other contributors to the RFC were focused on the use case of pipes, I don't really even like pipes.

My motivation for doing any of this is that it was interesting to write. My motivation for wanting it to be actually merged is that I'm interested in the use cases that would have been found beyond those we suggested. I don't know what they look like, and guess I'll never find out.

It's highly likely those use cases, which I imagined existed, simply do not exist.

Those voters that could think of use cases, but voted no because they couldn't look past language or implementation complexity, were absolutely right to do so.

Complexity must be justified, and if it isn't, we should not add the feature.

That's all I have to say about that right now ...

Peace out phomies :)

Monday, 28 June 2021

Literally Internals

Fig 1. Some Magnified Strings

How much magnification does it take to make something quite tidy, like a piece of string, look an utter mess ?

There is an RFC in progress called is_literal which I'm providing the implementation for.

I want to talk a little bit about that ...

Where did it start ?

You imagine I'm going to talk about the RFC now, but I'm not; it started 25 and some odd years ago.

When we come to write an RFC, we have to deal with PHP the way that it is today, after more than a quarter of a century of development. In particular, and most importantly, we have to deal with extremely aggressive optimizations to the source that have been performed since NG, we also have to deal with optimizations performed by Opcache and the subtle differences that Opcache introduces. 

It can be difficult to make changes in this system without inadvertently affecting other parts of the system - and so code - that is not even using the feature.

Sometimes, it's possible to add something quite complicated and not have an effect on the complexity or functionality of the rest of the engine. 

For example, as complicated as the internals of the Fiber implementation are, they are self-contained, mostly don't affect other parts of the engine, and we're not responsible for the maintenance of the most complicated code it uses. There is still complexity above and beyond the code we don't own (Boost owns it), but it looks manageable because it's contained.

What is going on ?

The is_literal RFC seeks to provide a tool for userland that can help to avoid injection vulnerabilities, where strings composed of literal values and user provided input (strings) may lead to injection.

It would seem simple enough to make a flag on literal values, that the programmer typed in their code, and allow them to detect the literalness of any variable at runtime.

But we have all of history bearing down on us, and it's not so straightforward.

We began to focus on strings alone, the reason is that we can find space on the structure that represents a string to set a flag, and avoiding user input strings is obviously required for any implementation.

Because of optimizations in NG - scalars with types below string are stored on the stack, and are not refcounted - and one optimization that came after, there is no usable space on every variable for a flag.

There is space, but in order to use it, we would have to disable an optimization that assumes there is only one flag set in the only place where we could set a flag. This would not be an acceptable implementation detail, and so is not possible.

For string support to be generally useful, the engine must produce a literal where all of the input to an instruction (or function) is literal. This allows the programmer to reason easily about how concatenation (or other functions that are literal aware) work.
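The rule is easy to model. The sketch below is in Python and is not engine code: `TrackedStr` and its `literal` field are hypothetical stand-ins for a string structure carrying a flag, and `concat` stands in for any literal-aware instruction or function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackedStr:
    value: str
    literal: bool  # hypothetical flag: True only if the programmer typed it


def concat(a: TrackedStr, b: TrackedStr) -> TrackedStr:
    # The result is literal only when every input is literal.
    return TrackedStr(a.value + b.value, a.literal and b.literal)


# Both pieces were written by the programmer, so the result stays literal.
query = concat(TrackedStr("SELECT * FROM users WHERE id = ", True),
               TrackedStr("?", True))

# A single piece of user input is enough to clear the flag.
tainted = concat(TrackedStr("SELECT * FROM users WHERE id = ", True),
                 TrackedStr("1 OR 1=1", False))

print(query.literal)    # True
print(tainted.literal)  # False
```

Because the flag is a plain logical AND over the inputs, a programmer can predict the result of any chain of concatenations without knowing anything about the engine.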

Concatenation is how people tend to build their queries, even if they are using parameterized queries, even if they are using a query builder, concatenation is still used.

What's the problem ?

Early on in the discussion, a couple of people requested that we allow strings and integers to be concatenated to produce a literal. When this was requested, at least one person who requested it knew that we wouldn't be able to track the source of integers.

I didn't like this idea at first, it's less than pure. Nevertheless, I spent rather a long time thinking about it, before determining that nothing dangerous can happen - if we're talking about injection - if you concatenate a string and an integer, it cannot lead to injection.

At compile time, before any user input has been provided, the engine may optimize certain concatenations and even function calls, that produce literal strings, and may contain types other than string. In other words, the engine is allowed to concatenate whatever it wants if it can determine there are no side effects and it has all of the information available to perform the concatenation (or indeed, function call) early. None of these concatenations may include user input. So it's "safe" in the narrow sense that the programmer provided all of the data, there may be a mistake in the query, but an injection is not possible.

I came around to seeing that including support for integers, even though we are not able to track their source, creates some symmetry - runtime concatenation or calls will behave the same way as the compiler and opcache with regard to string and integer values.

A wave of fear washed over internals, and some very loud people objected, and coloured the conversation.

I'm not a security expert, just a code monkey. I tried to reason with some of these people and it failed hard.

So I reached out to somebody that everybody recognizes as a security expert in the PHP ecosystem, Scott Arciszewski from Paragon Initiative Enterprises. 

I was quite ready to admit I was wrong when I asked them the question "Is it reasonable to include support for concatenating a string and an integer even when the source of the integer is unknown?"

Here's an excerpt from their response to the mailing list:

Injection attacks (SQL injection, LDAP injection, XSS, etc.) are, at
their core, an instance of type confusion between data and code. In
order for the injection to *do* anything, it needs to be in the same
input domain as the language the code is written in. Try as you might,
there is no integer that will, upon concatenation with a string,
produce a control character for HTML (i.e. `>`) or SQL (i.e. `'`).

I really thought this would help. I know that lots of the people reading internals aren't security experts, and clear words, and clear thoughts, from someone who is should help them to make good decisions.

It didn't really help, people just started to argue ... which I found embarrassing ...

Because of a bad naming decision (for a little while, the RFC was called is_trusted), there is this idea that if is_literal (or whatever you want to call it) returns true, the value is safe to use in all circumstances, that not only should it be free of injection, but it should be free of mistakes.

What we wanted at this point was to rebrand, we wanted to frame the thing we are introducing as the concept of Nobility, a name and idea suggested by Scott.

We never got to do this, but I think it would have been our best move. We get to define what nobility is, what kind of data it includes, and how they interact (or fail to, because noble) with other variables.

Instead, we had to remove the support for integers that made the feature easier to reason about and more generally useful.

Where are we now ?

Without support for integers, we are left with something that may look inconsistent if you pay close enough attention.

We cannot disable the optimizations in the compiler or opcache that lead to the production of literal values inclusive of integers (and other types). That would obviously not be an acceptable implementation detail.

So now, whether or not the engine produces a literal depends on the very fine details of how you wrote the code or performed a call. Without detailed knowledge of the engine, this makes is_literal look unpredictable and difficult to reason about.

In addition, we've broken a basic, and safe (in the narrow sense we are talking about) use case - you can no longer rely on the concatenation of a string and an integer producing a literal.

What do we do now ?

I'm not sure. 

I'm disappointed that the expert opinion I solicited did not change the direction of the conversation. If you're not going to listen to an expert in the field of security, about security things, I think you're not really going to listen to anyone, you consider yourself the expert maybe.

I would like to be free to define the concept of nobility, and I'd like people to approach that discussion armed with the expert advice we've had, and free of the notion that we are trying to protect you from mistakes in general.

I'm unsure of our next move ...

Peace out Phomies.

Saturday, 19 June 2021

Wasting Time

 

Fig 1. A Bin

Most days, I try to find some time to work on PHP. I consider it my mission to push this thing forward. Recently, I've also made it my mission to get your boss to pay you to push this thing forward. 

One of the problems with this, that I hear all the time:

What if we allow one of our employees to spend a bunch of time working on a feature only for that feature to be refused ?

It looks like I'm asking you to potentially waste your company resources on things that might never get voted into PHP.

How it Works

For those of you that don't know, I'm going to lay out the path for a feature, from inception to inclusion:
  1. You think of a feature (or borrow one from another language)
  2. You approach internals (by sending a mail to a mailing list or opening a PR)
  3. You try to gather consensus on the mailing list and the PR
  4. You request access to create a Wiki document (an RFC)
  5. You spend two weeks (at least) discussing the addition on internals and responding to feedback
  6. You open a vote, which lasts two weeks.
A two-thirds majority (two yes votes for every no vote) is required for the feature to pass and be included.
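The threshold in the last step is simple arithmetic. Here is a minimal sketch in Python, assuming only yes and no votes are counted; the function name is hypothetical, and treating an RFC with no votes as not passing is my assumption.

```python
def rfc_passes(yes: int, no: int) -> bool:
    """True when yes votes make up at least two thirds of the votes cast."""
    total = yes + no
    if total == 0:
        return False  # assumption: no votes cast means no mandate
    # Integer form of yes / total >= 2 / 3, avoiding floating point rounding.
    return 3 * yes >= 2 * total


print(rfc_passes(50, 14))  # True
print(rfc_passes(20, 10))  # True, exactly two yes votes for every no vote
print(rfc_passes(20, 11))  # False
```

A comfortable simple majority is not enough: 20 to 11 is nearly 65% in favour and still fails.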

Minimally then, I'm asking you to spend at least a month, likely taking up at least some time every day to respond to conversation on internals or queries on the pull request.

At the end of this, if the feature doesn't get voted in, have you wasted your time ?

The Question

Whether or not you feel like your time has been wasted depends on what you think you are doing when you propose an RFC.

There are a lot of questions that can be answered instantly, for example: Who is the best power ranger ? It's the pink one, obviously. Other questions may not be answered in your lifetime, such as Can we eradicate cancer ?

The question you ask when you propose an RFC is somewhere in between, it's going to take at least a month to answer, and is more complicated than choosing the best power ranger.

The question is this:
Do we want to include this feature, as proposed, at this time ?

If you consider this the question, and consider the RFC process the means by which to answer the question, whatever the answer to the question is, you haven't actually wasted any time - You set out to answer a question and have done so.

The Reality

Obviously, you are likely to be biased as the proposer of the feature, and would prefer a positive response. But, it's important to note that a negative result doesn't equate to "We never want this feature".

There may be things you can do to change your proposal so that it is more palatable for internals, and so more acceptable at a later time.

When you spend a month or more working on something, and thinking about it, you become invested in the idea, and I recognize that. But, you are one person (or a small group of people) acting on behalf of millions. When things don't go the way you would prefer, you have to accept that you are simply wrong.

There's nothing wrong with being wrong, it just means there's more to learn, more work to do, or more to understand. Since learning, working, and understanding are things that, as programmers, we enjoy, it may even be better to be wrong than right, in a sense.

It may not be obvious, but even if your feature doesn't get in, you have pushed PHP forward a little bit: By dispersing your ideas you may inspire others, you may have come up with the answer to a question that hasn't been asked yet, and you've answered the question you set out to answer.

The code you wrote may end up in the bin, that is an unavoidable fact. 

However, that doesn't at all mean that you've wasted your time.

That's all I have to say about that right now.

Peace out phomies :)

Wednesday, 9 June 2021

Untangling Fibers

Fig 1. Some Fiber

 

Fibers are going to be available from PHP 8.1, they were voted in 50 to 14 ... I was one of the 14.

I've said before that I think merging Fibers was a mistake, but at this point, it doesn't matter what I think or thought: They are in fact part of the source code, and we have to work on this code together. So, I've tried to familiarize myself with evolving implementation details, be involved in the conversation, review pull requests, and generally make myself as useful as possible.

During the discussion phase of the RFC, the Swoole maintainers made clear that they did not approve of the implementation for various technical reasons.

There now appears to be some perceived friction with internals.

I want to deal with a couple of comments that I've heard repeated, or sentiments expressed:

Swoole is being treated with mistrust, possibly because it's Chinese.

This is a bizarre statement, and doesn't bear any resemblance to the truth.

While there may be some language barriers that make it hard for us to communicate in human words, especially on very technical matters: All of us speak the language of C. Even if nobody from Swoole spoke a word of English, we could, and would try to communicate in code.

There are also many members of internals whose first language is not English. There are pull requests that come with no words at all for that reason - albeit much simpler in general. Nevertheless, it illustrates that we do not need to understand each other's tongues as much as you might think.

The idea that we would, should, or are exhibiting mistrust toward anyone because of their country of origin is abhorrent.

Swoole have years of experience in this field and we are discounting their opinions.


Swoole is a very clever extension, very clearly written by skilled and driven individuals, it's also a much more complete solution. I've also heard the suggestion that because it is a complete solution it makes sense to adopt it.

Swoole maintainers have many years of experience in the field of developing an extension that provides a complete solution to the problems of co-op PHP. 

However, when internals votes, and votes with the kind of turnout the Fibers RFC had, we do so with maybe a couple of hundred years of collective experience in the field of developing a programming language.

Developing a language is the relevant field here. 

When we are writing extensions and we make them do bad things - and I know a little bit about extensions that do bad things - we make our peace with it, because they make our cool idea work.

We don't import that level of magic into PHP; it just isn't going to happen, and should not.

Internals voted on the simple, bare bones implementation of Fibers quite purposely. When we did that, we purposely left open many questions that the Fibers RFC did not intend to answer, around how to actually deploy Fibers in an application.

The adoption of a thing that looks like (all of, or any significant part of) Swoole, whether it comes in one part or one hundred parts is utterly out of the question because there is no mandate for that.

This does not mean that we don't find their feedback important, and useful: It means, quite clearly, there is some disagreement about the question of preparing the rest of PHP for Fibers. Swoole, having found answers to those questions, naturally think they have the correct answers, because they work.

I'll just re-iterate, and clarify: What happens in extensions, stays in extensions ...

What is actually happening ?


Usually, we like an implementation for an RFC to be ready at merge time. 

In this case, because of the obvious need to develop the internal API and implementation details of Fiber, we decided to merge the implementation as it was, and iterate on that implementation with small pull requests - like most projects, we prefer small focused pull requests.

We want to move the implementation in a direction that is useful to everyone, within the confines of the mandate that the RFC gave us.

We are trying hard to do that, there's no ulterior motive to exclude anyone, and no mistrust, or anything of the sort.

We are simply trying to work together on this thing .... that is all.

That's all I have to say about that right now ...

Peace out, phomies :)

Tuesday, 4 May 2021

Avoiding Busses

 

Fig 1. A Bus

It's always been the case that there are certain parts of PHP source code that only a few people understand. The Karma system used to help us determine where a contributor could commit code in the source tree; If you had /Zend karma, you had a clue about Zend. Among those people with /Zend karma, some people understood more than others.

This was a perfectly sustainable way of developing the language, because while /Zend is complicated, it's written in a language that everybody working on a C project understands. In principle, we can take people who know a little C and turn them into a /Zend karma worthy workhorse for PHP, able to produce patches, and fixes, and features. Indeed, we have done, and are still doing that in the incubator that is Stack Overflow chat.

Many moons have passed ... What do you think the bus factor of PHP is today ?

2

Maybe as few as two people would have to wake up this morning and decide they want to do something different with their lives in order for the PHP project to lack the expertise and resources to move it forward in its current form, and at current pace.

Just focus on that number for a few seconds ... think of the number of people whose livelihoods depend on PHP, the number of mortgages, car payments, school fees, entire payrolls ...

It's the scariest number 2 I have ever seen.

The Two

Everybody who follows the development of PHP knows who these two people are. 

They are Dmitry Stogov, and Nikita Popov.

I don't want to toot my own horn, but I want to make a couple of things clear: I consider myself an asset to the project, I spend a lot of time on PHP, my employers are gracious enough to allow me to do that, but I still do have a normal job to do, not to mention a life. 

Most contributors don't get to spend much, if any, work time on PHP, they are doing it in their spare time - the time they have available for reading endless code is limited, because they have real lives.

There are many people that, like myself, are assets to the project, and if we lost them, we would suffer.

But the difference between myself and Nikita or Dmitry cannot be overstated.

I've been watching Nikita since he started to become involved in PHP; It takes about 10 seconds to realize that you're in the presence of someone that not only is highly qualified - although he was still finishing his education when he became involved - but also highly skilled. Simply, a brilliant mind. 

Dmitry, who has been around for much longer, is another brilliant mind type. He's written stuff that I struggle to model in my head, although I understand the language he's using. It's extremely frustrating to know people like this exist. To list the things that Dmitry has done for the project would be boring to read, and take up too much of your life. Suffice to say that the cleverest parts of PHP tend to have Dmitry's name at the top, including the JIT. Dmitry's name is often followed by Nikita's ..

The JIT

The JIT itself has a bus factor of 1. Nikita understands much more of it today than when it was merged, but Dmitry is the person that works on the JIT. Working on a JIT requires a special skill set that you only really develop when you have been working (with high focus) on JITs or very closely related tech (compilers, assembly, etc) for years. At this point Dmitry is the only person who has been doing that, and we don't want Nikita to change his focus.

You might think, and we were all sold, that this isn't really a problem because the JIT is self contained in an extension, and can be removed or disabled.

Well, it can't. The moment we merged it, it became a core part of the thing we call PHP. PHP has a JIT, and there's no possible future where we can just remove it.

As an illustration of just how complicated the JIT is: Recently there has been some (very interesting to watch) work going on to bring support for the JIT to arm64. This was proposed and initially implemented by engineers from Arm. While Dmitry and the Arm engineers have been working on the branch, I've seen the Arm engineers struggle to understand and even make mistakes: minor mistakes, that can only be spotted by Dmitry. Nevertheless, these people could not be better qualified to do what they are doing, and it's so hard to understand and get right that you can be at the very top of your game, in exactly the right field, and get it wrong.

Porting to a new platform is almost certainly the most complex sort of work that you can do on the JIT, and the number of times this will happen is obviously limited, it's a special case. I mention it only to illustrate the kind of complex the JIT is; We don't have to worry that we can't port to new platforms.

When it comes to the JIT, we just have to accept that the skillset required here is rare, and we'll be lucky if its bus factor ever rises above 1.

I'm hopeful that it could: In one possible future, there are so many contributors being paid to work on PHP, that it may give leave to those who are paid full time to focus entirely on the JIT.

We should also recognize that working in close proximity to the JIT, as Nikita and other contributors are, might never equip you to the level where you could, say, give it new features or fix very complicated bugs.

The Rest

Nikita has always had a high impact on the project, but since he was employed precisely to impact the project his output is quite remarkable. There's barely a minute of the day where he is not reviewing a thing, writing a thing, fixing a thing, or planning to write a thing, review a thing, or fix a thing. This is obviously great for the project.

There are also several other contributors whose output is high considering they're mostly doing it in their own spare time, and we're all grateful for every minute they spend.

For whatever reasons, many of the people that I still think of as having /Zend karma have gone away, or their output has reduced to almost nothing. I can say from personal experience that it's been a difficult few years to stay relevant, first with NG, then the JIT ... so maybe that explains some of it.

What we've learned since Nikita was employed, is that this is the pace we need ... If he went away now, I doubt if all of the other contributors combined could pick up the slack that would be left. 

This is how I arrive at the number 2.

So, What ?

That number, 2. This is not an acceptable bus factor for a project the size and importance of PHP. 

With every passing RFC, the project gets a little more complicated and has no more ability to maintain that complexity.

So, two things:

Think Differently

I think we ought to approach proposals a little differently in future. 

The overall complexity of the project has grown considerably in the last ten years, and we're all behaving as if we have a feature starved language, trying to cram in just as much syntax and feature additions as we were ten years ago.

We have to look at things in light of the bus factor, which is, at the moment, too low.

We have to look at things in light of the complexity we've already added to the language, some of it needlessly.

In the past, I think most people voted on the basis of what was good for them and their projects. At this point, this is irresponsible. There are not enough people bothering to vote for this to work any more.

I think voters are now obliged to vote on the basis of what is actually best for the project, with an eye on the future but also in light of the past, in light of the current bus factor, and not based on what they think is good for them.

It's not that we shouldn't have new features, it's that we should weigh the advantage of that feature against the disadvantages we currently face, and try not to fix our thinking in the current moment, and around our current concerns.

I think it's also important that we either abstain from voting on things we don't understand and don't have time to research in order to understand, or, having done the research, vote against it on the basis that we can't understand it.

I think the tendency to think you're doing the best thing, even if you don't fully understand it, is pernicious, and has proved itself as such.

If you're a voter reading this, you already know I don't have any special powers to convince you of any of this. These are just my thoughts, they might not even make any sense to you ... and nobody has to listen to them ...

So really, just one thing:

Help

It is of the utmost importance that we build our developer base. If you have any knowledge, even cursory, of PHP, or maybe you can write in C and have a willingness to learn, please approach whoever pays your salary and see if you can get some time, that you are compensated for, to help the project your business relies on.

We can raise that bus factor, but even with as many contributors as we have working in their spare time, it doesn't buy us enough focus to get above 2. We don't necessarily need people who work full time on PHP, although that'd be nice. But we need focus we can rely on, that we know will be there next month, next year, and that focus needs to be paid for by the companies that rely on PHP.

I'll just say 2 again ... 2 ...

Peace out, Phomies

Saturday, 1 May 2021

Worthiness



My first commit to pthreads was on August 28th 2012. While the initial implementation was written rapidly, it had been on my mind daily - because of Java - for maybe a few years. Almost a decade I've spent trying to show that threads in PHP are possible, that despite what everyone says, PHP was an excellent candidate for threads precisely because it's shared nothing. I've convinced very few people in that time.

When it was obvious that PHP 8 was going to deploy a JIT, allowing PHP to execute on the CPU directly, pthreads was an unwieldy beast, beset by many bad decisions I made during its development that made it impossible to consider it a candidate for inclusion in php-src.

I set about writing a new API: I dropped the Java inspired OO model and came up with a CSP model - à la Go. I called it parallel and had every intention of proposing to include it in PHP 8. I went to great lengths to make sure it was compatible with the JIT, including arguing the case for the JIT to even have thread safety support initially.

The first time I saw PHP executing in parallel, directly on the CPU it was such a reward and I was super excited for the future.

Parallel concurrency is not the only kid on the block, and it's not even the kid that most PHP programmers are familiar with.

The domain of web things is mostly covered by the umbrella of IO - most web apps are IO machines, doing many database and API calls, all requiring lots of socket programming. In this domain of IO, the kind of concurrency that scales and is useful is asynchronous concurrency.

Without diagrams (which you may find on my blog somewhere), and without going into too much detail about the differences between asynchronous and parallel concurrency: asynchronous concurrency is the thing that allows curl_multi, a thing we're all familiar with, to work.

You might ask: "Why bother to give parallel concurrency any attention at all if asynchronous concurrency is what people find useful?"

PHP claims to be a general purpose scripting language, and asynchronous concurrency has one single use (IO), it's not general purpose in any sense of the word. If your code is doing anything that is CPU bound, you do need parallel concurrency to take advantage of the hardware you have. Now, because of the way PHP is deployed (as part of a stack of software) and the way it scales (adding machines) it's true that PHP is spread out over your hardware quite nicely, and when the hardware becomes overloaded, we just add more hardware. Nevertheless, I viewed parallel concurrency as a way to expand the horizons of PHP, to help it be the general purpose language it claims to be.

So ... that's the past covered ...

The Future

It wasn't very long ago that I was writing optimistically about the future ... my optimism has vanished, and here's why ...

Fibers were recently merged into php-src. Fibers help to achieve asynchronous concurrency, they are a kind of green thread - that's to say user space threads that are scheduled cooperatively by the user's code. They are not useful for parallel concurrency, at all.
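To make "scheduled cooperatively" a little more concrete, here is a minimal sketch, assuming PHP 8.1 or later where the Fiber class is available. The runRoundRobin function is a hypothetical helper made up for illustration, it is not part of the RFC or of any library. Each fiber only gives up control when it calls Fiber::suspend(), and everything runs on one thread, one step at a time.

```php
<?php
// Hypothetical helper (not from the Fibers RFC): runs each task as a Fiber
// and resumes them in round-robin order until all of them have finished.
// $tasks maps a task name to the number of steps that task performs.
function runRoundRobin(array $tasks): array
{
    $log = [];
    $fibers = [];

    foreach ($tasks as $name => $steps) {
        $fibers[$name] = new Fiber(function () use ($name, $steps, &$log): void {
            for ($i = 1; $i <= $steps; $i++) {
                $log[] = "$name:$i";
                // Hand control back to whoever started or resumed this fiber.
                Fiber::suspend();
            }
        });
    }

    // The "scheduler" is ordinary user code: nothing preempts a fiber,
    // it only stops running when it suspends itself or returns.
    while ($fibers) {
        foreach ($fibers as $name => $fiber) {
            if (!$fiber->isStarted()) {
                $fiber->start();
            } else {
                $fiber->resume();
            }
            if ($fiber->isTerminated()) {
                unset($fibers[$name]);
            }
        }
    }

    return $log;
}

$log = runRoundRobin(['a' => 2, 'b' => 1]);
echo implode(' ', $log), PHP_EOL; // prints "a:1 b:1 a:2"
```

The steps interleave, but never overlap: at any moment exactly one fiber is executing, which is why this buys you nothing for CPU bound work.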

They are complicated though ... They give yet another way for the programmer (or the maintainer of the library they are using) to implement asynchronous concurrency in their applications.

Fibers are squarely aimed not at the general user, but at the maintainers of frameworks and libraries, it's highly unlikely you will be using Fibers yourself.

Despite what the RFC for Fibers claimed about its compatibility with parallel, it is not compatible with parallel.

Parallel implements CSP using normal threading primitives. The moment you try to mix CSP, Fibers, and Parallel, you will either crash, deadlock, or your head will simply overheat, and you will die.

Okay, you may not die ... 

Stepping back from the issues of concurrency and looking at the bigger picture for a moment ... PHP is supposed to be simple, that is what has given it endurance. 

When I say simple, it's important to point out that I don't just mean that it's simple for the programmer that uses PHP, but also the programmers that maintain it: PHP is not maintained by 50 people with degrees coming out the wazoo, working in nice air conditioned offices, and being compensated to the tune of hundreds of thousands of dollars a year. With the exception of two people (varying by roughly ±2 over the years), PHP is maintained by uncompensated volunteers. Whether they have degrees or not (I don't), doesn't really matter because they have real lives, and don't have the time to research and learn things that are not of immediate concern in their normal jobs.

In recent years PHP has become a very complicated thing, there are now parts of the source code that are only really understood by a few people. Which puts PHP in a precarious position.

Fibers are just the most recent complicated chunk of code that most of the people that voted it in don't understand, and could not write, which we are all now burdened with.

Back to concurrency ... It's true that you could write a version of parallel that worked with fibers, you could re-implement CSP to be compatible with both fibers and threads. It would be another huge chunk of code that nobody really understands, that if it was merged, it would be merged on the back of the thoughts of one or two people and understood by just as many.

I love PHP, it's given me the best parts of my career so far. I don't want to do it harm: Developing parallel into the kind of thing that would be required to achieve full integration in the form of a M:N threading model, or even minimal support for fibers, would do PHP harm. It would be yet another chunk of code that not enough people understand. It would make understanding PHP code in five years' time a whole bunch harder.

I can see no future for parallel concurrency in PHP today ...

It should be on the record that I think merging fibers into core was a premature mistake - It has no user base, and hardly any prospective user base. It's a complicated thing that doesn't actually bring anything new to the table. With the advent of JIT, PHP really was shaping up to be a general purpose language. Fibers are a block in the road, proclaiming that asynchronous concurrency - which again, you could already achieve - is still what is important, and despite the ability to execute instructions on the CPU, your code is only worthy of one core.

The other model has won, it's goodbye from parallel PHP ... 

Of course, it's not goodbye from me ... I'll put my efforts into understanding more of the language we now have.

Peace out, Phomies.