Wednesday, 25 September 2019

Flossing Thoroughly

Fig 1. Some Floss

In general, we initially contribute to open source software for selfish reasons: Perhaps we are interested in some code, or we want or need to fix something, or we just want to write something cool. Whatever the motivation, we imagine that we are going to extract more in benefits than it costs us to contribute ... even if the benefits are just internet points.

I haven't said anything surprising ... It seems obvious that people who contribute to open source are getting something out of it.

This, I think, is what makes businesses feel comfortable failing to compensate the people who work in the open on software on which they depend.

It should be no surprise that contributing can become a burden, which we tend to bear silently.

The Preface

How often do you go into your living room, or garden, or to a field, and just stand there, doing absolutely nothing?

I never do this, and nor does my wife, as far as I know. I think it's because free time doesn't actually exist ...

We can say for sure that humans require certain things to function at an optimum level. Take away one of these things and this affects our ability to perform, to interact, to function in general.

Maybe you want to say that free time is that time which we spend engaging our hobbies ...

Preventing burnout is extremely important. Some industries have laws preventing employees from working too many hours at once, in effect regulating how much sleep they must have.

Unfortunately, we have no such direct protections. While it's true that most of us should be protected by some kind of employment law (or contractual clauses), it's also true that most of us have heard a manager talk with pride about being able to push their team to work unreasonable hours in pursuit of goals they themselves determined.

Having hobbies is part of a healthy work ethic, one that managers should encourage and workers should be unapologetic about.

Maybe you want to say that free time is that time which we spend with our families or friends ...

We may be responsible for some family members, but calling any time we spend with our children free time is nonsense: Parenting doesn't just happen, you actually have to put effort in, and that's what we are doing, all the time.

Time we spend with our significant others or friends is necessary for the maintenance of those relationships. The loss of any one of these relationships can potentially destroy our ability to function, perhaps beyond repair.

Tech is a very unbalanced industry in terms of diversity. I think this makes it all the more important that people that work within the industry have strong influences from outside. Obviously, this doesn't solve the problem in our industry, but it's interesting to ponder what difference it would make if the influence of peer groups were taken away.

Even if you are one of those street performers who take up temporary residence in city centres the world over on weekends to entertain us with juggling and mime, perhaps while (miming) riding a little bike ... maybe it takes you three hours to get ready to do that because you're a perfectionist and makeup is hard ... that's not free time, that's time you need, to be you ...

The Deal

Most human beings who manage to survive to adulthood have to earn a living to support themselves and any dependants.

As adults we strike a deal with one or many employers, or otherwise establish an income or several income streams. We trade some of our lives for these streams of income so that we have money for makeup, those little white gloves you need to mime stuff (and perhaps little bikes).

All of the rest of our time on earth is not free time in any sense; it belongs to us ... it belongs to our friends, our family.

The Problem

When we start a project, or on our journey of contributing, we are essentially gifting the community with some code, or some work of some kind. We trade some of our life for whatever the perceived benefits of contributing will be.

Maybe we miss a weekend out with our friends, or we miss dinner with the kids for a few days to make that initial contribution ... We accept that because we obviously enjoy what we are doing, and secondarily (for most people) we want to try to provide something useful.

Now, in those cases where a contribution needs maintenance - you publish a component/library/extension/whatever - we obviously know that this means an ongoing commitment is required. The fact of the matter is, our calculation of this ongoing commitment is just another quote or guesstimate that we get wrong, and that can only be expected.

Way before you actually "succeed" - your project becomes popular, or your input becomes sought after - maintaining the project, or otherwise continuing to contribute, begins to steal time from other activities. The more successful you are, the more incorrect your guesstimate, and so the more pronounced the problem.

This is not part of the deal: We are essentially functioning as we would for an employer, but nobody is paying us.

The Solution

It's quite simple: We need to start compensating those people who maintain or otherwise heavily contribute to projects our businesses rely on.

We need to do that because these people are essentially in our employ ... and that's the deal ...

The Conclusion

I hope you look around at your stack and think about how your (employer's) business would continue to function if some open project disappeared, or otherwise stopped being developed.

I hope I have made the position that people who work in the open don't need to be compensated seem something between awkward and morally (or socially) untenable: obviously destructive, and self-destructive.

I hope you look around your community, and identify those people that deserve to be compensated.

You can compensate me personally via Patreon.

Peace out, phomies ...

Thursday, 12 September 2019

Missing Bits

Fig 1. A tweet I tweeted
It's 1989, I'm 5 years old, my hair is terrible and my shorts are as long as my socks. It's the end of my third day of school, and as I queue to leave the classroom, I notice a boy standing at the doorway holding a large colourful bag. As the queue progresses, gossip fills the air and excitement grows. I get to the front of the queue and am told to put my hand in the bag and grab something. I did so, quickly and rushed, or was pushed out of the door by the queue ... I'm outside, and can hardly believe that I, without having to perform any parlour tricks like tidying my bedroom, have in my possession a free bag of candy.

Getting things for free is really awesome. But we're not five years old anymore, and understand that nothing is really free, least of all candy. Somebody pays for everything; the only questions are who, and how.

The freedom that allows us to give software away for free is one of the most valuable things we have, for those of us that have it.

We can't know when we decide to release some software how popular it will become. We release it in the hope it's useful, and in the hope that any sustenance the project requires beyond that which we are willing (or rather practically able) to provide will, in the long term, be provided organically by the community that uses it.

This ideal typically works: While no money is changing hands, the community is paying, with time and expertise, to push the software forward.

The opposing forces in OSS are the force applied by peers to provide ever greater software on one side, and on the other side the forces applied by having to hold down a full time job combined with the force applied by all or one of your family, friends, and train shop, to spend more time with your family, friends, or toy trains respectively.

For every project that has reached equilibrium and sustains itself ideally, there are hundreds that don't because they can't. Eventually these projects are at risk of being poorly maintained, being abandoned, or, worst of all, no longer being able to provide what they provide today for free.

I'm going to take Xdebug as an example here, but the same is applicable to many open source projects that we all use ...

When Derick released the first version of Xdebug 17 years ago, he started out as a provider. While it took a lot of expertise to release the first version, nobody at that time would have called him an expert.

For the first couple of years, until 2004, Xdebug was developed in his spare time. From 2004 to 2009 he worked at eZ Systems and was allowed to work on Xdebug as part of his contract. The time allowed or used was roughly (and from fuzzy memory of a long time ago) 10-20%.

When I asked Derick if, during those 5 years, he worked on Xdebug less during his free time, he said no, "because there is fuck all to do in Norway". I think this outlook is mostly a result of his cheerful disposition. I've asked a few people who have been allowed to work on OSS as employees the same question and got the same response - it doesn't tend to reduce the amount of free time you use for OSS.

For the rest and the vast majority of the last 17 years, every patch, fix, improvement, and new version of Xdebug was written, tested, and released when Derick should have been doing normal life things.

At a certain point, and as a result of the entire community asking questions of Derick and demanding solutions and improvements to the problem of debugging PHP software, he transitioned to a domain expert.

It so happens that I personally have some expertise in this area, and so when I say that the number of domain experts in this field can be counted on your fingers, you can believe me: It's not an exaggeration to say that we have in Derick a world class domain expert.

In these circumstances, where the skills and knowledge required to maintain some software simply do not exist in the wider community, and are contained only within the expert (or possibly experts) we have created, equilibrium is obviously not achievable.

Through sheer passage of time and number of users, the pressure on Derick, or any OSS project maintainer in a similar position, has increased to almost unmanageable proportions that could not have been foreseen.

I've used Xdebug as an example here because I know a bit about the area, and it's possibly the most glaring example in the ecosystem of a project that without funding will struggle to be maintained at a reasonable pace, and can't reasonably be developed or improved. The fact is that many such projects, pillars of the ecosystem which we all rely on to make a living, are in the same position.

I hope I've said enough words to convince you that making monetary contributions, or approaching your employer with a request to make monetary contributions, to projects that your income or business relies upon is the right thing to do, and in some cases necessary for the ongoing health of the project.

Identifying maintainers that need help is simple - It's the maintainers that are asking for help ...

Peace out, phomies ...

Thursday, 29 August 2019


Fig 1. Some Bearings
There is a lot of talk about the direction of PHP as a language; there have been discussions on internals, and open letters from community bloggers ...

I think, actually, there is rather a lot of confusion in the air, and so I'm seeking to cleanse that air with this post.


A lot of the discussion is based on an assertion made during the P++ discussion that there are two camps of developers, one that prefers the language to retain as much of its dynamic nature as it has today, and another who would like to move away from the historically dynamic nature of PHP towards a more strict language.

This assertion is palpably false, and I have seen almost every internals developer that has been involved in the conversation recently point out this falsehood.

You can tell by looking at the history of RFCs that these factions do not in fact exist: in every case, a majority of people voted for features that make PHP more strict (importantly, as opt-in features); a majority introduced types; a majority does everything. We move forward together, or not at all.

When you take away the problem of imaginary factions, a lot of the discussions going on seem to have no basis at all, including the P++ discussion itself.


Like many open source projects, PHP internals have a bunch of processes we are supposed to follow for fixing bugs and introducing features into the language.

The rules were written many years ago - arguably for a totally different, pre-social-coding world - but we mostly do a good job of following the rules as they are written.

Recently I pushed through some changes to the rules to make sure that there is always sufficient voting time and majority (2/3) support for a change before it's accepted.

It's important to point out that the rules are not exhaustive; a lot of how we behave is determined by convention. You can argue against this and say that we should try to exhaustively enumerate every possible action a contributor might take and document exactly how they should take it ... it would be a pretty silly argument, but you could make it ...

Recently an RFC was conducted to deprecate and remove short PHP tag syntax; it is unlucky that the contributor who conducted the RFC was new to internals, and wasn't aware of convention.

Conventionally, when it comes to deprecation and removal of features, we tend to decide if we're going to deprecate something, and removal follows if we accept that the feature needs to be deprecated. Deprecation does not disable a feature, it raises a deprecation notice only.

Unfortunately, without awareness of this convention, the conductor of the RFC created a vote that asked separately if we should deprecate and disable short tags in 7.4 (to which the answer was yes), and if we should remove short tags in 8.0 (to which the answer was yes).

This result is in part the basis for the faction claim: It looks like there are a bunch of us that would prefer to just remove cruft, and balls to the developer, and a bunch of us that want to take a more gentle path and deprecate things before removal, or not deprecate them at all.

In actual fact, everyone was likely confused about the questions being asked, and the vote should have been stopped before it reached completion the first time round.

What we have here is a failing of our processes and nothing more. I, and likely others, are considering how we might avoid this very same failure in the future. It seems desirable at this time to introduce a formal deprecation policy; this both avoids a repeat of this failure, and can potentially increase confidence when it comes to adopting new versions of PHP.


You can't stand on the ocean shore and grab a cup full of water and conclude that there is nothing more to find in the ocean than there is in the cup.

You can't look at the last two or three discussions and make any conclusions based upon them, and formulating plans based on those conclusions makes absolutely no sense whatsoever.

As already mentioned, you can look at the broad history of PHP as recorded by RFCs and make determinations about where PHP might head in the future, but taking a small sample can't work.


Now, I want to reply to some specific points raised in the open letter to internals ...

Internals have been discussing voting mechanics and what to do with controversial RFCs for months now.
Shouldn't we start looking at how other communities do this? For sure PHP can't be the only open source language out there?
First, for the sake of clarity: you must be careful how you determine something to be controversial. Loud is not the same as controversial. Attracting a lot of attention from two or three contributors does not make a controversial topic, it makes loud contributors.

Whatever the case, it's a fact that we've been concerned with our processes and have made some changes - it's somewhere between difficult and impossible to call anything with 2/3 majority support, garnered over a period of at least two weeks, controversial.

The time and effort it takes to change our processes is considerable, and only becomes a priority when it's obvious that our processes are failing, or have the potential to fail and do damage.

PHP isn't special in this regard; while the details vary, all open source projects work roughly the same. They determine a set of rules, including a workflow, which everyone has to follow; everyone tries to follow it, and when it's broken it gets fixed.

the same discussions happen over and over again on a weekly or monthly basis without any progress

I'm sure that you have a sample of data that shows you this, or you surely wouldn't have made this claim. But history disagrees with you: almost all of the big features introduced into PHP in recent years have gone through several iterations before being accepted - for scalar types it was 5 iterations, I think. With every iteration, with every "no", progress is being made.

When you open an RFC you are essentially asking "Is solution X the best way to approach problem Y?". The answer yes is just as valid as the answer no. It just so happens that the RFC should come prepared with solution X, so the answer yes seems like the better route, and it certainly is the faster one.

people are personally attacking others regularly
I dispute the use of the word regularly, but it occurs to me that my ability to dispute it may be down to my use of spam filters, and my method of subscription.

It's a matter of fact that some people can't seem to behave themselves on the internet, while I'm sure (read: must believe) they are reasonable people in real life. These people make themselves obvious very quickly and prove they have nothing much to say. Like any form of communication you are free to block that which you do not want to hear, you should not be afraid to do that if it increases your own happiness.

It's common now, even among seasoned contributors, to use the very excellent interface to consume internals mailing lists, and I'm not sure if this has a block/silence feature, maybe the owners could be approached about that ...

an insignificant RFC takes months of discussion and requires a re-vote after being accepted

I think I've covered this under Failings ...

there aren't any good ways to share constructive feedback apart from the big mailing list

I can't argue that mailing lists are a good way to communicate, but it's what we have.

However, it's not all we have:

  • You can email any contributor you like
  • You can get involved in the conversation on github pull requests
  • You can go to #php.pecl on IRC
  • You can scream out of your office window - may not work as a form of communication, but may relieve stress.
Genuinely constructive feedback will be well received no matter the method of delivery, and no matter how much noise the form of communication seems to carry with its signal.

There are several windows on PHP development, and internals is only one ... There is nothing wrong with approaching a smaller group of developers by email, or in IRC, or on Stackoverflow, so that you can further formulate your feedback with the input of other internals developers before sending to internals for mass consumption. Of course it's not obvious, but many of us operate like this. It's perfectly healthy, there are discussions going on all the time, everywhere internals developers gather.

the group of voters doesn't seem to be an accurate representation of the actual PHP community

This is regularly mentioned, and I think I'll be the first one to point out that, to the extent that it is true, it is the community's fault.

It takes hardly any effort to earn yourself a voice as a voting internals contributor, you don't need to know C, you don't need to spend hours and hours every day working for free, you have to expend hardly any effort.

You surely want voters to have earned a vote, to somehow have a vested interest in the future of PHP. They are the people who have votes; I fail to see the problem.

If you're going to make the argument that not everyone has time to find a way to contribute to PHP, but they still deserve a voice, then you have to consider that if they have no time to contribute to earn the voice, they have no time to contribute to exercise it responsibly.

Voting responsibly takes more time than clicking yes or no; participating in one vote, any vote, is going to take more time than it takes to earn the voice.

Having said all that, there are still provisions to give voting powers to representatives from the PHP community (lead framework devs and such) who otherwise do not contribute to internals, and some such voters exist. If you want to say there are not enough such representatives, then I want to ask: what more can we do than make the provision?

If there are representatives in the community who want to engage in internals but don't have a vote, then they need to make this known ... we can only make the provision.

Peace out, phomies :)

Tuesday, 20 August 2019


Regarding contribution to open source projects

The following scenario has been played out countless times, by a large number of [first time] contributors, to a large number of OSS projects:

  • Find OSS project we love
  • Attempt to navigate [source of] project
  • Find this confusing
  • Decide that there needs to be some "cleanup"
  • Start making pull requests to cleanup [the source code]
  • Pull requests are ignored, refused, or argued against
  • Dismay

Most of the time, we quickly get a feel for the kind of pull requests that will be accepted, everybody moves on ...

But I wonder if they move on before putting any effort into explaining why this type of contribution is not always met with jumps for joy ...

I wonder if we can't say some things about all open source projects, that might guide new contributors to making the kind of contributions that have a good chance of being accepted. Perhaps, more importantly, making contributions in such a way that the (new) contributor becomes an asset for the project.


Communicating with a group of people at once, often about complicated technical matters, is not an easy thing, no matter the calibre of the programmer.

The following sections will seek to provide guidance regarding communications with existing contributing members of a project.

Signal to Noise

Contributing members of a project generally have a steady signal-to-noise ratio; that's to say, on a yearly (or quarterly, whatever) basis, they make about the same amount of noise as they contribute progress. The ratio varies depending on the contributor, but steadiness makes for normality, so typically we all get used to even the loudest contributors.

New contributors, on the other hand - if for no other reason than that they have more questions to ask than anybody else - tend to have an unsteady signal-to-noise ratio, and because the entities you are dealing with are human, this simply annoys them.

The people being annoyed remember very well being just as annoying as you are, so don't take it to heart ...

But maybe there are some things you can do as a new contributor to try and keep a steady ratio ...

Mental Metabolism

New contributors are in learning mode, no matter how familiar they are with the subject matter, language, or patterns used in the project.

In learning mode, we tend to ask as many questions as occur to us, and seek answers as quickly as possible.

This tends to result in noisy communications: A thing to consider is that with every communication, you might be sending a notification to every subscriber to the medium, be it a github repository, mailing list, forum, or whatever.

Unless you are providing a patch for a life support system and lives are in danger, there is absolutely nothing that can't wait 24 hours.

By waiting a day between your communications, you are affording other contributors time to have their say (and possibly answer questions before you have had the opportunity to ask them). You are also providing yourself much more thinking time, possibly increasing the quality of your communication, and certainly reducing noise.

O(1) Communications

New contributions tend to garner responses or input from several sources, and having waited 24 hours since your last communication, you might have a pile of communications to respond to.

You can further reduce noise levels by responding to everybody at once, don't be afraid to tag people, segment, or otherwise format communications to achieve this.

Once you have the response for the day written out, you might try to compress it by finding overlap, you might even try to preempt what will happen next and try to provide answers to the coming questions in advance.

Source of Truth

Healthy projects tend to develop experts who don't necessarily have any involvement in contributing directly to the project: There are many, many more Ruby experts than there are Ruby contributors; mutatis mutandis, the same is true for any project you like. These experts tend to be available outside of official channels, sometimes in groups, sometimes even in realtime.

Let opening an issue, pull request, or otherwise sending "official" communications be a last resort. Realise that the project you are dealing with is never the only source of truth, and that by finding other sources you can further and greatly reduce your noise level.

Choosing Battles

Be ready to gracefully walk away from any contribution or idea you had, be ready before you send your first communication to just be wrong.

Being wrong is fine, that's how we learn ... It's sometimes hard to see that in the moment though, and we find ourselves fighting tooth and nail for what we think is correct.

Realise that your first contributions are not likely to be your most important. You're dealing with humans, and if you make a lot of noise at an early stage, your voice will carry less weight later on.

The Pull Request

Whether communication should precede a pull request is sometimes determined by the rules of the project.

If you are making this decision, then consider how well the project deals with and responds to pull requests. 

If there seems to be evidence that the project struggles to deal with pull requests and if there is a channel for communications other than the channel opened with the pull request, consider communicating before opening a pull request.

Cost and Value

A pull request must result in a net gain for the project.

There are so many ways to achieve a gain that it would be pointless to try to enumerate them. The most common path to a loss is as a result of miscalculation of cost.

As the author of the pull request, you might want to say that the cost associated is whatever it cost you to write the pull request.

But that's not the only cost associated; here's a better starting point for a list of costs:

  • The time it took you to write the pull request
  • The time it's going to take someone, or some people to review the pull request
  • The time it's going to take someone to merge the pull request
  • The time it's going to take someone, or some people to document your changes
  • The time it's going to cost anyone familiar with whatever you are changing

All of this, before the correctness of the pull request even matters. 

Of course, correctness does matter, as does the content of the pull request ... In some cases, a pull request will have an ongoing cost, or ongoing costs, associated with it.

Generally, if that ongoing cost is going to be incurred by humans we call it technical debt.

Generally, if that ongoing cost is going to be incurred by the machines executing the code, we refer to it as implementation complexity, or some other fancy term.

These costs are hard to quantify, and there is likely disparity between your method of calculating them and the methods of existing contributors.

The value your pull request provides must exceed the sum of these costs (as calculated by existing contributors) to be worthwhile, and you should be utterly confident that it does that before you open the pull request.

Final Words

Don't let anything, including the words herein, deter you from your goal of becoming a contributor. 

Achieving your goal will be your reward ...

Sunday, 28 July 2019

Nailed Lids

Fig 1. A Black Box

When we are developing, we go to great lengths to take measurements and gather insights with every kind of testing under the sun, coverage, reviews and so on. These measurements, insights, and processes give us the confidence to take what we made to production.

Production ... a kind of black box, with the lid nailed shut.

You can take a crowbar and pry open one corner of the lid, and, using the torch on your phone, illuminate one corner of the box. Common crowbars are the Xdebug profiler and XHProf ... These are all great tools, but they are still only illuminating a tiny part of the box, one process at a time.

A technology gaining popularity in recent years is Application Performance Monitoring. APM solutions, typically by taking a large hammer to the box, can provide valuable insight. The large hammer leaves its mark though.

Note: I can only see the source code of open source APM agents.

All of these solutions have one very major drawback: They undertake their work in the same thread that is supposed to be executing PHP - it doesn't matter how well written the code is, or how low the vendor claims the overhead is. It is a mathematical certainty that using these solutions will have a detrimental impact on the performance of the code they are meant to be profiling and/or providing insights about.

Clever APM agents will send data in parallel, nevertheless, the majority of their work is still done in the PHP thread.

All of these solutions do one or both of:

  • override the executor function
  • overwrite functions in function tables

Without going into too much detail: ordinarily the VM is stackless, meaning that when the executor function (the VM) is entered and a function is called in the code being executed, the executor is not re-entered. Overriding the executor function breaks this behaviour, meaning that recursion in the code being executed can lead to a native stack overflow.

Overwriting functions used to be the bread and butter of hacky extensions like runkit and uopz, and it used to be simple. With the advent of immutable classes, it's not so simple anymore - the function table of the class you are editing, and the functions therein, may reside in opcache's SHM, and changing that memory is considered illegal. The VM aggressively uses caching to avoid hashtable lookups; changing a function that exists in the run time cache of another function will lead to faults if that cache entry is subsequently referenced.

A quick word on XHProf and (some) derivatives ... these use the RDTSC instruction as a timing source; have a quick read of the Wikipedia article - this hasn't been a good idea in a very long time. They do indeed set CPU affinity to maximize reliability; nevertheless, the fragility of using this is unquestionable, and more modern, portable APIs exist ... Still, it works, and I don't hear everyone being confused that their profiles don't make sense, so this is more of a technical gripe than anything.

Note: Tideways no longer uses RDTSC, but does use the modern equivalent.

Of course, you can find safe ways to overwrite functions, and maybe a recursive executor is not so terrible for you ...

Conventional wisdom is that if you want to trace or otherwise observe the runtime of PHP, you have to use the hooks that Zend provides and your knowledge of how the Zend layer works. As a result, many extensions do these things, or otherwise have a similar detrimental impact on the performance of code. But they are generally aimed at development time, not production. Doing these things in one process so that Xdebug can debug (or profile) it, pcov can provide coverage for it, or uopz can let your 100 year old tests run, is not so bad - a reasonable price to pay for the value being extracted.

Doing these things to a few processes at a time in production, such that APM solutions have enough of a stream of data to provide valuable insights, might also be reasonable. Similarly, an APM agent may be extremely lightweight and perform something more akin to the function of a request logger than that of a profiler, limiting its ability to provide insight but making it suitable for production.


First some words about the differences between our development and production environments ...

Our development and staging environments may well operate at capacity: they may have no spare cores and no spare cycles, with every core pinned at or near 100% usage and no capacity to create more processes.

Our production environments must, by definition, have the ability to deal with production demand. While every core that is running a PHP process might be pinned at or near 100%, we have spare cores and/or idle processes.

Getting the lid off the box ...

Stat is a super modern, high-performance provider of profile information for production. It uses parallel uio and an atomic ring buffer to make profile data for a set of PHP processes available in real time over a unix or TCP socket.

Stat does all its work in parallel to PHP, which overcomes the first major drawback of any existing solution. It has no need to set an executor, or otherwise interfere with the runtime of PHP.

Stat is a work in progress, and it may be a month or more before the first release happens. However, if anyone wants to get started on working on any user interfaces (which I will not be writing), I'd be happy to start collaborating on that immediately.

You can find a bit more information in the readme.

That's all I have to say about that right now ...

Wednesday, 17 July 2019

Trimming the Phat

Fig 1. A very fancy Tomb

We all think we know how dead code elimination works: we can just reference code coverage, or run static analysis, or rely on our own internal model of the code, which is always absolutely perfect ...

Dead can mean multiple things when we're talking about code, at least:
  • Compiler context - dead code is "unreachable", it can never be executed
  • Execution context - dead code is "unused", it has not been, or is not going to be called
The distinction between compile and execute in PHP is somewhat blurred; as a result, some dead code detection that should arguably be part of the compiler has traditionally been part of code coverage. In the latest versions of PHP, opcache eliminates unreachable code.

Static analysis and coverage reports can tell you about dead code in the narrow scopes defined above, but there is another sense in which code might be considered dead:
  • Code that is in fact unused in production
My manager recently asked me to come up with something so that we can detect dead code in this production sense.

I'm quite adept at bending PHP to my will, however, this task presents some not insignificant challenges. Normally, when we want to abuse PHP in some strange way, we're doing so in the name of testing. 

Testing is a nice quiet place, where there's only one process to care about and not much can go wrong. If you are careful, you can write some really nice tooling, the overhead is very acceptable, and people rave about it on twitter (pcov).

Production on the other hand is a scary place, where mistakes may cost a lot of money, and where there are on the order of hundreds of processes to care about: extracting statistical information from hundreds of processes without adversely affecting their performance is not a simple task.


Tombs is my solution to the problem of detecting code that is unused in production: code that, even though it may be reported as reachable, covered, or used, is in fact never called in production.

There's something quite pleasing about the requirements for a task translating almost perfectly into the solution. The requirements for Tombs were:
  • Must not use more CPU time in PHP processes than is absolutely necessary (i.e. be production ready)
  • Must report statistics centrally for every process in a pool
The first requirement, aside from the obvious, means that Tombs needs to have an API that faces the system rather than user land PHP, we can't inject code into production so the processes that gather statistics must be separate and might be on different machines entirely.

The second requirement means that Tombs needs to use shared mapped memory, like O+ or APC(u).

O+ and APC(u) both achieve safety in their use of shared memory by multiple processes through mutual exclusion - implemented either as file locks, pthread mutexes, or the Windows equivalent - and this makes perfect sense for them. It means that even though many processes may compile the about-to-be-cached file, or execute the function that returns the about-to-be-cached variable, only one process can insert the file or variable into shared memory.

Reporting live statistics about a system is similar to trying to count the number of birds in flight over the earth - it will change while you are reporting. In this environment, mutual exclusion makes very little sense; what we need is a lock-free implementation of the structure that stores the information we need to share. We need to know that no matter how large the set of data being returned, we don't have to exclude other processes from continuing to manipulate that data concurrently.

Using Tombs

Simply load Tombs in a production environment and without modifying any code, allow normal execution to take place over the course of hours, days, or weeks. 

Now, when you open the Tombs socket the data returned represents the functions and methods that have not been executed by any process in the pool since Tombs was started.

Using this data, you can now make decisions about the removal or refactoring of code to reduce or hopefully eliminate dead code.

If you use Tombs, reach out to me and tell me how it worked out for you ....

Saturday, 30 March 2019


Fig 1. A chap performing the Detroit JIT
Unless you have been living under a rock, or are from the past (in which case, welcome), you will be aware that a JIT is coming to PHP 8: The vote ended, quietly, today, with a vast majority in favour of merging into PHP 8, so, it's official.

Throw some crazy shapes in celebration, suggestion given in Fig 1, and it's even called "The (Detroit) JIT" ...

Now sit down and read the following myth-busting article; we're going to clear up some confusion around what the JIT is, what it will benefit, and delve into how it works (but only a little, because I don't want you to be bored).

Since I don't know who I'm talking to, I'm going to start at the beginning with the simple questions and work up to the complex ones; if you are already sure you know the answer to the question in a heading, you can skip that part ...

What is JIT ?

PHP implements a virtual machine, a kind of virtual processor - we call it Zend VM. PHP compiles your human readable script into instructions that the virtual machine understands (we call them opcodes), this stage of execution is what we refer to as "Compile Time". At the "Runtime" stage of execution the virtual machine (Zend VM) executes your code's instructions (opcodes).

This all works very well, and tools like APC (in the past) and OPCache (today) cache your code's instructions (opcodes) so that "Compile Time" only happens when it must.

First, one line to explain what JIT is in general: Just-in-time is a compiler strategy that takes an intermediate representation of code and turns it into architecture dependent machine code at runtime - just-in-time for execution.

In PHP, this means that the JIT treats the instructions generated for the Zend VM as the intermediate representation and emits architecture dependent machine code, so that the host of your code is no longer the Zend VM, but your CPU directly.

Why does PHP need a JIT ?

The focus of the PHP internals community since slightly before PHP 7.0 has been performance, brought about by healthy competition from Facebook's HHVM project. The majority of the core changes in PHP 7.0 were contained in the PHPNG patch, which significantly improved the way in which PHP utilizes memory and CPU at its core; since then, every one of us has been forced to keep one eye on performance.

Since PHP 7.0, some performance improvements have been made: optimizations for the HashTable (a core data structure for PHP), specializations in the Zend VM for certain opcodes, specializations in the compiler for certain sequences, and a constant stream of improvements to the Optimizer component of OPCache ... and many others besides, too boring to list.

It's a brute fact that these optimizations can only take us so far and we are rapidly approaching, or maybe have already met, a brick wall in our ability to improve it any further.

Caveat: when we say things like "we can't improve it any further", what we really mean is "the trade-offs we would have to make to improve it any further no longer look appealing" ... Whenever we talk about performance optimizations, we're talking about trade-offs - often, trades of simplicity for performance. We would all like to think that the simplest code is the fastest code, but that simply is not the case in the modern world of C programming. The fastest code is often the code that is prepared to take advantage of architecture dependent intrinsics or platform (compiler) dependent builtins. Simplicity just is not a guarantee of the best performance ...

At this time, the ability for PHP to JIT would appear to be the best way to squeeze more performance from PHP.

Will the JIT make my website faster ?

In all probability, not significantly.

Maybe not the answer you were expecting: In the general case, applications written in PHP are I/O bound, and JIT works best on CPU bound code.

What on earth does "I/O and CPU bound" mean ?

When we want to describe the general performance characteristics of a piece of code, or an application, we use the terms I/O bound and CPU bound.

In the simplest possible terms:
  • An I/O bound piece of code would go faster if we could improve (reduce, optimize) the I/O it is doing.
  • A CPU bound piece of code would go faster if we could improve (reduce, optimize) the instructions the CPU is executing - or (magically) increase the clock speed of the CPU :)
A piece of code, or an application, may be I/O bound, CPU bound, or bound equally to CPU and I/O.

In general, PHP applications tend to be I/O bound - the thing that is slowing them down is the I/O which they are performing - connecting, reading, and writing to databases, caches, files, sockets and so on.

What does CPU bound PHP look like ?

CPU bound code is not something a lot of PHP programmers will be familiar with, because of the nature of most PHP applications - their job tends to be to connect to some database and/or possibly a cache, do some light lifting, and spit out an HTML/JSON/XML response.

You may look around your codebase and find lots of code that has nothing whatsoever to do with I/O, code that even calls functions completely disconnected from I/O, and be confused that I seem to be implying this doesn't make your application CPU bound, even though there may be many more lines of code dealing with non-I/O than with I/O.

PHP is actually quite fast; it's one of the fastest interpreted languages in the world. There is no remarkable difference between the Zend VM calling a function that has nothing to do with I/O and making the same call in machine code. There is clearly a difference, but the fact is that machine code has a calling convention and the Zend VM has a calling convention; machine code has a prologue and the Zend VM has a prologue: whether you call some_c_level_function() in Zend opcodes or in machine code doesn't make a significant difference to the performance of the application making the call - although it may seem to make a significant difference to that call.

Note: A calling convention is (roughly) a sequence of instructions executed *before* entering into another function, a prologue is a sequence of instructions executed *at entry* into another function: The calling convention in both cases pushes arguments onto the stack, and the prologue pops them off the stack.

What about loops, and tail calls and X I hear you ask: PHP is actually quite smart, with the Optimizer component of OPCache enabled your code is transformed as if by magic into the most efficient form you could have written.

It's important to note now that the JIT doesn't change the calling convention of Zend functions from the convention established by the VM - Zend must be able to switch between JIT and VM modes at any time, and so the decision was taken to retain the calling convention established by the VM. As a result, those calls that you see everywhere aren't remarkably faster when JIT'd.

If you want to see what CPU bound PHP code looks like, look in Zend/bench.php ... This is obviously an extreme example of CPU bound code, but it should drive home the point that where the JIT really shines is in the area of mathematics.

Did PHP make the ultimate trade-off to make math faster ?

No. We did it to widen the scope of PHP, and considerably so. 

Without wanting to toot our own horn, we have the web covered - If you are a web programmer in 2019 and you haven't considered using PHP for your next project, then you are doing the web wrong - in this very biased PHP developer's opinion.

To improve the ability to execute math faster in PHP seems, at a glance, to be a very narrow scope. 

However, this in fact opens the door on things such as machine learning, 3d rendering, 2d (gui) rendering, and data analysis, to name just a few.

Why can't we have this in PHP 7.4 ?

I just called the JIT "the ultimate trade-off", and I think it is: It's arguably one of the most complex compiler strategies ever invented, maybe the most complex. To introduce a JIT is to introduce considerable complexity.

If you ask Dmitry (the author of the JIT) if he made PHP complex, he would say "No, I hate complexity" (that's a direct quote).

At bottom, complex is anything we do not understand, and at the moment, there are very few internals developers (less than a handful) that truly understand the implementation of JIT that we have.

PHP 7.4 is coming up fast, merging into PHP 7.4 would leave us with a version of PHP that less than a handful of people could debug, fix, or improve (in any real sense). This is just not an acceptable situation for those people that voted no on merging into PHP 7.4.

In the time between now and PHP 8, many of us will be working in our spare time to understand the JIT: We still have features we want to implement and tools we need to rewrite for PHP 8, and first we must understand the JIT. We need this time, and are very grateful that a majority of voters saw fit to give it to us.

Complex is not synonymous with horrible: Complex can be beautiful, like a nebula, and the JIT is that kind of complex. You can, in principle, fully understand something complex and only make a marginal reduction in the apparent complexity of that thing. In other words, even when there are 20 internals developers who are as familiar with the JIT as Dmitry is, it doesn't really change the complex nature of JIT.

Will development of PHP slow down ?

There's no reason to think that it will. We have enough time that we can say with confidence that by the time PHP 8 is generally available, there will be enough of us familiar with the JIT to function at least as well as we do today when it comes to fixing bugs and pushing PHP forward.

When trying to square this with the view that a JIT is inherently complex, consider that the majority of our time spent on introducing new features is actually spent discussing that feature. For the majority of features, and even fixes, code may take in the order of minutes or hours to write, and discussions take in the order of weeks or months. In rare cases, the code for a feature may take in the order of hours or days to write, but the discussion will always take longer in those rare cases.

That's all I have to say about that ... 

Enjoy your weekend.