Sunday, 28 July 2019

Nailed Lids

Fig 1. A Black Box

When we are developing, we go to great lengths to take measurements and gather insights with every kind of testing under the sun, coverage, reviews and so on. These measurements, insights, and processes give us the confidence to take what we made to production.

Production ... a kind of black box, with the lid nailed shut.

You can take a crowbar and pry open one corner of the lid, and using the torch on your phone illuminate one corner of the box. Common crowbars are the Xdebug profiler and XHProf ... These are all great tools, but they are still only illuminating a tiny part of the box, one process at a time.

A technology gaining popularity in recent years is Application Performance Monitoring. APM solutions, typically by taking a large hammer to the box, can provide valuable insight. The large hammer leaves its mark though.

Note: I can only see the source code of open source APM agents.

All of these solutions have one very major drawback: they undertake their work in the same thread that is supposed to be executing PHP. It doesn't matter how well written the code is, or how low the vendor claims the overhead is: any work done in that thread is time taken away from executing PHP, so these solutions are certain to have a detrimental impact on the performance of the code they are meant to be profiling and/or providing insights about.

Clever APM agents will send data in parallel; nevertheless, the majority of their work is still done in the PHP thread.

All of these solutions do one or both of:

  • override the executor function
  • overwrite functions in function tables

Without going into too much detail: ordinarily the VM is stackless, which means that when the executor function (the VM) is entered and a function is called in the code being executed, the executor is not re-entered. Setting the executor function breaks this behaviour, so every call in user code re-enters the executor, and deep recursion can lead to stack overflow.
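To make the difference concrete, here is a minimal sketch of a toy interpreter written both ways. It is not the Zend VM: the instruction set, `make_program`, `run_reentrant` and `run_stackless` are all invented for illustration, and Python's `RecursionError` stands in for the native stack overflow you would get in C.

```python
import sys

def make_program(depth):
    """Build a toy program f0 -> f1 -> ... -> f{depth}, each calling the next."""
    program = {f"f{i}": [("call", f"f{i + 1}"), ("ret",)] for i in range(depth)}
    program[f"f{depth}"] = [("ret",)]
    return program

def run_reentrant(program, name):
    """Re-entrant executor: each guest call re-enters the executor,
    so guest call depth consumes host stack frames."""
    calls = 1
    for op in program[name]:
        if op[0] == "call":
            calls += run_reentrant(program, op[1])
        elif op[0] == "ret":
            break
    return calls

def run_stackless(program, name):
    """Stackless executor: a single loop, with guest frames kept in a
    heap-allocated list instead of on the host stack."""
    frames = [[name, 0]]  # each frame is [function name, instruction pointer]
    calls = 1
    while frames:
        frame = frames[-1]
        op = program[frame[0]][frame[1]]
        frame[1] += 1
        if op[0] == "call":
            frames.append([op[1], 0])
            calls += 1
        elif op[0] == "ret":
            frames.pop()
    return calls
```

Both executors agree on shallow programs. On a call chain deeper than the host's recursion limit, only the stackless loop finishes; the re-entrant one runs out of host stack.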

Overwriting functions used to be the bread and butter of hacky extensions like runkit and uopz, and it used to be simple. With the advent of immutable classes, it's not so simple anymore - the function table of the class you are editing, and functions therein, may reside in opcache's SHM. Changing that memory is considered illegal. The VM also aggressively uses caching to avoid hashtable lookups: changing a function that exists in the run time cache of another function will lead to faults if that cache entry is subsequently referenced.

A quick word on XHProf and (some) derivatives ... these use the RDTSC instruction as a timing source. Have a quick read of the Wikipedia article on it: this hasn't been a good idea in a very long time. They do indeed set affinity to maximize reliability; nevertheless, the fragility of using this is unquestionable, and more modern, portable APIs exist. Still, it works, and I don't hear everyone being confused that their profiles don't make sense, so this is more of a technical gripe than anything.
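As an illustration of what a portable timing source looks like from the caller's side, here is a small sketch using Python's `time.monotonic_ns()`, a monotonic clock that the operating system keeps consistent, so no CPU pinning is needed. The helper name `time_call` is made up for this example, and it says nothing about how any particular profiler is implemented.

```python
import time

def time_call(fn, *args):
    """Run fn(*args) and return (result, elapsed nanoseconds).

    time.monotonic_ns() never goes backwards and does not depend on
    which core the calling thread happens to be scheduled on.
    """
    start = time.monotonic_ns()
    result = fn(*args)
    elapsed_ns = time.monotonic_ns() - start
    return result, elapsed_ns

result, elapsed_ns = time_call(sum, range(1000))
```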

Note: Tideways no longer uses RDTSC, but does use the modern equivalent.

Of course, you can find safe ways to overwrite functions, and maybe a recursive executor is not so terrible for you ...

Conventional wisdom is that if you want to trace or otherwise observe the runtime of PHP, you have to use the hooks that Zend provides and your knowledge of how the Zend layer works. As a result many extensions do these things, or otherwise have a similar detrimental impact on the performance of code. But they are generally aimed at development time, not production. Doing these things in one process so that Xdebug can debug (or profile) it, pcov can provide coverage for it, or uopz can let your 100-year-old tests run is not so bad: it's a reasonable price to pay for the value being extracted.

Doing these things to a few processes at a time in production, such that APM solutions have enough of a stream of data to provide valuable insights, might also be reasonable. Similarly, an APM agent may be extremely lightweight and perform something more akin to the function of a request logger than that of a profiler, limiting its ability to provide insight but making it suitable for production.

Preface

First some words about the differences between our development and production environments ...

Our development and staging environments may well operate at capacity: they may have no spare cores and no spare cycles, with every core pinned at or close to 100% usage and no capacity to create more processes.

Our production environments must by definition have the ability to deal with production demand. While every core that is running a PHP process might be pinned at 100% or close, we have spare cores and/or idle processes.

Getting the lid off the box ...

Stat is a super modern, high performance provider of profile information for production. It uses parallel uio and an atomic ring buffer to make profile data for a set of PHP processes available in realtime over a unix or TCP socket.

Stat does all its work in parallel to PHP, which overcomes the first major drawback of any existing solution. It has no need to set an executor, or otherwise interfere with the runtime of PHP.
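The ring buffer idea can be sketched in a few lines. This is an assumption-laden toy, not Stat's code: `SampleRing`, its methods and the drop accounting are invented here, and plain Python arithmetic stands in for the atomic operations a native implementation would use. What it shows is the overwrite behaviour: the writer never waits for a reader, and a slow reader loses the oldest samples rather than holding anything up.

```python
class SampleRing:
    """Fixed-size ring that overwrites the oldest sample when full."""

    def __init__(self, slots):
        self.slots = [None] * slots
        self.written = 0  # total samples ever written, only ever grows

    def push(self, sample):
        # A native implementation would claim this index with an atomic
        # fetch-and-add; plain arithmetic stands in for that here.
        self.slots[self.written % len(self.slots)] = sample
        self.written += 1

    def read_from(self, cursor):
        """Return (samples, new_cursor, dropped) for a reader at `cursor`."""
        oldest = max(0, self.written - len(self.slots))
        dropped = max(0, oldest - cursor)  # samples overwritten before being read
        start = max(cursor, oldest)
        samples = [self.slots[i % len(self.slots)]
                   for i in range(start, self.written)]
        return samples, self.written, dropped
```

A reader keeps its own cursor and passes it back in on the next read, so any number of readers can follow the same ring independently.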

Stat is a work in progress, and it may be a month or more before the first release happens. However, if anyone wants to get started on working on any user interfaces (which I will not be writing), I'd be happy to start collaborating on that immediately.

You can find a bit more information in the readme.

That's all I have to say about that right now ...

Wednesday, 17 July 2019

Trimming the Phat

Fig 1. A very fancy Tomb

We all think we know how dead code elimination works: we can just reference code coverage, or run static analysis, or rely on our own internal model of the code, which is always absolutely perfect ...

Dead can mean multiple things when we're talking about code, at least:
  • Compiler context - dead code is "unreachable", it can never be executed
  • Execution context - dead code is "unused", it has not been, or is not going to be called
The distinction between compile and execute in PHP is somewhat blurred; as a result, some dead code detection that should be part of the compiler has traditionally been part of code coverage. In the latest versions of PHP, opcache eliminates unreachable code.
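A tiny example of the two senses, written in Python rather than PHP purely for illustration (the function names are made up):

```python
def price_with_tax(price_cents):
    return price_cents + price_cents // 5
    print("never printed")  # unreachable: follows an unconditional return

def legacy_discount(price_cents):
    # Reachable and perfectly valid, but nothing in this program calls it:
    # this is "unused" in the execution sense.
    return price_cents - price_cents // 10

total = price_with_tax(100)
```

A compiler can prove the `print` line is dead by looking at the function alone. Whether `legacy_discount` is dead depends on what actually calls it at run time.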

Static analysis and coverage reports can tell you about dead code in those narrow scopes defined, but there is another sense in which code might be considered dead:
  • Code that is in fact unused in production
My manager recently asked me to come up with something so that we can detect dead code in this production sense.

I'm quite adept at bending PHP to my will, however, this task presents some not insignificant challenges. Normally, when we want to abuse PHP in some strange way, we're doing so in the name of testing. 

Testing is a nice quiet place, where there's only one process to care about and not much can go wrong. If you are careful, you can write some really nice tooling, the overhead is very acceptable, and people rave about it on Twitter (pcov).

Production on the other hand is a scary place, where mistakes may cost a lot of money, and where there are on the order of hundreds of processes to care about. Extracting statistical information from hundreds of processes without adversely affecting their performance is not a simple task.

Tombs

Tombs is my solution to the problem of detecting code that is unused in production: code that, even though it may be reported as reachable, covered, or used, is in fact never called in production.

There's something quite pleasing about the requirements for a task translating almost perfectly into the solution. The requirements for Tombs were:
  • Must not use more CPU time in PHP processes than is absolutely necessary (i.e. be production ready)
  • Must report statistics centrally for every process in a pool
The first requirement, aside from the obvious, means that Tombs needs to have an API that faces the system rather than user land PHP. We can't inject code into production, so the processes that gather statistics must be separate and might be on different machines entirely.

The second requirement means that Tombs needs to use shared mapped memory, like O+ or APC(u).

O+ and APC(u) both achieve safety in their use of shared memory by multiple processes using mutual exclusion - implemented either as file locks, pthread mutexes, or the Windows equivalent - and this makes perfect sense for them. It means that even though many processes may compile the about to be cached file, or execute the function that returns the about to be cached variable, only one process can insert the file or variable into shared memory.

Reporting live statistics about a system is similar to trying to count the number of birds in flight over the earth - it will change while reporting. In this environment, a mutex makes very little sense. What we need here is a lock free implementation of the structure that stores the information we need to share. We need to know that no matter how large the set of data being returned, we don't have to exclude other processes from continuing to manipulate that data concurrently.
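One way to see why no lock is needed is to keep one marker per function and make "this function was called" a single idempotent store. The sketch below is only an illustration of that idea under stated assumptions, not the structure Tombs actually uses: `TombTable` is a made-up name, and a `bytearray` stands in for the shared mapped memory that every process in the pool would see.

```python
class TombTable:
    """One byte per known function: 1 = not yet called, 0 = called."""

    def __init__(self, function_names):
        self.names = list(function_names)
        self.index = {name: i for i, name in enumerate(self.names)}
        # Stand-in for a shared memory segment mapped by every process.
        self.markers = bytearray(b"\x01" * len(self.names))

    def mark_called(self, name):
        # A single idempotent byte store: any number of processes can
        # perform it concurrently and the end state is the same.
        self.markers[self.index[name]] = 0

    def uncalled(self):
        # Readers scan without excluding writers. A function called
        # mid-scan may still be listed; it drops out of the next scan.
        return [n for n, m in zip(self.names, self.markers) if m]
```

Because a marker only ever moves from 1 to 0, a reader can never observe an inconsistent state, only a slightly stale one, which is exactly the birds-in-flight situation described above.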

Using Tombs

Simply load Tombs in a production environment and, without modifying any code, allow normal execution to take place over the course of hours, days, or weeks.

Now, when you open the Tombs socket, the data returned represents the functions and methods that have not been executed by any process in the pool since Tombs was started.
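Reading from such a socket needs nothing more than a plain socket client. The sketch below assumes the data arrives as newline-terminated records and that the socket is a unix socket at a path you configured; both are assumptions for illustration, so check the readme for the actual format and configuration. `read_records` and `read_unix_socket` are hypothetical helper names.

```python
import socket

def read_records(conn, chunk_size=4096):
    """Yield newline-terminated records from a connected socket until EOF."""
    pending = b""
    while True:
        chunk = conn.recv(chunk_size)
        if not chunk:
            break
        pending += chunk
        # Everything before the last newline is complete; keep the rest.
        *lines, pending = pending.split(b"\n")
        for line in lines:
            if line:
                yield line.decode("utf-8")
    if pending:
        yield pending.decode("utf-8")

def read_unix_socket(path):
    """Connect to the unix socket at `path` (hypothetical) and collect records."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as conn:
        conn.connect(path)
        return list(read_records(conn))
```

Each record can then be fed to whatever reporting you like, for example grouping uncalled functions by file before deciding what to remove.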

Using this data, you can now make decisions about the removal or refactoring of code to reduce or hopefully eliminate dead code.

If you use Tombs, reach out to me and tell me how it worked out for you ...