Tuesday, 15 September 2015

What Polly Really Wants

Fig 1. Definition of polyfill according to Wikipedia
I think that a rather narrow definition, nevertheless it shows the origin of the word to be specifically Javascript and client side development.

Slowly but surely, this has become part of the PHP vernacular, everyone has heard of password_compat. A polyfill for PHP7's random_* API is also available.

I think we know what a polyfill is, or can skim the definition ... because I'm going to keep using that word.

Recently a fellow githubber opened an issue for pthreads, they are writing a static analysis suite for PHP, and are attempting to integrate pthreads into their code. First thing to say is, I don't know where that will lead him. But it does give him the problem that a lot of environments don't have pthreads available, and or they aren't using a thread safe interpreter.

In the issue, he made the suggestion that we have a compatibility layer, a polyfill. I confess, this had never occurred to me before.

Not only does it solve his problem but it actually serves as a useful tool ... I shall explain.

The Unanswerable Question

I'm often asked the question "what is a good use case for threading?". I'll stare open mouthed at the asker, or squint and tilt my head awkwardly in response. In a way it's like being asked the question "what is a good use case for objects?".

I'll go on to try to explain this is some awkward way, never really knowing if I am making sense.

I can't answer that question, it doesn't really make sense. Maybe someone can, but not me.

A question that does make sense is "what kind of code lends itself to threading?". To this we can give a pretty good answer.

First we can make the assumption that your are only considering threading because you have a lot of work to do, if that's not the case, do something else. Assuming there is a lot of work to do, then whether you have the kind of task that could gain anything by using threads depends on the nature of the work. This seems kinda obvious, but if every detail is explained, then more people will understand, I think.

If the kind of work you have to do can be broken down into units of work, each fulfilling the following requirements:

  • does not depend on other units of work
  • does not communicate with other units of work

You can be pretty sure that you have a good candidate.

These basic guidelines are extremely useful. Good multi-threaded code avoids synchronization, because there is too much margin for error and the unexpected, even with the best API in the world. Even with the best API, and the best programmers, synchronization (which must occur for units of work to communicate) diminishes performance, creates contention for locks, and is generally a bad thing.

If you are not dependant on the order of execution, you could (and at some point might have to) write your own scheduler (Pool::submit) to exploit hardware, and your software, to the fullest.

If you know enough to see the cases where those rules can be broken, then you don't need my help and can stop reading, I bid you farewell and good luck.

If you are still reading then I must be making sense, so we will now look at how you can use the polyfill as a tool.

New Rule: Training Shoe Slogans Are Always Wrong!

When it comes to researching multi threaded php, you should not "Just Do it".

In fact, there are very few occasions where such an attitude would be advisable, so, new rule.

If you think you have the kind of code that can take advantage, then you should test the waters. Before you change your PHP installation, or development environments, you should try to write your code using the polyfill.

When you start to win, and you have things the way you want them, you can load pthreads. Save for final optimization for real threads, reducing synchronization, and communication with non Threaded objects, if at this time you do not see a visible benefit, the best advice I can give is go in another direction.

Throwing threads at any task doesn't necessarily make it faster, and if at this early stage in development you are not seeing a visible benefit, then in all likelihood, there isn't any benefit to be had, simple as that.

If you do see a benefit, then you don't need to be worried about deployment problems and dependency on pthreads, if pthreads is available the units of work execute in parallel, if not, they don't. Everything works in a predictable way, everywhere.

Final Thought

My final words should pay homage to the observation that the best ideas aren't always our own, I'm really grateful when fellow programmers try to engage on github, or by any other means. They don't have to do that.

So, thank you ovrweb for taking the time.