Tuesday, 28 July 2015

The Universe is NOT Aware

Fig 1: Part of the universe

I'm very mindful of speaking clearly when I think people might be listening. It's extremely important to communicate ideas precisely, so that they can be understood with minimal effort on the part of your audience.

In my parallel PHP podcast and talk, and a previous blog post, I gave clear explanations of asynchronous and parallel concurrency.

I'm going to assume you haven't seen any of that, and repeat the explanation, with pretty pictures and some more thoughts ...

Let's imagine we have three distinct tasks to execute; the nature of those tasks is unimportant for the explanation.

Synchronous Execution

Following is a diagram of the synchronous execution of those tasks:

Fig 2: Synchronous Execution

This is the model we are all used to; instructions are executed in a linear fashion, one after the other, task after task.

This is easiest for everyone: it's easiest for the programmer using a language, and easiest for the engineers of the language.
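A minimal sketch of the synchronous model, in Python. The task names (A, B, C), the three "instructions" per task, and the helper names `make_task` and `run_synchronously` are assumptions made purely for illustration; each instruction is recorded in a log so the order of execution is visible.

```python
def make_task(name, steps):
    """Build a task: a function that runs its instructions strictly in order."""
    def task(log):
        for step in range(1, steps + 1):
            log.append(f"{name}{step}")  # stands in for one instruction
    return task


def run_synchronously(tasks):
    """Run each task to completion before starting the next one."""
    log = []
    for task in tasks:
        task(log)
    return log


tasks = [make_task("A", 3), make_task("B", 3), make_task("C", 3)]
print(run_synchronously(tasks))
# ['A1', 'A2', 'A3', 'B1', 'B2', 'B3', 'C1', 'C2', 'C3']
```

Every instruction of task A finishes before the first instruction of task B begins, which is the linear shape described above.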

Asynchronous Execution

Following is a diagram of the asynchronous execution of those tasks:

Fig 3: Asynchronous Execution

We can see that the instructions for each task are interleaved with each other. Because of this, the tasks can be said to run concurrently with respect to each other.

This is what asynchronous concurrency looks like. 
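The interleaving can be sketched with Python generators on a single thread. This is an illustrative assumption, not how any particular runtime schedules work: each hypothetical task yields after every instruction, and a simple round-robin loop (`run_interleaved`, also a made-up name) picks the next task to advance.

```python
from collections import deque


def task(name, steps):
    """A task that hands control back after each instruction."""
    for step in range(1, steps + 1):
        yield f"{name}{step}"


def run_interleaved(tasks):
    """Advance each task by one instruction in turn, on a single thread."""
    log = []
    queue = deque(tasks)
    while queue:
        current = queue.popleft()
        try:
            log.append(next(current))  # run exactly one instruction
        except StopIteration:
            continue  # task finished, do not reschedule it
        queue.append(current)  # back of the queue for its next turn
    return log


print(run_interleaved([task("A", 3), task("B", 3), task("C", 3)]))
# ['A1', 'B1', 'C1', 'A2', 'B2', 'C2', 'A3', 'B3', 'C3']
```

The same nine instructions run as in the synchronous case, only in a different order, and still one at a time.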

On the face of it, it takes the same amount of time to execute asynchronous code as it does to execute synchronous code.

Asynchronous concurrency has its most appropriate use case in I/O bound code: synchronous code that is I/O bound spends a considerable time waiting for hardware to become available. Using non-blocking APIs, and interleaving instructions, means that you can eliminate that waiting: when synchronous blocking code would have waited, asynchronous non-blocking code is ready to execute the instructions for another task. This reduces the overall time it takes to execute your tasks.

This is considerably messier than synchronous code, from the perspective of both programmer and language engineer. Whether it's worth chopping and dicing your instructions such that they can be interleaved is a question you should always ask.
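To make the I/O bound case concrete, here is a rough sketch using Python's `asyncio`. The `fetch` coroutine and the 0.2 second delay are assumptions: `asyncio.sleep` merely stands in for a non-blocking wait on hardware, and no real I/O takes place.

```python
import asyncio
import time


async def fetch(name, delay):
    """Pretend to wait on I/O without blocking the thread."""
    await asyncio.sleep(delay)  # while this task waits, others may run
    return name


async def run_all():
    # Start all three waits together and collect results in argument order.
    return await asyncio.gather(
        fetch("A", 0.2), fetch("B", 0.2), fetch("C", 0.2)
    )


start = time.perf_counter()
results = asyncio.run(run_all())
elapsed = time.perf_counter() - start

print(results)
print(f"elapsed: {elapsed:.2f}s")
```

Run synchronously, the three waits would add up to roughly 0.6 seconds. Here they overlap, so the total should be close to the longest single wait, even though only one thread is executing instructions.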

Parallel Execution

Following is a diagram of the parallel execution of those tasks:

Fig 4: Parallel Execution

We can see that the tasks more closely resemble their synchronous counterparts, but run concurrently with respect to time.

The only way to do this is by utilizing more than one thread of execution.

This is what parallel concurrency looks like.
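A small sketch of that shape using Python's `threading` module, with one thread per task. The names are illustrative assumptions. One caveat: in standard CPython builds the global interpreter lock can still serialize pure-Python instructions, so treat this as an illustration of the model rather than a demonstration of a speed-up.

```python
import threading


def task(name, steps, log):
    """Run this task's instructions in order, recording them in its own log."""
    for step in range(1, steps + 1):
        log.append(f"{name}{step}")


def run_parallel(names, steps):
    """Give every task its own thread of execution and its own log."""
    logs = {name: [] for name in names}
    threads = [
        threading.Thread(target=task, args=(name, steps, logs[name]))
        for name in names
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # wait for every task to finish
    return logs


print(run_parallel(["A", "B", "C"], 3))
# {'A': ['A1', 'A2', 'A3'], 'B': ['B1', 'B2', 'B3'], 'C': ['C1', 'C2', 'C3']}
```

Within each thread the instructions look exactly like the synchronous version; what changes is that the three sequences are not ordered with respect to one another.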

In theory, our code is three times faster than it is when executed synchronously. In practice, given the ability to execute in parallel, we invariably want our tasks to communicate with each other. This introduces overhead associated with synchronization (locking), which costs not only time but also considerable cognitive overhead in any real world parallel code.
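As a hedged illustration of where that overhead comes from, the sketch below has three threads communicate through a single shared counter. The function name and the counts are assumptions; the point is that every update must take the lock, so the threads spend part of their time queueing for it instead of working.

```python
import threading


def run_with_shared_counter(workers=3, increments=10_000):
    """Have several threads update one shared value under a lock."""
    counter = 0
    lock = threading.Lock()

    def work():
        nonlocal counter
        for _ in range(increments):
            with lock:  # only one thread may update at a time
                counter += 1

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter


print(run_with_shared_counter())  # 30000
```

The lock makes the result correct, but it also forces the threads to take turns at exactly the point where they interact.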

Giving a best use-case is much harder than with asynchronous concurrency, because the applicable domain of parallel concurrency is vast in comparison.

What you can say is that you should aim for that diagram as an ideal: you should design your tasks such that they are, as far as possible, isolated, and so incur the minimum possible overhead from synchronization.
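One way to picture that ideal, sketched with Python's `concurrent.futures`: each task works only on its own slice of the input and returns its own result, and the results are combined once every task has finished. The chunking and the function names are assumptions chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    """An isolated task: it reads only its own input and shares no state."""
    return sum(chunk)


def isolated_total(chunks):
    """Run one task per chunk, then combine the results afterwards."""
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        partials = list(pool.map(partial_sum, chunks))
    return sum(partials)  # the only point where results meet


chunks = [range(0, 100), range(100, 200), range(200, 300)]
print(isolated_total(chunks))  # 44850
```

No explicit lock appears in the task code, because the tasks never touch each other's data while they run.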


Applying Knowledge

Imagine, if you will, a scientist who says that since the universe has awareness within it, the universe itself is aware.

This is a pretty blunt example; I'm sure everyone can see the error in that logic.

But what if the assertion that the universe is aware were accompanied by "a kind of word salad" which somehow lends superficial plausibility to the idea?

I find it hard to imagine there is anyone who cannot understand the key differences between these models. I'm sure it's crystal clear in the reader's mind; the differences and properties of these models are easily understood.

What might not be so easily understood is how to apply that knowledge to a bigger system, with more complex ideas.

How do we describe the execution of a task which was offloaded to Gearman, or some other process? Can't we describe that as asynchronous concurrency (since the operating system might interleave the tasks)?

That's confusing the universe (system) with what is going on in the universe (system).

No matter the size or complexity of the system, the definitions given above stand.

If you are interleaving instructions, you are achieving asynchronous concurrency; if you are executing instructions in parallel, you are achieving parallel concurrency.

It has to be that simple. When we blur the lines and describe something parallel as asynchronous, or vice versa, it tastes like word salad, and makes it difficult to understand what is really going on.

A long time ago, I used the words parallel and asynchronous interchangeably in some example code I had distributed, and I could whip myself for doing so. I did so at a time when I just wasn't aware of the confusion surrounding these terms, and assumed, wrongly, that I would be understood.

If you have a blog, or a podcast, or someone you are educating, or any reason to talk about this stuff, I implore you to triple check your language and ideas before communicating them, and I'll be doing the same.

1 comment:

  1. To what extent is your argument suggesting that, in order to fully achieve a parallel concurrency model in a given application, one must design processes that are not interdependent upon one another?

    As a casual read, I get the sense that you are advocating a form of coding mentality more so than a form of coding implementation. Maybe with a more thorough read, I'll get something deeper from it.

    Sage counsel, nonetheless...thanks :-)