Tuesday, 18 November 2014

Strictly Research Released

Fig 1. Strict Easter Egg
After much discussion, the research conducted on introducing strict scalar parameter type hints without changing Zend has been bundled into a pecl package.

To install the strict extension for PHP 5.4+, execute the following:

pecl install strict-beta

Then append the following to php.ini, or add it to a file in the config file scan directory (e.g. modules.d or php.d):

extension=strict.so

Windows users will soon be able to download pre-built binaries from the usual places.

The limitation

The only limitation of the extension is that scalar parameters cannot have default values. This is because the parser treats scalar type hints as class hints, and the only default value allowed for a parameter with a class hint is null.

The suggestion has been made that we could support null as a special case for scalars, only accepting null as a default value, but normally raising an error when a null is passed for any parameter that does not have a default value.

I am still thinking about what to do here.

Casting

Now that strict scalars are supported, the programmer must cast parameters to the correct type.

PHP's casting semantics are ... relaxed. They never fail, and never raise an error; this can lead to some strange and unwanted behaviour.

There is an RFC in progress to address this issue in PHP7; I'm watching that RFC to see what happens.

If that RFC passes then the functions will be imported into the strict extension for the 5 series.

If that RFC fails then I will introduce a strict_cast function in the extension.

Another option is to override the built-in casting mechanism with lossless-or-error casting. I intend to research the impact this has.

It is obviously going to be faster if casting is performed by operators rather than function calls; however, it might have unwanted side effects.
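To make "lossless-or-error" concrete, here is a minimal sketch in C of what such a conversion could look like for string to integer. The name strict_cast_long and its return convention are hypothetical; nothing here is taken from the strict extension or from the RFC. The point is only the contrast with PHP's own (int) cast, which quietly turns "10abc" into 10.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper, not part of the strict extension: convert a
 * NUL-terminated string to long, succeeding only when the whole string
 * is a base-10 integer that fits in a long. Returns 0 on success and
 * -1 when the conversion would lose information. */
static int strict_cast_long(const char *str, long *out) {
    char *end = NULL;
    long  value;

    assert(out != NULL);

    if (str == NULL || *str == '\0') {
        return -1;                  /* nothing to convert */
    }
    if (isspace((unsigned char) *str)) {
        return -1;                  /* strtol would silently skip this */
    }

    errno = 0;
    value = strtol(str, &end, 10);

    if (errno == ERANGE) {
        return -1;                  /* does not fit in a long */
    }
    if (end == str || *end != '\0') {
        return -1;                  /* no digits, or trailing text such as "10abc" */
    }

    *out = value;
    return 0;
}
```

With this shape, "10" and "-42" convert, while "10abc", "1.5", " 10" and the empty string are all rejected and leave the output untouched.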

Return Types

There is an RFC in progress for PHP7 that introduces return types. When that RFC passes, I will introduce support for scalar return types in the strict extension.

Wednesday, 12 November 2014

Strictly Research Continued

Fig 1. Strict Easter Egg
This week I have been experimenting with adding strict scalar type hints to PHP7 via an extension.

Scalar type hints are something that some people want, and they have good reason to want them, but I think the majority can live without them.

Internals don't seem to be able to agree on the best way to implement scalar type hints; there have been multiple RFC discussions and implementations.

The first try at an extension used auto-boxing and casting magic, and was rather slow.

The latest revision doesn't use any auto-boxing, casting, or exceptions.

This means that the following code:
function test(integer $int, 
              double $dbl, 
              boolean $bool, 
              string $str, 
              resource $fp) {

    var_dump($int, $dbl, $bool, $str, $fp);
}

test(1, 2.2, true, "four", STDIN);

Will work as expected, without the need to cast parameters to their scalar types in the function body.

Zend emits a recoverable error when a type mismatch occurs, so exceptions were dropped in favour of staying compatible with Zend and with existing expectations.

This is a much faster and much simpler implementation, and should be forward compatible with Zend getting strict scalar type hints at some time in the (distant) future.
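The idea of checking a hint without boxing or casting can be sketched in plain C. Everything below is a simplified, hypothetical model: the value struct stands in for a zval, check_hint stands in for whatever the extension does internally, and the message text is invented for illustration rather than copied from Zend.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins for Zend's value types. */
typedef enum { T_INTEGER, T_DOUBLE, T_BOOLEAN, T_STRING } value_type;

typedef struct {
    value_type type;
    union {
        long        lval;
        double      dval;
        int         bval;
        const char *sval;
    } u;
} value;

static const char *type_name(value_type type) {
    switch (type) {
        case T_INTEGER: return "integer";
        case T_DOUBLE:  return "double";
        case T_BOOLEAN: return "boolean";
        case T_STRING:  return "string";
    }
    return "unknown";
}

/* Returns 0 when the argument already has the hinted type. Otherwise
 * writes a message into err and returns -1. The argument is only
 * inspected, never converted or wrapped. */
static int check_hint(value_type hint, const value *arg,
                      char *err, size_t err_len) {
    assert(arg != NULL && err != NULL);

    if (arg->type == hint) {
        return 0;
    }

    snprintf(err, err_len, "expected %s, %s given",
             type_name(hint), type_name(arg->type));
    return -1;
}
```

Because the check is a single comparison of type tags, it costs almost nothing on the success path, which is consistent with the speed-up described above.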

The strict extension could also define a strict_cast utility function for lossless-or-error casting; we'll see what happens next.

The extension is available here.

Monday, 10 November 2014

Strictly Research

Fig 1. Strict Easter Egg
Recently this screenshot was posted on Twitter; there was a thread on reddit, and there was much nattering.

While I value very highly the fact that PHP, at its heart, is a dynamic language, why shouldn't we be able to use strict ?

If a programmer wants to use strict type hints, it doesn't need to affect all those people not using strict type hints.

Strict parameter type hints in PHP7 ?

So, I decided to put some effort into writing an extension for PHP7 that will introduce the ability to perform strict type hinting without changing the engine.

The extension uses autoboxing and Zend magic to coerce hinted scalars to complex types, providing mechanics to cast back to scalar for Zend.

This means that the following code works exactly as you expect it to:
use strict\Integer;

function add(integer $one, integer $two) {
    return $one + $two;
}

var_dump(add(10, 20));

While this code:
use strict\Integer;

function add(integer $one, integer $two) {
    return $one + $two;
}

var_dump(add("10", "20"));

Will produce something like the following error:
Fatal error: Uncaught exception 'strict\TypeException' with 
   message 'illegal implicit cast to integer from string' in file:4
Stack trace:
#0 file(4): strict\Integer->__construct('10')
#1 file(8): add('10', '20')
#2 {main}
  thrown in file on line 4

Finally, the following code:
use strict\Integer;

function add(integer $one, integer $two) {
    return (double) $one + (double) $two;
}

var_dump(add(10, 20));

Will also fail, with something like the following error:
Fatal error: Uncaught exception 'strict\TypeException' with 
   message 'illegal cast to double from integer' in file:5
Stack trace:
#0 file(8): add(Object(strict\Integer), Object(strict\Integer))
#1 {main}
  thrown in file on line 5

We can see from the examples above that casting rules are strict in the true sense of the word: any cast to another type, implicit or explicit, will fail.

Strict parameter type hinting in other languages allows you to document and write very precise and easy to understand APIs.

We certainly shouldn't change the nature of PHP, but we should bring the benefits of strict types into PHP if we can, and it seems like we can.

Your very own boxes !

The extension comes with autoboxes for scalar types. However, should the programmer want some particular functionality on the box, such as a fancy string API, they can declare their own String class.

The following code demonstrates designing your own boxes:
use strict\Autobox;

class String extends Autobox {

    public function __construct($value) {
        $this->setValue(
            Autobox::string, $value);
    }
    
    public function reverse() {
        return new String(strrev($this->getValue()));
    }
}

function reverse(string $str) {
    return $str->reverse();
}

var_dump((string) reverse("7PHP"));

The output will be string(4) "PHP7"

API

The Autobox class has the following API:
class Autobox {
    final protected function setValue(integer $type, mixed $value);
    final public function getValue();
    final public function getType();
}

The method Autobox::setValue should only be called by constructors (although this rule is not enforced), and can only be called once.

Try it

It is time to be brave, and try it for yourself: http://github.com/krakjoe/strict

Disclaimer: I don't know if this research is going anywhere, was fun though :)

Saturday, 8 November 2014

Future Notes

Fig 1. Go PHP "7"
Now that the master branch of PHP (PHP7) is stabilizing, it is time for those of us who maintain extensions to think about beginning the task of upgrading the source code to work with new APIs and conventions in PHP7.

Without going into too much detail about why PHP7 is the way it is, I'm going to go through the major differences extension maintainers or authors need to be aware of.

The Zend Object

Those of us that don't hate our users provide object-oriented APIs, and you might say that from the outset PHP7 is different.

When an extension registers a class entry in the MINIT routine, it sets a handler named create_object.

A typical create_object handler for a 5 series extension looks something like:
zend_object_value my_create_object(zend_class_entry *ce TSRMLS_DC) {
    zend_object_value retval;
    MY *object = (MY*) emalloc(sizeof(*object));

    zend_object_std_init(&object->std, ce TSRMLS_CC);
    /* ... */

    retval.handlers = &my_handlers;
    retval.handle   = zend_objects_store_put(
        object,
        my_dtor, my_free, my_clone TSRMLS_CC);

    return retval;
}

We can see that the handler is expected to return a zend_object_value, and is expected to store the object in the object store itself.

While a typical create_object handler for 7 series looks like:
zend_object* my_create_object(zend_class_entry *ce TSRMLS_DC) {
    MY *object = (MY*) emalloc(sizeof(*object));

    zend_object_std_init(&object->std, ce TSRMLS_CC);
    /* ... */

    object->std.handlers = &my_handlers;

    return &object->std;
}

The subtle differences are because a zval in PHP7 has zend_object* in the value union, and zend_object_std_init calls zend_objects_store_put.

In addition, the destroy, free and clone handlers are now part of the object handlers struct (my_handlers), rather than set by the call to zend_objects_store_put.

To fetch the allocated object in the PHP5 series, a call to zend_object_store_get_object was required. Since the zend_object* is now part of the value union, this call is eliminated in the PHP7 series.

Objects stored in the object store in the PHP5 series required that the first member of the struct be a zend_object, for example:
typedef struct _my {
    zend_object std;
    int  my_integer;
    /* other members here */
} MY;

Objects in PHP7 do not have the same limitation. This means that fetching the object allocated by a create_object handler in PHP7, given a zval, can be performed as follows:
PHP_METHOD(My, method) {
    MY *object;

    if (zend_parse_parameters_none() != SUCCESS) {
        return;
    }
    
    object = (MY*) ((char*)Z_OBJ_P(getThis()) - XtOffsetOf(MY, std));    
}

Even simpler, if one sticks to the established convention of having zend_object be the first member in the structure, the object allocated by create_object can be accessed as follows:
PHP_METHOD(My, method) {
    MY *object;

    if (zend_parse_parameters_none() != SUCCESS) {
        return;
    }
    
    object = (MY*) Z_OBJ_P(getThis());  
}

Every little counts. The fact that objects can be accessed with pointer arithmetic is super cool, and saves many calls to zend_object_store_get_object for even the simplest extension.

The only other thing to remember is that Zend needs to know the offset of the zend_object in your object's structure. If zend_object is not the first member, the offset should be stored in the object handlers, in the field named offset. This is typically done during MINIT when handler structures are first created by the extension.
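The pointer arithmetic is easier to see outside of Zend. The following is a self-contained sketch in standard C: base_object and object_handlers are simplified stand-ins for zend_object and its handler table, my_create and my_fetch are hypothetical names, and offsetof from stddef.h plays the role of XtOffsetOf. The embedded object is deliberately not the first member, so the offset is non-zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins for zend_object and its handler table; only the
 * pieces needed to show the offset arithmetic are modelled. */
typedef struct {
    size_t offset;              /* distance from the wrapper to its std member */
} object_handlers;

typedef struct {
    const object_handlers *handlers;
} base_object;

/* The embedded object is deliberately NOT the first member. */
typedef struct {
    int         my_integer;
    base_object std;
} MY;

/* Filled in once, much as an extension would do during MINIT. */
static const object_handlers my_handlers = { offsetof(MY, std) };

/* What a create_object handler does in spirit: allocate the wrapper,
 * then hand back a pointer to the embedded object only. */
static base_object *my_create(int n) {
    MY *object = calloc(1, sizeof(*object));

    if (object == NULL) {
        return NULL;
    }

    object->my_integer   = n;
    object->std.handlers = &my_handlers;

    return &object->std;
}

/* Extension code knows the type, so it can step back by offsetof(). */
static MY *my_fetch(base_object *obj) {
    assert(obj != NULL);
    return (MY *) ((char *) obj - offsetof(MY, std));
}

/* Generic code knows only the handler table, so it uses the stored offset. */
static void *object_allocation(base_object *obj) {
    assert(obj != NULL);
    return (char *) obj - obj->handlers->offset;
}
```

Both routes land on the same address, which is the start of the allocation. That is presumably why the engine wants the offset recorded in the handlers: without it, generic code has no way back from the embedded object to the memory that was actually allocated.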

Levels of Indirection

While we are C programmers, and so have intricate knowledge of pointers with triple indirection, indirection has an undeniable cognitive overhead. Many levels of indirection make it harder to read and debug code, and in my opinion one of the best improvements in PHP7 is to drop the convention that it's okay to work with pointers with many levels of indirection.

This affects everything, from the fact that Z_*_PP macros no longer exist, to the HashTable and other Zend APIs having significant changes.

HashTable and Strings

Hash tables are a staple of any extension, and Zend. The API has always felt like it was a compromise, and I'm pleased to observe that PHP7 finally has a nice HashTable API.

The first obvious change is that where API functions used to take a char * and an int to represent a string and its length respectively, PHP7 makes use of the zend_string structure for keys.

The zend_string structure in PHP7 can be refcounted and have hashes pre-calculated, rather cool.
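As a rough illustration of why a refcounted string with a cached hash is useful, here is a small standalone model in C. The counted_string type, the cs_* functions and the hash function are all assumptions made for the sketch; the real zend_string layout and hashing are defined in the Zend headers and may differ.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a counted string, not zend_string itself. */
typedef struct {
    unsigned int  refcount;
    unsigned long hash;         /* 0 means "not calculated yet" */
    size_t        len;
    char          val[];        /* len bytes followed by a NUL */
} counted_string;

static counted_string *cs_new(const char *src, size_t len) {
    counted_string *s = malloc(sizeof(*s) + len + 1);

    if (s == NULL) {
        return NULL;
    }

    s->refcount = 1;
    s->hash     = 0;
    s->len      = len;
    memcpy(s->val, src, len);
    s->val[len] = '\0';

    return s;
}

/* Sharing a string is a counter bump, not a copy of the bytes. */
static counted_string *cs_copy(counted_string *s) {
    s->refcount++;
    return s;
}

static void cs_release(counted_string *s) {
    assert(s->refcount > 0);
    if (--s->refcount == 0) {
        free(s);
    }
}

/* The hash is computed on first use and then remembered, so repeated
 * table operations with the same key do not rescan the bytes. A result
 * of exactly 0 would simply be recomputed on the next call. */
static unsigned long cs_hash(counted_string *s) {
    if (s->hash == 0) {
        unsigned long h = 5381;
        size_t i;

        for (i = 0; i < s->len; i++) {
            h = h * 33 + (unsigned char) s->val[i];
        }
        s->hash = h;
    }
    return s->hash;
}
```

A key that is used for several lookups pays for hashing once, and handing the same key to another owner costs one increment.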

If we look at PHP5 code that performs the familiar operation of fetching from a HashTable:

PHP_METHOD(My, method) {
    char *str;
    int   str_len;
    zval  **value;
    MY    *object = (MY*) zend_object_store_get_object(getThis() TSRMLS_CC);

    if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "s", &str, &str_len) != SUCCESS) {
        return;
    }

    if (zend_hash_find(object->table, str, str_len, (void**) &value) != SUCCESS) {
        /* handle failure */
        return;
    }
}

In contrast, the PHP7 code is much less stupid:
PHP_METHOD(My, method) {
    zend_string *str;
    zval        *value;
    MY          *object = (MY*) ((char*)Z_OBJ_P(getThis()) - XtOffsetOf(MY, std));

    if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "S", &str) != SUCCESS) {
        return;
    }

    value = zend_hash_find(object->table, str);
    if (value == NULL) {
        /* handle failure */
        return;
    }
}

Note that zend_parse_parameters has a new type specifier for a zend_string*.

Sometimes a zend_string will not be available, and it may be inefficient to create one for a lookup or some other HashTable operation. For these cases the HashTable API has a set of functions with _str_ in their name, for example zend_hash_str_find, which still accept a char* and a size_t.
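The two calling conventions can be modelled in a few lines of standalone C. This is a sketch of the API shape only: table, key_string, table_find and table_str_find are hypothetical names, and a linear scan stands in for real hashing. What it shows is that both variants return a pointer to the value, or NULL when the key is absent, instead of filling in a pointer-to-pointer as the PHP5 API did.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; these model only the calling conventions. */
typedef struct {
    size_t      len;
    const char *val;
} key_string;

typedef struct {
    key_string key;
    int        value;
} entry;

typedef struct {
    entry  *entries;
    size_t  used;
} table;

/* _str_ style: the caller has only a char* and a length. */
static int *table_str_find(const table *t, const char *str, size_t len) {
    size_t i;

    assert(t != NULL && str != NULL);

    for (i = 0; i < t->used; i++) {
        const key_string *k = &t->entries[i].key;

        if (k->len == len && memcmp(k->val, str, len) == 0) {
            return &t->entries[i].value;
        }
    }
    return NULL;                /* not found: no out-parameter needed */
}

/* Key-structure style: the caller already holds a string structure. */
static int *table_find(const table *t, const key_string *key) {
    assert(key != NULL);
    return table_str_find(t, key->val, key->len);
}
```

The caller checks a single pointer for NULL, which removes one level of indirection from every lookup site.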

Where to start ?

I have provided a brief explanation of the main differences affecting extension maintainers in PHP7; some people might be able to get started with just this information.

I don't know how anyone else learned how to program for Zend, but personally, I read code.

Now is a good time to dig around in some of the headers so you can see in detail what has changed: