gsnedders

And just saying you make me happy seems too much to be socially inapt mind.

PHP Grievances

Tags: June 28, 2009 (2 comments)

Below is a list of things that have annoyed me with PHP while writing various pieces of code, especially SimplePie 2 and the PHP html5lib (this list will probably be added to over time):

  • No native Unicode support (how a web-facing high-level language can still not support Unicode in 2009 is beyond me: HTML, XML, CSS, and ECMAScript are all layered on top of Unicode, so almost everything going over the web is Unicode). There's no real way to implement Unicode in an interpreted language without taking a large performance hit, and the two main extensions PHP has for dealing with Unicode don't help in a lot of situations (mbstring, for example, can return a string that isn't UTF-8 when you try to convert data to UTF-8; iconv (per spec) fails when it hits an invalid byte, which doesn't work for a lot of web stuff).
  • The library of classes and functions you can actually rely upon is very small, because almost everything can be turned off via --disable-all. This means you end up having to conditionally re-implement things you really ought to be able to rely upon. Even the "PHP Standard Library" can be disabled, which makes it non-standard, and means you can't rely upon it. I'd much rather there was a sane set of extensions that were always enabled and could not be disabled: preferably, everything that ships as PHP should always be enabled and should not be able to be disabled.
  • No native queue structure. Sure, you can use array_shift and array_push, but array_shift is an O(n) operation, and n can get quite large in the real world, large enough that the cost becomes a real performance problem. While PHP 5.3 adds SplQueue, this suffers from the problem of being something that can be disabled (see above). I'd expect something as basic as a queue structure to exist in a language in the 21st century.
  • Inability to override object comparison. I may want objects that represent the URIs http://example.com and http://example.com/ to be equivalent, while keeping their original form.
  • Inability to override how objects are type-cast to any type apart from string (bool would be nice, for a start…).
  • ((bool) '0' === false) is an endless source of bugs.
  • There never appears to be much regard for backwards/forwards compatibility: there is a willingness to break small things in each release, causing problems for a lot of people, but never to make the more major changes that would break everything yet fix a lot of the problems with the language.
  • PHP's implementations of things like the DOM API differ subtly from the spec.
  • XMLWriter doesn't actually necessarily output XML. Bug reports saying this are closed as bogus, so are we meant to assume that XMLWriter isn't actually meant to be usable as an XML serializer that can be relied upon? If that is the case, what are we meant to use? Is it too much to ask for an XML serializer whose output always meets the "document" production in the XML specification (sorry, standard)?
  • There is an overhead of around 70 bytes per array entry, which makes using arrays of codepoints a not-entirely-satisfactory workaround for the language's lack of Unicode support.
  • Learning the argument order of functions takes years, if it ever happens at all, in part because of internal inconsistency, for example between array_search($needle, $haystack) and strpos($haystack, $needle).
  • PHP internally has a function that does type-hinting to some extent (zend_parse_parameters), yet proposals for userland type-hinting of anything apart from arrays and objects are repeatedly rejected.
  • The filter extension (which apparently people need to use, according to the PHP developers) is buggy to the point of being useless. See this for some issues with its IPv6 support (I've also found issues, including regressions from 5.2.6 to 5.2.9… maybe I'll need to implement my own again). Equally, the URL filter uses the parse_url function internally, which the manual notes is not meant to validate the given URL, which hardly inspires confidence in the extension. Why should I use the filter extension when it can be disabled, and when it is buggy? I cannot use it in distributed code that must run consistently on PHP 5.2.0 and above, even if those bugs are fixed in future releases. Likewise, having any bugs makes me wary of using it at all, as it makes me suspect it has further bugs.
  • One of the most annoying version inconsistencies, and one that hit both SimplePie 1 and MagpieRSS badly, was data missing all pre-defined XML entities, a bug that was ultimately caused by PHP using an internal libxml2 API, which (unsurprisingly as an internal API) changed in libxml2 2.7. This means that with any version of PHP before 5.2.9 combined with libxml2 2.7 or above, the xml extension is more or less useless. This was, thankfully, fixed in PHP 5.2.9 by using a public API only added in libxml2 2.7.3, so with libxml2 2.7.0 to 2.7.2 the xml extension never works.
  • The wonderful zend.ze1_compatibility_mode, when turned on, causes $foo = new ReflectionClass('StdClass'); to throw E_ERROR (i.e., a fatal error).
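
To make the queue point and the "re-implement in conditionals" point concrete, here is a minimal sketch of the sort of workaround I mean. FallbackQueue is a hypothetical name, not something from PHP or SimplePie: it uses SplQueue when SPL happens to be enabled, and otherwise falls back to a plain array with a moving head index so that dequeuing never calls array_shift. It throws the core Exception class rather than RuntimeException, because the latter is itself part of SPL.

```php
<?php
// Hypothetical helper (not part of PHP or any library): a FIFO queue that
// prefers SplQueue when available and otherwise avoids array_shift()'s
// O(n) re-indexing by tracking head and tail positions in a plain array.
class FallbackQueue
{
    private $spl = null;      // SplQueue instance, or null if SPL is unavailable
    private $items = array(); // fallback storage
    private $head = 0;        // index of the next item to dequeue
    private $tail = 0;        // index at which the next item is stored

    public function __construct()
    {
        // The conditional the post complains about: SPL may be disabled.
        if (class_exists('SplQueue', false)) {
            $this->spl = new SplQueue();
        }
    }

    public function enqueue($value)
    {
        if ($this->spl !== null) {
            $this->spl->enqueue($value);
        } else {
            $this->items[$this->tail++] = $value;
        }
    }

    public function dequeue()
    {
        if ($this->spl !== null) {
            return $this->spl->dequeue();
        }
        if ($this->head === $this->tail) {
            throw new Exception('Queue is empty');
        }
        $value = $this->items[$this->head];
        unset($this->items[$this->head]); // release the slot without re-indexing
        $this->head++;
        return $value;
    }

    public function isEmpty()
    {
        return $this->spl !== null
            ? $this->spl->isEmpty()
            : $this->head === $this->tail;
    }
}

// Usage: test emptiness explicitly rather than relying on truthiness,
// since a dequeued '0' casts to false and would end a naive loop early.
$queue = new FallbackQueue();
$queue->enqueue('0');
$queue->enqueue('a');

$out = array();
while (!$queue->isEmpty()) {
    $out[] = $queue->dequeue();
}
```

The explicit isEmpty() check matters because of the '0' grievance above: a loop written as while ($item = array_shift($queue)) stops as soon as it meets the string '0'.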

Comments

  1. Snedders Fan
    says

    August 28, 2009 20:25:18+01:00

    On arrays, have you checked out ArrayObject in SPL?

  2. Geoffrey Sneddon
    says

    August 28, 2009 23:29:26+01:00

    ArrayObject requires an extension that can be disabled, has even more memory overhead than native arrays, is slower, fails to work with some functions that expect arrays…
