ByePie

December 28, 2007 (0 comments)

Having put a lot of thought into the matter over the past several weeks, I've made my decision to leave development of SimplePie.

"Why!? Oh Why!?", you scream (well, maybe not, but I'm not a telepathic seer). For a start, I haven't actually really used SimplePie myself since early 2006 (now almost two years ago), and I now have less and less to do with PHP at all (and I totally hate it — a recent bug in SP was caused by the fact that "0" == false — and have therefore moved to (mainly) Python).

Furthermore, over the past year, since March/April, my time has become increasingly limited, and SP has de facto been one of the things I cut a long time ago (hence the lack of commits from me). The majority of my time is now spent on schoolwork, with what is left over going to work on various specs (predominantly HTML 5 and Tolerant HTTP Parsing).

However, what does the future of SP hold? Well, various decisions need to be made about the future direction: do you try to improve 1.x further (it was already stretched to breaking point at 1.0, mainly held back by PHP itself, which is a sad state to be in), or do you start on the vision of SP2? To take the former option, I doubt you could get much further than what is currently planned for 1.2 with the current 1.x base; any further development requires a large amount of reworking of the internals of SP (to the extent that it is questionable whether there is any point in not starting from scratch). The latter option is probably the best (though ideally 1.1 should get out as soon as it can).

One of the aims of SP2 is true modularity: it should be possible to use (and load) nothing more than the parser itself (i.e., give it raw XML data, and it gives you an API to access the title, description, etc. as they are in the feed, without sanitising them at all). This has several advantages when it comes to deciding on any successor to myself: people can write various modules for it against pre-existing specs (most of which are only drafts and so will need further development over time). Exactly which modules those will be I am mainly undecided (though it won't, I assure you, be the more complex parts of the API itself, whose design is mainly unwritten and comes from knowledge of the successes and failures of SP1's API). I will myself continue maintaining a couple of the modules (namely the Unicode and IRI ones, both of which I use outwith SP, though more may be added to that list).

I'm more than willing to be around in a consulting role for a while — my contact details are in the footer here, and I'll stay around in the IRC channel for a while — as well as helping people around the SP1 codebase (though I'd like to see that totally feature frozen come the end of January, with a final non-bugfix release from it in February) — which is horrifically uncommented in parts, and uses stupidly complex algorithms in others that without prior knowledge of them make no sense (I've had issues with some myself when coming back to them having not touched them in a while :) ).

Alas, there's too much to write about the vision of SP2, so that will have to be done in another post; until then, g'nite.

Resolving Relative URLs in PHP

December 27, 2006 (0 comments)

This is deprecated, and has known bugs. See here for a replacement.

There are plenty of cases where you need to resolve relative URLs, and RFC 3986 (Generic URI Syntax) has a whole section on how to go about it. SimplePie has code for this, written by me in its entirety (although based on the pseudo-code in RFC 3986), used to deal with relative URLs in feeds (which happen to be possible pretty much everywhere). As I am the sole author of it, I've rearranged it slightly into a single function (in SimplePie it's in several methods within a larger class, as most of the methods are also called in other places), and re-licensed it under the 3-clause BSD license, the LGPL, and the zlib/libpng license (although of course if you redistribute it you must attach the appropriate notice as stated by one of the above licenses).

Without further ado, here's the code:

<?php
function absolutize_url($relative, $base)
{
    $relative = trim($relative);
    $base = trim($base);
    if (!empty($relative))
    {
        // Split the relative URL into its five components (regex from RFC 3986, appendix B)
        preg_match('/^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?$/i', $relative, $match);
        // PHP drops trailing groups that did not match, so pad them with empty strings
        for ($i = count($match); $i <= 9; $i++)
        {
            if (!isset($match[$i]))
            {
                $match[$i] = '';
            }
        }
        $relative = array('scheme' => $match[2], 'authority' => $match[4], 'path' => $match[5], 'query' => $match[7], 'fragment' => $match[9]);
        if (!empty($relative['scheme']))
        {
            $target = $relative;
        }
        else if (!empty($base))
        {
            preg_match('/^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?$/i', $base, $match);
            for ($i = count($match); $i <= 9; $i++)
            {
                if (!isset($match[$i]))
                {
                    $match[$i] = '';
                }
            }
            $base = array('scheme' => $match[2], 'authority' => $match[4], 'path' => $match[5], 'query' => $match[7], 'fragment' => $match[9]);
            $target = array('scheme' => '', 'authority' => '', 'path' => '', 'query' => '', 'fragment' => '');
            if (!empty($relative['authority']))
            {
                $target = $relative;
                $target['scheme'] = $base['scheme'];
            }
            else
            {
                $target['scheme'] = $base['scheme'];
                $target['authority'] = $base['authority'];
                if (!empty($relative['path']))
                {
                    if (strpos($relative['path'], '/') === 0)
                    {
                        $target['path'] = $relative['path'];
                    }
                    else
                    {
                        if ($base['path'] == '/' || empty($base['path']))
                        {
                            $target['path'] = '/' . $relative['path'];
                        }
                        else
                        {
                            // Merge: drop the last segment of the base path, then append the relative path
                            $target['path'] = preg_replace('/^(.*)((\/)([^\/]*))?$/sU', '\\1', $base['path']) . '/' . $relative['path'];
                        }
                    }
                    if (!empty($relative['query']))
                    {
                        $target['query'] = $relative['query'];
                    }
                    // Remove dot segments (RFC 3986, section 5.2.4): $input is the input buffer, $target['path'] the output buffer
                    $input = $target['path'];
                    // The output buffer must start empty; without this reset the cleaned path is appended to the merged path
                    $target['path'] = '';
                    while (!empty($input))
                    {
                        // A: If the input buffer begins with a prefix of "../" or "./", then remove that prefix from the input buffer; otherwise,
                        if (strpos($input, '../') === 0)
                        {
                            $input = substr($input, 3);
                        }
                        else if (strpos($input, './') === 0)
                        {
                            $input = substr($input, 2);
                        }
                        // B: if the input buffer begins with a prefix of "/./" or "/.", where "." is a complete path segment, then replace that prefix with "/" in the input buffer; otherwise,
                        else if (strpos($input, '/./') === 0)
                        {
                            $input = substr_replace($input, '/', 0, 3);
                        }
                        else if ($input == '/.')
                        {
                            $input = '/';
                        }
                        // C: if the input buffer begins with a prefix of "/../" or "/..", where ".." is a complete path segment, then replace that prefix with "/" in the input buffer and remove the last segment and its preceding "/" (if any) from the output buffer; otherwise,
                        else if (strpos($input, '/../') === 0)
                        {
                            $input = substr_replace($input, '/', 0, 4);
                            $target['path'] = preg_replace('/(\/)?([^\/]+)$/U', '', $target['path']);
                        }
                        else if ($input == '/..')
                        {
                            $input = '/';
                            $target['path'] = preg_replace('/(\/)?([^\/]+)$/U', '', $target['path']);
                        }
                        // D: if the input buffer consists only of "." or "..", then remove that from the input buffer; otherwise,
                        else if ($input == '.' || $input == '..')
                        {
                            $input = '';
                        }
                        // E: move the first path segment in the input buffer to the end of the output buffer, including the initial "/" character (if any) and any subsequent characters up to, but not including, the next "/" character or the end of the input buffer
                        else
                        {
                            if (preg_match('/^([^\/]+|(\/)[^\/]*)(\/|$)/', $input, $match))
                            {
                                $target['path'] .= $match[1];
                                $input = substr_replace($input, '', 0, strlen($match[1]));
                            }
                            else
                            {
                                // We've ended up in a recursive loop, so do what we otherwise never will: return false.
                                return false;
                            }
                        }
                    }
                }
                else
                {
                    if (!empty($base['path']))
                    {
                        $target['path'] = $base['path'];
                    }
                    else
                    {
                        $target['path'] = '/';
                    }
                    if (!empty($relative['query']))
                    {
                        $target['query'] = $relative['query'];
                    }
                    else if (!empty($base['query']))
                    {
                        $target['query'] = $base['query'];
                    }
                }
            }
            if (!empty($relative['fragment']))
            {
                $target['fragment'] = $relative['fragment'];
            }
        }
        else
        {
            // No base URL, just return the relative URL
            $target = $relative;
        }
        // Recompose the URL from its components
        $return = '';
        if (!empty($target['scheme']))
        {
            $return .= "$target[scheme]:";
        }
        if (!empty($target['authority']))
        {
            $return .= "//$target[authority]";
        }
        if (!empty($target['path']))
        {
            $return .= $target['path'];
        }
        if (!empty($target['query']))
        {
            $return .= "?$target[query]";
        }
        if (!empty($target['fragment']))
        {
            $return .= "#$target[fragment]";
        }
    }
    else
    {
        $return = $base;
    }
    return $return;
}
?>
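
The A to E comments in the function above follow the dot-segment removal routine of RFC 3986, section 5.2.4. As a minimal sketch, here is that step on its own with a separate output buffer. The name remove_dot_segments_sketch is made up for illustration and is not SimplePie's API, and the string handling is my own choice, not the code from the post.

```php
<?php
// Hypothetical helper: remove "." and ".." segments from a path,
// following steps A to E of RFC 3986, section 5.2.4.
function remove_dot_segments_sketch($input)
{
    $output = '';
    while ($input !== '')
    {
        // A: strip a leading "../" or "./"
        if (strpos($input, '../') === 0)
        {
            $input = substr($input, 3);
        }
        elseif (strpos($input, './') === 0)
        {
            $input = substr($input, 2);
        }
        // B: replace a leading "/./" or a lone "/." with "/"
        elseif (strpos($input, '/./') === 0)
        {
            $input = substr($input, 2);
        }
        elseif ($input === '/.')
        {
            $input = '/';
        }
        // C: replace a leading "/../" or a lone "/.." with "/" and
        // drop the last segment (and its preceding "/") from the output
        elseif (strpos($input, '/../') === 0)
        {
            $input = substr($input, 3);
            $output = (string) substr($output, 0, (int) strrpos($output, '/'));
        }
        elseif ($input === '/..')
        {
            $input = '/';
            $output = (string) substr($output, 0, (int) strrpos($output, '/'));
        }
        // D: a lone "." or ".." is simply discarded
        elseif ($input === '.' || $input === '..')
        {
            $input = '';
        }
        // E: move the first segment (with its leading "/", if any) to the output
        else
        {
            $pos = strpos($input, '/', 1);
            if ($pos === false)
            {
                $pos = strlen($input);
            }
            $output .= substr($input, 0, $pos);
            $input = (string) substr($input, $pos);
        }
    }
    return $output;
}

echo remove_dot_segments_sketch('/a/b/c/./../../g'), "\n";   // prints "/a/g"
echo remove_dot_segments_sketch('mid/content=5/../6'), "\n"; // prints "mid/6"
```

Both inputs are the worked examples given in RFC 3986 itself. Comparing against '' with === instead of using empty() also means a path such as "0" is not mistaken for an empty one.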

RFC3339 in PHP

March 8, 2006 (5 comments)

This is deprecated, and has known bugs. See here for a replacement.

Having searched around for any function to parse RFC3339 dates (used in Atom) in PHP, and failing to find any decent one, I wrote my own. In short, all it does is rearrange the date to a format strtotime() understands.

<?php
function parse_date($date)
{
    // UTC dates, marked with a trailing "Z"
    if (preg_match('/([0-9]{2,4})-([0-9][0-9])-([0-9][0-9])T([0-9][0-9]):([0-9][0-9]):([0-9][0-9])(\.[0-9][0-9])?Z/i', $date, $matches))
    {
        // Round the seconds up if the fractional part is .50 or more
        if (isset($matches[7]) && substr($matches[7], 1) >= 50)
            $matches[6]++;
        return strtotime("$matches[1]-$matches[2]-$matches[3] $matches[4]:$matches[5]:$matches[6] -0000");
    }
    // Dates with a numeric offset such as +01:00
    else if (preg_match('/([0-9]{2,4})-([0-9][0-9])-([0-9][0-9])T([0-9][0-9]):([0-9][0-9]):([0-9][0-9])(\.[0-9][0-9])?(\+|-)([0-9][0-9]):([0-9][0-9])/i', $date, $matches))
    {
        if (isset($matches[7]) && substr($matches[7], 1) >= 50)
            $matches[6]++;
        return strtotime("$matches[1]-$matches[2]-$matches[3] $matches[4]:$matches[5]:$matches[6] $matches[8]$matches[9]$matches[10]");
    }
    // Anything else is handed to strtotime() unchanged
    else
    {
        return strtotime($date);
    }
}
?>

I actually wrote this for SimplePie. Like the rest of SimplePie, it was released under the Creative Commons Attribution License 2.5; it is now released under the zlib/libpng license.
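
For comparison, here is a minimal sketch of a different approach: instead of rearranging the string for strtotime(), it computes the timestamp arithmetically with gmmktime() and applies the offset by hand. The function name rfc3339_to_timestamp is made up for illustration, and the stricter pattern (four-digit years, anchored at both ends, fractional seconds dropped without rounding) is my assumption, not what the function above does.

```php
<?php
// Hypothetical helper: returns a Unix timestamp, or false if the
// string does not look like an RFC 3339 date-time.
function rfc3339_to_timestamp($date)
{
    $pattern = '/^(\d{4})-(\d{2})-(\d{2})[Tt](\d{2}):(\d{2}):(\d{2})(?:\.\d+)?(?:[Zz]|([+-])(\d{2}):(\d{2}))$/';
    if (!preg_match($pattern, trim($date), $m))
    {
        return false;
    }
    // Treat the date and time fields as if they were UTC
    // (fractional seconds are dropped, not rounded).
    $timestamp = gmmktime((int) $m[4], (int) $m[5], (int) $m[6], (int) $m[2], (int) $m[3], (int) $m[1]);
    // Groups 7 to 9 only exist when a numeric offset was given instead of "Z"
    if (isset($m[7]))
    {
        $offset = ((int) $m[8] * 60 + (int) $m[9]) * 60;
        // A local time at +01:00 is one hour ahead of UTC, so subtract the offset
        $timestamp += ($m[7] === '+') ? -$offset : $offset;
    }
    return $timestamp;
}

var_dump(rfc3339_to_timestamp('1970-01-02T00:00:00Z'));      // int(86400)
var_dump(rfc3339_to_timestamp('1970-01-01T01:00:00+01:00')); // int(0)
```

This version does not validate field ranges (a month of 13 is silently normalised by gmmktime()), so treat it as an illustration of the offset arithmetic only.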

ROT47 with PHP

October 10, 2005 (3 comments)

Some of you may have heard of ROT13, a simple form of encryption. It takes the 26 letters of the alphabet and replaces each letter with the letter thirteen places further along, wrapping around at the end. Because thirteen is exactly half of 26, you can simply use ROT13 on the encrypted string to get it back to normal. In PHP, the function str_rot13() does this.

ROT47 takes ROT13 one step further: instead of using the 26 letters of the alphabet, it uses ASCII codes 33 through 126, making the output far less easy to decrypt in your head. There are a total of 94 characters, and to encrypt a string it replaces each character with whatever character is 47 characters further down the list (wrapping around at the end). Like ROT13, ROT47 can just be run on an encrypted string to get it back to normal. Unlike ROT13, PHP does not support it. Here's my basic script:

<?php
if (!function_exists('str_rot47'))
{
    function str_rot47($str)
    {
        return strtr($str, '!"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~', 'PQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~!"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNO');
    }
}
?>
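
To make the rotation explicit, here is a minimal sketch of the same mapping done with arithmetic on the ASCII codes instead of a lookup table. The function name rot47_arithmetic is made up for illustration; it should produce the same output as the strtr() version above for ASCII input.

```php
<?php
// Hypothetical helper: ROT47 via modular arithmetic over ASCII 33..126.
function rot47_arithmetic($str)
{
    $out = '';
    for ($i = 0, $n = strlen($str); $i < $n; $i++)
    {
        $code = ord($str[$i]);
        if ($code >= 33 && $code <= 126)
        {
            // Shift by 47 within the 94 printable characters, wrapping around
            $code = 33 + (($code - 33 + 47) % 94);
        }
        // Spaces, control characters and non-ASCII bytes pass through unchanged
        $out .= chr($code);
    }
    return $out;
}

$encoded = rot47_arithmetic('Hello, World!');
echo $encoded, "\n";                   // prints "w6==@[ (@C=5P"
echo rot47_arithmetic($encoded), "\n"; // prints "Hello, World!"
```

Because 47 is exactly half of 94, applying the function twice shifts every character by a full cycle, which is why the second call restores the input.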

XHTML/HTML Followup

August 18, 2005 (2 comments)

First off, to anyone new around here, I recommend you go read the original XHTML/HTML post and all the comments.

Having read several hundred lines more, I have come to the conclusion that converting XHTML 1.1 to HTML 4.01 Strict is an unnecessary complication. Sending the slightly modified XHTML 1.1 as HTML 4.01 is a complex and debatable topic that I'll try to stay out of in this post, although we will occasionally need to touch on it.

To start off with, let's quote Steven Pemberton, the chair of the W3C HTML Working Group, writing on a W3C mailing list in 2000:

David,

The HTML WG has discussed this issue: the intention was to allow old (HTML-only) browsers to accept XHTML 1.0 documents by following the guidelines, and serving them as text/html. Therefore, documents served as text/html should be treated as HTML and not as XHTML. There should be no sniffing of text/html documents to see if they are really XHTML.

Note that there are some semantic differences between HTML documents and XHTML documents: there are specific CSS rules that only apply to HTML (and not XHTML), and the DOM has different effects (for instance, the element names are returned in uppercase for HTML, and lower case for XHTML).

Best wishes,

Steven Pemberton
Chair, W3C HTML WG

This clearly lays out the fact that sending XHTML as text/html was what was intended for legacy support. However, the W3C note on XHTML Media Types makes it plain that XHTML 1.1 should not be sent as text/html. This leaves us in a dilemma: we're meant to send XHTML as text/html for legacy support, but not to send XHTML 1.1 as text/html.

So, our conclusion is that we're meant to use a doctype switcher: serve XHTML 1.1 as application/xhtml+xml to browsers that accept it (as well as to the validators), and serve everything else XHTML 1.0 Strict as text/html, as long as we meet the HTML Compatibility Guidelines (Appendix C of the XHTML 1.0 specification).

So, as ever, I'm going to post a PHP version, and ask anyone who can to port this to other serverside languages, and send it to me, so I can post it here giving them credit.

<?php
// Either header may be missing from the request, so default to an empty string
$accept = isset($_SERVER["HTTP_ACCEPT"]) ? $_SERVER["HTTP_ACCEPT"] : '';
$user_agent = isset($_SERVER["HTTP_USER_AGENT"]) ? $_SERVER["HTTP_USER_AGENT"] : '';
if ((stristr($accept, 'application/xhtml+xml')) || (stristr($user_agent, 'W3C_Validator')) || (stristr($user_agent, 'WDG_Validator')))
{
    $mime = 'application/xhtml+xml';
}
else
{
    $mime = 'text/html';
}
header("Content-type: $mime");
if ($mime == "application/xhtml+xml")
{
    // The doctype name must be lowercase "html" to match the root element in XML
    echo '<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
        "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">';
}
else
{
    echo '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">';
}
?>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
  <meta http-equiv="content-type" content="<?php echo $mime; ?>; charset=utf-8" />
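
If you want to try the switching rule without a web server, here is a minimal sketch that pulls the decision into a function taking the two header values as plain strings. The name choose_mime is made up for illustration, and like the script above it only looks for substrings, so it ignores q-values in the Accept header (a browser sending application/xhtml+xml;q=0 would still get XHTML).

```php
<?php
// Hypothetical helper: pick the media type from the Accept and User-Agent values.
function choose_mime($accept, $user_agent)
{
    if (stripos($accept, 'application/xhtml+xml') !== false
        || stripos($user_agent, 'W3C_Validator') !== false
        || stripos($user_agent, 'WDG_Validator') !== false)
    {
        return 'application/xhtml+xml';
    }
    return 'text/html';
}

// In a real page the values would come from $_SERVER; both keys may be absent.
$mime = choose_mime(
    isset($_SERVER['HTTP_ACCEPT']) ? $_SERVER['HTTP_ACCEPT'] : '',
    isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : ''
);
```

The result would then be passed to header() and used to choose the doctype, exactly as in the script above.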
