
PHP CLI: A Cinderella Story

This past Tuesday, Mike Lively and I presented "PHP CLI: A Cinderella Story" at the 2008 DC PHP conference. The presentation introduced the advantages of moving the heavy lifting from your web pages into backgrounded CLI scripts and ranged from the basics of writing backgrounded CLI scripts in PHP to more advanced multi-process and distributed processing.

The slides are embedded below.



We're still preparing the code examples that we were unable to show during the presentation. Stay tuned.

A Magical Lullaby

When you want to take control over the state of your objects as they're serialized, PHP provides two magic methods to help you: __sleep and __wakeup. As the documentation states:

The intended use of __sleep is to commit pending data or perform similar cleanup tasks. Also, the function is useful if you have very large objects which do not need to be saved completely. Conversely, [...] the intended use of __wakeup is to reestablish any database connections that may have been lost during serialization and perform other reinitialization tasks.


Unfortunately, there's one gotcha that severely limits the usefulness of this magic. Unless the __sleep method returns an array of the member names that you want serialized, the entire object gets serialized as null. This requirement is not only contradictory to the stated "intended use", it's nearly impossible to satisfy.
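Here is a minimal sketch of the gotcha. The two classes, Sleepy and Rested, are hypothetical names made up for this illustration; the @ operator is only there to silence the notice that serialize() raises when __sleep returns nothing.

```php
<?php
// Hypothetical class: __sleep does its cleanup but returns nothing.
class Sleepy
{
    public $name = 'example';

    public function __sleep()
    {
        // Cleanup would go here. Because no array is returned,
        // serialize() stores NULL in place of the object.
    }
}

// Hypothetical class: __sleep names the properties to keep.
class Rested
{
    public $name = 'example';

    public function __sleep()
    {
        return array('name');
    }
}

$broken  = @serialize(new Sleepy());   // "N;", the serialized form of NULL
$working = serialize(new Rested());    // contains the 'name' property

echo $broken, "\n", $working, "\n";
```

Unserializing $broken gives back NULL rather than a Sleepy instance, which is exactly why cleanup-only __sleep methods are a trap.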

To explore this further, consider the example given in the documentation: a simple database wrapper. Now I'm not really sure why you'd want to serialize a database connection, but we'll assume you have to. You can't actually serialize the underlying resource that the connection is using, so you'll need to disconnect it before serialization occurs. That's all we need the __sleep method to do.

We could -- like the example -- disconnect and return a hard-coded list of our properties. Maintenance hell. So let's write a function that will properly build a list of all the properties (public, protected, AND private) in a child and all its ancestors.

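<?php
    class Connection {
        [...]

        public function __sleep()
        {
            // The link resource cannot be serialized, so close it first.
            mysql_close($this->link);
            return $this->getPropertyNames();
        }
       
        /**
        * Collects the property names of this object's class and all of its
        * ancestors, skipping any names listed in $filter.
        */
        public function getPropertyNames(array $filter = NULL)
        {
            $rc = new ReflectionObject($this);
            $names = array();

            while ($rc instanceof ReflectionClass)
            {
                foreach ($rc->getProperties() as $prop)
                {
                    $name = $prop->getName();

                    // Static properties are not part of the object's state, and
                    // inherited properties are reported again at every level.
                    if ($prop->isStatic() || in_array($name, $names))
                        continue;

                    if (!$filter || !in_array($name, $filter))
                        $names[] = $name;
                }

                $rc = $rc->getParentClass();
            }

            return $names;
        }

        [...]
    }
?>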


What a nightmare, and just to close a connection. That's why I created a patch that will allow __sleep to return NULL, in which case the object is serialized as usual. You can find it (for the PHP_5_3 branch and HEAD) attached to my message to the internals list. Let's hope it gets committed.

Multi-Process Persistent Applications: Part 2 - Multiple Children

Introduction


In the first part of this tutorial, we covered the basics of forking and detaching from the terminal. That tutorial was very procedural and straightforward. In this tutorial we're going to be dealing with multiple child forks. This part is going to be significantly more complicated than the previous one, so don't feel bad if you begin to feel overwhelmed at some point. It will all come to you in time!

The Base Class: Forking, Signal Handling


The first step in handling multiple forks is going to be isolating the forking mechanism in a class, so that we can utilize it many times over. The way I'm going to do this is by creating a base class that will handle the forking itself, and create a loop where child classes can implement functionality. That is, my first class ForkedProcess will be abstract and never instantiated. It will expect a child class to implement a method which will be executed N times per second. It is at this point of execution that the child class can choose to do work, or it can do nothing. This will be the persisting loop of that particular process, and upon being broken, that process will exit.

So, first off, let's create a class. I have created a fairly basic implementation which we will later expand to handle more complicated functionality. For now, our goal is merely to create a class which handles forking. It is not yet ready to be used.

ForkedProcess.php
<?php
    declare (ticks = 1);
    abstract class ForkedProcess
    {
        protected $continue_execution;
        protected $detached = FALSE;
        protected $sleep_time;
        protected $PID;
        protected $signal_cache = array();

        /**
        * @param int $sleep_time Time to sleep between each poll
        */
        public function __construct($sleep_time = 100000)
        {
            $this->sleep_time = $sleep_time;
            $this->signal_cache = array_fill(0, 64, FALSE);
        }

        /**
        * Function that stores signals very quickly. Acts as the signal handler for php.
        *
        * @param int $signal
        */
        protected function handleSignal($signal)
        {
            $this->signal_cache[$signal] = TRUE;
        }

        /**
        * @param int $signal
        */
        protected function enableSignal($signal)
        {
            pcntl_signal($signal, array($this, "handleSignal"));
        }

        /**
        * returns TRUE if the signal has been received. If it has been received,
        * the signal is reset in the signal cache.
        *
        * @param int $signal
        * @return bool
        */
        protected function hasSignal($signal)
        {
            if ($this->signal_cache[$signal])
            {
                $this->signal_cache[$signal] = FALSE;
                return TRUE;
            }
            return FALSE;
        }

        /**
        * @param string $message
        */
        protected function debug($message)
        {
            echo "{$this->PID}\\".get_class($this)."> " . $message . "\n";
        }

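        /**
        * Create background fork. The parent gets the child's PID back; the
        * child runs the polling loop and exits without ever returning.
        *
        * @param bool $detach Whether the child should detach from the terminal
        * @return int The PID of the child process
        */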
        public function fork($detach = TRUE)
        {
            $PID = pcntl_fork();

            if ($PID == -1)
            {
                throw new Exception("Unable to fork");
            }
            else if ($PID > 0)
            {
                return $PID;
            }

            $this->enableSignal(SIGTERM);
            $this->enableSignal(SIGINT);

            if ($detach == TRUE)
            {
                if (posix_setsid() == -1)
                {
                    throw new Exception("Unable to detach from controlling terminal!");
                }
                $this->detached = TRUE;
            }

            $this->PID = posix_getpid();
            $this->continue_execution = TRUE;
            $this->onStartup();

            while ($this->continue_execution)
            {
                if ($this->hasSignal(SIGTERM) || $this->hasSignal(SIGINT))
                {
                    $this->quit();
                }
                $this->tick();
                usleep($this->sleep_time);
            }

            exit(0);
        }

        protected function quit()
        {
            $this->continue_execution = FALSE;
            $this->onExit();
        }

        protected abstract function onStartup();
        protected abstract function onExit();
        protected abstract function tick();
    }
?>


You'll notice a few very important additions to the functionality, as well as some simple event handler prototypes which do nothing. The big addition here is the handling of signals. Signals are sent to the process by the operating system and act as instructions to do something. Usually, this is a kill order. There are constants in PHP which represent integer values of the signals typically sent. To see a full list of the signals on your system, you can run the following command:

root@localhost ~ # kill -l
 1) SIGHUP       2) SIGINT       3) SIGQUIT      4) SIGILL
 5) SIGTRAP      6) SIGABRT      7) SIGBUS       8) SIGFPE
 9) SIGKILL     10) SIGUSR1     11) SIGSEGV     12) SIGUSR2
13) SIGPIPE     14) SIGALRM     15) SIGTERM     16) SIGSTKFLT
17) SIGCHLD     18) SIGCONT     19) SIGSTOP     20) SIGTSTP
21) SIGTTIN     22) SIGTTOU     23) SIGURG      24) SIGXCPU
25) SIGXFSZ     26) SIGVTALRM   27) SIGPROF     28) SIGWINCH
29) SIGIO       30) SIGPWR      31) SIGSYS      34) SIGRTMIN
35) SIGRTMIN+1  36) SIGRTMIN+2  37) SIGRTMIN+3  38) SIGRTMIN+4
39) SIGRTMIN+5  40) SIGRTMIN+6  41) SIGRTMIN+7  42) SIGRTMIN+8
43) SIGRTMIN+9  44) SIGRTMIN+10 45) SIGRTMIN+11 46) SIGRTMIN+12
47) SIGRTMIN+13 48) SIGRTMIN+14 49) SIGRTMIN+15 50) SIGRTMAX-14
51) SIGRTMAX-13 52) SIGRTMAX-12 53) SIGRTMAX-11 54) SIGRTMAX-10
55) SIGRTMAX-9  56) SIGRTMAX-8  57) SIGRTMAX-7  58) SIGRTMAX-6
59) SIGRTMAX-5  60) SIGRTMAX-4  61) SIGRTMAX-3  62) SIGRTMAX-2
63) SIGRTMAX-1  64) SIGRTMAX


The signals we care most about are SIGTERM (the OS telling us to exit) and SIGINT (similar to SIGTERM, but usually caused by Ctrl+C from the terminal). Typically, PHP handles these for us. The reason we're taking over this process is to create a standard way of handling shutdown tasks within an object, without using register_shutdown_function, destructors, or anything like that. We specifically want to do fork-specific shutdown tasks which should not happen in certain situations. The base class itself does not have any shutdown tasks, but it expects the child class to implement a couple of functions, onStartup() and onExit(), so that those classes can do their own setup and cleanup. The ForkedParentProcess, for example, will want to clean up child processes before it is allowed to exit. PHP provides access to signal handling through the pcntl_signal() function, which we wrap for ease of use. An important thing to remember is that a signal handler should always return as quickly as possible. That is why the handler does nothing but set a flag in a protected variable ($signal_cache), which the main loop checks on each pass.

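Another thing you should notice is this line:
declare(ticks = 1);

PHP has a mechanism for invoking functions at regular points during execution, known as tick functions. The pcntl extension relies on ticks to deliver signals to our handler, so without this line handleSignal() would never be called. I won't go into great detail here as it's not really important, but you can read about ticks in the PHP documentation for declare.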
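To see the "record now, act later" idea in isolation, here is a rough sketch. SignalFlags is a hypothetical helper that mirrors $signal_cache; it is not part of the tutorial's classes. Instead of ticks, the sketch calls pcntl_signal_dispatch() explicitly, which assumes PHP 5.3 or later, and it falls back to simulating the signal when pcntl or posix is not available.

```php
<?php
// Hypothetical helper mirroring $signal_cache from ForkedProcess.
class SignalFlags
{
    private $flags = array();

    // Signal handler: just record the signal number and return.
    public function handle($signal)
    {
        $this->flags[$signal] = true;
    }

    // True once for each recorded signal; reading it clears the flag.
    public function consume($signal)
    {
        if (empty($this->flags[$signal])) {
            return false;
        }
        $this->flags[$signal] = false;
        return true;
    }
}

$flags  = new SignalFlags();
$signal = defined('SIGUSR1') ? SIGUSR1 : 10;   // 10 is SIGUSR1 in the list above

if (function_exists('pcntl_signal_dispatch') && function_exists('posix_kill')) {
    pcntl_signal($signal, array($flags, 'handle'));
    posix_kill(posix_getpid(), $signal);   // send ourselves the signal
    pcntl_signal_dispatch();               // run handlers for pending signals
} else {
    $flags->handle($signal);               // no pcntl/posix: simulate delivery
}

$first  = $flags->consume($signal);   // the signal was recorded
$second = $flags->consume($signal);   // the flag was reset by the first read
```

The handler itself never does any real work. All decisions are made later by whoever calls consume(), which is the same split that handleSignal() and hasSignal() make in the base class.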

Now that we have our base functionality, we need to define what types of forked processes we need. The way we handle multiple forked child processes is by having a common parent which manages all of these children. This way, we can issue a kill to the parent process, and it will happily tell all of its children to exit as well. In addition to that necessary functionality, it provides a common point of communication. If we have work to delegate, the parent process can handle that. So, we need a type of forked process that can, itself, create forks. We can implement this by extending the base forked class to create a ForkedParentProcess. We will also implement a ForkedWorkerProcess. Below is a diagram to help illustrate the relationships of these processes.


[Diagram: Basic Process Layout]


In this diagram, the Starter Process is the process which merely starts, forks and backgrounds the Controlling Process, which is a ForkedParentProcess. The Starter Process then terminates, leaving the Controlling Process to create its own pool of children and to more or less be our persistent process. The Worker Processes in the diagram are represented in our code by the ForkedWorkerProcess. If the Controlling process exits, the children must be instructed to exit as well, since they are useless without a parent and will become rogue processes if we do not deal with them properly. Below is another diagram which shows the flow of the individual processes, from inception to termination.


[Diagram: Forking Flow]


Parent Process, Child Reaping


Now that we have a firm understanding of how our processes should work, we can move on to actually writing the code for them. In this next section of code, I will introduce the concept of reaping, which is the process of properly cleaning up exited child processes. There's a lot of code below, so take some time to soak it up. I will explain it all.

ForkedParentProcess.php
<?php
    require_once 'ForkedProcess.php';
    require_once 'ForkedWorkerProcess.php';

    /**
    * Represents a process which spawns other (worker) processes
    */
    class ForkedParentProcess extends ForkedProcess
    {
        /**
        * Max number of workers to maintain
        * @var int
        */
        private $worker_count;

        /**
        * PIDs of our child workers
        * @var array
        */
        private $children = array();

        /**
        * @param int $worker_count Number of workers
        * @param int $sleep_time Number of microseconds to sleep between each tick
        */
        public function __construct($worker_count = 5, $sleep_time = 100000)
        {
            parent::__construct($sleep_time);
            $this->worker_count = $worker_count;
        }

        /**
        * Called when the process is exiting
        */
        public function onExit()
        {
            $this->killChildren();
        }

        /**
        * Called when the process has just started
        */
        public function onStartup()
        {
            $this->debug("Alive");

            while (count($this->children) < $this->worker_count)
            {
                $worker = new ForkedWorkerProcess();
                $this->children[] = $worker->fork(FALSE);
            }

            $this->enableSignal(SIGCHLD);
        }

        /**
        * Issues SIGTERM to all workers
        */
        protected function killChildren()
        {
            foreach ($this->children as $child_pid)
            {
                posix_kill($child_pid, SIGTERM);
            }
        }

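        /**
        * Find any exited workers and clean them up
        *
        */
        protected function reapChildren()
        {
            // pcntl_wait() returns 0 when no child has changed state and -1
            // when there are no children left, so only a positive PID counts.
            while (($child_pid = pcntl_wait($status, WNOHANG | WUNTRACED)) > 0)
            {
                $this->debug("reaping child " . $child_pid);
            }
        }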

        /**
        * Called each cycle
        */
        protected function tick()
        {
            if ($this->hasSignal(SIGCHLD))
            {
                $this->reapChildren();
            }
            /**
            * TODO: Distribute work.
            */
        }
    }
?>


Most of this should be fairly obvious. We have an object which represents the leader of a pool of workers. When the process first starts up, it spawns all of its workers. The interesting piece of code here is "reaping", which is the act of cleaning up a process which you own after it has exited. When a child process exits, the operating system sends a signal to the parent process: SIGCHLD. We look for this signal, and when it occurs, we clean up the child. This is done using the process control function pcntl_wait() (http://php.net/pcntl_wait). We're not expecting a specific child to exit, and we don't want to block if no process has exited, so we pass WNOHANG. Each call reaps at most one child, so we call it in a loop until it stops returning a PID, which cleans up all dead child processes. If you fail to clean up child processes, you will be left with defunct processes (also called zombie processes).
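The return values of pcntl_wait() are easy to get wrong, so here is a small standalone sketch of a non-blocking reap. The function name reapExited() and the two-second polling limit are assumptions made for this illustration, and the fork is skipped entirely when the pcntl extension is not available.

```php
<?php
// Reaps every child that has already exited, without blocking.
// Returns the PIDs that were cleaned up.
function reapExited()
{
    $reaped = array();

    // pcntl_wait() returns a PID (> 0) for a reaped child, 0 when children
    // exist but none has exited yet, and -1 when there is nothing to wait for.
    while (($pid = pcntl_wait($status, WNOHANG)) > 0) {
        $reaped[] = $pid;
    }

    return $reaped;
}

$reaped = array();
$child  = null;

if (function_exists('pcntl_fork')) {
    $child = pcntl_fork();

    if ($child === 0) {
        exit(0);   // child: exit right away so the parent has something to reap
    }

    // Parent: poll for up to roughly two seconds.
    for ($i = 0; $i < 200 && !$reaped; $i++) {
        $reaped = reapExited();
        usleep(10000);
    }
}
```

Testing for a value greater than zero matters: a loop that only checks for a truthy return value would spin forever once pcntl_wait() starts returning -1.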

The Worker Process


The following code is for the worker process. It's more or less empty, but it prepares us for the next steps of this system.

ForkedWorkerProcess.php
<?php
    require_once 'ForkedProcess.php';

    /**
    * Class representing a single worker process
    */
    class ForkedWorkerProcess extends ForkedProcess
    {
        /**
        * Called when the process starts
        */
        public function onStartup()
        {
            $this->debug("Alive");
        }

        /**
        * Called when the process is exiting.
        */
        public function onExit()
        {

        }

        /**
        * Called each cycle
        */
        public function tick()
        {
            /**
            * TODO: Check for work
            */
        }
    }
?>


Pretty self-explanatory. You'll notice I left behind a couple of "TODO" points in the code. Those mark the places where we're going to implement the interprocess communication (IPC) functions. For now, we simply have a parent and a bunch of workers. We do not have a way to tell our workers to do things. I will cover this in the next section.

Checkpoint: Testing Our Classes



We have 3 classes above. Let's try a little test. Create a new php file.

example3.php
<?php
    require_once 'ForkedParentProcess.php';

    $parent = new ForkedParentProcess(3);
    $parent->fork();
?>


If you run this script you should see output similar to this:

root@localhost ~/process_tutorial # php example3.php
20634\ForkedParentProcess> Alive
20635\ForkedWorkerProcess> Alive
20636\ForkedWorkerProcess> Alive
20637\ForkedWorkerProcess> Alive


In short, you have spawned a parent process (20634), with 3 child processes (20635 through 20637). Your process IDs will naturally be different, but the idea is the same. If you use the ps command, you can view your PHP processes. See below:

root@localhost ~/process_tutorial # ps -ef | grep php
root    20634    1  0 10:31 ?        00:00:00 php example3.php
root    20635 20634  0 10:31 ?        00:00:00 php example3.php
root    20636 20634  0 10:31 ?        00:00:00 php example3.php
root    20637 20634  0 10:31 ?        00:00:00 php example3.php


If you issue a kill to the parent process, it should exit and take its children with it:

root@localhost ~/process_tutorial # ps -ef | grep php
root    20634    1  0 10:31 ?        00:00:00 php example3.php
root    20635 20634  0 10:31 ?        00:00:00 php example3.php
root    20636 20634  0 10:31 ?        00:00:00 php example3.php
root    20637 20634  0 10:31 ?        00:00:00 php example3.php
root    20649 13958  0 10:35 pts/0    00:00:00 grep --color php
root@localhost ~/process_tutorial # kill 20634
root@localhost ~/process_tutorial # ps -ef | grep php
root@localhost ~/process_tutorial #


Conclusion


In summation, this article has covered a variety of features. We learned how to manage multiple children, handle signals and properly clean up after our child processes. We have laid the framework for a basic workload distribution system. This is a very useful starting point, but it is missing a key element: The ability to communicate. Once the child processes have spawned, they become autonomous, taking direction from no one. In the next part of this tutorial, I will continue developing these classes (and adding a couple new classes) to provide this functionality. I will introduce the concept of message queues.

Multi-Process Persistent Applications: Part 1 - Forking

Introduction


Recently I was confronted with a problem. We needed to develop an application which could handle large amounts of data, fire off thousands of requests to other servers via HTTP every few minutes, and not let any one thing slow this whole process down. That last part is important. The data being funneled is coming from multiple customers, and heading out to multiple destinations. Moreover, there's an intermediate step that requires the data to be pre-packaged. That step requires communication with a large set of customer databases. So, you can see a few things from this.

Our application takes multiple inputs, processes the data against multiple databases and then sends that data to multiple outputs. As you can see, this creates a huge number of potential bottlenecks. I don't want customer A's data slowing down customer B's data, and likewise I don't want the performance of Output A getting in the way of Output B's performance. How do we handle this? There are a number of ways to do this. The solution I went with was one of forking and shared memory, and everything that comes with that (such as System V IPC message queues and semaphores).

Some Questions Answered


Where do you start? The first step is to install the necessary PHP extensions, and then get a basic understanding of forking.

What happens when you fork? Generally, forking is a sort of "process copying". This differs from how you'd probably handle this on Windows, where creating a separate thread is the way to do it. There are threading libraries for Linux, but none of them are well implemented and supported in PHP.

Does it work under Apache? Because forking creates a copy of the process, this will not work unless you are running from the command line in a CLI version of PHP.

Is forking expensive? Yes, forking is an expensive process. The best way to handle forking is by forking early, and keeping processes running, rather than creating and terminating processes frequently. If you fork very infrequently, however, it may be more economical to start them and stop them. You will likely have to tune this to your specific application. There are ups and downs to both. Forking more means using more CPU, keeping processes around means more RAM. The decision is yours.

Enabling the Right Extensions


There are a few extensions that will be useful in this adventure. You need not install all of them immediately, but it's worth getting them all out of the way ahead of time. The only extension you need in order to fork is pcntl (http://php.net/pcntl), the process control extension. I will also frequently use functions from the posix extension (http://php.net/posix), so install that as well. Later on in this tutorial I'll be covering shared memory, message queues and semaphores, which are all covered by the System V IPC extensions (sysvshm, sysvmsg and sysvsem; see http://www.php.net/manual/en/ref.sem.php).
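A quick way to find out what your CLI build is missing is to ask PHP itself. The sketch below is only a convenience check; missingExtensions() is a made-up helper name, and the list assumes the System V IPC functions are provided by the sysvmsg, sysvsem and sysvshm extensions.

```php
<?php
// Returns the names from $required that are not loaded in this PHP binary.
function missingExtensions(array $required)
{
    $missing = array();

    foreach ($required as $name) {
        if (!extension_loaded($name)) {
            $missing[] = $name;
        }
    }

    return $missing;
}

$required = array('pcntl', 'posix', 'sysvmsg', 'sysvsem', 'sysvshm');
$missing  = missingExtensions($required);

if ($missing) {
    echo "Missing extensions: " . implode(', ', $missing) . "\n";
} else {
    echo "All required extensions are loaded.\n";
}
```

Run it with the same php binary you plan to use for the background scripts, since the CLI and the web server module are often built with different extensions.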

Step 1: A Simple Forking Application


Below is an example of a very simple forking application. The program starts, forks, and both processes report some information about themselves before exiting.

example1.php
<?php
    $PID = pcntl_fork();

    if ($PID == -1)
    {
        // We were unable to fork
        die("Error forking.");
    }
    else if ($PID > 0)
    {
        echo "I am the parent process, and my ID is: " . posix_getpid() . ". My child ID is: $PID\n";
    }
    else
    {
        echo "I am the child process, and my ID is: " . posix_getpid() . ". My parent ID is: " . posix_getppid() . "\n";
    }
?>

Put this code in a simple file, let's call it example1.php. You should see output similar to this when executing it:
root@localhost ~/process_tutorial # php example1.php
I am the child process, and my ID is: 1487. My parent ID is: 1486
I am the parent process, and my ID is: 1486. My child ID is: 1487

We used a few important functions here. pcntl_fork() is the function used to fork. What happens here is the process is split, and in the parent process, shown here as process id #1486, the call to pcntl_fork() returns the process ID of the child process. If the return value is -1, something has gone haywire. This is likely a system configuration issue, and not something I can really assist with. In the child process, shown here as process id #1487, the call to pcntl_fork() returns 0. The return value of this function is ultimately how you direct code flow in the two processes. The other two important functions we used are posix_getpid() and posix_getppid(), which return the ID of the current process and the ID of the parent to this process, respectively.

Step 2: Starting a Background Process and Exiting


One of the common things people use forking for is to make an application that automatically backgrounds itself. This is fairly common practice and you see it in a lot of applications, such as VPNC. This allows an application started from the command line to persist even after the controlling terminal has exited. A few things have to happen. Firstly, the parent can just exit. However, the child process will still be attached to the terminal from which the parent was executed.

In order to detach, we make use of a POSIX system call named setsid(). In PHP, we are given access to it through the POSIX extension, under the name posix_setsid(). In general terms, this function makes the calling process the leader of a new session and of a new process group, with no controlling terminal. You're probably asking why we need to fork in the first place. Why can't we just detach the process we started from the terminal? The answer to that question lies in the man pages.

The setsid() function shall fail if:


[EPERM]
The calling process is already a process group leader, or the process group ID of a process other than the calling process matches the process ID of the calling process.

That is, when we start php using the command line, we are effectively creating a new process, which is the leader of its process group, and thus we must make another process using pcntl_fork() if we wish to detach.
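You can check this condition from PHP before deciding whether a fork is needed. The sketch below assumes the posix extension is loaded; isProcessGroupLeader() is a hypothetical helper name, not something the extension provides.

```php
<?php
// True when the current process leads its process group, in which case
// posix_setsid() would fail with EPERM.
function isProcessGroupLeader()
{
    return posix_getpid() === posix_getpgrp();
}

if (isProcessGroupLeader()) {
    echo "Group leader: fork first, then call posix_setsid() in the child.\n";
} else {
    echo "Not a group leader: posix_setsid() can be called directly.\n";
}
```

A freshly forked child keeps its parent's process group, so its own PID never matches the group ID. That is why calling posix_setsid() right after pcntl_fork() works.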

In the example below, we create a new process, detach it from the controlling terminal, and exit the parent process. The child process will exit after 30 seconds. You could also issue a SIGTERM to the process with the kill command in Linux.

example2.php
<?php
    $PID = pcntl_fork();

    if ($PID == -1)
    {
        // We were unable to fork
        die("Error forking.");
    }
    else if ($PID > 0)
    {
        // Exit the parent process
        die("Parent exiting. Process started in background (pid: $PID)");
    }
    echo "Detaching from terminal.\n";

    // Create a new session (Detach from terminal)
    if (posix_setsid() == -1)
    {
        die("Unable to detach from controlling terminal!");
    }

    // Wait 30 seconds before exiting so we view this process running in the background, for testing purposes.
    $timer = 30;
    while ($timer--) sleep(1);
?>


Executing this code should present an output similar to this:
root@localhost ~/process_tutorial # php example2.php
Detaching from terminal.
Parent exiting. Process started in background (pid: 2131)

root@localhost ~/process_tutorial # ps -ef | grep php
root      2131    1  0 12:43 ?        00:00:00 php example2.php

And about 30 seconds later, the process is gone:
root@localhost ~/process_tutorial # ps -ef | grep php
root@localhost ~/process_tutorial #

Your background process has started, spent 30 seconds doing some things, and exited. That is the basic way to fork and background.

Conclusion


In the next part of this series I'll talk about how to manage multiple child processes. In the third, I will cover how to communicate between processes using message queues. The fourth part will cover shared memory and semaphores.

WAP: Part 6 - Microbrowser content in WML / XHTML MP

WML


WML is probably the lowest common denominator of microbrowser content formats. If you can assume you're not going to support older phones that can only display WML, you might decide to only display your content in XHTML MP (Mobile Profile).

WML is a different paradigm of web design, which encompasses multiple pages (cards) contained in a single document (deck). A brief and easy introduction into WAP and WML can be found at w3schools. Read all of it.

Once you've read all of that, there's more in-depth WML information and tutorials at http://www.developershome.com/wap/wml/. Specifically you'll at least want to check out how to submit form data in WML.
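To make the deck and card idea concrete, here is a rough sketch that builds a small WML 1.1 deck from PHP. The function buildWmlDeck() and the array keys 'id', 'title' and 'text' are made up for this example; only the WML structure itself (a wml element containing card elements) and the text/vnd.wap.wml MIME type are standard.

```php
<?php
// Builds a WML deck from an array of cards. Each card is an array with the
// hypothetical keys 'id', 'title' and 'text'.
function buildWmlDeck(array $cards)
{
    $wml  = '<?xml version="1.0"?>' . "\n";
    $wml .= '<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN" '
          . '"http://www.wapforum.org/DTD/wml_1.1.xml">' . "\n";
    $wml .= "<wml>\n";

    foreach ($cards as $card) {
        // One <card> per page; all cards travel in the same document.
        $wml .= sprintf(
            "  <card id=\"%s\" title=\"%s\">\n    <p>%s</p>\n  </card>\n",
            htmlspecialchars($card['id']),
            htmlspecialchars($card['title']),
            htmlspecialchars($card['text'])
        );
    }

    return $wml . "</wml>\n";
}

$deck = buildWmlDeck(array(
    array('id' => 'home',  'title' => 'Home',  'text' => 'Welcome to the deck.'),
    array('id' => 'about', 'title' => 'About', 'text' => 'Fish & chips served daily.'),
));

// Send the WML content type when serving over HTTP.
if (PHP_SAPI !== 'cli') {
    header('Content-Type: text/vnd.wap.wml');
}

echo $deck;
```

Because both cards arrive in one response, moving from one card to the other needs no second request, which is the main reason the format was designed this way.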

WML in Firefox


To test WML in Firefox, you'll want two plugins: WML Browser and Modify Headers. Modify Headers is used to append 'text/vnd.wap.wml' to the Accept header, which tells the server that Firefox can now render WML. You can also use Modify Headers to change the User-Agent header. That is not strictly needed, but if your WAP site uses the WURFL to determine device capabilities, it can be useful for 'virtually' testing different devices (see my previous WURFL article). WML has an extremely strict syntax, so it is best to test pages in Firefox using the WML Browser, as it will be more verbose in reporting errors, whereas your phone will probably just display nothing.
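On the server side, checking for that Accept entry could look something like the sketch below. acceptsWml() is a hypothetical helper written for this example; it simply looks for the WML MIME type among the comma-separated entries and ignores any parameters such as q-values.

```php
<?php
// Returns true when the Accept header lists the WML MIME type.
function acceptsWml($acceptHeader)
{
    foreach (explode(',', $acceptHeader) as $entry) {
        // Drop parameters such as ";q=0.8" and any surrounding whitespace.
        $parts = explode(';', $entry);

        if (strtolower(trim($parts[0])) === 'text/vnd.wap.wml') {
            return true;
        }
    }

    return false;
}

// The header is absent when running from the command line.
$accept   = isset($_SERVER['HTTP_ACCEPT']) ? $_SERVER['HTTP_ACCEPT'] : '';
$wantsWml = acceptsWml($accept);
```

With Modify Headers active, a request from Firefox would make $wantsWml true, so the same page logic can be exercised from a desktop browser.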

XHTML MP


XHTML MP is really just strict XHTML with some limitations. You must always close your tags, they must be in lower case, attributes must be enclosed in quotes, and so on. Go through the documentation provided here for details.

To test XHTML MP in a 'big' browser you don't have to do anything in particular since XHTML MP is really a subset of XHTML. If you're using something like the WURFL to redirect to a 'full' site based on User-Agent, you might add something in your code to force to XHTML MP mode.

Dynamically changing format with XSLT


You could probably change dynamically from WML to XHTML MP by using XSLT, but I've found it simpler to maintain separate files. In a multi-page site, though, separate files could quickly get out of hand, so as a primer here is an article that uses PHP, WURFL & XSLT to translate pages between XHTML MP and WML.

Articles In This Series:


WAP: Part 1 - MultiTech USB GPRS Modem in Linux
WAP: Part 2 - Send SMS from Kannel
WAP: Part 3 - WAP Push with Kannel & PHP
WAP: Part 4 - Send SMS from PHP
WAP: Part 5 - Customizing content with WURFL
WAP: Part 6 - Microbrowser content in WML / XHTML MP

WAP: Part 5 - Customizing content with WURFL

We've covered (in depth) the delivery side of getting URLs to phones using Kannel. Now it's time to cover the content you'd like to display on the phone. The best single database I've found of what phones can do is the WURFL. WURFL stands for Wireless Universal Resource File, and it is simply an XML 'database' of phones, their families, and their capabilities. You can get the XML file here. There's also a WURFL PHP toolkit for incorporating the WURFL into your project; download it here.

Installing and Configuring the WURFL


To start, unzip the wurfl toolkit and then download the wurfl file and unzip it as well. Then set up the WURFL config file and update the WURFL cache.

/ $ cd
~ $ mkdir -p ~/src/wurfl
~ $ cd ~/src/wurfl
~/src/wurfl $ wget http://downloads.sourceforge.net/wurfl/wurfl_php_tools_21.zip
~/src/wurfl $ unzip wurfl_php_tools_21.zip
~/src/wurfl $ wget http://wurfl.sourceforge.net/wurfl.zip
~/src/wurfl $ unzip wurfl.zip
~/src/wurfl $ vi wurfl_config.php


wurfl_config.php
<?php
...
define("DATADIR", dirname(__FILE__) . '/'); // trailing slash so the paths below resolve
define("WURFL_PARSER_FILE", DATADIR . 'wurfl_parser.php');
define("WURFL_CLASS_FILE", DATADIR . 'wurfl_class.php');
...
?>

Updating the cache


To build the cache, run update_cache.php.
~/src/wurfl $ php update_cache.php


If you're planning on running update_cache.php from a web browser after periodically updating wurfl.xml, you may want to change permissions so they're apache friendly:
/ # cd <wurfl dir>
wurfl # chown -R :apache .
wurfl # chmod 664 cache.php
wurfl # chmod 2775 .
wurfl # sudo -u apache php ./update_cache.php

WURFL and the toolkit work simply by analyzing the User-Agent header to determine what type of device is connecting and what capabilities it has. The cache groups phone families logically so that it loads only the portion of the WURFL database that is needed. It also stores device info from previously connected devices, which avoids an additional lookup in the future.

Using the WURFL


Here is a simple example of using the WURFL to determine if the "device" is a WAP browser (a phone) or not (a full-fledged browser such as IE or Firefox).

<?php

require_once 'wurfl_config.php';
require_once WURFL_CLASS_FILE;

// instantiate the wurfl class
$device = new wurfl_class();

// have it load its information based on the user agent
$device->GetDeviceCapabilitiesFromAgent($_SERVER['HTTP_USER_AGENT']);

if(!$device->browser_is_wap)
{
    header('Location: http://www.yahoo.com');
    exit;
}

?>

Note that many "smart" phones will not return TRUE for 'browser_is_wap'. You may or may not want to send them to a full-sized site due to screen size, download speeds, etc. There is a boatload of additional information that can be gleaned from the WURFL PHP toolkit by inspecting the $device->capabilities array.

For determining content format, as I see it you have 3 basic choices:
  1. Full-fledged pages displayed by smart phones or "big" browsers (from above example)

  2. XHTML Mobile Profile (MP), determined by $device->capabilities['markup']['preferred_markup'] == 'html_wi_oma_xhtmlmp_1_0'

  3. WML, determined by $device->capabilities['markup']['preferred_markup'] == 'wml_1_1'
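
The three choices above can be sketched as a small helper. This is only a rough illustration: the function name and the format labels ('full', 'xhtml_mp', 'wml') are made up for this example, and falling back to WML when the preferred markup is unknown is my own assumption.

```php
<?php

/**
 * Pick a content format from WURFL capability data.
 * The labels 'full', 'xhtml_mp' and 'wml' are hypothetical names for this sketch.
 */
function choose_content_format($browserIsWap, array $capabilities)
{
    // Choice 1: not a WAP browser, so serve the full-fledged pages.
    if (!$browserIsWap) {
        return 'full';
    }

    $markup = isset($capabilities['markup']['preferred_markup'])
        ? $capabilities['markup']['preferred_markup']
        : '';

    // Choice 2: XHTML Mobile Profile.
    if ($markup == 'html_wi_oma_xhtmlmp_1_0') {
        return 'xhtml_mp';
    }

    // Choice 3: WML, also used as the fallback here (an assumption).
    return 'wml';
}

// Usage with the $device object from the earlier example:
// $format = choose_content_format($device->browser_is_wap, $device->capabilities);
```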

There are other capabilities such as 'flash_lite' which I will not get into as I'm focusing on the lowest common denominator of phone, hoping to deliver the most content quickly, easily, and reliably.

Another thing to consider that is not necessarily dependent on the format is the screen size. I used $device->capabilities['display']['max_image_width'] and 'max_image_height' to scale images to the phone's screen. You can also use the $device->capabilities['image_format'] array to determine what type (jpg, gif, wbmp, etc.) of images the phone can display.
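
Here's a rough sketch of the scaling arithmetic, assuming you already have the image's pixel dimensions and the two maximums from the capabilities array. The function name is made up for this example, and the actual resizing (with GD or similar) is left out.

```php
<?php

/**
 * Scale an image size to fit within a device's maximum, keeping the
 * aspect ratio. Returns array(width, height) and never scales up.
 * Assumes all four arguments are positive numbers.
 */
function fit_to_screen($width, $height, $maxWidth, $maxHeight)
{
    // Ratio needed on each axis; the smallest one is the binding constraint.
    $ratio = min($maxWidth / $width, $maxHeight / $height, 1);

    return array(
        max(1, (int) round($width * $ratio)),
        max(1, (int) round($height * $ratio)),
    );
}

// Example: a 640x480 image on a phone reporting a 128x128 maximum.
list($w, $h) = fit_to_screen(640, 480, 128, 128);
echo $w . 'x' . $h . "\n"; // prints 128x96
```

With the $device object, the last two arguments would be $device->capabilities['display']['max_image_width'] and $device->capabilities['display']['max_image_height'].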

In my next article I'll talk about the ups and downs of microbrowser content in WML / XHTML MP.

Articles In This Series:


WAP: Part 1 - MultiTech USB GPRS Modem in Linux
WAP: Part 2 - Send SMS from Kannel
WAP: Part 3 - WAP Push with Kannel & PHP
WAP: Part 4 - Send SMS from PHP
WAP: Part 5 - Customizing content with WURFL
WAP: Part 6 - Microbrowser content in WML / XHTML MP

WAP: Part 4 - Send SMS from PHP

Finally, let's send SMS messages to mobile phones with links in them. This is the simplest and most reliable way to get a URL to a phone. Most modern phones are smart enough to be able to pick out a URL and give you an option to visit it. Granted, the "GOTO" option may be buried in a menu somewhere, but for now it's the best we can do to reach the broadest audience.

If you haven't set up Kannel yet, you'll want to do that. You can refer to my previous article WAP: Part 2 - Send SMS from Kannel. It includes information on a basic setup and also how to send a test SMS message on the command line so you know it works before we get into doing it with PHP.

This process is also the easiest to construct using PHP. Since Kannel does all of its calls using the HTTP protocol, we'll simply turn on allow_url_fopen in our PHP ini and then we can use file() to make the Kannel requests. You could also use curl or a socket connection if you'd like.

Since this example is so simple, I'll let you download the example code to look at the actual call, but essentially we're doing:

<?php

$result = file(
    'http://localhost:13013/cgi-bin/sendsms' .
    '?user=sms_user' .
    '&pass=sms_pass' .
    '&to=7025551212' .
    '&text=Go+to+Yahoo%21+on+your+phone+by+visiting+http%3A%2F%2Fwap.yahoo.com'
);

?>

If you're using the example index to send, don't forget to change the message to include the URL in the message itself. Unlike sending a
WAP Push message, there is no URL directly associated with the message.
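
If you'd rather not URL-encode the message by hand, something like the following should work. It's only a sketch: the function name is made up, and the host, port, and credentials are the same placeholder values used in the call shown earlier.

```php
<?php

/**
 * Build a Kannel sendsms URL. The host, port, and default credentials
 * are placeholders; adjust them to match your own kannel.conf.
 */
function build_sendsms_url($to, $text, $user = 'sms_user', $pass = 'sms_pass')
{
    // http_build_query() URL-encodes each value (spaces become "+").
    $query = http_build_query(
        array('user' => $user, 'pass' => $pass, 'to' => $to, 'text' => $text),
        '',
        '&'
    );

    return 'http://localhost:13013/cgi-bin/sendsms?' . $query;
}

$url = build_sendsms_url(
    '7025551212',
    'Go to Yahoo! on your phone by visiting http://wap.yahoo.com'
);
echo $url . "\n";

// To actually send, with allow_url_fopen enabled:
// $result = file($url);
```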

Download the example

Articles In This Series:


WAP: Part 1 - MultiTech USB GPRS Modem in Linux
WAP: Part 2 - Send SMS from Kannel
WAP: Part 3 - WAP Push with Kannel & PHP
WAP: Part 4 - Send SMS from PHP
WAP: Part 5 - Customizing content with WURFL
WAP: Part 6 - Microbrowser content in WML / XHTML MP

WAP: Part 3 - WAP Push with Kannel & PHP

This article will go through setting up Kannel to send "WAP Push" messages to mobile phones. It assumes you have a working Kannel
installation with a real modem -- see parts 1 & 2 of this series for more information.

Before we continue with this article, let me save you some time. I have found WAP Push messages to be very unreliable. Specifically I was only able to get a WAP Push message sent to a T-Mobile phone from my GSM modem with a T-Mobile SIM card. I tried a Cingular SIM card in the modem, but was unable to successfully send a WAP Push message to a Cingular or other network phone.

I searched for 3rd party companies that will send WAP Push messages, and contacted a few of them. It seems here in the U.S. this type of service is relatively unreliable and sometimes only the network providers themselves can send WAP Push messages that will get through. Case in point: I couldn't get a WAP Push message to a Cingular phone, but you can download ringtones from cingular.com, for which the URL is sent to the phone via WAP Push.

So, if you want to send WAP Push messages from a T-Mobile modem to only T-Mobile phones, read on! If not, you might save yourself the headache and time and skip ahead to the next article of my WAP series: Send SMS from PHP.

Sending WAP Push from the command line


To our working configuration file from WAP: Part 2 - Send SMS from Kannel, we'll add the ppg and wap-push-user groups. Also add wapbox to the startup file and restart Kannel.

/etc/kannel/kannel.conf
...
# PUSH PROXY GATEWAY CONFIG
group = ppg
ppg-url = /wappush
ppg-port = 8080
service-name = ppg
trusted-pi = true

# WAP USER
group = wap-push-user
wap-push-user = SellingSource
ppg-username = sellingsource
ppg-password = sellingsource


start_kannel.sh
#!/bin/sh

rm /var/log/kannel/*
bearerbox --verbosity 4 --logfile /var/log/kannel/bearerbox.log /etc/kannel/kannel.conf &
sleep 10
smsbox --verbosity 4 --logfile /var/log/kannel/smsbox.log /etc/kannel/kannel.conf &
wapbox --verbosity 4 --logfile /var/log/kannel/wapbox.log /etc/kannel/kannel.conf &


I initially tested WAP Push using the test_ppg executable included with the Kannel source. If you don't want/need to test it from the command line you can skip to the PHP segment.

Unfortunately the test programs are not included with the Gentoo/portage installation of Kannel. So I downloaded the source and
compiled (but did not install) it to get the test programs. Then I copied the example xml files used by test_ppg to my home directory for modification.

~ # cd ~/src
src # wget http://kannel.org/download/1.4.1/gateway-1.4.1.tar.bz2
src # tar -jxvf gateway-1.4.1.tar.bz2
src # cd gateway-1.4.1
gateway-1.4.1 # ./configure && make
gateway-1.4.1 # cp test/si.txt ~/si.xml
gateway-1.4.1 # cp test/smstestppg.txt ~/pap.xml

In si.xml I removed the created and si-expires attributes so there are no delivery timing issues (deliver immediately). The indication href is a URL that will be shown as the 'From:' in the message, and is where the message will take them. The si-id is a unique message identifier and should be changed every time the message is sent. I like to use number@domain and bump the number by one before each send. Change the message to something suitable, hopefully describing where the href URL will take them.

si.xml
<?xml version="1.0"?>
<!DOCTYPE si PUBLIC "-//WAPFORUM//DTD SI 1.0//EN"
"http://www.wapforum.org/DTD/si.dtd">
<si>
    <indication href="http://wap.yahoo.com"
        si-id="01@sellingsource.com"
        action="signal-high">
            Visit Yahoo! on your phone
    </indication>
</si>
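
If you find yourself editing si.xml by hand for every test, a small generator can bump the si-id for you. This is just a sketch: the function name and the $sequence counter are made up for this example, and where you keep that counter between sends is up to you.

```php
<?php

/**
 * Build a Service Indication document like si.xml.
 * $sequence is a counter the caller increments for every message.
 */
function build_si_xml($href, $message, $sequence, $domain)
{
    // si-id in the number@domain style, e.g. 02@sellingsource.com
    $siId = sprintf('%02d@%s', $sequence, $domain);

    return '<?xml version="1.0"?>' . "\n"
        . '<!DOCTYPE si PUBLIC "-//WAPFORUM//DTD SI 1.0//EN"' . "\n"
        . '"http://www.wapforum.org/DTD/si.dtd">' . "\n"
        . '<si>' . "\n"
        . '    <indication href="' . htmlspecialchars($href) . '"' . "\n"
        . '        si-id="' . htmlspecialchars($siId) . '"' . "\n"
        . '        action="signal-high">' . "\n"
        . '            ' . htmlspecialchars($message) . "\n"
        . '    </indication>' . "\n"
        . '</si>' . "\n";
}

$xml = build_si_xml('http://wap.yahoo.com', 'Visit Yahoo! on your phone', 2, 'sellingsource.com');
echo $xml;
```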

In pap.xml, I changed the push-id to match the si-id; this value should change per message as well. I changed the WAPPUSH number to my own, this time with country code (leading "+1"). I also changed the "carrier" after the 'PLMN@' to be our domain.

pap.xml
<?xml version="1.0"?>
<!DOCTYPE pap PUBLIC "-//WAPFORUM//DTD PAP//EN"
          "http://www.wapforum.org/DTD/pap_1.0.dtd">
<pap>
  <push-message push-id="01@sellingsource.com"
  deliver-after-timestamp="2001-02-28T06:45:00Z"
  progress-notes-requested="false">
    <address address-value="WAPPUSH=+17025551212/TYPE=PLMN@sellingsource.com">
    </address>
    <quality-of-service
    priority="low"
    delivery-method="unconfirmed"
    network-required="true"
    network="gsm"
    bearer-required="true"
    bearer="sms">
    </quality-of-service>
  </push-message>
</pap>


So let's test this from the command line:
gateway-1.4.1 # ./test/test_ppg "http://localhost:8080/wappush?username=wap_user&password=wap_pass" ~/si.xml ~/pap.xml

On my T-Mobile Motorola phone, I now have a message under "Browser Msgs." I can click the "GO TO" button and it will take me to the URL specified in the 'href' attribute of si.xml.

Sending WAP Push from PHP


Now let's do the same thing using PHP. But first let me explain how things are going to change. XML is pretty chatty, so we'll be using a binary format called WBXML. Since we'll be communicating directly with Kannel via HTTP with a URL, we'll represent the hexadecimal numbers with a prefix of '%' for URL encoding, rather than '0x'. Along with the Service Indication (SI) document in WBXML, we'll also have to pass a User Data Header (udh) for WAP Push. For an explanation of the udh and WBXML, please see the reference links below. I've also documented them as best I can in the code:

<?php

require_once 'SMSBase.php';

class WAPPush extends SMSBase
{
        public function __construct(KannelInfo $kannel_info)
        {
                parent::__construct($kannel_info);
        }

        public function sendSMSLink(LinkSMS $sms)
        {
                $fields = array('to' => urlencode($sms->getTo()),
                                                'udh' => '',
                                                'text' => '');

                //Nokia User Data Header (UDH) for WAP Push
                $fields['udh'] .= '%06'; //length of UDH - 6 bytes
                $fields['udh'] .= '%05'; //information element (IE) identifier - 0x05 = 16-bit port addressing scheme
                $fields['udh'] .= '%04'; //IE data length - 4 bytes
                $fields['udh'] .= '%0B%84'; //IE data - destination port, 0x0B84 = port 2948 (WAP Push)
                $fields['udh'] .= '%23%F0'; //IE data - origination port, 0x23F0 = port 9200

                //WBXML version of Service Indication (si)
                //headers
                $fields['text'] .= '%1B'; //Transaction ID
                $fields['text'] .= '%06'; //PDU Type - Push
                $fields['text'] .= '%01'; //length of headers
                $fields['text'] .= '%AE'; //Content-type: application/vnd.wap.sic

                //xml body
                $fields['text'] .= '%02'; //WBXML Version 1.2
                $fields['text'] .= '%05'; //DTD Version SI 1.0 Public Identifier
                $fields['text'] .= '%6A'; //Charset UTF-8
                $fields['text'] .= '%00'; //String Table Length (0)
                $fields['text'] .= '%45'; //<si>
                $fields['text'] .= '%C6'; //<indication>
                $fields['text'] .= '%0C'; //href="http://
                $fields['text'] .= '%03'; //start of string value
                $fields['text'] .= $this->urlHexEncode($sms->getURL());
                $fields['text'] .= '%00'; //end of string value
                $fields['text'] .= '%01'; //end of the <indication> attribute list (the message text follows as content)
                $fields['text'] .= '%03'; //start of string value
                $fields['text'] .= $this->urlHexEncode($sms->getText());
                $fields['text'] .= '%00'; //end of string value
                $fields['text'] .= '%01'; //end element (</indication>)
                $fields['text'] .= '%01'; //end element (</si>)

                return $this->sendSMS($fields);
        }

        private function urlHexEncode($text)
        {
                $string = '';
                for ($i=0; $i < strlen($text); $i++)
                {
                        $letter = $text[$i];
                        //get the numeric ascii value of the letter
                        //convert it to hex and add a percent (%)
                        $string .= sprintf('%%%02X', ord($letter));
                }
                return $string;
        }
}

?>

sendSMS() in the parent, SMSBase, simply uses the file() function to send the URL via HTTP GET -- you'll have to make sure allow_url_fopen is set to On in your php.ini or with ini_set().
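
If the hard-coded UDH bytes in sendSMSLink() look like magic numbers, here's a sketch that derives the same string from the port numbers. The function name is made up for this example; the byte layout is the one documented in the comments of the class.

```php
<?php

/**
 * Percent-encode the WAP Push User Data Header for the given ports.
 * Each 16-bit port is split into a high byte and a low byte.
 */
function build_wap_push_udh($destPort = 2948, $origPort = 9200)
{
    $bytes = array(
        0x06,                    // length of the UDH that follows: 6 bytes
        0x05,                    // IE identifier: 16-bit port addressing
        0x04,                    // IE data length: 4 bytes
        ($destPort >> 8) & 0xFF, // destination port, high byte
        $destPort & 0xFF,        // destination port, low byte
        ($origPort >> 8) & 0xFF, // origination port, high byte
        $origPort & 0xFF,        // origination port, low byte
    );

    $udh = '';
    foreach ($bytes as $byte) {
        $udh .= sprintf('%%%02X', $byte);
    }

    return $udh;
}

echo build_wap_push_udh() . "\n"; // prints %06%05%04%0B%84%23%F0
```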

Also, now that we're doing all of the encoding ourselves and sending directly to the smsbox daemon, we can turn off wapbox. In the startup script, comment out the wapbox line. In kannel.conf you can comment out any wap related entries in the core group (wapbox-port, wdp-interface-name, etc.). You can also comment out the entire wapbox, ppg, and wap-push-user group blocks. You'll have to restart kannel for these changes to take effect.

Resources


Download the entire example.

(udh explanation)
http://discussion.forum.nokia.com/forum/archive/index.php/t-13518.html

(wbxml breakdown)
http://discussion.forum.nokia.com/forum/archive/index.php/t-16775.html
http://www.activexperts.com/activsms/sms/wappushsi/

(wappush using PHP)
http://www.mail-archive.com/users@kannel.org/msg07893.html

Articles In This Series:


WAP: Part 1 - MultiTech USB GPRS Modem in Linux
WAP: Part 2 - Send SMS from Kannel
WAP: Part 3 - WAP Push with Kannel & PHP
WAP: Part 4 - Send SMS from PHP
WAP: Part 5 - Customizing content with WURFL
WAP: Part 6 - Microbrowser content in WML / XHTML MP

Optimizing Using Xdebug and Kcachegrind

Introduction


Recently it came to light that a product of ours was going to be receiving more and more volume. Our client had plans of putting more users on our system (about ten times as many), and we had already experienced some performance problems before. It is the type of problem that you would have a hard time fixing by throwing more servers at it. The original application wasn’t all that well designed. We made a lot of mistakes and we learned a lot about product development on this particular application. It would seem that horizontal scalability (adding more servers) wouldn’t solve our problem. There are far too many bottlenecks in the database, so more application servers don’t really help us. That leaves the option of vertical scaling (upgrading or replacing servers). Vertical scaling, however, brings with it another element: Cost. Not wanting to spend a ton of money on replacing our already very expensive servers, we turned our eyes to the code itself.

Installing Xdebug and KCacheGrind


How does one find these problems? My first thought was some way of profiling code. Xdebug (http://xdebug.org/) provides a pretty good profiler, and exports results to cachegrind files that can be analyzed using KCacheGrind (http://kcachegrind.sourceforge.net/). This particular combination of open source projects is very useful. You simply install the xdebug extension, enable it, then edit your xdebug settings. I use mostly default settings, and you can enable the profiler with the following setting:

xdebug.profiler_enable="1"


Xdebug has an enormous number of options, and the default configuration provided with the extension is pretty solid. There’s a page on their website outlining all the settings and their uses (http://xdebug.org/docs/all_settings).

First Run


Once you have installed Xdebug and KCacheGrind, run your application, and then look in /tmp to see if you have a cachegrind output file. For example:

root@localhost /tmp # ls -al -r cachegrind.out.*
-rw-r--r-- 1 apache apache 105863 Jun 17 12:35 cachegrind.out.510234603


If you do not have a file similar to this, be sure the extension is loaded. If you are using a web server such as Apache, remember to restart the daemon after installing xdebug. If you’re using the command line, simply running php -v should tell you whether or not Xdebug is installed and working. For example:

root@localhost /etc/php/cli-php5/ext-active # php -v
PHP 5.2.2-pl1-gentoo (cli) (built: May 25 2007 12:34:43)
Copyright (c) 1997-2007 The PHP Group
Zend Engine v2.2.0, Copyright (c) 1998-2007 Zend Technologies
with Xdebug v2.0.0RC3, Copyright (c) 2002, 2003, 2004, 2005, 2006, 2007, by Derick Rethans


If you are using a web server for your application, Xdebug will show up in the output of a phpinfo() call if it is installed properly. You can also use this to see if the profiler is properly enabled. It should look something like this (there are a lot more options than I show here):

XDebug PHP Info
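
If you'd rather check from a script than scroll through phpinfo(), something along these lines should do. It's a sketch that assumes the setting names used in this article (xdebug.profiler_enable and xdebug.profiler_output_dir); other Xdebug versions may name them differently, in which case ini_get() just returns false. The function name is made up for this example.

```php
<?php

/**
 * Report whether Xdebug is loaded and whether its profiler is switched on.
 * Assumes the xdebug.profiler_* setting names used in this article.
 */
function xdebug_profiler_status()
{
    $loaded = extension_loaded('xdebug');

    // ini_get() returns false for settings PHP does not know about.
    $setting = $loaded ? ini_get('xdebug.profiler_enable') : false;

    return array(
        'loaded' => $loaded,
        'profiler_enabled' => in_array($setting, array('1', 'On', 'on'), true),
        'output_dir' => $loaded ? ini_get('xdebug.profiler_output_dir') : false,
    );
}

var_dump(xdebug_profiler_status());
```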


An Example


I have created a simple application which loops 10,000 times, printing stuff to the screen, performing arithmetic, generating random numbers, and file output. The entire purpose of this little noisy script is to do a whole bunch of stuff, and give us an opportunity to see which parts take the longest, using Xdebug and KCacheGrind. I wanted to profile an actual application at some point, but I felt as though it would be too cumbersome and it might be difficult to illustrate the idea. Below is my example.

profile1.php
<?php

    define ('NUM_LOOPS', 10000);

    function complex_calculation()
    {
        return 1.034587763 * mt_rand() % mt_rand();
    }

    function print_something()
    {
        echo "something";
    }

    function write_something()
    {
        $fp = fopen("test.tmp", "w");
        fwrite($fp, "something_important");
        print_something();
        fclose($fp);
    }

    for ($i = 0; $i < NUM_LOOPS; $i++)
    {
        complex_calculation();
        print_something();
        write_something();
    }
?>


Once you have loaded the cachegrind output file into KCacheGrind, you should be presented with a few panels. One is a list view, another is a panel with a bunch of tabs, and another is a tree view which provides a very useful graphical representation of where CPU is being spent. In the list (and the graph) you should notice an item labeled {main}. This is the all-inclusive element that shows the total execution of the program you're profiling. It should show as 100% of your CPU usage. Here is the list view:

List View


Here's a breakdown of each column:

  • Incl.: The total CPU time spent in this function and every function it called inclusively.

  • Self: Only CPU time spent in this function, not counting the time spent in functions CALLED by this function.

  • Called: Total number of times this function was called.

  • Function: Name of this function.

  • Location: Script file containing this function.
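
To make the difference between Incl. and Self concrete, here's a toy calculation. The numbers are invented for illustration (they are not taken from the profile above), and the function name is made up.

```php
<?php

/**
 * Self cost = inclusive cost minus the inclusive cost of the direct callees.
 */
function self_cost($inclusive, array $calleeInclusive)
{
    return $inclusive - array_sum($calleeInclusive);
}

// Hypothetical inclusive costs, in arbitrary units, for the four
// functions that write_something() calls directly.
$callees = array(
    'fopen' => 50,
    'fwrite' => 20,
    'print_something' => 5,
    'fclose' => 15,
);

// If write_something() had an inclusive cost of 100 units:
echo self_cost(100, $callees) . "\n"; // prints 10
```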



Next is the graph/table view. The nice thing about this feature is its ability to show major choke points in a very noticeable format. Typically, this graph excludes calls which are very tiny compared to the rest of the application. Very useful. If you select the {main} box in the graph, you should see something very close to this:


Tree View


Even the most untrained eye can probably guess that write_something() is the slower part of this application. Double click the write_something() box to make it become the new focal point of the graph. Now that you have that selected, you should see something similar to this:


Tree View 2


With write_something() centered, you can see who has called this method (100% of the time, it is {main}), and how much time is spent in functions called by write_something(). So, of the time spent in write_something(), most of that time is spent opening the file, and less time is spent in fwrite().

Conclusion


I could go into details about optimizing my example script, but I feel as though the point has been delivered. Xdebug and KCacheGrind work together to become a very powerful tool when attempting to optimize a PHP application. There are many features to KCacheGrind and I encourage everyone to explore them all. Additionally, it has the capability to track memory usage, but I've had personal experiences which suggest that does not work perfectly.

Good luck!

MCrypt Woes

As part of our ever increasing need to be secure, I started working on universalizing our encryption schemes and coming up with an easy and standard way for all of our applications to handle encryption. The main goal was to wrap mcrypt. For the most part, this worked great. A problem arose, however!

It seems the mcrypt_create_iv() function in PHP's mcrypt module has been broken for some time. Specifically, when creating an initialization vector using the MCRYPT_RAND constant (instead of, say, MCRYPT_DEV_RANDOM, etc), it will return the same IV every time. After digging in the extension itself, I found the bug, created a patch and submitted it to the PHP bugs site here:

http://bugs.php.net/bug.php?id=40999

The bug should be fixed in the next release. In the meantime, we're using MCRYPT_DEV_URANDOM, which reads from /dev/urandom. It's similar to using /dev/random, but won't block if the system hasn't accumulated enough entropy.
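
Here's a quick sketch of the workaround, along with a sanity check for the symptom. The constant PHP defines for the non-blocking source is MCRYPT_DEV_URANDOM. The function name is made up, and the random_bytes() fallback is my own assumption for setups where the mcrypt extension isn't loaded; it is not part of the original wrapper.

```php
<?php

/**
 * Create a random initialization vector of $size bytes.
 */
function create_iv($size)
{
    if (function_exists('mcrypt_create_iv')) {
        // Non-blocking source, avoiding the MCRYPT_RAND problem.
        return mcrypt_create_iv($size, MCRYPT_DEV_URANDOM);
    }

    // Assumption for this sketch: without mcrypt, fall back to random_bytes().
    return random_bytes($size);
}

$first = create_iv(16);
$second = create_iv(16);

// Two IVs should differ; identical values would point to the bug described above.
var_dump(strlen($first), $first !== $second);
```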