Multi-Process Persistent Applications: Part 1 - Forking
Introduction
Recently I was confronted with a problem. We needed to develop an application that could handle large amounts of data, fire off thousands of requests to other servers over HTTP every few minutes, and not let any one thing slow the whole process down. That last part is important. The data being funneled comes from multiple customers and heads out to multiple destinations. Moreover, there's an intermediate step in which the data has to be pre-packaged, and that step requires communication with a large set of customer databases. A few things follow from this.
Our application takes multiple inputs, processes the data against multiple databases and then sends that data to multiple outputs. As you can see, this creates a huge number of potential bottlenecks. I don't want customer A's data slowing down customer B's data, and likewise I don't want the performance of Output A getting in the way of Output B's performance. How do we handle this? There are a number of ways to do it. The solution I went with was one of forking and shared memory, and everything that comes with that (such as System V IPC message queues and semaphores).
Some Questions Answered
Where do you start? The first step is to install the necessary PHP extensions, and then get a basic understanding of forking.
What happens when you fork? Generally, forking is a sort of "process copying": the running process is duplicated, and both copies continue from the same point. This is different from how you'd probably handle the problem on Windows, where creating a separate thread is the way to do it. There are threading libraries for Linux, but none of them is well implemented and supported in PHP.
Does it work under Apache? No. Because forking creates a copy of the whole process, which under Apache would include the web server itself, this will not work unless you are running from the command line with the CLI version of PHP.
Is forking expensive? Yes, forking is an expensive process. The best way to handle forking is by forking early, and keeping processes running, rather than creating and terminating processes frequently. If you fork very infrequently, however, it may be more economical to start them and stop them. You will likely have to tune this to your specific application. There are ups and downs to both. Forking more means using more CPU, keeping processes around means more RAM. The decision is yours.
Enabling the Right Extensions
There are a few extensions that will be useful in this adventure. You need not install all of them immediately, but it's worth getting them all out of the way ahead of time. The only extension you need to fork is pcntl (http://php.net/pcntl), the process control extension. I will also frequently use functions from the posix extension (http://php.net/posix), so install that as well. Later on in this tutorial I'll be covering shared memory, message queues and semaphores, which are all covered by the System V IPC extensions, sysvshm, sysvmsg and sysvsem (http://www.php.net/manual/en/ref.sem.php).
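Before going further, it can help to confirm that the environment is set up as described. The following is a minimal sketch, not part of the original tutorial: missingExtensions() is a hypothetical helper name, and the script assumes PHP 7 or later. It uses extension_loaded() and the PHP_SAPI constant to report anything that is missing.

```php
<?php
// Hypothetical preflight helper (an assumption, not from the tutorial).
// Returns the names from $required that are not loaded in this PHP build.
function missingExtensions(array $required): array
{
    return array_values(array_filter($required, function ($name) {
        return !extension_loaded($name);
    }));
}

// pcntl and posix are needed for this part; the sysv* extensions for later parts.
$missing = missingExtensions(['pcntl', 'posix', 'sysvmsg', 'sysvsem', 'sysvshm']);

// Forking is only meant to be used from the command line build of PHP.
if (PHP_SAPI !== 'cli') {
    echo "Warning: running under the '" . PHP_SAPI . "' SAPI, not the CLI.\n";
}

echo $missing
    ? "Missing extensions: " . implode(', ', $missing) . "\n"
    : "All required extensions are loaded.\n";
```

Run it with the same php binary you intend to use for the examples, since the CLI and web server builds can load different extensions.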
Step 1: A Simple Forking Application
Below is an example of a very simple forking application. The program starts, forks, and both processes report some information about themselves before exiting.
example1.php
<?php
$PID = pcntl_fork();
if ($PID == -1)
{
    // We were unable to fork
    die("Error forking.");
}
else if ($PID > 0)
{
    // pcntl_fork() returned the child's PID, so this is the parent
    echo "I am the parent process, and my ID is: " . posix_getpid() . ". My child ID is: $PID\n";
}
else
{
    // pcntl_fork() returned 0, so this is the child
    echo "I am the child process, and my ID is: " . posix_getpid() . ". My parent ID is: " . posix_getppid() . "\n";
}
?>
Put this code in a simple file, let's call it example1.php. You should see output similar to this when executing it (the two lines may appear in either order, since both processes run at the same time):
root@localhost ~/process_tutorial # php example1.php
I am the child process, and my ID is: 1487. My parent ID is: 1486
I am the parent process, and my ID is: 1486. My child ID is: 1487
We used a few important functions here. pcntl_fork() is the function used to fork. What happens here is the process is split, and in the parent process, shown here as process ID 1486, the call to pcntl_fork() returns the process ID of the child process. In the child process, shown here as process ID 1487, the call to pcntl_fork() returns 0. If the return value is -1, something has gone haywire and no child was created. This usually means the system refused to create another process, for example because a process limit was reached or memory ran out, so it is a system configuration issue and not something I can really assist with. The return value of this function is ultimately how you direct code flow in the two processes. The other two important functions we used are posix_getpid() and posix_getppid(), which return the ID of the current process and the ID of its parent, respectively.
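The three possible return values can be folded into one small function. This is only a sketch under a few assumptions: describeForkResult() is a hypothetical name, the type hints assume PHP 7 or later, and pcntl_get_last_error() with pcntl_strerror() is used here to get a readable reason when the fork fails. The parent also calls pcntl_waitpid() so the child is collected before the script ends.

```php
<?php
// Hypothetical helper (an assumption, not from the tutorial): maps the
// return value of pcntl_fork() to a description of the branch we are in.
function describeForkResult(int $pid): string
{
    if ($pid === -1) {
        // Ask pcntl why the last call failed and turn the code into text
        return 'fork failed: ' . pcntl_strerror(pcntl_get_last_error());
    }
    return $pid > 0 ? "parent (child pid $pid)" : 'child';
}

$pid = pcntl_fork();
fwrite(STDERR, posix_getpid() . ': ' . describeForkResult($pid) . "\n");

if ($pid === 0) {
    exit(0); // the child has nothing else to do
}
if ($pid > 0) {
    pcntl_waitpid($pid, $status); // the parent waits until the child has exited
}
```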
Step 2: Starting a Background Process and Exiting
One of the common things people use forking for is to make an application that automatically backgrounds itself. This is fairly common practice and you see it in a lot of applications, such as VPNC. It allows an application started from the command line to persist even after the controlling terminal has exited. A few things have to happen. First, the parent can simply exit. However, the child process will still be attached to the terminal from which the parent was executed.
In order to detach, we make use of a POSIX system call named setsid(). In PHP, we are given access to this function through the POSIX extension, and the name of the function is posix_setsid(). What this function does, in general terms, is turn the calling process into the leader of a new session and a new process group, with no controlling terminal. You're probably asking why we need to fork in the first place. Why can't we just detach the process we started from the terminal? The answer to that question lies in the man pages.
The setsid() function shall fail if:
[EPERM]
The calling process is already a process group leader, or the process group ID of a process other than the calling process matches the process ID of the calling process.
That is, when we start PHP from the command line, we are effectively creating a new process which is the leader of its process group, so a call to setsid() from that process would fail. A forked child is never a process group leader, and thus we must make another process using pcntl_fork() if we wish to detach.
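The group leader condition can be observed directly. The following is a minimal sketch, assuming PHP 7 or later; isGroupLeader() is a hypothetical helper that compares posix_getpgrp() with posix_getpid(). Whether the first line reports "yes" depends on how the script was launched, so no particular output is promised here.

```php
<?php
// Hypothetical helper (an assumption, not from the tutorial): a process is
// a group leader when its process group ID equals its own process ID.
function isGroupLeader(): bool
{
    return posix_getpgrp() === posix_getpid();
}

echo "Before fork: pid=" . posix_getpid()
    . " group leader=" . (isGroupLeader() ? "yes" : "no") . "\n";

$pid = pcntl_fork();
if ($pid === -1) {
    exit("Error forking.\n");
}
if ($pid === 0) {
    // The child gets a fresh PID but inherits the parent's process group,
    // so it is not a group leader and posix_setsid() is allowed.
    $sid = posix_setsid();
    echo "Child: pid=" . posix_getpid() . " new session id=$sid\n";
    exit(0);
}
pcntl_waitpid($pid, $status); // keep the parent alive until the child reports
```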
In the example below, we create a new process, detach it from the controlling terminal, and exit the parent process. The child process will exit after 30 seconds. You could also send a SIGTERM to the process with the kill command in Linux.
example2.php
<?php
$PID = pcntl_fork();
if ($PID == -1)
{
    // We were unable to fork
    die("Error forking.\n");
}
else if ($PID > 0)
{
    // Exit the parent process
    die("Parent exiting. Process started in background (pid: $PID)\n");
}
// Only the child reaches this point
echo "Detaching from terminal.\n";
// Create a new session (Detach from terminal)
if (posix_setsid() == -1)
{
    die("Unable to detach from controlling terminal!\n");
}
// Wait 30 seconds before exiting so we can view this process running in the background, for testing purposes.
$timer = 30;
while ($timer--) sleep(1);
?>
Executing this code should present an output similar to this:
root@localhost ~/process_tutorial # php example2.php
Detaching from terminal.
Parent exiting. Process started in background (pid: 2131)
root@localhost ~/process_tutorial # ps -ef | grep php
root 2131 1 0 12:43 ? 00:00:00 php example2.php
And about 30 seconds later, the process is gone:
root@localhost ~/process_tutorial # ps -ef | grep php
root@localhost ~/process_tutorial #
Your background process has started, spent 30 seconds doing some things, and exited. That is the basic way to fork and background.
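As mentioned earlier, the background process can also be stopped early with SIGTERM. The sketch below does the same thing from PHP rather than from the shell, under some assumptions: terminateChild() is a hypothetical helper, the code assumes PHP 7.1 or later, and for simplicity the child is not detached so that the parent can wait for it. It uses posix_kill() to send the signal and pcntl_wifsignaled() with pcntl_wtermsig() to confirm how the child ended.

```php
<?php
// Hypothetical helper (an assumption, not from the tutorial): sends SIGTERM
// to a child and returns the signal that ended it, or null if the child
// exited normally instead.
function terminateChild(int $pid): ?int
{
    posix_kill($pid, SIGTERM);     // the signal `kill <pid>` sends by default
    pcntl_waitpid($pid, $status);  // block until the child is gone
    return pcntl_wifsignaled($status) ? pcntl_wtermsig($status) : null;
}

$pid = pcntl_fork();
if ($pid === -1) {
    exit("Error forking.\n");
}
if ($pid === 0) {
    sleep(30); // stand-in for the background work
    exit(0);
}

$signal = terminateChild($pid);
echo "Child $pid ended by signal: " . var_export($signal, true) . "\n";
```

Because the child installs no signal handler, SIGTERM ends it immediately. A long-running process that needs to clean up first would register its own handler instead.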
Conclusion
In the next part of this series I'll talk about how to manage multiple child processes. In the third, I will cover how to communicate between processes using message queues. The fourth part will cover shared memory and semaphores.
