Codeception

Musings & ramblings of a Pythonista

Developing scalable services with Python

Developing multi-threaded applications in python is a "Pain In The Ass". And the GIL (Global Interpreter Lock) takes away the advantage of utilizing multiple cores in a machine. It doesn't matter how many cores a CPU have, GIL prevents threads from running in multiple cores. So python programs would't get the maximum performance out of the CPU when they use threads in their services.

In many cases you might need to write services where you need to listen on a port and wait for the client connection to do some task. If multiple clients are connecting to this service simultaneously then you might need to spawn threads to handle the requests. Considering the fact that GIL introduces a performance bottleneck, the best way to solve this situation is to use python's multiprocessing capabilities. This library provides almost threading like class implementations.

Then a question might arise. How do you share a socket created by the server process to the newly spawned processes? It is possible since all the forked processes will have the parent's file descriptors. multiprocessing library already has this package named multiprocessing.reduction that provides a method reduce_handle which can serialize a socket and you can send this socket to another process using pipes. The child processes can read from the pipe and re-create the socket using rebuild_handle. The following example will make this idea clear to you.

# Main Process
from multiprocessing.reduction import reduce_handle
# serialize the socket
serialized_socket = reduce_handle(client_socket.fileno())
# send it to the child/worker process
pipe_to_worker.send(serialized_socket)

# Worker Process
from multiprocessing.reduction import rebuild_handle
# get the socket from parent
serialized_socket = pipe_from_parent.recv()
# rebuild the file descriptor
fd = rebuild_handle(serialized_socket)
# create socket from fd
client_socket = socket.fromfd(fd, socket.AF_INET, socket.SOCK_STREAM)
# use the socket as usual. eg: send a message to the client
client_socket.send("Baby, I\'m so fast\r\n")

Preforking

Another way to solve this issue is by spawning multiple process from the main server which is listening on a socket/port and letting all the child processes to accept() connections from the client. Apache uses this style of process scaling known as "Preforking". A simple example using multiprocessing module which runs an instance of a BaseHTTPServer.HTTPServer on a pool of worker processes can be written very easily as follows.

#
# Example where a pool of http servers share a single listening socket
#
# On Windows this module depends on the ability to pickle a socket
# object so that the worker processes can inherit a copy of the server
# object.  (We import `multiprocessing.reduction` to enable this pickling.)
#
# Not sure if we should synchronize access to `socket.accept()` method by
# using a process-shared lock -- does not seem to be necessary.
#
# Copyright (c) 2006-2008, R Oudkerk
# All rights reserved.
#

import os
import sys

from multiprocessing import Process, current_process, freeze_support
from BaseHTTPServer import HTTPServer
from SimpleHTTPServer import SimpleHTTPRequestHandler

if sys.platform == 'win32':
    import multiprocessing.reduction    # make sockets pickable/inheritable


def note(format, *args):
    sys.stderr.write('[%s]\t%s\n' % (current_process().name, format%args))


class RequestHandler(SimpleHTTPRequestHandler):
    # we override log_message() to show which process is handling the request
    def log_message(self, format, *args):
        note(format, *args)

def serve_forever(server):
    note('starting server')
    try:
        server.serve_forever()
    except KeyboardInterrupt:
        pass


def runpool(address, number_of_processes):
    # create a single server object -- children will each inherit a copy
    server = HTTPServer(address, RequestHandler)

    # create child processes to act as workers
    for i in range(number_of_processes-1):
        Process(target=serve_forever, args=(server,)).start()

    # main process also acts as a worker
    serve_forever(server)


def test():
    DIR = os.path.join(os.path.dirname(__file__), '..')
    ADDRESS = ('localhost', 8000)
    NUMBER_OF_PROCESSES = 4

    print 'Serving at http://%s:%d using %d worker processes' % \
          (ADDRESS[0], ADDRESS[1], NUMBER_OF_PROCESSES)
    print 'To exit press Ctrl-' + ['C', 'Break'][sys.platform=='win32']

    os.chdir(DIR)
    runpool(ADDRESS, NUMBER_OF_PROCESSES)


if __name__ == '__main__':
    freeze_support()
    test()

I wrote a simple wrapper for this kind of services which can be scaled. You can find it here.

Tagged under Python, Scalability, Multiprocessing, Socket, Preforking

blog comments powered by Disqus