As a long-time user of Python and Twisted, I have used them to develop production systems that processed 60 to 80 million messages a day. It wasn’t pretty and it wasn’t simple, but the code base was fairly small and it worked. Our deployment machines were Sun T1000 Niagara-based machines. We processed messages in parallel, collected the results, and returned them to the master process. This was mostly massive string processing and ended up being very CPU intensive. We did the multi-threading in C extensions that served as the interfaces to the third-party processing libraries we used for message classification; this was as much to avoid GIL issues as it was because we only had C interfaces to the libraries. So we basically created a Facade that delegated work in parallel and aggregated the results to return to the client process. The only things we did on the Python side were read from the socket and queue up the message bodies in a thread-safe queue in a C extension, which then sent the message strings to the classifiers; we then asynchronously collected the results and sent them back to the client of our Twisted server.
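To make the shape of that Facade concrete, here is a minimal Python sketch of the fan-out and aggregate pattern. It is not the production code: the two classifier functions and the `classify` helper are hypothetical stand-ins for the third-party C libraries, and a pure-Python thread pool like this one would still be limited by the GIL for CPU-heavy work, which is exactly why the real threading lived in C extensions.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for the third-party classifiers.
def looks_like_shouting(message: str) -> bool:
    letters = [c for c in message if c.isalpha()]
    return bool(letters) and all(c.isupper() for c in letters)


def mentions_money(message: str) -> bool:
    return "$" in message


CLASSIFIERS = {"shouting": looks_like_shouting, "money": mentions_money}


def classify(message: str, pool: ThreadPoolExecutor) -> dict:
    """Facade: hand one message to every classifier, return one combined result."""
    # Fan out: submit the same message to each classifier.
    futures = {name: pool.submit(fn, message) for name, fn in CLASSIFIERS.items()}
    # Aggregate: wait for each result and merge them into a single dict.
    return {name: future.result() for name, future in futures.items()}


with ThreadPoolExecutor(max_workers=len(CLASSIFIERS)) as pool:
    verdict = classify("SEND $100 NOW", pool)
```

The caller only ever sees `classify`; how many classifiers run behind it, and on which threads, stays hidden, which is the point of the Facade.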

We immediately ran into scaling issues with Twisted. When we completely saturated a single CPU core on some quad-core Intel Xeon development boxes, we knew the lower-powered Niagara boxes were not going to work as well. The only way to utilize all the CPU power was to run multiple instances of our Twisted server. The T1000/Niagara boxes have 8 cores with 4 thread contexts each for a total of 32 threads, but each core works out to be about the same power as a 200 MHz Pentium Pro. All we could do was run 20+ separate instances of our Twisted server on each T1000 to saturate the network in an attempt to get reasonable throughput. Running 20 copies is an operations and monitoring nightmare. We had actually created a CPU-bound Twisted application, something that was pretty much unheard of on the Twisted mailing list.

I have since re-implemented the same protocol and server in Erlang/OTP with surprising results. The Erlang/OTP version runs much faster, and it is not even optimized yet. I could move from lists of characters to binaries, which would use much less RAM and should be much faster, since I don’t actually do anything with the data other than ship it off to sub-processes. This is just a proof of concept that I could run my old stress tests against. It is about 1/5 the amount of code of the Twisted version and, most importantly, I only need to run a single copy to fully utilize a modern multi-core machine. That is my major sticking point right now: I have big honking multi-core hardware and I need software tools to match.

Here is an example of a port of the LineReceiver functionality of Twisted.

It is amazing how much “batteries included” stuff Erlang has in it.
You just need to keep the mailing list address handy to get help. :-)


To compile, enter the following at the Erlang shell:

c(linereceiver).

Then, at the same prompt, enter

linereceiver:start

to start the server. Then use telnet to connect to the server on port 8080.
This should get you started on your way to writing highly scalable line-oriented protocol servers.
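If you have not used LineReceiver before, the heart of any line-oriented server is buffering whatever bytes arrive from the socket and handing on only complete lines. The sketch below shows that idea in plain Python. It is neither the Erlang port nor Twisted's actual implementation; `LineBuffer` and `feed` are hypothetical names, and the `\r\n` delimiter is assumed because that is Twisted's LineReceiver default and what telnet normally sends.

```python
class LineBuffer:
    """Accumulates raw bytes and returns complete lines (hypothetical helper)."""

    def __init__(self, delimiter: bytes = b"\r\n"):
        self.delimiter = delimiter
        self._pending = b""  # bytes received after the last complete line

    def feed(self, data: bytes) -> list:
        """Add newly received bytes; return every line completed so far."""
        self._pending += data
        # Everything before the final delimiter is a complete line; the
        # last piece (possibly empty) is kept for the next call.
        *lines, self._pending = self._pending.split(self.delimiter)
        return lines


buf = LineBuffer()
first = buf.feed(b"HELLO\r\nWOR")  # one full line, plus a partial one
second = buf.feed(b"LD\r\n")       # completes the partial line
```

Here `first` holds only `b"HELLO"`, and `b"WORLD"` does not come out until the second call supplies the rest of the line, so a line split across two network reads is still delivered in one piece.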