Wednesday, September 21, 2005

The Twisted Matrix

It has been a long road from where I started to using Erlang. I certainly don't plan on stopping with Erlang but so far I am quite the fan of it. On this road, which finally lead me here, I have used C, C++, Java, and Python, to name a few. I still have code for a multithreaded http I was writing in C hanging around.
I still use Python quite a bit, and one of the best tools in Python I have used is the Twisted Matrix framework. Prior to discovering Erlang (well more of introduced to it by noss, which i am deeply in debt for), I felt that Python was really *the* language for socket programming. Twisted is what made me think this.
Now, just to be clear, Erlang certainly does not make Python obsolete. Don't confuse enthusiasm for Erlang with feeling every other language is useless. I introduced Python and Twisted into our development cycle at work and I am fairly pleased with the results.
For those that don't know what Twisted is, it is a framework for developing software. What makes Twisted special is they have put a lot of work into making everything work in one thread. Making a server or client that handles multiple network connections is fairly trivial in Twisted.
So what makes Twisted work how it does? A uniform eventloop. In their case it's called a reactor. There are a bunch of different kinds of reactors in Twisted. One needs to be made for whichever kind of event loop you are using. There is a select reactor (default), gtkreactor, gtk2reactor, wxwidgets reactor, and some sort of win32 reactor (I think still under development). The good thing is, most of your code can generally be written without caring about what reactor is being used. That is, atleast, the portions that don't depend on gtk or win32 or wx.
Five years ago, when the Twisted project was first created, the authors feel that Python was the best language for the job. My question is, if Twisted was started today, would Python still be the best choice. I address part of the problem in a previous post. Part of the problem that I point out is, you have to do a lot of work to force everything into the event loop. That makes code reuse more difficult, and near impossible if you need to use a proprietary library that you can't make non-blocking.
An obvious example of this is the adbapi module. adbapi module can use any DB-API 2.0 compliant module. This is to make adbapi useful. There are quite a few DB-API 2.0 compliant modules for Python so the best thing is to make use of them. adbapi's API is not asynchronous or non-blocking though. Doing queries in the Twisted reactor thread means the entire application will stall until the query is finished. To solve this, Twisted does the queries in a thread. Twisted has to fall-back on the very thing it is trying to avoid in order to work. Python is particularly bad at threading too so Twisted tries to keep the number of threads to a reasonable amount. This means the number of queries you can have going concurrently is dependent on this.
A native implementation of the postgresql protocol has been implemented. It is called pgasync. I beleive the author has commented on how much quicker a large number of queries runs in his implementation, although I cannot find the quote. But has the same problem I pointed out before. Because Twisted is all in one thread, everything has to conform to its event loop making the implementation of the pgsql protocol useless. It had to be rewritten to work well in Twisted. This implementation is also fairly useless anywhere except Twisted. I cannot use this in an application that uses an event loop other than Twisted. Wouldn't it be nice if I could use code between projects that arn't dependent on Twisted? In a COL this is a non-issue. Simply run the database code in a process. Processes are part of the language so they are part of any program.
So, to get back to my question, would Python still be the choice of Twisted today, I don't know. I asked the people of Twisted if they still would. For the most part, it seems like they didn't quite know Erlang well enough to say if they would choose it. A few developers said one of the main reasons they chose Python was for its large standard library. In particular the 'os' module was pointed out as an important module that makes Python a good choice. However, after some research, most of the 'os' module is in Erlang's standard library as well. In my opinion, I think the quality of the Erlang standard library and OTP is quite a bit better than Pythons. Python's standard library seems to have suffered from a time period where anyone put anything they wanted into it. As a result, it is large, but a number of modules are fairly poor in quality.
Even if Python has a number of modules in its standard library that are nice for building Twisted, they still need to do a lot of work to force all of this code into the Twisted event loop. If you choose a COL, then you don't have to do any work in forcing code into a single event loop. Instead, you have to do work to build the standard library that Python has. Is that a lot of work? Yes.
One could also argue, "There are a lot more Python projects than Erlang ones so there is a lot of code being written for Python that Twisted can make use of". I think there is little contest in the point that there are more Python projects and programmers. But can Twisted really make use of this code? I think, in general, no, not without work. A lot of interesting things are probably going to involve some blocking somewhere which either needs to be pushed to a thread or rewritten not to block, such as the example I have already given.
The choice to use Python is a difficult one, I think. In my opinion, Python really does not offer much that Erlang doesn't. I think, considering the services Twisted is trying to offer, a COL would be an excellent choice for it. Twisted does a lot of work trying to make asynchronous programming less confusing but writing concurrent programs is more natural. It feels more natural atleast.
I have had a lot of success using Twisted. I don't think I'd write a socket program in Python without it. I also think I would generally not write a socket program in Python these days. Twisted and Erlang seem to be trying to accomplish much the same thing, although going about things in very different ways. In the end I think something like Erlang will win out. A COL seems to be saying, the world is concurrent so lets try to let people write in this natural way. Whereas, Twisted is forcing people to write in a rather artifical and unnatural way. Asynchronous programming works once you grasp it but even still the flow of the programs is rather awkward and hard to grasp I think.
I feel that, today, something like Twisted would be better off written in a COL. Let the work go into making powerful tools instead of massaging prewritten code into the event loop.

Tuesday, September 20, 2005

Java And Threads (Jetty)

noss in #erlang on freenode recently brought to my attention: Jetty Continuations

This is an interesting blog entry. The basic idea is, instead of using 1 thread per connection, since connections can last awhile, they use 1 thread per request that a connection has. The hope being, a connection will idle most of the time and only send requests once in awhile. The problem that they ran into is, a piece of software is using a request timeout to poll for data. So requests are now sticking around for a long time, so they have all these active threads that they don't want. So to deal with this, they use a concept of continuations so the thread can die but the request still hang around, and then once it's ready to be processed a thread is created again and the request is handled. So having all these requests hanging around that arn't doing anything is no longer a problem.
Well, this begs the question, why are you using a dynamic number of threads in the first place if you are going to have to limit how many you can even make. If the problem, in the first place, is they have too many threads running, then their solution works only for idle threads doesn't it? Being forced to push some of the requests to a continuation means they have applied some artificial limit to the number of threads which can be run. What happens then, when the number of valid active requests exceeds this limit? What then? Push active requests to a continuation and get to then when you have time? Simply don't let the new requests get handled? If they want to to use threads to solve their problem then putting a limit on them seems to make the choice of threads not a good one. Too poorly paraphrase Joe Armstrong, are they also going to put a limit on the number of objects they can use? If threads are integral to solving your problem, then it seems as though you are limiting how well you can solve the problem.

This also got me thinking about other issues involving threading in non-concurrent orientated languages. Using a COL (Concurrent Orientated Language) all the time would be nice (and I hope that is what the future holds for us). But today, I don't think it is always practical. We can't use Erlang or Mozart or Concurrent ML for every problem due to various limiting factors. But on the same token, using threads in a non-COL sometimes makes the solution to a problem a bit easier to work with. At the very least, making use of multiple processors sounds like a decent argument. But writing code in, say, java, as if it was Erlang does not work out. I think the best one can hope to do is a static number of threads. Spawning and destroying threads dynamically in a non-COL can be fairly expensive in the long run and you have to avoid situations where you start up too many threads. I think having a static number of threads i a pool or with each doing a specific task is somewhat the "best of both worlds". You get your concurrency and you, hopefully, avoid situations like Jetty is running into. As far as communication between the threads is concerned, I think message passing is the best one can hope for. The main reason I think one should use message passing in these non-COL's is, it forces all of the synchornization to happen in one localized place. You can, hopefully, avoid deadlocks this way. And if there is an error in your synchornization, you can fix it in one spot and it is fixed everywhere. As opposed to having things synchornized all over the code, god knows where you may have made an error.

I think this post most likely opened up a can of worms. I lightly touched on a lot of issues and most likely did not explain things in full. Perhaps this will raise some interesting questions.

Sunday, September 11, 2005

Initial impressions of yaws

If any of this post is incorrect, please feel free to post a comment and correct me.

I have recently installed yaws and I am looking at it to create a small website. It will be fairly simple. My previous web experience has been using Nevow with Twisted. Nevow allows you to create xml files and then the template engine extracts information from the xml file which is translated to function calls on an object representing the page. I like this solution. It keeps the code seperate from presentation which many people seem to suggest is a good idea. I see it as a positive purely based on working with webdesigners. I dislike doing web frontends so allowing someone to make that and giving them the bare minimum needed to let them call functions in my code seems rather nice. From my experiences with yaws, it seems to use a style that reminds me more of PHP. The major difference being that you can use Erlang terms to be transformed to HTML. Nevow has something similar to this called stan. Even with this though, the .yaws file is still not purely Erlang code, but requires escaping html with a tag. This style of making web applications seems fairly error prone and difficult to work with in a team.
I do think Erlang would make a fairly good web development language. The applications could easily be distributed over several nodes, not requiring one to use some sort of load balancer. A web app could scale quite well. But what can be done about the interface for programming? Some sort of XML system might work but I'm not convinced of that. There must be some more intuitive means of mixing code and presentation. I wouldn't be sirprised if someone has attempted to tackle this already, maybe something exists on google about it.

Wednesday, September 7, 2005

Parallel Project

I have a study in Parallel Programming this semester. However, the teacher is fairly lenient as far as projects are concerned. I am looking for some interesting project relating to Erlang. The leading idea right now is to create some sort of distributed computing framework. The idea would be using Erlang to communicate between nodes and then have each node communicate with a local process which does the actual computing (assuming Erlang could not handle the calculations). The major implementation would be a BLAST algorithm since that seems fairly easy to distribute and could possibly have some use at my college. Other ideas include some sort of fault tolerence framework or application which handles nodes going down well. That would seem fairly easy in Erlang so that probably would not provide a semester worth of interesting work. Another possibility is a peer-to-peer chat application where each user contributes to the number of possible people to host.
Hrmm, hopefully I'll get a better idea soon.

Monday, September 5, 2005

Bookmarks

My archive of bookmarks is slowly growing. It contains a Erlang section, a long with a number of other languages as well as lots of other URL's.

http://ortdotlove.net/bookmarks.html