Saturday, August 20, 2005

Where concurrency shines

I think it can almost be stated as a fact that concurrency tends to be poor in languages that weren't designed with it in mind. The languages I have in mind here are Python, C, C++ and similar. I have come across a few people who disagree with this statement; however, after questioning them I have found that they have either A) not done anything very complex with threads or B) never used a concurrency-oriented language. Needless to say, I don't take their opinion very seriously. Now, obviously, you can take the time to write an application that uses threads and works well. But with enough time you can do just about anything, and in the time it takes to make that application one could definitely have developed an equivalent program in a concurrency-oriented (CO) language.
Because writing decent threaded applications in other languages is so difficult, there are a number of frameworks available that try to make it easier to write applications in a single thread. One of the reasons I think these frameworks will lose out to a language such as Erlang for most developers is the amount of work it takes to integrate other libraries into them. Anyone who has used a framework such as Twisted has probably run into a situation where they want to use a third-party library, but the library blocks. For an asynchronous framework this is murder. So one has two choices. The first is to run the third-party library in its own thread. Obviously this is generally not what we want to do, since the whole point of using the framework is to avoid threads. The second is to rewrite the library to integrate it into the framework's event loop. Depending on the situation this might be acceptable, but it sure is a pain to have to do extra work just to use a library. A language which supports concurrency does not have this problem to the same degree. The first solution, running the library in a thread, works perfectly fine. You probably have 300 or 400 threads going already, so one more is no big deal. This makes it easy to distribute libraries for that language.
For a simple example, imagine you write a really great HTTP client in Python. You can't really make it a general HTTP client, because you need to take into account the various networking frameworks your users might be using. If they are using Twisted, then it needs to integrate into the Twisted event loop to be really useful. If they are using asyncore, it needs to integrate into the asyncore event loop, and so on. Now take the same situation in Erlang. Just throw the client in a process and you are all set. You don't have to rewrite anything. The obvious benefit of this is increased development speed.
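
As a rough illustration of "just throw the client in a process", here is a minimal sketch. The module name blocking_call and its functions async/1 and await/2 are made up for this example; any blocking call can stand in for the fun that is passed in.

```erlang
-module(blocking_call).
-export([async/1, await/2]).

%% Run a (possibly blocking) zero-argument fun in its own process.
%% Returns a unique reference used to collect the result later.
async(Fun) when is_function(Fun, 0) ->
    Caller = self(),
    Ref = make_ref(),
    spawn(fun() -> Caller ! {Ref, Fun()} end),
    Ref.

%% Wait up to Timeout milliseconds for the result tagged with Ref.
await(Ref, Timeout) ->
    receive
        {Ref, Result} -> {ok, Result}
    after Timeout ->
        timeout
    end.
```

The caller stays free to do other work between async/1 and await/2, and the library being called needs no changes at all, which is the point made above.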

I think it is pretty clear that our processors and applications are moving towards more concurrent environments. Languages that can take advantage of this environment are most likely going to be the ones that make it. However, I'm no fortune teller, so there is a good chance I could be wrong.

I think I tried to put too much into this one post so it might not make sense. Hopefully I got my ideas across.

Friday, August 19, 2005

Developing on the go

One benefit of using Erlang compared to more traditional languages is the development cycle. Generally you write code, compile, debug, write, compile, debug, until you have something you want to use. In between compile and debug you run the program. In traditional languages you shut down the application and then restart it to debug again. In Erlang we can skip the 'restart' and simply load the new code in. This assumes that no bug in the code has caused the entire application to crash horribly.
So basically, say you have a logic problem or some such in your application that you wish to fix. Outside of your running application, you edit the appropriate .erl files and recompile. For example, your application consists of a.erl and b.erl, and in b.erl you have a function called mogwai.
In our example, a.erl calls b:mogwai, and b:mogwai has some sort of error in it that does not cause the application to crash but that you want to fix regardless. You fix the error in b:mogwai and you want to load the new codebase into your running application. Recompile b.erl, then in the shell attached to your application you simply do:

nl(b).

This loads the new version of b into your application, and from now on calls to b:mogwai will use the current version of the function. For certain applications this certainly provides a more elegant development cycle. I think this style also alters the structure of one's code. For instance, if one writes an application knowing that errors in the code can be fixed on the fly, the processes in it no longer necessarily have to exit nicely. Rather, the application can provide a means of restarting the portions that have crashed. The application can continue working without the crashed process/code, or stall until the required portion can be brought back. I'm under the impression this is a feature supervision trees offer you. A supervisor simply restarts a process if it crashes, and reports it. If a fixed version of the code has been loaded in the meantime, the restarted process runs the fix.
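
To show the restart idea in the smallest possible form, here is a hand-rolled sketch. It is not the OTP supervisor behaviour, and the module name keep_alive is made up for this example. It restarts a worker whenever the worker exits abnormally and reports the crash.

```erlang
-module(keep_alive).
-export([start/1]).

%% Start a watcher process that runs Fun in a linked worker
%% and restarts the worker whenever it exits abnormally.
start(Fun) when is_function(Fun, 0) ->
    spawn(fun() ->
        %% Turn exit signals from linked processes into messages.
        process_flag(trap_exit, true),
        supervise(Fun)
    end).

supervise(Fun) ->
    Pid = spawn_link(Fun),
    receive
        {'EXIT', Pid, normal} ->
            %% The worker finished on its own; nothing to restart.
            ok;
        {'EXIT', Pid, Reason} ->
            error_logger:error_msg("worker ~p crashed: ~p~n", [Pid, Reason]),
            supervise(Fun)
    end.
```

If the worker is passed as an external fun such as fun b:mogwai/0, every restart calls whatever version of b is loaded at that moment. Note that this sketch will restart forever if the worker crashes immediately; a real supervisor limits how often it restarts.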

One final note on how code replacement works. A running process only switches to the new version of a module when it makes a call of the form module:function. For instance, say you have a process in a loop, something like:


loop() ->
    receive
        Something ->
            loop()
    end.

If you reload the module that this loop is defined in, the call to 'loop()' will not switch to the new codebase. The common idiom is something along the lines of:

loop() ->
    receive
        restart ->
            ?MODULE:loop();
        Something ->
            loop()
    end.

On receiving restart, the fully qualified call switches the process to the new codebase. On a final note, any function called as module:function must be exported.
Erlang certainly provides some interesting features. Certainly something like code replacement is possible in other languages, such as Python (which provides a reload function to reload a module), but I think Erlang provides a more elegant solution. For instance, I don't think Python provides a means for a module to reload itself, especially in the middle of an event loop.
However, this is not a contest between code replacement in various languages. I have modified my IRC bot to allow code replacement more seamlessly; however, I have not yet added a decent means of bringing the bot back if there is an error which causes a crash. I'll have the new code online later if anyone is interested.
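
For anyone who wants to try this, here is the idiom as a complete module that compiles on its own. The module name reloadable, the counter state, and the {count, From} message are additions for the sake of the example.

```erlang
-module(reloadable).
%% loop/1 has to be exported so that ?MODULE:loop/1 can be called.
-export([start/0, loop/1]).

start() ->
    spawn(fun() -> loop(0) end).

%% Count is carried across the code switch as an ordinary argument.
loop(Count) ->
    receive
        restart ->
            %% Fully qualified call: continue in the newest loaded version.
            ?MODULE:loop(Count);
        {count, From} ->
            From ! {count, Count},
            loop(Count);
        _Other ->
            %% Local call: stay in the version we are already running.
            loop(Count + 1)
    end.
```

A typical session would be: start the process, edit and recompile the module, load it, then send the process restart. Keep in mind that Erlang holds only two versions of a module at once, so a process still running the oldest version is killed when a third version is loaded.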

Tuesday, August 16, 2005

Erlang and strings

A lot of people complain about strings in Erlang. I am one of them. Right now I am under the impression that a string type needs to be added to Erlang. Joe Armstrong thinks that instead of a string we simply need a character type. My complaint with that is there is no decent container for the character type. In Erlang we have tuples, binaries, and lists. Tuples are meant to store a fixed number of objects and there are no operations for iterating over them. The element/2 function allows you to access an index of a tuple, but that isn't very useful for iterating. Binaries might be nice, but there are no functions to nicely deal with binaries as strings. Lists are what we currently have and I am not pleased. Using a linked list to store a string certainly seems unreasonable in any other language. You have to store a character value and a link to the next node. This is a lot of memory for a string. The other problem is index access. A list does not allow O(1) access to an index; you have to walk it from the head. Tuples (through element/2) and binaries do give constant-time access, but for the reasons above neither makes a comfortable string container. I suppose the question then is, is that a problem? I am under the impression that one generally wants O(1) access, for instance if you have an index in the string and need to access it and the surrounding indices repeatedly.
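
To make the access costs concrete, here is a small sketch that fetches the Nth character from each container. The module name string_access is made up, and binary:at/2 comes from the binary module, which was added to Erlang/OTP in a release later than this post.

```erlang
-module(string_access).
-export([nth_list/2, nth_binary/2, nth_tuple/2]).

%% List: lists:nth/2 walks N cells from the head, so it is O(N).
%% Positions are 1-based.
nth_list(N, String) when is_list(String) ->
    lists:nth(N, String).

%% Binary: binary:at/2 is constant time but 0-based,
%% so adjust to keep the same 1-based interface.
nth_binary(N, Bin) when is_binary(Bin) ->
    binary:at(Bin, N - 1).

%% Tuple: element/2 is constant time and 1-based.
nth_tuple(N, Tuple) when is_tuple(Tuple) ->
    element(N, Tuple).
```

All three return the same integer character code for the same position; only the cost of getting there differs.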

http://schemecookbook.org/view/Erlang/StringBasics has a quote supposedly from the sendmail people:
But Erlang's treatment of strings as lists of bytes is as elegant as it is impractical. The factor-of-eight storage expansion of text, as well as the copying that occurs during message-passing, cripples Erlang for all but the most performance-insensitive text-processing applications.

This is in reference to their load balancing software. Is this true? I am inclined to think that it certainly uses up a lot of memory, but most text processing is going to require touching every character in a string anyway, won't it? What exactly is performance-intensive text processing? Does anyone have any ideas? If one is going to be iterating through the string, one can use a binary to store the byte values. The problem with this is that none of the string functions work on binaries. I'm under the impression that a string container type would solve some problems. Using integers as the character values seems like a fine idea to me; as people like to point out, it makes dealing with Unicode slightly easier.
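
Iterating over a binary byte by byte is not hard to write by hand, even without string functions for it. The following is a sketch only: the module name bin_text and both functions are made up, it handles plain ASCII bytes and nothing else, and the binary comprehension syntax it uses arrived in an Erlang release later than this post.

```erlang
-module(bin_text).
-export([upcase/1, count/2]).

%% Uppercase the ASCII letters of a binary, one byte at a time,
%% building a new binary with a binary comprehension.
upcase(Bin) when is_binary(Bin) ->
    << <<(upcase_char(C))>> || <<C>> <= Bin >>.

upcase_char(C) when C >= $a, C =< $z -> C - 32;
upcase_char(C) -> C.

%% Count how many times the byte Char occurs in a binary.
count(Char, Bin) when is_binary(Bin) ->
    count(Char, Bin, 0).

count(_Char, <<>>, Acc) ->
    Acc;
count(Char, <<Char, Rest/binary>>, Acc) ->
    %% Char is already bound, so this clause matches only that byte.
    count(Char, Rest, Acc + 1);
count(Char, <<_, Rest/binary>>, Acc) ->
    count(Char, Rest, Acc).
```

Each character costs one byte here instead of a whole list cell, at the price of writing the traversal yourself.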

What problems would having a character type solve? Maybe in the morning I'll be able to think of something.

Identd as done as I care

The identd is as done as I'm interested in making it right now. It works, at least. Instead of worrying about how to figure out which user owns a given port, I just return a random ident value for any input. However, the function that produces the value is passed to the server as an argument, so it is not very difficult to give it a different function. To run it, simply compile all the files and then do
identd:start(SomePort, {random_identd, random}).

There is no clean shutdown, just kill your shell.

You can download it here.

Monday, August 15, 2005

Identd

I think the basic design for my identd is going to be:
  1. Start a process which takes a port and a function.
  2. On a new connection, start a process to handle the connection with the function given.
  3. Go back to waiting.
  4. The new process reads in the port numbers, parses them, then calls the given function with the port information.
  5. The function does what it needs to and returns the information.
  6. The process responds on the socket with the correct information.
I think this would work well with the gen_server behaviour; unfortunately I don't quite understand supervision trees that well. I will write it the ad-hoc way first, then rewrite it once I figure out the correct method.
This framework shouldn't be too hard. Writing a correct identd function which actually gets the user from the ports might be. The basic one will just return a random ident.
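
The steps above can be sketched roughly as follows. This is not the actual identd code: the module name identd_sketch, the two-argument callback signature, and the error reply are all assumptions made for the example. The request and reply formats follow my reading of the ident protocol, where the client sends a pair of port numbers and gets back a USERID line.

```erlang
-module(identd_sketch).
-export([start/2, parse_request/1, format_reply/3, random_ident/2]).

%% Steps 1-3: listen on Port, start one process per connection,
%% then go back to waiting.
start(Port, {Mod, Fun}) ->
    spawn(fun() ->
        Opts = [binary, {packet, line}, {active, false}, {reuseaddr, true}],
        {ok, Listen} = gen_tcp:listen(Port, Opts),
        accept_loop(Listen, {Mod, Fun})
    end).

accept_loop(Listen, Callback) ->
    {ok, Socket} = gen_tcp:accept(Listen),
    Handler = spawn(fun() -> receive go -> handle(Socket, Callback) end end),
    %% Hand the socket over so it is closed if the handler dies.
    ok = gen_tcp:controlling_process(Socket, Handler),
    Handler ! go,
    accept_loop(Listen, Callback).

%% Steps 4-6: read the request, call the callback, send the answer.
handle(Socket, {Mod, Fun}) ->
    case gen_tcp:recv(Socket, 0, 5000) of
        {ok, Line} ->
            Reply = case parse_request(Line) of
                {ok, ServerPort, ClientPort} ->
                    %% Assumed callback shape: Mod:Fun(ServerPort, ClientPort).
                    Ident = Mod:Fun(ServerPort, ClientPort),
                    format_reply(ServerPort, ClientPort, Ident);
                error ->
                    <<"0, 0 : ERROR : INVALID-PORT\r\n">>
            end,
            gen_tcp:send(Socket, Reply);
        {error, _Reason} ->
            ok
    end,
    gen_tcp:close(Socket).

%% Parse a request line such as <<"6193, 23\r\n">>.
parse_request(Line) when is_binary(Line) ->
    case string:tokens(binary_to_list(Line), ", \r\n") of
        [A, B] ->
            try {ok, list_to_integer(A), list_to_integer(B)}
            catch error:badarg -> error
            end;
        _ ->
            error
    end.

format_reply(ServerPort, ClientPort, Ident) ->
    iolist_to_binary(io_lib:format("~w, ~w : USERID : UNIX : ~s\r\n",
                                   [ServerPort, ClientPort, Ident])).

%% The basic callback: ignore the ports, return 8 random lowercase letters.
random_ident(_ServerPort, _ClientPort) ->
    [$a + rand:uniform(26) - 1 || _ <- lists:seq(1, 8)].
```

With this sketch the server would be started as identd_sketch:start(SomePort, {identd_sketch, random_ident}), and swapping in a real lookup means passing a different module and function.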

IRC Bot

Here is the code for the latest revision of my IRC bot. It isn't made to be user friendly right now, so don't expect it to work right off the bat.

I think to get it started you'll need to do the following:
  1. Compile everything, be sure to add the inc directory to your include path.
  2. Start it with a node name and set a mnesia directory.
  3. Call p1_db:start(). Then p1_db:create_tables().
  4. Then call irc_bot:add_bot (Maybe it's addbot?). It takes a tuple, see code to figure it out.
  5. p1_main:start()
  6. When you are finished: bot_server ! stop. The beauty of erlang lets you do this from another node on another machine too, if you so desire.
The code can be found here.
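
Stopping the bot from another node, as mentioned in the last step, is just a send to a {Name, Node} pair. Here is a self-contained sketch of the idea; the module name stoppable and its tiny server loop are made up and stand in for the real bot_server.

```erlang
-module(stoppable).
-export([start/1, stop/2]).

%% Start a process that loops until it is told to stop,
%% and register it under Name.
start(Name) when is_atom(Name) ->
    Pid = spawn(fun loop/0),
    true = register(Name, Pid),
    Pid.

loop() ->
    receive
        stop -> ok;
        _Other -> loop()
    end.

%% Send stop to the process registered as Name on Node.
%% Node can be the local node or any connected node.
stop(Name, Node) ->
    {Name, Node} ! stop,
    ok.
```

For this to work across machines, both nodes need to be started with node names and share the same cookie.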

The only other Erlang IRC bot I've found is manderlbot, which can be found on freshmeat. I think it is a bit better designed than mine and was written with other people using it in mind, whereas mine is more of me just playing around. If there is interest in it, perhaps I will do more with it.

Initial Post

I will be posting my various Erlang accomplishments here. My intended goal is to write an smtpd in Erlang. I will have various posts on that in the future. So far I would not describe myself as a very good programmer, but I am working on that. I have written portions of an Erlang IRC bot. It does not do very much, although it has the ability to relay chats between channels on multiple networks and supports some concept of factoids. My next mini project is a pluggable identd. This will be fairly small and will simply let you give it a function that creates a response.
Feel free to post responses to my posts; I don't mind constructive criticism. The hope is to make my projects better.