Wednesday, April 2, 2008

Scaling webapps

DISCLAIMER: I am not an expert in web apps and have simply been doing some research on them. Given that, my claims and concerns may be completely unfounded and simply a result of my ignorance of web app design. If so, please let me know.

I have been doing a little bit of research on webapps and I'm not sure how I feel about the standard model. In the standard model, as I understand it, every incoming HTTP request results in some number of DB requests, and the DB is the holder of all state information. The web app itself, whether it is written in PHP, Rails, or something else, is stateless. I certainly don't doubt that this works fine for a large majority of applications. Take something like the yellow pages, which is a Rails application according to this. The standard model seems quite reasonable for it. You have a page that is fairly static: people are mostly requesting data, which requires walking through some large database, the results are displayed, and not much happens again until the user initiates another page request.

Web apps are changing though. Today, we expect our web apps to look and feel like native applications, which means snappy responses and a page that updates even without the user explicitly reloading it. We have things like AJAX and Comet-style server push. In the end, this means more requests are going to the HTTP server even when the user is not doing anything. The Comet style is pretty neat, but from what I can see, polling with AJAX is the most widely used approach. My concern is how well the standard model is going to scale as web apps are required to act more and more like native applications. With polling, the standard model is simply polling the DB for state changes on every request, and in every other situation polling quite clearly has not scaled. This is the exact reason Comet exists.

For the sake of simplicity, let's imagine that the web app is AJAX with polling. Some portion of the page is going to be asking for updates every 5 seconds. The standard model would have us querying the DB every 5 seconds for that request to see if the state has changed. If you have a couple thousand people on this page, that is a lot of work. If that component of the page is changing often for a majority of the people, it's not too big of a deal, but if it's not changing often for most of them, we are doing a lot of worthless DB calls. I'm sure we would design our database so that this call is very lightweight, but how well does that really scale in the end?
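To put rough numbers on that, here is a small back-of-the-envelope sketch. The 2000 users and the 5-second interval come from the example above; the 5% figure for how often a poll actually finds a change is purely my assumption for illustration, as is the `polling_load` helper.

```python
def polling_load(users, poll_interval_s, change_fraction):
    """Return (DB queries per second, queries per second that find nothing new)."""
    qps = users / poll_interval_s
    wasted = qps * (1.0 - change_fraction)
    return qps, wasted


# 2000 users polling every 5 seconds; assume (hypothetically) only 5% of polls see a change.
qps, wasted = polling_load(users=2000, poll_interval_s=5, change_fraction=0.05)
print(f"{qps:.0f} queries/s, about {wasted:.0f} of them return nothing new")
```

The load grows linearly with the number of users and inversely with the polling interval, regardless of whether anything has changed.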

Now, the reason we want our web app (such as something in Rails) to be stateless is that each request can be pushed around to a separate VM. So if we hold onto session state information, there is a good chance we might not even get to use it on the next request. On top of that, if you are load balancing between several hosts, you may not even be in a good place to share state information between VMs.

I'm sure almost all of this is not news to anyone who has made it this far in the post. What can be done to make a webapp scale better? I'm not sure, but here is my suggestion, and hopefully someone with more experience than I have can come back and say whether it's a horrible idea or not. Let's say I want to make a web app that will be handling something like instant messaging, such as Google Talk or Meebo (which are both using Comet).

For starters, I want this to be fairly real time: when I send a message to someone, I want it to get to them ASAP. Secondly, there will be a lot of interaction between users. Clearly, page-by-page viewing is not going to work here. People can't be refreshing their page every few seconds to see if there is an update. AJAX with polling or Comet are clearly your two choices currently. How should this look after the HTTP request comes in though?

Here is how I am suggesting it should:
Each request should have some sort of session ID. Each session ID will be associated with a process. By process here, I mean it could be an Erlang process or some OS process written in some other language, whatever. The point is that, for a particular session ID, requests get forwarded to the same process every time for the lifetime of the session. This way, the process can store all the state information. We don't have to worry about sharing state information in something like memcache.

The process would have some sort of timeout value so that processes die off eventually. In the worst case, we are back to the standard model: if a user makes multiple requests with a pause between them longer than the timeout, we have to start a new process and it has to do the DB calls to initialize itself. In the best case though, we are consistently getting sent to the same process, which is holding onto our data.

The upside to this method is that the process can also do things specific to that user, such as opening up other connections. For instance, in the instant messaging example, a user logs in and gets a session ID, and a process is created for them somewhere and mapped to this session ID. The process opens up a connection to an event server for that user so it can listen for IMs and push IMs out. We now have an application that is quite event based on all sides. We won't be hitting the DB too often. Given that we don't really care where this process lives, we can also scale it out to multiple machines without replicating the same data across all of them just because we don't know where the user's request will go.
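As a minimal sketch of the routing idea (not a real implementation), here is what the session-ID-to-process mapping with a timeout might look like in Python. Everything here is hypothetical: `SessionRegistry`, `route`, and the `load_from_db` loader are names I made up, a plain in-memory object stands in for the "process", and expiry is checked lazily on the next request rather than by the process killing itself.

```python
import time


class SessionRegistry:
    """Routes each session ID to one long-lived, in-memory session state."""

    def __init__(self, load_state, timeout_s=300.0, clock=time.monotonic):
        self._load_state = load_state  # hypothetical DB loader: session_id -> dict
        self._timeout_s = timeout_s
        self._clock = clock
        self._sessions = {}  # session_id -> (state, last_seen)

    def route(self, session_id):
        """Return the state for session_id, hitting the DB only on a cold start."""
        now = self._clock()
        entry = self._sessions.get(session_id)
        if entry is None or now - entry[1] > self._timeout_s:
            # Worst case: no live session, so we pay the DB cost like the standard model.
            state = self._load_state(session_id)
        else:
            # Best case: the state is already sitting in memory.
            state = entry[0]
        self._sessions[session_id] = (state, now)
        return state


db_hits = []


def load_from_db(session_id):
    """Stand-in for the real DB calls; records each call so we can count them."""
    db_hits.append(session_id)
    return {"user": session_id, "unread": []}


registry = SessionRegistry(load_from_db, timeout_s=300.0)
registry.route("abc")["unread"].append("hi")
print(registry.route("abc")["unread"], "DB hits:", len(db_hits))
```

The second `route` call finds the message left by the first one without touching the loader again, which is the whole point: the state lives with the session, not in the DB.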

Downsides?
Certainly. Clearly, if we have 2000 people using our IM application at once, we need to have 2000 processes alive. If we wrote this in Erlang, where the process for each session ID maps to an Erlang process (sorry about the terminology here, all), that's child's play. We could host this application on one machine! But not everyone wants to write things in Erlang, so what about them? If I were in this situation, I would probably have several machines, each running a number of application instances that use I/O multiplexing to handle this. So you would have some number of OS processes, each of which can handle some number of session IDs. That is pretty standard for something that isn't Erlang but needs to handle a bunch of things at once. If you are into Python, Twisted comes to mind. I'm sure other languages have their own way of doing this.
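Here is a rough sketch of the "one OS process, many sessions" idea. I am using Python's standard `asyncio` here rather than Twisted, simply because it ships with the language; the shape is the same. The `session` and `run_sessions` names are made up, and in-memory queues stand in for the sockets a real server would be multiplexing.

```python
import asyncio


async def session(session_id, inbox, delivered):
    """One lightweight coroutine per session ID, all inside a single OS process."""
    while True:
        msg = await inbox.get()  # yields to the event loop until something arrives
        if msg is None:  # None is our (hypothetical) shutdown signal
            break
        delivered.append((session_id, msg))


async def run_sessions(n):
    delivered = []
    inboxes = {sid: asyncio.Queue() for sid in range(n)}
    tasks = [
        asyncio.create_task(session(sid, inbox, delivered))
        for sid, inbox in inboxes.items()
    ]
    # Send each session one message, then tell it to stop.
    for sid, inbox in inboxes.items():
        await inbox.put(f"hello {sid}")
        await inbox.put(None)
    await asyncio.gather(*tasks)
    return delivered


delivered = asyncio.run(run_sessions(2000))
print(len(delivered), "messages handled by one OS process")
```

Each session costs a coroutine and a queue rather than a thread or an OS process, which is roughly what makes 2000 concurrent sessions unremarkable in this style.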

Another edge case here, and I think this is probably not too hard to deal with, is what happens if an event comes in (such as an IM) just as the process times out due to lack of activity. You could give each event a timeout, and if it is not ACK'd in that time, it gets saved to a DB and the next process that is created for that user picks it up as part of its initialization.
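A minimal sketch of that ACK-or-persist rule, under my own assumptions: `PendingEvents` and its method names are hypothetical, a plain dict stands in for the DB, and time is passed in explicitly so the behavior is easy to follow.

```python
class PendingEvents:
    """Tracks delivered-but-unacknowledged events and persists the ones that time out."""

    def __init__(self, ack_timeout_s, store):
        self._ack_timeout_s = ack_timeout_s
        self._store = store  # dict of user -> [payload], standing in for the DB
        self._pending = {}  # event_id -> (user, payload, deadline)

    def deliver(self, event_id, user, payload, now):
        """Hand an event to the user's process and start its ACK timer."""
        self._pending[event_id] = (user, payload, now + self._ack_timeout_s)

    def ack(self, event_id):
        """The session process confirmed it handled the event."""
        self._pending.pop(event_id, None)

    def expire(self, now):
        """Save every event whose ACK deadline has passed to the store."""
        for event_id, (user, payload, deadline) in list(self._pending.items()):
            if now >= deadline:
                self._store.setdefault(user, []).append(payload)
                del self._pending[event_id]

    def drain_for_new_process(self, user):
        """Called when a new process starts up for this user."""
        return self._store.pop(user, [])


store = {}
events = PendingEvents(ack_timeout_s=5.0, store=store)
events.deliver("e1", "alice", "are you there?", now=0.0)
events.deliver("e2", "alice", "hello?", now=1.0)
events.ack("e1")  # the process handled e1 before it died
events.expire(now=10.0)  # e2 was never ACK'd, so it is saved
print(events.drain_for_new_process("alice"))
```

Only the unacknowledged message is handed to the new process, so a message is neither lost nor delivered twice in this simple case.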


To reiterate, I do not have much experience in webapps. I have simply been doing some research and this is the impression that I have gotten. Are my concerns valid? Is the system I described what people are already moving to, or is it broken? How would someone write Meebo or Google Talk? Let me know.

Wednesday, March 19, 2008

Back in the saddle

So I have been working on Wall Street for the last year+ and last week decided to quit my job. And none too soon, as my job will most likely no longer exist in a few months, if you have been following the news over the past few weeks.

I am going back to school and moving out of NYC to Maryland, so I should have a lot of free time over the summer. I am also working on possibly getting involved in a sort of freelance project in order to make money (details are still pending). Anyway, the current design of the project has a typical web-app interface, then a manager layer that encapsulates the database. The manager layer handles caching data and moving it back to the database when needed, event dispatching, blah blah blah.

Immediately I think this is the perfect use case for Erlang. We want it to be fault tolerant of course (the downside of a web-app is when things go wrong it affects everyone) and be able to handle a lot of data moving back and forth (although I'm unsure of the specifics since this is so early). Basically this sounds like a standard Erlang app with mnesia as the cache most likely, spanning a few nodes and moving data back to a DB in a write-behind method.
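For anyone unfamiliar with the term, here is a tiny sketch of what I mean by write-behind. It is in Python rather than Erlang, and it is not how mnesia works internally; `WriteBehindCache` is a made-up name and a plain dict stands in for the database.

```python
class WriteBehindCache:
    """Serves reads and writes from memory and flushes dirty keys to the DB later."""

    def __init__(self, backing_store):
        self._db = backing_store  # dict standing in for the real database
        self._cache = {}
        self._dirty = set()  # keys changed in the cache but not yet written back

    def get(self, key):
        if key not in self._cache:
            self._cache[key] = self._db[key]  # cache miss: load from the DB once
        return self._cache[key]

    def put(self, key, value):
        # The write is acknowledged immediately; the DB is updated on flush().
        self._cache[key] = value
        self._dirty.add(key)

    def flush(self):
        """Write all dirty entries back to the DB and return how many were written."""
        flushed = len(self._dirty)
        for key in self._dirty:
            self._db[key] = self._cache[key]
        self._dirty.clear()
        return flushed


db = {"balance": 10}
cache = WriteBehindCache(db)
cache.put("balance", 25)
print(cache.get("balance"), db["balance"])  # cache is ahead of the DB
cache.flush()
print(cache.get("balance"), db["balance"])  # now they agree
```

The trade-off is the obvious one: writes are fast, but anything still dirty is lost if the node dies before a flush, which is part of why spanning a few nodes matters.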

Some people have concerns about Erlang, such as how we will find developers, and those concerns are perfectly valid. My response to that is:
Erlang is such a simple language it does not take much time to learn, and while OTP is not as simple, it does not take much time to learn either. In the end, one needs to be less of an expert in Erlang to get an equal or better application (in terms of Erlang's strengths) than they would have to be in another language such as Java.

One possible alternative language being considered is Java, because of JBoss. I haven't looked into JBoss in depth yet, but at a quick glance it looks like it has some really nice and really mature features. The JMS implementation and the clustering both sound pretty solid. Everyone knows Java, or at least puts it on their CV, but that can be misleading. How good does one need to be at Java in order to not make a mess of an application written using JBoss, compared to Erlang? My opinion is that the way one learns Erlang is fairly similar to how one would write production software with Erlang, but perhaps that is not the case for Java. The features we use in our 'Hello World' are the same ones we use in a production environment. This is not true of Java in my experience.

We are still looking into things but I'm currently hoping for Erlang.

Friday, January 25, 2008

Erlang Job Advert

Well, it's been over a year, sorry. Today I received an email with an Erlang job offer for a company based in Boston. I don't know anything about the company at all, and perhaps it is junk, but the project seems like something Erlang is good at. It's a telecommuting job, so that might raise a red flag with some people. If you are interested, email me at orbitz AT ortdotlove.net (that is really ortdotlove.net, not fancy spelling for ort.love.net) and I'll put you in touch with the recruiter or whatever he is.

Here are the job details:
Job Description:

Our client, who is based in Boston, is seeking an adept Erlang developer to help build a next generation metrics and monitoring system on the GNU/Linux platform. This system will watch and report on hundreds and thousands of systems, ranging from network devices to servers to software applications.



Required Skills:

· Must have experience with the Erlang programming language

· Must have expert level network programming experience, preferably in Erlang, C++ and/or C

· Strong background in developing on GNU/Linux for mission-critical production deployments

· Strong understanding of MySQL 4.1-5.x, including database design patterns

· Strong experience programming network servers/clients, including knowledge of fundamental protocols such as TCP, UDP, IPV4/6, and SSL/TLS