Concurrent XML Queries with AJAX

This year I’ve been involved with a couple of projects that required aggregating data from several different companies’ XML feeds. Initially the XML calls were made one after the other in a traditional sequential manner. However, it soon became apparent that this was not feasible: each search took in the region of 10 to 20 seconds to complete, and with 4 or 5 companies the wait time for the user grew to over a minute.

Graphical representation of the traditional sequential model.
Ideally we were looking for a solution that would first reduce the wait time and second reduce the amount of data shown to the user. As the feeds return mostly similar data, consolidating and filtering it before returning it to the user is essential. The maximum time we wanted to wait would be the longest time for one of the threads to return, and if that exceeded some sort of hard limit we would want to stop it there. To achieve these goals we were looking for some sort of “threaded” model, similar to the observer/observable model, where each XML query is launched on its own “thread”. This is not exactly rocket science and neither is it brand new; however, if you search the net for something that talks about multi-threaded web pages you will probably not find what you are looking for. The rest of this post explains the simple concept of using JavaScript to try to achieve a performance increase and help improve your site.

Traditional client web technologies do not offer a threaded model. However, with the advent of AJAX (Asynchronous JavaScript and XML) a better user interface has become possible. AJAX has become so popular because of its ability to do things “concurrently”, or in the background. Just to be clear though, there is no true concurrency or parallel processing in JavaScript. We can, however, start background requests that allow us to have more than one task in progress at once and give us the illusion of threaded scripts; it isn’t true parallelism, but it allows a certain degree of concurrency. This has enabled everything from simple tasks, such as logging a customer in without a page refresh, to more advanced uses such as Google Maps (notice how, when you pan around the map, the images are downloaded almost seamlessly in the background) or even Yahoo’s new mail client, where AJAX has allowed it to mimic a traditional desktop email client.

Taking the theory of “concurrent” calls, we can use XMLHttpRequest to fire off asynchronous requests for our XML feeds. As XMLHttpRequest is asynchronous, as soon as the call is made control is handed back to the script and it carries on, allowing you to fire off more calls to other feeds. As it’s not true concurrency, the order in which they are called may be a factor. It may be worth making the call that takes longest to return first, which gives it the benefit of a few extra milliseconds before the other calls are made. It’s not going to make much of a difference, but look after the milliseconds and the seconds will look after themselves!
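As a minimal sketch of the idea, the helper below wraps XMLHttpRequest so that each call returns straight away and the response is delivered to a handler later. The name asyncCall is borrowed from the pseudocode further down; its signature, the optional createRequest parameter and the choice to pass null on failure are assumptions made for illustration, not the code we actually used.

```javascript
// Hypothetical helper: request a feed asynchronously and pass the response
// text to handler once it arrives (or null if the request failed).
// createRequest is an assumption added so the transport can be swapped;
// in a browser it defaults to the built-in XMLHttpRequest.
function asyncCall(url, handler, createRequest) {
  var request = createRequest ? createRequest() : new XMLHttpRequest();

  request.onreadystatechange = function () {
    // readyState 4 means the request has completed
    if (request.readyState === 4) {
      handler(request.status === 200 ? request.responseText : null);
    }
  };

  request.open("GET", url, true); // third argument true = asynchronous
  request.send(null);
  // send() does not wait for the response, so the caller can go on to
  // fire off the next request immediately
}
```

Calling asyncCall once per feed, one after the other, starts all the requests before any of them has returned.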

The key to making this work is how you handle the data that is returned. There are two approaches. In the first, the data from one “thread” has no relevance to the others, so each response can be processed as it is received: displayed to the user, or just stored in variables for further manipulation. In the second, the data is all related, so it makes sense to store it in some form of intermediary storage, such as a database. With this model the important factor is to be notified when all the “threads” have finished. To do this we can use setTimeout to periodically check some sort of status variable, e.g.

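// one status flag per feed (the name feedStatus is illustrative)
var feedStatus = { feed1: false, feed2: false, feed3: false, feed4: false };

// create a handler for each feed; feed2Handler to feed4Handler follow the same pattern
function feed1Handler() { feedStatus.feed1 = true; }

// make the calls
asyncCall(webfeed1, feed1Handler);
asyncCall(webfeed2, feed2Handler);
asyncCall(webfeed3, feed3Handler);
asyncCall(webfeed4, feed4Handler);

// check to see if all feeds have returned
setTimeout(checkstatus, 1000);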

Graphical representation of the 'threaded' model

The above example works quite simply. Using your Ajax class or methods, call each feed in turn, providing a handler that will set that feed’s status to finished when it returns. As the calls run in the background, the code continues to run and executes the setTimeout call. The function it schedules simply checks the status of all the feeds: if they are all finished it carries on, otherwise it schedules itself again with setTimeout until they are. This method is suitable when you must wait for all feeds to return before progressing, for example if the data must be compared before returning it to the user. Once the calls have returned, another call can be made to retrieve the data from the database and process it for display to the user.
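The polling described above could be sketched roughly as follows. Only the name checkstatus comes from the pseudocode; feedStatus, makeHandler, allFinished and the two parameters of checkstatus are assumptions made for illustration.

```javascript
// Status flag for each feed; the names are illustrative.
var feedStatus = { feed1: false, feed2: false, feed3: false };

// Build a handler that marks one feed as finished when its call returns.
function makeHandler(name) {
  return function () {
    feedStatus[name] = true;
  };
}

// True only when every feed has reported back.
function allFinished(status) {
  for (var name in status) {
    if (!status[name]) {
      return false;
    }
  }
  return true;
}

// If all feeds are done, carry on; otherwise schedule another check.
function checkstatus(onComplete, interval) {
  if (allFinished(feedStatus)) {
    onComplete();
  } else {
    setTimeout(function () {
      checkstatus(onComplete, interval);
    }, interval);
  }
}
```

Each feed would be requested with makeHandler("feed1") and so on as its handler, followed by a single call such as checkstatus(showResults, 1000), where showResults is whatever function processes the combined data.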

There are several drawbacks that become apparent with this approach, however. As mentioned above, you must wait for all calls to return before you can proceed, so at this stage it’s advisable to implement a hard cut-off. By building a time limit into the function called by setTimeout, you can check the elapsed time on every call and, if it is greater than, say, 10 seconds, exit the loop and display the results received so far to the user.
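One way to build in such a cut-off is sketched below. The function name waitForFeeds, its parameters and the injectable clock (now) are all assumptions made for illustration; isDone stands for whatever check reports that every feed has returned.

```javascript
// Poll isDone() every `interval` ms, but give up once `limit` ms have passed.
// onFinish receives true if the limit was hit with feeds still outstanding.
// `now` is an optional clock function, included only so the timing can be
// controlled from outside; it defaults to the system clock.
function waitForFeeds(isDone, onFinish, limit, interval, now) {
  now = now || function () { return new Date().getTime(); };
  var started = now();

  function check() {
    var done = isDone();
    if (done || now() - started >= limit) {
      // Carry on with whatever has arrived so far
      onFinish(!done);
    } else {
      setTimeout(check, interval);
    }
  }

  check();
}
```

For a 10 second limit checked once a second, the call would look like waitForFeeds(allFeedsReturned, showResults, 10000, 1000), with allFeedsReturned and showResults being hypothetical functions of your own.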

This method of implementation lends itself to the use of closures, which isn’t entirely safe considering IE’s memory leak problems (I’d optimistically like to assume this will not be an issue in IE7). A further discussion of this can be found at http://jibbering.com/faq/faq_notes/closures.html#clMem

The other main caveat with this model is that more and more of the processing is required from the client: as JavaScript is a client-side language, all calls and data processing are done on the client machine. Most computers nowadays should not have any issues with this; however, if the computer has been infected by spyware, we all know how painfully slow the internet can become. Consequently this model assumes a certain amount of work must be done by the client.

Another limiting factor is obviously the user’s download speed. The more feeds or calls built into this solution, the more time it will take, but this is something that can be tweaked until an optimum is achieved.

The main reason for writing this is to hopefully provide someone heading down this road with a little bit of guidance, and confidence that a solution is workable. With new “comparison” sites appearing every day, using the model above should greatly speed up the process for the consumer with no detriment to the performance of the site. The concepts we used are already widely discussed on the net; however, I felt that putting them in a practical context made them seem a lot less theoretical.

Acknowledgements

Chris Clarkson : http://www.hwtechie.co.uk/
Dan Robins : http://www.carhiresearch.co.uk/

3 Comments »

  1. HW Techie » Blog Archive » Concurrent XML Queries (JavaScript “Threading”) said,

    November 7, 2006 @ 1:22 pm

    […] This idea isn’t new in other platforms and mediums where it is known as “threading”, and even using JavaScript as we had is fairly well known amongst developers. However, we found no clear cut explanations of how this works, so you can see HW Happy’s more detailed explanation here. […]

  2. HWHappy said,

    November 8, 2006 @ 10:20 am

    Javascript Atomicity

    Another issue to be considered if following the method above is atomicity, that is, trying to prevent 2 or more “threads” accessing the same data at the exact same time and corrupting it. This occurs because AJAX allows processes to run in the background; it doesn’t take into account that the background processes and the current process could be accessing the same information at the same time.

    This was a concern I had, and one of the factors that led me to decide to store the results in the database (as well as the added benefits of WHERE filters and of using a query to return just what I want), as this would prevent more than one “thread” from trying to change the data at the same time. What’s the risk? Probably small for one background process, but when you have 5 or 6, the chances are much higher.

    Many people have tried to design mutually exclusive code in JavaScript to allow access to shared variables, and all usually end up with the same result: it can’t be done. Hence if you are thinking about storing the results from different processes in variables, you will have to be very careful that it’s done without interference from other processes. A good article that is worth a read is: http://www.polyglotinc.com/AJAXscratch/Mutex/mutualExclusion.html
    This explains the author’s attempts at achieving basic mutual exclusion.

  3. Ruby on what? « HW Happy Days said,

    March 26, 2007 @ 10:18 am

    […] Ruby was built from the ground up having major influence by a language called SmallTalk. The idea was to consider everything as an object. Many of the features of Ruby have been designed to make it easier for the programmer with the main issues being described here. However, one of the features that stood out for me was Ruby’s ability to offer multithreading regardless of which platform Ruby runs on. In one of my previous posts I talk about using Ajax to give the illusion of multithreading (concurrent xml), its still not true concurrency. […]
