Optimizing Page Load Time

Christian Decker wrote this mid-afternoon:
Aaron Hopkins of Google has released an article on Optimizing Page Load Time, which came out of his experience optimizing page load times for a high-profile Ajax application. Google developers are clearly among the best known in the Web 2.0 movement, and the tips Aaron gives might just squeeze the last few milliseconds out of your application and improve the user experience. He discusses both the problem sources and their solutions at length:
  • Turn on HTTP keepalives for external objects. Otherwise you add an extra round-trip for every HTTP request. If you are worried about hitting global server connection limits, set the keepalive timeout to something short, like 5-10 seconds. Also look into serving your static content from a different webserver than your dynamic content. Having thousands of connections open to a stripped down static file webserver can happen in like 10 megs of RAM total, whereas your main webserver might easily eat 10 megs of RAM per connection.

  • Load fewer external objects. Figure out how to globally reference the same one or two javascript files and one or two external stylesheets instead of many; try preprocessing them when you publish them. If your UI uses dozens of tiny GIFs all over the place, consider switching to CSS, which tends to not need so many of these.

  • If your users regularly load a dozen or more uncached or uncacheable objects per page, consider evenly spreading those objects over four hostnames. This usually means your users can have 4x as many outstanding connections to you. Without HTTP pipelining, this results in their average latency dropping to about 1/4 of what it was before.

    When you generate a page, evenly spreading your images over four hostnames is most easily done with a hash function, like MD5. Rather than having all <img> tags load objects from http://static.example.com/, create four hostnames (e.g. static0.example.com, static1.example.com, static2.example.com, static3.example.com) and use two bits from an MD5 of the image path to choose which of the four hosts you reference in the <img> tag. Make sure all pages consistently reference the same hostname for the same image URL, or you’ll end up defeating caching.

    Beware that each additional hostname adds the overhead of an extra DNS lookup and an extra TCP three-way handshake. If your users have pipelining enabled or a given page loads fewer than around a dozen objects, they will see no benefit from the increased concurrency and the site may actually load more slowly. The benefits only become apparent on pages with larger numbers of objects. Be sure to measure the difference seen by your users if you implement this.

  • Possibly the best thing you can do to speed up pages for repeat visitors is to allow static images, stylesheets, and javascript to be unconditionally cached by the browser. This won’t help the first page load for a new user, but can substantially speed up subsequent ones.

    Set an Expires header on everything you can, with a date days or even months into the future. This tells the browser it is okay to not revalidate on every request, which can add latency of at least one round-trip per object per page load for no reason.

    Instead of relying on the browser to revalidate its cache, if you change an object, change its URL. One simple way to do this for static objects if you have staged pushes is to have the push process create a new directory named by the build number, and teach your site to always reference objects out of the current build’s base URL. (Instead of <img src="http://example.com/logo.gif"> you’d use <img src="http://example.com/build/1234/logo.gif">. When you do another build next week, all references change to <img src="http://example.com/build/1235/logo.gif">.) This also nicely solves problems with browsers sometimes caching things longer than they should – since the URL changed, they think it is a completely different object.

    If you conditionally gzip HTML, javascript, or CSS, you probably want to add a “Cache-Control: private” if you set an Expires header. This will prevent problems with caching by proxies that won’t understand that your gzipped content can’t be served to everyone. (The Vary header was designed to do this more elegantly, but you can’t use it because of IE brokenness.)

    For anything where you always serve the exact same content when given the same URL (e.g. static images), add “Cache-Control: public” to give proxies explicit permission to cache the result and serve it to different users. If a local cache has the content, it is likely to have much less latency than you; why not let it serve your static objects if it can?

    Avoid the use of query params in image URLs, etc. At least the Squid cache refuses to cache any URL containing a question mark by default. I’ve heard rumors that other things won’t cache those URLs at all, but I don’t have more information.

  • On pages where your users are often sent the exact same content over and over, such as your home page or RSS feeds, implementing conditional GETs can substantially improve response time and save server load and bandwidth in cases where the page hasn’t changed.

    When serving static files (including HTML) off of disk, most webservers will generate Last-Modified and/or ETag reply headers for you and make use of the corresponding If-Modified-Since and/or If-None-Match mechanisms on requests. But as soon as you add server-side includes, dynamic templating, or have code generating your content as it is served, you are usually on your own to implement these.

    The idea is pretty simple: When you generate a page, you give the browser a little extra information about exactly what was on the page you sent. When the browser asks for the same page again, it gives you this information back. If it matches what you were going to send, you know that the browser already has a copy and send a much smaller 304 (Not Modified) reply instead of the contents of the page again. And if you are clever about what information you include in an ETag, you can usually skip the most expensive database queries that would’ve gone into generating the page.

  • Minimize HTTP request size. Often cookies are set domain-wide, which means they are also unnecessarily sent by the browser with every image request from within that domain. What might’ve been a 400 byte request for an image could easily turn into 1000 bytes or more once you add the cookie headers. If you have a lot of uncached or uncacheable objects per page and big, domain-wide cookies, consider using a separate domain to host static content, and be sure to never set any cookies in it.

  • Minimize HTTP response size by enabling gzip compression for HTML and XML for browsers that support it. For example, the 17k document you are reading takes 90ms of the full downstream bandwidth of a user on 1.5Mbit DSL. Or it will take 37ms when compressed to 6.8k. That’s 53ms off of the full page load time for a simple change. If your HTML is bigger and more redundant, you’ll see an even greater improvement.

    If you are brave, you could also try to figure out which set of browsers will handle compressed Javascript properly. (Hint: IE4 through IE6 asks for its javascript compressed, then breaks badly if you send it that way.) Or look into Javascript obfuscators that strip out whitespace, comments, etc and usually get it down to 1/3 to 1/2 its original size.

  • Consider locating your small objects (or a mirror or cache of them) closer to your users in terms of network latency. For larger sites with a global reach, either use a commercial Content Delivery Network, or add a colo within 50ms of 80% of your users and use one of the many available methods for routing user requests to your colo nearest them.

  • Regularly use your site from a realistic net connection. Convincing the web developers on my project to use a “slow proxy” that simulates bad DSL in New Zealand (768Kbit down, 128Kbit up, 250ms RTT, 1% packet loss) rather than the gig ethernet a few milliseconds from the servers in the U.S. was a huge win. We found and fixed a number of usability and functional problems very quickly.

    To implement the slow proxy, I used the netem and HTB kernel modules available in the Linux 2.6 kernel, both of which are set up with the tc command line tool. These offer the most accurate simulation I could find, but are definitely not for the faint of heart. I’ve not used them, but supposedly Tamper Data for Firefox, Fiddler for Windows, and Charles for OSX can all rate-limit and are probably easier to set up, but they may not simulate latency properly.

  • Use Google’s Load Time Analyzer extension for Firefox from a realistic net connection to see a graphical timeline of what it is doing during a page load. This shows where Firefox has to wait for one HTTP request to complete before starting the next one and how page load time increases with each object loaded. The Tamper Data extension can offer similar data in less easy to interpret form. And the Safari team offers a tip on a hidden feature in their browser that offers some timing data too.

    Or if you are familiar with the HTTP protocol and TCP/IP at the packet level, you can watch what is going on using tcpdump, ngrep, or ethereal. These tools are indispensable for all sorts of network debugging.

  • (Optional) Petition browser vendors to turn on HTTP pipelining by default on new browsers. Doing so will remove some of the need for these tricks and make much of the web feel much faster for the average user. (Firefox has this disabled supposedly because some proxies, some load balancers, and some versions of IIS choke on pipelined requests. But Opera has found sufficient workarounds to enable pipelining by default. Why can’t other browsers do similarly?)
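
The hostname-spreading tip in the list above is easy to get wrong, so here is a minimal sketch of it in Python. It assumes the four static hostnames from Aaron's example; the function name `static_url` and the choice of the low two bits of the first MD5 byte are my own illustration, not something taken from the article.

```python
import hashlib

# The four hostnames from the example in the article.
HOSTS = [
    "static0.example.com",
    "static1.example.com",
    "static2.example.com",
    "static3.example.com",
]


def static_url(path: str) -> str:
    """Return a URL for `path` on one of the four static hosts.

    The host is derived from the path alone, so every page references
    the same hostname for the same image and browser caching keeps working.
    """
    digest = hashlib.md5(path.encode("utf-8")).digest()
    index = digest[0] & 0b11  # keep two bits: a value from 0 to 3
    return f"http://{HOSTS[index]}{path}"


if __name__ == "__main__":
    for p in ["/img/logo.gif", "/img/header.png", "/img/button.gif"]:
        print(static_url(p))
```

Because the mapping depends only on the image path, it stays stable across pages and across builds, which is the property the article warns about.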

In my opinion the last tip is not a really good one: browser developers will most likely just ignore you, and you won’t be able to change the world. For the full analysis, with graphics and the complete discussion, please see Aaron’s entry. With all its technical details, it’s definitely one of the best reads of the last few months :)
[via Ajaxian]
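
As a rough illustration of the conditional GET tip from Aaron's list, here is a small Python sketch. It is not tied to any web framework: `handle_get`, `page_version` and `render` are hypothetical names, and I assume the application can look up a cheap version number for a page without running the expensive queries. Real `If-None-Match` headers can also carry several ETags or weak validators, which this sketch ignores.

```python
def handle_get(request_headers, page_version, render):
    """Answer a GET with either a full page or a 304 (Not Modified).

    request_headers: dict of request header names to values.
    page_version:    cheap-to-fetch version of the page (an assumption).
    render:          callable that builds the page body (the expensive part).
    Returns a (status, response_headers, body) tuple.
    """
    etag = f'"v{page_version}"'
    if request_headers.get("If-None-Match") == etag:
        # The browser already has this version: skip rendering entirely.
        return 304, {"ETag": etag}, b""
    # First visit or stale copy: render and send the ETag along.
    return 200, {"ETag": etag}, render()


if __name__ == "__main__":
    status, headers, body = handle_get({}, 7, lambda: b"<html>hello</html>")
    print(status, headers)
    status, headers, body = handle_get(
        {"If-None-Match": headers["ETag"]}, 7, lambda: b"<html>hello</html>"
    )
    print(status, headers)
```

The point of deriving the ETag from a version instead of from the finished page is that the comparison happens before any rendering, so an unchanged page costs almost nothing on the server.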

XML11: An Abstract Windowing Protocol

Christian Decker wrote this in the wee hours:
Just today I stumbled across a nice presentation about XML11, an Ajax Toolkit that bridges Java AWT applications right to the Web using modern Ajax-technologies:
The goal of XML11 is to help programmers write AJAX-applications without requiring any JavaScript knowledge. AJAX (Asynchronous JavaScript and XML) has become very popular for building web applications. AJAX basically proposes to move part of the application to the browser without requiring a JRE-plugin. In order to do so the application needs to be written in JavaScript since JavaScript is the lowest common denominator across different web browsers in terms of prerequisites. Writing portable JavaScript is a daunting and tedious task. XML11 allows you to write your application in Java (not JavaScript!). XML11 then translates your Java application to JavaScript so that it can run inside any browser. Just like a C++-compiler shields the programmer from the assembly language, XML11 shields the web-developer from the intrinsic complexities of writing cross-browser portable JavaScript code. As a consequence, a web-developer never has to write or even look at one line of JavaScript code. (source)
Personally I think simulating a VM with XMLVM is a bit of an overkill and might lead to a lot of overhead, but it’s actually not that bad: thanks to Moore’s law we probably shouldn’t bother too much about it ^^
Just take a look at the presentation and you will fall in love with it :)

World War 2: The Game

Christian Decker wrote this late at night:

For some weeks now, some friends of mine and I have been working on a next-generation browser game. Why next generation? Well, it’s completely Web 2.0.
Basically we have 4 Components:

  • The AJAX Frontend, which will be the part the user spends his time on. Its task is to display all the data to the user in a nice way. Currently we’re thinking about using the community edition of BackBase, which looks really nice and is easy to use. And now the thing the AJAX community has been waiting for: we’ll use the Google Maps API as the base framework for our maps. That means you move your troops on the real map. We’ll discuss the map later on.
  • The Web-Server, which doesn’t do very much. Being relieved of the task of handling layout and such, it serves the static pages that are used to bootstrap the game itself, and then it generates the XML files containing the game data. We’re currently using Tomcat since it allows us to use the Java objects that are then used in the WorkHorse/Simulator directly (more or less, as you’ll see later).
  • The persister, which has the task of managing all the data we generate and making sure we don’t lose anything. The persister is based on Prevayler, which allows us to keep a "cloud of objects" without the need to save to and reload from a flat structure like an SQL database every time we do something. All the references are kept in place and no extra loading is done for queries, while for commands we have a slight overhead to create and serialize Transactions.
  • And last but not least the WorkHorse/Simulator! This is the part where the magic happens. The WorkHorse is a program that directly accesses the data in the persister (in fact, until we find a better way for them to communicate, persister and WorkHorse are part of the same process…) and works through a priority queue containing Events. The Events have a scheduled execution time and are processed one after another. Events may be simple tasks like changing the password of a user, or really complex stuff like the simulation of whole battles, in which case a separate thread is launched to do the actual execution of the Event.
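
To make the WorkHorse idea a bit more concrete, here is a minimal sketch of such an event queue. Our real code is Java; this is Python only to keep it short, and the names `EventQueue`, `schedule`, `run_due` and the `heavy` flag are made up for the illustration, not taken from the game.

```python
import heapq
import itertools
import threading


class EventQueue:
    """Priority queue of events ordered by scheduled execution time."""

    def __init__(self):
        self._heap = []
        # Counter breaks ties so two events at the same time never
        # force Python to compare the callables themselves.
        self._counter = itertools.count()

    def schedule(self, run_at, action, heavy=False):
        """Queue `action` (a callable) to run at time `run_at`."""
        heapq.heappush(self._heap, (run_at, next(self._counter), action, heavy))

    def run_due(self, now):
        """Process every event scheduled at or before `now`, earliest first.

        Light events run inline; heavy ones (e.g. a battle simulation)
        get their own thread. Returns the started threads.
        """
        threads = []
        while self._heap and self._heap[0][0] <= now:
            _, _, action, heavy = heapq.heappop(self._heap)
            if heavy:
                worker = threading.Thread(target=action)
                worker.start()
                threads.append(worker)
            else:
                action()
        return threads


if __name__ == "__main__":
    queue = EventQueue()
    queue.schedule(10, lambda: print("change password"))
    queue.schedule(15, lambda: print("simulate battle"), heavy=True)
    queue.schedule(30, lambda: print("not due yet"))
    for worker in queue.run_due(now=25):
        worker.join()
```

Events that are not yet due simply stay in the heap until a later call to `run_due`.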

We played around with different technologies for the communication, which is absolutely non-trivial. The Persister has to be accessible by both the WorkHorse and the Web-Server, so we had to split the whole thing into 3 parts which communicate with each other:

  • The Web-Server mainly executes queries against the persister, asking for new information which is then passed to the AJAX Frontend. All user interactions, such as sending a troop from one place to another, are turned into Transactions and passed back to the persister; for this a JMS server would be good enough.
  • The WorkHorse has to access the data in the persister efficiently, which means we have to be able to access the object structure directly without serializing stuff to be sent over a connection. We thought about using RMI but soon dropped the idea since it restricts the way we can interact with objects. For now we have put the persister and the WorkHorse together, but we are brainstorming to find a better solution.
Well, I think this is about enough chatter for today, let’s get back to work.