While we’ve touched upon client side caching in our series on Web performance, we haven’t discussed how client caching has grown more rich and useful over the years. In the initial days of the Web and the HTTP/1.0 protocol, caching was mostly limited to a handful of headers, including Expires, If-Modified-Since, and Pragma: no-cache. Since then, client caching has evolved to embrace greater granularity. Some new technologies even permit the deployment of offline-aware, browser-based applications.
Browser Request Caching
The most common and oldest type of client-side caching on the client is browser request caching. Built into the HTTP protocol standard, browser request caching allows the server to control how often the browser requests new copies of files from the server. We discussed the major aspects of browser request caching in part 1 of our series. Over time, Webmasters have taken to using different headers to improve caching on their site, including:
Pragma: no-cache. This old directive is used mostly by HTTP/1.0 servers, and instructs a client that a specific response’s contents should never be cached. It is used for highly dynamic content that is apt to change from request to request.
Expires. Supported since HTTP/1.0, this header specifies an explicit expiration date for cached content. It can be superseded by the value of the Cache-Control header. For example, if Cache-Control: no-cache is sent in a response, this will take precedence over any value of the Expires header.
If-Modified-Since: Since the HTTP/1.0 protocol, clients have been able to use this header to request that the server only send data if the resource has been changed since the specified date. If there have been no changed, the server returns an HTTP 304 Not Modified response.
Last-Modified. This HTTP/1.0 and 1.1 header designates when the resource was most recently changed. Browsers usually supply this value as the value of the If-Modified-Since header.
Cache-Control. This core directive, introduced in the HTTP/1.1 standard, specifies whether a response’s contents can be cached, and if so, for how long. The header “Cache-Control: no-cache” obsoletes the “Pragma: no-cache” header of the HTTP/1.0 protocol.
ETag. The ETag (“entity tag”) header is a hash value that is specific to a given version of a resource. It can be used by the client in conjunction with the If-Match, If-None-Match, and If-Range headers to decide whether it should generate a new request for the latest version of a resource. The format of entity tags themselves is defined in section 3.11 of RFC2616.
Note that this header and the Last-Modified header are exclusive; servers should set one or the other. The ETag header is new with the HTTP/1.1 protocol standard.
For modern applications, the good folks at Google recommend setting one of either Cache-Control or Expires, and one of either Last-Modified or ETag.
Javascript/AJAX Caching
With the advent of JavaScript and AJAX, more Web applications are downloading data dynamically. JavaScript developers can use the XmlHttpRequest object to fetch data in XML (or other) format, and display it in real time without forcing a refresh of the entire page. This presents opportunities for finer-grained caching based on the nature of the data displayed within the page.
AJAX applications can still use all of the browser request caching mechanisms discussed above. The resource requested by the XmlHttpRequest object will be stored in the browser’s file cache just as other HTTP objects are. A given AJAX application can go further and make refresh calls to the XmlHttpRequest object using programmatic rules. In his article “An AJAX Caching Strategy”, Bruce Perry shows how he uses a custom CacheDecider object that he wrote in JavaScript to determine when to update an AJAX display of oil, gasoline, and propane prices.
HTML5 Caching
Developers creating HTML5 applications can create fully offline-aware applications using the HTML5 ApplicationCache interface. The Application Cache uses a cache manifest file to specify which files in an HTML5 application can be used offline, and which files require a network connection. The manifest may also specify a list of fallback files for network resources when the user is offline. For example, instead of fetching the file /get-data.php when disconnected, the manifest can instruct the browser to display the file /offline.html instead. This manifest is referenced in the HTML element of an HTML5 app:
...
Conclusion
Web performance optimization is very important, and today’s Web application development team can boost site performance and improve its site’s load testing scores by selecting from a variety of client side caching techniques. An effective client side caching strategy can reduce load times by several factors. The most recent innovations in client side caching, such as the HTML5 Application Cache, enable an application to run (though perhaps in a more limited form) even without a network connection present.