In the previous installments of our Web performance series, we’ve examined how developers can employ Web server caching and application caching to speed up their Web applications. In this installment, we’ll see how caching entire data sets can increase platform efficiency.
What is Data Caching?
Database caching is a species of application caching that caches the result of one or more database queries in the application server’s memory. For our purposes, we use application caching to refer to caching any component in an application server’s memory, and data caching to refer to caching a collection of data for future querying and sorting.
There are two main approaches to data caching:
- Data set caching. Data returned from a database is stored in a data set, an in-memory representation of data that mimics the column/row structure of a relational database.
- In-memory databases. An in-memory database is a relational database that operates completely in a server’s RAM, as opposed to storing data to a hard drive. In-memory databases can be used for fast access to data previously retrieved from a traditional, disk-based RDBMS.
Let’s dig deeper into each of these.
Data Set Caching
One of the best architectures for data set caching is found in Microsoft’s .NET platform. The DataSet class in .NET implements a disconnected data set. Once a DataSet is filled with the data returned from a SQL query, the connection to the underlying database is broken. This makes DataSets ideal for caching in the Application object.
David Burgett discusses using cached DataSets in detail in an article in MSDN Magazine. In Burgett’s example, a developer of a credit card authorization Web service for businesses retrieves the list of authorized customers into a DataSet and caches the DataSet in memory for future reference.
What happens when a customer changes part of their customer record? In this case, the developer can maximize the benefits of caching by changing the data directly in the cached DataSet, and then push the change down to the underlying database. Instead of invalidating the entire DataSet and re-querying the database for all customer records (a costly operation for performance), the developer need only commit changes to a single customer record. In his article, Burgett further demonstrates how the application can use Web Service calls to keep all of the DataSet caches in a server farm synchronized.
While .NET contains the best support for data set caching, it can be utilized (or at least emulated) in other platforms. PHP programmers can retrieve the results of their database queries as associative arrays, which can be cached in shared memory using the Alternative PHP Cache (PHP). Java contains more direct support for data caching using Service Data Objects (SDO).
In-Memory Databases and Data Caches
Some database vendors support an even more sophisticated version of caching via in-memory databases. The Oracle In-Memory Database Cache supports caching critical subsets of Oracle relational data tables in the application tier. This cache relies upon Oracle’s TimesTen In-Memory Database, which maintains its databases as “embedded databases” entirely within system memory. TimesTen uses checkpointing to persist in-memory data to disk for recoverability, and transactional replication to keep data consistent across multiple servers. Server replication and checkpointing both insure against data loss in the case of server failure.
Other in-memory database systems include SQLite and IBM’s solidDB. MySQL Cluster may also be configured as an in-memory database.
Some developers argue that embedded databases are the wave of the future. As writer Greg Linden points out, most highly performant sites like Facebook already cache so much data that up to 75% of their total data retrieval is from RAM anyway!
Temporary Tables and Database Connections
In the past, it was common for client/server developers to utilize temporary tables stored in a database server’s memory. The temporary table could hold the output of queries that were the result of complex JOIN operations. However, most temporary tables are associated with a specific database connection. Since nearly all Web-based applications use database connection pooling, using temporary tables isn’t feasible.
Some database systems support memory-based database tables. For instance, MySQL can create temporary tables with the MEMORY command. However, these tables are specific to a single instance of the database server and don’t support row-level locking.
Conclusion
Web performance optimization is an engineering discipline which can have a tremendous impact on revenue and customer satisfaction. Caching is probably the biggest weapon in the arsenal of performance engineers, and it is usually relatively easy to utilize. Moving data to and from a database is typically one of the bottlenecks of a web application’s performance because it involves physical devices actuating mechanical arms that are exponentially slower than electricity flowing through memory chips. Thus, putting data into cache memory has enormous performance gains.
In general, developers will achieve greater speed boosts by storing data sets in the application tier, and using the techniques described above to keep these cached data sets synchronized across application and database servers.