The Internet community is struggling to keep pace with continuing traffic growth on the Internet backbone and at popular Web sites.
One response is to anticipate demand increases and ratchet up service capacity to meet them. Today, server accelerator caches and clustered Web servers are common, but these solutions do nothing to slow the growth of backbone traffic. Site replication (e.g., mirroring), on the other hand, is still mostly manual and not easily hidden from users, although progress has been made toward automating aspects of it (most recently with geographical push-caching [7], document dissemination [5] and smart clients [12]). While these ``supply-side'' approaches contribute to scalability, we believe that the traditional ``demand-side'' approach -- demand-driven caching initiated by clients at their own expense and for their own benefit -- is still the most powerful and economically correct tool at our disposal.
The key to a successful demand-side approach is to encourage comprehensive use of caching within the organizations that provide Internet service to user communities. We refer to these endpoint organizations as ISPs, although they include academic and other institutions as well as commercial Internet Service Providers. ISPs have the strongest economic incentive to deploy caches: they want to provide faster and better service to their users with less bandwidth. In doing so, the ISP incidentally acts as a good citizen by reducing the load it places on Web servers and the Internet backbone.
Some ISPs already serve their Web clients through shared proxy servers [9]. We believe that the most effective caches will be those serving the largest user communities. Our studies of typical Web usage at AT&T indicate that large shared proxy caches can dramatically increase the global hit rate seen by a community of 3806 users, even for objects referenced over a 24-hour period (see Section 3.1).
CRISP (Caching and Replication for Internet Service Performance) is a new Internet caching service designed to serve the needs of ISPs with thousands or tens of thousands of users or more. The basic problem solved by CRISP is that large central proxy caches suffer from a variety of performance, scalability, and organizational problems. CRISP allows ISPs to construct very large distributed caches as collections of proxy servers, without sacrificing the benefits of a shared cache. A CRISP cache can be larger and support more users than a central proxy cache, and it can be expanded incrementally by adding new servers. CRISP proxies can be deployed at different points in the organization's network, close to the clients they serve.
CRISP servers cooperate to share their caches, using a central mapping service with a complete directory of the cache contents of all participating proxies. This obvious strategy is easily overlooked due to concerns that the central mapping service might become a bottleneck or a single point of failure. We argue below that these well-known concerns are not a problem for well-configured CRISP caches, given the properties of Web access. Moreover, the CRISP structure can simplify approaches to other issues raised by widespread deployment of large Internet caches, including consistency management, automated load balancing, alternative replacement strategies [11], and network-aware cache structure. For Internet caches, the simplest strategy for cooperative caching is the most effective as well as the easiest to implement and extend.
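To make the central-directory idea concrete, the following is a minimal in-memory sketch, not the CRISP implementation. All names (MappingService, Proxy, fetch_origin, demo) are hypothetical, and the sketch makes assumptions the text above does not state: proxies call each other directly rather than over a network, a proxy keeps its own copy of an object it obtains from a peer, and every proxy notifies the mapping service whenever it adds or evicts an object.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple


class MappingService:
    """Hypothetical central directory: URL -> ids of proxies caching it."""

    def __init__(self) -> None:
        self._holders: Dict[str, Set[str]] = {}

    def register(self, url: str, proxy_id: str) -> None:
        self._holders.setdefault(url, set()).add(proxy_id)

    def unregister(self, url: str, proxy_id: str) -> None:
        holders = self._holders.get(url)
        if holders is not None:
            holders.discard(proxy_id)
            if not holders:
                del self._holders[url]

    def lookup(self, url: str, asker: str) -> Optional[str]:
        # Return some proxy other than the asker that holds the URL, if any.
        candidates = self._holders.get(url, set()) - {asker}
        return min(candidates) if candidates else None  # deterministic pick


class Proxy:
    """Hypothetical proxy: local cache first, then a peer, then the origin."""

    def __init__(self, proxy_id: str, mapper: MappingService,
                 peers: Dict[str, "Proxy"],
                 fetch_origin: Callable[[str], bytes]) -> None:
        self.proxy_id = proxy_id
        self.mapper = mapper
        self.peers = peers                # shared registry of all proxies
        self.fetch_origin = fetch_origin  # stand-in for a real HTTP fetch
        self.cache: Dict[str, bytes] = {}
        peers[proxy_id] = self

    def get(self, url: str) -> Tuple[bytes, str]:
        if url in self.cache:
            return self.cache[url], "local"
        holder = self.mapper.lookup(url, self.proxy_id)
        peer = self.peers.get(holder) if holder is not None else None
        if peer is not None and url in peer.cache:
            body, source = peer.cache[url], "peer"
        else:
            # Directory miss (or stale entry): go to the origin server.
            body, source = self.fetch_origin(url), "origin"
        # Assumption: the requester keeps a copy and advertises it.
        self.cache[url] = body
        self.mapper.register(url, self.proxy_id)
        return body, source

    def evict(self, url: str) -> None:
        if self.cache.pop(url, None) is not None:
            self.mapper.unregister(url, self.proxy_id)


def demo() -> List[str]:
    """Two proxies request the same URL; report where each reply came from."""
    mapper = MappingService()
    peers: Dict[str, Proxy] = {}
    fetch = lambda url: b"body of " + url.encode()
    a = Proxy("a", mapper, peers, fetch)
    b = Proxy("b", mapper, peers, fetch)
    url = "http://example.org/index.html"
    return [a.get(url)[1], b.get(url)[1], b.get(url)[1]]
```

In this sketch a local miss costs at most one directory lookup before the request goes to a peer or to the origin server, so the proxies together behave like a single shared cache. The sketch says nothing about the bottleneck and single-point-of-failure concerns, which are taken up in the sections that follow.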
Section 2 presents an overview of the CRISP cache and the case for a central mapping service for distributed Internet caches. Section 3 presents some trace studies and early experiments, and Section 4 outlines our conclusions.