Caching proxies are an essential tool for controlling Web user demands on Internet bandwidth and HTTP server capacity. Simple proxy caches are already highly effective for accesses to cacheable, static documents, which make up 89% of references in recent traces (as we show in Section 2). The effectiveness of proxy caches will improve further with new techniques for maintaining coherency [11,14] and caching dynamic documents [12,7].
This paper explores distributed Web cache architectures, in which a group of proxies (caching servers) together form a ``collective'' Internet object cache that can serve more users over a wider area than any single proxy. The purpose of a collective Web cache architecture is to deliver the benefits of a shared cache to larger communities of users, while distributing the load among the individual caching servers. Similar caching schemes have been shown to be useful for file systems and virtual memory [1,8].
We focus on collective architectures in which each proxy makes independent local decisions that determine how its cache space will be used to serve its primary users. In particular, this paper does not consider the impact of coordinated or ``cooperative'' replacement policies, e.g., global LRU replacement or forwarding of evicted objects to neighboring caches. Cache architectures that manage resources locally are in the spirit of other successful distributed Internet services (e.g., DNS), and are an appropriate starting point for Web caches that cross departmental or organizational boundaries. To contrast this class of caches with more closely coupled architectures with coordinated replacement, we refer to the former as ``collective caches''.
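To make the distinction concrete, the following sketch shows a proxy that manages its cache space purely locally with LRU replacement; evicted objects are simply dropped, never forwarded to a neighbor. The class and its interface are illustrative, not drawn from any of the systems discussed.

```python
from collections import OrderedDict

class LocalProxyCache:
    """Hypothetical sketch of a proxy cache managed by purely local
    decisions: LRU replacement over a fixed byte budget."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.objects = OrderedDict()  # url -> (body, size), in LRU order

    def get(self, url):
        entry = self.objects.get(url)
        if entry is None:
            return None  # local miss; caller may query neighbors or the origin
        self.objects.move_to_end(url)  # mark as most recently used
        return entry[0]

    def put(self, url, body):
        size = len(body)
        # Evict least-recently-used objects to make room; evicted objects
        # are discarded rather than pushed to neighboring caches.
        while self.used + size > self.capacity and self.objects:
            _, (_, evicted_size) = self.objects.popitem(last=False)
            self.used -= evicted_size
        if self.used + size <= self.capacity:
            self.objects[url] = (body, size)
            self.used += size
```

A cooperative scheme would differ exactly in the eviction loop: instead of discarding victims, it would offer them to peers or consult global usage information before choosing one.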
Several instances of collective cache architectures have been proposed and some are in use, including the Harvest cache, its successor Squid, and the Duke/AT&T CRISP cache. A fundamental question for such architectures is how individual proxies locate objects held by other proxies, in order to service local misses from neighboring caches rather than from home HTTP servers on the Internet. Harvest and Squid send multicast queries to neighboring caches for this purpose. CRISP instead relies on a central map that directly tracks the locations of all cached objects.
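A minimal sketch of the directory-based approach, under assumed names and interfaces (the real CRISP protocol differs in its details): a central mapping service answers each miss with a single lookup, so the proxy issues one unicast query to the holder rather than multicasting to every neighbor.

```python
class MappingService:
    """Hypothetical central directory mapping each cached URL to the
    proxy currently believed to hold it."""

    def __init__(self):
        self.locations = {}  # url -> proxy id

    def lookup(self, url):
        return self.locations.get(url)

    def register(self, url, proxy_id):
        self.locations[url] = proxy_id

def handle_miss(mapping, url, local_proxy, fetch_from_peer, fetch_from_origin):
    """On a local miss, consult the central map with one query; fetch from
    the indicated peer if possible, else from the home HTTP server."""
    holder = mapping.lookup(url)
    body = None
    if holder is not None and holder != local_proxy:
        body = fetch_from_peer(holder, url)  # one unicast request, no multicast
    if body is None:
        body = fetch_from_origin(url)
    mapping.register(url, local_proxy)  # advertise the newly cached copy
    return body
```

Under the multicast alternative, `handle_miss` would instead query all neighbors in parallel and wait for the first hit (or a timeout), trading extra messages and latency for the absence of a central map.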
In this paper, we study the tradeoffs between multicast- and directory-based collective caches and explore the scalability, effectiveness, and cost of a range of schemes for maintaining the directory, using the proxy trace released by researchers at Digital Equipment Corporation. This trace consists of more than 24 million document accesses by 17,354 users over the 25 days from 8/29/96 through 9/22/96.
This paper makes three contributions: