Next: References
Up: Directory Structures for Scalable
Previous: 4 Alternatives for Managing
Caching Web proxies are an effective and low-cost tool for
reducing bandwidth demands and document fetch latencies in
the Internet. In a 25-day trace from Digital Equipment Corporation
Web proxies, almost 70% of recorded references are to static objects
shared with other users. The percentage of references to shared
documents grows with larger numbers of users. This motivates the
design of large-scale shared caches
to exploit sharing of Internet documents among members
of large communities.
Large-scale caches are most effectively built as distributed systems.
Besides being more scalable, individual caching servers participating
in a collective cache can be placed close to their users, improving
access times and reducing the network traffic for document accesses
that hit in the cache.
This paper addresses a key issue for distributed Internet caches:
how should individual caching servers share directory information to
capture the largest percentage of shared document references as hits
at the lowest cost? We presented trace analyses and results from
trace-driven execution of collective cache prototypes to evaluate a
range of directory alternatives on the basis of cost, hit ratios, hit
penalties, miss penalties, and network traffic.
While our analysis is not yet complete, we present sufficient evidence
to conclude the following:
- Some form of querying mechanism (e.g., multicast or a shared
directory) is needed to
yield hits on the last 20% or so of references in the 25-day trace
for distributed caches with more than 16 caching servers.
- Multicast cache probes as in Harvest can yield
hits on all of these references in shallow configurations.
For deeper configurations, Harvest hierarchies yield lower hit
ratios unless parents are large enough to hold copies of the
majority of documents cached by their children.
- By acting on the fastest positive response to a probe,
shallow Harvest configurations achieve better hit latency than
CRISP. On the other hand, Harvest's miss latency suffers
and limits the scalability of shallow Harvest configurations.
- By maintaining a separate global map, CRISP caches yield
the same hit ratio as shallow Harvest configurations, without their
scalability limitation.
CRISP also generates less probe traffic and, compared with deep
Harvest configurations, less object-fetch traffic.
- Partitioned maps can improve CRISP scalability further, but
result in higher query latencies. Replicated maps are scalable and
produce the lowest query latencies, but at higher cost.
- By restricting map replicas to hold only the entries for shared
documents, "lazy CRISP" caches can deliver the benefits of replicated maps
while reducing the overheads to store and update the map replicas by
about 70%, with only a 6-point drop in the achievable hit ratio for the
25-day trace. Almost all of these hits can be recovered by a simple
optimization where proxies probe the global directory on a miss in
their local submap replicas, resulting in only a 0.31-point drop in hit ratio.
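
The lookup order described in the last item can be illustrated with a minimal sketch. It is not the CRISP implementation: the class names, the rule that an entry counts as shared once a second proxy probes for it, and the eager push of shared entries to every replica are all assumptions made for illustration, and cache eviction is ignored.

```python
class GlobalMap:
    """Hypothetical central directory mapping a URL to one proxy that caches it."""

    def __init__(self):
        self.holder = {}    # URL -> name of a proxy caching the document
        self.proxies = []   # proxies whose submap replicas receive shared entries

    def register(self, url, proxy_name):
        # Record the first proxy to cache the document.
        self.holder.setdefault(url, proxy_name)

    def probe(self, url):
        owner = self.holder.get(url)
        if owner is not None:
            # Assumption: a probe that finds an entry marks the document as
            # shared, so only now is the entry pushed to the replicas.
            for proxy in self.proxies:
                proxy.replica[url] = owner
        return owner


class Proxy:
    """Hypothetical caching server holding a lazy replica of shared entries only."""

    def __init__(self, name, global_map):
        self.name = name
        self.cache = set()    # URLs cached locally
        self.replica = {}     # lazy submap replica: URL -> holder name
        self.global_map = global_map
        global_map.proxies.append(self)

    def request(self, url):
        if url in self.cache:
            return "local hit"
        if url in self.replica:
            self.cache.add(url)   # fetched from the peer named in the replica
            return "replica hit"
        # Miss in the local replica: probe the global directory before
        # going to the origin server.
        if self.global_map.probe(url) is not None:
            self.cache.add(url)   # fetched from the peer the directory names
            return "global hit"
        self.cache.add(url)       # fetched from the origin server
        self.global_map.register(url, self.name)
        return "miss"


gmap = GlobalMap()
a, b, c = Proxy("a", gmap), Proxy("b", gmap), Proxy("c", gmap)
results = [a.request("u"), a.request("u"), b.request("u"), c.request("u")]
print(results)
```

In this sketch the replicas stay empty while a document has been requested by only one proxy, which is where the storage and update savings come from; the fallback probe is what recovers the hit on the second proxy's first request.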

We extend our sincere appreciation to Jeff Mogul, Tom Kroeger, Carlos
Maltzahn, and Digital Equipment Corporation for providing sanitized
access traces from DEC proxies. We also thank Pei Cao, Fred Douglis,
Mike Feeley, and David Marwood for their valuable insights and
feedback on this paper. Thanks also go to Peter Danzig for providing
details on commercial Harvest and NetCache.
Syam Gadde
11/14/1997