Handling Failures

Another potential drawback of the CRISP structure is that the mapping server is a central resource whose failure would render all or part of the cooperative cache temporarily unreachable. In addition, the map could become out of date if a proxy server disconnects.

Because the objects cached by CRISP are read-only, failure of a mapping server does not prevent a CRISP proxy from servicing local cache misses: the proxy fetches objects directly from their home sites until the map is operational again. Though the effectiveness of the cache is temporarily reduced, map failure does not result in denial of service. Similarly, the failure of a proxy is discovered lazily by its peers or the mapping server, and the maps are adjusted, perhaps after a few false hits or misses. For any kind of failure, the maps are restored using a simple recovery protocol, during which the cache operates correctly but with reduced efficiency.
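The fallback behavior described above can be sketched as follows. This is a minimal illustration, not the CRISP implementation: the function `service_miss`, the exception `MapUnavailable`, and the three callbacks are hypothetical names introduced here, and real proxies would add timeouts and retries.

```python
class MapUnavailable(Exception):
    """Hypothetical signal that the mapping server cannot be reached."""


def service_miss(url, query_map, fetch_from_peer, fetch_from_origin):
    """Service a local cache miss, degrading gracefully on failures.

    query_map(url)         -> list of peer ids, or raises MapUnavailable
    fetch_from_peer(p, url) -> object body, or None on a false hit
    fetch_from_origin(url)  -> object body from the home site
    All three callbacks are assumptions made for this sketch.
    """
    try:
        peers = query_map(url)
    except MapUnavailable:
        # Map failure reduces efficiency but never denies service.
        peers = []

    for peer in peers:
        body = fetch_from_peer(peer, url)
        if body is not None:
            return body, "peer"
        # False hit: the peer has failed or evicted the object; try the next.

    # Objects are read-only, so the home site copy is always acceptable.
    return fetch_from_origin(url), "origin"
```

Because cached objects are read-only, the proxy never needs the map for correctness; the map only decides whether a peer can answer faster than the home site.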

If a mapping server believes that a proxy has failed, it marks all cache directory entries from that cache as unavailable. Clients that were connected to the failed proxy must fail over to a secondary proxy, e.g., by using autoconfiguration scripts [4]. When a proxy rejoins the CRISP cache, it first registers itself with the mapping service to re-enable or restore the maps for cached objects that survived the failure. Similarly, if a mapping server fails and recovers, proxies rebuild the central map by "rejoining" the CRISP cache, registering their current cache contents with the recovering mapping server.
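The directory bookkeeping described above might look like the following toy model. The class `MappingServer` and its method names are hypothetical, chosen for illustration; the sketch keeps the whole map in memory and ignores networking, concurrency, and the wire format of the recovery protocol.

```python
class MappingServer:
    """Toy central directory: URL -> set of proxies believed to cache it."""

    def __init__(self):
        self.directory = {}        # url -> set of proxy ids
        self.unavailable = set()   # proxies currently believed to have failed

    def mark_failed(self, proxy_id):
        """Mark every directory entry from this proxy as unavailable."""
        self.unavailable.add(proxy_id)

    def register(self, proxy_id, urls):
        """A joining or rejoining proxy reports its current cache contents."""
        # Drop stale entries for this proxy, then record what survived.
        for holders in self.directory.values():
            holders.discard(proxy_id)
        for url in urls:
            self.directory.setdefault(url, set()).add(proxy_id)
        self.unavailable.discard(proxy_id)

    def lookup(self, url):
        """Return the available proxies believed to hold url (may be empty)."""
        holders = self.directory.get(url, set())
        return sorted(holders - self.unavailable)


# A recovering mapping server starts empty and is rebuilt by re-registration.
server = MappingServer()
server.register("proxy-a", ["http://example.org/x", "http://example.org/y"])
server.register("proxy-b", ["http://example.org/x"])
server.mark_failed("proxy-a")
print(server.lookup("http://example.org/x"))  # only proxy-b remains visible
```

The same `register` call serves both recovery cases: a proxy rejoining after its own failure, and all proxies repopulating a mapping server that has restarted with an empty directory.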

Failures are rare, and the costs of recovery are acceptable. Assuming as above that the map is 1% of the size of the cache, the map for a one-gigabyte proxy cache (about 10 megabytes) can be transmitted in a few seconds over a T3 link. More importantly, a recovering mapping server can begin serving requests immediately, while it is still rebuilding its directories, although it may return false misses until the maps are fully reconstructed. Effectiveness during recovery could be improved by a variety of techniques, e.g., registering the hottest objects first.
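The transfer-time estimate can be checked with a short calculation. The sketch below assumes the nominal T3 line rate of 44.736 Mbit/s, a decimal gigabyte, and no protocol overhead or congestion; the function name `map_transfer_seconds` is introduced here for illustration only.

```python
T3_BITS_PER_SECOND = 44.736e6  # nominal T3 (DS3) line rate


def map_transfer_seconds(cache_bytes, map_fraction=0.01,
                         link_bits_per_second=T3_BITS_PER_SECOND):
    """Ideal time to ship one proxy's map, ignoring protocol overhead."""
    map_bits = cache_bytes * map_fraction * 8
    return map_bits / link_bits_per_second


# One-gigabyte cache, map at 1% of cache size: roughly 10 MB to transmit.
seconds = map_transfer_seconds(10**9)
print(f"{seconds:.1f} s")
```

At the nominal rate this comes to just under two seconds per gigabyte of proxy cache, so "a few seconds" leaves room for real-world overhead.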



Syam Gadde
Fri Mar 28 10:09:42 EST 1997