Question 1: Sharing data

There seemed to be some confusion about the meaning of "atomic". None of these systems are atomic, since they have no read/write grouping mechanisms (e.g., transactions).

NFS: To answer the NFS part of the question well you needed some additional notes or qualifiers. The answer depends on the interpretation of "writer" in the question. The answers I was looking for consider the whole system, i.e., the "writer" is the application process, and client caching affects the consistency model. But I did not take off too many points if you answered for the server interface, as long as you were clear about that and about which version of NFS you were considering.

- Confirmable, but file read/write operations are confirmable only by explicit request (e.g., by fsync or close).
- In general, NFS is not safe and therefore not regular, even for confirmed writes. With leased locks, NFS is safe and regular, but only for confirmed writes and only at a block grain.

Porcupine: not confirmable, safe, regular, not atomic. The issue here is that Porcupine propagates writes to replicas with an eventual consistency protocol. Until a write is fully propagated, a read can return stale data. All writes eventually complete (i.e., are fully propagated); the problem is that the application cannot tell when this has happened, hence not confirmable.

FAB: confirmable, safe, regular, not atomic.

SSM: confirmable, not safe, not regular, not atomic. Qualifier: SSM is safe and regular with respect to writes by the reader. This is sometimes called the "read your writes" property. In general, SSM assumes that safety and regularity are not important for shared data, because each data object is used by only a single client, i.e., data is not shared.

Question 2: Leases

- Longer leases -> lower locking and maintenance overhead, but there will be access delays if an object is concurrently shared. Also, an object lock will become unavailable for a longer period after a failure of either the server or a client.
- Coarser scope -> lower locking and maintenance overhead, but inhibits concurrency of access to unrelated objects.
- Partition -> the lock holder and server preserve mutual exclusion, but it may not be possible to complete all writes by the lock holder. For concurrently shared objects it will be necessary to discard writes that were accepted but were incomplete at the time the partition occurred. If the object is not shared, it may be possible to complete those writes when the partition heals.

Question 3: Stuff happens

The problems with NFS are covered by the previous question: writes are confirmable only by explicit request, NFS is not safe or regular, unconfirmed writes may be lost, and there is no atomicity. Problems: no good for concurrent sharing and complex data, most implementations have a single point of failure at the server, and NFS may discard writes that were accepted but not completed/confirmed.

I was looking for an answer that said NFS applications are not affected by a server restart, but that application processes essentially must be restarted after a client restart. I.e., you cannot use NFS to preserve continuity of processes that were writing at the time of a client failure, because some arbitrary subset of writes may have been lost. People live with this for user applications (e.g., editors). The user can determine the state of their own files, and most operating systems make no promises about file writes accepted before a crash, even to local storage.
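To make the "confirmable only by explicit request" point concrete, here is a minimal sketch in Python of how an application over NFS confirms its own writes (the path and data are made up, and exact flush behavior depends on the NFS version, client, and mount options):

    import os

    def confirmed_write(path, data):
        # write() may succeed against the client cache alone; only an
        # explicit request -- fsync() here, or close() on clients with
        # close-to-open semantics -- forces the data to the server and
        # reports any error, i.e., confirms the write.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
        try:
            os.write(fd, data)
            os.fsync(fd)  # explicit confirmation request
        finally:
            os.close(fd)  # close also flushes on typical NFS clients

    confirmed_write("/mnt/nfs/example.dat", b"state we must not lose")

If fsync() raises an error, the application knows the write was not confirmed; without the explicit request, a client crash can silently discard it.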
For applications that need sharing and/or atomic/durable writes, we need something more powerful, like a database. (Which of course may be built over NFS.)

Question 4: Sharing resources

Goals:

- Assured application throughput levels, which may be bounded to avoid overload.
- Application-level performance targets at the assured throughput or below. These are typically response time targets, and they should be stated for a percentile of response time rather than a mean or median, due to the long tail on response times (see the sketch at the end of this question).
- Efficiency of resource usage.

Relative merits and limitations: The key idea behind both systems is dynamic provisioning. Dynamic provisioning can improve resource efficiency, and it can also isolate one service from the impact of unexpected load spikes in another. One key difference is that SoftUDC defines no policy for dynamic provisioning, although it has powerful and comprehensive mechanisms; Quorum defines one possible policy with a limited set of mechanisms.

SoftUDC and Hippodrome provide similar mechanisms: assignment of application components or storage LUNs to virtual containers bound to specific resources. Hippodrome defines a policy/algorithm to find a good assignment for a set of static workloads, but it does not address policy for the dynamic provisioning case. A complete system could combine the Quorum and SoftUDC approaches: use dynamic traffic shaping and routing of requests among the virtual servers, together with dynamic provisioning for the servers. [One example of how to do this is our Muse system from SOSP 2001.]

We can compare SoftUDC and Quorum with respect to the merits/limitations of the mechanisms they offer.

- Quorum as presented assumes homogeneous infrastructure for all apps: any request can execute at any server. SoftUDC/Hippodrome localizes each service and allows each service to control the resources it owns.
- Quorum assumes fine-grained requests with reasonably homogeneous per-request costs; the "request" is the unit of cost/work for each service. SoftUDC/Hippodrome is more general with respect to application properties: it is possible to mix many kinds of applications on the same infrastructure, since the isolation mechanism does not depend on the request properties.
- Quorum treats infrastructure as a 'black box'. It assumes that resource consumption is a function of concurrency, and that performance is a function of resource consumption. These assumptions might not be true.
- SoftUDC/Hippodrome faces an algorithmic challenge for placement/provisioning of each service, requires some means to route requests to the subset of resources assigned to each service (location tracking/routing), and depends on resource control mechanisms that some may view as "intrusive" (virtualization).

Quorum for storage? Storage and data-intensive applications lead to locality in data placement. It is expensive and unscalable to replicate every piece of data at every location, and replication makes consistency expensive to manage. That means the request routing scheme must take that locality into account, i.e., it must route each request to a location that has the data needed to serve it. That may lead to "hot spots" at particular locations. Quorum ignores hot spots.
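On the percentile point in the goals above: with a long-tailed response time distribution, a mean or median target can look healthy while a meaningful fraction of requests are far over target. A small illustration in Python (the latency numbers are invented):

    import statistics

    # Hypothetical service: 95% of requests are fast, 5% hit the long tail.
    latencies_ms = [10] * 95 + [500] * 5

    mean = statistics.mean(latencies_ms)       # 34.5 -- looks acceptable
    median = statistics.median(latencies_ms)   # 10   -- looks great
    p95 = sorted(latencies_ms)[int(0.95 * len(latencies_ms))]  # 500

    print(mean, median, p95)
    # A target like "95th percentile <= 50 ms" exposes the tail that the
    # mean and median both hide.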
Question 5: log locally, commit globally

This was really an opportunity to show that you understand two-phase commit. The question introduces new failure modes: any node may stall while reading or writing its log.

It is always possible to map communication delays into a node fail-stop model at the transport layer in the usual fashion: retransmit messages and declare the unreachable destination failed after a timeout. If a stall happens during the prepare phase, it is always possible to abort the transaction after the timeout. If a stall happens after the transaction is prepared, then we may be in trouble.

Suppose a participant is notified of a decision to commit or abort, but it cannot access its log. It might be able to record the decision in memory and continue operating, but then if it fails it won't be sure which transactions committed and which aborted. That is also a problem if it cannot access its log during recovery. It is essential to preserve the ACID properties, so a recovering participant cannot allow access to data until it is sure which transactions have committed and which have aborted. That could lead to unbounded waits to access data on the recovering node.

If the coordinator fails and cannot access its log on recovery, then it may be unable to restart or complete any pending transactions. That could lead to unbounded waits to access data on all of the nodes. In these cases, the unbounded waits occur because a node cannot release any locks held by a transaction until it knows whether the transaction has committed or aborted.

Solutions? If a participant (or the coordinator) cannot access its log in phase 1, it might be useful to report the problem to the others so they can abort faster and avoid delays in accessing data. But this is a tradeoff between availability and abort rate. For problems occurring in phase 2, or during recovery, it may be useful to add more support in the protocol for nodes to notify one another of transaction status, and/or to take over from the coordinator if the coordinator is stalled.
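The phase-1 suggestion above can be sketched directly. A minimal two-phase commit outline in Python (my own names and structure, not from any of the systems discussed; timeouts, real logging, and phase-2 notification are elided): a participant that cannot force its prepare record to its log votes no immediately, so the coordinator can abort at once rather than waiting for a timeout.

    class Participant:
        def __init__(self, name, log_available=True):
            self.name = name
            self.log_available = log_available

        def prepare(self, txid):
            # Phase 1: a participant must force a prepare record to its
            # log before it may vote yes. If the log has stalled, report
            # the problem by voting no -- trading a higher abort rate for
            # shorter delays in accessing data.
            if not self.log_available:
                return "no"
            # ... force <prepare, txid> to stable storage here ...
            return "yes"

    def coordinator_decide(txid, participants):
        # Collect votes; any "no" (or a timeout, not modeled here) aborts.
        votes = [p.prepare(txid) for p in participants]
        decision = "commit" if all(v == "yes" for v in votes) else "abort"
        # The coordinator would force its decision record to its own log
        # here, then run phase 2: notify every participant of the outcome.
        return decision

    nodes = [Participant("a"), Participant("b", log_available=False)]
    print(coordinator_decide(1, nodes))  # -> abort: b could not log its vote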