CPS 512 Distributed Systems Midterm study outline, Spring 22
http://www.cs.duke.edu/courses/spring22/compsci512/midterm-outline.txt

This is all shorthand. The pptx names refer to the relevant slide decks on the site, which provide more specific detail.

1-intro.pptx
- Internet applications are delivered as services.
- And/or structured as compositions of services.
- Various client devices and client software (e.g., browsers, apps) access these services.
- We focus on abstractions, design techniques, and design patterns for services.
- Service properties: scalability, availability, reliability, consistency, reusability, *ility.
- Different services have different demands with respect to these properties.
- We focus on techniques to implement these service properties.

2-services.pptx
- The client/server channel: TCP stream between IP host endpoints.
- Endpoints are named by IP addresses and port numbers.
- Clients access services with request/response patterns over stream connections.
- Example: classical Remote Procedure Call (RPC) client/server paradigm
- Today most channels are layered on HTTP over TCP: REST, gRPC
- RPC systems use various representations for program data (XML, JSON, XDR, protobuf).
- Client stubs hide network/representational details from RPC clients.
- Similarly to HTTP content standards: e.g., pictures, video, text, HTML.
- We can think about services abstractly, independent of communication standards.
- Server application is a program module, API, and state.
- A service (or job) may include many server processes (tasks, pods) working together.
- Each server process may handle many incoming client requests "in parallel".
- "Services may provide services to other services." (composition)
- What are examples of servers acting as clients for other servers?
- How do communication needs vary for external sessions vs. within the datacenter?
- Exemplary canonical service APIs to discuss in more detail:
  +DNS: simple name lookup, no user-generated writes or updates, multi-domain
  +Web/streaming content services: large traffic volumes, get-only
  +Chubby/Zookeeper coordination service: strongly consistent named files and watches.
  +KVstore: storage objects put/get values in entirety, simple primary key space

Reliable RPC:
- Exactly-once = at-least-once (ALO) + at-most-once (AMO)
- Techniques for ALO and AMO (Lab #1): timeouts, client ID, request ID, reply cache
- When is AMO needed: idempotent vs. non-idempotent requests
- Role of stubs (client and server) in handling ALO and AMO in a general way for any app
- Duplicate suppression: reply cache; client stub suppresses any duplicate replies (see the sketch below).
- Extend client stub to retarget retransmissions for a server that moves (Lab #2).
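
Sketch for the Reliable RPC bullets above: a server-side reply cache for at-most-once execution, assuming each request carries a client ID and a per-client sequence number and each client has at most one outstanding request. This is illustrative Go, not the Lab #1 handout code; the names are made up. At-least-once lives in the client stub (retransmit on timeout), which is not shown.

    package main

    import (
        "fmt"
        "sync"
    )

    // Request carries the identifiers the server needs for duplicate detection.
    type Request struct {
        ClientID string
        SeqNum   int64 // per-client sequence number; at most one outstanding request
        Op       string
    }

    // Server keeps, per client, the last executed sequence number and its reply.
    type Server struct {
        mu         sync.Mutex
        lastSeq    map[string]int64
        replyCache map[string]string
    }

    func NewServer() *Server {
        return &Server{lastSeq: make(map[string]int64), replyCache: make(map[string]string)}
    }

    // Handle executes each request at most once: a retransmission of an
    // already-executed request gets the cached reply instead of re-executing.
    func (s *Server) Handle(req Request) string {
        s.mu.Lock()
        defer s.mu.Unlock()
        if last, ok := s.lastSeq[req.ClientID]; ok {
            if req.SeqNum == last {
                return s.replyCache[req.ClientID] // duplicate: suppress re-execution
            }
            if req.SeqNum < last {
                return "" // stale retransmission; the client has already moved on
            }
        }
        reply := s.execute(req.Op) // non-idempotent work happens here, once
        s.lastSeq[req.ClientID] = req.SeqNum
        s.replyCache[req.ClientID] = reply // safe to overwrite: one outstanding request per client
        return reply
    }

    func (s *Server) execute(op string) string { return "done: " + op }

    func main() {
        s := NewServer()
        fmt.Println(s.Handle(Request{ClientID: "c1", SeqNum: 1, Op: "append x"})) // executes
        fmt.Println(s.Handle(Request{ClientID: "c1", SeqNum: 1, Op: "append x"})) // duplicate: cached reply
    }
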

3-dns.pptx
- DNS enables symbolic naming of services, convenient for humans
- Hierarchical name space: the DNS symbolic name space is a tree
- Interior nodes map to a DNS server authoritative for the prefix (like a directory)
- Leaf nodes map to a host or service endpoint(s)
- DNS server has a very simple API; DNS servers work together to provide DNS service
- DNS server query returns local Resource Records (RR) matching a name
- Address (A or AAAA) RR gives a leaf IP address (or multiple options)
- Nameserver (NS) RR gives IP of a DNS server authoritative for the prefix (interior node)
- CNAME RR redirects to another DNS name; look up that one instead
- Client typically invokes a DNS resolver ("local DNS" or "LDNS")
- Resolver is operated by ISP, or as a public service
- Resolver serves ordinary DNS API, but queries other DNS servers for the results.
- Resolver issues queries to a DNS server for each component of the name
- Resolver caches RRs with TTL (expiration time) to limit cache staleness
- Client of resolver may be a chained resolver (e.g., built into a wireless router)

DNS and trust: authority, governance, privacy
- Concept of a domain (trust domain, administrative domain, DNS "zone")
- A domain is a sphere of ownership and control by an authority
- DNS is a multi-domain service specifying how domains interact to provide the service
- Governance: control over root and TLD NS (e.g., for .com) gives power to subvert
- Privacy: resolver sees all DNS traffic from user/client; concept of trusted resolver
- Domain names are property: the role of registrars to validate and record ownership
- "Sponsored" TLDs (e.g., .gov) have specific requirements for ownership

4-dnssec.pptx
- Risks of subverted resolvers: hijacking, poisoning, censorship, or worse (Ripple).
- A client looks up a service by DNS name: how can it verify a correct IP address?
- The problem of integrity: verifying that RRs were truly spoken by proper authorities.
- Step 1: digital signatures for RR sets
- Crypto basics: hashes, keypairs, asymmetric crypto.
- Basics of digital signatures: generating/checking a signature
- Every authoritative DNSSEC NS server has a keypair (at least one: KSK)
- Step 2: endorsement. Through a DS RR, the parent NS endorses the public key of the child NS.
- How do these mechanisms work together to assure integrity for a full DNS pathname?
- Why is it sufficient for the parent to endorse the hash of the child zone KSK?
- Practical crypto challenge: the need for key rotation.
- KSK+ZSK: use of ZSKs to reduce the need for key rotation (of the KSK) for an NS.
- The challenge of rotating the root key.
- Practical DNSSEC challenge: the need for domain registrars to validate owner NS keys.
- Current deployment state of DNSSEC: critical infrastructure, falling a bit short.

5-tls-pki.pptx
- Malice/Mal/Mallory/Man-in-the-middle (MITM) attack: how it happens
- Is DNSSEC sufficient to prevent MITM? How can Mal interpose on a valid IP address?
- Suppose the server has a keypair: how does it help?
- Symmetric crypto: why/how does TLS use it?
- TLS/SSL v1.2: how does it establish a shared secret key and protect it from Mal?
- How does the client verify that the server's public key is valid for a specific DNS name? (see the sketch below)
- How does this validation employ a digital signature?
- If an attacker obtains the server's private key, what harm can it do?
- How can the attacker learn the content of prior TLSv1.2 connections?
- TLS/SSL v1.3: how does it establish a shared secret key and protect it from Mal?
- If your browser provider goes rogue, what harm can it do?
- If a CA is subverted (or leaks its private key), what harm can result?
- How does the CA safely learn the correct public key for a domain it certifies?
- How does a user know if a session with an https server is trustworthy? (hard)
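
Sketch for the 5-tls-pki questions above: a TLS client in Go, showing where the library checks that the server's certificate chains to a trusted root CA (a CA digital signature over the certificate) and covers the requested DNS name. The host name is just an example; this uses the standard crypto/tls API, nothing course-specific.

    package main

    import (
        "crypto/tls"
        "fmt"
        "log"
    )

    func main() {
        host := "example.com" // example name only
        conf := &tls.Config{
            ServerName: host, // the DNS name the certificate must be valid for
            MinVersion: tls.VersionTLS12,
        }
        // Dial fails if the server's certificate does not chain to a CA in the
        // local trust store, or is not valid for ServerName (e.g., a MITM that
        // controls a valid IP address but lacks a matching certificate/private key).
        conn, err := tls.Dial("tcp", host+":443", conf)
        if err != nil {
            log.Fatal(err)
        }
        defer conn.Close()

        state := conn.ConnectionState()
        leaf := state.PeerCertificates[0]
        fmt.Printf("negotiated TLS version: 0x%x\n", state.Version) // 0x304 = TLS 1.3
        fmt.Println("certificate subject:", leaf.Subject)
        fmt.Println("valid for DNS names:", leaf.DNSNames)
        fmt.Println("issued (signed) by: ", leaf.Issuer)
    }
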

6-redirect.pptx
- CDN-hosted content services as an example of mega-scale (geo-distributed) serving
- How does the presence of a CDN affect the network traffic to serve the content?
- When a client looks up its DNS name, what factors determine which IP is returned?
- How does a CDN edge server obtain fresh content for a client's requested URL?
- What factors influence origin server load for a CDN-hosted content service?
- What are some tradeoffs in setting TTLs for CDN DNS names and content objects?
- What actions must a content owner take to serve their content through a CDN?
- Why was it valuable to modify DNS (for ECS) to improve CDN effectiveness?
- Why is it useful for a CDN/hosting edge server or reverse proxy to terminate TLS?
- Why would a CDN customer or hosting customer give their private key to a provider?
- CDN interaction with TLS: if a CDN goes rogue or is subverted, what harm can it do?
- With "keyless SSL", what harm can a subverted CDN do? (How is "keyless" safer?)
- For "keyless", what harm can Mal do by spying on traffic between the CDN and the key server?
- Is the "keyless SSL" idea workable with TLS 1.2? How does TLS 1.3 facilitate it?
- Can a streaming content service use two CDNs at once? How to shift traffic?
- Do you trust a third-party CDN to provide your Apple software updates? Why safe?
- Why is Duke happy to pay for space and power for Netflix OCA edge servers at Duke?
- Why would a client's player change the video quality of a movie while it is playing?

7-replication.pptx
- Service abstractions: serving an API through a hidden group of multiple servers.
- Providing *ility properties to a server group with app-independent techniques.
- Example: masking failures with replication for availability, reliability
- Why is managing state harder for "stateful" services than content services?
- In what sense is a content service (e.g., streaming video, images) "stateless"?
- How does a server know if a peer server has failed?
- Why do network partitions make failure detection harder?
- How do leases protect availability of locks in the presence of server failures? (see the sketch below)
- How do leases protect mutual exclusion of locks in a network partition?
- Linearizable service: requests appear to execute in a globally visible sequence.
- Why does a replicated state machine (RSM, SMR) require serializing requests?
- What could go wrong in RSM/SMR if a server app module is non-deterministic?
- If a primary fails, how does the backup learn about the pending/incomplete requests?
- In Lab 2, does state transfer to a new backup include the reply cache? Why?
- In Lab 2, does the view server track which clients are up (e.g., from their pings)?
- Why might different servers in a group have different views? How long can it last?
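
Sketch for the lease questions in 7-replication: a toy leased lock in Go. If the holder fails or is partitioned and stops renewing, the lease expires and the server can re-grant the lock (availability); the holder must check its own lease before touching protected state (mutual exclusion). All names and the lease term are made up for illustration.

    package main

    import (
        "fmt"
        "sync"
        "time"
    )

    const leaseTerm = 2 * time.Second

    // LockServer grants a single lock under a lease so a failed holder
    // cannot block other clients forever.
    type LockServer struct {
        mu     sync.Mutex
        holder string
        expiry time.Time
    }

    // Acquire grants the lock if it is free, if the caller already holds it
    // (renewal), or if the previous holder's lease has expired.
    func (s *LockServer) Acquire(client string) (time.Time, bool) {
        s.mu.Lock()
        defer s.mu.Unlock()
        if s.holder != "" && s.holder != client && time.Now().Before(s.expiry) {
            return time.Time{}, false // lock held under a live lease
        }
        s.holder = client
        s.expiry = time.Now().Add(leaseTerm)
        return s.expiry, true
    }

    func main() {
        s := &LockServer{}
        if _, ok := s.Acquire("A"); ok {
            fmt.Println("A holds the lock")
        }
        if _, ok := s.Acquire("B"); !ok {
            fmt.Println("B is refused while A's lease is live")
        }
        time.Sleep(leaseTerm + 100*time.Millisecond) // A fails silently; its lease runs out
        if _, ok := s.Acquire("B"); ok {
            fmt.Println("B acquires after A's lease expires (availability)")
        }
        // If A is merely partitioned, it must check time.Now().Before(expiry)
        // before touching protected state; that check preserves mutual exclusion.
    }
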

8-kube-services.pptx
- Kubernetes is a great service with an unwieldy name, so we call it Kube.
- Kube is a job hosting service for a cluster or data center.
- Kube is an exemplary service model for a cloud infrastructure provider.
- A job (e.g., a service) runs as one or more pods (tasks).
- A pod is an app server instance (node), deployed on one host server (worker).
- A pod comprises one or more containers running on the host.
- A container runs one or more processes in a self-contained app software bundle.
- A container has private ephemeral files, and may attach persistent storage volumes.
- Kube defines Service abstractions to manage groups of pods for a server app.
- If a pod in a Service fails, Kube detects this and launches a fresh replacement pod.
- ReplicaSet is a kind of Service whose pods are visible externally at the same IP (VIP).
- When a client connects to a service VIP, a load balancer selects a pod for the session.
- Distribution of user sessions across pods of a VIP spreads the Service request load.
- A ReplicaSet may auto-scale elastically to add/remove pods according to a policy.
- The throughput capacity of a ReplicaSet service is linear with its number of pods.
- The utilization U of a server/service is linear with the offered load: U = offered load / capacity.
- Response time for a request = service time (demand D) + queue delay.
- Idle capacity is 1 - U.
- Expected mean response time R is proportional to the inverse of idle capacity: R ≈ D / (1 - U).
- Mean response times do not capture tail latency.
- Why is tail latency crucial for client perceptions of service performance?
- What measures capture/quantify tail latency?
- What factors impact tail latency?
- T/F: "Caching can't reduce tail latency." Why or why not?
- What performance measures are best to drive auto-scaling actions? (hard)

9-shard.pptx
- Where is the service state? Always a key question for thinking about services.
- Consider a forward tier with a ReplicaSet. Failed pod --> sessions break, state is lost.
- "12 factor" philosophy: reasons to keep the forward service tier stateless.
- Forward tier calls a stateful service tier for app state storage and/or processing.
- How to scale the stateful tier horizontally? Distribute state across the nodes.
- State objects are partitioned into shards, e.g., by primary key ranges.
- Kube StatefulSet is a type of Service with pods in a compact numbering.
- Synchronized launch and shutdown to preserve compact StatefulSet numbering.
- Sharding assigns partitions (slices) of state objects to numbered pods.
- Redistribute shards as the StatefulSet grows/shrinks (resharding).
- Each pod in a StatefulSet (typically) attaches a persistent volume to store shard state.
- Client stubs (pods in forward tier) direct requests to the owning pod for the requested shard.
- Mapping function or table maps shard to StatefulSet pod index.
- Simple hashing: mod on the primary key. What tradeoffs?
- The problem of churn from mapping/assigning shards to pods with simple hashing.
- The challenge of balancing load for sharded assignments.
- Unbalanced loads impact tail latency.
- The Yegge rant: modern structure is an ecosystem of interacting services.
- Services are the unit of development, operations, performance management (devops).

10-hash-slice.pptx
- Ring hashing (consistent hashing) is a family of schemes for shard mapping.
- Goals: local decisions (no table transfer), balanced load, low churn.
- Goal: minimize state transfer cost (churn) when growing/shrinking the stateful tier.
- Hash object keys onto a ring, distribute slices to the nodes.
- Classical consistent hashing: hash nodes onto the ring too.
- Classical consistent hashing: divide the slices where the nodes land on the ring.
- Drawback: add/remove of a node affects only adjacent nodes on ring --> imbalance.
- Refinement: hash each node to many locations on the ring (virtual nodes, tokens); see the sketch below.
- Dynamo refinement: simplify by making the slices/partitions fixed-size.
- Distribute the slices evenly among the nodes, or by node weights.
- Maglev illustrates a scalable load balancer (LB) for VIP services.
- Maglev uses consistent hashing to assign sessions to VIP nodes.
- Maglev hash scheme ensures sessions stick to their nodes even if Maglev LBs fail.
- Maglev hash scheme distributes shards more evenly than classical consistent hashing.
- Maglev mapping function is more complex --> LB nodes must compute/store a table.
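
Sketch for the ring hashing bullets in 10-hash-slice: classical consistent hashing with virtual nodes (tokens) in Go. The hash function and token count are arbitrary illustrative choices; this is not the Dynamo or Maglev scheme.

    package main

    import (
        "fmt"
        "hash/fnv"
        "sort"
    )

    // Ring maps keys to nodes: each node owns many virtual-node points, and a
    // key belongs to the first node point clockwise from the key's hash.
    type Ring struct {
        points []uint32          // sorted virtual-node positions on the ring
        owner  map[uint32]string // ring position -> node name
    }

    func hash(s string) uint32 {
        h := fnv.New32a()
        h.Write([]byte(s))
        return h.Sum32()
    }

    // AddNode places `tokens` virtual nodes for `node` on the ring; more tokens
    // give a smoother load split and less imbalance when nodes come and go.
    func (r *Ring) AddNode(node string, tokens int) {
        if r.owner == nil {
            r.owner = make(map[uint32]string)
        }
        for i := 0; i < tokens; i++ {
            p := hash(fmt.Sprintf("%s#%d", node, i))
            r.owner[p] = node
            r.points = append(r.points, p)
        }
        sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
    }

    // Lookup maps a key to its owning node: the first virtual node clockwise.
    func (r *Ring) Lookup(key string) string {
        h := hash(key)
        i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
        if i == len(r.points) {
            i = 0 // wrap around the ring
        }
        return r.owner[r.points[i]]
    }

    func main() {
        r := &Ring{}
        for _, n := range []string{"pod-0", "pod-1", "pod-2"} {
            r.AddNode(n, 100)
        }
        for _, k := range []string{"alice", "bob", "carol"} {
            fmt.Println(k, "->", r.Lookup(k))
        }
        // Adding pod-3 moves only the keys that now fall on pod-3's tokens;
        // with simple mod-hashing, most keys would move (churn).
    }
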

11-slicer.pptx
- Slicer auto-sharding service maps shards without consistent hashing.
- Slicer instead computes/distributes slice map tables.
- Slicer splits/merges slices by heuristic to balance load (weighted move).
- Slicer apps define their own request key strings from request fields in their own way.
- Slicer hashes the keys into a hashed key space on a ring. Tradeoffs?
- Slicer quantifies load skew and churn.
- How to apply auto-sharding? RinK vs. LinK.
- RinK example: memory-based key-value store (memcached) backing a VIP tier.
- RinK gets lousy cache performance in the forward tier.
- RinK gets high tail latency after resharding.
- LinK replaces RinK with an app-specific stateful tier with auto-sharding.
- How does LinK address the problems of RinK? What new problems does it introduce?
- Problem: coordinating data access with resharding: consistent shard assignments.
- Slicer uses the Chubby coordination service for consistent shard assignments.
- Slicer and Chubby vs. the Lab #2 view server.

12-cap.pptx
- A coordination service manages small/important metadata for app services.
- Chubby (Google) and Zookeeper (Apache) are similar coordination services.
- Functions: locks, small files, hierarchical name service, watches/events, view service.
- Strongly consistent and linearizable, highly available/reliable by RSM replication
- The challenge of consistent caches for file clients (e.g., NFS, parallel FS, Chubby).
- Write ownership protocol for strongly consistent file caches: reader/writer leased lock.
- Lease callbacks prevent conflicting reads/writes when a client modifies file data.
- Example use of Chubby: coordinated resharding for consistent shard assignments.
- Slicer clients (slicelets) and assigners call Chubby to synchronize the switch to new maps.
- But then server failure or partition blocks access to affected slices, for safety.
- If a slicelet loses its session with Chubby, it cannot be sure what slices it owns (see the sketch below).
- Illustrates the consistency/availability tradeoff in the presence of network partitions.
- CAP (consistency, availability, partition tolerance): choose two.
- The CAP partition decision and the concept of Chubby jeopardy.
- What are the choices for the partition decision? What could go wrong?
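
Sketch for the partition-decision bullets in 12-cap: a slicelet that stops serving its slices once it can no longer confirm its coordination-service session (lease), choosing consistency over availability. The Session type and its fields are hypothetical stand-ins, not Chubby's or Slicer's actual API.

    package main

    import (
        "errors"
        "fmt"
        "time"
    )

    // Session stands in for a coordination-service session; leaseExpiry is the
    // end of the last lease term the slicelet was able to confirm.
    type Session struct {
        leaseExpiry time.Time
    }

    // Valid reports whether the slicelet can still prove it holds its assignments.
    func (s *Session) Valid() bool { return time.Now().Before(s.leaseExpiry) }

    // Slicelet serves the slices it believes it owns, but only under a live session.
    type Slicelet struct {
        session  *Session
        assigned map[string]bool
    }

    var errJeopardy = errors.New("session in jeopardy: refusing request for safety")

    // Serve chooses consistency over availability: once the lease may have
    // expired (e.g., during a partition), it stops serving rather than risk
    // conflicting with a new owner assigned elsewhere.
    func (sl *Slicelet) Serve(sliceKey string) error {
        if !sl.session.Valid() {
            return errJeopardy
        }
        if !sl.assigned[sliceKey] {
            return errors.New("slice not assigned here; client should re-fetch the slice map")
        }
        fmt.Println("serving slice", sliceKey)
        return nil
    }

    func main() {
        sl := &Slicelet{
            session:  &Session{leaseExpiry: time.Now().Add(time.Second)},
            assigned: map[string]bool{"users[a-f]": true},
        }
        if err := sl.Serve("users[a-f]"); err != nil { // served: lease is live
            fmt.Println(err)
        }
        time.Sleep(1100 * time.Millisecond) // partition: the lease cannot be renewed
        if err := sl.Serve("users[a-f]"); err != nil { // refused: availability sacrificed
            fmt.Println(err)
        }
    }
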