CPS 512 Distributed Systems Midterm study outline, Spring 22
http://www.cs.duke.edu/courses/spring22/compsci512/midterm-outline.txt

This is all shorthand. The pptx names refer to the relevant slide decks on the site, which provide more specific detail.

1-intro.pptx
- Internet applications are delivered as services.
- And/or structured as compositions of services.
- Various client devices and client software (e.g., browsers, apps) access these services.
- We focus on abstractions, design techniques, and design patterns for services.
- Service properties: scalability, availability, reliability, consistency, reusability, *ility.
- Different services have different demands with respect to these properties.
- We focus on techniques to implement these service properties.

2-services.pptx
- The client/server channel: TCP stream between IP host endpoints.
- Endpoints are named by IP addresses and port numbers.
- Clients access services with request/response patterns over stream connections.
- Example: classical Remote Procedure Call (RPC) client/server paradigm
- Today most channels are layered on HTTP over TCP: REST, gRPC
- RPC systems use various representations for program data (XML, JSON, XDR, protobuf).
- Client stubs hide network/representational details from RPC clients.
- Similarly to HTTP content standards: e.g., pictures, video, text, HTML.
- We can think about services abstractly, independent of communication standards.
- Server application is a program module, API, and state.
- A service (or job) may include many server processes (tasks, pods) working together.
- Each server process may handle many incoming client requests "in parallel".
- "Services may provide services to other services." (composition)
- What are examples of servers acting as clients for other servers?
- How do communication needs vary for external sessions vs. within the datacenter?
- Exemplary canonical service APIs to discuss in more detail:
  +DNS: simple name lookup, no user-generated writes or updates, multi-domain
  +Web/streaming content services: large traffic volumes, get-only
  +Chubby/Zookeeper coordination service: strongly consistent named files and watches.
  +KVstore: storage objects put/get values in entirety, simple primary key space

Reliable RPC:
- Exactly-once = at-least-once (ALO) + at-most-once (AMO)
- Techniques for ALO and AMO (Lab #1): timeouts, client ID, request ID, reply cache
- When is AMO needed: idempotent vs. non-idempotent requests
- Role of stubs (client and server) in handling ALO and AMO in a general way for any app
- Duplicate suppression: reply cache; client stub suppresses any duplicate replies (see the sketch below).
- Extend client stub to retarget retransmissions for a server that moves (Lab #2).
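
Sketch for the Reliable RPC bullets above: a server-side reply cache for at-most-once execution, assuming each request carries a client ID and a per-client sequence number and each client has at most one outstanding request. This is illustrative Go, not the Lab #1 handout code; the names are made up. At-least-once lives in the client stub (retransmit on timeout), which is not shown.

    package main

    import (
        "fmt"
        "sync"
    )

    // Request carries the identifiers the server needs for duplicate detection.
    type Request struct {
        ClientID string
        SeqNum   int64 // per-client sequence number; at most one outstanding request
        Op       string
    }

    // Server keeps, per client, the last executed sequence number and its reply.
    type Server struct {
        mu         sync.Mutex
        lastSeq    map[string]int64
        replyCache map[string]string
    }

    func NewServer() *Server {
        return &Server{lastSeq: make(map[string]int64), replyCache: make(map[string]string)}
    }

    // Handle executes each request at most once: a retransmission of an
    // already-executed request gets the cached reply instead of re-executing.
    func (s *Server) Handle(req Request) string {
        s.mu.Lock()
        defer s.mu.Unlock()
        if last, ok := s.lastSeq[req.ClientID]; ok {
            if req.SeqNum == last {
                return s.replyCache[req.ClientID] // duplicate: suppress re-execution
            }
            if req.SeqNum < last {
                return "" // stale retransmission; the client has already moved on
            }
        }
        reply := s.execute(req.Op) // non-idempotent work happens here, once
        s.lastSeq[req.ClientID] = req.SeqNum
        s.replyCache[req.ClientID] = reply // safe to overwrite: one outstanding request per client
        return reply
    }

    func (s *Server) execute(op string) string { return "done: " + op }

    func main() {
        s := NewServer()
        fmt.Println(s.Handle(Request{ClientID: "c1", SeqNum: 1, Op: "append x"})) // executes
        fmt.Println(s.Handle(Request{ClientID: "c1", SeqNum: 1, Op: "append x"})) // duplicate: cached reply
    }
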

3-dns.pptx
- DNS enables symbolic naming of services, convenient for humans
- Hierarchical name space: the DNS symbolic name space is a tree
- Interior nodes map to a DNS server authoritative for the prefix (like a directory)
- Leaf nodes map to a host or service endpoint(s)
- DNS server has a very simple API; DNS servers work together to provide DNS service
- DNS server query returns local Resource Records (RR) matching a name
- Address (A or AAAA) RR gives a leaf IP address (or multiple options)
- Nameserver (NS) RR gives IP of a DNS server authoritative for the prefix (interior node)
- CNAME RR redirects to another DNS name; look up that one instead
- Client typically invokes a DNS resolver ("local DNS" or "LDNS")
- Resolver is operated by ISP, or as a public service
- Resolver serves ordinary DNS API, but queries other DNS servers for the results.
- Resolver issues queries to a DNS server for each component of the name
- Resolver caches RRs with TTL (expiration time) to limit cache staleness
- Client of resolver may be a chained resolver (e.g., built into a wireless router)

DNS and trust: authority, governance, privacy
- Concept of a domain (trust domain, administrative domain, DNS "zone")
- A domain is a sphere of ownership and control by an authority
- DNS is a multi-domain service specifying how domains interact to provide the service
- Governance: control over root and TLD NS (e.g., for .com) gives power to subvert
- Privacy: resolver sees all DNS traffic from user/client; concept of trusted resolver
- Domain names are property: the role of registrars to validate and record ownership
- "Sponsored" TLDs (e.g., .gov) have specific requirements for ownership

4-dnssec.pptx
- Risks of subverted resolvers: hijacking, poisoning, censorship, or worse (Ripple).
- A client looks up a service by DNS name: how can it verify a correct IP address?
- The problem of integrity: verifying that RRs were truly spoken by proper authorities.
- Step 1: digital signatures for RR sets
- Crypto basics: hashes, keypairs, asymmetric crypto.
- Basics of digital signatures: generating/checking a signature
- Every authoritative DNSSEC NS server has a keypair (at least one: KSK)
- Step 2: endorsement. Through a DS RR, the parent NS endorses the public key of the child NS.
- How do these mechanisms work together to assure integrity for a full DNS pathname?
- Why is it sufficient for the parent to endorse the hash of the child zone KSK?
- Practical crypto challenge: the need for key rotation.
- KSK+ZSK: use of ZSKs to reduce the need for key rotation (of the KSK) for an NS.
- The challenge of rotating the root key.
- Practical DNSSEC challenge: the need for domain registrars to validate owner NS keys.
- Current deployment state of DNSSEC: critical infrastructure, falling a bit short.

5-tls-pki.pptx
- Malice/Mal/Mallory/Man-in-the-middle (MITM) attack: how it happens
- Is DNSSEC sufficient to prevent MITM? How can Mal interpose on a valid IP address?
- Suppose the server has a keypair: how does it help?
- Symmetric crypto: why/how does TLS use it?
- TLS/SSL v1.2: how does it establish a shared secret key and protect it from Mal?
- How does the client verify that the server's public key is valid for a specific DNS name? (see the sketch below)
- How does this validation employ a digital signature?
- If an attacker obtains the server's private key, what harm can it do?
- How can the attacker learn the content of prior TLSv1.2 connections?
- TLS/SSL v1.3: how does it establish a shared secret key and protect it from Mal?
- If your browser provider goes rogue, what harm can it do?
- If a CA is subverted (or leaks its private key), what harm can result?
- How does the CA safely learn the correct public key for a domain it certifies?
- How does a user know if a session with an https server is trustworthy? (hard)
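
Sketch for the 5-tls-pki questions above: a TLS client in Go, showing where the library checks that the server's certificate chains to a trusted root CA (a CA digital signature over the certificate) and covers the requested DNS name. The host name is just an example; this uses the standard crypto/tls API, nothing course-specific.

    package main

    import (
        "crypto/tls"
        "fmt"
        "log"
    )

    func main() {
        host := "example.com" // example name only
        conf := &tls.Config{
            ServerName: host, // the DNS name the certificate must be valid for
            MinVersion: tls.VersionTLS12,
        }
        // Dial fails if the server's certificate does not chain to a CA in the
        // local trust store, or is not valid for ServerName (e.g., a MITM that
        // controls a valid IP address but lacks a matching certificate/private key).
        conn, err := tls.Dial("tcp", host+":443", conf)
        if err != nil {
            log.Fatal(err)
        }
        defer conn.Close()

        state := conn.ConnectionState()
        leaf := state.PeerCertificates[0]
        fmt.Printf("negotiated TLS version: 0x%x\n", state.Version) // 0x304 = TLS 1.3
        fmt.Println("certificate subject:", leaf.Subject)
        fmt.Println("valid for DNS names:", leaf.DNSNames)
        fmt.Println("issued (signed) by: ", leaf.Issuer)
    }
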

6-redirect.pptx
- CDN-hosted content services as an example of mega-scale (geo-distributed) serving
- How does the presence of a CDN affect the network traffic to serve the content?
- When a client looks up its DNS name, what factors determine which IP is returned?
- How does a CDN edge server obtain fresh content for a client's requested URL?
- What factors influence origin server load for a CDN-hosted content service?
- What are some tradeoffs in setting TTLs for CDN DNS names and content objects?
- What actions must a content owner take to serve their content through a CDN?
- Why was it valuable to modify DNS (for ECS) to improve CDN effectiveness?
- Why is it useful for a CDN/hosting edge server or reverse proxy to terminate TLS?
- Why would a CDN customer or hosting customer give their private key to a provider?
- CDN interaction with TLS: if a CDN goes rogue or is subverted, what harm can it do?
- With "keyless SSL", what harm can a subverted CDN do? (How is "keyless" safer?)
- For "keyless", what harm can Mal do by spying on traffic between the CDN and the key server?
- Is the "keyless SSL" idea workable with TLS 1.2? How does TLS 1.3 facilitate it?
- Can a streaming content service use two CDNs at once? How to shift traffic?
- Do you trust a third-party CDN to provide your Apple software updates? Why safe?
- Why is Duke happy to pay for space and power for Netflix OCA edge servers at Duke?
- Why would a client's player change the video quality of a movie while it is playing?

7-replication.pptx
- Service abstractions: serving an API through a hidden group of multiple servers.
- Providing *ility properties to a server group with app-independent techniques.
- Example: masking failures with replication for availability, reliability
- Why is managing state harder for "stateful" services than content services?
- In what sense is a content service (e.g., streaming video, images) "stateless"?
- How does a server know if a peer server has failed?
- Why do network partitions make failure detection harder?
- How do leases protect availability of locks in the presence of server failures? (see the sketch below)
- How do leases protect mutual exclusion of locks in a network partition?
- Linearizable service: requests appear to execute in a globally visible sequence.
- Why does a replicated state machine (RSM, SMR) require serializing requests?
- What could go wrong in RSM/SMR if a server app module is non-deterministic?
- If a primary fails, how does the backup learn about the pending/incomplete requests?
- In Lab 2, does state transfer to a new backup include the reply cache? Why?
- In Lab 2, does the view server track which clients are up (e.g., from their pings)?
- Why might different servers in a group have different views? How long can it last?
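
Sketch for the lease questions in 7-replication: a toy leased lock in Go. If the holder fails or is partitioned and stops renewing, the lease expires and the server can re-grant the lock (availability); the holder must check its own lease before touching protected state (mutual exclusion). All names and the lease term are made up for illustration.

    package main

    import (
        "fmt"
        "sync"
        "time"
    )

    const leaseTerm = 2 * time.Second

    // LockServer grants a single lock under a lease so a failed holder
    // cannot block other clients forever.
    type LockServer struct {
        mu     sync.Mutex
        holder string
        expiry time.Time
    }

    // Acquire grants the lock if it is free, if the caller already holds it
    // (renewal), or if the previous holder's lease has expired.
    func (s *LockServer) Acquire(client string) (time.Time, bool) {
        s.mu.Lock()
        defer s.mu.Unlock()
        if s.holder != "" && s.holder != client && time.Now().Before(s.expiry) {
            return time.Time{}, false // lock held under a live lease
        }
        s.holder = client
        s.expiry = time.Now().Add(leaseTerm)
        return s.expiry, true
    }

    func main() {
        s := &LockServer{}
        if _, ok := s.Acquire("A"); ok {
            fmt.Println("A holds the lock")
        }
        if _, ok := s.Acquire("B"); !ok {
            fmt.Println("B is refused while A's lease is live")
        }
        time.Sleep(leaseTerm + 100*time.Millisecond) // A fails silently; its lease runs out
        if _, ok := s.Acquire("B"); ok {
            fmt.Println("B acquires after A's lease expires (availability)")
        }
        // If A is merely partitioned, it must check time.Now().Before(expiry)
        // before touching protected state; that check preserves mutual exclusion.
    }
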

8-kube-services.pptx
- Kubernetes is a great service with an unwieldy name, so we call it Kube.
- Kube is a job hosting service for a cluster or data center.
- Kube is an exemplary service model for a cloud infrastructure provider.
- A job (e.g., a service) runs as one or more pods (tasks).
- A pod is an app server instance (node), deployed on one host server (worker).
- A pod comprises one or more containers running on the host.
- A container runs one or more processes in a self-contained app software bundle.
- A container has private ephemeral files, and may attach persistent storage volumes.
- Kube defines Service abstractions to manage groups of pods for a server app.
- If a pod in a Service fails, Kube detects this and launches a fresh replacement pod.
- ReplicaSet is a kind of Service whose pods are visible externally at the same IP (VIP).
- When a client connects to a service VIP, a load balancer selects a pod for the session.
- Distribution of user sessions across pods of a VIP spreads the Service request load.
- A ReplicaSet may auto-scale elastically to add/remove pods according to a policy.
- The throughput capacity of a ReplicaSet service is linear with its number of pods.
- The utilization U of a server/service is linear with the offered load: U = offered load / capacity.
- Response time for a request = service time (demand D) + queue delay.
- Idle capacity is 1 - U.
- Expected mean response time R is proportional to the inverse of idle capacity: R ≈ D / (1 - U).
- Mean response times do not capture tail latency.
- Why is tail latency crucial for client perceptions of service performance?
- What measures capture/quantify tail latency?
- What factors impact tail latency?
- T/F: "Caching can't reduce tail latency." Why or why not?
- What performance measures are best to drive auto-scaling actions? (hard)

9-shard.pptx
- Where is the service state? Always a key question for thinking about services.
- Consider a forward tier with a ReplicaSet. Failed pod --> sessions break, state is lost.
- "12 factor" philosophy: reasons to keep the forward service tier stateless.
- Forward tier calls a stateful service tier for app state storage and/or processing.
- How to scale the stateful tier horizontally? Distribute state across the nodes.
- State objects are partitioned into shards, e.g., by primary key ranges.
- Kube StatefulSet is a type of Service with pods in a compact numbering.
- Synchronized launch and shutdown to preserve compact StatefulSet numbering.
- Sharding assigns partitions (slices) of state objects to numbered pods.
- Redistribute shards as the StatefulSet grows/shrinks (resharding).
- Each pod in a StatefulSet (typically) attaches a persistent volume to store shard state.
- Client stubs (pods in forward tier) direct requests to the owning pod for the requested shard.
- Mapping function or table maps shard to StatefulSet pod index.
- Simple hashing: mod on the primary key. What tradeoffs?
- The problem of churn from mapping/assigning shards to pods with simple hashing.
- The challenge of balancing load for sharded assignments.
- Unbalanced loads impact tail latency.
- The Yegge rant: modern structure is an ecosystem of interacting services.
- Services are the unit of development, operations, performance management (devops).

10-hash-slice.pptx
- Ring hashing (consistent hashing) is a family of schemes for shard mapping.
- Goals: local decisions (no table transfer), balanced load, low churn.
- Goal: minimize state transfer cost (churn) when growing/shrinking the stateful tier.
- Hash object keys onto a ring, distribute slices to the nodes.
- Classical consistent hashing: hash nodes onto the ring too.
- Classical consistent hashing: divide the slices where the nodes land on the ring.
- Drawback: add/remove of a node affects only adjacent nodes on ring --> imbalance.
- Refinement: hash each node to many locations on the ring (virtual nodes, tokens); see the sketch below.
- Dynamo refinement: simplify by making the slices/partitions fixed-size.
- Distribute the slices evenly among the nodes, or by node weights.
- Maglev illustrates a scalable load balancer (LB) for VIP services.
- Maglev uses consistent hashing to assign sessions to VIP nodes.
- Maglev hash scheme ensures sessions stick to their nodes even if Maglev LBs fail.
- Maglev hash scheme distributes shards more evenly than classical consistent hashing.
- Maglev mapping function is more complex --> LB nodes must compute/store a table.
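
Sketch for the ring hashing bullets in 10-hash-slice: classical consistent hashing with virtual nodes (tokens) in Go. The hash function and token count are arbitrary illustrative choices; this is not the Dynamo or Maglev scheme.

    package main

    import (
        "fmt"
        "hash/fnv"
        "sort"
    )

    // Ring maps keys to nodes: each node owns many virtual-node points, and a
    // key belongs to the first node point clockwise from the key's hash.
    type Ring struct {
        points []uint32          // sorted virtual-node positions on the ring
        owner  map[uint32]string // ring position -> node name
    }

    func hash(s string) uint32 {
        h := fnv.New32a()
        h.Write([]byte(s))
        return h.Sum32()
    }

    // AddNode places `tokens` virtual nodes for `node` on the ring; more tokens
    // give a smoother load split and less imbalance when nodes come and go.
    func (r *Ring) AddNode(node string, tokens int) {
        if r.owner == nil {
            r.owner = make(map[uint32]string)
        }
        for i := 0; i < tokens; i++ {
            p := hash(fmt.Sprintf("%s#%d", node, i))
            r.owner[p] = node
            r.points = append(r.points, p)
        }
        sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
    }

    // Lookup maps a key to its owning node: the first virtual node clockwise.
    func (r *Ring) Lookup(key string) string {
        h := hash(key)
        i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
        if i == len(r.points) {
            i = 0 // wrap around the ring
        }
        return r.owner[r.points[i]]
    }

    func main() {
        r := &Ring{}
        for _, n := range []string{"pod-0", "pod-1", "pod-2"} {
            r.AddNode(n, 100)
        }
        for _, k := range []string{"alice", "bob", "carol"} {
            fmt.Println(k, "->", r.Lookup(k))
        }
        // Adding pod-3 moves only the keys that now fall on pod-3's tokens;
        // with simple mod-hashing, most keys would move (churn).
    }
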

11-slicer.pptx
- Slicer auto-sharding service maps shards without consistent hashing.
- Slicer instead computes/distributes slice map tables.
- Slicer splits/merges slices by heuristic to balance load (weighted move).
- Slicer apps define their own request key strings from request fields in their own way.
- Slicer hashes the keys into a hashed key space on a ring. Tradeoffs?
- Slicer quantifies load skew and churn.
- How to apply auto-sharding? RinK vs. LinK.
- RinK example: memory-based key-value store (memcached) backing a VIP tier.
- RinK gets lousy cache performance in the forward tier.
- RinK gets high tail latency after resharding.
- LinK replaces RinK with an app-specific stateful tier with auto-sharding.
- How does LinK address the problems of RinK? What new problems does it introduce?
- Problem: coordinating data access with resharding: consistent shard assignments.
- Slicer uses the Chubby coordination service for consistent shard assignments.
- Slicer and Chubby vs. the Lab #2 view server.

12-cap.pptx
- A coordination service manages small/important metadata for app services.
- Chubby (Google) and Zookeeper (Apache) are similar coordination services.
- Functions: locks, small files, hierarchical name service, watches/events, view service.
- Strongly consistent and linearizable, highly available/reliable by RSM replication
- The challenge of consistent caches for file clients (e.g., NFS, parallel FS, Chubby).
- Write ownership protocol for strongly consistent file caches: reader/writer leased lock.
- Lease callbacks prevent conflicting reads/writes when a client modifies file data.
- Example use of Chubby: coordinated resharding for consistent shard assignments.
- Slicer clients (slicelets) and assigners call Chubby to synchronize the switch to new maps.
- But then server failure or partition blocks access to affected slices, for safety.
- If a slicelet loses its session with Chubby, it cannot be sure what slices it owns (see the sketch below).
- Illustrates the consistency/availability tradeoff in the presence of network partitions.
- CAP (consistency, availability, partition tolerance): choose two.
- The CAP partition decision and the concept of Chubby jeopardy.
- What are the choices for the partition decision? What could go wrong?
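
Sketch for the partition-decision bullets in 12-cap: a slicelet that stops serving its slices once it can no longer confirm its coordination-service session (lease), choosing consistency over availability. The Session type and its fields are hypothetical stand-ins, not Chubby's or Slicer's actual API.

    package main

    import (
        "errors"
        "fmt"
        "time"
    )

    // Session stands in for a coordination-service session; leaseExpiry is the
    // end of the last lease term the slicelet was able to confirm.
    type Session struct {
        leaseExpiry time.Time
    }

    // Valid reports whether the slicelet can still prove it holds its assignments.
    func (s *Session) Valid() bool { return time.Now().Before(s.leaseExpiry) }

    // Slicelet serves the slices it believes it owns, but only under a live session.
    type Slicelet struct {
        session  *Session
        assigned map[string]bool
    }

    var errJeopardy = errors.New("session in jeopardy: refusing request for safety")

    // Serve chooses consistency over availability: once the lease may have
    // expired (e.g., during a partition), it stops serving rather than risk
    // conflicting with a new owner assigned elsewhere.
    func (sl *Slicelet) Serve(sliceKey string) error {
        if !sl.session.Valid() {
            return errJeopardy
        }
        if !sl.assigned[sliceKey] {
            return errors.New("slice not assigned here; client should re-fetch the slice map")
        }
        fmt.Println("serving slice", sliceKey)
        return nil
    }

    func main() {
        sl := &Slicelet{
            session:  &Session{leaseExpiry: time.Now().Add(time.Second)},
            assigned: map[string]bool{"users[a-f]": true},
        }
        if err := sl.Serve("users[a-f]"); err != nil { // served: lease is live
            fmt.Println(err)
        }
        time.Sleep(1100 * time.Millisecond) // partition: the lease cannot be renewed
        if err := sl.Serve("users[a-f]"); err != nil { // refused: availability sacrificed
            fmt.Println(err)
        }
    }
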