Over the last ten years we have witnessed a shift from large mainframe computing to commodity, off-the-shelf clusters of servers. Today's data centers contain thousands or tens of thousands of servers, providing services and computation for tens or hundreds of thousands of users. In addition to traditional IT challenges such as server management, security, and performance, data center owners must now deal with power and thermal issues, previously the domain of facilities management. These trends will continue to accelerate as organizations acquire bladed servers and consolidate multiple smaller clusters into centrally located data centers. In spite of these trends, however, there has been no corresponding change in emphasis in the methods and toolkits that target system instrumentation, analysis, management, replay, and emulation. This paper seeks to address that gap. We focus on methods and toolkits that enable the automated collection and analysis of workload traces from data centers, and we use those traces as the basis for repeatable and verifiable experiments and workload emulation. Our work has two components: (1) a location- and environment-aware extended knowledge plane that places thermal and power management concerns at the same level as service performance, collecting and analyzing facilities and performance data with particular focus on causal relationships across this boundary, and (2) data analysis and workload playback methods that allow detailed and flexible emulation of enterprise-class workloads. We discuss the high-level architectural requirements for these two components and present results from specific implementations and toolkits.