Accelerating Data Parallel Applications via Hardware and Software Techniques
The unprecedented amount of data available today opens the door to many new applications in areas such as finance, scientific simulation, and machine learning. Many of these applications perform the same computation on different data elements and are therefore called data-parallel. However, processing this enormous amount of data is challenging, especially in the post-Moore's-law era. Specialized accelerators are a promising way to meet the performance requirements of data-parallel applications; among them are graphics processing units (GPUs), as well as more application-specific solutions.
One area with high performance requirements is statistical machine learning, which has widespread applications across many domains. These methods include probabilistic algorithms, such as Markov chain Monte Carlo (MCMC), that rely on drawing random numbers from probability distributions. Such algorithms are computationally expensive on conventional processors, yet their statistical advantages over deep learning, namely interpretability and uncertainty quantification, make them an attractive alternative. Hardware specialization can therefore be adopted to address the shortcomings of conventional processors in running these applications.
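To see why MCMC spends most of its time drawing random numbers, consider a minimal random-walk Metropolis-Hastings sampler. This is only an illustrative sketch: the standard-normal target and Gaussian proposal are arbitrary choices for the example, not part of this work.

```python
import math
import random

def metropolis_hastings(log_pdf, x0, n_samples, step=1.0, seed=0):
    """Minimal random-walk Metropolis-Hastings sampler (illustrative).

    log_pdf: log of the (unnormalized) target density.
    Every iteration draws two random numbers (a Gaussian proposal and a
    uniform accept/reject draw) -- this per-sample random-number
    generation is what makes MCMC costly on conventional processors.
    """
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        # Accept with probability min(1, exp(log_pdf(x') - log_pdf(x))).
        if math.log(rng.random()) < log_pdf(proposal) - log_pdf(x):
            x = proposal
        samples.append(x)
    return samples

# Example target: a standard normal, via its log-density up to a constant.
samples = metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0, n_samples=20000)
mean = sum(samples) / len(samples)
```

After enough iterations the chain's samples approximate the target distribution, so the empirical mean approaches the target's mean (here, zero).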
In addition to hardware techniques, probabilistic algorithms can benefit from algorithmic optimizations that avoid unnecessary work. Specifically, we can skip resampling a random variable (RV) whose probability distribution function (PDF) is concentrated on a single value, i.e., there is effectively only one value to choose, as long as the values of its neighboring RVs have not changed. In other words, once an RV's PDF becomes concentrated, it remains concentrated until at least one of its neighbors changes. Thanks to their high throughput and centralized scheduling mechanism, GPUs are a suitable target for this optimization.
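The skipping rule above can be sketched as a Gibbs-style sweep that tracks, per RV, whether any neighbor has changed since its last visit. Everything here is a hypothetical illustration under assumed names: the `conditional` callback, the `dirty`-flag bookkeeping, and the chain-structured neighborhood are not the actual implementation, and a real version would run the sweeps on a GPU.

```python
import random

def gibbs_with_skipping(conditional, values, n_sweeps, tol=1e-6, seed=0):
    """Toy Gibbs sweeps that skip RVs whose PDF is concentrated on one
    value and whose neighbors have not changed (illustrative sketch).

    conditional(i, values) -> probabilities over RV i's domain given
    its current neighbors. RVs are assumed chain-structured, so the
    neighbors of i are i-1 and i+1.
    """
    rng = random.Random(seed)
    n = len(values)
    dirty = [True] * n          # True: must (re)sample this RV
    skipped = 0
    for _ in range(n_sweeps):
        for i in range(n):
            if not dirty[i]:
                skipped += 1    # concentrated PDF, neighbors unchanged
                continue
            probs = conditional(i, values)
            # Concentrated PDF: one outcome holds (almost) all the mass.
            concentrated = max(probs) > 1.0 - tol
            new = rng.choices(range(len(probs)), weights=probs)[0]
            changed = new != values[i]
            values[i] = new
            # A concentrated RV stays skippable until a neighbor changes.
            dirty[i] = not concentrated
            if changed:
                for j in (i - 1, i + 1):
                    if 0 <= j < n:
                        dirty[j] = True
    return values, skipped

# Toy example: every RV's conditional is concentrated on value 0, so
# after the chain settles, later sweeps skip all the work.
vals, skipped = gibbs_with_skipping(lambda i, v: [1.0, 0.0], [1] * 5,
                                    n_sweeps=3)
```

In this toy run the first sweep flips every RV to 0, the second confirms the now-stable values, and the third skips all five RVs, illustrating the saved work.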
Beyond probabilistic algorithms, GPUs can be used to accelerate a wide variety of applications. With their Single-Instruction Multiple-Thread (SIMT) execution model, GPUs offer massive parallelism combined with relative ease of programming. The large amount and diversity of resources on a GPU are intended to ensure that applications with different characteristics can achieve high performance, but this also means that some of these resources remain under-utilized, which is inefficient in a multi-tenant environment.
In this dissertation, we propose and evaluate solutions to the challenges mentioned above, namely i) accelerating probabilistic algorithms with uncertainty quantification, ii) optimizing probabilistic algorithms on GPUs to avoid unnecessary work, and iii) increasing resource utilization of GPUs in multi-tenant environments.