## New Directions in Bandit Learning: Singularities and Random Walk Feedback

Ph. D. Defense
Speaker Name
Tianyu Wang
Date and Time
-
Location
Talk will be remote on Zoom
Abstract

The talk mainly focuses on two topics. The first topic is bandit learning for Bounded Mean Oscillation (BMO) functions, where the goal is to "maximize'' a function that may go to infinity in parts of the space. For an unknown BMO function, I will present algorithms that efficiently finds regions with high function values. To handle possible singularities and unboundedness in BMO functions, I will introduce the new notion of $\delta$-regret -- the difference between the function values along the trajectory and a point that is optimal after removing a $\delta$-sized portion of the space. I will show that our algorithm has $\mathcal{O} \left( \frac{\kappa poly-log (T)}{T} \right)$ average $T$-step $\delta$-regret, where $\kappa$ depends on $\delta$ and adapts to the landscape of the underlying reward function.

The second topic is bandit learning with random walk trajectories as feedback. In application domains including online advertisement and social networks, the user behavior can often be modeled as a random walk over a network. To this end, a novel bandit learning problem, where each arm is the starting node of a random walk in a network and the reward is the length of the walk, is introduced. This formulation not only captures a large number of applications in practice but also provides a framework for online learning with random walk trajectories as feedback. In an adversarial setting, I introduce novel algorithms that achieve oblivious regret bound of order ${\mathcal{O}} $$\kappa \sqrt{T}$$$, where $\kappa$ is a constant that depends on the structure of the graph, which can be significantly smaller than the number of arms (nodes).