Geometry of Bias Mitigation Techniques for Word Representations
Vectorized representations of textual data have revolutionized natural language processing, first with methods like Word2Vec and GloVe and then with contextual variants like BERT and RoBERTa. Similar representations are useful for learning on other structured data types such as images, trajectories, and business transactions. However, because these models are trained on enormous quantities of real-world data (in the case of language, large amounts of text from the internet), they encode some of the biases present in that data. This talk will explore these issues from a geometric perspective, focusing in particular on ways of using geometric interpretations and operations (linearity, projection, and orthogonalization) to attenuate the bias inherent in these representations. We will conclude with some discussion and future directions on the nature of bias and data representation in automated reasoning.
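To make the projection idea mentioned above concrete, here is a minimal sketch of one common projection-based debiasing step: removing a word vector's component along an estimated bias direction so the result is orthogonal to it. The bias direction, the toy vectors, and the function name are illustrative assumptions, not the speaker's actual method or data.

```python
import numpy as np

def debias(w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Return w projected onto the subspace orthogonal to direction b.

    This is the basic orthogonalization step used in projection-based
    debiasing: subtract from w its component along the (normalized)
    bias direction b.
    """
    b = b / np.linalg.norm(b)     # normalize the bias direction
    return w - np.dot(w, b) * b   # remove the component along b

# Toy 3-d example (illustrative vectors, not real word embeddings).
# In practice b might be estimated, e.g., from differences of gendered
# word pairs, and w would be a learned embedding.
bias_dir = np.array([1.0, 0.0, 0.0])
word_vec = np.array([0.8, 0.5, 0.2])
debiased = debias(word_vec, bias_dir)
print(np.dot(debiased, bias_dir))  # component along bias direction is now ~0
```

After this step the debiased vector carries no component along the bias direction, while its remaining coordinates are untouched; richer variants project out a multi-dimensional bias subspace rather than a single direction.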
Jeff M. Phillips is an Associate Professor in the School of Computing at the University of Utah. He is also the Director of the Utah Center for Data Science and the Director of the Data Science Program in the School of Computing, which evolved from the Big Data Program. His research interests include algorithms for big data analytics: geometric data analysis, computational geometry, coresets and sketches, handling uncertainty, data mining, databases, machine learning, and spatial statistics. He was awarded an NSF CAREER Award in 2014, and he received his PhD in Computer Science from Duke University in January 2009.