Vectorized representations of textual data have revolutionized natural language processing, first with methods like Word2Vec and GloVe and then with contextual variants like BERT and RoBERTa. Similar representations are useful for learning on other structured data types such as images, trajectories, business transactions, and more. However, because these models are trained on enormous quantities of real-world data (in the case of language, large amounts of text from the internet), they encode some of the biases present in that text.
Duke Computer Science Colloquium
Jeff M. Phillips