CS Colloquium: Andrew Yates (University of Amsterdam)
An encoder-centric view of information retrieval
Abstract:
Modern large language models like GPT-4 and Bard are currently in the spotlight due to their strong performance on many information-seeking tasks. These impressive models rely on retrieval, often powered by neural methods, to generate up-to-date responses across topics.
In this talk, I will describe my encoder-centric view of neural methods for information retrieval, with a focus on how such approaches relate to large language models (LLMs). LLMs are effective reranking methods when provided with candidate documents, but when and how should they serve as encoders to power efficient first-stage retrieval? I will address this question while describing my recent work on encoders for learned sparse retrieval and dense retrieval, which provide paths to leveraging an LLM’s strong language processing for search.
Bio:
Andrew Yates is an Assistant Professor at the University of Amsterdam, where his research focuses on developing content-based neural ranking methods and leveraging them to improve search and downstream tasks. He has co-authored a variety of papers on neural ranking methods as well as a book on transformer-based neural methods: “Pretrained Transformers for Text Ranking: BERT and Beyond”. Previously, Andrew was a post-doctoral researcher and senior researcher at the Max Planck Institute for Informatics. Andrew received his Ph.D. in Computer Science from Georgetown University, where he worked on information retrieval and extraction in the medical domain.
https://scholar.google.com/citations?user=b4ciDMsAAAAJ&hl=en