PhD Dissertation Defense: Ken Kawintiranon
Title: “Detecting and Understanding Information Pollution on Social Media”
Social media and the web have become primary sources for obtaining information and news. Given the speed and reach of information on social media, the effects of poor-quality information, especially health-related information, can be consequential. In recent years, researchers have worked on detecting different types of poor-quality information, such as fake news and misinformation, as well as on identifying levels of support for it. Knowledge of both can be used to mitigate the negative effects of poor-quality information. We refer to such poor-quality information as information pollution. Research on information pollution within computer science is growing rapidly; however, the state of the art still has a number of limitations, including the inability to accurately identify information pollution in noisy domains like social media, where high-quality labels are limited and information spreads rapidly.
In this dissertation, we aim to address some of the aforementioned challenges, focusing on two types of information pollution: spam and misinformation. We show on different Twitter data sets that context-specific spam exists and is identifiable using only content-based features, and we present a comparative study of different detection algorithms in a low-resource setting. Next, we review the literature on false information detection (e.g., misinformation and disinformation) and discuss the need to build bridges between false information detection research in computer science and research in other disciplines. To detect misinformation on Twitter in a resource-constrained environment, we develop a novel reinforcement learning framework for weak supervision and show that our model outperforms baseline models. To detect multiple myths simultaneously and exploit information learned across different myth themes, we propose a novel cooperative learning method for multi-agent reinforcement learning, which improves the training process in our reinforcement learning framework. To understand whether there is support for the misinformation, we study stance detection. We propose a stance detection approach that uses the log-odds-ratio technique to identify distinguishing stance words, then applies a novel attention mechanism that focuses on these words. We show that our approach outperforms state-of-the-art models on Twitter data sets about the 2020 US Presidential election. Next, we develop and release a pre-trained language model trained on a large amount of social media data about the US election in order to support those studying political (mis)information.
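To give a flavor of the stance-word step, the sketch below implements a standard weighted log-odds ratio with a Dirichlet prior over two stance corpora; the dissertation's exact formulation and corpora are not specified here, so the function name, smoothing value, and toy inputs are illustrative assumptions. Words with large positive z-scores are distinctive of the first (support) corpus, large negative scores of the second (oppose) corpus.

```python
import math
from collections import Counter

def log_odds_stance_words(support_docs, oppose_docs, alpha=0.01):
    """Rank words by smoothed log-odds ratio between two stance corpora.

    A minimal sketch (hypothetical helper, not the dissertation's code):
    each word gets a pseudo-count `alpha`, the difference of its smoothed
    log-odds in the two corpora is computed, and the difference is
    normalized by an approximate standard deviation to yield a z-score.
    """
    support = Counter(w for doc in support_docs for w in doc.split())
    oppose = Counter(w for doc in oppose_docs for w in doc.split())
    vocab = set(support) | set(oppose)
    n_s, n_o = sum(support.values()), sum(oppose.values())
    alpha0 = alpha * len(vocab)  # total pseudo-count mass over the vocabulary

    scores = {}
    for w in vocab:
        y_s, y_o = support[w], oppose[w]
        # smoothed log-odds of w in each corpus
        l_s = math.log((y_s + alpha) / (n_s + alpha0 - y_s - alpha))
        l_o = math.log((y_o + alpha) / (n_o + alpha0 - y_o - alpha))
        delta = l_s - l_o
        var = 1.0 / (y_s + alpha) + 1.0 / (y_o + alpha)  # approximate variance
        scores[w] = delta / math.sqrt(var)  # z-score
    return scores
```

In the approach described above, the top-scoring words from a step like this would then receive focused weight in the attention mechanism of the downstream stance model.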
Finally, we publish the models, data sets, and code, enabling future research on spam, misinformation, and stance detection. Together, these contributions narrow existing knowledge gaps, bringing us closer to a world free of information pollution.
Committee members:
Lisa Singh (adviser)
Grace Yang
Nathan Schneider
Ceren Budak (University of Michigan)