PhD Dissertation Proposal: Ken Kawintiranon
Title: “Detecting and Understanding Information Pollution on Social Media”
Social media and the web have become primary sources of information and news. Given the speed and scale at which information spreads on social media, the effects of poor-quality information, especially health-related information, can be consequential. In recent years, researchers have worked on detecting different types of poor-quality information, such as fake news and misinformation, as well as on identifying levels of support for it. Knowledge of both can be used to mitigate the negative effects of poor-quality information. We refer to such poor-quality information as information pollution. Research on information pollution within computer science is growing rapidly; however, the state of the art still has a number of limitations, including the inability to accurately identify information pollution in noisy domains like social media, where high-quality labels are limited and information spreads rapidly.
In this dissertation proposal, we aim to address some of the aforementioned challenges, focusing on two types of information pollution: spam and misinformation. We show on several Twitter data sets that context-specific spam exists and is identifiable using only content-based features, and we present a comparative study of detection algorithms in a low-resource setting. We propose a stance detection algorithm that uses the log-odds-ratio technique to identify distinguishing stance words and then models a novel attention mechanism that focuses on these words. We show that our approach outperforms state-of-the-art models on Twitter data sets about the 2020 US Presidential election. Next, we develop and release a pre-trained language model trained on a large amount of social media data about the US election in order to support those studying political (mis)information. To detect misinformation on Twitter in a resource-constrained environment, we develop a novel reinforcement learning framework for weak supervision and show that our model outperforms baseline models. To detect multiple myths simultaneously and to exploit information beyond the textual content of posts, we propose using graph techniques to improve the features consumed by our reinforcement learning framework. We also propose extending our earlier stance detection work to further understand misinformation beliefs. Finally, we publish our models, data sets, and code, enabling future research on spam, misinformation, and stance detection. Together, these contributions bridge existing knowledge gaps, bringing us closer to a world free of information pollution.
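As context for the stance-word step mentioned above, the standard log-odds-ratio technique with an informative Dirichlet prior scores how strongly each word distinguishes one corpus (e.g., tweets supporting a candidate) from another (e.g., tweets opposing the candidate). The sketch below is illustrative only; the function name, prior choice, and toy tokens are assumptions, not the dissertation's implementation.

```python
from collections import Counter
import math

def log_odds_ratio(corpus_a, corpus_b, prior_scale=1.0):
    """Z-scored log-odds ratio with an informative Dirichlet prior,
    scoring words that distinguish corpus A from corpus B."""
    counts_a = Counter(w for doc in corpus_a for w in doc.split())
    counts_b = Counter(w for doc in corpus_b for w in doc.split())
    prior = counts_a + counts_b  # background counts serve as the prior
    n_a, n_b = sum(counts_a.values()), sum(counts_b.values())
    n_prior = sum(prior.values())
    scores = {}
    for w, p in prior.items():
        a = counts_a[w] + prior_scale * p
        b = counts_b[w] + prior_scale * p
        # smoothed log odds of w in each corpus, then their difference
        delta = (math.log(a / (n_a + prior_scale * n_prior - a))
                 - math.log(b / (n_b + prior_scale * n_prior - b)))
        variance = 1.0 / a + 1.0 / b  # approximate variance of the estimate
        scores[w] = delta / math.sqrt(variance)
    return scores  # positive: distinctive of A; negative: distinctive of B

scores = log_odds_ratio(["vote blue blue", "blue wave"],
                        ["vote red red", "red wave"])
```

In this toy example, "blue" receives a positive score, "red" a negative one, and words used equally in both corpora (such as "vote") score near zero; the highest-magnitude words would then be the candidates for an attention mechanism to focus on.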