Drone images of Syrian refugees living in refugee camps in Gaziantep, Turkey
Category: Discovery & Impact

Title: Georgetown and UNHCR Collaborate on Big Data and Forced Displacement Project

The collaboration could help governments, NGOs and humanitarian organizations prepare for mass movements before they happen and intervene to address the factors that cause displacement.  

“We always hear about how technology is destroying our world,” says Lisa Singh, director of Georgetown’s Massive Data Institute (MDI) at the McCourt School of Public Policy, who co-leads the effort. “But it is also important to find ways to use technology to improve outcomes for individuals.”

The Massive Data and Displacement project is led by (from left to right) Katharine Donato, Donald Herzberg Professor of International Migration; Lisa Singh, director of Georgetown’s Massive Data Institute; and Ali Arab, associate professor of statistics. The three are pictured outside Healy Hall on a sunny day in March.

Georgetown and the UNHCR signed a five-year agreement earlier this year under which the two organizations will pool data, analytic tools and expertise to fill data gaps and develop new indicators of forced displacement, with the aim of generating estimates of displacement in different regions of the world. The project spans three Georgetown schools and two institutes and involves collaborators from other universities, a multidisciplinary approach intended to advance research on forced displacement.

“Traditional data collection methods can be particularly challenging in forced displacement contexts, and harnessing the potential of big data is key to provide more granular, near real-time statistics that allow for a rapid response during humanitarian crises,” says Andrea Pellandra, senior data scientist at UNHCR’s Global Data Service. 

“MDI and ISIM [Institute for the Study of International Migration] worked extensively on the use of big data to understand forced displacement, and I am very excited at this opportunity to collaborate with some of the leading experts in this field. I trust that pooling together our data, resources and expertise will greatly improve our evidence base and inform program decisions that will hopefully lead to better outcomes for the people we serve.”

The agreement comes on the heels of the highest levels of forced displacement ever recorded: 103 million people displaced as of mid-2022 — an increase from the previous year’s 89.3 million, the UNHCR reported.

“There is an immediate need to better understand the issue to potentially prevent or better allocate resources that can improve the situation of the increasing number of migrants,” says Ali Arab, a co-lead and associate professor of statistics in the College of Arts & Sciences.

“The collaboration with UNHCR is critical in allowing us to build this global approach given the data they collect and the exchange of ideas with their experts and community of users and collaborators, including those who depend on UNHCR to inform and shape related policies.”

Georgetown’s efforts are led by the Massive Data and Displacement (MaDD) project, which brings together computer scientists, social scientists, undergraduates and graduate students who have been developing new uses for public, open-source data to analyze and predict displacement since 2015.

Big Data and Migration

Singh, a professor in the Department of Computer Science and research professor at MDI, began studying the connection between big data and forced displacement in 2014.

At the time, most research on migration relied heavily on survey and census data to understand mass movements. Relying on this data, which can often be in short supply, to try to predict forced displacement can be difficult, time-intensive and costly, Arab says.

Singh wanted a broader view, one that incorporated the complex factors driving migration all over the world, such as politics, the economy, the environment and immigration laws, into a single model that could be easily adapted anywhere.

In 2014, she and a now-retired SFS professor, Susan Martin, analyzed newspaper data in local languages in Somalia and Iraq and found what they called “indirect indicators of movement.” With a grant from the National Science Foundation, they implemented a pilot project in Iraq blending social media and newspaper data with traditional sources of data from surveys and censuses. Building on those findings, Singh, Donato and Arab developed the MaDD project.

Expanding to Venezuela and Ukraine

An image of the Ukrainian blue-and-yellow flag hanging over a city in Ukraine. Behind it is black smoke from an explosion and a cloudy sky.

Over the last five years, the team has used data from Twitter, newspapers and Google searches to study two cases: COVID-19 misinformation as it relates to forced displacement in Venezuela, a result of political and economic turmoil there, and forced displacement from the war in Ukraine.

In studying Venezuela, the team found that specific events and statements by governments and political leaders about COVID-19 and migration misinformation were associated with shifts in Twitter conversation. In studying Ukraine, the team found that Google Trends searches related to travel correlated highly with the actual number of border crossings into neighboring countries, as recorded in data the UNHCR provided, said Nathan Wycoff, a postdoctoral fellow on the project.

“This project is a proof of concept that mass migration events leave a digital imprint that aid and intergovernmental organizations could exploit to be better prepared to host the displaced during an emerging crisis,” he said.
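For readers curious what such a correlation check might look like in practice, here is a minimal sketch in Python. It is not the MaDD team's method, which the article does not describe. The function names, the idea of testing several day lags, and the numbers are all hypothetical and invented purely for illustration. They are not Google Trends or UNHCR figures.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_correlation(searches, crossings, lag):
    """Correlate search interest on day t with crossings on day t + lag."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(searches, crossings)
    return pearson(searches[:-lag], crossings[lag:])


def best_lag(searches, crossings, max_lag):
    """Return the lag (in days) with the highest correlation."""
    return max(
        range(max_lag + 1),
        key=lambda lag: lagged_correlation(searches, crossings, lag),
    )


# Made-up toy series (NOT real data): daily search interest for
# travel-related terms, and daily border crossings that follow one day later.
toy_searches = [10, 20, 40, 80, 60, 30, 20, 10]
toy_crossings = [5, 100, 200, 400, 800, 600, 300, 200]

if __name__ == "__main__":
    for lag in range(4):
        r = lagged_correlation(toy_searches, toy_crossings, lag)
        print(f"lag={lag} day(s): r={r:.3f}")
    print("best lag:", best_lag(toy_searches, toy_crossings, 3))
```

A high correlation at a positive lag would suggest that search activity tends to precede crossings, which is the sense in which such signals could serve as early indicators. Correlation alone does not establish that one drives the other.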

Analyzing this data in real time, says Katharine Donato, co-leader of the project and the long-time director of SFS’s Institute for the Study of International Migration (ISIM), could also help countries and states intervene before a massive outflow.

“Most people don’t want to leave home,” says Donato, the Donald Herzberg Professor of International Migration and a demographer and sociologist. “So why not focus on what can be done to inform proactive policies and practices so people can stay in their homes? That’s a long-term goal: to create information, what we’re calling ‘indirect indicators of forced displacement,’ so that relief efforts are not only reactive but also have the ability to also be proactive.”

Where They’ll Go Next

The team is working to publish its research on Ukraine, including papers presented at Georgetown in April 2022. MaDD team members are meeting monthly with leaders from the UNHCR, and plan to study climate-driven migration in Bangladesh and forced displacement from Central America, among other topics, Donato says.

Long-term, their goal is to help understand the drivers of forced displacement and predict it globally. This requires building language capabilities to process data in the source’s original language and in destination languages.

Jenny Park (right) and Qihang Wang (left), another research assistant on the project, stand next to a poster illustrating their findings at their research conference last semester.

Jenny Park (C’24), a junior who’s majoring in Computer Science and Justice and Peace Studies, is working on the project as a Massive Data Institute Scholar. She is developing models to analyze sentiment in organic data in Arabic.

“Knowing that my language skill set was able to play a unique role in contributing to the project was exciting and helped me to better develop these models that were our biggest milestones last semester,” she says. “Sentiment is an important measurement of perception, and gauging the perception of people in response to major events is an extremely useful way to predict if, when and where people will be moving as a response to an event.”

The project has shaped Park’s own career journey, showing how she can make a tangible impact in the field of forced displacement, she says. She’s considering pursuing a graduate degree in computer science – a field she never would have considered without this project. 

“The project has given me a better idea of what working at the intersection of computing and public policy can look like, and gave me a more concrete lens on how I can make my impact in the field of forced displacement,” she says. “I know I will still end up in a migration-related career, but I now feel like I have the tools to make an even more meaningful difference.”

The Massive Data Institute is one of the centers that collaborate in the Georgetown Initiative on Tech & Society, a cross-campus network that creates novel approaches for interdisciplinary collaboration, research, understanding and action at the intersection of technology, ethics and governance. During the initiative’s annual Tech & Society Week, on March 29, Donato will be speaking about the use of artificial intelligence in migration and border control.

“We are so proud of the Massive Data and Displacement team for this new partnership with UNHCR,” said Paul Ohm, inaugural chair of the Tech and Society Initiative. “This is exactly the kind of impactful, interdisciplinary work we try to support through our Initiative.”

Recently, the Massive Data and Displacement team was named one of eight inaugural recipients of $2.3 million in grants from the McCourt Institute and Georgetown for work developing technology for the common good.

Singh is also the recipient of an NSF Research Experiences for Undergraduates award that supports undergraduate research and focuses on connecting computer science and data science to public policy. As part of the award, nine students from around the country will engage with the MaDD research team this summer at Georgetown and work to improve and add to the team’s set of indirect indicators.