Global Tool Methodology

Abstract

We present a machine learning-based methodology that uses a random forest model to forecast conflict (defined as organized violence resulting in at least 10 fatalities over a 12-month period) up to a year in advance. When applied to reserved test data, the model captures 86% of future conflicts. The model's conflict signal is noisy, however: half of its conflict predictions are false positives.

Water-related variables are assessed to be correlated with conflict outcomes, but not empirically significant for model decision-making. However, adjusting the definition of conflict, such as by lowering the fatality threshold or examining only emerging conflict, increases the significance of water variables.

A web-based tool that houses the model allows users to explore forecasts and individual predictor indicators spatially and through time, providing additional information on underlying vulnerabilities as a first step toward enabling timely, effective water-related interventions to mitigate conflict and/or build peace. 

Model Overview

The overall objective of the Water, Peace and Security partnership is to offer a platform where actors from the global defense, development, diplomacy, and disaster relief sectors (among others) and national governments of developing countries can identify conflict hotspots before violence erupts, begin to understand the local context, and prioritize opportunities for water interventions. This requires information that is timely, accurate, and actionable. To meet user demand for early warning information, we will release updated 12-month conflict forecasts every three months. The tool allows users to explore these forecasts of ongoing and emerging conflicts, as well as underlying model inputs and contextual indicators across a variety of domains.

The foundation of our forecasting is an expansive library of quantitative indicators potentially related to conflict. The indicators used in our model—predictor variables—are available for exploration as both interactive maps and time series. Some indicators are not fit for use in a quantitative model, but are nonetheless useful for decision-making. The tool includes these contextual indicators alongside the model inputs as spatial data. All tool functionality—access to datasets, metadata, and geospatial visualization specifications—is powered by the Resource Watch API, an open-source service designed to easily integrate into user workstreams. 

The WPS Global Early Warning Tool is not intended to elucidate causal relationships between the predictor variables and conflict. It does, however, highlight instances of water shocks (i.e. heavy rains or drought), reflecting our audience’s interest in water-related interventions. While we do not know or claim that water shocks drive conflict, we do believe they are important for screening when on-the-ground adaptation measures are needed.

Model Performance

Overall, the model captures 86% of future conflicts, successfully forecasting over 9 out of every 10 ongoing conflicts and 6 out of 10 emerging conflicts. The trade-off for this high recall is low precision for emerging conflicts: around 80% of all emerging conflict forecasts are false positives, that is, instances where conflict was forecast but did not actually occur. By contrast, forecasts of ongoing conflicts have both high recall and high precision (fewer than 1% were false positives).
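
The relationship between recall, precision, and the share of false positives can be made concrete with a short worked example. The counts below are hypothetical, chosen only so that recall comes out at 0.6 and 80% of positive forecasts are false positives; they are not the model's actual results.

```python
def recall(tp: int, fn: int) -> float:
    """Share of actual conflicts that the model forecast."""
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Share of conflict forecasts that turned out to be correct."""
    return tp / (tp + fp)


# Hypothetical counts for emerging conflicts (NOT taken from the model's results):
# tp = conflicts forecast and observed, fn = conflicts missed,
# fp = conflicts forecast but not observed.
tp, fn, fp = 60, 40, 240

emerging_recall = recall(tp, fn)            # 60 / 100
emerging_precision = precision(tp, fp)      # 60 / 300
false_positive_share = fp / (tp + fp)       # 240 / 300

print(f"recall: {emerging_recall:.2f}")
print(f"precision: {emerging_precision:.2f}")
print(f"false positive share: {false_positive_share:.2f}")
```

With these illustrative counts, 6 in 10 emerging conflicts are caught, but only 1 in 5 emerging conflict forecasts is correct.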

Application

Like an initial medical screening, our model is optimised to flag all concerning cases for further analysis. In other words, we would rather wrongly forecast the presence of conflict than incorrectly forecast its absence (i.e. ‘peace’, in the strictly negative sense). For this reason, we prioritise recall over the other metrics. The downside to this decision is that our model is likely to overestimate conflict.
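
One common way to prioritise recall is to lower the probability threshold at which a forecast is counted as 'conflict'. The sketch below illustrates that general idea only; the text does not state how recall was prioritised in the WPS model, and the function names, probabilities, and labels are all hypothetical.

```python
def threshold_for_recall(probs, labels, target_recall):
    """Return the highest threshold whose recall is at least target_recall."""
    positives = sum(labels)
    if positives == 0:
        raise ValueError("labels contain no conflicts")
    if not 0 < target_recall <= 1:
        raise ValueError("target_recall must be in (0, 1]")
    # Try each distinct predicted probability as a threshold, highest first.
    for t in sorted(set(probs), reverse=True):
        tp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 1)
        if tp / positives >= target_recall:
            return t
    # Not reachable: the lowest threshold flags every case, so recall is 1.0.
    raise RuntimeError("no threshold found")


def precision_at(probs, labels, threshold):
    """Share of cases flagged at this threshold that are actual conflicts."""
    flagged = [y for p, y in zip(probs, labels) if p >= threshold]
    return sum(flagged) / len(flagged)


# Hypothetical predicted probabilities and observed outcomes (1 = conflict).
probs = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 0, 1, 0, 1, 0]

moderate = threshold_for_recall(probs, labels, target_recall=0.5)
strict = threshold_for_recall(probs, labels, target_recall=0.9)

print(moderate, precision_at(probs, labels, moderate))
print(strict, precision_at(probs, labels, strict))
```

In this toy data, demanding higher recall pushes the threshold down and precision falls with it, which is the same trade-off described above: more conflicts are caught, at the cost of more false alarms.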

Users interested in the ongoing conflict forecasts can have high confidence in the forecast, and may feel comfortable acting on this information immediately. For emerging conflicts, users can view these results as a ‘first screening’, feeling confident that our ‘net’ has caught most emerging conflicts, but acknowledging they are interspersed with many instances of peace. These users can then engage with WPS to use its ‘Regional Tool’ to conduct rapid assessments of local conditions in potential hotspot areas (or conduct their own local assessments) before deciding whether to engage with national and local stakeholders (with or without WPS) to identify and prioritise intervention opportunities.

Technical Details

Model type

We used a random forest (RF) model, an ensemble supervised learning method, to forecast conflict.

Unit of analysis

Understanding that a useful forecast must specify both where and when something is expected to happen, our unit of analysis has two dimensions: the ‘district-month’ represents a given second-level administrative unit (hereafter, a ‘district’) in a given month. Our district boundaries are based on the Database of Global Administrative Areas (GADM 2018).

We eventually plan to produce forecasts for every district around the world. However, the current geographic scope is limited to the following regions: Africa, South and Southeast Asia, and the Middle East.

The model was trained using data from January 2004 through May 2016. Performance (described above) was assessed using data from January to June 2018.    
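
As a minimal sketch of how district-month records could be represented and divided into the training and test periods given above: the `DistrictMonth` type, the `split_by_period` function, and the district identifiers are hypothetical and used for illustration only; only the date ranges come from the text.

```python
from datetime import date
from typing import NamedTuple


class DistrictMonth(NamedTuple):
    district_id: str  # hypothetical identifier for a second-level admin unit
    month: date       # first day of the month


# Periods stated in the text, expressed as first-of-month dates (inclusive).
TRAIN_START, TRAIN_END = date(2004, 1, 1), date(2016, 5, 1)
TEST_START, TEST_END = date(2018, 1, 1), date(2018, 6, 1)


def split_by_period(records):
    """Split district-month records into (train, test) by month."""
    train = [r for r in records if TRAIN_START <= r.month <= TRAIN_END]
    test = [r for r in records if TEST_START <= r.month <= TEST_END]
    return train, test


# Hypothetical records.
records = [
    DistrictMonth("district-A", date(2004, 1, 1)),
    DistrictMonth("district-A", date(2016, 5, 1)),
    DistrictMonth("district-B", date(2017, 3, 1)),  # falls in neither period
    DistrictMonth("district-B", date(2018, 6, 1)),
    DistrictMonth("district-A", date(2018, 7, 1)),  # after the test period
]

train, test = split_by_period(records)
print(len(train), len(test))
```

Records dated between the two periods, or after June 2018, fall into neither set in this sketch.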

Dependent variable

We used the Armed Conflict Location and Event Dataset (ACLED) to develop our dependent variable (Raleigh et al. 2010). A qualifying conflict event was any instance of an event type listed in the table below. We constructed the dependent variable as a binary value for each district-month, based on all qualifying events occurring in that district over the following 12 months. ‘Yes’ was assigned to district-months with 10 or more fatalities over the next year, and ‘No’ to all others, representing peace.
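
The labelling rule can be sketched for a single district as follows. This is an illustration under stated assumptions, not the authors' code: it assumes the 'following 12 months' start with the month after the one being labelled, that fatalities have already been limited to qualifying events, and that months without a full 12-month look-ahead are left unlabelled. The function name and the input series are hypothetical.

```python
def label_district_months(monthly_fatalities, horizon=12, threshold=10):
    """Label each month True ('Yes') or False ('No') for one district.

    monthly_fatalities: fatality counts for consecutive months.
    A month is labelled True when fatalities over the next `horizon`
    months (assumed to exclude the current month) reach `threshold`.
    Months without a complete look-ahead window are labelled None.
    """
    labels = []
    for t in range(len(monthly_fatalities)):
        window = monthly_fatalities[t + 1 : t + 1 + horizon]
        if len(window) < horizon:
            labels.append(None)  # not enough future data to decide
        else:
            labels.append(sum(window) >= threshold)
    return labels


# Hypothetical series: 18 months with one event causing 12 fatalities in month 4.
fatalities = [0, 0, 0, 12] + [0] * 14
labels = label_district_months(fatalities)
print(labels)
```

In this toy series, the three months before the event are labelled 'Yes' because the event falls inside their 12-month look-ahead window, while the event month itself is labelled 'No' because no further fatalities follow it.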

Predictor variables

Of the more than 80 indicators tested, the following were most relevant for producing the forecasts and are therefore used in our current model:

References

CIESIN. 2016a. Gridded Population of the World (GPW), v4: UN-Adjusted Population Count, v4 (2000, 2005, 2010, 2015, 2020). Palisades, NY: SEDAC. https://dx.doi.org/10.7927/H4HX19NJ.

CIESIN. 2016b. Gridded Population of the World (GPW), v4: UN-Adjusted Population Density, v4 (2000, 2005, 2010, 2015, 2020). Palisades, NY: SEDAC. https://dx.doi.org/10.7927/H4HX19NJ.

GADM. 2018. GADM Database of Global Administrative Areas 3.6 (version 3.6). Global Administrative Areas. https://gadm.org/download_country_v2.html.

Hofste, Rutger, Samantha Kuzma, Paul Reig, Sara Walker, Edwin H. Sutanudjaja, Marc F. P. Bierkens, M. Kuijpers-Linde, et al. 2019. Aqueduct 3.0: Updated Decision-Relevant Global Water Risk Indicators. Technical Note. Washington, DC: World Resources Institute.

International Food Policy Research Institute. 2019. “Global Spatially-Disaggregated Crop Production Statistics Data for 2010 Version 1.0.” Harvard Dataverse. doi:10.7910/DVN/PRFF8V.

Johnson, Stephanie J., Timothy N. Stockdale, Laura Ferranti, Magdalena A. Balmaseda, Franco Molteni, Linus Magnusson, Steffen Tietsche, et al. 2019. “SEAS5: The New ECMWF Seasonal Forecast System.” Geoscientific Model Development 12 (3): 1087–1117. doi:10.5194/gmd-12-1087-2019.

PBL. 2018. Towards an Urban Preview: Modelling Future Urban Growth with 2UP. 3255. The Hague: PBL Netherlands Environmental Assessment Agency. https://www.pbl.nl/en/publications/towards-an-urban-preview.

Raleigh, Clionadh, Andrew Linke, Håvard Hegre, and Joakim Karlsen. 2010. “Introducing ACLED: An Armed Conflict Location and Event Dataset: Special Data Feature.” Journal of Peace Research 47 (5): 651–660. doi:10.1177/0022343310378914.

United Nations, Department of Economic and Social Affairs, Population Division. 2019. World Population Prospects: Methodological Updates. New York: United Nations. https://population.un.org/wpp/Publications/Files/WPP2019_Methodological-updates.pdf.

van Vuuren, Detlef P., Paul L. Lucas, and Henk Hilderink. 2007. “Downscaling Drivers of Global Environmental Change: Enabling Use of Global SRES Scenarios at the National and Grid Levels.” Global Environmental Change, Uncertainty and Climate Change Adaptation and Mitigation, 17 (1): 114–130. doi:10.1016/j.gloenvcha.2006.04.004.

WHO, and UNICEF. 2017. Progress on Drinking Water, Sanitation and Hygiene: 2017 Update and SDG Baselines. Geneva: World Health Organization (WHO) and the United Nations Children’s Fund (UNICEF). https://washdata.org/.

World Bank Group. 2016. “Agriculture, Value Added (% of GDP).” https://data.worldbank.org/indicator/NV.AGR.TOTL.ZS.

World Bank Group. 2017. “GDP, PPP, (Constant 2011).” https://data.worldbank.org/indicator/NY.GDP.PCAP.PP.KD.