The complete report is available online.
An ongoing challenge in international development is determining how foreign aid can best contribute to development projects and how to measure whether that aid is helpful. A new World Bank working paper, co-authored by MIT GOV/LAB partner affiliate Luke Jordan, examines whether certain aspects of development projects are correlated with the impact of aid in recipient countries.
The researchers first looked at donors’ own ratings of project success and found that they have little connection to projects’ actual success. They showed instead that projects that are highly customized to a specific country’s context are tied to better development outcomes.
“Contextualization predicts impact better than any other factor,” Jordan says.
High project ratings aren’t actually tied to successful projects
When projects end, donors typically evaluate their projects to see if they’ve been successful, rating them on a numerical scale. These ratings are used not only within donor institutions, but also in the academic literature. Papers studying whether aid has been successful “will often use these ratings and take them at face value,” says Jordan, who worked on the research during his time as an MIT GOV/LAB practitioner-in-residence in 2021.
But Jordan and his colleagues were skeptical that these ratings held much meaning. Project designers can set easily achievable goals for their projects or limit their evaluation to a short time frame in order to receive a higher rating.
“In my time at the World Bank and then afterwards, I didn’t meet anybody who didn’t think the ratings were gamed who’s ever practically used them,” Jordan says.
To test whether donor ratings were actually correlated with development outcomes, the researchers took ratings from eight donors in 183 recipient countries since the 1990s, along with outcome data in five sectors — health, education, water and sanitation, energy, and fiscal management. They wanted to see whether the projects were associated with improvements in those sectors beyond what would have been expected in their absence.
The researchers found that the projects themselves had a small positive effect on outcomes such as increased primary school enrollment, increased rural access to water, and decreased infant mortality. The project ratings, on the other hand, were not tied to any of these outcomes except in fiscal management. Big projects in particular were more likely to have a large gap between their ratings and their outcomes.
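A stylized version of this kind of check, using entirely hypothetical data and variable names (the paper’s actual specification is more involved), would regress the change in a sector outcome over the project period on the donor’s rating:

```python
# Illustrative sketch only: a hypothetical panel of projects with donor ratings
# and the change in a sector outcome over the project period.
import pandas as pd
import statsmodels.formula.api as smf

projects = pd.DataFrame({
    "rating":         [5, 4, 6, 3, 5, 2, 6, 4],                   # donor's own success rating
    "outcome_change": [0.8, 1.1, 0.4, 0.9, 0.2, 0.7, 1.0, 0.3],   # e.g., change in enrollment rate
    "sector": ["education", "education", "health", "health",
               "water", "water", "energy", "energy"],
})

# If ratings tracked real impact, the rating coefficient should be positive and
# significant after accounting for sector differences; the paper finds essentially no link.
model = smf.ols("outcome_change ~ rating + C(sector)", data=projects).fit()
print(model.summary())
```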
“We tried to p-hack our way to finding significance in the ratings, and we failed,” Jordan says.
Using machine learning to measure contextualization
To find out which factors actually did influence project success, the researchers looked specifically at World Bank projects, for which more data was available. They used a machine learning model similar to the one powering ChatGPT to condense the information in key documents describing the projects into points on a two-dimensional graph. The distance between projects on the graph reflected how similar they were to each other, with nearby points corresponding to similar projects and faraway points to dissimilar ones.
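The paper does not name the exact model or projection method, so the sketch below uses an open-source sentence-embedding model and a standard dimensionality-reduction step purely as stand-ins, with made-up document text:

```python
# Illustrative sketch only: the embedding model and the 2-D projection here are
# generic stand-ins, not the specific tools used in the paper.
from sentence_transformers import SentenceTransformer
from sklearn.decomposition import PCA

# Hypothetical project appraisal documents (one string per project).
documents = [
    "Expand rural water supply and sanitation in drought-prone districts...",
    "Strengthen primary health care delivery and reduce infant mortality...",
    "Improve teacher training and primary school enrollment...",
]

# Turn each document into a high-dimensional vector that captures its content.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = encoder.encode(documents)

# Compress the vectors to two dimensions so each project becomes a point on a plane;
# nearby points correspond to projects with similar documents.
coords = PCA(n_components=2).fit_transform(embeddings)
print(coords)
```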
Without any prompting from the researchers, the model loosely grouped the documents by their sector focus, a sign that the model was successfully capturing the relationships between the projects.
Jordan and his co-authors interpreted the distance on the graph between a project and the average of the projects in the same sector as a measure of that project’s contextualization. The thinking was that projects near the average were very similar to other projects in that sector, and thus weren’t specific to their context. Projects farther from the average were quite different from others in the same sector, which could be because they were uniquely tailored to their context.
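As a rough illustration of that distance measure (again with hypothetical coordinates and names, not the paper’s actual variables), one could compute each project’s distance from the average position of projects in its sector:

```python
# Illustrative sketch: treat "contextualization" as each project's distance from
# the average position of same-sector projects on the 2-D plane.
import numpy as np
import pandas as pd

# Hypothetical 2-D coordinates (e.g., from a projection step like the one above) and sectors.
projects = pd.DataFrame({
    "project_id": ["P1", "P2", "P3", "P4"],
    "sector": ["water", "water", "health", "health"],
    "x": [0.1, 0.9, -0.4, -0.5],
    "y": [0.2, -0.3, 0.8, 0.7],
})

# Average position of each sector's projects, aligned row by row with the projects.
centroids = projects.groupby("sector")[["x", "y"]].transform("mean")

# Euclidean distance from each project to its sector average:
# larger distances are read as more contextualized (less like the sector template).
projects["contextualization"] = np.sqrt(
    (projects["x"] - centroids["x"]) ** 2 + (projects["y"] - centroids["y"]) ** 2
)
print(projects)
```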
Using a project’s distance from its sector average as a proxy for contextualization, the researchers found that contextualization was indeed strongly correlated with improvements in sector outcomes. The findings backed up what Jordan had heard in conversations with people in the field.
“Practitioners in the field will generally tell you that contextualization makes a big difference to a project, but there hadn’t been a way to quantitatively test that,” he says.
The findings indicate that donors’ project ratings shouldn’t be relied on too heavily when evaluating a project’s success, and that project designers should pay more attention to contextualization.
Jordan also says that project ratings would be harder to game and projects themselves would be improved if project proposals had to explicitly and publicly state what outcomes might look like if the project didn’t happen. They should have to specify, for example, the extent to which infant mortality rates might continue to decline in an area even without the project.
“People need to be a lot more careful and clearer about what they think they’re achieving versus what would happen anyway,” he says.
Photo by Diana Polekhina on Unsplash.