(Satellite images used by the DataKind team to detect mining activity. Source: Jake Porway.)

DataKind aims to correct a power imbalance.

Until recently, data science and machine learning technologies have primarily been used to sell us things: extracting and mining our personal data to improve recommendation engines and bolster online advertising. DataKind, a network of pro-bono data scientists, is helping to bring these technical skills to organisations tackling big humanitarian problems.

Whereas businesses incubate technologies and apply them to generate profits, DataKind uses technology as a way of ‘Getting Power to the People’ by making the power of data analytics accessible to socially oriented organisations. Through short-term projects, DataKind volunteers donate their computational skills to help those organisations maximize their social impact. In his talk, DataKind’s Executive Director, Jake Porway, walked the audience through three such projects, framing them as battles of Davids and Goliaths. On one side, under-resourced organisations defend the public interest. On the other, powerful actors wield technology for different, sometimes destructive, ends.

Presentation slides showing the title of the talk and examples framed as David vs Goliath. Credit: Jake Porway.

Apart from weekend-long Data Dives, DataKind projects last around nine months. After a scoping period and an ethical orientation, volunteers work with partner organisations to develop a data-driven solution to a problem. As it turns out, finding the solution to these problems might not be the hardest part.

Finding a ‘machine learnable’ problem

For a partnership to work, DataKind and the partner organisation must first clearly identify the project’s ultimate goal. Figuring out which problems to try and solve can be a challenge. Wicked problems, for example, do not make for good machine learning projects. Comprising many interdependent sub-problems with no clear structure, wicked problems are hard to define. Problems like ‘homelessness’ or ‘climate change’ have no easy solutions, only better or worse ones.

In order for machine learning to be helpful, aspirational goals must be broken down into smaller problems with quantifiable objectives. However, if the problem is fundamentally one of human behavior, such as persuading decision-makers or generating buy-in from stakeholders, then DataKind might not be a good partner for that organisation. DataKind sees their role as using data to inform rather than influence. To find the right kind of problems to solve, DataKind conducts an intensive scoping process, making sure that any potential project and partnership is a good fit.

The end result is a specifically defined problem. Consider, for example, DataKind’s work with Global Witness, a nonprofit organisation that uses legal advocacy to fight environmental and human-rights abuses in the extractives industry. Making mining more humane is a complex problem, and during the scoping process, DataKind and Global Witness identified mine detection as a good target for machine learning.

Mining information to detect mines

To more easily detect certain kinds of mining activity, the DataKind team collected satellite images of known mines and used them to train a machine learning model to identify other potential mines. Once the team had tested the model on new satellite images and verified its output with partners on the ground, the model was able to predict mine locations with over 95% accuracy.

Satellite imagery used to train the machine learning model (left) and the model’s output showing the mines’ locations (right). Credit: Soubhik Barari.
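
The talk did not go into the technical details of the model, but the general workflow described above (labelled satellite tiles in, a binary mine/no-mine classifier out) can be sketched roughly as follows. The tile size, placeholder data, and small convolutional network in this sketch are illustrative assumptions, not DataKind’s actual implementation.

```python
"""Rough sketch of a mine-detection classifier on satellite image tiles.

Assumptions (not from the talk): 64x64 RGB tiles and binary labels
(1 = known mine, 0 = no mine), classified with a small convolutional network.
"""
import numpy as np
import tensorflow as tf

# Placeholder data: in practice these tiles would be cut from satellite
# imagery and labelled using a list of known mine locations.
rng = np.random.default_rng(0)
X = rng.random((500, 64, 64, 3), dtype=np.float32)  # image tiles
y = rng.integers(0, 2, size=500)                     # 1 = mine, 0 = no mine

# Hold out a test set so accuracy is measured on tiles the model never saw.
split = 400
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]

# A small CNN that outputs the probability that a tile contains a mine.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 64, 3)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

model.fit(X_train, y_train, epochs=5, validation_split=0.2, verbose=2)

# Evaluate on held-out tiles; field verification with partners on the ground
# would still be needed before acting on any predicted mine location.
loss, acc = model.evaluate(X_test, y_test, verbose=0)
print(f"Held-out accuracy: {acc:.2%}")
```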

By focusing on a bite-sized chunk of a large, complex problem, DataKind was able to provide Global Witness with actionable information that they could use to support their investigations and legal advocacy.

Given recent events calling attention to the potential negative uses of algorithms and machine learning, what is DataKind’s role in supporting ethical data science? Although DataKind provides an ethical orientation for volunteers, it’s not their ultimate mission to change the industry.
Jake maintains that DataKind’s greatest value added is not necessarily in the projects they work on, but in being a catalyst for future work (one organisation even hired a DataKind volunteer full time!).

When pressed, Jake explained that he would rather work with data scientists who already have a strong ethical foundation than go through the work of creating brand new ethical data scientists. Even though DataKind’s mission isn’t explicitly to create ethical data scientists, organisations like DataKind, and of course MIT, can play a role in training data scientists to apply cutting-edge technical skills to do good in the world.