Local government finance is fundamental to modern society and deadly boring to almost everyone. In the United States, municipal bonds (debt taken out by towns and cities) fund everything from schools and roads to power and gas. Modern life would not be possible without them, and a bond issue can shape a city for generations and constrain its budget for just as long. Municipal bonds are often the subject of public votes, at least in the US. But those votes have low turnout, because most people are put off by the highly technical subject matter.
As a governance and civic tech practitioner, I was interested in seeing whether artificial intelligence and machine learning (AI/ML) could help process the technical language that surrounds municipal bonds, and hence make them both easier to fund and easier to engage with. As a first step, I built a model to predict whether a bond would default, using only public information. That is a hard task which, if solved, could open the way to predicting other outcomes of a bond. The first results are very promising: the model predicts 9 out of 10 future defaults, and encodes the text of a bond’s prospectus in a way that offers promise for future work (the complete research paper is available online).
Municipal bonds: Boring, risky, and immensely important
In many legal codes, any new bond requires a public ballot. Every now and then, a proposed project becomes the subject of significant public debate. But that is rare: only a few thousand people may turn up to vote on hundreds of millions of dollars’ worth of bonds, and turnout in bond elections is, on average, below 15% of the electorate, even for school bonds. To most people, municipal bonds are highly technical, involving complex language about durations, coupons, yields and purposes. So this fundamental element of democratic governance is, de facto, locked away from participation, even in the US, where municipal bonds have been issued since the country was founded.
But municipal bonds are also very boring to technical specialists, because they are standardised and often small in size. In the movie “The Big Short”, one character asks another, as if referring to a form of exile, “What are you going to do, drop out and go work on municipal bonds?” Most banks and investors only pay attention to such bonds in the US because they have a favourable tax treatment. Regulators and the financial press pay even less attention, so some municipal bonds are treated as more risky in bank regulation than mortgages, despite mortgages defaulting thirty times more often. That’s partly why capital pours into expensive apartments more readily than into the infrastructure beneath them.
So, though they may appear boring, working on municipal bonds holds out the promise of both increasing the number of clinics, schools, renewable energy projects and other infrastructure built in the world, and increasing democratic engagement with that construction. That’s just in the US. In the developing world, in countries where local governments can borrow in theory (such as South Africa and India), municipal bond markets barely exist in practice. There are many reasons for that, but among them are difficulties in evaluating the risk of the debt, as well as limited public appetite and engagement.
Predicting bond default rates with neural networks and machine learning
Municipal bonds are a great problem for AI/ML. First, there is a lot of data, and clean, standardized data too. The primary dataset has over 4 million bonds, financing over 440,000 projects. That’s hundreds of thousands of pieces of infrastructure across the US over fifty years, with codes for the project purpose, the state where it was undertaken, a project description and more. Admittedly, the data does not say whether a project achieved its desired outcomes, such as whether the schools or clinics were built well and operated well, but it can at least tell us if a project failed to the extent that the bond financing it defaulted.
Using this data, as part of my exploration of using AI to support democratic governance, I trained a neural network to predict whether a municipal bond would default or not. This involved some practical difficulties, because only one in a thousand bonds actually defaults. That means a model has to look for a needle in a haystack, and a candidate model is more likely to look good in theory but perform badly in real life.
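To make the needle-in-a-haystack problem concrete, here is a minimal sketch in Python. It is an illustration only, not the model or evaluation from the paper: the helper name `accuracy_and_recall` and the 100,000-bond sample are assumptions chosen to match the one-in-a-thousand default rate. It shows how a "model" that never predicts a default can score 99.9% accuracy while finding none of the defaults.

```python
def accuracy_and_recall(labels, predictions):
    """Return (accuracy, recall) for binary labels where 1 means 'defaulted'."""
    pairs = list(zip(labels, predictions))
    correct = sum(1 for y, p in pairs if y == p)
    true_positives = sum(1 for y, p in pairs if y == 1 and p == 1)
    actual_positives = sum(labels)
    accuracy = correct / len(labels)
    # Recall: the share of real defaults that the model flagged.
    recall = true_positives / actual_positives if actual_positives else 0.0
    return accuracy, recall


# Illustrative data (an assumption): 100,000 bonds, of which 100 default.
labels = [1] * 100 + [0] * 99_900

# A useless baseline that predicts "no default" for every bond.
always_safe = [0] * len(labels)

baseline_accuracy, baseline_recall = accuracy_and_recall(labels, always_safe)
print(f"accuracy={baseline_accuracy:.3f}, recall={baseline_recall:.1f}")
# accuracy=0.999, recall=0.0
```

This is why the share of defaults actually caught (recall) is a more telling number for this problem than overall accuracy.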
I also excluded the ratings given to municipal bonds by credit rating agencies, which are usually only available to insiders and have doubtful records. If a model could be trained to find the needle in a haystack that is a defaulting municipal bond, using only the information publicly available when the bond was first issued, then those predictions could easily be made available to citizens when they consider a proposed bond. Moreover, the same model might be retrained for other questions of interest, or extended to developing markets.
Applying AI/ML techniques to democratic governance: still work to do
The results were quite striking. In a held-out dataset of 400,000 bonds, with 400 defaults, a trained model was able to identify 90% of the bonds that would default, using only what was known at the time each bond was issued (technical details in the paper here). The key step was to use what are called “text embeddings” on the bonds’ project descriptions. That technique uses the power of large-scale models for natural language understanding, described in an earlier post on trying to make a model write Shakespeare, to turn chunks of text (in this case the project description) into vectors that encode highly contextual information as numbers. The AI model can then use those numbers, representing how the bond is described, along with project size and other characteristics, to make predictions. On examining the model’s workings, we can tell that these “embeddings” are among the most important features it considers, in combination with the state where a project takes place, the project sector and a few dimensions of the macro-economy at the time.
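As a rough sketch of how a text embedding can sit alongside a bond's other characteristics, the Python below builds a single feature vector from a project description, a state, a sector and a size. Everything here is an assumption for illustration: `toy_embed` is a hashed bag-of-words stand-in, not a learned language-model embedding, and the category lists, dimension and function names are hypothetical rather than taken from the paper.

```python
import zlib

EMBED_DIM = 8  # hypothetical; real text embeddings have hundreds of dimensions
STATES = ["CA", "NY", "TX"]  # hypothetical subset of states
SECTORS = ["education", "health", "utilities"]  # hypothetical sector codes


def toy_embed(text, dim=EMBED_DIM):
    """Stand-in for a learned embedding: hashed bag of words, L2-normalised."""
    vec = [0.0] * dim
    for word in text.lower().split():
        # Hash each word into one of `dim` buckets and count it there.
        vec[zlib.crc32(word.encode("utf-8")) % dim] += 1.0
    norm = sum(v * v for v in vec) ** 0.5
    return [v / norm for v in vec] if norm else vec


def one_hot(value, categories):
    """Encode a categorical value as a list of 0.0/1.0 indicators."""
    return [1.0 if value == c else 0.0 for c in categories]


def build_features(description, state, sector, size_usd_millions):
    """Concatenate the text vector with the tabular characteristics."""
    return (
        toy_embed(description)
        + one_hot(state, STATES)
        + one_hot(sector, SECTORS)
        + [size_usd_millions]
    )


features = build_features(
    "construction of a new elementary school", "TX", "education", 25.0
)
print(len(features))  # 8 + 3 + 3 + 1 = 15
```

A classifier would then be trained on vectors like `features`. With a real embedding model in place of `toy_embed`, descriptions with similar meanings would land near each other in the vector space, which the hashing stand-in cannot do.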
How does this all link to better understanding democratic governance? The basic techniques and ideas that I’ve used here have become a rich seam of research in my ongoing exploration. I have found that, unfortunately, the current generation of text-based models, even at the leading edge, is just not yet able to generate text that is useful in governance, or to handle very long legal and regulatory documents.
But they can encode the information found in fairly standard but technical language, like that in sovereign bonds and development finance, so that the information can be combined to predict potential outcomes. I’m exploring similar techniques to predict which World Bank projects make a difference to development, and which don’t, again with promising results. I’ll be posting a lot more about both that work and further results on local government debt in the last stretch of my time as a practitioner in residence at MIT.
Photo by Finn Gerkens on Unsplash.