(Photo by TheeErin, Flickr Creative Commons License).
How does my local government spend taxpayer money? What does this year’s budget breakdown for my town look like? Where can I find an online recap of that town hall meeting I missed?
These are questions that citizens might ask; however, for some, the answers are nowhere to be found. For example, one study found that the vast majority of New York’s largest counties failed to provide accessible information on public meetings, financial reports, and fiscal year budgets in 2014. Similar local government “report cards” exist for other states, but they are few and far between due to the amount of manual labor required for such audits. In other words, the overall state of local government transparency in the U.S. remains a mystery.
At MIT GOV/LAB, we created a big data framework to solve this problem. We developed an automated pipeline that scrapes government websites and classifies whether different types of information (e.g., meeting minutes, finance reports, annual budgets) are present. After finding that most off-the-shelf classification algorithms performed poorly at locating such specific information on websites, we developed our own algorithm from the ground up. Our approach is thorough: using a hierarchical machine learning algorithm, we predict “yes” or “no” for each site by combing through every HTML element on individual web pages for evidence of each information type.
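To make the hierarchical idea concrete, here is a minimal sketch of how element-level evidence could roll up into a site-level "yes" or "no". It is not our actual model: the keyword lists, the function names, the max-based aggregation, and the threshold are all illustrative assumptions, with simple keyword matching standing in for a trained element-level classifier.

```python
from html.parser import HTMLParser

# Hypothetical keyword lists standing in for a trained element-level model.
KEYWORDS = {
    "minutes": ("meeting minutes", "minutes of the"),
    "budget": ("annual budget", "fiscal year budget", "adopted budget"),
}


class ElementTextParser(HTMLParser):
    """Collects each non-empty text node found inside the page's elements."""

    def __init__(self):
        super().__init__()
        self.texts = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.texts.append(text.lower())


def element_score(text, info_type):
    """Level 1: score one text node (1.0 if any keyword appears, else 0.0)."""
    return 1.0 if any(k in text for k in KEYWORDS[info_type]) else 0.0


def page_score(html, info_type):
    """Level 2: a page's score is the maximum score over its text nodes."""
    parser = ElementTextParser()
    parser.feed(html)
    return max((element_score(t, info_type) for t in parser.texts), default=0.0)


def site_has(pages, info_type, threshold=0.5):
    """Level 3: predict "yes" if any page's score reaches the threshold."""
    return any(page_score(p, info_type) >= threshold for p in pages)
```

Given a list of raw HTML strings for one site, `site_has(pages, "minutes")` returns a single boolean for that information type. A learned model would replace `element_score` with a probability, and the aggregation rules could themselves be learned rather than fixed.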
The result: transparency report cards for over 9,000 municipalities across nearly all 50 states. In this preliminary sample, we found that almost 70% of sites have posted up-to-date meeting agendas and meeting minutes. Just over 50% of townships made their most recent fiscal year budget available, while more than half of websites failed to post FOIA (Freedom of Information Act) request instructions, information about bids, or finance reports.
Some states clearly perform better than others, including coastal states with strong sunshine laws such as California and Massachusetts. Among southern states, Texas performs exceptionally well, likely due to the Texas Comptroller Office’s comprehensive and visible transparency efforts in recent years. What explains low transparency for localities in other states? We hypothesize that bureaucratic capacity is a strong predictor of online transparency: a town that can afford competent IT staff will probably end up with a well-designed, easy-to-navigate, and informative website. We are currently collecting more transparency data, along with indicators of bureaucratic capacity, in order to test this hypothesis.
The good news is that for many municipalities, government transparency has been improving over time. Using the Wayback Machine, a public archive of website snapshots, we fed historical versions of local government websites into our classification algorithm. So far, with nearly 2,000 sites back-tested, we find a significant increase in the percentage of websites in our sample containing agendas, budgets, and minutes between 2000 and 2015.
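For readers curious how historical snapshots can be located, the sketch below uses the Internet Archive's public availability endpoint, which returns the archived snapshot closest to a requested timestamp. The helper names and the choice of January 1 as the reference date are assumptions made for illustration; this is not necessarily how our pipeline retrieves snapshots.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(site, year):
    """Build a query asking for the snapshot closest to January 1 of `year`."""
    query = urlencode({"url": site, "timestamp": f"{year}0101"})
    return f"{AVAILABILITY_ENDPOINT}?{query}"


def closest_snapshot(response):
    """Return the archived snapshot URL from a parsed response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def fetch_snapshot(site, year):
    """Network call: look up the closest snapshot for one site and year."""
    with urlopen(availability_url(site, year), timeout=30) as resp:
        return closest_snapshot(json.load(resp))
```

Calling `fetch_snapshot` once per site and year yields a list of archived URLs that can be scraped and classified like live pages. The closest snapshot may be dated far from the requested year, so its timestamp is worth checking before the result is assigned to a given year.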
A history of improving transparency in online government is promising, and targeted interventions might further accelerate this trend. In the final stage of our project, we are asking: does making local governments aware of their transparency performance actually nudge them to improve? To answer this question, we plan to share our predicted transparency grades (an A indicates the presence of all information types, while an F denotes the absence of all of them) with government officials. Using our scraping pipeline, we can then re-evaluate these same sites over time to see whether grades improve. Additionally, we hope to open-source our transparency data and measurement tools. Through collaboration and iteration, we aim to expand beyond the local level and the U.S. in a global effort to improve online information transparency at all levels of government.
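As an illustration of the grading idea, the sketch below maps per-site predictions to a letter grade. Only the endpoints follow the definition above (A when every information type is present, F when none is); the list of information types, the function name, and the B, C, and D cutoffs are assumptions made purely for this example.

```python
# Information types mentioned in this post; the exact list is an assumption.
INFO_TYPES = ("agendas", "minutes", "budget", "finance_reports", "bids", "foia")


def transparency_grade(present):
    """Map a dict of {info_type: bool} predictions to a letter grade."""
    total = len(INFO_TYPES)
    found = sum(1 for t in INFO_TYPES if present.get(t, False))
    if found == total:
        return "A"  # all information types present
    if found == 0:
        return "F"  # no information types present
    # Intermediate cutoffs are hypothetical: thirds of the checklist.
    if found * 3 >= total * 2:
        return "B"
    if found * 3 >= total:
        return "C"
    return "D"
```

Re-running the same function on fresh predictions for a site gives a directly comparable grade, which is what makes tracking changes over time straightforward.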
Soubhik Barari will be presenting this work as a research poster at PolMeth XXXV. For more information or to request the draft working paper, please contact either of the co-authors, sbarari@mit.edu or dhidalgo@mit.edu.