(Screenshot of the digitization process that converts data from PDF documents to Excel data tables. Minh Trinh).

The quality of government data also provides reliable information about the quality and capacity of the government that produces this data. For example, when a government conducts a population census, accurate data can only come from areas where the government has the manpower to actually visit and interview individual households; missing numbers or obvious “guesstimates”, on the other hand, reveal areas where local governments lack this ability. Data availability also reflects a government’s attitude towards transparency, as readily available government data allows citizens and civil society organizations to monitor government performance across multiple dimensions.

Under certain conditions, the quality of government data can reveal even more interesting accountability dynamics within the government. Specifically, under authoritarian regimes where bottom-up accountability from citizens and society plays a secondary role to top-down accountability from the regime leadership, metrics from government data become the yardstick by which agents of the regime are evaluated. In other words, government data can help measure how well lower-level bureaucrats, from provincial governors down to public school teachers, are performing. For example, provincial governors can be evaluated by the growth rate in their locality, whereas public school teachers may be ranked by the graduation rate in their classes.

In authoritarian regimes in general, and one-party regimes in particular, the task of collecting information from society and organizing it into government statistics to transmit upward to regime leaders is often delegated to lower-level bureaucrats. Not only are they large in number and widespread in presence, these agents also have direct access to information simply as a result of their job. Because they are in charge of information collection, and yet are being monitored by this same information, they have a stake in the accuracy of the very government statistics they produce.

To explore potential links between top-down accountability in authoritarian regimes and the quality of government statistics, I am collecting a large dataset of official data from Vietnam with support from MIT GOV/LAB’s Seed Fund. Most of these data are official statistics produced at the province level, by provincial officials, to report to the central government in Hanoi.

Rapid urban development in Ho Chi Minh City makes data collection at the local level increasingly difficult (Minh Trinh).

A massive amount of upward reporting happens regularly within the Vietnamese bureaucracy. From a government conference I was fortunate to attend, I learned that all levels of government in Vietnam combined produce more than a million different reports every year, which include regular monthly updates on every possible socio-economic issue from economic performance to criminal activity, as well as responses to unscheduled requests for information from senior officials. Each province’s statistical yearbook, which comprises all major statistics measured by the province for each calendar year, consists of hundreds of pages containing hundreds of different tables.

The amount of work that goes into producing these statistics – on top of bureaucrats’ daily responsibilities – is astonishing (some respondents say writing reports alone takes nearly 25% of their time). There is also potential for error, as some statistics are being requested by upper levels of government at frequencies greater than what the bureaucrats can realistically handle.

Yet the potential for problems with data quality lies more with the reporting process than with the work of collection and calculation on the ground. A careful look at the statistical reports and yearbooks from all of Vietnam’s 63 provinces shows that provincial governments, either through coordination across provinces or through top-down directives, have attempted to standardize the statistics they produce. Even so, there is still significant incongruence across provinces in which statistics are reported and how they are reported. Comparison across provinces is therefore not reliable.

More importantly, the raw data also reveal patterns of alteration and post-hoc modification, i.e. edits to the data after they have been collected. In the working files that I was able to access, evidence of unprincipled edits (e.g. a number crossed out and replaced by another without much justification) is not infrequent. Some statistics go missing in a particular publication year, but then resurface in later publication years, with values that differ from one publication year to the next. None of these is a smoking gun, but they raise interesting questions about whether the statistics that reach the central government are an accurate reflection of what really is happening on the ground.
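
One way to screen digitized yearbook tables for the two patterns described above (values that change between publication years, and series that vanish and later resurface) can be sketched in Python. This is only an illustration under stated assumptions, not the procedure used in this project: the record layout `(province, indicator, reference_year, publication_year, value)`, the function names `find_revisions` and `find_gaps`, and the assumption that yearbooks appear annually are all hypothetical.

```python
from collections import defaultdict


def find_revisions(records, tolerance=0.0):
    """Return figures whose value for the same reference year differs across editions.

    Each record is a hypothetical tuple:
    (province, indicator, reference_year, publication_year, value).
    """
    by_key = defaultdict(dict)
    for province, indicator, ref_year, pub_year, value in records:
        by_key[(province, indicator, ref_year)][pub_year] = value

    revisions = {}
    for key, editions in by_key.items():
        values = list(editions.values())
        # Flag the figure if any two editions disagree by more than the tolerance.
        if max(values) - min(values) > tolerance:
            revisions[key] = dict(sorted(editions.items()))
    return revisions


def find_gaps(records):
    """Return publication years in which a series is absent between two appearances.

    Assumes one edition per calendar year (an assumption of this sketch).
    """
    seen = defaultdict(set)
    for province, indicator, _ref_year, pub_year, _value in records:
        seen[(province, indicator)].add(pub_year)

    gaps = {}
    for key, years in seen.items():
        expected = set(range(min(years), max(years) + 1))
        missing = sorted(expected - years)
        if missing:
            gaps[key] = missing
    return gaps
```

A flagged figure is not proof of manipulation, since statistical offices also publish legitimate revisions; the output is best read as a list of entries worth checking against the working files.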

The central government, likely aware of this potential problem, has recently attempted to increase the extent of standardization across the provinces’ statistical reporting, an effort that is evident in more recent statistical yearbooks. This standardization does seem to give the central government more control over what information is being reported, and it reflects a willingness to monitor the production of government statistics at lower levels more closely. It remains to be seen, however, whether the officials who produce the statistics can keep up with the demand from above.