World Development symposium on RCTs

World Development has a great collection of short pieces on RCTs.

Here is Martin Ravallion’s submission: 

…practitioners should be aware of the limitations of prioritizing unbiasedness, with RCTs as the a priori tool-of-choice. This is not to question the contributions of the Nobel prize winners. Rather it is a plea for assuring that the “tool-of-choice” should always be the best method for addressing our most pressing knowledge gaps in fighting poverty.

… RCTs are often easier to do with a non-governmental organization (NGO). Academic “randomistas,” looking for local partners, appreciate the attractions of working with a compliant NGO rather than a politically sensitive and demanding government. Thus, the RCT is confined to what NGOs can do, which is only a subset of what matters to development. Also, the desire to randomize may only allow an unbiased impact estimate for a non-randomly-selected sub-population—the catchment area of the NGO. And the selection process for that sub-sample may be far from clear. Often we do not even know what “universe” is represented by the RCT sample. Again, with heterogeneous impacts, the biased non-RCT may be closer to the truth for the whole population than the RCT, which is (at best) only unbiased for the NGO’s catchment area.
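Ravallion’s heterogeneity point is easy to make concrete. Here is a toy simulation (every number invented) in which an RCT that is internally valid only for an NGO’s catchment area lands farther from the population-average effect than a mildly biased non-experimental estimate:

```python
# Toy illustration of Ravallion's point; all numbers here are made up.
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# The NGO reaches a non-random 10% of the population, where effects are larger.
in_catchment = rng.random(n) < 0.10
effect = np.where(in_catchment, 2.0, 0.5)    # heterogeneous treatment effects

population_ate = effect.mean()               # the estimand policymakers care about (~0.65)
rct_estimate = effect[in_catchment].mean()   # unbiased, but only for the catchment (2.0)
non_rct_estimate = population_ate + 0.20     # population-wide estimate with selection bias

print(f"population ATE:        {population_ate:.2f}")
print(f"RCT (catchment only):  {rct_estimate:.2f}")
print(f"biased non-RCT:        {non_rct_estimate:.2f}")
# The biased non-RCT misses by 0.20; the "unbiased" RCT misses by ~1.35.
```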

And here is David McKenzie’s take: 

A key critique of the use of randomized experiments in development economics is that they largely have been used for micro-level interventions that have far less impact on poverty than sustained growth and structural transformation. I make a distinction between two types of policy interventions and the most appropriate research strategy for each. The first are transformative policies like stabilizing monetary policy or moving people from poor to rich countries, which are difficult to do, but where the gains are massive. Here case studies, theoretical introspection, and before-after comparisons will yield “good enough” results. In contrast, there are many policy issues where the choice is far from obvious, and where, even after having experienced the policy, countries or individuals may not know if it has worked. I argue that this second type of policy decision is abundant, and randomized experiments help us to learn from large samples what cannot be simply learnt by doing.

Reasonable people would agree that the question should drive the choice of method, subject to the constraint that we should all strive to stay committed to the important lessons of the credibility revolution.

Beyond the questions about inference, we should also endeavor to address the power imbalances that are part of how we conduct research in low-income states. We want to always increase the likelihood that we will be asking the most important questions in the contexts where we work; and that our findings will be legible to policymakers. Investing in knowing our contexts and the societies we study (and taking people in those societies seriously) is a crucial part of reducing the probability that our research comes off as well-identified instances of navel-gazing.

Finally, what is good for reviewers is seldom useful for policymakers. We could all benefit from a bit more honesty about this fact. Incentives matter.

Read all the excellent submissions to the symposium here.

People Are Brains, Not Stomachs

Alex Tabarrok over at MR has a fantastic summary of some of the works of this year’s three Nobel Prize winners in Economics. This paragraph on one of Michael Kremer’s papers stood out to me:

My second Kremer paper is Population Growth and Technological Change: One Million B.C. to 1990. An economist examining one million years of the economy! I like to say that there are two views of humanity, people are stomachs or people are brains. In the people are stomachs view, more people means more eaters, more takers, less for everyone else. In the people are brains view, more people means more brains, more ideas, more for everyone else. The people are brains view is my view and Paul Romer’s view (ideas are nonrivalrous). Kremer tests the two views. He shows that over the long run economic growth increased with population growth. People are brains.

Here is the abstract from Kremer’s QJE paper:

The nonrivalry of technology, as modeled in the endogenous growth literature, implies that high population spurs technological change. This paper constructs and empirically tests a model of long-run world population growth combining this implication with the Malthusian assumption that technology limits population. The model predicts that over most of history, the growth rate of population will be proportional to its level. Empirical tests support this prediction and show that historically, among societies with no possibility for technological contact, those with larger initial populations have had faster technological change and population growth.
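The abstract’s central prediction, that the growth rate of population is proportional to its level, implies hyperbolic growth: dP/dt = g*P^2, so the per-capita growth rate (dP/dt)/P rises linearly with P. Here is a minimal sketch, with invented constants rather than Kremer’s data, of what testing that prediction looks like:

```python
# Minimal sketch of Kremer's prediction: growth *rate* proportional to *level*,
# i.e. dP/dt = g * P**2 (hyperbolic growth). All constants below are invented.
import numpy as np

g = 2e-11                                # hypothetical proportionality constant
years = np.arange(-10_000, 1800, 50)     # illustrative time grid
P = np.empty(len(years))
P[0] = 4e6                               # rough stand-in for early world population

# Forward-Euler simulation of dP/dt = g * P**2
for t in range(1, len(years)):
    P[t] = P[t - 1] + g * P[t - 1] ** 2 * (years[t] - years[t - 1])

# The testable implication: regressing the growth rate on the level recovers g.
growth_rate = np.diff(P) / np.diff(years) / P[:-1]
slope, intercept = np.polyfit(P[:-1], growth_rate, 1)
print(f"slope on population level: {slope:.2e} (true g = {g:.0e})")
```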

Read Tabarrok’s entire post here. Highly recommended.

Since Sunday I’ve been asking around if the Prize got any mention on local radio in Busia, Kenya — the cradle of RCTs, if you will, and where Kremer conducted field experiments. No word yet. Will report if I hear anything.

TOMS impact evaluation finds zero to negative effects in El Salvador

This is from the Economist:

The first of two studies found that TOMS was not wrecking local markets. On average, for every 20 pairs of shoes donated, people bought just one fewer pair locally—a statistically insignificant effect. The second study also found that the children liked the shoes. Some boys complained they were for “pregnant women” and some mothers griped that they didn’t have laces. But more than 90% of the children wore them.

Unfortunately, the academics failed to find much other good news. They found handing out the free shoes had no effect on overall shoelessness, shoe ownership (older shoes were presumably thrown away), general health, foot health or self-esteem. “We thought we might find at least something,” laments Bruce Wydick, one of the academics. “They were a welcome gift to the children…but they were not transformative.”

More worrying, whereas 66% of the children who were not given the shoes agreed that “others should provide for the needs of my family”, among those who were given the shoes the proportion rose to 79%. “It’s easier to stomach aid-dependency when it comes with tangible impacts,” says Mr Wydick.
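The excerpt reports proportions but not group sizes, so here is a back-of-the-envelope sketch, with assumed Ns, of how one would check whether the 66% vs. 79% gap is statistically distinguishable:

```python
# Two-proportion z-test on the reported 66% vs. 79% agreement rates.
# The sample sizes are assumptions; the excerpt does not report them.
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

n_control, n_treated = 600, 600          # purely illustrative group sizes
p_control, p_treated = 0.66, 0.79        # proportions reported in the excerpt

counts = np.array([round(p_treated * n_treated), round(p_control * n_control)])
nobs = np.array([n_treated, n_control])
stat, pval = proportions_ztest(counts, nobs)
print(f"z = {stat:.2f}, p = {pval:.4f}")  # at these Ns the 13-point gap is significant
```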

For a litany of criticisms of TOMS before the study see here, here, and here. The original study is available here.

Also, would anyone ever think that donating shoes, or even mining hard hats, to rural Kentucky would be “transformative”?

Anyway, huge props to TOMS for daring to scientifically study the impact of their ill-advised in-kind aid initiative.

On Field Experiments

Two quick thoughts:

  1. The world is a better place because more and more policymakers realize that evidence-based policymaking beats flying blind. Now if only we invested more in passing on policy design, implementation, and evaluation skills to bureaucrats…
  2. Whenever academics get involved in field experiments, we typically try to maximize the likelihood of publication (see Humphreys below). But what is good for journal reviewers may not always be useful for policymakers. This is not necessarily a bad thing. We just need to be up front about it, and have it inform our evaluation of the ethics of specific interventions.

Below are some excellent posts (both old and new) on the subject.

NYU’s Cyrus Samii:

Whether one or another intervention is likely to be more effective depends both on the relevant mechanisms driving outcomes and, crucially, whether the mechanisms can be meaningfully affected through intervention. It is in addressing the second question that experimental studies are especially useful. Various approaches, including both qualitative and quantitative, are helpful in identifying important mechanisms that drive outcomes. But experiments can provide especially direct evidence on whether we can actually do anything to affect these mechanisms — that is, experiments put “manipulability” to the test.

Columbia’s Chris Blattman:

I’m going to go even further than Cyrus. At the end of the day, the great benefit of field experiments to economics and political science is that it’s forced some of the best social scientists to try to get complicated things done in unfamiliar places, and deal with all the constraints, bureaucrats, logistics, and impediments to reform you can imagine.

Arguably, the tacit knowledge these academics have developed about development and reform will be more influential to their long run work and world view than the experiments themselves.

Columbia’s Macartan Humphreys on the ethics of social experimentation:

Social scientists are increasingly engaging in experimental research projects of importance for public policy in developing areas. While this research holds the possibility of producing major social benefits, it may also involve manipulating populations, often without consent, sometimes with potentially adverse effects, and often in settings with obvious power differentials between researcher and subject. Such research is currently conducted with few clear ethical guidelines. In this paper I discuss research ethics as currently understood in this field, highlighting the limitations of current approaches and the need for the construction of appropriate ethics, focusing on the problems of determining responsibility for interventions and assessing appropriate forms of consent.

… Consider one concrete example where many of the points of tension come to a head. Say a researcher is contacted by a set of community organizations that want to figure out whether placing street lights in slums will reduce violent crime. In this research the subjects are the criminals but seeking informed consent of the criminals would likely compromise the research and it would likely not be forthcoming anyhow (violation of the respect for persons principle); the criminals will likely bear the costs of the research without benefitting (violation of the justice principle); and there will be disagreement regarding the benefits of the research—if it is effective, the criminals in particular will not value it (producing a difficulty for employing the benevolence principle). Any attempt at a justification based on benevolence gives up a pretense at neutrality since not everyone values outcomes the same way. But here the absence of neutrality does not break any implicit contract between researchers and criminals. The difficulties of this case are not just about the relations with subjects however. Here there are also risks that obtain to nonsubjects, if for example criminals retaliate against the organizations putting the lamps in place. The organization may be very aware of these risks but be willing to bear them because they erroneously put faith in the ill-founded expectations of researchers from wealthy universities who are themselves motivated in part to publish and move their careers forward.

University of Maryland’s Jessica Goldberg (Africanists, read Goldberg’s work):

Researchers have neither the authority nor the right to prohibit a control group from attending extra school, and they cannot require attendance from the treatment group. Instead, researchers randomly assign some study participants to be eligible for a program, such as tutoring.  Those in the control group are not eligible for the tutoring provided by the study, but they are not prohibited from seeking out tutoring of their own.

The difference may seem subtle, but it is important.  The control group is not made worse off or denied access to services it would have been able to access absent the experiment. It might not share in all of the benefits available to the treatment group, but that disadvantage is not necessarily due to the evaluation.
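Goldberg’s distinction between a randomized offer and mandated treatment maps directly onto how such experiments are analyzed. A minimal simulation (all numbers invented) of the resulting intent-to-treat estimate, and the Wald/LATE adjustment for voluntary take-up:

```python
# Randomized *eligibility* with voluntary take-up, as Goldberg describes.
# Numbers are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
eligible = rng.integers(0, 2, n)           # randomized offer of tutoring

# Take-up is voluntary: most of the eligible enroll; some controls find
# tutoring on their own, which researchers cannot (and do not) prohibit.
takeup = rng.random(n) < np.where(eligible == 1, 0.60, 0.10)

true_effect = 5.0                          # hypothetical test-score gain from tutoring
score = 50 + true_effect * takeup + rng.normal(0, 10, n)

itt = score[eligible == 1].mean() - score[eligible == 0].mean()
takeup_gap = takeup[eligible == 1].mean() - takeup[eligible == 0].mean()
print(f"ITT: {itt:.2f}  take-up gap: {takeup_gap:.2f}  Wald/LATE: {itt / takeup_gap:.2f}")
```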

Georgetown’s Martin Ravallion:

I have worried about the ethical validity of some RCTs, and I don’t think development specialists have given the ethical issues enough attention. But nor do I think the issues are straightforward. So this post is my effort to make sense of the debate.

Ethics is a poor excuse for lack of evaluative effort. For one thing, there are ethically benign evaluations. But even focusing on RCTs, I doubt if there are many “deontological purists” out there who would argue that good ends can never justify bad means and so side with Mulligan, Sachs and others in rejecting all RCTs on ethical grounds. That is surely a rather extreme position (and not one often associated with economists). It is ethically defensible to judge processes in part by their outcomes; indeed, there is a long tradition of doing so in moral philosophy, with utilitarianism as the leading example. It is not inherently “unethical” to do a pilot intervention that knowingly withholds a treatment from some people in genuine need, and gives it to some people who are not, as long as this is deemed to be justified by the expected welfare benefits from new knowledge.

Traditional birth attendants and antenatal care in western Kenya

This paper examines the extent to which locally informed intermediaries can be exploited and provided with incentives to change the health-seeking behavior of pregnant women in rural Kenya. Despite Kenya being the largest and most advanced economy of East Africa, maternal and infant health outcomes are typical for those of other sub-Saharan countries, which lag significantly behind the developed world. There is evidence that antenatal care (ANC) is associated with improved maternal health outcomes, yet the majority of women in rural Kenya fail to meet recommendations for ANC timing and use, despite the availability of government subsidized healthcare. I examine whether a local intermediary, whose own incentives might oppose those of the government, can be co-opted to assist the government’s objective of increasing women’s ANC utilization.

I use a randomized controlled trial (RCT) to evaluate a program that provides financial incentives for traditional birth attendants (TBAs) to encourage pregnant women to seek ANC at a formal medical facility. Competition between the TBAs and the formal clinics makes the effect of the program an empirical question, as there is no guarantee that the TBAs will respond to the incentive.

I find that living in a TBA treatment village increases the likelihood of attending the recommended number of visits by 20.7%. Women living in TBA treatment villages are 4.4 percentage points more likely to attend the recommended number of visits than women living in control villages, who attend the recommended number of visits 21.3% of the time. The results of this experiment, the first to study the extent to which TBAs can be motivated to encourage women to attend the prenatal clinic, could have important policy implications. The program’s success suggests that despite having a risk of losing clients, TBAs can be utilized as intermediaries of health facilities. Furthermore, finding that TBAs can induce pregnant women to attend ANC visits indicates that cultural norms, which discourage women going to ANC visits, can be overcome with relatively small financial incentives. By increasing the demand for formal maternal healthcare, TBAs’ encouragement of ANC attendance by women may help achieve improved maternal and child health outcomes.

That’s Georgetown’s Nisha Rai, in an excellent paper on the possibilities of integrating the use of traditional birth attendants with the formal healthcare system in Kenya (and developing countries in general). You can find a summary of the paper at the Bank’s Development Impact blog here.
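For readers juggling the abstract’s two effect sizes: the 20.7% figure is just the 4.4 percentage-point effect expressed relative to the 21.3% control mean.

```python
# Worked arithmetic for the abstract's numbers (no new data here).
control_mean = 0.213   # control-village women meeting the ANC recommendation
ppt_effect = 0.044     # treatment effect in percentage points

print(f"treatment-village mean: {control_mean + ppt_effect:.1%}")  # 25.7%
print(f"relative effect: {ppt_effect / control_mean:.1%}")         # 20.7%
```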

If you know a policymaker in the health ministry of a developing country, please have them read this paper.

A call for “politically robust” evaluation designs

Heather Lanthorn cites Gary King et al. on the need for ‘politically robust’ experimental designs for public policy evaluation:

scholars need to remember that responsive political behavior by political elites is an integral and essential feature of democratic political systems and should not be treated with disdain or as an inconvenience. instead, the reality of democratic politics needs to be built into evaluation designs from the start — or else researchers risk their plans being doomed to an unpleasant demise. thus, although not always fully recognized, all public policy evaluations are projects in both political science and policy analysis.

The point here is that what pleases journal reviewers is seldom useful for policymakers.

H/T Brett

Can RCTs be useful in evaluating the impact of democracy and governance aid?

The Election Guide Digest has some interesting thoughts on the subject. Here is part of the post:

The use of the RCT framework resolves two main problems that plague most D&G evaluations, namely the levels-of-analysis problem and the issue of missing baseline data. The levels-of-analysis problem arises when evaluations link programs aimed at meso-level institutions, such as the judiciary, with changes in macro-level indicators of democracy, governance, and corruption. Linking the efforts of a meso-level program to a macro-level outcome rests on the assumption that other factors did not cause the outcome.

An RCT design forces one to minimize such assumptions and isolate the effect of the program, versus the effect of other factors, on the outcome. By choosing a meso-level indicator, such as judicial corruption, to measure the outcome, the evaluator can limit the number of relevant intervening factors that might affect the outcome. In addition, because an RCT design compares both before/after in a treatment and control group, the collection of relevant baseline data, if it does not already exist, is a prerequisite for conducting the evaluation. Many D&G evaluations have relied on collecting only ex-post data, making a true before/after comparison impossible.

Yet it would be difficult to evaluate some “traditional” D&G programs through an RCT design. Consider an institution-building program aimed at reforming the Office of the Inspector General (the treatment group) in a country’s Ministry of Justice. If the purpose of the evaluation is to determine what effect the program had on reducing corruption in that office, there is no similar office (control group) from which to draw a comparison. The lack of a relevant control group and sufficient sample size is the main reason many evaluations cannot employ an RCT design.
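The before/after, treatment/control comparison described here is a difference-in-differences design. A stylized sketch on made-up data, with judicial corruption as the meso-level outcome:

```python
# Stylized difference-in-differences on invented data: courts as units,
# a corruption index as the meso-level outcome.
import numpy as np

rng = np.random.default_rng(1)
n = 200                                    # hypothetical courts
treated = np.repeat([0, 1], n // 2)

before = 0.50 + rng.normal(0, 0.05, n)     # baseline corruption index
after = before - 0.10 * treated + rng.normal(0, 0.05, n)

did = ((after - before)[treated == 1].mean()
       - (after - before)[treated == 0].mean())
print(f"difference-in-differences estimate: {did:.3f}")  # ~ -0.10
```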

More on this here.

Food for thought

UPDATE: Gelman responds with the question: Why are there IRBs at all?

Ted Miguel and other similarly brilliant economists and political scientists (in the RCT mold) are doing what I consider R&D work that developmental states ought to be doing themselves. Sometimes it takes intensive experimental intervention to find out what works and what doesn’t. The need for such an approach is even higher when you are operating in a low resource environment.

That said, I found the points in this post from the Monkey Cage (by Jim Fearon of my Dept.) to be of great importance:

Why is there nothing like an IRB for development projects?   Is it that aid projects are with the consent of the recipient government, so if the host government is ok with it then that’s all the consent that’s needed?  Maybe, but many aid-recipient governments don’t have the capacity to conduct thorough assessments of likely risks versus benefits for the thousands of development projects going on in their countries.  That’s partly why they have lots of aid projects to begin with.

Or maybe there’s no issue here because the major donors do, in effect, have the equivalent of IRBs in the form of required environmental impact assessments and other sorts of impact assessments.  I don’t know enough about standard operating procedures at major donors like the World Bank, USAID, DFID, etc, to say, really.  But it’s not my impression that there are systematic reviews at these places of what are the potential political and social impacts of dropping large amounts of resources into complicated local political and social situations.

You can find the rest of the blog post here.

Look here for more information on RCTs.