More on the apparently *transient* effects of unconditional cash transfers

Berk Ozler over at Development Impact has a follow-up post on GiveDirectly’s three-year impacts. The post looks at multiple papers analyzing results from the same cash transfer RCT in southwestern Kenya:

First, on the initial studies:

On October 31, 2015, after the release of the HS (16) working paper in 2013, but before the eventual journal publication of HS (16), Haushofer, Reisinger, and Shapiro released a working paper titled “Your Gain is My Pain.” In it, they find large negative spillovers on life satisfaction (a component of the psychological wellbeing index reported in HS 16) and smaller, but statistically significant, negative spillovers on assets and consumption. The negative spillover effects on life satisfaction, at -0.33 SD and larger than the average benefit on beneficiaries, imply a net decrease in life satisfaction in treated villages. Furthermore, the treatment (ITT) effects are consistent with HS (16), but the spillover effects are not. For example, the spillover effect on the psychological wellbeing index in Table III of HS (16) is approximately +0.1, while Table 1 in HRS (15) implies an average spillover effect of about -0.175 (my calculations: -0.05 * (354/100)). There appear to be similar discrepancies in the spillovers implied for assets and consumption in the HRS (15) paper and HS (16). I am not sure what to make of this, as HRS (15) is an unpublished paper – there must [be] a good explanation that I am missing. Regardless, however, these findings of negative spillovers foreshadow the three-year findings in HS (18), which I discuss next.
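To see the arithmetic behind Ozler’s back-of-envelope comparison, here is a minimal sketch. The reading of the -0.05 coefficient as a per-USD-100 spillover, and the 354 scaling factor, come from his parenthetical; everything else (variable names, rounding) is my own illustration, not something reported in either paper.

```python
# Rough reconstruction of the back-of-envelope calculation Ozler reports as
# "-0.05 * (354/100)". Interpretation (my assumption): HRS (15) Table 1 implies a
# spillover of about -0.05 SD per USD 100 transferred to village neighbors,
# scaled by an average transfer of roughly USD 354.
hrs15_coef_per_100usd = -0.05   # spillover coefficient per USD 100 (my reading)
avg_transfer_usd = 354          # scaling factor from Ozler's parenthetical

implied_spillover = hrs15_coef_per_100usd * (avg_transfer_usd / 100)
hs16_spillover = 0.10           # approx. spillover on psych. wellbeing index, Table III of HS (16)

print(f"Implied average spillover from HRS (15): {implied_spillover:+.3f} SD")  # about -0.177
print(f"Reported spillover in HS (16):           {hs16_spillover:+.2f} SD")     # about +0.10
```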

Then on the three-year findings:

As I discussed earlier this week, HS (18) find that if they define ITT=T-S, virtually all the effects they found at the 9-month follow-up are still there. However, if ITT is defined in the more standard manner of being across villages, i.e., ITT=T-C, then there is only an effect on assets and nothing else.

… As you can see, things have now changed: there are spillover effects, so the condition for ITT=T-S being unbiased no longer holds. This is not a condition that you establish once in an earlier follow-up and stick with: it has to hold at every follow-up. Otherwise, you need to use the unbiased estimator defined across villages, ITT=T-C.

To nitpick with the authors here, I don’t buy that [….] lower power is responsible for the finding of no significant treatment effects across villages. Sure, as in HS (16), the standard errors are somewhat larger for across-village estimates than for the corresponding within-village estimates. But the big difference between the short- and longer-term impacts is the gap between the respective point estimates in HS (18), whereas they were very stable (due to no/small spillovers) in HS (16). Compare Table 5 in HS (18) with Appendix Table 38 and you will see. The treatment effects disappeared mainly because the differences between T and C are now much smaller than they were at the nine-month follow-up, and in some cases even negative.

And then this:

If we’re trying to say something about treatment effects, which is what the GiveDirectly blog seems to be trying to do, we already have the estimates we want – unbiased and with decent power: ITT=T-C. HS (18) already established a proper counterfactual in C, so just use that. Doesn’t matter if there are spillovers or not: there are no treatment effects to see here, other than the sole one on assets. Spillover estimation is just playing defense here – a smoke screen for the reader who doesn’t have the time to assess the veracity of the claims about sustained effects.
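To make the distinction between the two estimators concrete, here is a toy simulation, entirely my own construction with made-up magnitudes (it is not the HS data). Households in treated villages are split into treated (T) and untreated spillover (S) groups, while pure control villages supply C. With a negative spillover on S, the within-village comparison ITT = T - S overstates the treatment effect, while the across-village comparison ITT = T - C recovers it.

```python
import numpy as np

# Toy simulation (my own illustration with made-up magnitudes, not the HS data):
# villages are randomized into treatment and control; within treated villages,
# households are randomized into treated (T) and untreated "spillover" (S) groups;
# pure control villages supply C. Untreated neighbors in treated villages suffer
# a negative spillover.
rng = np.random.default_rng(0)

n_villages, hh_per_village = 200, 20
true_effect, spillover = 0.25, -0.15          # in SD units of the outcome

treated_village = rng.random(n_villages) < 0.5
outcomes = {"T": [], "S": [], "C": []}

for v in range(n_villages):
    baseline = rng.normal(0, 1, hh_per_village)
    if treated_village[v]:
        treated_hh = rng.random(hh_per_village) < 0.5      # within-village randomization
        y = baseline + treated_hh * true_effect + (~treated_hh) * spillover
        outcomes["T"].extend(y[treated_hh])
        outcomes["S"].extend(y[~treated_hh])
    else:
        outcomes["C"].extend(baseline)

T, S, C = (np.mean(outcomes[g]) for g in "TSC")
print(f"Within-village ITT = T - S = {T - S:+.3f}   (true effect {true_effect:+.2f})")
print(f"Across-village ITT = T - C = {T - C:+.3f}")
# With a negative spillover on S, T - S overstates the effect; T - C recovers it
# (up to sampling noise).
```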

Chris has a Twitter thread on the same questions.

Bottom line: we need more research on UCTs, which GiveDirectly is already doing with a (hopefully) better-implemented, truly long-term study.


On Field Experiments

Two quick thoughts:

  1. The world is a better place because more and more policymakers realize that evidence-based policymaking beats flying blind. Now if only we invested more in passing on policy design, implementation, and evaluation skills to bureaucrats….
  2. Whenever academics get involved in field experiments, we typically try to maximize the likelihood of publication (see Humphreys below). But what is good for journal reviewers may not always be useful for policymakers. This is not necessarily a bad thing. We just need to be up front about it, and have it inform our evaluation of the ethics of specific interventions.

Below are some excellent posts (both old and new) on the subject.

NYU’s Cyrus Samii:

Whether one or another intervention is likely to be more effective depends both on the relevant mechanisms driving outcomes and, crucially, whether the mechanisms can be meaningfully affected through intervention. It is in addressing the second question that experimental studies are especially useful. Various approaches, including both qualitative and quantitative, are helpful in identifying important mechanisms that drive outcomes. But experiments can provide especially direct evidence on whether we can actually do anything to affect these mechanisms — that is, experiments put “manipulability” to the test.

Columbia’s Chris Blattman:

I’m going to go even further than Cyrus. At the end of the day, the great benefit of field experiments to economics and political science is that it’s forced some of the best social scientists to try to get complicated things done in unfamiliar places, and deal with all the constraints, bureaucrats, logistics, and impediments to reform you can imagine.

Arguably, the tacit knowledge these academics have developed about development and reform will be more influential on their long-run work and worldview than the experiments themselves.

Columbia’s Macartan Humphreys on the ethics of social experimentation:

Social scientists are increasingly engaging in experimental research projects of importance for public policy in developing areas. While this research holds the possibility of producing major social benefits, it may also involve manipulating populations, often without consent, sometimes with potentially adverse effects, and often in settings with obvious power differentials between researcher and subject. Such research is currently conducted with few clear ethical guidelines. In this paper I discuss research ethics as currently understood in this field, highlighting the limitations of current approaches and the need for the construction of appropriate ethics, focusing on the problems of determining responsibility for interventions and assessing appropriate forms of consent.

…. Consider one concrete example where many of the points of tension come to a head. Say a researcher is contacted by a set of community organizations that want to figure out whether placing street lights in slums will reduce violent crime. In this research the subjects are the criminals but seeking informed consent of the criminals would likely compromise the research and it would likely not be forthcoming anyhow (violation of the respect for persons principle); the criminals will likely bear the costs of the research without benefitting (violation of the justice principle); and there will be disagreement regarding the benefits of the research—if it is effective, the criminals in particular will not value it (producing a difficulty for employing the benevolence principle). Any attempt at a justification based on benevolence gives up a pretense at neutrality since not everyone values outcomes the same way. But here the absence of neutrality does not break any implicit contract between researchers and criminals. The difficulties of this case are not just about the relations with subjects however. Here there are also risks that obtain to nonsubjects, if for example criminals retaliate against the organizations putting the lamps in place. The organization may be very aware of these risks but be willing to bear them because they erroneously put faith in the ill-founded expectations of researchers from wealthy universities who are themselves motivated in part to publish and move their careers forward.

University of Maryland’s Jessica Goldberg (Africanists, read Goldberg’s work):

Researchers have neither the authority nor the right to prohibit a control group from attending extra school, and they cannot require attendance from the treatment group. Instead, researchers randomly assign some study participants to be eligible for a program, such as tutoring.  Those in the control group are not eligible for the tutoring provided by the study, but they are not prohibited from seeking out tutoring of their own.

The difference may seem subtle, but it is important.  The control group is not made worse off or denied access to services it would have been able to access absent the experiment. It might not share in all of the benefits available to the treatment group, but that disadvantage is not necessarily due to the evaluation.
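For readers unfamiliar with the design Goldberg describes, here is a minimal sketch of what randomly assigning eligibility looks like in practice; the tutoring example and all names are hypothetical, following her illustration rather than any actual study code.

```python
import random

# Toy illustration of random assignment of *eligibility* (not compulsion or prohibition):
# some participants are offered the study-provided tutoring; the rest are not offered it,
# but nothing stops them from seeking tutoring on their own. Analysis then compares
# outcomes by assigned eligibility.
random.seed(42)

participants = [f"student_{i}" for i in range(10)]           # hypothetical study roster
eligible = {p: random.random() < 0.5 for p in participants}  # coin-flip eligibility

treatment_group = [p for p in participants if eligible[p]]      # offered study tutoring
control_group = [p for p in participants if not eligible[p]]    # free to find their own

print("Offered study tutoring:", treatment_group)
print("Not offered (but not prohibited):", control_group)
```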

Georgetown’s Martin Ravallion:

I have worried about the ethical validity of some RCTs, and I don’t think development specialists have given the ethical issues enough attention. But nor do I think the issues are straightforward. So this post is my effort to make sense of the debate.

Ethics is a poor excuse for lack of evaluative effort. For one thing, there are ethically benign evaluations. But even focusing on RCTs, I doubt if there are many “deontological purists” out there who would argue that good ends can never justify bad means and so side with Mulligan, Sachs and others in rejecting all RCTs on ethical grounds. That is surely a rather extreme position (and not one often associated with economists). It is ethically defensible to judge processes in part by their outcomes; indeed, there is a long tradition of doing so in moral philosophy, with utilitarianism as the leading example. It is not inherently “unethical” to do a pilot intervention that knowingly withholds a treatment from some people in genuine need, and gives it to some people who are not, as long as this is deemed to be justified by the expected welfare benefits from new knowledge.

A call for “politically robust” evaluation designs

Heather Lanthorn cites Gary King et al. on the need for ‘politically robust’ experimental designs for public policy evaluation:

Scholars need to remember that responsive political behavior by political elites is an integral and essential feature of democratic political systems and should not be treated with disdain or as an inconvenience. Instead, the reality of democratic politics needs to be built into evaluation designs from the start — or else researchers risk their plans being doomed to an unpleasant demise. Thus, although not always fully recognized, all public policy evaluations are projects in both political science and policy analysis.

The point here is that what pleases journal reviewers is seldom useful for policymakers.

H/T Brett

Can RCTs be useful in evaluating the impact of democracy and governance aid?

The Election Guide Digest has some interesting thoughts on the subject. Here is part of the post:

The use of the RCT framework resolves two main problems that plague most D&G evaluations, namely the levels-of-analysis problem and the issue of missing baseline data. The levels-of-analysis problem arises when evaluations link programs aimed at meso-level institutions, such as the judiciary, with changes in macro-level indicators of democracy, governance, and corruption. Linking the efforts of a meso-level program to a macro-level outcome rests on the assumption that other factors did not cause the outcome.

An RCT design forces one to minimize such assumptions and isolate the effect of the program, versus the effect of other factors, on the outcome. By choosing a meso-level indicator, such as judicial corruption, to measure the outcome, the evaluator can limit the number of relevant intervening factors that might affect the outcome. In addition, because an RCT design compares both before/after in a treatment and control group, the collection of relevant baseline data, if it does not already exist, is a prerequisite for conducting the evaluation. Many D&G evaluations have relied on collecting only ex-post data, making a true before/after comparison impossible.

Yet it would be difficult to evaluate some “traditional” D&G programs through an RCT design. Consider an institution-building program aimed at reforming the Office of the Inspector General (the treatment group) in a country’s Ministry of Justice. If the purpose of the evaluation is to determine what effect the program had on reducing corruption in that office, there is no similar office (control group) from which to draw a comparison. The lack of a relevant control group and sufficient sample size is the main reason many evaluations cannot employ an RCT design.
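To illustrate the before/after point in the quoted post, here is a small worked example with made-up numbers (my own, not from any actual D&G evaluation): baseline data in both groups lets you compare changes rather than just ex-post levels, which is the difference-in-differences logic the post is gesturing at.

```python
# Toy illustration (made-up numbers) of why baseline data matters: with before/after
# measures in both groups, you can compare changes rather than just ex-post levels,
# which is all that many D&G evaluations have had to work with.
baseline = {"treatment": 0.40, "control": 0.38}   # e.g. share of court users reporting bribe requests
endline  = {"treatment": 0.28, "control": 0.36}

change_t = endline["treatment"] - baseline["treatment"]   # -0.12
change_c = endline["control"] - baseline["control"]       # -0.02

print(f"Change in treatment group: {change_t:+.2f}")
print(f"Change in control group:   {change_c:+.2f}")
print(f"Difference-in-differences estimate: {change_t - change_c:+.2f}")   # -0.10

# With only ex-post data you could compare endline levels across groups, but you
# could not separate the program's effect from pre-existing differences.
```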

More on this here.

Food for thought

UPDATE: Gelman responds with the question: Why are there IRBs at all?

Ted Miguel and other similarly brilliant economists and political scientists (in the RCT mold) are doing what I consider R&D work that developmental states ought to be doing themselves. Sometimes it takes intensive experimental intervention to find out what works and what doesn’t. The need for such an approach is even greater when you are operating in a low-resource environment.

That said, I found the points in this post from the Monkey Cage (by Jim Fearon of my department) to be of great importance:

Why is there nothing like an IRB for development projects?   Is it that aid projects are with the consent of the recipient government, so if the host government is ok with it then that’s all the consent that’s needed?  Maybe, but many aid-recipient governments don’t have the capacity to conduct thorough assessments of likely risks versus benefits for the thousands of development projects going on in their countries.  That’s partly why they have lots of aid projects to begin with.

Or maybe there’s no issue here because the major donors do, in effect, have the equivalent of IRBs in the form of required environmental impact assessments and other sorts of impact assessments. I don’t know enough about standard operating procedures at major donors like the World Bank, USAID, DFID, etc., to say, really. But it’s not my impression that there are systematic reviews at these places of what are the potential political and social impacts of dropping large amounts of resources into complicated local political and social situations.

You can find the rest of the blog post here.

Look here for more information on RCTs.