WB style impact evaluation ... nothing like the CA approach!

The longer I work around the international relief and development community, the more concerned I become about progress and performance. A few days ago I reviewed some material posted by ODI in the UK about impact evaluation done the World Bank and academic way!

Some of the text is reproduced below. My take-away from this presentation and discussion is simply that the experts are doing more and more to build a system of ineffective measurement without getting to grips with the basic problems that are constraining success.

Bluntly put ... the experts do not seem to have a handle on the problem ... let alone being in a position to recommend a solution. The World Bank and Academia are not getting to grips with management information. Instead they are working with tiny datasets, a very long way from any specific problem, using esoteric statistics that are almost certain to generate completely unreliable conclusions.

My reality test is very simple ... I feel comfortable about the data when those involved with the analysis can tell me what things cost, and the answers seem to make sense. I feel even better when they are also able to articulate what impact there has been ... and they understand cost effectiveness ... that is, how much value was realized for how much cost.
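To make this reality test concrete, here is a minimal sketch of the arithmetic I have in mind: cost divided by units of impact, then a ranking. The programme names, costs and impact figures are entirely hypothetical and are not taken from the ODI material; the `Programme` class and `rank_by_cost_effectiveness` function are my own illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Programme:
    """A programme with a known total cost and a measured impact (hypothetical data)."""
    name: str
    total_cost_usd: float
    units_of_impact: float  # e.g. additional children completing school

    def cost_per_unit(self) -> float:
        """Cost effectiveness expressed as cost per unit of impact."""
        if self.units_of_impact <= 0:
            raise ValueError("no measurable impact; cost per unit is undefined")
        return self.total_cost_usd / self.units_of_impact


def rank_by_cost_effectiveness(programmes):
    """Return programmes ordered from cheapest to most expensive per unit of impact."""
    return sorted(programmes, key=lambda p: p.cost_per_unit())


# Hypothetical figures, for illustration only.
programmes = [
    Programme("B (hypothetical)", 150_000, 2_000),
    Programme("A (hypothetical)", 200_000, 4_000),
]

for p in rank_by_cost_effectiveness(programmes):
    print(f"{p.name}: US${p.cost_per_unit():,.2f} per unit of impact")
```

The point of the sketch is only that both numbers, cost and impact, have to be known before any comparison can be made at all.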

If the experts know about cost and cost effectiveness ... and know about the behavior of cost and the behavior of cost effectiveness ... then there is the basis for a useful conversation. The problem with what follows is that the experts do not sound very much as if these metrics are part of their vocabulary.

I would love to be wrong ... because the improvement of development will only take place when cost and cost effectiveness are at the center of impact evaluation. I realize that my observations open me to criticism ... and the excuse that cost and cost effectiveness do not apply in sectors like health and education. This is pure humbug ... cost needs to be understood in every sector and every situation ... and impact should be very clearly understood. If football players understand the way football is scored and the applicable rules ... then I would argue that educators and MDs should understand scoring and the applicable rules in their respective sectors.

This is the report of the ODI event:
Using impact evaluation to improve development
4 May 2010

Alison Evans, Director, ODI, posed the question: "How do you improve development effectiveness through better use of evidence?" Impact evaluation is a key means towards producing and using evidence to inform the design of development programmes. It is an important tool in building a stronger evidence base on effective development programmes and, in turn, improving development policy. Ariel Fiszbein, the Chief Economist in the Human Development Network at the World Bank, discussed how the World Bank is using impact evaluation and employing it to inform policy in the health, education and social protection sectors. This was followed by comment from Professor Costas Meghir.

Event report

Alison Evans opened the meeting noting that impact evaluation (IE) is a topical subject, as it has been used in many different disciplines, especially over the past 10 to 20 years. IE has made an impact on the evidence that is used to build better understanding of development interventions and inform policy decisions in terms of which to support. She also noted the particularly important role of IE in the field of conditional cash transfers (CCTs). She then gave the floor to the first speaker.

Ariel Fiszbein, Chief Economist, Human Development, World Bank (see paper):

In the spirit of Primo Levi, Ariel Fiszbein began his presentation by highlighting his approach to impact evaluation: one that is verifiable and demonstrable and grounded in humility towards the subject matter and towards the unknowns. An increasing number of IEs are being carried out by the World Bank (WB) in the area of human development (HD), particularly in the sectors of social protection, health and education. However, Mr. Fiszbein also acknowledged the importance of recognising the challenges of IEs, foremost among which are costs. It is therefore important to be selective and focus on areas where there is growing interest. In addition, for the purpose of obtaining information on human development, he highlighted the importance of relying on a stream (or programme) of evaluations, rather than individual evaluations. This, he continued, allows comparing results and building robust evidence for one case against another, particularly when varying certain parameters of IE in different contexts. The WB has seven priority thematic areas (see slides), but Mr. Fiszbein only expanded on three of them:

Conditional Cash Transfers (CCTs):
In running IEs on CCTs, the speaker distinguished between first generation evaluation questions and second generation ones. The former ask what sort of impact CCTs may have on HD outcomes (such as on consumption, poverty and enrolment levels). He noted that there has been mixed success of CCTs in improving final outcomes in health and education. He also emphasised that the role of second generation questions is to uncover the dynamics behind the relationship, for example by considering what conditions are needed (such as conditional transfers or tailored payment schemes) to guarantee the success of CCT programmes.

Paying for Performance (P4P) in Health:
Mr. Fiszbein first explained the nature of P4P programmes, noting that they usually pay a person or group of persons (often service providers like health workers or school principals) if certain conditions are met and/or results are achieved. This can take many forms. Focusing on health, Mr. Fiszbein highlighted the role of IE in showing what sort of impact P4P might have on the quality and quantity of services provided. Taking the example of Rwanda, where a P4P scheme implemented bonus payments based on quantity and quality of maternal and child healthcare, he noted that the programme resulted in a 7.3% increase in the proportion of deliveries at hospitals or other health facilities. However, there was no effect on the number of prenatal care visits or immunisation rates, and the speaker argued second generation questions should be trying to understand whether a focus on demand-side initiatives will improve outcomes. Taking the example of Uganda, where performance bonuses did not work, Mr. Fiszbein finished by returning to his earlier point that a set of evaluations is needed in order to answer more general questions about the worth of P4P schemes.

School Accountability:
The speaker started by listing different strategies that are used for improving learning outcomes, and asked how teachers can be more productive. He noted several modalities, for instance school-based management or paying for teachers’ performance. Evaluations have shown that in certain countries, such as Mexico, where grants and training were given to parent associations, there have been reductions in school failures. Evaluations, the speaker continued, can also help explain how accountability can be improved by providing more information to parents and local communities. But this requires further testing through multiple evaluations. To demonstrate the point, he highlighted varied results obtained in India, which showed that information campaigns had positive effects on education outcomes in some cases but not in others.

The speaker finished his presentation by reinforcing Primo Levi’s message that it is better to have evidenced and demonstrable data, even if this takes time and effort.

Costas Meghir, Professor, Institute for Fiscal Studies:

After praising the work and investment of the WB in running IEs, Professor Meghir returned to the basics and gave different reasons why these are important: to guarantee value-for-money, to find out if policies work, to enhance knowledge of human behaviour and to ensure human development outcomes are not subject to political vagaries.

He continued by suggesting that well controlled environments (however hard to achieve) can help extract causes and effects of certain events. In particular, he highlighted the importance of finding out why a certain experiment worked better than another, as this will be crucial to build thorough policies. It is also important, he continued, to scale up those experiments, that is, to find out whether they still work when variables (such as the environment) are changed. Yet one must also be aware of challenges affecting scale-ups, including peer effects (e.g. an experiment involving five children can generate different results when additional children are included), congestion effects (e.g. improvement of literacy rates can be undermined by the quality and/or inadequate quantity of teachers), and changes to price/general equilibrium effects (educating children can have an impact on their wages, as well as those of adults and/or skilled people).

Professor Meghir concluded that evaluations should not simply look at mechanisms in isolation but should instead be informed by well-defined multidimensional models. Although this might require more resources, he said, there is still a great number of lessons that are ‘transferable’, and this should be subject to further scrutiny. In particular, since there might not be very good economic models for malaria or early childhood development (ECD), a lot of advances are needed on the experimental and theoretical side to achieve this ultimate goal.

Questions & Answers session.
A selection of the questions put to the panellists, with their responses, is below:

How are IEs negotiated and in what sort of environment? Do IEs influence decision-making?
Negotiations are done in non-transparent ways and are not evidence-based. Although a stream of results might not change behaviours immediately, there has been a growing emphasis (particularly within the WB) on evidence-based results. This might create a culture of change in the long run.

What are the costs associated with IEs, and who performs them?
Data collection incurs significant costs, and this can change from project to project. Depending on the size of the service to be evaluated as well as the country, grants can be made ranging from US$150,000 to US$300,000.

The second part of the question might point to the existence of an ‘extractive industry’ of researchers from the North. This highlights the importance of capacity building, which the WB puts a lot of emphasis on, for example by developing a training programme for researchers and academics in the South.

How can IEs move with the times and make a relevant contribution without becoming quickly outdated?
Knowledge doesn’t come quickly and/or easily, and therefore ideas/hypotheses that are put forward need to be tested. It is therefore important to choose relevant areas of interest, and there have been areas attracting an increasing amount of interest, such as ECD and school accountability.

How can IEs help look at outcomes rather than outputs, which many evaluations fail to do nowadays?
In order to avoid a ‘mechanistic’ type of IE that measures only outputs, one could think of this as a result chain, with intermediate and final outcomes. Yet IE cannot be assumed to measure final outcomes for every project to determine whether it works or not. Instead, we should test meaningful ideas in order to build better knowledge on effective programmes and approaches.

When is it most suitable to use IEs?
IEs try to create a basis for validating a theory by linking particular types of interventions to interim outcomes and then to final outcomes. However, IEs will not provide a general framework under which to make decisions. These are always made under a certain amount of uncertainty. It is important that practitioners consider using IEs on a case-by-case basis, as they cannot always apply across a whole range of topics. In general, however, the growing areas of interest in human development alluded to earlier can be of help in doing this.

Are randomised control trials (RCTs), and IEs more broadly, just another fad?
It doesn’t appear so. On the contrary, what might be a fad are minor changes to variables that have little impact on the ultimate outcome. RCTs should help develop programmes that are scalable, thereby taking theory and empirical work into policy design.
