There was an interesting back-and-forth on Twitter a couple of weeks ago as one user vented frustration with the problem of evaluations making unjustifiable quantitative statements based on qualitative analysis. Then another user brought up bias (institutional and personal), which opened a whole other can of worms (or Pandora’s box, if you’re squeamish). As someone who used to commission evaluations and later conduct them myself, I was interested to learn more about bias and evaluation results. Fortunately, the kind people of Twitter never fail to provide solid academic research links to support their arguments (for once, I’m not being sarcastic).
I wondered how to go about framing this article so that it wasn’t incredibly boring to the average reader: ‘bias’, ‘qualitative’, ‘quantitative’, ‘methodologies’ – terms that bring you right back to your polisci statistics classes that sometimes (ok, often) felt like near-death experiences. And then I remembered the four incredible (incredulous?) months I spent consulting for a small NGO which cared little for things like evidence and global definitions of terms like ‘impact’. It’s an excellent place to start.
I’m not going to name names – that’s unprofessional. But I will say this: the number of reports that I had to read and then try to rewrite making claims of ‘impact’ with no evidence whatsoever nearly pushed me over the edge. There were a number of reasons behind this, but it largely resulted from a shockingly weak activity monitoring system. At this point, I want to emphasize the difference between activities and results. It is absolutely crucial, in this day and age of being accountable for whether the money we spend actually effects some sort of change, that we don’t say ‘we did this training’ and call our intervention a success. Seriously. When we implement activities as basic as trainings, we are doing so with a purpose. We want the participants to learn something or be better at something, and we need, therefore, to focus on whether they actually learned and improved, not just whether they attended. So we follow up at a later date to find out from the participants a) whether they remember what they learned at the training, b) whether they use this new knowledge at all, and c) whether they have examples (because examples can provide evidence of the ongoing change being effected) – for example, improvements in budgeting, or timely reporting, or understanding of issues in the community (meaning more effective responses). The objective of the activity, as part of a wider set of interventions, is to effect change or set change in motion, and we monitor to see if that change is actually happening – basically, to gather the evidence of our success (or failure) and why.
I had to read and write reports based on activities with zero information on the results of those activities. And then my supervisors would change some of the text to focus on ‘impact’ – for example, being mentioned in regional media or releasing a report. In my world, that would earn a ‘well done’, not count as an impact. I tried to explain this, but to no avail. They wanted to be flexible about what an impact was. In fairness, even ODI admits in a recent paper that there is ambiguity and confusion about what impact is and how it should be defined – but not so much, I think, that simply stating what a good job we did can be interpreted as such. Even the laziest of development practitioners knows that something resembling evaluation must be involved.
So what is evaluation, then? According to one (frustrated) blogger, evaluations may well be a waste of time because we fail to link them to all of the hard work and results of our monitoring efforts. More often than not, evaluation questions start with intervention logic and are followed by vague or open-ended questions that may (but likely will not) demonstrate some sort of contribution by the intervention to change (i.e. impact). Truthfully, I agree in full with the claim that we should already have a good idea of the impact the intervention has made based on good monitoring data (if monitoring has been undertaken to track change effected rather than activities undertaken). But generally this doesn’t happen, so evaluation results end up emanating from poorly formed qualitative questions that are later transformed into quantitative data that is inherently biased towards ‘finding a positive result or a big (if not statistically significant) effect’, which is then interpreted as impact, with loose justifications or evidence for contribution (and sometimes attribution) by the project. You can read more about bias and methodologies here.
An example of an evaluation question that starts at zero rather than with properly analyzed monitoring data comes from our frustrated blogger (above): “How does capacity building for better climate risk management at the institutional level translate into positive changes in resilience?” Or one of my own equally ambiguous questions that I later attempted to quantify using a scale: “How did the partnership strategy contribute to the progress towards the outcome?” Yikes. I know.
I have meandered through this argument but the point is this: if we want development work to be taken seriously, then we also need to take it seriously. We need to walk the talk. M&E cycles? Then make damn sure monitoring data is the basis for evaluations and not just something we do to pretend to be accountable. Let’s talk about results in terms of the real change we effect, whether at activity or institutional level. Let’s stop being so flexible about what an ‘impact’ is in order to make us look good. Let us prove that there is actually an impact.
On the other hand, it seems we development people can be slow learners, so the blogosphere need not worry about finding something new to be frustrated about. But perhaps we can start thinking about it ;)