Why AI in Evaluation, and Why Now?
Artificial Intelligence (AI) is no longer just a buzzword. It's already reshaping how we live, work, and (yes) evaluate. In international development, evaluation plays a critical role in generating evidence, guiding decisions, and ensuring accountability. But with increasing data complexity, limited time, and growing expectations for inclusivity and learning, many evaluators are asking: Can AI help us do this better?
These questions were central to our presentation at the recent gLOCAL event, where we shared this new resource with evaluators from around the world and engaged in rich discussions about the opportunities and challenges ahead. The session drew participants from more than 20 countries, from Kenya to Germany and India to Mexico, including colleagues from various CGIAR centers and beyond.
What emerged from the conversation powerfully validated our underlying assumptions. The audience raised the fundamental concerns that evaluators worldwide are grappling with: How do we prevent AI from inventing information instead of retrieving it from reliable sources? What's the right approach to disclosing AI use in evaluations? How do we ensure consistent AI outputs regardless of who's asking? How effective is AI for analyzing raw qualitative data like interview transcripts? Has AI actually made evaluations more cost-effective?
These questions confirmed exactly what drove us to develop this resource: an urgent need for practical guidance and real-world examples that help evaluators navigate AI responsibly and effectively.
To that end, CGIAR’s Evaluation Function is launching a new resource: Considerations and Practical Applications for the Evaluation Function of CGIAR.
This Method Note is not intended as a guide, but rather as a conversation starter. It is designed to support staff, consultants, and partners of the Evaluation Function (EF) who are beginning to explore how to thoughtfully and responsibly integrate AI into evaluation work. To reiterate the rationale for this note: AI is already here — the real question is whether we are using it wisely.
Why Should Evaluators Care About AI?
Let’s start with the obvious: evaluation work is tough. As evaluators, we juggle interviews, surveys, reports, messy data, and tight deadlines. AI tools promise to help lighten the load by offering transcription, translation, summarization, and analysis at speed and scale.
But it’s not just about effectiveness and efficiency.
AI can enhance the quality and inclusivity of evaluation. It can surface hidden patterns in complex datasets. It can help design more responsive surveys. It can even transform dense technical findings into accessible insights for diverse audiences. In short, AI opens doors—not just to faster work, but to better, deeper work.
And yet, the promise comes with pitfalls.
CGIAR’s own research has shown that AI can perpetuate bias—especially around gender roles in agriculture. A recent study testing large language models against questions posed by women farmers in India revealed troubling gaps. AI outputs often reinforced stereotypes, missed structural inequalities, and gave guidance that sounded helpful—but wasn’t grounded in real-world constraints.
So yes, AI can help. But only if evaluators remain in the driver’s seat.
What’s Inside the New AI Method Note?
The AI Method Note is designed for practical use. Whether you're testing AI for the first time or refining your existing workflows across evaluation phases, it offers tools, considerations, and examples to put to use right away.
Here’s what you’ll find:
- A primer on AI and Generative Artificial Intelligence (GenAI), the class of AI that uses models such as deep neural networks to create new content (text, images, video, and more): what it is, how it works, and why it matters for evaluation
- Ethical guidance: addressing bias, privacy, transparency, and human oversight
- Use cases across the evaluation cycle: from evaluation design to data collection, analysis, and dissemination
- Software and prompt examples: helping you experiment with the right tools and approaches (one illustrative sketch follows this list)
- Supervision strategies: ensuring AI remains a support—not a shortcut
- A call to build ‘AI muscle’: encouraging experimentation, reflection, and shared learning
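To give a flavour of the prompt examples, here is a minimal sketch of the kind of experiment the note encourages: asking a large language model to summarize a report excerpt while explicitly instructing it not to invent details. It uses the OpenAI Python client purely for illustration; the client library, model name, and prompt wording are our assumptions, not recommendations from the note itself.

```python
# A minimal sketch of prompting an LLM to summarize an evaluation excerpt.
# Assumes the `openai` package is installed and OPENAI_API_KEY is set in the
# environment; the model name and prompt are illustrative placeholders only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

excerpt = "..."  # a short, non-sensitive passage from an evaluation report

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; use whatever model your policy allows
    messages=[
        {"role": "system",
         "content": "You are assisting an evaluator. Summarize faithfully; "
                    "say 'not stated' rather than inventing details."},
        {"role": "user",
         "content": f"Summarize the key findings in three bullet points:\n{excerpt}"},
    ],
)
print(response.choices[0].message.content)
```

The system message is the important part of the experiment: it asks the model to flag gaps rather than fill them, which speaks directly to the hallucination concern raised at gLOCAL.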
It’s not a checklist. It’s a toolkit. And it will evolve alongside the technology and the policies that govern AI.
Where Can AI Add Real Value in Evaluation?
The method note outlines several areas where AI can make a meaningful difference. Here are just a few:
- Text Processing: AI can transcribe interviews, summarize long reports, and translate documents—making data more manageable and multilingual work more feasible.
- Evidence Management: Need to review 50 documents fast? AI tools can help you extract, compare, and synthesize evidence from multiple sources, building stronger analytical foundations.
- Evaluation Design: AI can suggest methods, generate theories of change, and co-create survey questions, adapting to your evaluation’s context and goals.
- Data Analysis: From sentiment analysis to predictive modeling, AI can help surface trends, visualize relationships, and interrogate assumptions—while allowing human evaluators to interpret results in context (see the sketch after this list).
- Communication: AI helps create briefing notes, visuals, and even podcast scripts—so findings reach the right audiences in the right format.
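The sentiment-analysis point above can be made concrete with a short sketch. The example below uses the Hugging Face transformers sentiment-analysis pipeline; the package, the default model it downloads, and the two sample responses are assumptions for illustration, and any labels it produces would still need validating against human-coded data before an evaluator relies on them.

```python
# A minimal sentiment-analysis sketch using Hugging Face transformers.
# Assumes `pip install transformers torch`; the pipeline's default model is
# an illustrative choice, and outputs should be checked against human coding.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")

responses = [
    "The training changed how our cooperative negotiates prices.",
    "We were consulted once and never heard back from the project.",
]

for text, result in zip(responses, classifier(responses)):
    # Each result is a dict like {"label": "POSITIVE", "score": 0.99}.
    print(f"{result['label']:>8} ({result['score']:.2f})  {text}")
```

Even in this toy form, the confidence score is a reminder that the model is guessing, not understanding; the human evaluator still owns the interpretation.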
In every case, the goal isn’t automation for its own sake. It’s smarter use of time and talent.
What Should We Watch Out For?
AI brings power—but also complexity and risk. The method note doesn’t shy away from hard questions:
- Who decides what’s “true” in AI outputs?
- Whose knowledge is prioritized—and whose is excluded?
- How do we manage privacy when feeding sensitive data into AI tools?
- What’s our role when AI-generated outputs sound confident but are just… wrong?
One of the most important ideas in the method note is that AI in evaluation requires supervision at multiple levels—not just for accuracy, but for ethics, fairness, and epistemology. It's not just about what AI tells us, but how it shapes what we think we know.
The note encourages evaluators to document how AI is used, to test tools critically, and—crucially—never to use an AI tool without first understanding what happens to the data shared with it.
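The note leaves open what such documentation looks like in practice. As one hypothetical pattern, the sketch below appends a structured JSON record for each AI interaction using only the Python standard library; the file name and field names are our own inventions, not a CGIAR template.

```python
# A hypothetical, minimal AI-use log: one JSON line per AI interaction.
# Field names are illustrative, not an official CGIAR template.
import json
from datetime import datetime, timezone

def log_ai_use(path, tool, task, data_shared, human_review):
    """Append a structured record of one AI interaction to a JSONL file."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tool": tool,                  # which AI tool or service was used
        "task": task,                  # what the AI was asked to do
        "data_shared": data_shared,    # what, if any, data left your machine
        "human_review": human_review,  # how the output was verified
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")

log_ai_use(
    "ai_use_log.jsonl",
    tool="LLM chat assistant",
    task="Summarize a published evaluation report",
    data_shared="Public document only; no personal or confidential data",
    human_review="Summary checked line by line against the source",
)
```

A lightweight record like this makes the disclosure question raised at gLOCAL answerable after the fact: what was used, on what data, and who checked it.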
So, Who Is This Note For?
This is for anyone in the evaluation ecosystem within and beyond CGIAR:
- Internal staff designing evaluations
- Consultants drafting frameworks or analyzing data
- Policymakers who commission evaluations
- Organizations building AI policies
Whether you're ready to integrate AI or just want to understand the implications, this note gives you a foundation for responsible, reflective practice.
Our Invitation to You
AI is changing the landscape of evaluation. But that doesn’t mean we should follow blindly. It means we need to lead—with curiosity, caution, and collaboration.
We invite you to explore the CGIAR AI method note, test the tools, and join the global conversation on what meaningful, ethical AI use in evaluation looks like.
Let’s not just adapt to AI. Let’s shape its role in our work—and ensure it serves the values of equity, participation, and learning that define great evaluation.
Download the Method Note: Considerations and Practical Applications for the Evaluation Function of CGIAR.
Explore the Evaluation Method Notes Resource Hub