Enrico Bonaiuti is Research Team Leader for Monitoring, Evaluation and Learning at CGIAR's International Center for Agricultural Research in the Dry Areas (ICARDA), and Program Management Officer at the International Potato Center (CIP). He leads the Monitoring, Evaluation, Learning and Impact Assessment (MELIA) activities for the OneCGIAR Genetic Innovation Action Area. Etienne Vignola-Gagné is an analyst of public policy, innovation systems, and organizational change who currently works for Science-Metrix, the research evaluation team at Elsevier.

On February 27 and 28, 2023, the two experts attended a hybrid workshop – held in Rome, Italy and online – about the new set of evaluation guidelines from CGIAR's Independent Advisory and Evaluation Service (IAES). These build on the Quality of Research for Development (QoR4D) Frame of Reference from the CGIAR Independent Science for Development Council (ISDC), and provide the framing, criteria, dimensions, and methods for assessing QoR4D – both within CGIAR and in other like-minded organizations. The event was designed to help practitioners across and beyond the CGIAR System understand and apply the new guidelines in their own evaluative contexts.

We spoke to Bonaiuti and Vignola-Gagné to find out more about their experience of the workshop and their aspirations in this arena going forward.

Q: How familiar were you with the ISDC and the Evaluation Function prior to the workshop?

Bonaiuti: Reasonably familiar. For one of CGIAR’s Action Areas – the one that takes care of genetic innovation – I act as the focal point for monitoring, evaluation, learning, and impact assessment, so I work closely with the ISDC. Having around 10 years of experience in the organization, I’ve also seen how things have changed and progressed over time.

Q: How much experience did you have in evaluating quality of science and QoR4D?

Bonaiuti: In my previous role at CGIAR, I was coordinating a couple of research programs (now called Initiatives). Both of these undertook evaluations, for which quality of science was one of the evaluation criteria. It was quite interesting, because the usual way that Initiatives report on scientific production is by counting journal articles indexed in the Web of Science. Recently, bibliometric analysis has become more popular, so one of the later evaluations we did also captured additional metrics like citation indices, and used new kinds of indicators like altmetrics.

Then, last year, I was part of a reference panel accompanying Science-Metrix as they looked at new quality of science indicators that could be used, and they produced a technical note on the topic. The interesting part for me now would be to see how feasible these are for us to measure, because one of the biggest challenges is always to determine whether it is cost-effective to measure a particular kind of indicator. The good thing is that the document has a very long list of indicators, so this will allow us to find the right balance.

Also, just before the workshop, a colleague and I sent a peer assessment of these indicators to the [CGIAR] evaluation function. We categorized them into indicators that can easily be produced by the monitoring officer or automated; those that need to be implemented by external evaluators; and those that require a specialized company with a more structured way of producing them. We're planning to run a pilot this year to see whether these indicators are feasible.
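For readers who want to picture this triage, the sketch below shows one way such a categorization could be recorded and filtered in code. The indicator names and tier labels are hypothetical placeholders, not the actual list sent to the CGIAR evaluation function.

```python
# Illustrative sketch only: hypothetical indicator names and tiers, not the
# actual categorization submitted to the CGIAR evaluation function.
from collections import defaultdict
from typing import Dict, List

# Each indicator is assigned to the actor best placed to produce it.
INDICATOR_TIERS: Dict[str, str] = {
    "journal_article_counts": "monitoring_officer_or_automated",
    "citation_index": "monitoring_officer_or_automated",
    "altmetric_mentions": "monitoring_officer_or_automated",
    "authorship_diversity_review": "external_evaluators",
    "qualitative_peer_assessment": "external_evaluators",
    "full_bibliometric_benchmarking": "specialized_provider",
}

def group_by_tier(tiers: Dict[str, str]) -> Dict[str, List[str]]:
    """Group indicator names by the tier expected to implement them."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for indicator, tier in tiers.items():
        grouped[tier].append(indicator)
    return dict(grouped)

if __name__ == "__main__":
    for tier, indicators in group_by_tier(INDICATOR_TIERS).items():
        print(f"{tier}: {', '.join(indicators)}")
```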

Vignola-Gagné: The Science-Metrix team provides bibliometrics assessments across a broad spectrum of fields of science and institutional contexts (evaluations of funding programs mostly, but also of research centers and universities). One day we work on bibliometrics of a department focusing on quantum computing, and the next on the societal impacts of a major oncology research funding program. So, it’s a huge space of actors in the research sector to be aware of!

When we became aware of ISDC’s call for a bibliometrics design study in the fall of 2021, we had worked for programs with a transdisciplinary or transformative focus (Belmont Forum, Sentinel North at Université Laval, certain components of FP7 and H2020) or with a research-for-development orientation (GIZ, IDRC), but were not yet familiar with CGIAR’s work.

One thing that eased our way into knowing the program better was our prior contributions to mixed-methods evaluations. Doing joint, concrete work on tools such as data collection matrices really helped us connect the dots between our prior concepts and practices and the Quality of Science and QoR4D analytical grids.

Q: Were your expectations met? Did you have any unanticipated learnings?

Bonaiuti: Yes. For a start, I think there was the right mix of participants. I think one good aspect of the evaluation function is to bring different people, different opinions, and different experiences from other organizations into the CGIAR network. Having multiple organizations participating, and only a few people from the CGIAR, helped us to really embrace these knowledge-sharing and reflection objectives.

I also believe the agenda was very well organized. As usual, I would say that an extra day would have been helpful – but I think that if there had been that extra day, then I would still have said the same! But the facilitator did a great job at timekeeping, and there were also many opportunities for discussion over lunch and dinner, and during the coffee breaks.

I also appreciated getting to meet with people in person, because we've been meeting remotely for a long time with COVID and so on. I realize this may have been more difficult for other participants – for me, since I live in Italy just a few hours away from Rome, it was very convenient and much more valuable than just connecting online.

Vignola-Gagné: In terms of organization, and connecting from Canada, I was really impressed by the seamless integration of virtual platforms within the hybrid format. The flow was perfect and interactive activities treated virtual and in-person participants equally.

In terms of content, the Guidelines and the workshop have really gotten me thinking more about the distinction between process evaluation and performance evaluation. I've been toying with the approach of a process-oriented bibliometrics practice. We've been moving in that direction for some time, and the basic pieces were already in place, but that distinction is really helping me connect the dots and put a name to it.

Q: What did you find the most exciting (e.g. sessions, people, case studies)?

Bonaiuti: One thing was the presentations on the ways that colleagues from other organizations are approaching the topic – particularly those that had worked with the CGIAR before and were able to put their views into our context. I also enjoyed the sessions that were dedicated to group work, and the fact that each group was composed of people from various organizations.

Vignola-Gagné: The one thing you do miss when joining virtually is the full set of interactions that make up an in-person conference, so I can only comment on the formal sessions I attended. Coming in from outside CGIAR and still learning about CGIAR research, I was actually very impressed with some of the more traditional presentations by seasoned CGIAR researchers tracing the deep history of societal engagement at the centers. Having recently encountered some scepticism that research could really focus actively on improving our societies and environments, it was reinvigorating to see research teams that actively want to do engagement, and do it, rather than seeing it as just a funder's requirement.

Q: What will you take forward into your work? Where/how?

Bonaiuti: I will take forward my curiosity about testing the indicators that were proposed by Science-Metrix, and seeing whether they can be easily tracked.

I would also like to think more about how those indicators can be communicated, because one of the main problems we face with the scientific community in general is that evaluation is always something that comes at the end. Even if we try to do evaluation in real time, we're not always paying attention to its real added value: learning something, or getting something back that can help us better manage our research programs.

So, one of the things I want to bring back is to reflect more on how to frame these indicators in a way that clearly shows colleagues how they can help them in their work. Evaluation is not just about somebody else wanting to evaluate you: it’s also about wanting to evaluate yourself to then improve.

Vignola-Gagné: I agree with Enrico that there is a general need for better practices in communicating bibliometrics findings to researchers and other stakeholders. The problem is compounded by the familiarity of many with author-level or publication-level metrics. You need different interpretative habits when considering how your teams' outputs contribute to more collective achievements and outcomes at the program level.

I also agree that the different timelines for outputs, outcomes, and impact can be confusing. When considering your next move as a researcher or program manager, you need to simultaneously integrate in-process evidence from ongoing activities, outputs from projects concluded two years ago, outcomes from projects concluded five years ago, and societal impacts from projects concluded ten years ago or more. I see no easy way forward here to tackle this challenge, but as we get better at capturing and communicating the long-term societal achievements of research, I think this will progressively raise interest in understanding this complexity.

Q: Are there any other ways in which you would like to continue engaging with IAES/Evaluation and CGIAR on this topic, and around the Guidelines? If so, what?

Bonaiuti: I would love to support similar workshops taking place – at least a couple of times a year – where we bring in people with different views. It's a very rich experience; sometimes I think we put so much focus on the to-do list that we don't use the opportunities to learn ourselves.

Vignola-Gagné: I think there is a time for crafting your evaluation design, but there is also much to learn from trying things out in practice, which is what is starting to happen now with the bibliometrics recommendations. I would very much like to see how new bibliometric indicators deployed by CGIAR can foster organizational learning and help design improved research programs. We would get to learn from this as much as CGIAR does, and could use the feedback as input when designing our own improved indicators and evaluation strategies.

More broadly, I think CGIAR programs are really exemplary of the transdisciplinary and transformative approach to doing research that we are seeing increasingly emphasized by funders globally. I’ve been wondering whether CGIAR journal publications could provide a good empirical object for testing out ideas about how research program administration can foster societal-readiness and societal outcomes of research.

Q: What are your suggestions for how best to engage with CGIAR managers and scientists to improve the knowledge of and use of bibliometrics for evaluating quality of science?

Bonaiuti: It is always important to roll out additional indicators with attention to the data providers and to value for money. To get CGIAR colleagues to improve their knowledge and use of bibliometrics, we need to clearly define the business questions these indicators address, and how the results are turned into practical and actionable recommendations. It is also important to consider the context in which CGIAR colleagues operate, and the incentives for such changes.

Vignola-Gagné: CGIAR is now entering an important piloting phase for its new wave of bibliometric indicators. Interpretation is a crucial step in any bibliometric evaluation – you're arguably worse off with a poor interpretation of robust findings than with a very cautious interpretation of imperfect findings. Therefore, the test here is to make sure the subject matter experts or other peer reviewers who will be the direct users of the bibliometrics find them informative and insightful, but also recognize their limitations. CGIAR managers, as the ultimate users of the bibliometrics (with findings integrated within subject matter expert assessments), should also have clarity on which CGIAR funding or support mechanisms are actually being assessed through the bibliometric evaluation.

To give one example of the need for caution in interpretation, the indicator of policy-related uptake computed from the Overton database is a great way to capture some (but by no means all) of the impacts of research on policymaking activities. But you need to apply a broad definition of what policymaking is, spanning the spectrum from evidence syntheses written by researchers to formal parliamentary activities. In practice, legislative texts seldom cite journal publications directly – there is a chain of information processing and synthesis that takes place before that, which may not be linear and might conflate datapoints from multiple scientific sources. Parliamentary commissions may call in researchers as expert witnesses rather than citing their work. So the indicator of policy-related uptake is likely to capture citations of CGIAR journal publications in World Bank or IPCC evidence syntheses; and that's fine – these are small steps and building blocks towards eventual policy change. But evaluators absolutely need to be aware of this to interpret the findings correctly.
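As a rough illustration of that interpretive step, the sketch below profiles a handful of made-up policy citation records by source type, so that citations in evidence syntheses are not read as direct legislative uptake. The records, category labels, and function are illustrative assumptions; this is not the Overton API or its data schema.

```python
# Toy sketch: profile policy-document citations by source type before
# interpretation. Records and category labels are made up for illustration;
# this is not the Overton API or its schema.
from collections import Counter
from typing import Dict, Iterable

policy_citations = [
    {"citing_source": "World Bank report", "type": "evidence_synthesis"},
    {"citing_source": "IPCC assessment", "type": "evidence_synthesis"},
    {"citing_source": "national ministry guidance", "type": "government_guidance"},
    {"citing_source": "parliamentary bill", "type": "legislative_text"},
]

def uptake_profile(citations: Iterable[Dict[str, str]]) -> Counter:
    """Count policy citations by source type, so evaluators can see whether
    uptake sits mostly in evidence syntheses rather than legislative texts."""
    return Counter(record["type"] for record in citations)

if __name__ == "__main__":
    for source_type, count in uptake_profile(policy_citations).most_common():
        print(f"{source_type}: {count}")
```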

Q: Which do you think are the most useful bibliometrics to evaluate quality of science in the context of R4D?

Bonaiuti: Among the indicators proposed by Science-Metrix, I found the analysis of authorship diversity from an organizational point of view particularly interesting and potentially useful. Together with altmetrics, this could more prominently portray behaviour change in different organizations when they are involved in knowledge generation from the outset. The monitoring team can check regularly whether those actors do anything differently over time as a result of co-publishing a knowledge product.
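As one hypothetical way to operationalize such an indicator, the sketch below computes a Shannon-entropy measure of organizational authorship diversity for a single publication. The organization types and example data are illustrative assumptions, not the specification from the Science-Metrix technical note.

```python
# Minimal sketch of one possible organizational authorship-diversity measure:
# Shannon entropy over the organization types of a publication's co-authors.
# Organization types and example data are assumptions for illustration, not
# the indicator specification from the Science-Metrix technical note.
import math
from collections import Counter
from typing import Iterable

def org_diversity(affiliation_types: Iterable[str]) -> float:
    """Shannon entropy (in bits) of co-author organization types; higher values
    mean authorship is spread more evenly across organization types."""
    counts = Counter(affiliation_types)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

if __name__ == "__main__":
    # One hypothetical publication: a CGIAR center, two national partners, one NGO.
    publication = ["CGIAR_center", "national_program", "national_program", "NGO"]
    print(f"Organizational authorship diversity: {org_diversity(publication):.2f} bits")
```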

Vignola-Gagné: I think going forward we'll have to expand considerably the panel of indicators we tend to look at – we can't consider just two or three dimensions at a time if we are to assess the societal outcomes of research and the multiple pathways through which they may manifest. You'd want to simultaneously consider gender and South-North equity in publication authorship, cross-disciplinarity, cross-sectorality, and pre-printing frequency, along with things like citations in policy documents or journalistic outlets. You would also want to develop an appreciation that some successful projects perform well on a few of these dimensions, but not necessarily all.
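To make that multi-dimensional view concrete, the sketch below records a hypothetical per-project indicator panel and reports only the dimensions where the project performs strongly. All dimension names, values, and the threshold are illustrative assumptions rather than established CGIAR indicators.

```python
# Illustrative sketch of a multi-dimensional indicator panel for one project.
# Dimension names, values, and the threshold are hypothetical; the point is
# that a project can shine on a subset of dimensions without scoring well on all.
from dataclasses import dataclass, fields
from typing import List

@dataclass
class IndicatorPanel:
    gender_equity_share: float       # share of outputs with women (co-)authors
    south_north_coauthorship: float  # share with South-North co-authorship
    cross_disciplinarity: float      # interdisciplinarity score, 0 to 1
    preprint_share: float            # share of outputs posted as preprints
    policy_citation_share: float     # share cited in policy documents

def strong_dimensions(panel: IndicatorPanel, threshold: float = 0.5) -> List[str]:
    """Return the dimensions where the project exceeds a (hypothetical) threshold."""
    return [f.name for f in fields(panel) if getattr(panel, f.name) >= threshold]

if __name__ == "__main__":
    project = IndicatorPanel(0.62, 0.40, 0.71, 0.15, 0.55)
    print("Strong dimensions:", strong_dimensions(project))
```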