Criteria to Judge Qualitative Research

There is a lot of uncertainty and confusion about how to judge the soundness and rigor of qualitative studies. Experienced researchers can read a report, article, or book based on qualitative evidence and conclude whether the work is of high quality, but such judgments are based less on a set of agreed-upon and standardized criteria among scholars, and more on vaguely articulated principles and an instinctual reading of the material. It is in this context that two U.S.-based sociologists – Mario L. Small and Jessica M. Calarco – have written a pathbreaking book, Qualitative Literacy: A Guide to Evaluating Ethnographic and Interview Research.

I read the book with a sense of relief and delight because finally I found a clear articulation of criteria that combines abstractions with practical tips, and ideas with excellent concrete examples. However, the guidance does not always apply in the context of development projects because academic timelines and constraints are different. Research funded by development agencies has to be completed in very short time windows, relative to the longer academic timelines for publishing articles and books.

For example, Small and Calarco begin with one precondition in the introduction, even before they lay out their criteria for judging qualitative studies. The precondition is “exposure”. They make the point that “An interview study of 120 people interviewed for 1 hour will have far less exposure than one with 40 people interviewed four times each for 3 hours per interview. The first will have 120 hours of exposure; the second 480. Interviewers from different perspectives will differ on which of the two approaches is preferable: some will insist that large samples are always better; others, that one-shot interviews are always suspect; and still others (like us), that which of the two is superior depends on the question. But they will certainly agree that ceteris paribus…. exposure… [as a general idea] …is actually not controversial.” While I agree with the authors that more exposure leads to superior understanding, in the context of development projects, the available time and budget do not always allow extensive exposure.
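The exposure arithmetic in the quoted passage can be sketched in a few lines of Python. This is only an illustration: the function name exposure_hours and its parameter names are hypothetical labels chosen here, not terms from the book, and the sketch assumes exposure is simply people multiplied by sessions multiplied by hours per session.

```python
def exposure_hours(participants: int, sessions_each: int, hours_per_session: float) -> float:
    """Total hours of contact between researchers and participants.

    Assumption for illustration: every participant completes every session,
    and every session lasts the same number of hours.
    """
    return participants * sessions_each * hours_per_session


# The two designs described in the quoted passage.
one_shot_design = exposure_hours(participants=120, sessions_each=1, hours_per_session=1)
repeated_design = exposure_hours(participants=40, sessions_each=4, hours_per_session=3)

print(one_shot_design)   # 120 hours
print(repeated_design)   # 480 hours
```

The same calculation can help a development team see, at the planning stage, how much exposure a given time and budget will actually buy.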

An even bigger challenge of producing qualitative research of the highest quality for development projects is finding the right researcher. Oftentimes, a team has to be cobbled together to meet a trifecta of skills – fluency in the local language(s), contextual understanding of both place and subject matter, and experience and expertise in qualitative methods, data collection and analysis.

Therefore, this blog attempts to do three things: (a) outline the criteria proposed by Small and Calarco, (b) assess the applicability of the criteria to qualitative research in development projects broadly and impact assessments in particular, and (c) to the extent possible, provide examples from research in agriculture.

Criterion 1: Cognitive empathy

Among the most important objectives of empirical qualitative research is to demonstrate cognitive empathy – the degree to which the researcher understands how those interviewed or observed view the world and themselves – from their perspective. It is important to note that the word cognitive carries as much weight as the word empathy. As the authors point out, an empathetic researcher can report what others say, but a cognitively empathetic researcher does more to understand the perceptions, motivations, and meanings of social action.

In the context of development, this is of particular import since there is usually a huge gap between the “experts” and the “subjects” of the research that can impact research outcomes and policy decisions. Consider the scenario where it is scientific experts who develop new seed varieties, policy experts who devise action plans for farmers based on the scientific recommendations, field experts (say, agricultural extension officers) who disseminate knowledge among farmers on the use of the new seeds, and experts on impact assessments who research whether farmers have adopted the seeds. One of the goals of qualitative research is to consider the end users of the seeds – the farmers – as experts as well. They are the experts of their own lives and farming practices. By cognitively empathizing with the farmers, qualitative research aims to narrow the gap between the experts and the end users.[1]

A qualitative approach to fieldwork on seed adoption will try to understand – from the farmers’ perspective – why they have or have not adopted the seeds produced by the scientists. Let’s say that some farmers have not adopted the new seeds. A cognitively empathetic researcher will eschew the idea that ways of thinking that are different from those of “experts” are somehow irrational or illogical. It is the work of the qualitative researcher to account for why farmers hold the views they do or practice what they practice and uncover the respondent’s logic.

Importantly, the authors also point out the difference between empathy and sympathy. The latter is a feeling of “pity or sorrow for the troubles of another.” Empirical studies that draw on sympathy express solidarity with those they study, but a cognitively empathetic study will rely on explaining how the ‘other’ views the world and the inherent logic of that worldview. A good qualitative researcher can achieve such an understanding with or without agreeing with it, or extending solidarity or sympathy. Solidarity and sympathy are matters of moral, affective, and political disposition, whereas cognitive empathy is an important methodological skill in interpretive inquiry.

For example, in a study on women’s agency in India, we devised open-ended questions to assess a respondent’s ability to make decisions around her own mobility. Could she go outside the home on her own? Or did she always have to be accompanied by a family member, typically a male member of the household? We expected that women who had to be accompanied all the time were less mobile, and therefore had low agency. However, in an interview, a woman told us that she did not want to go anywhere on her own and not only preferred but insisted that her husband accompany her. She further explained that she was the oldest of three daughters, her father had taught her to ride a motorcycle, and from the age of thirteen, she had been in charge of many chores outside the home. Now, as a married woman, she viewed being driven around by her husband on his motorcycle as freedom, rather than as a restriction. Through cognitive empathy, we understood the meaning she attached to being mobile and her motivations for restricting her own movements, and once we saw the world through her eyes, we were able to achieve a far more nuanced understanding of women’s agency.

Criterion 2: Heterogeneity

A second criterion of effective empirical work using qualitative methods is heterogeneity. Put simply, “heterogeneity is an indicator of good work because…such work depicts the degree of diversity in people or contexts that insiders – members of a particular group or community – know there to be.” By paying attention to heterogeneity in the group that we are studying, we also guard against “out-group homogeneity bias”. Drawing on psychological literature, the authors explain that human beings view their own groups as diverse and can offer many variations in practices, experiences, traits, and attitudes when asked about the groups they belong to, but we paint the ‘other’ with simplistic overgeneralizations. This makes sense because we lack information about the ‘other’. An effective piece of qualitative research will bridge this gap. It is the work of the qualitative researcher to show the diversity of understanding, experiences, and motivations among individuals and groups.

Let’s say we are conducting a qualitative study to assess the impact of a development programme that was implemented state- or province-wide. Too often qualitative researchers feel compelled to answer the question, “Did the intervention work?” In fact, this is the wrong question because qualitative approaches are not suited to answering it. More than likely, the qualitative design will cover a handful of sites and the study will rightly be challenged on representativeness. But this does not mean that qualitative research has no role to play in assessing impact. The strength of qualitative evidence lies in covering fewer sites in depth to capture the heterogeneity of experiences that beneficiaries had with the programme. It is better to begin with a different set of questions: Who did the intervention benefit? In what way? What were the mechanisms that explain how the benefits accrued? Who benefitted less? Who benefitted the least? Why?

Criterion 3: Palpability

Small and Calarco define palpability as “the extent to which the reported findings are presented concretely rather than abstractly.” This is not to say that abstractions are not important, but abstract statements are interpretations of data, not data itself. Palpability speaks to the strength of qualitative evidence shown to support the analysis. Qualitative researchers must write about particular people, in particular places, having particular experiences. They must produce palpable evidence in support of their interpretations. There is a way of moving from the particular to the general, but without concrete and palpable data, qualitative research reads at best as anodyne, and at worst as unconvincing.

For example, I recently read a statement in an abstract of a qualitative assessment of a forestry programme: “We found evidence of an overwhelming influence of broader institutional structures on marginalized populations.” This statement was too abstract, and it barely registered as important. Buried in the report were the actual data, which were both interesting and crucial to the assessment.

When the forestry programme encouraged the cordoning off of the common lands that belonged to the whole village, a worthwhile endeavor, it had the positive effect of improving forest cover. However, the poorest villagers were mostly adversely affected. They owned small ruminants (sheep and goats), which were herded by women of the household on land that was previously open for grazing. Better-off villagers owned larger animals like cows and buffaloes but did not use the open lands because the undergrowth was inadequate as fodder for the larger animals. Since women in general, and poor women in particular, were excluded from the village governance structures, they did not have a voice in the way the project was designed or implemented and were negatively impacted by the programme.

When the fullness of the narrative was revealed, the data became much more concrete, and it revealed the lived experiences of different sections of village society (thereby also meeting the criterion of heterogeneity). The data could have been made even more palpable by including excerpts from the transcripts because illustrative quotes not only corroborate the assessment but also carry the evidence forward by highlighting what respondents experienced and how they felt.

While concrete data enhances palpability, poorly written studies also err on the other side of the spectrum: so much detail that the study reads like a series of “stories” without any analytical insights. Therefore, we conclude with the final criterion: analytical capacity. Note that this is a criterion that Small and Calarco collapse under the criterion of palpability. However, I have added it as a separate criterion because many impact assessments based on qualitative evidence are sound on the design and the data collection but are poorly executed in analysis and writing.

Criterion 4: Analytical capacity

Data on impact assessments derive largely from in-depth interviews and focus group discussions. Even fieldwork of a few days or weeks yields a lot of data by way of pages of transcripts or notes. Quick turnaround times to write up the findings make it difficult to analyze the data effectively. As a result, qualitative impact assessments present a lot of detail (thereby fulfilling the criteria above of concreteness and palpability) but without any abstractions or analytical insights. Sometimes, there is so much detail that the researcher is, in effect, forcing the reader to emerge from the details, take a step back and wonder, “what is the main takeaway?”

Writing up findings from evidence that is messy and sometimes contradictory requires a delicate balance in moving from the particular to the general. A well-executed piece of qualitative analysis shows the “essential linkages between two or more characteristics in terms of some explanatory schema.”[2] For example, a schema could be “when X occurs, whether Y will follow depends on W.” Studies of high quality can “justifiably state that a particular process, phenomenon, mechanism, tendency, relationship, dynamic or type exists.”

For example, if the fieldwork included, say, 4 villages, the researcher can step back and ask, what were the necessary steps in the implementation across villages? What was different? And most importantly – why? Or the fieldwork may raise the question – why are Villages 1, 2 and 3 similar, but not Village 4? Or why are Villages 1 and 2 similar across one dimension but not another?

Without analytical insights, the qualitative assessment is left wanting and all the time and resources taken to collect the data come to naught.

Together these four criteria – cognitive empathy, heterogeneity, palpability, and analytical capacity – specify the qualities that distinguish empirically sound and rigorous qualitative research. In many ways, seasoned qualitative researchers will find these qualities familiar and intuitive. My hope is that this brief overview demystifies the criteria and offers guidance to quantitative researchers on how to judge research based on qualitative data.

Many colleagues in the quantitative research tradition remain skeptical of the possibility of rigor in qualitative research in the absence of scientific objectivity. How do qualitative researchers ensure objectivity in the absence of experimentation and control? Is interpretation subjective or objective? Can words be reliable data? What about bias? The next blog in the series will offer an overview of how qualitative researchers approach these questions in their work.

[1] Rao, Vijayendra. 2022. Can Economics Become More Reflexive? Exploring the Potential of Mixed-Methods. Policy Research Working Paper; No. 9918. © World Bank, Washington, DC.

[2] Small, Mario Luis. 2009. "'How many cases do I need?' On science and the logic of case selection in field-based research." Ethnography 10, no. 1: 5-38.

Monica Biradavolu

CGIAR Independent Advisory and Evaluation Service (IAES)