Guidelines for evaluating prevalence studies
Michael H Boyle, PhD
Department of Psychiatry, McMaster University, Hamilton, Ontario, Canada

As stated in the first issue of Evidence-Based Mental Health, we are planning to widen the scope of the journal to include studies answering additional types of clinical questions. One of our first priorities has been to develop criteria for studies providing information about the prevalence of psychiatric disorders, both in the population and in specific clinical settings. We invited the following editorial from Dr Michael Boyle to highlight the key methodological issues involved in the critical appraisal of prevalence studies. The next stage is to develop valid and reliable criteria for selecting prevalence studies for inclusion in the journal. We welcome our readers' contributions to this process.

You are a geriatric psychiatrist providing consultation and care to elderly residents living in several nursing homes. The previous 3 patients referred to you have met criteria for depression, and you are beginning to wonder if the prevalence of this disorder is high enough to warrant screening. Alternatively, you are a child youth worker on a clinical service for disruptive behaviour disorders. It seems that all of the children being treated by the team come from economically disadvantaged families. Rather than treating these children on a case by case basis, the team has discussed developing an experimental community initiative in a low income area of the city. You are beginning to wonder if the prevalence of disruptive behaviour disorders is high enough in poor areas to justify such a programme.

Prevalence studies of psychiatric disorder take a sample of respondents to estimate the frequency and distribution of these conditions in larger groups. All of these studies involve sampling, cross sectional assessments of disorder, the collection of ancillary information, and data analysis. Interest in prevalence may extend from a particular clinical setting (a narrow focus) to an entire nation (a broad focus). In the examples given above, the geriatric psychiatrist needs evidence from an institution based study (narrow focus), whereas the child youth worker needs evidence from a general population study (broad focus).

In recent years, concern for the mental health needs of individuals in clinical settings has been broadening to include whole populations. This population health perspective has stimulated numerous prevalence studies of psychiatric disorder which are intended to inform programme planning, evaluation, and resource allocation. In general, the quality of these prevalence studies has been improving as a direct result of drawing on advances in survey methodology. In this note, the guidelines for evaluating prevalence studies arise from criteria applicable to community surveys.

Guidelines for evaluating prevalence studies

The validity of prevalence studies is a function of sampling, measurement, and analysis. Answers to the following questions can serve as criteria for assessing these features.

Sampling

(1) DOES THE SURVEY DESIGN YIELD A SAMPLE OF RESPONDENTS REPRESENTATIVE OF A DEFINED TARGET POPULATION?

A valid study enlists a sample that accurately represents a defined target population. Representativeness is a quality associated with the use of statistical sampling methods and careful evaluation of respondent characteristics.

Is the target population defined clearly?

A sample provides the means to obtain information about a larger group, called the target population. The target population must be defined by shared characteristics that can be assessed and measured accurately. Some of these characteristics include age, sex, language, ethnicity, income, and residency. Invariably, subsets of the target population are too expensive or difficult to enlist because, for example, they live in places that are inaccessible to surveys (eg, remote areas, native reserves, military bases, shelters) or they speak languages not accommodated by data collection. These excluded individuals need to be described and their number estimated as a proportion of the target population. Defining the target population and identifying systematic exclusions are necessary to give research consumers a basis for judging the applicability of a study to their question.

Was probability sampling used to identify potential respondents?

Probability sampling relies on the principle of randomisation to ensure that each eligible respondent has a known chance of selection; it requires that members of the target population be identified through a sampling frame or listing of potential respondents. This listing must provide access to all members of the defined target population except for exclusions acknowledged by the study authors. Probability sampling comes in a variety of forms, from simple to complex. In simple random sampling, a predetermined number of units (individuals, families, households) is selected from the sampling frame so that each unit has an equal chance of being chosen. More complex methods may include stratified sampling, in which a population is divided into relatively homogeneous subgroups, called strata, and samples are selected independently and with known probability from each stratum; cluster sampling, in which a population is divided into affiliated units or clusters such as neighbourhoods or households and a sample of clusters is selected with known probability; multistage sampling, in which samples are selected with known probability in hierarchical order, for example, a sample of neighbourhoods, then a sample of households, then a sample of individuals; or multiphase sampling, in which sampled individuals are screened and subsets are selected with known probability for more intensive assessment. The use of probability sampling is a basic requirement in prevalence studies.
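
For readers who want to see how these designs translate into practice, the following is a minimal sketch in Python using numpy and pandas. The sampling frame, variable names, and sample sizes are invented for illustration and are not taken from the article.

# Illustrative sketch of three probability sampling designs (hypothetical frame;
# variable names are invented for this example).
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Hypothetical sampling frame: 10 000 individuals nested in 200 neighbourhoods.
frame = pd.DataFrame({
    "person_id": np.arange(10_000),
    "neighbourhood": rng.integers(0, 200, size=10_000),
    "age_group": rng.choice(["18-44", "45-64", "65+"], size=10_000, p=[0.5, 0.3, 0.2]),
})

# 1. Simple random sampling: every unit has the same, known chance of selection.
srs = frame.sample(n=500, random_state=42)

# 2. Stratified sampling: independent samples drawn within each age stratum
#    (here with the same sampling fraction in every stratum).
stratified = frame.groupby("age_group").sample(frac=0.05, random_state=42)

# 3. Two-stage sampling: select neighbourhoods (clusters) first, then sample
#    individuals within the selected neighbourhoods.
selected_hoods = rng.choice(frame["neighbourhood"].unique(), size=20, replace=False)
two_stage = (frame[frame["neighbourhood"].isin(selected_hoods)]
             .groupby("neighbourhood").sample(frac=0.5, random_state=42))

print(len(srs), len(stratified), len(two_stage))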

Do the characteristics of respondents match the target population?

Non-response is the failure to enlist sampled individuals. If non-response is extensive and influenced by variables central to study objectives, it can lead to selection bias and estimates that deviate systematically from population values. When information is available on non-respondents, methods exist and should be used to evaluate selection bias.1 In the absence of such information, sample representativeness must be evaluated by comparing the sociodemographic characteristics of respondents with those of the target population derived from a census or other relevant databases.

In clinical studies of treatment, prevention, prognosis, and quality improvement, a response of ≥80% has become the recommended minimum for follow up.2 Although apparently fixed, this minimum standard is, in fact, variable because it fails to account for study to study variation in non-response at inception. The threshold for minimally acceptable response in prevalence studies should be set at 70%, provided the report shows that respondents and non-respondents, and/or the study sample and the target population, are similar on important sociodemographic characteristics. Without such evidence of comparability, the minimum standard should be set at 80%.
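
As a rough illustration of how these checks might be carried out, the sketch below computes the response proportion, compares the age distribution of respondents with that of the target population, and then applies the 70%/80% rule suggested above. All counts, proportions, and the closeness threshold are hypothetical and chosen only for the example.

# Hypothetical example: response rate and a crude representativeness check.
n_sampled = 1_200          # eligible individuals selected into the sample
n_responded = 894          # of whom this many completed the interview
response_rate = n_responded / n_sampled

# Age distribution of respondents vs the target population (eg, from a census).
respondent_props = {"18-44": 0.46, "45-64": 0.33, "65+": 0.21}
census_props     = {"18-44": 0.50, "45-64": 0.30, "65+": 0.20}

max_abs_diff = max(abs(respondent_props[k] - census_props[k]) for k in census_props)
comparable = max_abs_diff < 0.05   # arbitrary closeness threshold for illustration

# Apply the thresholds proposed in the text: 70% is acceptable only with
# evidence of comparability; otherwise require 80%.
minimum = 0.70 if comparable else 0.80
print(f"response rate = {response_rate:.1%}, minimum required = {minimum:.0%}, "
      f"acceptable = {response_rate >= minimum}")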

Measurement

(2) DO THE SURVEY INSTRUMENTS YIELD RELIABLE AND VALID MEASURES OF PSYCHIATRIC DISORDER AND OTHER KEY CONCEPTS?

A valid study uses instruments that provide reliable and valid measurement. These are qualities that arise from the use of standardised data collection methods and that are confirmed empirically by measurement evaluation studies.

Are the data collection methods standardised?

Prevalence studies collect information for purposes of estimation (eg, frequency and distribution of psychiatric disorder) and hypothesis testing (eg, association between disorder and other variables of interest). To achieve these purposes, identical methods of assessment and data collection must be used with all respondents so that the information for analysis is completely comparable. Any deviation from a standard data collection protocol applicable to all respondents creates the potential for biased comparisons. Standardisation of method refers not only to eliciting information from respondents but also to interviewer training, supervision, enlistment of respondents, and processing of data.

Are the survey instruments reliable?

Reliability establishes the extent to which an instrument can discriminate between individuals. To evaluate reliability, data are collected to separate between-individual differences that are real (true variation) from those that are artifacts of the measurement process (random variation). An informative empirical test of instrument reliability in prevalence studies is to administer the survey instrument on two occasions, about 7–10 days apart (test-retest design), and to examine levels of agreement using κ for cross classified data and the intraclass correlation coefficient for dimensional data.
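
A minimal sketch of how these two agreement statistics might be computed from test-retest data is given below, using simulated ratings and scikit-learn's cohen_kappa_score. The intraclass correlation shown is the one-way random effects form, which is only one of several possible variants; all data are invented for illustration.

# Test-retest agreement: Cohen's kappa for a binary classification and a
# one-way random effects ICC for a dimensional score (simulated data).
import numpy as np
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)

# Binary diagnosis (present/absent) on two occasions for 200 respondents.
time1 = rng.integers(0, 2, size=200)
time2 = np.where(rng.random(200) < 0.85, time1, 1 - time1)  # ~85% raw agreement
kappa = cohen_kappa_score(time1, time2)

# Dimensional symptom score on two occasions (n subjects x k = 2 occasions).
true_score = rng.normal(50, 10, size=200)
scores = np.column_stack([true_score + rng.normal(0, 5, 200),
                          true_score + rng.normal(0, 5, 200)])

def icc_oneway(x):
    """ICC(1,1): (MS_between - MS_within) / (MS_between + (k-1) * MS_within)."""
    n, k = x.shape
    grand = x.mean()
    ms_between = k * ((x.mean(axis=1) - grand) ** 2).sum() / (n - 1)
    ms_within = ((x - x.mean(axis=1, keepdims=True)) ** 2).sum() / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

print(f"kappa = {kappa:.2f}, ICC = {icc_oneway(scores):.2f}")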

Instrument reliabilities must be based on a sample derived from, or at least similar to, study respondents; they also need to include effects for all major sources of unwanted random variation. Respondent effects due to temporal fluctuations in memory, mood, and motivation are invariably present. There may also be interviewer effects arising from differences in presentation, competence, and impact, and setting effects stemming from variability in the location and circumstances of data collection. If all 3 sources of unwanted variation were applicable in a study, then the test-retest design described above should take them into account.

Although there is no consensus on minimum standards for reliability, a good reason exists for setting them. Random variation in measurement leads to attenuation of effects (bias towards the null). Tolerating large differences in reliability between measures in the same study creates an unequal basis for comparing effects and can lead to extremely biased inferences. To prevent the mindless analysis and reporting of associations for poorly measured variables, minimum reliability standards should be set at 0.60 (based on κ) for cross classified data and 0.70 (based on the intraclass correlation coefficient) for dimensional data.
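
The attenuation referred to here can be made concrete with Spearman's classical correction-for-attenuation formula, a standard psychometric result that is not given in the article: the observed correlation is approximately the true correlation multiplied by the square root of the product of the two measures' reliabilities. The reliability values below are chosen only to illustrate the point.

# Attenuation of an association by unreliable measurement (Spearman's formula):
# r_observed ~= r_true * sqrt(reliability_x * reliability_y)
import math

r_true = 0.50
for rel_x, rel_y in [(0.9, 0.9), (0.7, 0.7), (0.6, 0.4)]:
    r_obs = r_true * math.sqrt(rel_x * rel_y)
    print(f"reliabilities ({rel_x}, {rel_y}) -> observed r ~= {r_obs:.2f}")
# With reliabilities of 0.6 and 0.4 a true correlation of 0.50 shrinks to ~0.24,
# illustrating why minimum reliability standards matter when comparing effects.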

Are the survey instruments valid?

Validity establishes the extent to which an instrument makes discriminations between individuals that are meaningful and useful. Evaluating instrument validity is analogous to testing hypotheses on substantive associations between measured variables, with one important difference: validity testing is done to confirm, not to add to, existing theory and knowledge. In the measurement of psychiatric disorder, this theory and knowledge come from clinical and epidemiological studies that have focused on aetiology, course, and response to treatment. Although the need to present evidence on instrument validity extends to all key variables, it is the assessment of psychiatric disorder which provides the focus here.

Efforts to validate structured interviews for classifying psychiatric disorder have been remarkably circumscribed. This is true in children for a variety of interviews3 and in adults for the currently recommended standard, the Composite International Diagnostic Interview.4 The best of these studies usually compare assessment data generated by lay interviewers with data generated by clinicians.

There has been no commentary on minimum validity standards for psychiatric instruments used in prevalence studies. The following are recommended here: (1) instrument content for measuring disorder (items and questions) should map into the operational criteria and symptoms contained in existing nosological systems (International Classification of Diseases and Diagnostic and Statistical Manual); (2) classifications of disorder should be based on compound criteria, including symptoms and evidence of impairment, distress, or disadvantage; (3) the identification of cases should derive from an explicit rationale that includes an external criterion and decision rules for discriminating between test positives and test negatives5; and (4) evidence should exist from head to head comparisons with independent assessment data that the instrument meets specificity criteria (ability to distinguish among different categories of disorder).
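
Criterion (3), for example, implies reporting how the instrument's classifications agree with an external criterion. The sketch below shows the kind of head to head comparison involved: lay-interview classifications are tabulated against clinician diagnoses and sensitivity and specificity are computed. The counts are invented for illustration and do not come from any validation study.

# Hypothetical head to head comparison of a structured lay interview against
# clinician diagnoses for one disorder (counts are invented for illustration).
tp = 42   # interview positive, clinician positive
fp = 18   # interview positive, clinician negative
fn = 11   # interview negative, clinician positive
tn = 329  # interview negative, clinician negative

sensitivity = tp / (tp + fn)   # proportion of clinician cases detected
specificity = tn / (tn + fp)   # proportion of clinician non-cases excluded
ppv = tp / (tp + fp)           # positive predictive value at this prevalence

print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}, PPV = {ppv:.2f}")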

Analysis

(3) WERE SPECIAL FEATURES OF THE SAMPLING DESIGN ACCOUNTED FOR IN THE ANALYSIS?

Complex sampling methods mean that eligible respondents will have different probabilities of selection. These methods introduce design effects, a term used by survey researchers to indicate that the sampling method affects the calculation of variance estimates for testing hypotheses and determining confidence intervals. Such designs require the use of special statistical methods to obtain estimates that are unbiased and associated with the correct statistical precision.
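
One common manifestation of a design effect is unequal selection probabilities, handled through survey weights. The sketch below, using hypothetical weights and outcomes, computes a weighted prevalence and approximates the loss of precision from unequal weighting with Kish's effective sample size. A full analysis would also account for stratification and clustering, typically with Taylor linearisation or replicate weights in dedicated survey software.

# Weighted prevalence with an approximate design effect for unequal weighting.
import numpy as np

rng = np.random.default_rng(1)
n = 1_000
weights = rng.choice([1.0, 2.0, 4.0], size=n, p=[0.6, 0.3, 0.1])  # inverse selection probabilities
disorder = rng.random(n) < 0.12                                   # hypothetical case indicator

p_weighted = np.average(disorder, weights=weights)

# Kish approximation: design effect from weight variability and the resulting
# effective sample size, which inflates the variance of the estimate.
deff = n * (weights ** 2).sum() / weights.sum() ** 2
n_eff = n / deff
se = np.sqrt(p_weighted * (1 - p_weighted) / n_eff)

print(f"weighted prevalence = {p_weighted:.3f}, deff = {deff:.2f}, "
      f"effective n = {n_eff:.0f}, approximate SE = {se:.3f}")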

(4) DO THE REPORTS INCLUDE CONFIDENCE INTERVALS FOR STATISTICAL ESTIMATES?

A primary objective of prevalence studies is to produce frequency estimates of disorder overall and for population subgroups. The usefulness of these estimates derives from the expected closeness between the unobserved value in the target population and the observed value in the sample. Confidence intervals quantify this closeness: a 95% confidence interval, for example, is constructed so that, across repeated samples, 95% of such intervals would contain the unobserved target population value. Estimates in prevalence studies must be accompanied by confidence intervals or the information needed to calculate them.
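
For a simple random sample, a confidence interval for a prevalence estimate can be obtained directly from the number of cases and the sample size, as in the sketch below using statsmodels' proportion_confint. The counts are hypothetical; the Wilson interval is shown alongside the familiar normal approximation, and a complex design would first require the design-based standard error discussed above.

# 95% confidence intervals for a prevalence estimate (hypothetical counts).
from statsmodels.stats.proportion import proportion_confint

cases, n = 87, 650
prevalence = cases / n

wald_low, wald_high = proportion_confint(cases, n, alpha=0.05, method="normal")
wilson_low, wilson_high = proportion_confint(cases, n, alpha=0.05, method="wilson")

print(f"prevalence = {prevalence:.3f}")
print(f"Wald 95% CI:   {wald_low:.3f} to {wald_high:.3f}")
print(f"Wilson 95% CI: {wilson_low:.3f} to {wilson_high:.3f}")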

Comment

The criteria presented in this commentary identify guidelines to evaluate the basic elements of prevalence studies: sampling, measurement, and analysis. The objective is to help the research consumer make informed judgments about the validity of a particular report. Basic guidelines are set to stimulate debate and further study. Although the criteria arise mostly from experience with prevalence studies done in general population settings, they extend to studies done in clinical settings, with one important caveat. In clinical settings, the question, "does the survey design yield a sample of respondents representative of a defined target population?" is largely unanswerable. It is difficult, if not impossible, to define the target populations that give rise to respondents sampled from clinical settings. The idiosyncrasies of referral to mental health services render suspect the general applicability of prevalence estimates from one setting to the next. This issue needs further clarification as it raises an important question about the usefulness of publishing prevalence estimates from studies done in clinical settings.

References