Bayesian modelling of tuberculosis clustering from DNA fingerprint data

Stat Med. 2008 Jan 15;27(1):140-56. doi: 10.1002/sim.2899.

Abstract

A combination of continuous and categorical tests, none of which is a gold standard, is often available for classification of subject status in epidemiologic studies. For example, tuberculosis (TB) molecular epidemiology uses select mycobacterial DNA sequences to provide clues about which cases of active TB are likely clustered, implying recent transmission between these cases, versus reactivation of previously acquired infection. The proportion of recently transmitted cases is important to public health, as different control methods are implemented as transmission rates increase. Standard typing methods include IS6110 restriction fragment length polymorphism (IS6110 RFLP), but recently developed polymerase chain reaction-based genotyping modalities, including mycobacterial interspersed repetitive unit-variable-number tandem repeat typing and spoligotyping, provide quicker results. In addition, it has recently been suggested that results from IS6110 RFLP can be used to create a continuous measure of genetic relatedness, called the nearest genetic distance. Whichever method is used, estimation of cluster rates is made difficult by the lack of a gold standard method for classifying cases as clustered or not. Since many of these methods are relatively new, their properties have not been extensively investigated. Misclassification errors in turn lead to suboptimal estimation of the effects of risk factors for clustering. Here we show how Bayesian latent class models can be used in such situations, for example to simultaneously analyse Mycobacterium tuberculosis DNA data from all three of the above methods. Using the data collected at the Public Health Unit in Montreal, we estimate the proportion of clustered cases and the operating characteristics of each method using information from all three methods combined, including both continuous and dichotomous measures from IS6110 RFLP.
A misclassification-adjusted regression model provides estimates of the effects of risk factors on the clustering probabilities. We also discuss how one must carefully interpret any inferences that arise from a combination of continuous and dichotomous tests.
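To make the latent class idea concrete, here is a minimal sketch, not the authors' model, of a data-augmentation Gibbs sampler for the simplest case: K dichotomous tests (for example RFLP, MIRU-VNTR and spoligotype calls of "clustered" or "not clustered") that are assumed conditionally independent given each case's unobserved true status. The function name `gibbs_latent_class`, the Beta priors, the starting values and the simulated data are all hypothetical choices for illustration. The sketch omits the continuous nearest genetic distance measure and the covariate regression described above. With three binary tests, the 2^3 = 8 possible result patterns give 7 degrees of freedom, matching the 7 parameters (prevalence plus three sensitivities and three specificities), so the priors help to stabilise estimation.

```python
import numpy as np

def gibbs_latent_class(tests, n_iter=3000, burn_in=1000, seed=0,
                       prior_prev=(1.0, 1.0), prior_se=(2.0, 1.0), prior_sp=(2.0, 1.0)):
    """Hypothetical Gibbs sampler for a latent class model with K binary tests.

    tests: (n, K) array of 0/1 results (1 = test classifies the case as clustered).
    Assumes conditional independence of tests given the latent true status and
    Beta priors on prevalence, sensitivities and specificities (illustrative choices).
    Returns posterior means of (prevalence, sensitivities, specificities).
    """
    rng = np.random.default_rng(seed)
    tests = np.asarray(tests, dtype=int)
    n, k = tests.shape
    # Starting values above 0.5 so the "clustered" class keeps its meaning.
    prev, se, sp = 0.5, np.full(k, 0.8), np.full(k, 0.8)
    draws = []
    for it in range(n_iter):
        # Likelihood of each case's test pattern under each latent class.
        lik1 = np.prod(np.where(tests == 1, se, 1 - se), axis=1)
        lik0 = np.prod(np.where(tests == 1, 1 - sp, sp), axis=1)
        # Posterior probability that each case is truly clustered; sample it.
        p1 = prev * lik1 / (prev * lik1 + (1 - prev) * lik0)
        d = rng.random(n) < p1
        n1 = d.sum()
        # Conjugate Beta updates given the imputed true statuses.
        prev = rng.beta(prior_prev[0] + n1, prior_prev[1] + n - n1)
        pos_in_1 = tests[d].sum(axis=0)          # true positives per test
        neg_in_0 = (1 - tests[~d]).sum(axis=0)   # true negatives per test
        se = rng.beta(prior_se[0] + pos_in_1, prior_se[1] + n1 - pos_in_1)
        sp = rng.beta(prior_sp[0] + neg_in_0, prior_sp[1] + (n - n1) - neg_in_0)
        if it >= burn_in:
            draws.append(np.concatenate(([prev], se, sp)))
    mean = np.mean(draws, axis=0)
    return mean[0], mean[1:1 + k], mean[1 + k:]

# Usage on simulated data (not the Montreal data): three imperfect tests.
rng = np.random.default_rng(1)
n = 2000
true_prev = 0.3
true_se = np.array([0.9, 0.85, 0.8])
true_sp = np.array([0.95, 0.9, 0.9])
d = rng.random(n) < true_prev
probs = np.where(d[:, None], true_se, 1 - true_sp)
tests = (rng.random((n, 3)) < probs).astype(int)

prev_hat, se_hat, sp_hat = gibbs_latent_class(tests)
print(f"prevalence ~ {prev_hat:.2f}")
print("sensitivities ~", se_hat.round(2))
print("specificities ~", sp_hat.round(2))
```

Imputing each case's latent status makes every update a conjugate Beta draw. A misclassification-adjusted regression of the kind the abstract mentions could, as one possible extension, replace the single prevalence with a case-specific probability linked to covariates, at the cost of a non-conjugate (for example Metropolis) step for the regression coefficients.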

MeSH terms

  • Analysis of Variance
  • Bayes Theorem*
  • Cluster Analysis*
  • DNA Fingerprinting*
  • Female
  • Humans
  • Male
  • Molecular Epidemiology / methods*
  • Mycobacterium tuberculosis / genetics*
  • Tuberculosis / epidemiology*
  • Tuberculosis / microbiology*