The biometrical comparison of cardiac imaging methods

https://doi.org/10.1016/S0169-2607(99)00048-6

Abstract

Objectives: Biometrical comparison procedures for cardiac imaging methods with continuous outcome are reviewed, concentrating mainly on the assessment and design-adequate comparison of accuracy and precision. Univariate graphical and numerical representation of the corresponding deviations is outlined to derive a ‘check list’ of the minimum information necessary to compare the measurement methods. Data: The methods reviewed here are illustrated by the comparison of standard 2DE bidimensional cardiac volumetry versus assessment using TDE colour imaging in 28 normal probands. Sources: The paired t-test and the corresponding confidence interval approach are used to assess deviations in location between two imaging methods; the test procedures of Maloney and Rastogi, Hahn and Nelson, and Grubbs are surveyed as proposals for the comparison of precisions in paired data. The Krippendorff coefficient and the Bradley/Blackwood test are illustrated as surrogate measures for method concordance. Conclusions: Since these methods can be performed by simple modification of standard options available in most statistics software packages, this review intends to enable cardiologists to choose appropriate methods for statistical data analysis and representation on their own.

Introduction

The comparison of clinical measurement methods is of broad interest for any medical discipline based on continuous diagnostic measurement, for example orthodontics (cranial and in particular mandibular and maxillary imaging), cardiology (cardiac imaging and bidimensional cardiac volumetry), radiology (signal intensity assessment in tissue subcuts) and pathology (antibody concentration in prepared tissue). Comparison problems arise, for example, when an established but expensive or invasive reference method is to be compared with a new and more attractive substitute, which must not yield less accurate or less precise measurements than the standard. Statistical comparison procedures are therefore invoked to provide information on possibly significant deviations in accuracy and/or precision, as well as on the degree of concordance between the methods under consideration.

Despite the necessity of using adequate statistical procedures for data aggregation and representation, there is a tendency to oversimplify by providing only mean values and standard deviations for each imaging method separately, or by providing only a correlation coefficient as a surrogate measure of agreement between the two measurement series. In the sense of a ‘good biometrical practice’, however, it should be noted that measurement comparison is usually based on paired data: the intraindividual information contained in the pairs is ignored when only separate means and variances are presented for each imaging method. This can result in crucial errors due to liberal decisions based on the suboptimal representation; that is, results may be significant only because of the kind of data analysis chosen, not because of clinical differences that really exist between the imaging methods under consideration. A corresponding effect will be illustrated in Section 2.2 in the setting of the comparison of (paired) measurement precisions.

Therefore this review summarizes procedures for the biometrical evaluation of paired data. These methods, however, are rarely available as ready-made options in standard software packages. It will therefore be emphasized throughout how available options in standard programmes such as Excel, SAS, SPSS or WinSTAT can be modified to implement the methods described. The corresponding formulae will sometimes appear rather tedious, but readers should concentrate on the interpretation of the methods’ results and on pitfalls in their application rather than on their derivation and mathematical representation; the latter is intended as a basis for self-implementation rather than as a short-cut tutorial on mathematical statistics.

Methods for the design-adequate comparison of location and scale will be summarized, which are easy to perform and to interpret. Many applications of these statistical procedures will be devoted to the comparison of diagnostic methods in cardiology. Whereas the admission and evaluation of new drugs or therapies often concentrates on mean or median effect estimates, the comparison of diagnostic procedures calls for the additional comparison of scale: if a new measurement method has to be compared with a gold standard, there are two questions of comparable interest. However, only the first one is frequently asked when designing method comparison studies: “Is the new method valid?”, i.e. are there significant mean or median deviations? Nevertheless, there is a second indication for substituting an established method by a new one. For example, cardiac imaging methods are frequently used to determine cardiac functional parameters, which are clinically relevant indicators for decisions on further invasive diagnostics or immediate interventional therapy. Assessment of such endpoints, however, may become remarkably biased when performed by, e.g., medical students or physicians with little prior experience in cardiac imaging. If the new method promises easier application, perhaps because of additional features that support the assessment of the clinical endpoints of interest, there will also be a focus on the measurements’ variation: if the new method turns out to be valid and in fact easier to apply, one will expect a significant decrease in variability and thus an additional gain in diagnostic quality for decisions based on the clinical endpoints under consideration. Simultaneous application of both measurement instruments (the new and the established one) allows for the simultaneous intraindividual comparison of location and scale.
With the above cardiac imaging setting in mind, this paper mainly concentrates on tests for the detection of differences in paired variances; accordingly, methods to establish ‘diagnostic equivalence’ of measurement methods will not be considered in the following.
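As a rough illustration of a simultaneous intraindividual comparison of location and scale, the following sketch computes the statistic of the Bradley/Blackwood test named in the abstract, in its commonly cited form: the paired differences D are regressed on the paired sums S, and the joint hypothesis "intercept = 0 and slope = 0" is assessed with an F statistic on (2, n − 2) degrees of freedom. The function name and the toy data are hypothetical, and the formulation used here is an assumption that may differ in detail from the one in the full text.

```python
from statistics import mean

def bradley_blackwood_f(x1, x2):
    """F statistic for the joint hypothesis of equal means and equal variances.

    Regresses D = X1 - X2 on S = X1 + X2 by ordinary least squares and
    compares the residual sum of squares with the raw sum of squared
    differences. Under the null hypothesis and bivariate normality the
    statistic follows an F distribution with (2, n - 2) degrees of freedom.
    """
    d = [a - b for a, b in zip(x1, x2)]
    s = [a + b for a, b in zip(x1, x2)]
    n = len(d)
    s_bar, d_bar = mean(s), mean(d)
    sxx = sum((si - s_bar) ** 2 for si in s)
    sxy = sum((si - s_bar) * (di - d_bar) for si, di in zip(s, d))
    slope = sxy / sxx
    intercept = d_bar - slope * s_bar
    # Residual sum of squares of the fitted regression line.
    sse = sum((di - intercept - slope * si) ** 2 for si, di in zip(s, d))
    sum_d2 = sum(di ** 2 for di in d)
    f_stat = ((sum_d2 - sse) / 2) / (sse / (n - 2))
    return f_stat, (2, n - 2)

# Hypothetical toy measurements, NOT data from the study.
method_1 = [1.0, 2.0, 3.0, 4.0]
method_2 = [3.0, 4.0, 6.0, 5.0]
f_stat, dfs = bradley_blackwood_f(method_1, method_2)
print(f_stat, dfs)
```

The resulting statistic would be compared with the upper quantile of the F distribution with the returned degrees of freedom, taken from a table or from a statistics package.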

Furthermore, the following will only consider univariate measurements; extensions to multiple endpoints are straightforward with the methods described below by applying multiple test procedures to the several clinical parameters of interest (see e.g. [1]). A second problem related to multiplicity will hardly be dealt with in this survey either, namely the topic of reliability studies: if a single assessment of the parameter of interest is not sufficiently reliable, so that an experimenter has to perform replicate measurements with each method, one further has to compare the reliabilities of the competing procedures. This review, however, will concentrate only on single assessments (e.g. resulting from the observation of unique events due to ethical restrictions, or from the aggregation of replicate measurements).

Next we will assume normality of the data under consideration: if $X_1$ and $X_2$ denote the measurement results based on methods 1 and 2, respectively, both will be assumed to follow a normal (Gaussian) distribution with respective population means $\mu_1$, $\mu_2$ and standard deviations $\sigma_1$, $\sigma_2$. Approximate normality can, for example, be promoted by taking replicate measurements on each patient with each measurement method of interest and using the mean of the respective replications as the basis for analysis. Another attractive way to approach normality is the application of appropriate transformations to the original data. For example, the log transform $Y_i = \log(X_i)$ should always be taken into consideration to improve the fit of the data to a normal distribution.
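A minimal sketch of how the effect of the log transform might be inspected is given below. It compares the sample skewness before and after transformation; skewness is used here only as a simple, hand-computable indicator and is an assumption of this example, not a procedure taken from the article. The volumes are hypothetical.

```python
import math
from statistics import mean

def skewness(values):
    """Moment-based sample skewness m3 / m2**1.5, using 1/n moments."""
    n = len(values)
    m = mean(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed volumes (ml), NOT data from the study.
volumes = [50, 55, 60, 65, 70, 80, 95, 120, 160, 240]
log_volumes = [math.log(v) for v in volumes]

# A skewness closer to zero after transformation suggests a better
# (though not necessarily sufficient) fit to a symmetric distribution.
print(skewness(volumes), skewness(log_volumes))
```

A formal normality test or a normal probability plot from a statistics package would usually complement such a check.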

Further let $\rho$ denote the Pearson correlation between $X_1$ and $X_2$. The normal assumption allows for a quite appealing (although very simplified) interpretation of accuracy and precision of the measurement methods represented by $X_1$ and $X_2$: a difference in location (accuracy) is described by the paired mean difference $\mu_1 - \mu_2 \neq 0$, and a deviation in scale (precision) is established for $\sigma_1^2 - \sigma_2^2 \neq 0$.
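For the location part, the paired t-test and confidence interval mentioned in the abstract can be sketched as follows. The function name, the toy data and the choice to pass the critical value `t_crit` as an argument (so that no statistics library is needed) are assumptions of this example; the critical value would be read from a t table for n − 1 degrees of freedom.

```python
import math
from statistics import mean, stdev

def paired_t(x1, x2, t_crit):
    """Paired t statistic and confidence interval for the mean difference.

    t_crit is the two-sided critical value of the t distribution with
    n - 1 degrees of freedom (e.g. its 0.975 quantile for a 95% interval).
    """
    d = [a - b for a, b in zip(x1, x2)]      # intraindividual differences
    n = len(d)
    d_bar = mean(d)
    se = stdev(d) / math.sqrt(n)             # standard error of the mean difference
    t_stat = d_bar / se
    ci = (d_bar - t_crit * se, d_bar + t_crit * se)
    return d_bar, t_stat, n - 1, ci

# Hypothetical toy measurements, NOT data from the study.
method_1 = [10.0, 12.0, 14.0, 16.0]
method_2 = [9.0, 12.0, 12.0, 15.0]
# 3.182 is the tabulated 0.975 quantile of the t distribution with 3 df.
d_bar, t_stat, df, ci = paired_t(method_1, method_2, t_crit=3.182)
print(d_bar, t_stat, df, ci)
```

A confidence interval that contains zero corresponds to a non-significant paired t-test at the matching level.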

Background and methods

The normal assumption provides simple significance tests for the null hypotheses $H_0: \mu_1 = \mu_2$ (Sections 2.1 and 2.4) and $K_0: \sigma_1^2 = \sigma_2^2$ (Sections 2.2 to 2.4: the Maloney/Rastogi test (1970), the Hahn/Nelson test (1970) and the Grubbs test (1973)) versus the corresponding alternatives $H_1: \mu_1 \neq \mu_2$ and $K_1: \sigma_1^2 \neq \sigma_2^2$.
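One widely used way to test $K_0$ in paired data rests on the identity $\mathrm{Cov}(X_1 + X_2, X_1 - X_2) = \sigma_1^2 - \sigma_2^2$: the variances are equal exactly when the sums and differences are uncorrelated. The sketch below implements this sum/difference correlation test (often called the Pitman/Morgan test). That the Maloney/Rastogi procedure surveyed in the article takes this form is an assumption of the example, and the function names and data are hypothetical.

```python
import math
from statistics import mean

def pearson_r(u, v):
    """Pearson correlation coefficient of two equally long series."""
    mu, mv = mean(u), mean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

def paired_variance_t(x1, x2):
    """t statistic for equal variances in paired data.

    Correlates the sums U = X1 + X2 with the differences V = X1 - X2 and
    converts the correlation into a t statistic with n - 2 degrees of
    freedom. A negative value indicates a larger variance for method 2.
    """
    u = [a + b for a, b in zip(x1, x2)]
    v = [a - b for a, b in zip(x1, x2)]
    n = len(u)
    r = pearson_r(u, v)
    t_stat = r * math.sqrt((n - 2) / (1 - r * r))
    return r, t_stat, n - 2

# Hypothetical toy measurements, NOT data from the study.
r_uv, t_stat, df = paired_variance_t([1, 2, 3, 4], [0, 2, 5, 9])
print(r_uv, t_stat, df)
```

Because the test only needs a Pearson correlation of two derived columns, it can be reproduced in most standard packages by computing the sums and differences first.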

The tissue Doppler echocardiography data

Endocardial border tracing in bidimensional cardiac imaging is necessary for determining the main parameters in left ventricular volumetry; it is supposed to be improved by tissue Doppler echocardiography (TDE), owing to coloured contour imaging, as compared with the previous imaging gold standard (2DE) in left ventricular volumetry [9]. Whereas TDE increased insight into the relationship between coronary perfusion and myocardial function in LDA patients, its value for cardiac volumetry in healthy

Discussion

This review has surveyed some elementary methods for the biometrical comparison of cardiac imaging procedures and has provided estimates and test procedures for assessing differences in accuracy (means) and precision (variances) of the methods under consideration, provided that the underlying data series can be assumed to be Gaussian.

However, such an overview cannot be complete, and in fact it was never intended to be. It should rather provide some simple dictionarial suggestions for approaches to

Summarizing tips for practice

In the author’s opinion the above items indicate the following recommendations to provide minimum information necessary for the statistical representation of method comparison trials:

  • Numerical representation may be structured as indicated in Table 1, where K and r mainly provide information on method agreement and reproducibility, whereas means, standard deviations and the corresponding paired significance tests rather refer to location and scale, i.e. to bias, precision and thus validity.
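The numerical part of such a summary might be assembled along the following lines. The sketch reports means, standard deviations, the Pearson correlation r and, as a concordance measure, Lin's concordance correlation coefficient from the reference list. Lin's coefficient is swapped in here as a stand-in for K; whether it coincides with the Krippendorff coefficient as defined in the full text is an assumption, as are the function name, the dictionary keys and the toy data. All moments use the divisor n.

```python
import math
from statistics import mean

def agreement_summary(x1, x2):
    """Location, scale, correlation and concordance for paired measurements."""
    n = len(x1)
    m1, m2 = mean(x1), mean(x2)
    v1 = sum((a - m1) ** 2 for a in x1) / n           # variance of method 1 (1/n)
    v2 = sum((b - m2) ** 2 for b in x2) / n           # variance of method 2 (1/n)
    c12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2)) / n
    r = c12 / math.sqrt(v1 * v2)                      # Pearson correlation
    # Lin's concordance correlation: penalises both a shift in location
    # and a difference in scale, unlike the plain correlation r.
    ccc = 2 * c12 / (v1 + v2 + (m1 - m2) ** 2)
    return {
        "mean1": m1, "mean2": m2,
        "sd1": math.sqrt(v1), "sd2": math.sqrt(v2),
        "r": r, "ccc": ccc,
    }

# Hypothetical toy measurements: method 2 reads exactly one unit higher.
summary = agreement_summary([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
print(summary)
```

In this toy example the correlation is perfect although the two methods never agree, which is why a correlation coefficient alone is a poor surrogate for agreement.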

References (14)

  • D.G. Altman, Practical Statistics for Medical Research (1991)
  • C.J. Maloney et al., Significance tests for Grubbs’ estimators, Biometrics (1970)
  • J.H. Hahn et al., A problem in the statistical comparison of measuring devices, Technometrics (1970)
  • F.E. Grubbs, Errors of measurement, precision, accuracy and the statistical comparison of measuring instruments, Technometrics (1973)
  • M. Bland et al., Statistical methods for assessing agreement between two methods of clinical measurement, Lancet (1986)
  • L.I.K. Lin, A concordance correlation coefficient to evaluate reproducibility, Biometrics (1989)
  • K. Krippendorff, Bivariate agreement coefficients for reliability of data

There are more references available in the full text version of this article.
