The biometrical comparison of cardiac imaging methods
Introduction
The comparison of clinical measurement methods is of broad interest for any medical discipline based on continuous diagnostic measurements, such as orthodontics (cranial and in particular mandibular and maxillary imaging), cardiology (cardiac imaging and bidimensional cardiac volumetry), radiology (signal intensity assessment in tissue sections) and pathology (antibody concentration in prepared tissue). Comparison problems arise, for example, when an established but expensive or invasive reference method is to be compared to a new and more attractive substitute, where the latter, however, must not yield less accurate or precise measurements than the standard. Statistical comparison procedures are therefore invoked to provide information on possibly significant deviations in accuracy and/or precision as well as on the degree of concordance between the methods under consideration.
Despite the necessity of adequate statistical procedures for data aggregation and presentation, there is a tendency to oversimplify the latter by providing only mean values and standard deviations for each imaging method separately, or by providing a correlation coefficient as a surrogate measure of agreement between the two measurement series. In the sense of 'good biometrical practice', however, it should be noted that measurement comparison is usually based on paired data; the intraindividual information contained in them is ignored when only separate means and variances are presented for each imaging method. This can result in crucial errors due to liberal decisions based on the suboptimal representation: results may appear significant only because of the kind of data analysis, and not because of really existing clinical differences between the imaging methods under consideration. A corresponding effect will be illustrated in Section 2.2 in the setting of the comparison of (paired) measurement precisions.
This review therefore summarizes procedures for the biometrical evaluation of paired data. These methods are hardly available in standard software packages, so it will always be indicated how standard programmes such as Excel, SAS, SPSS or WinSTAT can be modified to implement the subsequent methods. The corresponding formulae will sometimes appear rather tedious, but readers should concentrate on the interpretation of the methods' results and on pitfalls in their application rather than on their derivation and mathematical representation; the latter is merely intended as a basis for self-implementation rather than as a short tutorial on mathematical statistics.
Methods for the design-adequate comparison of location and scale will be summarized, which are easy to perform and to interpret; many of their applications will be devoted to the comparison of diagnostic methods in cardiology. Whereas the admission and evaluation of new drugs or therapies often concentrates on mean or median effect estimates, the comparison of diagnostic procedures calls for the additional comparison of scale: if a new measurement method is to be compared to a gold standard, two questions are of comparable interest, although only the first one is usually asked when designing method comparison studies: "Is the new method valid?", i.e. are there significant mean or median deviations? There is, however, a second indication for substituting an established method by a new one. Cardiac imaging methods, for example, are frequently used for the determination of cardiac functional parameters, which are clinically relevant indicators for decisions on further invasive diagnostics or immediate interventional therapy. Assessment of such endpoints may become remarkably biased when performed by, e.g., medical students or physicians with little prior experience in cardiac imaging. If the new method promises easier application, perhaps due to additional features aiding the assessment of the clinical endpoints of interest, there will also be a focus on the measurements' variation: if the new method turns out to be valid and indeed easier to apply, one will expect a significant decrease in variability and thus an additional gain in diagnostic quality for the clinical endpoints under consideration. Simultaneous application of both measurement instruments (the new and the established one) allows for the simultaneous intraindividual comparison of location and scale.
In view of the above cardiac imaging setting, this paper mainly concentrates on tests for the detection of differences in paired variances; corresponding methods to establish 'diagnostic equivalence' of measurement methods will accordingly not be considered in the following.
Furthermore, only univariate measurements will be considered in the following; extensions to multiple endpoints are straightforward with the methods described below by applying multiple test procedures to the several clinical parameters of interest (see e.g. [1]). A second problem related to multiplicity, the topic of reliability studies, will also hardly be dealt with in this survey: if a single assessment of the parameter of interest is not sufficiently reliable, that is, an experimenter has to perform replicate measurements with each method, one further has to compare the reliabilities of the competing procedures. This review, however, will concentrate on single assessments only (e.g. resulting from the observation of unique events due to ethical restrictions, or from the aggregation of replicate measurements).
Next we will assume normality of the data under consideration: if X1 and X2 denote the measurement results of methods 1 and 2, respectively, both will be assumed to follow a normal (Gaussian) distribution with population means μ1, μ2 and standard deviations σ1, σ2. This normal assumption can, for example, be enforced by taking replicate measurements on each patient with each measurement method of interest and using the replications' means as the basis for analysis. Another attractive way to enforce normality is the application of an appropriate transformation to the original data; for example, the log transform Yi = log(Xi) should always be taken into consideration to improve the fit of the data to a normal distribution.
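Whether such a transformation actually improves the fit can be checked with a standard normality test. The following sketch uses the Shapiro-Wilk test on simulated (lognormally distributed) example data, not on data from this paper:

```python
# Sketch: checking whether a log transform improves normality.
# x1 is a simulated, hypothetical measurement series (lognormal),
# not data from the study discussed in the text.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x1 = rng.lognormal(mean=4.0, sigma=0.3, size=30)  # e.g. 30 volume measurements

# Shapiro-Wilk test on the raw and the log-transformed series;
# a larger p-value indicates less evidence against normality
w_raw, p_raw = stats.shapiro(x1)
w_log, p_log = stats.shapiro(np.log(x1))

print(f"raw: W={w_raw:.3f}, p={p_raw:.3f}")
print(f"log: W={w_log:.3f}, p={p_log:.3f}")
```

In practice the decision for or against the transform should of course also rest on the substantive interpretability of the transformed scale, not on the p-values alone.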
Further, let ρ denote the Pearson correlation between X1 and X2. The normal assumption allows for a quite appealing (although very simplified) interpretation of the accuracy and precision of the measurement methods represented by X1 and X2: a difference in location (accuracy) is described by a paired mean difference μ1 − μ2 ≠ 0, a deviation in scale (precision) is established for σ1² − σ2² ≠ 0.
Background and methods
The normal assumption provides simple significance tests for the null hypotheses H0: μ1 = μ2 (Section 2.1, the paired t-test; Section 2.4, the Grubbs test (1973)) and K0: σ1² = σ2² (Section 2.2, the Maloney/Rastogi test (1970); Section 2.3, the Hahn/Nelson test (1970); Section 2.4, the Grubbs test (1973)) versus the corresponding alternatives H1: μ1 ≠ μ2 and K1: σ1² ≠ σ2².
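The two kinds of hypotheses can be illustrated with generic paired tests; the following is a minimal sketch with simulated example data, not the paper's exact implementations. For K0 it uses the classical Pitman/Morgan device, on which tests of paired variances such as Maloney/Rastogi's are based: under K0 the intraindividual sums and differences are uncorrelated, so the scale comparison reduces to a correlation test.

```python
# Sketch: paired t-test for H0: mu1 = mu2, and a Pitman/Morgan-type test
# for K0: sigma1^2 = sigma2^2. x1/x2 are simulated example measurements.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x1 = rng.normal(100.0, 10.0, size=40)      # method 1 (e.g. gold standard)
x2 = x1 + rng.normal(0.0, 5.0, size=40)    # method 2, correlated with x1

# H0: mu1 = mu2 -- paired t-test on the intraindividual differences
t_loc, p_loc = stats.ttest_rel(x1, x2)

# K0: sigma1^2 = sigma2^2 -- cov(x1 + x2, x1 - x2) = sigma1^2 - sigma2^2,
# so under K0 the sums s and differences d are uncorrelated
s, d = x1 + x2, x1 - x2
r_sd, p_scale = stats.pearsonr(s, d)

print(f"location: t = {t_loc:.2f}, p = {p_loc:.3f}")
print(f"scale:    r(s,d) = {r_sd:.2f}, p = {p_scale:.3f}")
```

Note how both tests exploit the pairing of the data, in contrast to the separate-means presentation criticized in the Introduction.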
The tissue Doppler echocardiography data
Endocardial border tracing in bidimensional cardiac imaging is necessary for determining the main parameters in left ventricular volumetry; it is supposed to be improved by tissue Doppler echocardiography (TDE) due to coloured contour imaging, as compared to the previous imaging gold standard (2DE) in left ventricular volumetry [9]. Whereas TDE increased insight into the relationship between coronary perfusion and myocardial function in LAD patients, its value for cardiac volumetry in healthy
Discussion
This review has surveyed some elementary methods for the biometrical comparison of cardiac imaging procedures and has provided estimates and test procedures for assessing differences in accuracy (means) and precision (variances) of the methods under consideration, provided the underlying data series can be assumed to be Gaussian.
However, such an overview cannot be complete, and in fact it was never intended to be. It should rather provide some simple, dictionary-like suggestions for approaches to
Summarizing tips for practice
In the author’s opinion the above items indicate the following recommendations to provide minimum information necessary for the statistical representation of method comparison trials:
- Numerical representation may be structured as indicated in Table 1, where K and r mainly provide information on method agreement and reproducibility, whereas means, standard deviations and the corresponding paired significance tests rather refer to location and scale, i.e. to bias, precision and thus validity.
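A natural candidate for the agreement measure K mentioned above is Lin's concordance correlation coefficient [cf. the Biometrics (1989) reference]. The following sketch computes its usual sample version on simulated example data; the function name and the data are illustrative assumptions:

```python
# Sketch: Lin's concordance correlation coefficient (CCC), a candidate for
# the agreement measure K in Table 1, computed on simulated example data.
import numpy as np

def concordance_ccc(x1, x2):
    """Sample CCC = 2*cov / (var1 + var2 + (mean1 - mean2)^2)."""
    m1, m2 = np.mean(x1), np.mean(x2)
    v1, v2 = np.var(x1), np.var(x2)            # 1/n variances, as in Lin (1989)
    cov = np.mean((x1 - m1) * (x2 - m2))
    return 2.0 * cov / (v1 + v2 + (m1 - m2) ** 2)

rng = np.random.default_rng(2)
x1 = rng.normal(100.0, 10.0, size=50)          # method 1
x2 = x1 + rng.normal(0.0, 3.0, size=50)        # method 2, close to method 1

ccc = concordance_ccc(x1, x2)
print(f"CCC = {ccc:.3f}")
```

Unlike the Pearson correlation r, the CCC penalizes both location and scale shifts between the two series, which is why K and r carry different information in Table 1.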
References (14)
- Practical Statistics for Medical Research (1991)
- Significance tests for Grubbs' estimators. Biometrics (1970)
- A problem in the statistical comparison of measuring devices. Technometrics (1970)
- Errors of measurement, precision, accuracy and the statistical comparison of measuring instruments. Technometrics (1973)
- Statistical methods for assessing agreement between two methods of clinical measurement. Lancet (1986)
- A concordance correlation coefficient to evaluate reproducibility. Biometrics (1989)
- Bivariate agreement coefficients for reliability of data