New Farnsworth-Munsell 100 hue test norms of normal observers for each year of age 5–22 and for age decades 30–70
- Correspondence to: Dr P R Kinnear, Vision Research Laboratories, Department of Psychology, King’s College, Aberdeen AB24 2UB, UK;
- Accepted 1 July 2002
Aims: To provide normative data for chromatic discrimination on the Farnsworth-Munsell 100 hue test particularly for observers under 23 years of age.
Methods: Normal observers were screened for congenital colour vision deficiencies using the Ishihara test, leaving 382 observers.
Results: New total error score (TES) norms (means and 95th percentiles) are presented for each year of age from 5–22 and for 10 year age groups from the 30s to the 70s. These norms are presented as actual values (TES) and also as square root values (√TES). Other data include partial error scores for red-green and blue-yellow axes discrimination.
Conclusion: This study provides the most detailed set of normative data to date. The data are also in agreement with other reports of chromatic discrimination, showing that performance in this task varies as a U-shaped function of age, the best being achieved at 19 years of age.
The Farnsworth-Munsell (F-M) 100 hue test1 is widely used for measuring chromatic discrimination by clinicians and vision scientists. As well as helping to identify various congenital colour vision deficiencies, it is a useful and sensitive test for measuring the changes due to neuronal disease2,3 or possible side effects in therapeutic management.4,5
The test was first devised by Farnsworth in 1943 and the present 85 coloured cap version dates from 1957. The caps are arranged in four boxes, each containing a fixed anchor cap at each end. There are 22 caps in box 1 and 21 caps in each of the remaining three boxes. The total error score (TES) is a measure of the accuracy of an observer in arranging the caps so as to form a gradual transition in hue between the two anchor caps; the higher the number of misplacements, the larger the TES. TES norms for the test have been published at intervals, including those by Verriest et al6–8 based on data for observers divided into various age cohorts (either 5 year or 10 year).
There is, however, ambiguity about scoring cap misplacements occurring near the ends of boxes in the published norms. The test instructions for scoring do not specify how to account for chromatic step changes across the boundaries between boxes. Before automatic scoring equipment was developed, most testers scored the test on the basis that the anchor caps at the ends of the boxes did not exist. Thus a sequence such as cap numbers 20, 19 in the first box followed by caps 26, 23 in the second box would have a partial error score for the last arranged cap in the first box (19) calculated by adding |19 − 20| to |19 − 26| and then subtracting 2 (that is, 1 + 7 − 2 = 6). Likewise the partial error score for the first arranged cap in the second box (26) would be |26 − 19| + |26 − 23| − 2 = 8. For this method, the total error score from summing all the partial scores is a multiple of four. Since the introduction of automatic scoring, testers have scored the caps in each box independently of the caps in the previous or following boxes by making use of the numbers of the anchor caps. With this method, the total error score is not necessarily a multiple of four and can be lower than the TES obtained using the former method. The scoring artefacts associated with the division of the test into boxes have been addressed by Victor9 and by Craven,10 the latter drawing attention to the point that box-end caps have lower scores than mid-box caps when the TES is low but higher scores when the TES is high.
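The arithmetic of the two scoring methods can be illustrated with a minimal Python sketch. The function name `partial_error` is introduced here for illustration only. The first two calls reproduce the worked example in the text. The anchor cap numbers used for the second method (22 at the end of the first box, 21 at the start of the second) are an assumption made for this sketch and are not stated in the text. The sketch uses plain absolute differences and does not handle the wrap-around between caps 85 and 1.

```python
def partial_error(prev_cap, cap, next_cap):
    """Partial error score for one cap, given the numbers of its two neighbours.

    A correctly placed cap differs by 1 from each neighbour, so its score is 0.
    """
    return abs(cap - prev_cap) + abs(cap - next_cap) - 2


# Arrangement from the text: ..., 20, 19 | 26, 23, ...
# (the bar marks the boundary between the first and second boxes)

# Method 1: anchors disregarded, so the neighbour across the
# boundary is the adjacent cap in the adjoining box.
across_19 = partial_error(20, 19, 26)   # |19-20| + |19-26| - 2
across_26 = partial_error(19, 26, 23)   # |26-19| + |26-23| - 2

# Method 2: each box scored independently using anchor numbers.
# ASSUMPTION (not given in the text): the anchors are numbered 22 and 21.
END_ANCHOR_BOX_1 = 22
START_ANCHOR_BOX_2 = 21
within_19 = partial_error(20, 19, END_ANCHOR_BOX_1)
within_26 = partial_error(START_ANCHOR_BOX_2, 26, 23)

print(across_19, across_26)  # 6 8
print(within_19, within_26)  # 2 6
```

Under these assumed anchor numbers, the within-box scores for the two boundary caps are lower than the across-box scores, which is consistent with the observation that the second method can yield a lower TES.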
A further scoring anomaly arose with earlier versions of the test when the boxes were shorter than the current versions. Some testers removed cap 85 from the first box and installed it as the anchor cap instead of 84, leaving just 21 caps for observers to arrange. For example, Verriest et al’s6 observers used binocular vision and the scoring disregarded the numbers of the anchor caps whereas his later observers8 were tested both binocularly and monocularly, the scoring excluded cap 85, and the numbers of the anchor caps were used for calculating misplacements at the ends of boxes.
Finally, the distribution of TES within a given age group is skewed; hence it has been recommended that TES be transformed to its square root (√TES), which provides a more nearly normal distribution for statistical analysis of the data.11 Although this recommendation has been questioned,12 more recent normative data have been published as means of √TES for 5 year and 10 year age groups.8,13 More statistically efficient methods based on maximum-likelihood procedures have been offered by Craven14,15 but they are not convenient to use in a clinical situation.
Here, we report normative data for observers in each year of age from 5 to 22 and for 10 year age groups from the 30s to the 70s in both TES and √TES form. We also report total partial error scores (TPES) for the red-green and blue-yellow axes of colour vision deficiencies because they represent a useful way of studying the relative contributions of ageing and retinal disease to colour discrimination. We have, therefore, outlined TPES values as well as a comparison of the means obtained using the two methods of scoring misplacements at the ends of boxes.
OBSERVERS AND METHODS
Equal numbers of male and female observers aged from 5–79 were given a preliminary test with Ishihara plates to screen out any red-green colour anomalous or defective observers. Larger sample sizes were selected for the ages between 12 and 15 where greater variation in scores generally occurs. The 100 hue test was then administered to the remaining observers by several testers. Tests were carried out under either natural or artificial daylight illumination in a number of schools and laboratories; care was taken to use the same instructions in all testing sessions. In order to make the procedure easier for children, the caps for rearranging were taken out of the box and placed on a non-reflecting white surface behind the box and their positions were randomised. Observers were instructed to select the coloured caps and place them in the box between the anchor colours “so that they form a regular colour series between the two end caps” (Farnsworth1). For consistency, the same procedure was used for adults. The observers were tested binocularly and were told that the test should take about 2 minutes per box but that accuracy was more important than speed. The younger observers (primary schoolchildren) took significantly longer to complete the test than the older observers. Parental consent was obtained for testing schoolchildren. All the observers were either emmetropes or wore their optical corrections. Overall ethical approval was obtained from the departmental ethics committee.
Cap 85 was retained in the first box. For observers under 30 years of age, the scoring was done by noting the partial error scores for pairs of caps in the two ways described above (that is, by disregarding the numbers of the anchor caps and by taking them into account). For observers over 30 years of age, the scoring was done in the traditional method by disregarding the numbers of the anchor caps. Total error scores (TES) were computed by summing the partial error scores for every cap. Total partial error scores (TPES) were also computed for the red-green (R-G) axis using caps 13–33 and 55–75, and for the blue-yellow (B-Y) axis using caps 1–12, 34–54, and 76–85 as recommended by Smith et al.16 The square roots of TES and TPES were also computed.
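The axis-specific sums described above can be sketched in Python as follows. The cap ranges are those given in the text; the function name `axis_scores` and the example partial error scores are hypothetical and serve only to illustrate the bookkeeping.

```python
import math

# Cap ranges for each axis, as given in the text.
RG_CAPS = set(range(13, 34)) | set(range(55, 76))                      # 13-33, 55-75
BY_CAPS = set(range(1, 13)) | set(range(34, 55)) | set(range(76, 86))  # 1-12, 34-54, 76-85


def axis_scores(partial_errors):
    """Return TES, the R-G and B-Y TPES, and their square roots.

    partial_errors maps each cap number (1-85) to its partial error score.
    """
    tes = sum(partial_errors.values())
    tpes_rg = sum(s for cap, s in partial_errors.items() if cap in RG_CAPS)
    tpes_by = sum(s for cap, s in partial_errors.items() if cap in BY_CAPS)
    return {
        "TES": tes,
        "TPES_RG": tpes_rg,
        "TPES_BY": tpes_by,
        "sqrt_TES": math.sqrt(tes),
        "sqrt_TPES_RG": math.sqrt(tpes_rg),
        "sqrt_TPES_BY": math.sqrt(tpes_by),
    }


# Hypothetical observer: errors on three caps only (illustrative values).
errors = {cap: 0 for cap in range(1, 86)}
errors[20] = 4   # cap 20 falls in the R-G set
errors[40] = 6   # cap 40 falls in the B-Y set
errors[80] = 2   # cap 80 falls in the B-Y set

scores = axis_scores(errors)
print(scores["TES"], scores["TPES_RG"], scores["TPES_BY"])  # 12 4 8
```

The two sets are disjoint and together cover all 85 caps (42 for R-G and 43 for B-Y), so the two TPES values always sum to the TES.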
A total of 395 people were tested. Thirteen children, mostly aged under 10, with a total error score greater than 500 were excluded from the computation of the norms on the basis that a score of that size indicates virtually no colour discrimination. Thus, 382 observers (189 males and 193 females) remained, of whom only nine (all under the age of 10) had a total error score of more than 400.
The means in Table 1 were obtained by comparing the best fitting curve for the scatter plot of all observers in conjunction with the means and medians for each year of age up to 22 and each 10 year age group from the 30s to the 70s. These TES means are plotted in Figure 1. Figure 2 shows the mean √TES values along with their respective 95th percentiles, computed by adding 1.65 times the standard deviation to the mean.
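The 95th percentile rule stated above (mean plus 1.65 standard deviations) can be written as a short Python sketch. The TES values below are invented for illustration and are not data from this study, and the use of the sample standard deviation (`statistics.stdev`) is an assumption, since the text does not say which estimator was used.

```python
import math
import statistics


def upper_limit_95(values):
    """Approximate 95th percentile as mean + 1.65 * standard deviation.

    ASSUMPTION: the sample standard deviation is used; the text does not
    specify sample versus population SD.
    """
    return statistics.mean(values) + 1.65 * statistics.stdev(values)


# Hypothetical TES values for one age group (illustrative only).
tes_values = [16, 36, 64]

# Transform to the square root scale before computing the limit.
sqrt_tes = [math.sqrt(t) for t in tes_values]   # [4.0, 6.0, 8.0]

limit = upper_limit_95(sqrt_tes)
print(round(limit, 2))  # 9.3
```

In practice the same function would be applied to the √TES values of all observers in an age group; the rule is only as good as the assumption that √TES is approximately normally distributed.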
The curves for the mean TPES for the R-G and B-Y axes are shown in Figure 3. The TPES values also show the expected greater deterioration in B-Y than in R-G discrimination beyond the 30s age group.
A comparison of our mean √TES data, aggregated into 5 year and 10 year cohorts to match those of Verriest et al8 and Roy et al,13 shows comparable mean values (Fig 4) for the aggregated age groups, except that those of Roy et al are rather more irregular.
Table 2 gives the mean differences between error scores calculated across and within boxes, for TES and √TES, for the younger observers. The values indicate that there is no appreciable difference between the scores obtained using the two scoring methods.
The present study, in agreement with previous reports,6–8,13 shows that performance on the F-M 100 hue test varies as a U-shaped function of age. Younger children make significantly more misplacement errors, leading to higher TES, than observers in their 20s. The performance of older adults also deteriorates with age. This trend is repeated for total partial error scores (TPES), indicating that the development and subsequent deterioration of performance with age also follow a U-shaped function for the red-green and blue-yellow opponent systems. As expected, blue-yellow sensitivity deteriorates more than red-green sensitivity for observers over 40 (Fig 3). For both TES and TPES measures, the performance of male and female observers was not significantly different. The best performance on this test is achieved by those in their late teens and early 20s. This finding is also in line with recently reported discrimination performance using preferential looking in infants and forced choice chromatic discrimination in younger and older adults.17 Hence, on the basis of these similar age trends, we propose that in the case of younger children, the changes in 100 hue performance are a measure of their chromatic discrimination ability rather than of poorer comprehension of the task or a shorter attention span. In our experience, most schoolchildren were attentive and tried to do their best since they considered the test a challenge and a variation in the everyday teaching routine at school.
We have also addressed the effect of the different methods used for scoring the test. The results outlined in Table 2 show the effect of taking into account the transition between the boxes or treating each box independently. The corrections that should be applied in order to transform the results obtained using one method to the other are small for the various age groups. In light of these findings, we propose that, for simplicity, scoring should be done for each box independently.
The present study provides detailed normative data at the lower age range not previously published and it addresses the sensitivity of the F-M 100 hue test for measuring changes in chromatic sensitivity in younger children caused by pathological damage or therapeutic intervention. The comparison of performance on the F-M 100 hue test and on computerised measures of chromatic sensitivity is the subject of our current programme of research.