Biobanks and the importance of detailed phenotyping: a case study—the European Glaucoma Society GlaucoGENE project
P Founti1, F Topouzis1, L van Koolwijk2,3, C E Traverso4, N Pfeiffer5, A C Viswanathan2,6

  1. A’ Department of Ophthalmology, School of Medicine, Aristotle University of Thessaloniki, AHEPA Hospital, Thessaloniki, Greece
  2. Glaucoma Research Unit, Moorfields Eye Hospital, London, UK
  3. Glaucoma Service, The Rotterdam Eye Hospital, Rotterdam, The Netherlands
  4. Centro di Ricerca Clinica e Laboratorio per il Glaucoma e la Cornea, Clinica Oculistica, DiNOG, University of Genoa, Genoa, Italy
  5. Department of Ophthalmology, University Eye Hospital, Mainz, Germany
  6. Department of Epidemiology, Institute of Ophthalmology, London, UK

Correspondence: Dr A C Viswanathan, Glaucoma Research Unit, Moorfields Eye Hospital, City Road, London EC1V 2PD, UK; vis{at}


Background: Dissecting complex diseases has become an attainable goal through large-scale collaborative projects under the term “biobanks.” However, large sample size alone is no guarantee of a reliable genetic association study, and the genetic epidemiology of complex diseases still has many challenges to face. Among these, issues such as genotyping errors and population stratification have been previously highlighted. However, comparatively little attention has been given to accurate phenotyping. Study procedures of existing large-scale biobanks are usually restricted to very basic physical measurements and non-standardised phenotyping, based on routine medical records and health registry systems.

Discussion: Considering that the objective of an association study is to establish genotype–phenotype correlations, it is doubtful how easily this could be achieved in the absence of accurate and reliable phenotype description. The use of non-specific or poorly defined phenotypes may partly explain the limited progress so far in glaucoma complex genetics. This report examines the European Glaucoma Society GlaucoGENE project, which is the only large multicentre glaucoma-specific biobank. Unlike previous biorepositories, this initiative focuses on detailed and standardised phenotyping and is expected to become a major resource for future studies on glaucoma.

The major progress in identifying the genetic basis of Mendelian disorders has not been followed by similar achievements in mapping complex diseases, defined as diseases that do not exhibit classic Mendelian inheritance attributable to a single gene but are determined by a number of genetic and environmental factors.1 Specifically, there has been a failure of genetic association studies to discover susceptibility loci or replicate initial positive genotype–phenotype correlations in complex diseases.2–12 Inadequate statistical power to detect small and moderate effects was recognised as one of the major limitations.2 3 13–15 The need for large sample sizes has led to numerous large-scale collaborative projects that systematically store biological material linked to clinical and other information. These so-called “biobanks” are designed to create unprecedented opportunities for understanding the pathogenic basis of common diseases and ultimately for implementing genetic findings in clinical practice and public health.9 16 17 On the other hand, they have raised profound ethical issues18–21 and scepticism on whether benefits will outweigh costs.22–24 What remains unquestionable is that the genetic epidemiology of complex diseases still has many challenges to face, mainly in terms of study design and methodology.7–11 22 23 25–30 Among these, we emphasise the importance of detailed and standardised phenotyping, which has not been given the attention it deserves12 and does not seem to have been employed in some large biobanks. Since complex diseases are characterised by large phenotypic variability,1 this raises concerns such as how genetic findings derived from such initiatives could be correctly related to the different clinical aspects of a complex disease.

With regard to ophthalmic complex diseases, breakthroughs have already been made in mapping age-related macular degeneration (AMD).31–34 The strong effect of a complement factor H variant in AMD (odds ratio >2.45 and population attributable risk up to 50%) was possibly behind the success of these studies, where well-defined criteria for diagnosis were used, although no detailed phenotyping was considered. However, the identification of genetic variants with smaller effects and associations with specific aspects of the phenotype would possibly have required a more detailed phenotypic assessment. This seems to be the case in glaucoma, where genetic findings mostly refer to the minor fraction of cases that follow Mendelian rules of inheritance, meaning that the genetic background of the common, non-Mendelian forms of glaucoma remains largely unknown.35–37

In 2010 there will be 60.5 million people with glaucoma worldwide, among which 8.4 million will be bilaterally blind; by 2020, these numbers will increase to 79.6 million and 11.2 million respectively.38 On the basis of new opportunities presented in the postgenomic era, the European Glaucoma Society (EGS) GlaucoGENE project has been designed to provide a reliable, extensive and stable resource to enhance research studies in glaucoma genetics. With detailed and standardised phenotyping as one of its basic principles, not only is this project one of the very few phenotype–genotype databases in the field of ophthalmology, but also it may be regarded as a pioneer biobank.


A “biobank” generally refers to a repository of biological material. In genetic research the term is typically used to describe a biological sample collection from which genetic information can be extracted and matched with clinical and other information. However, several definitions can be found, and no international consensus has been reached. “DNA data bank,” “DNA bank” and “genetic dataset” are commonly used synonyms of “biobank.”

The American National Bioethics Advisory Commission uses the term “DNA bank” to describe a facility that stores extracted DNA or other biological materials for future DNA analysis, which are usually stored with some form of individual identification for later retrieval.39 The Public Population Project in Genomics (P3G), which is a principal international body for the harmonisation of biobanks, sets the additional criterion of a large number of collected samples.40 On the other hand, the Swedish Act on Biobanks (2002:297) focuses on the potential of data reidentification rather than the number of samples.41 It has been suggested that what differentiates a genetic study from a biobank is that the former focuses on specific genetic hypotheses, while the latter is oriented toward future hypotheses that may not be framed at the outset.42

Based on overall methodology, biobanks may be either disease-specific or non-disease-specific.43 The term “population-based biobanks” is commonly used to describe the latter category, although subjects are not always randomly selected from the population of reference. Disease-specific biobanks are usually case-control studies recruiting subjects who have developed the disease of interest, as well as healthy individuals. Non-disease-specific biobanks are typically cohort studies, where subjects are recruited from the general population to be followed up over time; depending on study methodology, the recruitment process may involve only healthy individuals or not. However, this is a very crude classification, and several study designs have been employed in biobanks so far.


Based on the catalogue of the P3G observatory, over 100 biobanks with a sample size of more than 10 000 subjects have been completed or are currently being conducted.44 In addition, there are several collaborative projects or networks, each involving a number of biobanks, such as the GenomEUtwin45 and the European Prospective Investigation into Cancer and Nutrition (EPIC).46 Previous articles have extensively discussed study design with regard to cohort versus case-control studies as the optimum approach for studying complex diseases.9 10 22 23 26 In this report we focus on phenotyping, which we describe as the methodology used to collect information on phenotypes. Following the well-known example of deCODE genetics,47 several national cohort studies have been designed to provide DNA databases. The UK Biobank,48 the CARTaGENE in Canada,49 the Estonian Genome Project (EGP)50 and the Kadoorie Study of Chronic Disease in China51 are only some examples of large-scale cohort biobanks aiming to investigate the genetic basis of multiple important chronic diseases. In all of them, baseline assessment is limited to questionnaires and very basic anthropometric or physical measures. With regard to follow-up, no information is available for the EGP, while in all other projects, outcomes will be assessed by focusing on end-points, that is whether a disease is present or not, through routine medical or other health-related records and national registry systems. Therefore, no standardised phenotyping is to be performed. Considering that case-control48 51 and case-cohort48 studies will be nested within these projects, it is very unclear how cases and controls may be reliably selected and, moreover, how the variety of each disease phenotype will be ascertained. The UK Biobank investigators acknowledge the need for intensive phenotyping in the future, but they recognise that this would not be feasible for the whole cohort; nor has there been a detailed discussion on what these measures should be.48

In disease-specific biobanks, although a wide variety of clinical information is usually available, standardisation may not be included. For example, in the Inflammatory Breast Cancer Research Foundation Biobank52 and the National Psoriasis Victor Henschel BioBank,53 clinical information is provided through medical records. Standardisation of phenotypes has been a concern, even in projects involving standardised baseline assessment, such as the MORGAM, which is a multinational collaboration of cardiovascular cohorts and a component of the GenomEUtwin.54 Due to the nature of the study, only a limited number of phenotypes could be standardised with precision.54 Moreover, harmonisation in data management, including quality, completeness and consistency, is of particular importance in projects involving a large number of biobanks, and such efforts have already been undertaken by the GenomEUtwin and the EPIC investigators.55 56


As opposed to Mendelian disorders, causal variants in complex diseases are expected to have rather small effects,2 15 explaining why sample size is a key determinant in association studies.2 3 13–15 However, due to the small effect size, the credibility of an association, meaning the likelihood that an association exists after some evidence has been accumulated, may largely depend on the ability to control for errors and bias.25 This is a serious consideration for studies nested within biobanks where potential sources of errors and bias have not been properly addressed. To date, most reports on potential confounding focus on genotyping errors and population stratification,10 25–27 57–59 while little attention has been given to phenotyping.12 However, even modest levels of error in either the genotyping or the phenotyping may result in significantly diminished power of a study.11
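The effect of phenotyping error on a small association can be illustrated with simple arithmetic. The sketch below is not taken from any of the cited studies: it assumes a case-control comparison of carrier frequencies and non-differential misclassification (errors in case or control status that are unrelated to genotype). The function names and the example frequencies are hypothetical and chosen only for illustration.

```python
def odds_ratio(p_cases: float, p_controls: float) -> float:
    """Odds ratio for carrying a variant, from carrier frequencies in cases and controls."""
    return (p_cases / (1 - p_cases)) / (p_controls / (1 - p_controls))


def observed_odds_ratio(
    p_cases: float,
    p_controls: float,
    case_purity: float,
    control_purity: float,
) -> float:
    """Odds ratio seen when some labelled cases and controls are misclassified.

    case_purity: fraction of labelled cases who are truly affected.
    control_purity: fraction of labelled controls who are truly unaffected.
    Assumption: misclassification does not depend on genotype.
    """
    # Each labelled group is a mixture of truly affected and truly unaffected subjects.
    obs_cases = case_purity * p_cases + (1 - case_purity) * p_controls
    obs_controls = control_purity * p_controls + (1 - control_purity) * p_cases
    return odds_ratio(obs_cases, obs_controls)


if __name__ == "__main__":
    # Hypothetical carrier frequencies: 30% in true cases, 25% in true controls.
    print("true OR:", round(odds_ratio(0.30, 0.25), 3))
    for purity in (1.0, 0.9, 0.8):
        print(purity, round(observed_odds_ratio(0.30, 0.25, purity, purity), 3))
```

Under these assumptions the observed odds ratio moves towards 1 as classification purity falls, so an effect that is already small becomes harder to distinguish from no effect.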

Issues related to phenotypic assessment, such as establishing diagnostic criteria for a disease, determining what measurements to perform, using validated techniques for data collection and distinguishing cases from healthy individuals are not new to medical research. However, when investigating the genetic component of a complex disease, there are additional reasons why they become so important. The essence of an association in genetic epidemiology is to investigate how a genotype is correlated to a phenotype.60 Complex diseases are typically characterised by phenotypic heterogeneity, which refers to the large variability of clinical manifestations within the same disease.1 Phenotypes belonging to a complex disease are composed of a constellation of clinical signs, only some of which may be present in an individual. For example, elevated intraocular pressure may or may not be present in a patient with glaucoma. Alternatively, clinical signs belonging to several “pure” phenotypes may be present in the same individual. For example, when examining an individual with pseudoexfoliative glaucoma, clinical signs of the optic disc do not exclusively belong to the pseudoexfoliative glaucoma phenotype. Both these situations preclude meaningful phenotypic classification into discrete disease states. Moreover, phenotypes may vary with respect to age of onset of clinical symptoms. Chronic late-onset disorders are typically the result of decades-long processes, developing slowly along a continuum from health to pathology. Therefore, clinical signs may be present below the threshold for definite classification, and early cases may be misclassified. For the same reason, it is often difficult to characterise an individual as unaffected. Accordingly, for gene polymorphisms and mutations to be correctly related to the variable aspects of a complex disease, it is important to ensure that phenotypic variation is captured with the same precision as genetic variation.12 61

Balancing measurement precision and feasibility is a difficult task in research, especially when aiming to recruit thousands of participants. Wong et al presented the formula for calculating the sample size required to study the interaction between a continuous exposure and a genetic factor. According to their calculations, smaller studies with better measurements would be as powerful as studies even 20 times bigger, which employ fewer and less accurate measurements.23
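The trade-off between measurement precision and sample size can be sketched numerically. The example below is not the formula of Wong et al, which concerns gene–exposure interactions. It uses a simpler, classical result as a stand-in: when a continuous trait is measured with reliability below 1, the observed correlation with a perfectly measured factor is attenuated by the square root of the reliability, and the sample size needed to detect it (Fisher's z approximation) rises accordingly. The function name, the default significance level and power, and the example values are assumptions made for illustration.

```python
from math import atanh, ceil, sqrt
from statistics import NormalDist


def required_n(true_r: float, reliability: float,
               alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size needed to detect a correlation.

    true_r: correlation that would be seen with an error-free measurement.
    reliability: share of the measured trait's variance that is true signal (0 < value <= 1).
    Uses Fisher's z approximation for a two-sided test.
    """
    if not 0 < reliability <= 1:
        raise ValueError("reliability must be in (0, 1]")
    if not 0 < abs(true_r) < 1:
        raise ValueError("true_r must be non-zero and between -1 and 1")
    # Classical attenuation: error in the trait shrinks the observed correlation.
    observed_r = abs(true_r) * sqrt(reliability)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(((z_alpha + z_beta) / atanh(observed_r)) ** 2 + 3)


if __name__ == "__main__":
    # Hypothetical small effect (r = 0.1) measured at different reliabilities.
    for reliability in (1.0, 0.5, 0.25, 0.05):
        print(reliability, required_n(0.1, reliability))
```

For small effects the required sample size in this sketch scales roughly with the inverse of the reliability, which is the same direction of effect as the comparison described above.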

However, a large number of measurements alone cannot guarantee a high-quality phenotypic dataset. It is imperative that the fundamental principles of “traditional” epidemiology, including use of standardised and reproducible measurements and strict criteria for training, certification and quality control, be adopted in genetic association studies.28 Standardisation is of particular importance to ensure that a uniform set of data is collected across the study and to avoid data unreliability and inconsistency. Also, because biobanks usually rely on multicentre collaborations, standardisation should be the goal both within and between centres. Based on the consensus meeting of the Human Genome Epidemiology Network (HuGENet) Working Group on the Assessment of Cumulative Evidence, any bias due to phenotypic measurements could affect not only the magnitude, but even the presence or absence of an association.25 Also, prospective standardisation of phenotypes is the only way to ensure that there is little or no likelihood that bias will invalidate an observed association, even for small effect sizes (odds ratio <1.15).25


In the context of detailed and standardised phenotyping, we present the basic principles and overall methodology of the EGS GlaucoGENE project, which is a large-scale pan-European genetic epidemiology research network. This initiative has been developed by GlaucoGENE, a Special Interest Group of the EGS. Its objective is to create a central database consisting of genetic and standardised phenotypic information from people throughout Europe. With the additional component of proteomics, the database is expected to become a major resource for future studies on glaucoma genetics.

The combined genotype–phenotype approach to glaucoma should inform the strategy for future advances in glaucoma risk stratification and therapy. In addition, because recent studies suggest that glaucoma patients show specific patterns of proteins and peptides,62–66 the identification of potential protein biomarkers and, furthermore, the correlation between protein expression and genotype is likely to lead to a better understanding of disease mechanisms.

The EGS GlaucoGENE project focuses on several subtypes of open-angle glaucoma and angle-closure glaucoma. With systematic phenotyping and ascertainment of probands, family members and controls, genetic analysis will be possible at a number of different levels. These range from the estimation of heritability to quantify the relative importance of genetic and environmental factors, commingling and segregation analyses to identify genes of major effect, to genetic mapping by linkage and association. In addition, the relationships between various glaucoma-related phenotypes and possible gene–environmental interactions may be examined at all these levels.

Standard operating procedures for a highly detailed clinical examination, special training and certification have been incorporated to ensure standardisation within and between centres. Also, discrete levels of the phenotypic dataset have been defined to surmount anticipated differences in equipment and infrastructure among centres. The complete dataset involves, among others, imaging of the optic nerve structure with laser imaging technologies, diurnal intraocular pressure curves and laboratory diagnostics. Similarly, standard operating procedures have been employed for the handling of biological samples to minimise genotyping errors and to ensure high quality of serum samples. A web-based system, with access limited to authorised personnel, allows reliable, high-quality and accessible data management, and ensures data integrity and safety. Applications have also been developed to facilitate data completeness and data flow control, as well as automated perimetry and imaging quality control. All data are anonymised and held in a central database. Guidelines address the circumstances in which data and samples will be reidentified, the designation of the personnel who will approve and perform the reidentification and the procedures to be used for this purpose.

A feasibility study for the EGS GlaucoGENE project began in May 2007 and recruitment was completed at the end of 2008, with the participation of four centres: Moorfields Eye Hospital, London, UK; Aristotle University of Thessaloniki, Greece; University of Genoa, Italy; and University of Mainz, Germany. The Institute of Ophthalmology, University College London (UCL), UK, and the University of Mainz are responsible for the handling and storage of biological samples. Prospective standardisation of phenotypes, validation of the information system for digital data entry and storage, and evaluation of the web service supporting digital data transfer are among the goals of the feasibility study. During the main study, a large number of European centres are expected to participate, since many of them have already expressed interest in joining the project. A website will soon be available to take expressions of interest; in the meantime, interested parties may contact the Chair of GlaucoGENE, Dr AC Viswanathan.


Major advances in genetics,67–69 coupled with progress in bioinformatics and statistics, have revolutionised genetic studies of complex diseases. In addition, large sample sizes have become feasible through national and international collaborative initiatives, such as biobanks and consortia. The recent findings of the Wellcome Trust Case Control Consortium (WTCCC) in seven complex diseases denote the effectiveness of the genomewide association approach.70 On the other hand, since there are major problems in dissecting the molecular basis of even simple monogenic diseases, this challenge is far greater in complex diseases.29 Considering the amount of human and financial resources invested in biobanking, issues related to study design become of critical importance. Among these, phenotyping requires special attention in terms of both adequacy and standardisation, but has not been properly addressed in several large-scale biobanks. This is partly due to the trade-off between sample size and measurement precision. However, employing better measurements may be a more appropriate strategy than attempting to deal with error by increasing sample size.23 In order ultimately to implement genetic findings in clinical practice, more refined questions should be addressed, and this may not be possible through broad phenotypes.60

The concept of multifactorial genetics holds promise for future advances in glaucoma management. A personalised approach involving effective screening to identify individuals at risk, establishing a precise diagnosis and predicting rate of progression and response to treatment is clearly a long-term but not an unrealistic goal.71 The progress achieved so far in glaucoma complex genetics involves only the reported association of the LOXL1 gene with exfoliation glaucoma,72 which has already been replicated in independent studies.73–76 However, genetic association studies on primary open-angle glaucoma have had conflicting results or have not been replicated.37 Poor specificity in the currently used phenotype parameters is a possible explanation,37 indicating that glaucoma genetics should focus on quantitative trait locus (QTL) studies, using variables such as intraocular pressure and cup-to-disc ratio.77 78

To this end, a glaucoma-specific biobank would be of great scientific value. Although we agree that cohort studies, case-control studies and family studies will all be needed for optimum progress,9 the case-control design holds two major advantages: far greater statistical power to detect associations can be achieved, because a larger number of cases can be studied, and a more detailed and disease-specific ascertainment of the phenotype is feasible than in a cohort design.22 To date, there are very few glaucoma-specific biobanks,79 80 while the EGS GlaucoGENE project is the only large multicentre biobank covering this need. Under the umbrella of the WTCCC2, another promising initiative is currently under construction, where data are available from three well-designed population-based studies in glaucoma.81 On the other hand, the eyeGENE, which is a biobank involving specifically ophthalmic diseases, focuses on Mendelian disorders82 and therefore may not be of great value for glaucoma complex genetics.

Unlike previous biobanks, the EGS GlaucoGENE project focuses on both detailed and standardised phenotyping, and therefore may be regarded as an innovative effort in genetic epidemiology overall. Owing to its multicentre structure, a large number of well-characterised glaucoma cases and controls will be recruited, who are also expected to be representative of the European population. Special training, periodic control of data completeness and data quality control by certified centres are also among the strengths of the study. In addition, the feasibility study is almost completed, providing prospective standardisation of procedures, which will increase the likelihood of identifying associations even with small effect sizes. For all these reasons, the EGS GlaucoGENE project should provide a broad and comprehensive framework for future studies in glaucoma genetics.



  • Competing interests: None.

  • Funding: The EGS GlaucoGENE Project is supported by a research grant from the European Glaucoma Society Foundation
