Celeste E Cohen
Since the official completion of the Human Genome Project in 2003, hundreds of thousands of human genomes have been sequenced and studied around the world. In genetics, however, “around the world” has often meant predominantly populations of European descent, with comparatively few samples from other ethnicities and geographical regions, and those few often excluded from analyses. As of 2021, 86% of participants in genomic studies were of European descent [1]. This is partly due to a lack of sequencing resources outside regions whose populations are mostly of European descent (Europe, North America, Oceania). Alongside the lack of data, another deterrent for researchers has been the additional statistical steps required when studying multiple populations, to avoid mistaking genetic differences between populations for disease-causing variation (a confounding effect known as population stratification).
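To see why ancestry has to be accounted for, consider a minimal worked example with invented numbers. Two populations differ both in how common a variant is and in how common a disease is, but the variant has no effect on the disease within either population. Pooling the populations makes the variant look disease-associated, while comparing cases and controls within each population removes the false signal. In practice, GWAS usually adjust for ancestry with covariates such as principal components of the genotype data; this sketch only illustrates the underlying confounding, and the `Population` class and `case_control_freqs` function are hypothetical names chosen for illustration.

```python
# Hypothetical illustration of population stratification (confounding by ancestry).
# All numbers are invented for illustration; they are not from any study.
from dataclasses import dataclass


@dataclass
class Population:
    size: int            # number of individuals sampled
    allele_freq: float   # frequency of the variant allele in this population
    prevalence: float    # fraction of individuals who have the disease


def case_control_freqs(pops):
    """Return (allele frequency in pooled cases, allele frequency in pooled controls).

    Assumes the variant has no effect within any population: cases and
    controls both carry the allele at that population's frequency.
    """
    case_n = sum(p.size * p.prevalence for p in pops)
    ctrl_n = sum(p.size * (1 - p.prevalence) for p in pops)
    case_f = sum(p.size * p.prevalence * p.allele_freq for p in pops) / case_n
    ctrl_f = sum(p.size * (1 - p.prevalence) * p.allele_freq for p in pops) / ctrl_n
    return case_f, ctrl_f


pop_a = Population(size=1000, allele_freq=0.8, prevalence=0.3)
pop_b = Population(size=1000, allele_freq=0.2, prevalence=0.1)

# Pooling both populations: the variant looks enriched in cases.
case_f, ctrl_f = case_control_freqs([pop_a, pop_b])
print(f"pooled: cases {case_f:.4f}, controls {ctrl_f:.4f}")  # cases 0.6500, controls 0.4625

# Within each population: no difference between cases and controls.
for name, pop in (("A", pop_a), ("B", pop_b)):
    c, k = case_control_freqs([pop])
    print(f"population {name}: cases {c:.4f}, controls {k:.4f}")
```

The apparent association arises only because population A has both more carriers of the allele and more cases, which is exactly the kind of false signal the extra statistical steps are meant to remove.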
So why does this matter? With the rise of Genome-Wide Association Studies (GWAS), researchers have identified a plethora of genetic variants associated with various diseases. Synthesizing many such studies has led to the development of polygenic risk scores, which estimate an individual's risk of developing a disease from the variants carried in their genome. A 2021 study [2] found that polygenic risk scores for breast cancer, built from previous GWAS and other association studies, were significantly predictive of breast cancer risk in women of European, African and Latinx ancestry, but were less accurate predictors for women of African ancestry. Interestingly, even this study, which highlights the importance of including individuals of non-European descent in genetic studies, notes that its sample sizes for women of African descent were considerably smaller. Moreover, all participants were drawn from US medical centers, whereas the genomes of African American women are not necessarily representative of the genetic makeup of women from the many regions of continental Africa.
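As a rough sketch of what a polygenic risk score computes: in its simplest form, it is a weighted sum over variants of the number of risk alleles an individual carries (0, 1 or 2) multiplied by that variant's per-allele effect size, for example a log odds ratio estimated in a GWAS. The variant IDs, weights and genotypes below are invented, `polygenic_risk_score` is a hypothetical helper, and real pipelines add steps such as variant selection, quality control and calibration. Because the weights come from the underlying GWAS, a score whose weights were estimated mostly in European-ancestry samples can transfer poorly to other populations, for reasons that include differences in allele frequencies and linkage disequilibrium.

```python
# Minimal polygenic risk score (PRS) sketch: sum of allele dosage x effect size.
# Variant IDs and effect sizes are hypothetical, not taken from any published score.


def polygenic_risk_score(dosages, effect_sizes):
    """Return the sum of dosage * effect size over variants found in both dicts.

    dosages: variant ID -> number of risk alleles carried (0, 1 or 2)
    effect_sizes: variant ID -> per-allele effect size (e.g. log odds ratio)
    Variants without a known effect size are skipped.
    """
    return sum(
        dosage * effect_sizes[variant]
        for variant, dosage in dosages.items()
        if variant in effect_sizes
    )


# Hypothetical per-allele log odds ratios from a GWAS.
weights = {"variant_1": 0.10, "variant_2": -0.05, "variant_3": 0.20}

# Hypothetical genotypes for one individual.
person = {"variant_1": 2, "variant_2": 1, "variant_3": 0}

score = polygenic_risk_score(person, weights)
print(f"PRS = {score:.2f}")  # 2*0.10 + 1*(-0.05) + 0*0.20 = 0.15
```

On its own the score is only a relative measure: individuals are typically compared against the distribution of scores in a reference sample, which is another point where the ancestry of that sample matters.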
This is only one of many examples. Cystic fibrosis, for instance, is a Mendelian disease whose causal mutations differ greatly between European and African American populations. Around 70% of European patients carry the ΔF508 mutation in the CFTR gene, compared with only 29% of African American patients. Over 2,000 mutations in the CFTR gene can cause cystic fibrosis, and different mutations are thought to affect the disease's pathology and treatment requirements [3], making it essential to understand its main causes in populations beyond those of European ancestry. Another 2019 study [4] examined polygenic risk scores for 17 quantitative traits in the UK Biobank and showed that their prediction accuracy declined significantly for individuals of African ancestry (Fig. 1). We therefore lack the genetic tools to accurately predict and treat genetic diseases in African and African American populations.
Fig. 1: “Prediction accuracy relative to European-ancestry individuals across 17 quantitative traits and 5 continental populations in the UKBB.” Figure from Martin et al. [4].
What does all of this mean? Institutional and government funding, along with scientific effort and initiative, must be directed towards diversity in genomic studies if the existing gap in healthcare for patients of non-European descent is not to widen. Even the most recent studies, such as a 2022 GWAS of height [5], the largest GWAS conducted to date and praised for its diversity, are nowhere near representing the ethnic and geographic diversity of the global population.
1. Fatumo, S., et al., A roadmap to increase diversity in genomic studies. Nature Medicine, 2022. 28(2): p. 243-250.
2. Liu, C., et al., Generalizability of Polygenic Risk Scores for Breast Cancer Among Women With European, African, and Latinx Ancestry. JAMA Network Open, 2021. 4(8): p. e2119084.
3. Sirugo, G., S.M. Williams, and S.A. Tishkoff, The Missing Diversity in Human Genetic Studies. Cell, 2019. 177(1): p. 26-31.
4. Martin, A.R., et al., Clinical use of current polygenic risk scores may exacerbate health disparities. Nature Genetics, 2019. 51(4): p. 584-591.
5. Yengo, L., et al., A saturated map of common genetic variants associated with human height. Nature, 2022. 610(7933): p. 704-712.