Polygenic score distribution differences across European ancestry populations: implications for breast cancer risk prediction

Abstract

Background

The 313-variant polygenic risk score (PRS313) provides a promising tool for clinical breast cancer risk prediction. However, the PRS313 has not been evaluated across different European populations, where differences in its distribution could influence risk estimation.

Methods

We explored the distribution of PRS313 across European populations using genotype data from 94,072 females of European ancestry without a breast cancer diagnosis from 21 countries participating in the Breast Cancer Association Consortium (BCAC) and 223,316 females without a breast cancer diagnosis from the UK Biobank. The mean PRS was calculated by country in the BCAC dataset and by country of birth in the UK Biobank. We explored different approaches to reduce the observed heterogeneity in the mean PRS across the countries, and investigated the implications of the distribution variability in risk prediction.

Results

The mean PRS313 differed markedly across European countries, being highest in individuals from Greece and Italy and lowest in individuals from Ireland. Using the overall European PRS313 distribution to define risk categories leads to overestimation and underestimation of risk in some individuals from these countries. Adjustment for principal components explained most of the observed heterogeneity in the mean PRS. The mean estimates derived when using an empirical Bayes approach were similar to the predicted means after principal component adjustment.

Conclusions

Our results demonstrate that the PRS distribution differs even within European ancestry populations, leading to underestimation or overestimation of risk in specific European countries, which could potentially influence the clinical management of some individuals if it is not appropriately accounted for. Population-specific PRS distributions may be used in breast cancer risk estimation to ensure predicted risks are correctly calibrated across risk categories.

Background

Genetic susceptibility to breast cancer is influenced by multiple genetic variants that contribute to different levels of risk [1,2,3,4,5,6]. Genome-wide association studies (GWAS) have identified a large number of common variants that each contribute a small risk to the disease but can be combined into polygenic risk scores (PRSs) with greater effects [7, 8]. PRSs provide a promising tool for clinical breast cancer risk prediction by stratifying women into different risk categories [9,10,11] and may be used to inform targeted screening and prevention strategies [12,13,14,15,16,17,18,19,20].

Mavaddat et al. [11] constructed a 313-variant PRS (PRS313) for breast cancer using data from women of European ancestry participating in the Breast Cancer Association Consortium (BCAC). In prospective validation studies, this PRS was estimated to be associated with a relative risk for breast cancer of ~ 1.6 per standard deviation (SD) increase. The lifetime absolute risk of developing overall breast cancer was ~ 2% for women in the lowest 1% of the PRS313 risk distribution and 32.6% for those in the highest 1%. PRS313 has been incorporated into the CanRisk tool (www.canrisk.org) [14, 21, 22] and, together with other lifestyle and genetic risk factors, has been shown to improve risk stratification in European ancestry populations [14, 23,24,25,26,27]. Several large studies have investigated the transferability of PRSs developed in European ancestry populations to non-European populations, finding that the strength of the association with breast cancer risk was attenuated, particularly among women of African ancestry, compared with the association among women of European ancestry [28,29,30].

PRS distributions across different European countries have not, however, been extensively evaluated. Differences in the PRS distribution, if not appropriately accounted for, could lead to inappropriate risk classification, with implications for clinical management. Here, we examined the distribution of the PRS313 across 17 countries in Europe, together with individuals of European ancestry from Australia, Canada, Israel and the USA. Similar analyses were performed using data from the UK Biobank, stratifying individuals by country of birth. We explored different approaches to account for PRS313 distribution differences across countries, and investigated the implications of the observed variability for breast cancer risk prediction.

Methods

Study populations

Breast Cancer Association Consortium dataset

The BCAC dataset used here consisted of 110,260 female invasive breast cancer cases and 94,072 female controls of European ancestry who were recruited into 84 studies from 21 countries participating in the BCAC (Table S1A). For simplicity and in an attempt to explore the effect on the general female population, only the control data were used. Samples were genotyped using the iCOGS [1] or OncoArray [3, 31] genotyping arrays. The iCOGS and OncoArray datasets were imputed separately and ancestry-informative principal components (PCs) were calculated, as described previously [2, 3, 31].

UK Biobank dataset

Genotype data from females (genetically reported sex) participating in the UK Biobank were used. Individuals were excluded if they had a recorded breast cancer diagnosis (malignant neoplasm or carcinoma in situ of the breast) or had a personal history of malignant neoplasm of the breast, based on the cancer registry or self-report. Individuals with a SNP call rate < 0.95 were removed from the analysis. Genetic ancestry was inferred using FastPop software [32]. Individuals who self-reported as “white” and had an estimated European ancestry proportion ≥ 80% were retained in the analysis. Individuals were subsequently stratified by the “country of birth” field in the UK Biobank; only countries with at least 100 participants were included. After filtering, 223,316 females from 21 countries were included in the analyses (Table S1B). More details on the genotyping, quality control, imputation procedures used, and calculation of PCs are given elsewhere [33, 34].

All participants provided written informed consent, and all the studies were approved by the relevant ethics committees. The use of UK Biobank data has been approved under the application with ID 102655, and BCAC data under the application with access number 712.

Statistical analysis

PRS313 was developed previously [11] and included variants independently associated with breast cancer risk at a P-value cut-off of \(< 10^{-5}\). The PRS313 was calculated for each study participant using the following formula:

$$PRS_{j} = \beta_{1} x_{j1} + \cdots + \beta_{k} x_{jk} + \cdots + \beta_{313} x_{j,313}$$

where \(PRS_{j}\) is the PRS of individual j, \(x_{jk}\) is the estimated effect allele dosage for \(SNP_{k}\) carried by individual j and can take values between 0 and 2, and \(\beta_{k}\) is the weight for \(SNP_{k}\) in the PRS for overall breast cancer, as derived by Mavaddat et al. [11]. PRS313 was standardized to have unit SD in controls in the pooled dataset. Mavaddat et al. also derived specific versions of PRS313 for oestrogen receptor (ER) subtypes, with weights optimised for predicting ER-positive or ER-negative breast cancer risk (Table S2). The main analyses focused on calculating the mean standardized PRS313 in BCAC controls using both the iCOGS and OncoArray datasets. These values were derived using linear regression with array type as a covariate and no intercept (so that estimates were generated for every country). Heterogeneity in the mean PRS313 between countries was assessed using I2 statistics and Q statistic P-values.
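The calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three weights and the dosage table are hypothetical toy values (not the published PRS313 weights), the function names are ours, and we assume that "standardized to have unit SD" means dividing by the sample SD of the reference (control) scores without centring.

```python
from statistics import stdev

def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages for one individual: sum_k beta_k * x_jk."""
    if len(dosages) != len(weights):
        raise ValueError("one dosage per variant weight is required")
    return sum(beta * x for beta, x in zip(weights, dosages))

def scale_to_unit_sd(scores, reference_scores):
    """Divide scores by the sample SD of a reference (e.g. pooled control) set.

    Assumption: scaling only, no centring.
    """
    reference_sd = stdev(reference_scores)
    return [score / reference_sd for score in scores]

# Hypothetical toy data: 3 variants instead of 313, 4 individuals.
toy_weights = [0.10, -0.05, 0.20]
toy_dosages = [
    [0.0, 1.0, 2.0],
    [1.0, 1.0, 0.0],
    [2.0, 0.0, 1.0],
    [0.0, 2.0, 1.0],
]

raw_scores = [polygenic_score(row, toy_weights) for row in toy_dosages]
scaled_scores = scale_to_unit_sd(raw_scores, raw_scores)
```

In a real analysis the dosages would come from imputed genotype data and the reference set would be the pooled controls rather than the scored individuals themselves.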

We also evaluated the distribution of the mean PRS by country of birth in the UK Biobank dataset. Seven of the 313 variants were not available in the UK Biobank data; thus, we used the remaining 306 variants in the analysis (PRS306) (Table S2). PRS306 was standardized to have unit SD in controls in the pooled UK Biobank dataset. We also evaluated a “standard” breast cancer PRS available in the UK Biobank data, which was previously generated from external GWAS data [35] and was available for 222,989 individuals (Table S1B). This PRS was also standardized to have unit SD in controls in the pooled UK Biobank dataset.

Potential sources of the variability in the mean PRS313 across the countries were explored in the BCAC dataset using three approaches. The PRS was first recalculated excluding variants in the CHEK2 region. The protein truncating variant CHEK2 c.1100delC is a relatively common founder variant that exhibits a large variation in frequency across Europe [36]. Although it is not included in PRS313, other variants in PRS313 are correlated with this variant (Table S2) and were removed.

Second, we examined the effect of removing variants with the most variable frequency across countries. For each of the 313 variants, the mean and SD of the effect allele frequency across countries were calculated in controls of the pooled dataset. Variants with a coefficient of variation (SD/mean) > 0.3 were removed.
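A minimal sketch of this filter is given below. The variant names and frequencies are hypothetical, and we assume the sample SD (n − 1 denominator) across countries, which the text does not specify.

```python
from statistics import mean, stdev

def coefficient_of_variation(frequencies):
    """SD divided by mean of the effect allele frequency across countries."""
    return stdev(frequencies) / mean(frequencies)

def highly_variable_variants(freq_by_variant, threshold=0.3):
    """Names of variants whose coefficient of variation exceeds the threshold."""
    return [
        name
        for name, freqs in freq_by_variant.items()
        if coefficient_of_variation(freqs) > threshold
    ]

# Hypothetical effect allele frequencies in three countries.
toy_frequencies = {
    "variant_a": [0.10, 0.20, 0.30],  # CV = 0.1 / 0.2 = 0.5, flagged
    "variant_b": [0.50, 0.52, 0.48],  # CV = 0.02 / 0.5 = 0.04, kept
}
flagged = highly_variable_variants(toy_frequencies)
```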

Third, we explored the effect of adjusting for up to 10 ancestry-informative PCs, in addition to array type. As the PCs derived from the iCOGS and OncoArray databases are not comparable, separate PCs for each were included in the regression. We explored the number of PCs that were required to eliminate heterogeneity in the adjusted mean PRS313 using the thresholds I2 < 10% and P > 0.05. Similarly, for the UK Biobank dataset, PRS306 was adjusted for up to 10 PCs, which were available in the UK Biobank.
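The idea of PC adjustment can be illustrated with an ordinary least-squares fit of the PRS on country indicators (no intercept) plus PCs, so that the country coefficients are the PC-adjusted means. The sketch below uses NumPy and toy data; the function name and data are hypothetical, and it omits the array-type covariate and the separate iCOGS and OncoArray PC sets described above.

```python
import numpy as np

def adjusted_country_means(prs, country, pcs):
    """Least-squares fit of PRS on country indicators (no intercept) plus PCs.

    Returns (sorted country labels, adjusted mean per country, PC coefficients).
    """
    prs = np.asarray(prs, dtype=float)
    pcs = np.asarray(pcs, dtype=float).reshape(len(prs), -1)
    labels, codes = np.unique(country, return_inverse=True)
    indicators = np.eye(len(labels))[codes]  # one indicator column per country
    design = np.hstack([indicators, pcs])
    coef, *_ = np.linalg.lstsq(design, prs, rcond=None)
    return labels, coef[: len(labels)], coef[len(labels):]

# Toy data in which all between-country variation in the PRS is driven by PC1.
toy_country = ["A", "A", "A", "B", "B", "B"]
toy_pc1 = np.array([-1.0, -0.8, -1.2, 1.0, 1.2, 0.8])
toy_prs = 0.5 * toy_pc1

labels, means, pc_coef = adjusted_country_means(toy_prs, toy_country, toy_pc1)
unadjusted_a = toy_prs[:3].mean()  # -0.5 before adjustment
```

In this constructed case the unadjusted means are −0.5 and 0.5, whereas the adjusted country means are both zero, because the PC term absorbs the difference.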

As a complementary approach to generating population-specific estimates, we explored an empirical Bayes approach similar to that described by Clayton and Kaldor [37] for mapping disease rates (details in Additional File 1).
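The exact procedure is given in Additional File 1 and is not reproduced here. As a rough illustration of the general idea only, the sketch below shrinks each country mean toward the overall mean under a normal-normal model, with the between-country variance estimated by a simple moment (DerSimonian-Laird type) estimator. That estimator, the function name and the input values are our assumptions and may differ from the method actually used.

```python
def empirical_bayes_means(means, standard_errors):
    """Shrink country means toward the inverse-variance weighted overall mean.

    Assumption: normal-normal model with a moment estimate of the
    between-country variance (tau2).
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_w = sum(weights)
    overall = sum(w * m for w, m in zip(weights, means)) / total_w
    q = sum(w * (m - overall) ** 2 for w, m in zip(weights, means))
    scale = total_w - sum(w ** 2 for w in weights) / total_w
    tau2 = max(0.0, (q - (len(means) - 1)) / scale)
    # Shrinkage factor: close to 1 for precise estimates, smaller for noisy ones.
    factors = [tau2 / (tau2 + se ** 2) for se in standard_errors]
    shrunken = [overall + b * (m - overall) for b, m in zip(factors, means)]
    return overall, tau2, factors, shrunken

# Hypothetical country means and standard errors; the third country is small.
toy_means = [0.10, -0.10, 0.30]
toy_ses = [0.02, 0.02, 0.20]
overall, tau2, factors, shrunken = empirical_bayes_means(toy_means, toy_ses)
```

The country with the largest standard error receives the smallest shrinkage factor and therefore moves furthest toward the overall mean, which matches the qualitative behaviour described for small samples.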

To investigate the implications of PRS distribution differences in breast cancer risk prediction, we explored the proportion of women by country by percentile based on the distribution cut-offs of either the full dataset or country-specific values, separately in the BCAC and the UK Biobank. We also examined two specific risk estimation examples using the CanRisk tool [14, 21, 22].

All analyses were performed in R (version 4.2.1).

Results

Geographic diversity in the mean PRS313 across European ancestry populations

The mean PRS313 in the BCAC controls differed markedly across European countries, with I2 = 80% (P = \(5.6 \times 10^{ - 13}\)). The mean was highest in the Republic of North Macedonia and Greece and lowest in Ireland. A similar level of heterogeneity was observed for the ER-positive (I2 = 84%) and ER-negative (I2 = 64%) PRSs. There was no evidence of a difference in the SD of the PRS between countries (Fig. 1; Tables 1, S3).
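For readers unfamiliar with the heterogeneity statistics quoted here, the following sketch computes Cochran's Q and I2 from country means and their standard errors under a fixed-effects (inverse-variance) model. The input values are hypothetical; the P-value, which would come from a chi-square distribution with (number of countries − 1) degrees of freedom, is omitted.

```python
def cochran_q_and_i2(means, standard_errors):
    """Return (pooled mean, Cochran's Q, I2 in percent) using inverse-variance weights."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * m for w, m in zip(weights, means)) / sum(weights)
    q = sum(w * (m - pooled) ** 2 for w, m in zip(weights, means))
    df = len(means) - 1
    # I2 is the share of total variation attributed to heterogeneity, floored at 0.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, q, i2

# Hypothetical means for two countries with equal standard errors.
pooled_mean, q_stat, i2_percent = cochran_q_and_i2([0.1, -0.1], [0.05, 0.05])
```

With these toy inputs, Q = 8 on 1 degree of freedom, giving I2 = (8 − 1)/8 = 87.5%.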

Fig. 1

Standardized PRS313 distribution across countries for overall, ER-positive and ER-negative breast cancer in BCAC. The squares represent the mean PRS by country, and the error bars represent the corresponding 95% confidence intervals. ER, Oestrogen receptor; FE Model, Fixed-effects Model; PRS, Polygenic risk score

Table 1 Mean standardized PRS313 by country in controls of the pooled BCAC dataset

The mean PRS306 in female UK Biobank participants, stratified by country of birth, was also calculated. There was strong evidence of heterogeneity in the PRS distribution (I2 = 63%, P = \(1.7 \times 10^{ - 05}\)). The pattern was generally similar to that seen in the BCAC dataset, with a higher mean PRS in individuals born in Cyprus, Russia, and Italy and a lower mean PRS in individuals born in Ireland. Similar results were found for the “standard” UK Biobank PRS (I2 = 85%, P = \(8.5 \times 10^{ - 21}\)) (Fig. 2; Table S4).

Fig. 2

PRS distribution across countries for overall breast cancer in the UK Biobank. Distribution of the mean PRS306 and “standard” PRS for breast cancer, as defined in the UK Biobank, across countries of origin for participating white females. The squares represent the mean PRS by country, and the error bars represent the corresponding 95% confidence intervals. FE Model, Fixed-effects Model; PRS, Polygenic risk score

Exploring potential reasons for differences in the mean PRS between countries

Potential sources of the variability in the mean PRS313 across the countries were explored in the BCAC dataset using three approaches. After removing variants in the CHEK2 region, the variation in the mean PRS across countries remained similar to PRS313 (I2 = 83%, P \(= 9.4 \times 10^{ - 16}\)). We next identified the variants with the most variable frequency among the countries. Seventeen variants had a coefficient of variation > 0.3 (Table S2). Excluding these 17 variants did not reduce the variation in the mean PRS (I2 = 80%, P \(= 2.4 \times 10^{ - 12}\)).

We next explored the effect of adjusting for PCs. When individuals in the BCAC dataset genotyped with OncoArray were plotted by the first two PCs, those from the same country separated clearly in a pattern consistent with their geographical relationship (Fig. S1). This finding suggested that adjusting for PCs may be an effective approach for reducing the variation in PRS distribution. When we adjusted the PRS for the leading PCs in the BCAC dataset, the I2 decreased as each PC was added to the model and reached < 10% when adjusted for the first six PCs (Table 1, Table S3, Fig. S2). A similar result was obtained for the ER-positive PRS, after adjustment for the first six PCs (I2 = 0%, P = 0.69). For the ER-negative PRS, however, heterogeneity was not eliminated even when the PRS was adjusted for 10 PCs (I2 = 56%, P = 0.001) (Table S3). The predicted PRS of each individual, as derived from the fitted values of the linear regression model of PRS adjusted for the first six PCs and array type, was subsequently used to calculate a predicted mean PRS313 by country (Tables 1, S3). We repeated these analyses for PRS306 using the UK Biobank dataset. I2 decreased as each PC was added to the model and reached < 10% and ~ 0% when adjusted for the first seven and eight PCs, respectively (Fig. S3, Table S4).

Mean PRS estimates by country calculated using an empirical Bayes approach

The empirical Bayes estimates by country for the mean PRS were calculated in the BCAC dataset (Table 1, Table S5). Compared with the unadjusted estimates, the estimates were shrunk toward the overall mean, with shrinkage being greatest for countries with small sample sizes. The adjusted mean PRSs by country were generally similar to those predicted by the model adjusted for six PCs. When PRSs were adjusted for the first six PCs, applying the empirical Bayes approach made little difference to the estimates.

Implications for breast cancer risk prediction

To explore the effect of PRS distribution differences among European populations on risk stratification, we first defined risk thresholds based on the distribution of the controls in the full BCAC and the UK Biobank datasets separately. We then calculated the percentage of controls by country that would be categorized in each percentile based on the distribution in the full dataset and compared these to the percentages based on the country-specific distributions (Tables S6, S7, S8). The PRS313 percentile distributions in the full BCAC dataset, Greece and Italy (highest PRS313 among countries with > 100 controls) and Ireland (lowest PRS313) are illustrated in Fig. 3 and Table S7. Based on the overall distribution, ~ 1.3% and ~ 0.5% additional women from Greece and Italy, respectively, were incorrectly classified in the 95–99th percentile instead of in the 90–95th percentile, while ~ 1.4% additional women from Ireland were incorrectly classified in the 90–95th instead of the 95–99th percentile (Table S6C). Similar results were observed for the UK Biobank (Fig. S4).
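The comparison of pooled and country-specific cut-offs can be illustrated as follows. The two populations, their means and the sample sizes are hypothetical; scores are generated as a deterministic grid of normal quantiles purely so that the example is reproducible, and the function names are ours.

```python
from statistics import NormalDist, quantiles

def share_above(scores, cutoff):
    """Fraction of scores strictly above a cut-off."""
    return sum(score > cutoff for score in scores) / len(scores)

def quantile_grid_sample(mean, sd=1.0, n=1000):
    """Deterministic pseudo-sample: evenly spaced quantiles of a normal distribution."""
    dist = NormalDist(mean, sd)
    return [dist.inv_cdf((i + 0.5) / n) for i in range(n)]

# Hypothetical populations with a higher (A) and lower (B) mean PRS.
country_a = quantile_grid_sample(mean=0.3)
country_b = quantile_grid_sample(mean=-0.3)
pooled = country_a + country_b

# quantiles(..., n=20) returns the 5th, 10th, ..., 95th percentiles.
pooled_cutoff = quantiles(pooled, n=20)[-1]
cutoff_a = quantiles(country_a, n=20)[-1]

share_a_pooled = share_above(country_a, pooled_cutoff)  # more than 5%
share_b_pooled = share_above(country_b, pooled_cutoff)  # less than 5%
share_a_own = share_above(country_a, cutoff_a)          # 5% by construction
```

Using the pooled cut-off places more than 5% of population A and fewer than 5% of population B above the nominal 95th percentile, whereas each population's own cut-off yields 5% by construction.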

Fig. 3

PRS313 distribution by percentiles in the pooled BCAC dataset, Greece, Ireland and Italy. The dashed line corresponds to the 95th percentile of the PRS313 distribution in controls of the pooled BCAC dataset

As a first example, we considered a 50-year-old female from Greece with a raw PRS313 of 0.34 (falling into the 90–95th percentile in the full BCAC dataset) and no other known risk factors. Using the CanRisk tool, she would be classified in the moderate risk category. If the PRS were standardized based on the mean and SD of the controls from Greece or based on the values of PRS for Greece predicted by adjustment for the first six PCs, she would be classified into the population risk category. If the PRS were standardized based on the values of the empirical Bayes approach, she would be classified into the moderate risk category (Table 2).

Table 2 Risk estimation examples using the CanRisk tool

As a second example, we considered a 50-year-old female from Ireland with a raw PRS313 of 0.27 (falling into the 85–90th percentile in the full BCAC dataset) and no other known risk factors. Using the CanRisk tool, she would be classified in the population risk category. If the PRS were standardized based on the mean and SD of PRS313 as derived from the controls in Ireland or based on the values of the empirical Bayes approach, she would be classified in the moderate risk category. If the PRS were standardized based on values of PRS for Ireland predicted by adjustment for the first six PCs, she would be classified in the population risk category (Table 2).
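The mechanism behind these reclassifications is that the same raw score yields a different standardized score, and hence a different percentile, depending on the reference mean and SD. The sketch below uses entirely hypothetical numbers (they are not the values in Table 2 or those used by CanRisk) and assumes an approximately normal PRS distribution when converting to percentiles.

```python
from statistics import NormalDist

def standardize(raw_score, reference_mean, reference_sd):
    """z-score of a raw PRS relative to a chosen reference distribution."""
    return (raw_score - reference_mean) / reference_sd

def normal_percentile(z_score):
    """Percentile of a z-score, assuming an approximately normal PRS distribution."""
    return 100.0 * NormalDist().cdf(z_score)

# Hypothetical raw score and reference values.
raw = 0.50
z_pooled = standardize(raw, reference_mean=-0.40, reference_sd=0.60)   # 1.5
z_country = standardize(raw, reference_mean=-0.25, reference_sd=0.60)  # 1.25

percentile_pooled = normal_percentile(z_pooled)
percentile_country = normal_percentile(z_country)
```

Here the pooled reference places the individual above the 90th percentile, while the country-specific reference with a higher mean places her below it, which is the kind of shift that can move an individual across a risk category boundary.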

Discussion

The transferability of PRSs across different populations remains a major challenge in the field of personalized cancer risk prediction [38, 39]. Here, we explored the distribution of PRS313 for breast cancer in women of European ancestry from 21 countries using data from studies participating in the BCAC and further investigated how the observed variability might be accounted for in breast cancer risk prediction.

The results indicated that the PRS313 distribution varies markedly even within European ancestry populations, with a higher mean in Greece and Italy and a lower mean in Ireland. We observed a very similar pattern in females participating in the UK Biobank based on country of birth. If not accounted for, these differences could lead to an over- or underestimation of risk, thus affecting the risk categorization and possibly the clinical management of some women. This may be important not only at the individual country level but also for individuals living in a country other than their country of origin.

The variability in the mean PRS313 could not be explained by removing variants with the most variable frequency, indicating that a large number of variants may contribute to this difference. Removing such variants to reduce heterogeneity would not be desirable, as it would reduce the risk discrimination provided by the PRS. The results do, however, indicate that most, if not all, of the variability in the mean PRS313 across countries in controls can be explained by adjusting for the leading ancestry-informative PCs.

We also explored generating country-specific mean PRS using an empirical Bayes approach. This approach considers both the uncertainty due to the small sample size and the true variation in the means across the countries; these country-specific mean PRSs were similar to those generated by adjusting for PCs. These values can then be used to standardize the PRS before, for example, it is implemented in the CanRisk tool. CanRisk is an online tool that enables healthcare professionals to calculate an individual’s future risk of developing breast and ovarian cancer using a combination of genetic factors (including the PRS), lifestyle/hormonal risk factors, breast density and family history. The risks are provided both over a period of time (e.g. 10 years) and lifetime, and these risks can be used to classify an individual according to management guidelines, including the National Institute for Health and Care Excellence guideline (NICE-CG164) on familial breast cancer (which classifies individuals as “Near population risk”, “Moderate risk” and “High risk”) [40].

The optimal approach to calibration will depend on what data are available. If a large control sample (n > 1,000) is available, it will be preferable to utilise estimates from this. If sample sizes are smaller, there seems little to choose between adjustment for PCs or an empirical Bayes approach. Using PCs has the advantage that they do not require any prior data from the population in question, and the approach naturally takes into account spatial variation in the PRS. A disadvantage, however, is that PCs require array genotyping data to generate, making them less attractive when implemented using sequencing panels. Moreover, the PCs generated using different genotyping arrays are not necessarily comparable. We also note that the heterogeneity of the ER-negative specific PRS was not eliminated even with the adjustment for 10 PCs. The empirical Bayes approach is simpler to implement, provided some control data are available for the population of interest.

The risk categorization in the two CanRisk examples in the Results section changed depending on the mean and SD of the sample used to standardize the PRS. According to the NICE guideline CG164, women classified in the “Moderate risk” category have different management guidelines from women classified in the “Near population risk” category [40].

While adjustment of the PRS distribution at the population level is clearly necessary, the results raise the question as to whether it is appropriate in general to adjust the PRS for PCs at the individual level, which gives different scores and potentially different risk classifications. This is a difficult question to address and hinges on whether the PCs should be regarded as nuisance parameters correcting for confounding factors, such as screening or lifestyle factors. Reanalysis of prospective studies with the BCAC OncoArray dataset showed that the first two PCs are associated with the PRS (PC1 negatively, PC2 positively) and are also associated with risk (in the same direction). The PRS effect size (OR per 1 SD) was essentially unchanged whether or not adjustment was made for PCs (data not shown). This finding implies that risk discrimination could be slightly improved by including the effect of PCs in the PRS and that adjusting the PRS for PCs further reduces the discrimination ability. Fortunately, the association between PC1 and risk is weak, and within a country, the variation in PC1 is not large enough to materially change risk categories.

The differences in the PRS distribution across Europe are a manifestation, on a continental scale, of the larger intercontinental differences—the mean PRS is higher in both East Asian and African populations than in the European dataset examined here [28, 29, 41]. Interestingly, the pattern within European ancestry women appears to be unrelated to population-specific incidence which is lower in Italy and Greece than in north-western Europe, including Ireland, UK, and Scandinavia [42], presumably because the effect on disease incidence is counterbalanced by greater effects of lifestyle (or other genetic) factors. It remains unclear whether the differences in PRS can be attributed purely to random genetic drift or whether selection pressures relevant to breast cancer aetiology are involved.

We should emphasise that, while adjustment for the PRS distribution is clearly important, there is no evidence for variation in the effect size (relative risk per standard deviation). Different effect sizes could result from different variant allele frequencies and (since most of the SNPs in the PRS are not causal) differences in linkage disequilibrium patterns. However, there is no evidence for this: the effect sizes (relative risks per standard deviation) are very similar across prospective validation studies [11, 26], though there are admittedly not yet good prospective data for southern and eastern European populations. Whilst attenuation of the effect size is seen in non-European populations, any difference in effect size among European populations is likely to be very small.

We would like to acknowledge several potential limitations of our study. The dataset we used was genetically homogeneous and may not be completely representative of the population of each country. How to interpret the PRS in individuals classified as mixed ancestry is an important issue that could be explored. Furthermore, evaluation of the country-specific calibrated PRS in combination with classical breast cancer risk factors should be performed to explore the ability of these findings to predict the final risk. Finally, while we have evaluated the variation in PRS among European populations, similar issues will apply to PRS in other ancestries and in other countries, and to groups of more mixed ancestry. Similar approaches, using a combination of population-specific control data, principal component adjustment and/or empirical Bayes estimation, should also be useful for PRS calibration more generally.

In summary, these results demonstrate that the implementation of the PRS313 in risk prediction models such as CanRisk/BOADICEA could require country-specific calibration. This can be achieved by genotyping a large control group to obtain population-specific means, by using a PC adjustment, or the empirical Bayes approach described here.

Conclusions

In this study, we observed a remarkable difference in the mean breast cancer PRS within European ancestry populations when we used data from more than 300,000 women with no previous breast cancer diagnosis. This heterogeneity could influence the classification of some individuals if not appropriately accounted for, leading to risk overestimation in some individuals and risk underestimation in others, with potential implications for clinical management. Adjusting for principal components seems to correct distribution differences across populations. Therefore, the implementation of PRS for breast cancer risk prediction in European ancestry populations will require population-specific calibration for more accurate risk estimation. This is particularly important for countries not represented in the original PRS development.

Availability of data and materials

The BCAC and the UK Biobank data that support the findings of this study are available via application to the Data Access and Co-ordination Committee (BCAC@medschl.cam.ac.uk) and via application to https://www.ukbiobank.ac.uk/enable-your-research/apply-for-access, respectively. The Breast Cancer Association Consortium data have been used under the application with access number 712. The UK Biobank data have been used under the access application with ID: 102655.

Abbreviations

BCAC: Breast Cancer Association Consortium

BOADICEA: The Breast and Ovarian Analysis of the Disease Incidence and Carrier Estimation Algorithm

COGS: Collaborative Oncological Gene-Environment Study

ER: Oestrogen receptors

FE Model: Fixed-effects Model

GWAS: Genome-wide association studies

OR: Odds ratio

P: P-Value

PC: Principal component

PRS: Polygenic risk score

SD: Standard deviation

SE: Standard error

References

  1. Michailidou K, Hall P, Gonzalez-Neira A, et al. Large-scale genotyping identifies 41 new loci associated with breast cancer risk. Nat Genet. 2013;45(4):353–61.

  2. Michailidou K, Beesley J, Lindstrom S, et al. Genome-wide association analysis of more than 120,000 individuals identifies 15 new susceptibility loci for breast cancer. Nat Genet. 2015;47(4):373–80.

  3. Michailidou K, Lindström S, Dennis J, et al. Association analysis identifies 65 new breast cancer risk loci. Nature. 2017;551(7678):92–4.

  4. Zhang H, Ahearn TU, Lecarpentier J, et al. Genome-wide association study identifies 32 novel breast cancer susceptibility loci from overall and subtype-specific analyses. Nat Genet. 2020;52(6):572–81.

  5. Dorling L, Carvalho S, Allen J, et al. Breast cancer risk genes—association analysis in more than 113,000 women. N Engl J Med. 2021;384(5):428–39.

  6. Kuchenbaecker KB, Hopper JL, Barnes DR, et al. Risks of breast, ovarian, and contralateral breast cancer for BRCA1 and BRCA2 mutation carriers. JAMA. 2017;317(23):2402–16.

  7. Choi SW, Mak TS, O’Reilly PF. Tutorial: a guide to performing polygenic risk score analyses. Nat Protoc. 2020;15(9):2759–72.

  8. Wand H, Lambert SA, Tamburro C, et al. Improving reporting standards for polygenic scores in risk prediction studies. Nature. 2021;591(7849):211–9.

  9. Mavaddat N, Pharoah PD, Michailidou K, et al. Prediction of breast cancer risk based on profiling with common genetic variants. J Natl Cancer Inst. 2015;107(5):djv036.

  10. Khera AV, Chaffin M, Aragam KG, et al. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat Genet. 2018;50(9):1219–24.

  11. Mavaddat N, Michailidou K, Dennis J, et al. Polygenic risk scores for prediction of breast cancer and breast cancer subtypes. Am J Hum Genet. 2019;104(1):21–34.

  12. Shieh Y, Eklund M, Madlensky L, et al. Breast cancer screening in the precision medicine era: risk-based screening in a population-based trial. J Natl Cancer Inst. 2017;109(5):djw290.

  13. Pashayan N, Morris S, Gilbert FJ, Pharoah PDP. Cost-effectiveness and benefit-to-harm ratio of risk-stratified screening for breast cancer: a life-table model. JAMA Oncol. 2018;4(11):1504–10.

  14. Lee A, Mavaddat N, Wilcox AN, et al. BOADICEA: a comprehensive breast cancer risk prediction model incorporating genetic and nongenetic risk factors. Genet Med. 2019;21(8):1708–18.

  15. Lewis CM, Vassos E. Polygenic risk scores: from research tools to clinical instruments. Genome Med. 2020;12(1):44.

  16. Pashayan N, Antoniou AC, Ivanus U, et al. Personalized early detection and prevention of breast cancer: ENVISION consensus statement. Nat Rev Clin Oncol. 2020;17(11):687–705.

  17. Brooks JD, Nabi HH, Andrulis IL, et al. Personalized risk assessment for prevention and early detection of breast cancer: integration and implementation (PERSPECTIVE I&I). J Pers Med. 2021;11(6):511.

  18. van den Broek JJ, Schechter CB, van Ravesteyn NT, et al. Personalizing breast cancer screening based on polygenic risk and family history. J Natl Cancer Inst. 2021;113(4):434–42.

  19. Pashayan N, Easton DF, Michailidou K. Polygenic risk scores in cancer screening: A glass half full or half empty? Lancet Oncol. 2023;24(6):579–81.

  20. Yang X, Kar S, Antoniou AC, Pharoah PDP. Polygenic scores in cancer. Nat Rev Cancer. 2023;23(9):619–30.

  21. Carver T, Hartley S, Lee A, et al. CanRisk tool—a web interface for the prediction of breast and ovarian cancer risk and the likelihood of carrying genetic pathogenic variants. Cancer Epidemiol Biomark Prev. 2021;30(3):469–73.

  22. Archer S, Babb de Villiers C, Scheibl F, et al. Evaluating clinician acceptability of the prototype CanRisk tool for predicting risk of breast and ovarian cancer: A multi-methods study. PLoS One. 2020;15(3):e0229999.

  23. Lakeman IMM, Rodríguez-Girondo M, Lee A, et al. Validation of the BOADICEA model and a 313-variant polygenic risk score for breast cancer risk prediction in a Dutch prospective cohort. Genet Med. 2020;22(11):1803–11.

  24. Pal Choudhury P, Brook MN, Hurson AN, et al. Comparative validation of the BOADICEA and Tyrer-Cuzick breast cancer risk models incorporating classical risk factors and polygenic risk in a population-based prospective cohort of women of European ancestry. Breast Cancer Res. 2021;23(1):22.

  25. Li SX, Milne RL, Nguyen-Dumont T, et al. Prospective evaluation of the addition of polygenic risk scores to breast cancer risk models. JNCI Cancer Spectr. 2021;5(3):pkab021.

  26. Yang X, Eriksson M, Czene K, et al. Prospective validation of the BOADICEA multifactorial breast cancer risk prediction model in a large prospective cohort study. J Med Genet. 2022;59(12):1196–205.

  27. Lee A, Mavaddat N, Cunningham A, et al. Enhancing the BOADICEA cancer risk prediction model to incorporate new data on RAD51C, RAD51D, BARD1 updates to tumour pathology and cancer incidence. J Med Genet. 2022;59(12):1206–18.

  28. Ho WK, Tan MM, Mavaddat N, et al. European polygenic risk score for prediction of breast cancer shows similar performance in Asian women. Nat Commun. 2020;11(1):3833.

  29. Du Z, Gao G, Adedokun B, et al. Evaluating polygenic risk scores for breast cancer in women of African ancestry. J Natl Cancer Inst. 2021;113(9):1168–76.

  30. Liu C, Zeinomar N, Chung WK, et al. Generalizability of polygenic risk scores for breast cancer among women with European, African, and Latinx Ancestry. JAMA Netw Open. 2021;4(8):e2119084.

  31. Amos CI, Dennis J, Wang Z, et al. The OncoArray consortium: a network for understanding the genetic architecture of common cancers. Cancer Epidemiol Biomark Prev. 2017;26(1):126–35.

  32. Li Y, Byun J, Cai G, et al. FastPop: a rapid principal component derived method to infer intercontinental ancestry using genetic data. BMC Bioinform. 2016;17:122.

  33. Bycroft C, Freeman C, Petkova D, et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562(7726):203–9.

  34. Sudlow C, Gallacher J, Allen N, et al. UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 2015;12(3):e1001779.

  35. Thompson DJ, Wells D, Selzam S, et al. UK Biobank release and systematic evaluation of optimised polygenic risk scores for 53 diseases and quantitative traits. medRxiv. https://doiorg.publicaciones.saludcastillayleon.es/10.1101/2022.06.16.22276246, 16 June 2022, preprint: not peer reviewed.

  36. Schmidt MK, Hogervorst F, van Hien R, et al. Age- and tumor subtype-specific breast cancer risk estimates for CHEK2*1100delC carriers. J Clin Oncol. 2016;34(23):2750–60.

  37. Clayton D, Kaldor J. Empirical Bayes estimates of age-standardized relative risks for use in disease mapping. Biometrics. 1987;43(3):671–81.

  38. Wang Y, Tsuo K, Kanai M, Neale BM, Martin AR. Challenges and opportunities for developing more generalizable polygenic risk scores. Annu Rev Biomed Data Sci. 2022;5:293–320.

  39. Martin AR, Kanai M, Kamatani Y, Okada Y, Neale BM, Daly MJ. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat Genet. 2019;51(4):584–91.

  40. National Institute for Health and Care Excellence: Guidelines. In: Familial breast cancer: classification, care and managing breast cancer and related risks in people with a family history of breast cancer. London: National Institute for Health and Care Excellence (NICE); 2019.

  41. Ho WK, Tai MC, Dennis J, et al. Polygenic risk scores for prediction of breast cancer risk in Asian populations. Genet Med. 2022;24(3):586–600.

  42. Sung H, Ferlay J, Siegel RL, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2021;71(3):209–49.

Acknowledgements

This research has been conducted using the UK Biobank Resource under application number 102655. The remaining acknowledgements are available online.

Funding

All funding information is available online.

Author information

Contributions

Writing Group: KY, KMi, DFE, ACA, NMa, and JSi; Study design: KMi, DFE, ACA, NMa, JSi, and KY; Data management: MKB, QW; Statistical Analysis: KY, NMa, JD, MZ, DFE, KMi; Provided data: MA, TUA, ILA, HA-C, NNA, VA, KJA, AAu, ABat, SBe, MBerm, ABer, KBia, NB, CBo, NVB, SEB, KBr, HBra, HBre, NJC, FC, JEC, JC-C, GC-T, WKC, NBCS Collaborators, SVC, FJC, ACox, SSC, KCz, MBD, PD, TD, AMD, DME, AHE, CEn, ME, DGE, PAF, OF, HF, MG-D, AG-M, AG-N, PGu, EHah, CAH, PHall, UH, JMH, VH, JH, AHol, EHon, MJH, RH, JLHo, SH, AHow, ABCTB Investigators, kConFab Investigators, SJ, AJak, HJ, NJ, RKa, EKK, CMKi, SKou, VNK, JVL, DLa, FLej, ALin, MLus, RJM, AMan, DM, UM, RLM, RAM, HNe, NOb, KOf, T-WP-S, AVP, CP, PPe, PDPP, GPi, DPK, KPy, PRa, MUR, GR, ER, JR, ARo, EHR, ES, DPS, EJS, MKS, RKS, CSc, X-OS, MCS, JSt, JAT, LRT, CMV, IVDB, WW, RWi, WZ, JSi, ACA, DFE. All authors read and approved the final version of the manuscript.

Corresponding author

Correspondence to Kyriaki Michailidou.

Ethics declarations

Ethics approval and consent to participate

All study participants gave written informed consent, and all the Breast Cancer Association Consortium studies were approved by the relevant ethics committees. The Breast Cancer Association Consortium data have been used under the application with access number 712. The use of the UK Biobank has been approved under application ID 102655.

Consent for publication

Not applicable.

Competing interests

The following authors declare conflicts not directly relevant to this work, as stated below: U.M. has a patent (no: EP10178345.4) for Breast Cancer Diagnostics and held personal shares in Abcodia Ltd between 2011 and 2021. She has research collaborations with Mercy Bioanalytics, iLOF, RNA Guardian and Micronoma in the field of early detection of cancer. P.A.F. conducts research funded by Amgen, Novartis and Pfizer. He received honoraria from Roche, Novartis and Pfizer. R.A.M. is a consultant for Pharmavite.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Yiangou, K., Mavaddat, N., Dennis, J. et al. Polygenic score distribution differences across European ancestry populations: implications for breast cancer risk prediction. Breast Cancer Res 26, 189 (2024). https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s13058-024-01947-x

  • DOI: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s13058-024-01947-x

Keywords