- Research
- Open access
- Published:
Evolutionary measures show that recurrence of DCIS is distinct from progression to breast cancer
Breast Cancer Research volume 27, Article number: 43 (2025)
Abstract
Background
Progression from pre-cancers like ductal carcinoma in situ (DCIS) to invasive disease (cancer) is driven by somatic evolution and is altered by clinical interventions. We hypothesized that genetic and/or phenotypic intra-tumor heterogeneity would predict clinical outcomes for DCIS since it serves as the substrate for natural selection among cells.
Methods
We profiled two samples from two geographically distinct foci from each DCIS in both cross-sectional (n = 119) and longitudinal cohorts (n = 224), with whole exome sequencing, low-pass whole genome sequencing, and a panel of immunohistochemical markers.
Results
In the longitudinal cohorts, the only statistically significant associations with time to non-invasive DCIS recurrence were the combination of treatment (lumpectomy only vs mastectomy or lumpectomy with radiation, HR 12.13, p = 0.003, Wald test with FDR correction), ER status (HR 0.16 for ER+ compared to ER−, p = 0.0045), and divergence in SNVs between the two samples (HR 1.33 per 10% divergence, p = 0.018). SNV divergence also distinguished between pure DCIS and DCIS synchronous with invasive disease in the cross-sectional cohort. In contrast, the only statistically significant associations with time to progression to invasive disease were the combination of the width of the surgical margin (HR 0.67 per mm, p = 0.043) and the number of mutations that were detectable at high allele frequencies (HR 1.30 per 10 SNVs, p = 0.02). No predictors were significantly associated with both DCIS recurrence and progression to invasive disease, suggesting that the evolutionary scenarios that lead to these clinical outcomes are markedly different.
Conclusions
These results imply that recurrence with DCIS is a clinical and biological process different from invasive progression.
Background
The improvement of radiological techniques and preventive screening of breast cancer conducted on a large scale makes it possible to identify mammary gland neoplasms at an early stage of development, when they are still confined within the glandular ducts. This neoplasm is termed ductal carcinoma in situ (DCIS) [1]. Estimates from several natural history studies of DCIS indicate that 20–30% will progress to invasive cancer without definitive surgical treatment [2, 3], implying that as many as 70% of patients who have surgery for DCIS may not derive benefit and thus suffer from overtreatment [4].
The ability to recognize which pre-cancerous tumors are likely to progress to invasive cancer is of great importance because it would identify high-risk patients for surgical, pharmacological, and/or radiation treatment. In contrast, low-risk patients could be managed by watchful waiting, avoiding the unnecessary harms and side effects associated with these therapies [5]. Furthermore, selecting patients most at risk would facilitate reallocating healthcare resources to those who would benefit most from treatment.
Evolutionary mechanisms drive tumor progression [6]. The impairment of control mechanisms of genetic integrity [7] accelerates the accumulation of new genetic alterations in cancer cells [8]. The combination of these alterations in an increasing number of clones represents a critical factor in tumor progression, as these clones constitute the substrate upon which selection acts [9]. The identification of mutations and the level of genetic (and phenotypic) heterogeneity have been shown to be associated with the risk of tumor progression in other pre-cancers, like Barrett’s esophagus [10,11,12]. The higher the number of mutations and the greater the intratumor genetic heterogeneity, the higher the risk of developing clones that are cancerous, metastatic, and treatment-resistant [13,14,15,16,17].
It is challenging to integrate the combined effect of many mutations and genetic alterations that act simultaneously in cancer cells [18]. Investigating the number of mutations and the level of heterogeneity allows us to introduce a quantitative parameter independent of the functional consequences of specific combinations of mutations, serving as a surrogate measure of the degree of evolvability of the neoplastic cells [19, 20].
Both genetic and phenotypic heterogeneity can be measured by comparing different regions of the same tumor, ideally through analysis of longitudinal cohorts with linked clinical outcomes. Such studies often necessitate analysis of archival formalin-fixed paraffin-embedded (FFPE) samples, which is challenging due to partial degradation of the DNA, FFPE-induced artifacts, which manifest as sequence alterations, and low yield of nucleic acids from a limited number of sections. We recently published a workflow that overcomes these challenges, enabling the assessment of measures of genetic divergence between regions of the same tumor [21]. This work aimed to test the hypothesis that genetic and phenotypic heterogeneity within DCIS can predict the recurrence of DCIS and/or progression to invasive ductal carcinoma (IDC).
Materials and methods
Experimental design
We performed two observational studies (Fig. 1, Table 1) to study DCIS progression. In a cross-sectional study (Fig. 1A), we compared DCIS samples from patients with DCIS only (Pure DCIS, n = 58) versus DCIS samples from patients with synchronous invasive ductal carcinoma (Synchronous DCIS, n = 61). In a longitudinal case–control study (Fig. 1B), we collected samples from patients with primary DCIS who were treated and then followed until they were diagnosed with an IDC recurrence (Progressors, n = 56), were diagnosed with a DCIS-only recurrence (Recurrents, n = 69), or did not recur within their follow-up time (Nonrecurrents, n = 99, minimum five years). We calculated the median follow-up time using the reverse Kaplan–Meier method [22]. In both cohorts, we characterized the genotype and IHC phenotype of two DCIS regions per patient. All samples came from different FFPE blocks or were separated by at least 8 mm. For some progressors, we also obtained a subsequent IDC sample. The Institutional Review Board (IRB) of Duke University Medical Center approved this study, and a waiver of consent was obtained according to the approved protocol.
Schematic of the two study designs. A Cross-sectional study: Synchronous DCIS tumors are presumed to have evolved from pure DCIS that existed before the progression of the synchronous IDC. In patients with synchronous DCIS, only the DCIS component was sampled and assayed unless otherwise specified. B Longitudinal case–control study: pure-DCIS samples from patients treated and followed up for at least five years or until they progress or recur. n: number of patients per cohort
Clinical specimens
We classified breast tumors according to the World Health Organization (WHO) criteria [23]. We graded the IDC and DCIS samples according to the Nottingham grading system [24] or recommendations from the Consensus Conference on DCIS classification [25], respectively. ER status for each tumor was obtained from the clinical records based on the standard clinical assay previously performed on that tumor.
All samples were obtained from formalin-fixed paraffin-embedded (FFPE) breast tissue blocks. Cases from the cross-sectional studies were obtained from the Duke Pathology archives. Cases from the longitudinal study were obtained from Translational Breast Cancer Research Consortium (TBCRC) sites, a national multi-center consortium of cancer centers that treat breast cancer patients. All cases underwent detailed pathology review (by Dr Hall) for histologic features and case eligibility.
DNA extraction and sequencing
The DNA extraction, sequencing, and data processing protocol for the whole exome sequencing (WES) data has been previously reported [21]. For each neoplastic sample, we extracted the DNA from multiple serial archival FFPE tissue block sections. After macro-dissecting the areas of interest, we used the FFPE GeneRead DNA Kit, which incorporates enzymatic cleavage of DNA at uracil residues via uracil DNA glycosylase that reduces the problem of cytosine deamination in FFPE samples. To estimate the germline sequence, we also extracted DNA from either distant benign breast tissue or a benign lymph node. The study pathologist confirmed the presence of ≥ 70% neoplastic cells in the microdissected areas of neoplastic samples and their absence from control samples.
After DNA extraction, hybrid capture was performed using two targeted panels (all exons of a gene panel including 83 genes associated with breast cancer [26] and the human exome), and the multiplexed libraries were sequenced using either an Illumina HiSeq with 4-channel chemistry (cross-sectional study) or a NovaSeq 6000 machine with 2-channel chemistry (longitudinal study). After alignment to the Genome Reference Consortium Human Build 37 and marking duplicates, we obtained a mean de-duplicated depth of 115.9 ± 52.2 (SD). The resulting BAM files were the input data for our single nucleotide variant (SNV) calling and heterogeneity calculation pipeline. We discarded samples with less than 40% of the target covered at 40X. Sequencing was performed at the McDonnell Genome Institute at Washington University School of Medicine in St. Louis.
Additionally, we performed low-pass whole genome sequencing for the longitudinal study as previously described [27]. The resulting BAM files were the input data for the copy number alteration (CNA) characterization pipeline.
SNV characterization
We used our previously reported software ITHE [21] to calculate by-patient SNV burden and divergence, leveraging the two neoplastic geographically distant samples and a control sample from the same patient. In preparation for the current study, we recently developed, optimized, and validated this pipeline using 28 pairs of technical replicates (same extracted DNA, two aliquots were independently sequenced) of macrodissected FFPE DCIS samples [21]. The samples in both studies are very similar; thus, here, we used the filtering parameters we had found optimal previously [21]. ITHE was optimized for accurate divergence estimation and thus tries to maximize variant calling’s precision. We measured SNV divergence as the percentage of mutations detected in the union of the mutations from the two samples that are not shared by both samples. We required that the union set of mutations had at least five mutations to calculate divergence. SNV burden was calculated as the union of mutations in both samples. When comparing DCIS and IDC samples in the cross-sectional study, we report the mean of the two comparisons between one of the two DCIS samples and the IDC sample.
Functional analysis
We performed the functional enrichment analysis of genes that harbored non-synonymous SNV mutations with PANTHER [28] and DAVID [29, 30]. We corrected the fold enrichment p-values considering the false discovery rate (FDR).
CNA characterization
We followed our previously published protocols for low-pass whole genome sequencing (WGS) data processing and CNA calling [27]. Briefly, we used Nextflow-base’s Sarek pipeline to align the lpWGS data to the GRCh38/hg38 reference genome, marked duplicates, and re-calibrated quality scores. We used the resulting alignments to call autosomal CNA variants using QDNAseq [31] on 50-kb genomic bins after filtering genomic regions and reads for mappability and QC content while estimating ploidy and purity. We corrected the log2 ratio for the latter. CNAs with \(\left|corrected log2 ratio\right|>0.3\) were considered altered and normal otherwise. To maximize the robustness of our statistics, we measured CNA burden per sample as the proportion of the genome that was altered (over the total genome considered) and CNA divergence per patient as the proportion of the altered genome that is not shared between the two samples over the altered genome per patient (i.e.,\(CNA divergence = \frac{A\Delta B}{{A \cup B}}\), with A and B defined as the set of altered genomic regions of each homonymous sample, and Δ the set symmetric difference operator).
Immunohistochemistry characterization
We chose a series of 15 candidate proteins (Supplementary Table S1) that are known to be relevant to breast cancer or DCIS and its progression and to be heterogeneously expressed in subsets of these conditions. They represent several categories, including essential breast cancer drivers (ER, PR, HER2), immune-related (FOXP3, CD68), resource and microenvironmental measures (GLUT1, CA9, CD31, FASN), myoepithelial and basement membrane (TP63, COL15A1) and progenitor or stem cell-related (ALDH1 and RANK) markers. Additional proteins included the proliferation marker KI-67, the adhesion marker phospho-FAK, and COX2 (PTGS2), which were previously described as being associated with DCIS progression. In the longitudinal study, these were reduced to ER and GLUT1 only (Supplementary Tables S2-3), based on the results from the cross-sectional study and the paucity of samples. We measured stain intensity using pathologist visual scoring. For most markers, the study pathologist used a scoring system that captures the distribution of intensities in an immunohistochemistry (IHC) profile, while for five of them, it was based on the percentage of cells with any staining (Supplementary Table S1). The IHC profile was quantified as the percentage of the slide presenting different levels of increasing staining intensity: absence, low, medium, and high for 10 of them, absence and presence for the other five. Medium staining was deemed approximately twice as intense as low staining and high staining three times as intense as low staining.
We evaluated the IHC at three different scales of comparison:
-
1.
The average intensity of immunofluorescence across samples for each patient, measuring the typical intensity of IHC signal per patient.
-
2.
The variance of the intensity between samples for each patient, measuring the variations of IHC signal between distant locations in each patient.
-
3.
The variance of intensity within samples, measuring the variations of IHC signal at short distances in each patient.
These three measures are quantified by the Mean of Intensity Score, the Earth Mover’s Distance, and the Cumulative Density Index. Briefly, the Intensity Score is the weighted sum of the IHC profile proportions normalized by the maximum possible staining, the Earth Mover’s Distance represents the minimum movement of quantities between categories to turn one profile into another [32], and the Cumulative Density Index represents how close from a uniform distribution the observed profile is and ranges from 0 (all the profile weight in one of the extreme categories) to 1 (uniform profile). See a detailed description of these statistics in Supplementary Text 1.
Statistical analysis
Cohort characterization
For each study, we compared differences in the central tendency of genetic and phenotypic variables per patient between cohorts using the Kruskal–Wallis Rank Sum or the Mann–Whitney U tests for many or two cohorts, respectively. We followed the Kruskal–Wallis Rank Sum test with Dunn’s post-hoc test while controlling for multiple tests using the Holm-Šidák adjustment [33]. CNA divergence met the assumptions of a parametric test, and thus, we used an ANOVA followed by Tukey HSD post-hoc tests. In cases where we used multiple measurements per patient (CNA burden), we used a Mixed-effects ANOVA with different random effect intercepts per patient to account for data dependencies (on the square-root-transformed variable), followed by Tukey’s HSD on the estimated marginal means.
Distinguishing pure DCIS from synchronous DCIS
We performed variable selection among the phenotypic measurements with significant differences between cohorts using a Random Forest classification model [34] under the Gini impurity criterion to return the importance ranking of each feature given by their predictive power. We used the two top measurements to build a generalized linear logistic model. Similarly, we built a generalized linear logistic model with the genetic measurements that showed significant differences between cohorts and the combination of the three. Due to missing data, we compared the models under the Akaike information criterion (AIC) on the smallest dataset for all models [35].
Association with clinical outcomes
Using our longitudinal study, we determined whether genetic and phenotypic statistics were independently associated with the time to clinical outcome (non-invasive recurrence or progression) using Cox regression analyses after checking they met the proportional hazards assumption. Nonrecurrent patients were right-censored using their follow-up time, and progressors’ recurrence time was used as their time to clinical outcome. Recurrents were discarded when considering progression, and progressors were discarded when considering non-invasive recurrence. We also provide supplementary results in which the clinical outcomes are “any recurrence” and “progression without discarding recurrent patients.” In this case, recurrents were right-censored at the time of recurrence when considering progression, and otherwise, their recurrence time was used as their time to clinical outcome. We evaluated the statistical significance of Cox regressors using the Wald test. We used the proportional hazard regression model for one variable (SNV burden) to stratify patients into low and high SNV burden and plotted their event-free survival curves. We stratified using the risk relative to the patient with all variables (i.e., SNV burden here) set at the mean value (i.e., type = "risk", reference = "sample", in the predict.coxph function of the survival R package). We chose the threshold that maximized Youden’s J statistic [36] using the true outcomes. In all cases, we used the log-rank test to compare the survival trends of two or more groups.
We also integrated 18 clinical covariates (Supplementary Table S4) with our eight genetic and phenotypic measurements to model time to non-invasive recurrence and time to progression. We performed variable selection using Cox LASSO and chose the regularization parameter that minimized the partial-likelihood deviance via tenfold cross-validation. To reduce the stochasticity of the results, we performed this process 100 independent times per model and selected the variables that were selected in at least 90% of them. To reduce missingness, we performed mean imputation on the clinical covariates before variable selection. The selected variables were used to build the final Cox regression models using all patients with available (imputed) data for those variables. Alternatively, we selected patients with data for all covariates chosen without imputation. We used the final models to stratify patients as in the univariate proportional hazards regression above. In all cases, the model used to stratify patients and plot their event-free survival curves includes all the variables included in the forest plot. All variables were standardized to make hazard ratios (HRs) comparable, and thus, HRs are relative to a change of 1 standard deviation unless specified otherwise.
Results
Study cohorts
We investigated DCIS progression to invasive cancer using two independent observational studies with different patients: a cross-sectional study and a longitudinal study (Fig. 1, Table 1). In the cross-sectional study (Fig. 1A), we compared DCIS samples from patients with DCIS only (Pure DCIS, n = 58) versus DCIS samples from patients with synchronous DCIS with invasive ductal carcinoma (Synchronous DCIS, n = 61). In the separate longitudinal study (Fig. 1B), we compared pure DCIS samples from patients who were treated and had long-term follow-up (median = 117 months, 95% CI [104, 132]). This cohort consisted of patients who progressed to IDC (progressors) (n = 56), patients who had a DCIS-only recurrence (recurrents, n = 69), or patients who did not recur during the follow-up interval (nonrecurrents, n = 99). In both studies, we characterized the genotype and phenotype of two formalin-fixed paraffin-embedded DCIS samples per patient, enabling measures of evolutionary divergence (see Methods). We also obtained a single sample of their IDC recurrence for some progressors.
Cross-sectional study
Single nucleotide mutational burden
Pure DCIS carried fewer SNVs per patient (mean 7.5 ± 10.6 standard deviation) than synchronous DCIS (10.4 ± 15.3), but this difference was not statistically significant (Fig. 2A).
Cross-sectional SNV burden and divergence. Distribution of the number of SNVs per patient in the two cross-sectional cohorts A and the two lesion types (DCIS vs. IDC) present in the synchronous cohort B. Distribution of SNV genetic divergence (percentage of private mutations) per patient in the two cross-sectional cohorts C. We calculated divergence for tumors with at least five mutations in the union of the two samples, which explains the lower number of tumors per group. P-values shown if p ≤ 0.1, A, C Mann–Whitney U, B Paired-samples sign test. Interquartile range (vertical line) and median (point) in burgundy, N: number of patients
The invasive component in synchronous DCIS patients showed a statistically significantly increased number of SNVs (18.1 ± 31.5, Fig. 2B) compared with their DCIS counterpart (Paired-samples sign test, p = 0.04) largely due to four cases of IDC with a dramatic increase in mutation burden.
SNV genetic divergence
We measured the SNV genetic divergence as the percentage of mutations that are private to either sample from the same tumor. This measures the accumulation of different mutations since the cells in the samples shared a common ancestor. Synchronous DCIS showed higher genetic divergence (21.5% ± 17.5%) than pure DCIS (10.8% ± 17.4%, Fig. 2C) (Mann–Whitney U test, p = 0.009). Additionally, we also characterized the genetic divergence between the two synchronous components (i.e., DCIS vs. IDC in synchronous patients) (44.5% ± 29.0%), which is higher than the paired synchronous DCIS divergence (Supplementary Fig. S1, Paired-samples sign test, p = 0.002).
Phenotypic characterization
Synchronous DCIS samples presented higher levels of GLUT1 staining (p = 0.004) and lower levels of CA9 staining (p = 0.01) than pure DCIS samples (Fig. 3A, pairwise Mann–Whitney U tests of mean intensity scores [MIS], unadjusted p values); all other markers showed non-significant differences between groups. This result holds when one of the two DCIS samples per patient is used randomly instead of the MIS (Supplementary Fig. S2).
Cross-sectional phenotypic characterization and divergence. Distribution of mean intensity scores (MIS) per patient (see Methods) A, between-sample divergence (B, Earth Mover’s Distance [EMD]) and within-sample divergence (C, Cumulative Density Index [CDI]). A for each patient and IHC marker, B and C only markers with significant differences between cohorts (unadjusted p-values). Unadjusted pairwise Mann–Whitney U p-values shown if p ≤ 0.1. Interquartile range (vertical line) and median (point) in burgundy. N: number of patients
Phenotypic divergence
We characterized the between-sample phenotypic divergence for each marker using a distance between staining intensity profiles (Earth Mover’s Distance) and the within-sample divergence using a measure of staining intensity uniformity (Cumulative Density Index; see Supplementary Methods for detailed definition of these indices).
Multiple markers presented differences in between-sample divergence between pure DCIS and synchronous DCIS samples, with the latter showing increased divergence for GLUT1 (p = 0.01), FOXP3 (p = 0.01), and HER2 (p = 0.04) staining, but decreased divergence of ER (p = 0.01) staining (Fig. 3B, Supplementary Fig. S3, Pairwise Mann–Whitney U tests, unadjusted p values). This reduction of ER phenotypic divergence in synchronous DCIS samples was replicated in the within-sample measures (p = 0.01) and mimicked by CA9 (p = 0.01) (Fig. 3C, Supplementary Fig. S4, Pairwise Mann–Whitney U tests, unadjusted p values). A reduction in the phenotypic divergence for ER in synchronous DCIS samples indicates larger uniformity across and within samples, while the mean intensity of ER signal is not markedly different (Fig. 3A).
Distinguishing pure DCIS from synchronous DCIS
All eight significant phenotypic divergence features—MIS for GLUT1 and CA9 (Fig. 3A), EMD for GLUT1, FOXP3, ER and HER2 (Fig. 3B), and CDI for ER and CA9 (Fig. 3C)—were combined in a mixed logistic regression to model the progression status of the samples, from which the most important features were selected according to their relative predictive power. A reduced logistic model including between-sample diversity (EMD) for GLUT1 and within-sample diversity (CID) for ER had statistically significant coefficients (GLUT1 EMD, p = 0.01; ER CDI, p = 0.01) and included 40 pure DCIS cases and 52 synchronous DCIS cases. Therefore, we selected these two IHC markers (GLUT1 and ER) as the targets for phenotypic divergence to be included in the longitudinal study.
Logistic regression showed that the only statistically significant genetic measurement (SNV divergence) was strongly associated with the cohort, with p = 0.0136, so it was also selected for evaluation in the longitudinal study.
Longitudinal study: associations with recurrence and progression
We used the cross-sectional cohort as a discovery cohort, using the synchronous DCIS as a proxy for high-risk DCIS likely to progress to IDC. Samples in our validation cohorts come from patients with pure DCIS with known outcomes (nonrecurrent, recurred as DCIS, progressed to IDC) and were obtained before treatment (Fig. 1B). We sequenced the exomes of two regions of each baseline DCIS in the longitudinal cohorts, mirroring the methods for the cross-sectional cohorts, and also performed low-pass whole genome sequencing data for most samples.
Mutational burden
Primary DCIS tissue from nonrecurrent patients carried the fewest SNVs (13.4 ± 18.2), followed by that of recurrent patients (19.2 ± 26.4) and progressors (39.7 ± 46.2). These relationships between cohorts were mirrored by the CNA alteration burden (nonrecurrents: 15.9% ± 15.0% genome altered, recurrents: 17.3% ± 14.8%, progressors: 24.6% ± 17.1%) but presented higher p-values. Thus, SNV burden shows statistically significant differences between nonrecurrents and progressors (p = 0.003) and between recurrents and progressors (p = 0.05, Dunn’s test corrected for multiple tests with the Holm-Šidák adjustment) (Fig. 4A). In contrast, CNA burden was significantly different only between nonrecurrents and progressors (p = 0.03, Tukey HSD) (Fig. 4B).
Longitudinal mutational burden and divergence. Distribution of SNV (A, C) and CNA (B, D) mutational burdens (A, B) and divergences (C, D) in the three longitudinal cohorts (Nonrec: nonrecurrents, Rec: recurrents, Prog: progressors). A: number of SNVs per patient; Omnibus test: Kruskal–Wallis Rank Sum, Post-hoc test: Dunn’s test with control for multiple tests using the Holm-Šidák adjustment. B: proportion of genome with copy number alterations per sample; Omnibus test: Mixed-effects ANOVA on the square-root-transformed proportion of genome altered, Post-hoc test: Tukey HSD on estimated marginal means. C: percentage of private SNV mutations per patient; Omnibus test: Kruskal–Wallis Rank Sum. D: percentage of the genome with copy number alterations private to either sample per patient; Omnibus test: ANOVA, Post-hoc test: Tukey HSD. P-values shown if adjusted p ≤ 0.1. Interquartile range (vertical line) and median (point) in burgundy, N: number of data points (A, C, and D: patients, B: samples). We only calculated divergence for tumors with at least five mutations in the union of the two samples, which explains the lower number of tumors in C
Genetic divergence
Similar to SNV divergence, we measured CNA divergence as the percentage of the altered genome that is private to either sample per patient. SNV divergence was highest in recurrent patients but not statistically different between cohorts (nonrecurrents: 17.0% ± 13.8%, recurrents: 28.2% ± 25.5%, progressors: 18.4% ± 19.3%, Fig. 4C). In contrast, CNA divergence followed a decreasing pattern of divergence with progression (Fig. 4D), by which nonrecurrents were the most divergent (77.4% ± 16.4%), followed by recurrents (67.7% ± 23.4%) and progressors (63.7% ± 21.7%). Only progressors and nonrecurrents showed statistically significant differences in CNA divergence in pairwise comparisons (Fig. 7B, p = 0.03, Tukey HSD).
Functional analysis of non-synonymous SNV mutations
The functional analyses highlighted significant differences between the three cohorts. According to DAVID, recurrent patients showed enrichment of mutated genes involved in taste reception (TAS2R30, TAS2R31, TAS2R43, and TAS2R46), while progressors showed enrichment of genes typically mutated in cancers such as endometrial, small cell lung, prostate, and breast cancer, glioma and melanoma (PIK3CA, ERBB2, PTEN, AKT1, PIK3R2, TP53, PIK3CG), and genes involved in the determination of cell shape, arrangement of transmembrane proteins, and organization of organelles (SPTA1, SPTBN5, DST, SPTAN1). Nonrecurrents did not show significant functional enrichment (Supplementary Table S5). In addition, PANTHER functional analysis revealed an enrichment of several pathways only in progressors (Supplementary Table S6), such as Hypoxia response via HIF activation (p < 0.001, false discovery rate correction herein this section), Insulin/IGF pathway-protein kinase B signaling cascade (p < 0.001), p53 pathway (p = 0.003), Endothelin signaling pathway (p = 0.003), Hedgehog signaling pathway (p = 0.02), and PI3 kinase pathway (p = 0.03).
Phenotypic characterization and divergence
We characterized the DCIS phenotypes of the three cohorts using the immunohistochemical profiles of the two markers that showed the highest discriminating power in the cross-sectional study, ER and GLUT1. GLUT1 intensity was different between longitudinal cohorts (Fig. 5A, p = 0.04, Kruskal–Wallis Rank Sum), like in the cross-sectional study (Fig. 3A). Progressors had a generally higher intensity than nonprogressor cohorts, but the pairwise differences were not statistically significant (vs. nonrecurrents p = 0.06, vs. recurrents p = 0.07). ER intensity (Fig. 5B) was higher in ER + progressors (p = 0.02) and recurrents (p = 0.03) than in nonrecurrents (Dunn’s test corrected for multiple tests with the Holm-Šidák adjustment). This pattern was not found in the cross-sectional study, and the difference between progressors and nonrecurrents is robust to ER status stratification (Supplementary Fig. S5).
Longitudinal phenotypic characterization. Distribution of mean normalized intensities (MIS) per patient (see Methods) in the three longitudinal cohorts (Nonrec: nonrecurrents, Rec: recurrents, Prog: progressors). A: GLUT1 marker, B: ER marker in ER + patients only (where ER status was taken from the clinical records). Omnibus test: Kruskal–Wallis Rank Sum, Post-hoc test: Dunn’s test with control for multiple tests using the Holm-Šidák adjustment. P-values shown if adjusted p ≤ 0.1. Interquartile range (vertical line) and median (point) in burgundy. N: number of patients
We assessed the phenotypic divergence for these two markers like in the cross-sectional study, evaluating ER within-sample divergence and GLUT1 between-sample divergence. Neither showed a statistically significant difference between longitudinal cohorts (Supplementary Fig. S6).
Association with clinical outcomes
We tested if our genetic and phenotypic markers were independently associated with the time to clinical outcome using Cox regression analyses. Our primary clinical outcomes were non-invasive recurrence and progression. Additionally, we considered any recurrence (including progression) and progression without discarding non-invasive recurrents. Results for these alternative outcomes can be found in the supplementary materials (Supplementary Figs. S7–S8, S10, Supplementary Tables S7-S8).
Time to non-invasive recurrence was associated with divergences: SNV (p = 0.024), within-sample ER (p = 0.026), and CNA (p = 0.038), while time to progression was primarily associated with totals: SNV burden (p < 0.0001), ER intensity (p = 0.025), GLUT1 intensity (p = 0.027), and CNA burden (p = 0.045), but also CNA divergence (p = 0.025) (Supplementary Tables S9–S10, Wald test). The association between SNV burden and progression was the only one that survived multiple-test correction (Supplementary Tables S7-S8, progression adjusted p < 0.0001, Holm correction). We show the capability of this genetic measurement to stratify patients by their outcome-free survival (non-invasive-recurrence-free, Fig. 6A; progression, Fig. 6B) by splitting patients into low and high SNV burden and comparing their survival curves. They were different in both cases, with median times to event that differed in 100 months for non-invasive recurrence (Fig. 6A, p = 0.026) and 57 months for progression (Fig. 6B, p < 0.0001, Log-rank test).
Event-free survival curves of patients stratified by SNV burden. Kaplan–Meier plots of stratified patients. A Non-invasive-recurrence-free survival. B Progression-free survival. SNV burden thresholds maximize Youden’s J statistic of the outcomes (17 SNVs for non-invasive recurrence and 21 for progression). Log-rank test. The table below the Kaplan–Meier plot shows the number of samples at risk at different times
Finally, we integrated 18 clinical covariates (Supplementary Table S4) with our genetic and phenotypic measurements to develop comprehensive models of DCIS non-invasive recurrence and progression. We built proportional hazard regressions with variables selected using LASSO. The model for non-invasive recurrence (Fig. 7A) contained three significant variables: treatment option (p < 0.001), ER status (p = 0.003), and SNV divergence (p = 0.018, Wald test). The model for progression (Fig. 7C) contained two: surgical margin (p = 0.017) and SNV burden (p = 0.004, Wald test). The event-free survival curves of patients stratified using their relative risk were highly significant, with median times to event that differed between groups in 123 months for non-invasive recurrence (Fig. 7B) and > 69 months for progression (Fig. 7D). An alternative parameterization of the surgical margin as a 2 mm threshold showed very similar results (Supplementary Fig. S9, p = 0.048, Wald test). The associations with the treatment option and ER status were repeatable without using covariate imputation (Supplementary Fig. S10, while the surgical margin association was only robust when not excluding recurrent patients (Supplementary Figs. S11-S12). No other significant variables in these models were imputed.
Associations with time to clinical outcome. Forest plots describing proportional hazard regressions using variables selected with LASSO (A, C) and corresponding Kaplan–Meier plots of patients stratified by the relative risk threshold that maximizes Youden’s J statistic of the outcomes (B, D). A and B Non-invasive-recurrence-free survival. C and D Progression-free survival. Hazard Ratios (second column, A, C) are relative to 1 standard deviation. Lumpectomy Only is compared to Lumpectomy + Radiation and Mastectomy and ER+ is compared to ER-. No microcalc(ification)s is compared to having microcalcifications in DCIS-only and/or benign ducts. Tables below Kaplan–Meier plots show the number of samples at risk at different times. Log-rank test
Discussion
Evolutionary measurements summarize the results of complex evolutionary dynamics, and equivalent observations may result from very different evolutionary scenarios. For example, both a low mutation rate under neutral evolution and a recent selective sweep can generate low divergence between samples. Divergence also has multiple scales, and evolutionary processes may affect them differently. For example, clonal expansion may reduce within-sample divergence but increase between-sample divergence. Intra-tumor heterogeneity provides the fuel for natural selection, but it is not clear what form of intra-tumor heterogeneity (genetic, epigenetic, or phenotypic) is most relevant to the clinical outcomes of a particular tumor, and it is not clear how best to measure it [19].
The improved efficacy of preventive screenings provided the ability to identify many tumors in the earliest phases of their evolution, demanding the development of new approaches to stratify the risk to these patients to avoid over- and undertreatment. However, every neoplasm develops a unique set of alterations [19], making it unlikely that any set of molecular markers will be universally applicable, even within a given cancer type. In contrast, measures of the evolvability of a neoplasm may predict neoplastic progression in many different types of cancers and pre-cancers [10, 11, 16]. By taking two spatially distinct samples for each primary pre-cancer, we measured genetic and phenotypic divergence within and between samples, and their relationship with two key clinical outcomes: (1) recurrence of precancer following treatment and (2) progression of precancer to invasive cancer.
DCIS recurrence and progression are different biological processes
Based on our results, progression from DCIS to invasive breast cancer appears to be a qualitatively and biologically different process from recurrence of DCIS. We had assumed that progression to invasion first requires recurrence of the DCIS and so expected that the factors that predicted recurrence would also predict progression. We were surprised that there was no overlap in their multivariate models (Fig. 7).
SNV burden (measured with our software ITHE [21]) had the strongest associations with DCIS outcomes among all the variables we studied. LpWGS CNA burden from the same samples corroborated most of those results (with higher p values). Evolutionarily, these increases in mutation burden may result from an increase in mutation rate, evolutionary time, or self-renewing cell population size. However, due to limitations in detecting variants at low allele frequency, measured mutation burdens are most likely the result of early increases in mutation rates or the selective evolutionary forces that drive clonal expansion [37, 38]. This bias towards detecting high allele frequency mutations is especially true when using ITHE since, by design, it maximizes specificity in exchange for a lower sensitivity for low-frequency mutations. For these reasons, we do not necessarily expect SNV burden measured differently to show the associations found here.
The best multivariate model of progression included SNV burden and the size of the surgical margin as significant predictors. Previous studies have shown that surgical margins are clinically important in reducing the risk of ipsilateral breast tumor recurrence after breast-conserving surgery [39, 40]. Positive margins (i.e., DCIS at the edge of the resected tissue) clearly increase recurrence risk, but they were excluded from our study. Instead, we analyzed how the size of the negative margins associate with the clinical outcome. The evidence for this association is mixed in the literature [39, 41], but current consensus guidelines consider margins > 2 mm adequate. Notably, these studies do not typically differentiate recurrence of DCIS from progression to invasive disease in their endpoints, as we did here. We found that the size of the surgical margins was one of the strongest predictors of progression but was not a statistically significant predictor of recurrence with DCIS (neither in the selected multivariate model nor in isolation). This may be a false negative result, but even if such an association exists, it is likely to be weaker than that observed for progression. We hypothesize that a micro-invasive phenotype could reduce the probability of obtaining large surgical margins, or a phenotype that makes DCIS cells more able to survive in isolation could allow small clusters of cells left over during surgical treatment to survive and further progress to invasive disease more readily. This finding highlights the importance of segregating non-invasive recurrence from progression and how associations with recurrence (of any kind) are a mix of the associations with non-invasive recurrence and progression (Supplementary Fig. S8A). The fact that we found weaker results using the consensus guideline > 2 mm threshold instead of the margin measurements shows there is prognostic information in the size of the surgical margin.
In contrast, time to non-invasive recurrence was associated with the extent of genetic divergence of SNVs between the two assayed regions of DCIS. We could not corroborate this finding with CNA divergence, which followed the opposite trend but was also correlated to time to recurrence in the univariate models. Given large enough mutational burdens, the true (i.e., known without error) genetic divergence measured using different mutation types should yield equivalent results. Several estimation biases may explain this discordance we found between SNV and CNA divergences. A low CNA burden may increase the estimated divergence due to a higher false positive rate in the segmentation process without a broad range of true relative intensity values. In fact, CNA burden and CNA divergence were moderately anticorrelated across the study (p = − 0.36, p < 0.001), and this anticorrelation was driven by the cohort with the lowest CNA burden. High within-sample heterogeneity is also expected to reduce the accuracy of between-sample divergence estimates and lead to the underestimation of the mutation burden. Low SNV burden also leads to missing data in SNV divergence estimates since divergence cannot be calculated accurately with few alterations. ER divergence followed the same direction as CNA divergence, but SNV divergence followed the opposite trend. These divergences were the only three measurements associated with time to non-invasive recurrence in the univariate analyses (Supplementary Table S9). Non-invasive recurrence is associated exclusively with divergence statistics, while progression was primarily associated with totals (SNV burden and mean GLUT1 intensity). Intratumor heterogeneity can arise from an increase in the amount of evolution (i.e., evolutionary rate * evolutionary time, by the same mechanisms mentioned when discussing mutation burden) but also with diversifying selection, and we have previously associated it with poor prognosis in other pre-cancers [10].
Non-invasive recurrence was also associated with the type of DCIS treatment and estrogen-negative status. The fact that patients treated with lumpectomy alone were more likely to recur than those treated with lumpectomy and radiation or mastectomy has been well described. Adjuvant radiation therapy has been previously shown to reduce the risk of recurrence [42]. After mastectomy, patients are no longer screened using mammograms, making it unlikely to detect asymptomatic noninvasive recurrences. The association between recurrence and ER status may be unsurprising since patients with ER + breast cancers have better prognoses than ER− ones [43, 44]. However, its association with DCIS recurrence is unclear [45,46,47], and the balance of evidence points against it [48]. As for surgical margins, most studies are limited by not differentiating between recurrence and progression endpoints. At least one of the studies that made this distinction [45] found a decrease in non-invasive recurrences, but not in invasive ones, among ER+ patients. This finding is consistent with our results. Different endpoints may partially explain the mixed evidence on the association between ER and DCIS recurrence and progression.
Functional genetic analysis also showed a difference between the three cohorts, particularly between those DCIS that recurred compared to those that progressed. DCIS that will recur without invasion shows enrichment of mutations in genes involved in the TAS2R signaling network. The activation of these genes determines a pro-apoptotic, anti-proliferative, and anti-migratory response action in highly metastatic breast cancer cell lines [49]. These genes also appear to be involved in the regulation of apoptosis in head and neck squamous cell carcinoma, and their impairment could favor the survival of cancer cells [50]. On the other hand, DCIS that will progress to invasion demonstrates a broader variety of biological processes and pathways involved, such as hypoxia response, insulin/IGF, endothelin, hedgehog, p53, and PI3 kinase signaling pathways. These biological processes are typically altered in various types of cancer and also show an enrichment of mutations in genes involved in the reorganization of the cytoskeleton. The ability to metastasize outside the mammary gland and to relapse observed in these patients is supported by mutations in those pathways.
Synchronous DCIS is not a good model for DCIS progression
Cross-sectional studies are much less resource-intensive, faster, and simpler to conduct than longitudinal cohort studies. If synchronous DCIS (adjacent to IDC) was a good model for primary DCIS that later progressed to IDC, cross-sectional studies could be more readily employed as relevant surrogates for cancer progression. However, our results show this is not possible for our purpose, and in fact, synchronous DCIS shares more similarities with DCIS that will recur as DCIS than with DCIS that will progress.
The pure DCIS samples in our cross-sectional study are equivalent to a mixture of samples from the three cohorts in our longitudinal study since their future outcomes are not considered. Thus, characteristics associated with clinical outcomes are expected to be mixed in the cross-sectional study. The increased divergence we found in the DCIS adjacent to IDC may result from divergent evolution facilitated by longer evolutionary times, the interaction with IDC, or an intrinsic characteristic of early-progression DCIS. If we assume that IDC originates from DCIS (stepwise progression model), synchronous DCIS samples are (on average) evolutionarily older than pure DCIS samples and, therefore, represent a later evolutionary stage than samples from either study. In this case, the cross-sectional study would reveal differences between early and late DCIS. Alternatively, if we assume that an early progression model is also possible (i.e., born to be bad [51]), synchronous DCIS would be enriched with this DCIS sub-type. In this case, the cross-sectional study would show evolutionary characteristics that distinguish those DCIS fated for invasive progression. Additionally, the presence of IDC near synchronous DCIS may also alter its characteristics, modifying its environment systemically (e.g., immune response) and locally (e.g., microenvironment and cell composition through cell migration).
The higher between-sample genetic divergence we found in synchronous DCIS compared to pure DCIS aligns better with stepwise DCIS progression, in which late DCIS would have had more evolutionary time to undergo divergent evolution. Under the early progression model, this may be an intrinsic characteristic of such a DCIS subtype that could facilitate the rapid invasion of nearby tissues. Most (75%) markers with significantly different between-sample divergences showed higher divergence in synchronous DCIS, and all markers with significantly different within-sample divergences showed the opposite trend. These results are concordant with the genetic results and our expectations under a stepwise progression model but did not survive multiple-test correction.
Integrating the results with clonal evolution in neoplastic progression
The two observational studies we conducted here are complementary and together improve our understanding of the evolutionary process leading to DCIS progression and recurrence. We find that primary DCIS that will progress to IDC is more genetically and phenotypically evolved, with higher SNV and CNA burden and more aggressive phenotypes, both metabolically and with respect to its estrogen sensitivity. At least one selective sweep is likely a part of their evolutionary history, which would reduce genetic divergence in the tumor. Higher cell motility could also reduce between-sample heterogeneity. Surgical margins show the strongest association with progression, suggesting that there may be features of the growth pattern of these lesions that make it more difficult to completely excise surgically. In contrast, DCIS recurrence may be primarily enabled by suboptimal clinical management. The few evolutionary features associated with DCIS recurrence suggest an increased accumulation of evolutionary changes in those lesions compared to those that do not recur, which nevertheless do not attain the degree of divergence necessary for invasive progression. In aggregate, the evolutionary history of DCIS recurrences may lack the strong selective sweeps that may be necessary conditions to invade other tissues successfully. DCIS adjacent to IDC shows increased divergence, which may result from divergent evolution facilitated by longer evolutionary times, the interaction with IDC, or an intrinsic characteristic of early-progression DCIS (i.e., born to be bad).
Conclusions
The evolutionary and clinical measures that predict the recurrence of DCIS differ from those that predict progression to IDC. Furthermore, DCIS adjacent to concurrent invasive cancer appears to be distinct from DCIS that will progress to invasive cancer over time. These findings suggest that the biological dynamics that make DCIS likely to recur differ from those that make it likely to progress, and those dynamics interact differently with our clinical interventions. These insights have the potential to improve both risk stratification and individualized patient management for high-risk DCIS.
Availability of data and materials
All the sequencing data used in this manuscript is publicly available. The cross-sectional WES data at SRA and dbGAP with IDs (SRP298346 and XX) and the longitudinal WGS and WES data on the HTAN data portal (https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs002371.v3.p1). Scripts to reproduce most data pre-processing and statistical analysis can be found at https://github.com/adamallo/ManuscriptScripts_DCISRecurrenceVsProgression.
Abbreviations
- CDI:
-
Cumulative density index
- CNA:
-
Copy number alteration
- DCIS:
-
Ductal carcinoma in situ
- EMD:
-
Earth mover’s distance
- ER:
-
Estrogen receptor
- FDR:
-
False discovery rate
- FFPE:
-
Formalin-fixed paraffin-embedded
- HR:
-
Hazard ratio
- IDC:
-
Invasive ductal carcinoma
- IHC:
-
Immunohistochemistry
- MIS:
-
Mean intensity score
- WES:
-
Whole exome sequencing
- WGS:
-
Whole genome sequencing
References
Rosenberg RD, Seidenwurm D. Optimizing breast cancer screening programs: experience and structures. Radiology. 2019;292:297–8.
Ozanne EM, Shieh Y, Barnes J, Bouzan C, Hwang ES, Esserman LJ. Characterizing the impact of 25 years of DCIS treatment. Breast Cancer Res Treat. 2011;129:165–73.
Maxwell AJ, Clements K, Hilton B, Dodwell DJ, Evans A, Kearins O, et al. Risk factors for the development of invasive cancer in unresected ductal carcinoma in situ. Eur J Surg Oncol. 2018;44:429–35.
Esserman LJ, Thompson IM, Reid B, Nelson P, Ransohoff DF, Welch HG, et al. Addressing overdiagnosis and overtreatment in cancer: a prescription for change. Lancet Oncol. 2014;15:e234–42.
Hwang ES, Hyslop T, Lynch T, Frank E, Pinto D, Basila D, et al. The COMET (Comparison of operative versus monitoring and endocrine therapy) trial: a phase III randomised controlled clinical trial for low-risk ductal carcinoma in situ (DCIS). BMJ Open. 2019;9: e026797.
Merlo LMF, Pepper JW, Reid BJ, Maley CC. Cancer as an evolutionary and ecological process. Nat Rev Cancer. 2006;6:924–35.
Hanahan D, Weinberg RA. Hallmarks of cancer: the next generation. Cell. 2011;144:646–74.
Takeshima H, Ushijima T. Accumulation of genetic and epigenetic alterations in normal cells and cancer risk. NPJ Precis Oncol. 2019;3:7.
Fortunato A, Boddy A, Mallo D, Aktipis A, Maley CC, Pepper JW. Natural selection in cancer biology: from molecular snowflakes to trait hallmarks. Cold Spring Harb Perspect Med. 2017;7:a029652.
Maley CC, Galipeau PC, Finley JC, Wongsurawat VJ, Li X, Sanchez CA, et al. Genetic clonal diversity predicts progression to esophageal adenocarcinoma. Nat Genet. 2006;38:468–73.
Merlo LMF, Shah NA, Li X, Blount PL, Vaughan TL, Reid BJ, et al. A comprehensive survey of clonal diversity measures in Barrett’s esophagus as biomarkers of progression to esophageal adenocarcinoma. Cancer Prev Res. 2010;3:1388–97.
Martinez P, Timmer MR, Lau CT, Calpe S, Sancho-Serra MDC, Straub D, et al. Dynamic clonal equilibrium and predetermined cancer risk in Barrett’s oesophagus. Nat Commun. 2016;7:12158.
Marusyk A, Polyak K. Tumor heterogeneity: causes and consequences. Biochim Biophys Acta. 2010;1805:105–17.
Bedard PL, Hansen AR, Ratain MJ, Siu LL. Tumour heterogeneity in the clinic. Nature. 2013;501:355–64.
McGranahan N, Swanton C. Biological and therapeutic impact of intratumor heterogeneity in cancer evolution. Cancer Cell. 2015;27:15–26.
Andor N, Graham TA, Jansen M, Xia LC, Aktipis CA, Petritsch C, et al. Pan-cancer analysis of the extent and consequences of intratumor heterogeneity. Nat Med. 2015;22:105–13.
Morris LGT, Riaz N, Desrichard A, Şenbabaoğlu Y, Hakimi AA, Makarov V, et al. Pan-cancer analysis of intratumor heterogeneity as a prognostic determinant of survival. Oncotarget. 2016;7:10051–63.
Dash S, Kinney NA, Varghese RT, Garner HR, Feng W-C, Anandakrishnan R. Differentiating between cancer and normal tissue samples using multi-hit combinations of genetic mutations. Sci Rep. 2019;9:1005.
Maley CC, Aktipis A, Graham TA, Sottoriva A, Boddy AM, Janiszewska M, et al. Classifying the evolutionary and ecological features of neoplasms. Nat Rev Cancer. 2017;17:605–19.
Dentro SC, Leshchiner I, Haase K, Tarabichi M, Wintersinger J, Deshwar AG, et al. Characterizing genetic intra-tumor heterogeneity across 2,658 human cancer genomes. Cell. 2021;184:2239-2254.e39.
Fortunato A, Mallo D, Rupp SM, King LM, Hardman T, Lo JY, et al. A new method to accurately identify single nucleotide variants using small FFPE breast samples. Brief Bioinform. 2021. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/bib/bbab221.
Schemper M, Smith TL. A note on quantifying follow-up in studies of failure time. Control Clin Trials. 1996;17:343–6.
Tan PH, Ellis I, Allison K, Brogi E, Fox SB, Lakhani S, et al. The 2019 WHO classification of tumours of the breast. Histopathology. 2020;77(2):181–5.
Elston CW, Ellis IO. Pathological prognostic factors in breast cancer. I. The value of histological grade in breast cancer: experience from a large study with long-term follow-up. C. W. Elston & I. O. Ellis. Histopathology 1991;19; 403–410. AUTHOR COMMENTARY. Histopathology. 2002. pp. 151–151.
Consensus conference on the classification of ductal carcinoma in situ. Hum Pathol. 1997;28:1221–5.
Miller CA, Gindin Y, Lu C, Griffith OL, Griffith M, Shen D, et al. Aromatase inhibition remodels the clonal architecture of estrogen-receptor-positive breast cancers. Nat Commun. 2016;7:12498.
Strand SH, Rivero-Gutiérrez B, Houlahan KE, Seoane JA, King LM, Risom T, et al. Molecular classification and biomarkers of clinical outcome in breast ductal carcinoma in situ: analysis of TBCRC 038 and RAHBT cohorts. Cancer Cell. 2022;40:1521-1536.e7.
Mi H, Muruganujan A, Huang X, Ebert D, Mills C, Guo X, et al. Protocol update for large-scale genome and gene function analysis with the PANTHER classification system (v.14.0). Nat Protoc. 2019;14:703–21.
Huang DW, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 2008;4:44–57.
Sherman BT, Hao M, Qiu J, Jiao X, Baseler MW, Lane HC, et al. DAVID: a web server for functional enrichment analysis and functional annotation of gene lists (2021 update). Nucleic Acids Res. 2022;50:W216–21.
Scheinin I, Sie D, Bengtsson H, van de Wiel MA, Olshen AB, van Thuijl HF, et al. DNA copy number analysis of fresh and formalin-fixed specimens by shallow whole-genome sequencing with identification and exclusion of problematic regions in the genome assembly. Genome Res. 2014;24:2022–32.
Rubner Y, Tomasi C, Guibas LJ. A metric for distributions with applications to image databases. In: Sixth international conference on computer vision (IEEE Cat No98CH36271). Narosa Publishing House; 2002.
Holm S. A simple sequentially rejective multiple test procedure. Scand J Stat. 1979;6:65–70.
Malley JD, Kruppa J, Dasgupta A, Malley KG, Ziegler A. Probability machines: consistent probability estimation using nonparametric learning machines. Methods Inf Med. 2012;51:74–81.
Akaike H. Information theory and an extension of the maximum likelihood principle. In: Petrov BN, Csaki BF, editors. Second international symposium on information theory. Academiai Kiado: Budapest; 1973. p. 267–81.
Youden WJ. Index for rating diagnostic tests. Cancer. 1950;3:32–5.
Williams MJ, Werner B, Barnes CP, Graham TA, Sottoriva A. Identification of neutral tumor evolution across cancer types. Nat Genet. 2016;48:238–44.
Williams MJ, Werner B, Heide T, Curtis C, Barnes CP, Sottoriva A, et al. Quantification of subclonal selection in cancer from bulk sequencing data. Nat Genet. 2018;50:895–903.
Wang S-Y, Chu H, Shamliyan T, Jalal H, Kuntz KM, Kane RL, et al. Network meta-analysis of margin threshold for women with ductal carcinoma in situ. J Natl Cancer Inst. 2012;104:507–16.
Dunne C, Burke JP, Morrow M, Kell MR. Effect of margin status on local recurrence after breast conservation and radiation therapy for ductal carcinoma in situ. J Clin Oncol. 2009;27:1615–20.
Marinovich ML, Azizi L, Macaskill P, Irwig L, Morrow M, Solin LJ, et al. The association of surgical margins and local recurrence in women with ductal carcinoma in situ treated with breast-conserving therapy: a meta-analysis. Ann Surg Oncol. 2016;23:3811–21.
Garg PK, Jakhetiya A, Pandey R, Chishi N, Pandey D. Adjuvant radiotherapy versus observation following lumpectomy in ductal carcinoma in-situ: a meta-analysis of randomized controlled trials. Breast J. 2018;24:233–9.
Carey LA, Perou CM, Livasy CA, Dressler LG, Cowan D, Conway K, et al. Race, breast cancer subtypes, and survival in the Carolina breast cancer study. JAMA. 2006;295:2492–502.
Jayasekara H, MacInnis RJ, Chamberlain JA, Dite GS, Leoce NM, Dowty JG, et al. Mortality after breast cancer as a function of time since diagnosis by estrogen receptor status and age at diagnosis. Int J Cancer. 2019;145:3207–17.
Kerlikowske K, Molinaro AM, Gauthier ML, Berman HK, Waldman F, Bennington J, et al. Biomarker expression and risk of subsequent tumors after initial ductal carcinoma in situ diagnosis. J Natl Cancer Inst. 2010;102:627–37.
Zabrocka E, Stessin A. Risk of recurrence and patterns of failure in estrogen receptor-negative ductal carcinoma in situ patients. Int J Radiat Oncol Biol Phys. 2021;111: e204.
Miligy IM, Toss MS, Shiino S, Oni G, Syed BM, Khout H, et al. Correction: the clinical significance of oestrogen receptor expression in breast ductal carcinoma in situ. Br J Cancer. 2021;124:856.
Lari SA, Kuerer HM. Biological markers in DCIS and risk of breast recurrence: a systematic review. J Cancer. 2011;2:232–61.
Singh N, Shaik FA, Myal Y, Chelikani P. Chemosensory bitter taste receptors T2R4 and T2R14 activation attenuates proliferation and migration of breast cancer cells. Mol Cell Biochem. 2020;465:199–214.
Carey RM, McMahon DB, Miller ZA, Kim T, Rajasekaran K, Gopallawa I, et al. T2R bitter taste receptors regulate apoptosis and may be associated with survival in head and neck squamous cell carcinoma. Mol Oncol. 2022;16:1474–92.
Ryser MD, Min B-H, Siegmund KD, Shibata D. Spatial mutation patterns as markers of early colorectal tumor cell mobility. Proc Natl Acad Sci U S A. 2018;115:5774–9.
Jennewein DM, Lee J, Kurtz C, Dizon W, Shaeffer I, Chapman A, et al. The Sol Supercomputer at Arizona State University. Practice and Experience in Advanced Research Computing 2023: Computing for the Common Good. New York, NY, USA: Association for Computing Machinery; 2023. pp. 296–301.
Acknowledgements
We thank the Research Computing at Arizona State University for providing HPC [52] and storage resources that have contributed to the research results reported here. The findings, opinions, and recommendations expressed here are those of the authors and not necessarily those of the universities where the research was performed or the National Institutes of Health.
Funding
This work is supported in part by NIH grants U54 CA217376, U2C CA233254, R21 CA257980, and R01 CA140657, as well as CDMRP Breast Cancer Research Program Award BC132057 and the Arizona Biomedical Research Commission grant ADHS18-198847.
Author information
Authors and Affiliations
Contributions
This project was conceptualized by JRM, ESH, and CCM; ESH and CCM also provided the funding and resources needed to conduct it. AF, DM, LC, LMK, AK, MDR, AH, JRM, ESH, and CCM designed different aspects of the methodology of this study. ESH collected biological samples, reviewed and processed by AH, and administered the project in conjunction with LMK. AF, DM, LC, and AK analyzed the data under the supervision of CC, JRM, ESH, and CCM. The figures were prepared by DM and CCM, and the original draft was written by AF, DM, and LC, who also reviewed and edited it with LMK, CC, MDR, JYL, JRM, ESH, and CCM. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
The Institutional Review Board (IRB) of Duke University Medical Center, as well as the IRB at each participating institution, approved this study, and a waiver of consent was obtained according to the approved protocol.
Consent for publication
Not applicable.
Competing interests
CC serves on the Scientific Advisory Board and/or as consultant for Bristol Myers Squibb, Deepcell, Genentech, NanoString, Ravel, Viosera, and holds equity in Deepcell, Illumina/Grail, and Ravel.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Fortunato, A., Mallo, D., Cisneros, L. et al. Evolutionary measures show that recurrence of DCIS is distinct from progression to breast cancer. Breast Cancer Res 27, 43 (2025). https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s13058-025-01966-2
Received:
Accepted:
Published:
DOI: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s13058-025-01966-2