In the United States, urothelial carcinoma (UC) of the bladder

Save this PDF as:
 WORD  PNG  TXT  JPG

Μέγεθος: px
Εμφάνιση ξεκινά από τη σελίδα:

Download "In the United States, urothelial carcinoma (UC) of the bladder"

Transcript

1 Intrinsic subtypes of high-grade bladder cancer reflect the hallmarks of breast cancer biology Jeffrey S. Damrauer a,b, Katherine A. Hoadley a,b, David D. Chism a,c, Cheng Fan a,b, Christopher J. Tiganelli d, Sara E. Wobker e, Jen Jen Yeh a,d,f, Matthew I. Milowsky a,c, Gopa Iyer g, Joel S. Parker a,b, and William Y. Kim a,b,c,1 b Department of Genetics, c Department of Medicine, Division of Hematology/Oncology, d Department of Surgery, e Department of Pathology and Laboratory Medicine, and f Department of Pharmacology, a Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599; and g Department of Medicine, Memorial Sloan-Kettering Cancer Center, New York, NY Edited* by William G. Kaelin, Jr., Harvard Medical School, Boston, MA, and approved January 15, 2014 (received for review October 2, 2013) We sought to define whether there are intrinsic molecular subtypes of high-grade bladder cancer. Consensus clustering performed on gene expression data from a meta-dataset of highgrade, muscle-invasive bladder tumors identified two intrinsic, molecular subsets of high-grade bladder cancer, termed luminal and basal-like, which have characteristics of different stages of urothelial differentiation, reflect the luminal and basal-like molecular subtypes of breast cancer, and have clinically meaningful differences in outcome. A gene set predictor, bladder cancer analysis of subtypes by gene expression (BASE47) was defined by prediction analysis of microarrays (PAM) and accurately classifies the subtypes. Our data demonstrate that there are at least two molecularly and clinically distinct subtypes of high-grade bladder cancer and validate the BASE47 as a subtype predictor. Future studies exploring the predictive value of the BASE47 subtypes for standard of care bladder cancer therapies, as well as novel subtypespecific therapies, will be of interest. In the United States, urothelial carcinoma (UC) of the bladder is the fourth most common malignancy in men and the ninth mostcommoninwomen,with72,570newcasesand15,210deaths expected in 2013 (1). Bladder cancer is heterogeneous and can be histologically divided into low-grade and high-grade disease. Whereas low-grade tumors are almost invariably noninvasive (Ta), high-grade tumors can be classified based on invasion into the muscularis propria of the bladder, as non muscle invasive bladder cancer (NMIBC; Tis, Ta, T1) or muscle invasive bladder cancer (MIBC; T2). Low-grade tumors are associated with a high rate of recurrence but an excellent overall prognosis, with a 5-y survival in the range of 90%. In contrast, high-grade MIBC has a relatively poor 5-y overall survival, 68% for T2 and decreasing to 15% for non organ-confined disease (i.e., pt3 and pt4) (1, 2). Along with divergent pathologies and prognosis, low-grade and high-grade UCs are associated with distinct genetic alterations. For example, low-grade UCs are enriched for activating mutations in fibroblast growth factor 3 (FGFR3), phosphatidylinositol-4,5- bisphosphate 3-kinase, catalytic subunit alpha (PIK3CA), and inactivating lysine (K)-specific demethylase 6A (KDM6A) mutations, whereas high-grade, muscle-invasive tumors are enriched for tumor protein p53 (TP53) and retinoblastoma 1 (RB1) pathway alterations (3 10). Several reports have examined the gene expression profiles of primary bladder tumors. From these studies, it is apparent that low-grade noninvasive and high-grade muscle-invasive tumors harbor distinct gene expression patterns, and that further molecular subsets can be identified within low-grade and high-grade tumors (5, 11 14). Moreover, a number of gene signatures have been developed that can predict tumor stage, lymph node metastases, and bladder cancer progression (11 13, 15 18). There are established gene expression patterns that differentiate lowgrade and high-grade tumors; however, there are little data identifying intrinsic subtypes specifically within high-grade disease. We have identified two intrinsic molecular subsets of highgrade bladder cancer, termed luminal and basal-like, with differences in clinical outcome. In addition, we have developed a 47-gene predictor, bladder cancer analysis of subtypes by expression (BASE47), which can accurately classify high-grade UCs into luminal and basal-like tumors. The molecular subtypes appear to reflect different stages of urothelial differentiation and strikingly recapitulate aspects of breast cancer biology, including a claudin-low subtype. Results Consensus Clustering Reveals Two Distinct Molecular Subtypes of High-Grade Bladder Cancer. Previous studies examining the gene expression changes associated with bladder cancer have assessed both low-grade and high-grade tumors in aggregate (5, 11 13, 19, 20). In the present study, we looked exclusively for intrinsic subtypes of high-grade disease agnostic to clinical stage or outcome. We first created a meta-dataset of 262 high-grade muscleinvasive tumors, curated from four publically available datasets (12, 19 21) (SI Appendix, Table S1). In parallel, an independent set of 49 high-grade tumors from Memorial Sloan-Kettering Cancer Center (MSKCC) served as a validation set (8). In both the meta-dataset and the MSKCC dataset, consensus clustering identified two groups (designated K1 and K2) as the optimal number of molecular subtypes, as defined by the criterion of subclass stability (Fig. 1 A and B and SI Appendix, Fig.S1A and B). To validate that the gene expression changes that define the two subtypes are similar, we determined the correlation between the median gene expression (using all common genes between datasets) for each subtype (Fig. 1C; yellow, correlation; blue, anticorrelation). There appeared to be a high level of correlation between the meta-dataset K1 cluster and MSKCC K2 cluster, as well as between the meta-dataset K2 cluster and MSKCC K1 cluster. Significance The identification of molecular subtype heterogeneity in breast cancer has allowed a deeper understanding of breast cancer biology. We present evidence that there are two intrinsic subtypes of high-grade bladder cancer, basal-like and luminal, which reflect the hallmarks of breast biology. Moreover, we have developed an accurate gene set predictor of molecular subtype, the BASE47, that should allow the incorporation of subtype stratification into clinical trials. Further clinical, etiologic, and therapeutic response associations will be of interest in future investigations. Author contributions: J.S.D. and W.Y.K. designed research; J.S.D., K.A.H., D.D.C., C.F., C.J.T., S.E.W., G.I., and J.S.P. performed research; J.S.D., K.A.H., C.F., J.J.Y., M.I.M., J.S.P., and W.Y.K. analyzed data; and J.S.D., D.D.C., S.E.W., J.J.Y., M.I.M., G.I., J.S.P., and W.Y.K. wrote the paper. Conflict of interest statement: J.S.D. and W.Y.K. have submitted a patent application for the BASE47 gene classifier. *This Direct Submission article had a prearranged editor. 1 To whom correspondence should be addressed. This article contains supporting information online at /pnas /-/DCSupplemental. MEDICAL SCIENCES PNAS Early Edition 1of6

2 A Meta B MSKCC C K1 K1 K2 K2 K1 D CD44 KRT14 KRT5 KRT6B KRT20 UPK2 UPK1B UPK3A E Cancer Cell Survival Cell Movement Meta-dataset Invasion of Tumor Cells Metastasis Cell Transformation Growth of Tumor Breast Cancer and Tumors Cell Survival Cell Viability Cell Viability of Leucocytes Cell Viability of Monocytes Cell Viability of Lymphocytes Cell Death of Epithelial Cells Migration of Cells Homing of Cells Cell Movement Chemotaxis K1 K2 MSKCC Thus, the intrinsic molecular subtypes defined by independent discovery in the two datasets are defined by highly concordant gene expression patterns and suggest that the subtypes are robust. Intrinsic Molecular Subtypes of Bladder Cancer Differentially Express Markers of Urothelial Differentiation. To understand the gene expression patterns that differentiate the intrinsic subtypes of highgrade bladder cancer, we performed two-class significance analysis of microarrays (SAM) comparing clusters K1 and K2 from the meta-dataset. A total of 2,393 genes were found to be differentially expressed [false discovery rate (FDR) cutoff, 0] (SI Appendix, Table S2). The intrinsic molecular subtypes were characterized by gene expression patterns representative of urothelial differentiation. Cluster K1 of the meta-dataset expressed high levels of the high molecular weight keratins (HMWKs) keratin 14 (KRT14), keratin 5 (KRT5), and keratin 6B (KRT6B), as well as of CD44 molecule (CD44), which are expressed in urothelial basal cells (22, 23). In contrast, cluster K2 expressed high levels of uroplakins [uroplakin 1B (UPK1B), uroplakin 2 (UPK2), and uroplakin 3A (UPK3A)], as well as the low molecular weight keratin (LMWK) keratin 20 (KRT20) -2 Meta 0 K K1 K2 6 Activation z-score Fig. 1. Consensus clustering defines two distinct molecular subtypes of MIBC. (A) Consensus clustering was performed on 262 muscle-invasive tumors curated from four publicly available datasets (meta-dataset), yielding two subtypes. (B) Consensus clustering was performed independently on a dataset of high-grade bladder tumors obtained from MSKCC (n = 49). (C) The median gene expression of all common genes between the datasets were compared, and the Pearson correlation was plotted (yellow, correlation; blue, anticorrelation). Numerical values represent the Pearson correlation. (D) Gene expression of epithelial and urothelial markers were visualized by heatmap, supervised by ConsensusClusterPlus calls in the meta-dataset. (E) Significantly differentially expressed genes between K1 and K2 from the meta-dataset and their respective fold changes, as determined by two-class SAM (2,393 genes; FDR, 0) were analyzed for predicted pathway enrichment by IPA. Selected significant pathways enriched in K1 are shown. 8 (Fig. 1D), characteristic of urothelial umbrella cells (23). Moreover, the gene expression of KRT5 was inversely correlated with that of both UPK2 and KRT20 across all tumors (SI Appendix, Fig. S1 E and F). Similar findings were seen in the MSKCC dataset (SI Appendix, Fig. S1C, F, andg). We performed ingenuity pathway analysis (IPA) to examine whether processes other than urothelial differentiation were associated with the intrinsic subtypes. IPA revealed that K1 cluster tumors were enriched in gene pathways involving cancer, cell survival, and cell movement (Fig. 1E). In aggregate, these findings demonstrate that the two molecular subtypes of high-grade MIBC represent different stages of urothelial differentiation, leading us to name cluster 1 basal-like and cluster 2 luminal. BASE47 Accurately Predicts -Like and Subtypes. We next sought to define a minimal set of genes that could accurately classify bladder tumors into the luminal and basal-like bladder intrinsic subtypes. To this end, we applied prediction analysis of microarrays (PAM) to our meta-dataset and derived a 47-gene signature, BASE47 (SI Appendix, Table S3), that could accurately classify basal-like and luminal tumors relative to consensus clustering (training error rate, 0.11 and 0.05, respectively) (Fig. 2). A pairwise comparison of the subtype classification by consensus clustering relative to classification by BASE47 showed a strong correlation in both the meta-dataset and the MSKCC datasets (both P < 0.001, χ 2 test). Cluster analysis using the MSKCC tumors illustrates the gene expression patterns that compose the BASE47 (SI Appendix, Fig. S2A). Intrinsic Bladder Subtypes Have Differential Survival. We next asked whether the intrinsic bladder subtypes had prognostic significance. -like tumors (as identified by BASE47) had significantly decreased disease-specific and overall survival (P = and P = respectively) (Fig. 3 A and B). Moreover, of the clinicopathological features available to us in the MSKCC dataset, only BASE47 subtype was found to be significant for disease-specific survival by univariate analysis (SI Appendix, Table S4 and Fig. S2B). Furthermore, to assess the prognostic value of the BASE47 relative to published prognostic signatures derived from highgrade, muscle-invasive tumors, we generated good and poor prognosis calls on the MSKCC tumors using the published gene lists (11, 13) (SI Appendix, Fig. S2 C and D). Neither gene signature demonstrated prognostic value (SI Appendix, Fig. S3 E and F). Thus, the BASE47 intrinsic bladder subtypes not only reflect bladder cancer biology, but also have prognostic value. PAM Consensus Clustering of Meta Dataset Centroid CCP Meta Training MSKCC Validation CCP BASE47 Prediction BASE p<0.001 BASE p<0.001 Fig. 2. Generation of the BASE47 subtype predictor. PAM was performed using the basal-like and luminal subtype calls generated by ConsensusClusterPlus. A predictor consisting of 47 genes was generated that accurately predicted the subtypes from the meta-dataset training set (P < 0.001), as well as a MSKCC validation dataset (45 of 47 genes present; P < 0.001). 2of6 Damrauer et al.

3 A C DSS Probability1.0 MSKCC DSS p= Months Superficial Tumors - - GSE Like Like E Meta-dataset Male Female χ2 pvalue= D Gender ** FGFR3 Mut ** TSC1 Mut/Del ** RB1 Pathway RB1 Mut/Del CCND1 Amp E2F3 Amp CCNE1 Amp TP53 Pathway MDM2 Amp TP53 Mut **p<0.05 MSKCC OS p= Months Pathway Alterations Genetic Alteration Female Male Fig. 3. and basal-like bladder cancer have differential survival and are associated with distinct genomic alterations. (A and B) Kaplan Meier plots for muscle-invasive tumors from the MSKCC dataset ( pt2) were generated for disease-specific survival (A) and overall survival (B). -like tumors (n = 22) had a significantly decreased disease-free and overall survival compared with luminal tumors (n = 16) (P = and , respectively). (C) Superficial tumors not included in the generation of BASE47 were subjected to BASE47 subtype prediction. (D) Sequencing and copy number analysis was performed on common mutations in bladder cancer. FGFR3 and TSC1 alterations were significantly enriched in luminal bladder cancer, whereas alterations of the RB1 pathway were enriched in basal-like bladder cancer. TP53 alterations were distributed evenly in both subtypes. (E) -like and luminal tumors from the meta-dataset were annotated for sex (two of four datasets). The prevalence of basal-like bladder cancer was significantly greater in female patients (P value = , χ 2 test). Interestingly, whereas the BASE47 predictor was developed on muscle-invasive tumors, we also noted that when applied to a meta-dataset of superficial tumors, it classified a significant proportion them as basal-like (Fig. 3C), suggesting that the intrinsic subtypes may exist in NMIBC. Intrinsic Subtypes Are Associated with Distinct Genomic Alterations. The MSKCC tumor set has been previously characterized for bladder cancer-relevant genetic alterations (8). We examined the relative enrichment of these molecular events in the bladder subtypes (Fig. 3D). Notably, FGFR3 (P < 0.001) and tuberous sclerosis 1 (TSC1) (P = 0.02) mutations were significantly enriched in the luminal subtype, whereas RB1 pathway alterations [i.e., RB1 mutations/deletions, Cyclin D1 (CCND1) amplification, E2F transcription factor 2 (E2F3) amplification, or Cyclin E1 (CCNE1) amplification] were significantly enriched in basal-like bladder cancer (P = 0.009). Multiple studies have shown poorer bladder cancer-specific outcomes in females compared with males (24). We found a trend toward enrichment of basal-like tumors in female patients in the MSKCC dataset (Fig. 3D; P = ), and a significantly higher incidence of basal-like bladder cancer in female patients in the meta-dataset with annotated sex (Fig. 3E). This enrichment of basal-like bladder cancer may explain in part the poorer bladder cancer-specific outcomes in women. B OS Probability Like Bladder Cancer Is Enriched for the Signatures of -Like Breast Cancer and Tumor-Initiating Cells. A number of genes fundamental for breast development and breast cancer were found to be coregulated with genes that regulate urothelial development (SI Appendix, Fig. S3). Moreover, Gene Set Enrichment Analysis (GSEA) performed on the meta-dataset to identify gene sets enriched in the intrinsic subtypes found multiple breast cancerrelated gene signatures enriched in the basal-like bladder subtype, as well as signatures related to mammary stem cells (SI Appendix, Table S5). In keeping with these findings, we found that a previously published bladder tumor initiating cell (TIC) signature (22) was enriched in the basal-like subtype by both hierarchical clustering (P = , χ 2 test) (SI Appendix, Fig.S4A) and GSEA (SI Appendix, Fig. S4B), suggesting that basal-like bladder cancer has a more stem-like phenotype, similar to that described previously in basal-like breast cancer (25). Intrinsic Bladder Subtypes Reflect the Attributes of Breast Cancer Subtypes. We next asked whether the basal-like and luminal bladder cancer subtypes correlated with any of the previously defined molecular subtypes of breast cancer (26). To this end, we generated breast cancer molecular subtype classifications (basal, Her2-enriched, luminal A, luminal B, and normal-like) on two independent sets of breast tumors [The Cancer Genome Atlas (TCGA) (27) and UNC337 (28)] using the PAM50 nearestcentroid classifier (29). To examine whether the gene expression patterns of luminal and basal-like bladder cancers were reflected in the intrinsic breast cancer subtypes, we correlated the centroid gene expression (using the breast intrinsic gene list) between the bladder and breast cancer subtypes (Fig. 4A; yellow, correlation; blue, anticorrelation). -like bladder cancers demonstrated positive correlations with basal-like breast as well as normal-like breast cancer subtypes, whereas luminal bladder cancers had positive correlations with both luminal A and luminal B breast cancer subtypes. A similar comparison using published gene expression data from TCGA showed that whereas there were other cross-cancer similarities, the molecular association between breast and bladder cancers was relatively strong (SI Appendix, Fig. S4C) (27, 30 34). Finally, strikingly, applying PAM50 to our meta-dataset of bladder tumors revealed positive correlations between basal-like bladder tumors and the basal centroid and luminal bladder tumors and the luminal A centroid subtype (SI Appendix, Figs. S4 D and E). To better visualize this association, we hierarchically clustered the bladder tumors using a comprehensive list of 1,906 genes (1,426 of which were present in the meta-dataset) that were previously shown to define the intrinsic subtypes of breast cancer (29). The breast-specific gene list clustered the bladder tumors along the lines of basal-like and luminal bladder subtypes (P = 2.2e-16, χ 2 test) (SI Appendix, Fig.S4F). Furthermore, gene signatures representative of basal-like and luminal breast cancers as well as well-defined breast cancer-related oncogenic pathway signatures faithfully clustered basal-like and luminal bladder tumors in both datasets (Fig. 4B and SI Appendix, Fig. S5A). -like bladder tumors displayed enhanced MYC and E2F3 pathway signatures, whereas luminal tumors appeared enriched in the set of genes characteristic of the HER2 amplicon. Taken together, these data strongly demonstrate that the gene expression patterns that distinguish basal-like and luminal bladder cancers reflect the RNA expression patterns that define the intrinsic subtypes of breast cancer. A Subset of -Like Bladder Tumors Are Claudin-Low. The recently described claudin-low molecular subtype of breast cancer is characterized by low expression of the claudin tight junction proteins (claudins 3, 4, and 7) and up-regulation of epithelial to mesenchymal transition (EMT) markers as well as stem cell-like MEDICAL SCIENCES Damrauer et al. PNAS Early Edition 3of6

4 A Meta Bladder B Breast Breast Pathway TCGA UNC 337 Her2 LumA LumB Normal Breast MSKCC Bladder PAM50 Calls LumA LumB Her2 Normal TCGA UNC 337 Her2 LumA LumB Normal Breast Pearson Correlation Cluster PMID: GATA3 PMID: GATA3 PMID: Estrogen Reg. PMID: Cluster PMID: Mature Down PMID: Progenitor PMID: Metaplastic PMID: MaSC PMID: Bladder Subtypes Myc Targets PMID: Oncogenic E2F3 PMID: Her2 Amplicon PMID: high levels of genes that typically mark urothelial basal cells (KRT14, KRT5, and KRT6B). The basal cell compartment is a common feature of most organs with stratified or pseudostratified epithelium. It is characterized by its proximity to the basement membrane and is thought to harbor multipotent tissue stem cells important for normal tissue homeostasis and orderly regeneration after injury. Because basal cells are a longlived population, they are potentially more likely to incur multiple genomic alterations, including changes in their chromatin landscape. In this regard, it is interesting to note that there appears to be a relatively high prevalence of mutations in histone- and chromatin-modifying genes in UC (3). The luminal and basal-like subtypes of bladder cancer reflect many of the hallmarks of the intrinsic breast cancer subtypes. For example, a number of basal-like and luminal breast cancerspecific gene signatures were enriched in the corresponding bladder subtype, including bonafide luminal breast cancer pathways, such as GATA3 and estrogen receptor signaling in the luminal bladder subtype. Moreover, the gene expression patterns that define luminal and basal-like bladder cancers closely corresponded with the gene expression patterns that define luminal (A and B) and basal-like breast cancers. These similarities may reflect the presence of urothelial basal cells and their corollary, the basal/myoepithelial cells of the breast. In both tissues, these basal cells represent a multipotent stem/progenitor cell population (35, 36) and their similar functional roles may explain their similar molecular profiles. We identified some differences between the breast and bladder cancer intrinsic subtypes as well. For example, whereas we identified a claudin-low subtype of bladder cancer, in contrast to breast cancer in which claudin-low tumors arise from multiple Fig. 4. -like and luminal bladder cancers are correlated with the intrinsic molecular subtypes of breast cancer. (A) Median gene expression of genes present in the breast cancer-specific intrinsic gene list were determined for each bladder and breast subtype, and 1-Pearson correlation was calculated comparing bladder subtypes (bladder tumors) with breast subtypes (breast tumors) in both the TCGA and UNC337 breast datasets. (B) The meta-dataset tumors were run against previously published breast cancer-related gene sets. The resulting pathway scores were clustered by hierarchical clustering, and heatmaps were generated for visualization. features (28). Tumors from the meta-dataset were classified based on an 807 gene signature, which accurately defines claudin-low breast cancer (28). Overall, 16% of the meta-dataset tumors (Fig. 5) and 26% of the MSKCC tumors (SI Appendix, Fig. S5B) were identified as claudin-low. When clustered based on genes that define key molecular pathways in claudin-low breast tumors (breast cancer subtype markers, EMT markers, and TIC markers) (Fig. 5 and SI Appendix, Fig. S5B) the claudin-low bladder tumors displayed expression patterns indicative of claudinlow breast tumors. In addition, whereas all of the claudin-low tumors were of the basal subtype, they did not appear to have different disease-specific or overall survival from basal-like bladder tumors (SI Appendix, Fig.S5 C and D), in keeping with the claudinlow subtype in breast cancer (28). Thus, a subset of basal-like bladder tumors have claudin-low features. Discussion Using independent discovery in distinct datasets, we have defined two molecular subtypes of high-grade UC with molecular features reflecting different stages of urothelial differentiation. bladder cancers express markers of terminal urothelial differentiation, such as those seen in umbrella cells (UPK1B, UPK2, UPK3A, and KRT20), whereas basal-like tumors express Breast Cancer Subtype Markers EMT Markers TIC Markers Claudin Low KRT5 KRT14 KRT17 KRT18 KRT19 ESR1 PGR GATA3 ERBB2 CLDN3 CLDN7 CLDN4 CDH1 VIM SNAI2 TWIST1 ZEB1 ZEB2 CD44 EPCAM MUC1 THY1 ALDH1A Other Fig. 5. A subset of basal-like bladder tumors are claudin-low. The metadataset was clustered hierarchically using representative genes known to define claudin-low breast tumors. Claudin-low subtype designation was assigned using a previously defined 807-gene signature. 4of6 Damrauer et al.

5 A Recurrence B Breast Cancer FGFR3 RAS LG papillary Normal urothelium BC BC Claudin Low A B Her 2 PTEN RB1 TP53 BC PTEN RB1 TP53 HG non-papillary HG papillary Claudin Low Bladder Cancer Fig. 6. Proposed model of urothelial tumorigenesis and relationships with intrinsic subtypes of breast cancer. (A) Low-grade (LG) and high-grade (HG) UCs are associated with specific genetic alterations. Low-grade papillary tumors often incur FGFR3, RAS, and receptor tyrosine kinase alterations, whereas high-grade tumors are characterized by loss of tumor-suppressor genes, such as PTEN, TP53, and RB1 pathway alterations. Whereas most lowgrade tumors recur as low-grade, a small proportion progress to high-grade tumors in association with PTEN, TP53, and RB1 pathway alterations. We propose that low-grade tumors that progress are likely to be high-grade papillary tumors of the luminal molecular subtype, and that de novo highgrade tumors are likely to be of the basal-like expression subtype. Whether luminal, nonpapillary tumors arise from low-grade tumors is unclear. (B) Diagram showing the proposed relationship between intrinsic subtypes of breast and bladder cancers. intrinsic subtypes, all of the claudin-low bladder tumors are a subpopulation of the basal-like subtype. Furthermore, despite a subset of luminal bladder tumors with elevated expression of the HER2 amplicon, we did not find any significant correlation with the Her2-enriched breast subtype using our correlation matrix (Fig. 4A). Our study has created a gene signature, BASE47, which accurately discriminates intrinsic bladder subtypes. Interestingly, even in superficial bladder tumors, there appears to be a significant number of basal-like tumors. Although the characteristics of our meta-dataset did not allow us to determine whether the subtypes were prognostic or predicted the progression to muscleinvasive disease in superficial bladder tumors, these are important questions with significant clinical implications, such as early cystectomy for patients with high-grade T1 disease. The ability to accurately classify basal-like and luminal bladder subtypes with only 47 genes should allow the adoption of BASE47 to formalinfixed, paraffin embedded tissues, allowing its widespread use. Female patients with UC have inferior outcomes compared with males with UC, even when controlling for other known prognostic variables, such as stage and grade (24). Interestingly, we found that females have an increased incidence of basal-like bladder cancer, which is associated with worse outcomes. To the extent to which this increased prevalence of basal-like bladder tumors in women contributes to their poorer outcomes remains unclear. Moreover, whether this association suggests that the pathogenesis of bladder cancer in females (i.e., chronic inflammation) is different is of interest. In summary, the basal-like and luminal intrinsic subtypes of bladder cancer reflect many aspects of urothelial physiological development, as well as breast cancer biology (Fig. 6 A and B).? These findings underscore the idea of common themes underlying the development and maintenance of solid tumors that extend beyond overlapping mutational spectra. An appreciation of subtype heterogeneity has substantially furthered our understanding of breast cancer biology. Our results suggest that the intrinsic subtypes of high-grade bladder cancer strikingly reflect many aspects of breast cancer. It will be particularly interesting to see whether the bladder subtypes, like the breast subtypes, are useful in stratification for therapy. Materials and Methods Training Dataset Analysis. A meta-dataset was generated by combining the muscle-invasive ( T2) UC samples from four publicly available datasets (GSE13507, GSE31684, GSE32894, and GSE5287) with clinical annotation provided by the Michor laboratory at Dana-Farber Cancer Institute. The data were normalized, median-centered by gene, and merged into a single dataset comprising 262 tumors. The mean absolute deviation (MAD) was computed across samples by gene. Genes with a MAD score of >0.10 were selected for clustering analysis (7,303 genes). Consensus hierarchical clustering was performed as described previously (37), with 90% resampling and 1,000 iterations. Two-class SAM (FDR, 0) was performed to generate subtype-specific gene lists (38). The significant genes and corresponding fold changes as determined by SAM were analyzed by IPA (Ingenuity Systems) for predicted pathway activation. GSEA was performed comparing basal and luminal tumors against MSigDB.c2.all.v4.0 (39, 40). Validation Dataset. Gene expression data were derived from 49 high-grade tumors from MSKCC using human HT-12 expression BeadChip arrays (Illumina) as described previously (8). The MSKCC dataset was normalized and median-centered, and the MAD was computed across samples by gene. Genes with a MAD score of >0.10 were selected for clustering analysis (3,357 genes). Consensus clustering was performed identically to the meta-dataset (37). The resulting subtype assignments for K2 were used to validate the training dataset. Centroids were generated for both the meta-dataset and MSKCC dataset using all common genes, and correlations were calculated by 1-Pearson correlation. Copy number alterations and hotspot mutation analyses were determined as described previously (8). Subtype Predictor. PAM was used to determine the minimal number of genes that could accurately predict subtype classification on the meta-dataset, using the consensus clustering calls as a reference (41). The resulting 47-gene predictor (Δ = 6.3) was used to classify the MSKCC samples (41). Tumors were then analyzed for enrichment of mutations or copy number alterations (8) by the χ 2 or Fisher exact test as appropriate. Categorical survival analyses were performed using the log-rank test and visualized with Kaplan Meier plots. BASE47 was then applied to superficial tumors, which were excluded from the meta-dataset. The superficial tumors were normalized and mediancentered as described previously, and BASE47 calls were made using PAM. Correlation with Breast Cancer. TCGA (27) and UNC337 (GSE18229) breast mrna datasets were log-transformed and median centered. Breast and bladder subtypes were correlated using Pearson correlation (visualized as 1-Pearson) by the median gene expression of the breast cancer intrinsic gene list (29). PAM50 calls were made on the individual datasets that composed the meta-dataset and the MSKCC dataset as described previously (29). Breast cancer signature enrichment was performed as described previously (42). Samples were hierarchically clustered and visualized on a heatmap. Claudinlow subtype calls were made as described previously (28). Note Added in Proof. During the course of our studies, we became aware of the work by Choi et al. (43) whose results independently demonstrate the existence of basal and luminal subtypes of high grade bladder cancers with molecular features that resemble their corresponding breast cancer subtypes. ACKNOWLEDGMENTS. We thank the members of the W.Y.K. and Perou laboratories, as well as Matthew Wilkerson, for useful discussions. We also thank Dr. Keith Chan for sharing his bladder TIC gene signature. This work was supported by National Institutes of Health Grant R01 CA (to W.Y.K.), T32-CA (to C.J.T.), and Integrative Vascular Biology Training Grant T32-HL (to J.S.D.). W.Y.K. is a Damon Runyon Merck Clinical Investigator. MEDICAL SCIENCES Damrauer et al. PNAS Early Edition 5of6

6 1. Siegel R, Naishadham D, Jemal A (2013) Cancer statistics, CA Cancer J Clin 63(1): Goebell PJ, Knowles MA (2010) Bladder cancer or bladder cancers? Genetically distinct malignant conditions of the urothelium. Urol Oncol 28(4): Gui Y, et al. (2011) Frequent mutations of chromatin remodeling genes in transitional cell carcinoma of the bladder. Nat Genet 43(9): Lindgren D, et al. (2012) Integrated genomic and gene expression profiling identifies two major genomic circuits in urothelial carcinoma. PLoS ONE 7(6):e Lindgren D, et al. (2010) Combined gene expression and genomic profiling define two intrinsic molecular subtypes of urothelial carcinoma and gene signatures for molecular grading and outcome. Cancer Res 70(9): López-Knowles E, et al. (2006) PIK3CA mutations are an early genetic alteration associated with FGFR3 mutations in superficial papillary bladder tumors. Cancer Res 66(15): Sjödahl GG, et al. (2011) A systematic study of gene mutations in urothelial carcinoma; inactivating mutations in TSC2 and PIK3R1. PLoS ONE 6(4):e Iyer G, et al. (2013) Prevalence and co-occurrence of actionable genomic alterations in high-grade bladder cancer. J Clin Oncol 31(25): Dinney CPN, et al. (2004) Focus on bladder cancer. Cancer Cell 6(2): Wu X-R (2005) Urothelial tumorigenesis: A tale of divergent pathways. Nat Rev Cancer 5(9): Sanchez-Carbayo M, Socci ND, Lozano J, Saint F, Cordon-Cardo C (2006) Defining molecular profiles of poor outcome in patients with invasive bladder cancer using oligonucleotide microarrays. J Clin Oncol 24(5): Riester M, et al. (2012) Combination of a novel gene expression signature with a clinical nomogram improves the prediction of survival in high-risk bladder cancer. Clin Cancer Res 18(5): Blaveri E, et al. (2005) Bladder cancer outcome and subtype classification by gene expression. Clin Cancer Res 11(11): Dyrskjøt L, et al. (2003) Identifying distinct classes of bladder carcinoma using microarrays. Nat Genet 33(1): Smith SC, et al. (2011) A 20-gene model for molecular nodal staging of bladder cancer: Development and prospective assessment. Lancet Oncol 12(2): Lee J-S, et al. (2010) Expression signature of E2F1 and its associated genes predict superficial to invasive progression of bladder tumors. J Clin Oncol 28(16): Modlich O, et al. (2004) Identifying superficial, muscle-invasive, and metastasizing transitional cell carcinoma of the bladder: Use of cdna array analysis of gene expression profiles. Clin Cancer Res 10(10): Kim W-J, et al. (2011) A four-gene signature predicts disease progression in muscle invasive bladder cancer. Mol Med 17(5-6): Sjödahl G, et al. (2012) A molecular taxonomy for urothelial carcinoma. Clin Cancer Res 18(12): Kim W-J, et al. (2010) Predictive value of progression-related gene classifier in primary non-muscle invasive bladder cancer. Mol Cancer 9: Als AB, et al. (2007) Emmprin and survivin predict response and survival following cisplatin-containing chemotherapy in patients with advanced bladder cancer. Clin Cancer Res 13(15 Pt 1): Chan KS, et al. (2009) Identification, molecular characterization, clinical prognosis, and therapeutic targeting of human bladder tumor-initiating cells. Proc Natl Acad Sci USA 106(33): Castillo-Martin M, Domingo-Domenech J, Karni-Schmidt O, Matos T, Cordon-Cardo C (2011) Molecular pathways of urothelial development and bladder tumorigenesis. Urol Oncol 28: Donsky H, Coyle S, Scosyrev E, Messing EM (2014) Sex differences in incidence and mortality of bladder and kidney cancers: National estimates from 49 countries. Urol Oncol 32(1):e23 e Lim E, et al.; kconfab (2009) Aberrant luminal progenitors as the candidate target population for basal tumor development in BRCA1 mutation carriers. Nat Med 15(8): Perou CM, et al. (2000) Molecular portraits of human breast tumours. Nature 406(6797): Cancer Genome Atlas Network (2012) Comprehensive molecular portraits of human breast tumours. Nature 490(7418): Prat A, et al. (2010) Phenotypic and molecular characterization of the claudin-low intrinsic subtype of breast cancer. Breast Cancer Res 12(5):R Parker JS, et al. (2009) Supervised risk predictor of breast cancer based on intrinsic subtypes. J Clin Oncol 27(8): Cancer Genome Atlas Network (2012) Comprehensive molecular characterization of human colon and rectal cancer. Nature 487(7407): Cancer Genome Atlas Research Network (2008) Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 455(7216): Cancer Genome Atlas Research Network (2013) Comprehensive molecular characterization of clear cell renal cell carcinoma. Nature 499(7456): Cancer Genome Atlas Research Network (2011) Integrated genomic analyses of ovarian carcinoma. Nature 474(7353): Kandoth C, et al.; Cancer Genome Atlas Research Network (2013) Integrated genomic characterization of endometrial carcinoma. Nature 497(7447): Rakha EA, Reis-Filho JS, Ellis IO (2008) -like breast cancer: A critical review. J Clin Oncol 26(15): Kurzrock EA, Lieu DK, Degraffenried LA, Chan CW, Isseroff RR (2008) Label-retaining cells of the bladder: Candidate urothelial stem cells. Am J Physiol Renal Physiol 294(6):F1415 F Wilkerson MD, Hayes DN (2010) ConsensusClusterPlus: A class discovery tool with confidence assessments and item tracking. Bioinformatics 26(12): Tusher VG, Tibshirani R, Chu G (2001) Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci USA 98(9): Mootha VK, et al. (2003) PGC-1alpha responsive genes involved in oxidative phosphorylation are coordinately down-regulated in human diabetes. Nat Genet 34(3): Subramanian A, et al. (2005) Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA 102(43): Tibshirani R, Hastie T, Narasimhan B, Chu G (2002) Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc Natl Acad Sci USA 99(10): Fan C, et al. (2011) Building prognostic models for breast cancer patients using clinical variables and hundreds of gene expression signatures. BMC Med Genomics 4: Choi W, et al. (2014) Identification of distinct basal and luminal subsets of human bladder cancers with different sensitivities to frontline chemotherapy. Cancer Cell, in press. 6of6 Damrauer et al.

7 SUPPLEMENTAL FIGURE LEGENDS Supplemental Figure S1. Consensus Clustering defines two distinct molecular subtypes of invasive bladder cancer (A) Consensus Cumulative Distribution Function (CDF) plot were generated by Consensus Cluster Plus clustering (1) on the Meta-dataset and the (B) MSKCC dataset. (C) Genes representative of epithelial and urothelial differentiation were used to generate a heatmap, supervised by subtype in the MSKCC dataset. KRT5 mrna expression was plotted against UPK2 and KRT20 expression in the (D, E) Metadataset and (F, G) MSKCC dataset. Supplemental Figure S2. Clinical variable are not significantly associated with disease specific or overall survival and previous muscle invasive signatures are not prognostic in the MSKCC dataset. (A) The BASE47 gene list was used to cluster the MSKCC dataset, showing two distinct expression profiles. BASE47 genes are listed along the right. (B) Kaplan-Meier plots for disease specific survival were generated based on gender, stage and whether a tumor was pure urothelial or of mixed differentiation. (C) The MSKCC dataset was hierarchically clustered by prognostic signatures from Blaveri et. al. and (D) Sanchez-Carbayo et. al. (E,F) The tumors were then classified based on the two major clusters (black and red) and Kaplan-Meier plots were made to determine the prognostic value of these signatures.

8 Supplemental Figure S3. Makers of luminal breast cancer are co-expressed with markers of urothelial differentiation Hierarchal clustering was performed on all significantly differentially expressed (2-class SAM) genes from the Metadataset. The node containing the urothelial makers of differentiation, UPK2 and UPK1A, was isolated and overlapped with the Parker et. al. breast cancer intrinsic gene list (2). A significant number of genes within the node overlapped the breast cancer gene list (19/65 genes, chi squared p value=0.006). Supplemental Figure S4. -like bladder cancer possesses tumor initiating cell traits and is molecular similar to basal breast cancer. (A) A 467-gene signature of bladder TIC expression (330/467 genes present in the meta-dataset) was used to hierarchically cluster the tumors from the metadataset. (B) GSEA was performed using the bladder TIC gene set. The basal subtype was significantly enriched for the TIC signature (nominal p value=0.006). (C) The intrinsic breast cancer gene list was used to correlate basal, her2, Lum A and Lum B breast tumors from the TCGA to the basal and luminal bladder cancer subtypes from the Meta-dataset tumors as well as 6 additional tumors from The Caner Genome Atlas (TCGA). (D) Waterfall plots representing the correlation of basal-like (black) and luminal (red) bladder tumors to the and (E) A breast cancer centroid as determined by the PAM50. (F) The meta-dataset of bladder tumors were clustered by genes that defined the intrinsic subtypes of breast cancer. Tracks indicated bladder cancer subtypes as well as subtypes predicted by the breast cancer PAM50 bioclassifier.

9 Supplemental Figure S5., luminal, claudin low and oncogenic breast cancer signatures are associated with intrinsic molecular subtypes of bladder cancer. (A) The MSKCC datasets tumors were run against previously published breast cancer related gene sets (3) and the resulting pathway scores were clustered by hierarchical clustering and heatmaps were generated for visualization. (B) The MSKCC dataset was hierarchically clustered using representative genes known to define claudin-low breast tumors. Claudin low subtype designation was performed using an 807 gene signature. (C) A Kaplan- Meier plot was generated comparing disease specific survival and (D) overall survival of the basal, luminal and claudin low subtypes. Supplemental Table S1: Description of the dataset sets used to create the Meta-dataset Supplemental Table S2: Significantly differentially expressed genes between basal-like and luminal bladder cancer subtypes (FDR=0). Supplemental Table S3: PAM Output. Supplemental Table S4. Univariable Cox Regression Analysis of Overall Survival on the MSKCC dataset. Supplemental Table S5: Gene sets enriched in basal-like and luminal bladder cancer by GSEA.

10 Supplemental References 1. Wilkerson MD, Hayes DN. ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking. Bioinformatics. 2010;26: Parker JS, Mullins M, Cheang MCU, Leung S, Voduc D, Vickery T, et al. Supervised risk predictor of breast cancer based on intrinsic subtypes. J Clin Oncol. 2009;27: Fan C, Prat A, Parker JS, Liu Y, Carey LA, Troester MA, et al. Building prognostic models for breast cancer patients using clinical variables and hundreds of gene expression signatures. BMC Med Genomics; 2011;4:3.

11 Supplemental Figure S1 A consensus CDF 1.0 Meta B 1.0 consensus CDF MSKCC CDF CDF consensus index MSKCC consensus index C CD44 K1 K2 KRT14 KRT5 KRT6B UPK1A KRT20 UPK2 UPK3A D UPK2 Expression F KRT20 Expression KRT5 VS. UPK2 (Meta) K1 K2 p = 2.2e KRT5 Expression KRT5 vs. KRT20 (MSKCC) K1 K2 p= KRT5 Expression E KRT20 Expression G UPK2 Expression KRT20 VS. KRT5 (Meta) K1 K2 p = 2.1e KRT5 Expression KRT5 vs. UPK2 (MSKCC) K1 K2 p=3.07e KRT5 Expression

12 Supplemental Figure S2 A B MSKCC Disease Specific Survival AHNAK2 CDK6 ALOX5AP CHST15 EMP3 MT1X MT2A GLIPR1 MSN TUBB6 PDGFC PALLD PRKCDBP PRRX1 GAREM GDPD3 BHMT SLC9A2 GPD1L CYP4B1 FBP1 GATA3 CYP2J2 TOX3 CAPN5 FAM174B PPFIBP2 PPARG RAB15 HMGCS2 RNF128 ADIRF SCNN1B SCNN1G TRAK1 UPK2 UPK1A TMEM97 SPINK1 TMPRSS2 PLEKHG6 VGLL1 SEMA5A TBX2 SLC27A2 E OS Probability C Disease Specific Survival Disease Specific Survival Stage II Stage III Stage IV Months MSKCC BASE47 Log Rank p= Months Stage MSKCC by Blaveri et. al. Signature Log Rank p=0.463 Good Poor Months BASE47 Blaveri et. al. Log Rank p=0.236 Disease Specific Survival Disease Specific Survival D F OS Probability Female Male Gender Log Rank p= Months Histology Log Rank p=0.939 TCC Divergent Differentiation Months MSKCC BASE47 Sanchez-Carbayo et. al. MSKCC by Sanchez-Carbayo et. al. Signature Log Rank p=0.414 Good Poor Months

13 Supplemental Figure S3 CYP2J2 DHRS2 FXYD3 CROT GPX2 MECOM MAOA RAPGEFL 1 S100P MGST2 ICA1 CNGA1 MCCC1 IFNA14 ATP8B1 ACSF2 CYP4F12 TJP3 CYP4B1 ELF3 FOXA1 HMGCS2 FAM174B BCAS1 CYB5A GPD1L GATA3 EPN3 ERBB3 EVPL BCAT2 ADIRF CAPN5 PPFIBP2 UPK1A PLCE1 UPK2 TRAK1 SCNN1B SCNN1G TMPRSS2 TLE2 TBX2 TMEM97 ZBTB7C TOP2B RAB15 NR2F6 TM7SF2 PIK3C2B PLEKHG6 SUOX SNCG RAB11A VIPR1 VGLL1 SH3GLB2 ZNF443 TBX3 SLC9A2 SPINK1 SLC29A3 PSCA PPARG K1 K2 Urothelial Related Breast Cancer Related Text Color Meta-dataset

14 Supplemental Figure S4 A B Meta-dataset Meta-dataset (ES = 0.76, NOM pvalue = 0.006) C LumB EC C U D EA AD TCGA R V O O C BM G C al F R KI al in m Lu s Ba Meta Dataset 0 50 E Samples Correlation to LumA Centroid LumA 1 - Pearson 1.2 Her TCGA Breast Correlation to Centroid D Samples

15 Supplemental Figure S5 A MSKCC Cluster PMID: Breast Signatures GATA3 PMID: GATA3 PMID: Estrogen Reg. PMID: Cluster PMID: Breast Signatures Mature Down PMID: Progenitor PMID: Metaplastic PMID: MaSC PMID: Pathway Signatures Myc Targets PMID: Oncogenic E2F3 PMID: Her2 Amplicon PMID: B Bladder Bladder Claudin Low Other C Disease Specific Survival Breast Cancer Subtype Markers EMT Markers TIC Markers MSKCC Disease Specific Survival Claudin Log Rank p= Months D Overall Survival Claudin MSKCC Overall Survival KRT5 KRT14 KRT17 KRT18 KRT19 ESR1 PGR GATA3 ERBB2 CLDN3 CLDN7 CLDN4 VIM SNAI2 TWIST 1 ZEB1 ZEB2 CD44 EPCAM MUC1 THY1 ALDH1A Log Rank p= Months

16 Supplemental Table S1: Dataset Characteristics Training Datasets (Meta) Validation Dataset Als Kim Riester Sjödahl Iyer GEO ID GSE5287 GSE13507 GSE31684 GSE32894 cbioportal Clinical Characteristic No. No. No. No. No. Sex Male NA NA Female NA NA Stage pt pt pt pt pt NA

17 Supplemental Table S2 Two Class SAM vs FDR=0 Postive Genes Negative Genes Gene ID Score(d) Numerator(r) Denominator(s+s0) Fold Change q- value(%) Gene ID Score(d) Numerator(r) Denominator(s+s0) Fold Change q- value(%) CHST PPARG EMP RAB CORO1C UPK AHNAK GAREM CLIC TRAK CDK SCNN1B MSN TOX SACS GATA PDGFC SEMA5A PALLD RNF GLIPR TMEM MT1X PLEKHG TUBB ADIRF OSMR ERBB PRKCDBP SLC27A CD SCNN1G DEGS ACSF PRRX GPD1L FLNA VGLL LRIG TBX DPYD TMPRSS ANXA ATP8B PTGS PLCE ALOX5AP EPB41L NOD BCAT FAP HMGCS DSE SPAG CERK FBP TNC SLC9A MPP MYCL TLR TBX MT2A CYP2J TNFAIP VIPR SNAI PPFIBP RRAS SLC29A FAS GDPD KCTD EVPL KIAA TM7SF ST3GAL DHRS CD PIK3C2B CD ACSL MT1M TLE C1S ACOX SIRPA CAPN PTRF FAM174B ITGA CNGA ZEB SNCG LY NR2F LAMP CYP4B HCK VSIG EFEMP EPN PRNP UPK1A CAV BHMT TGFBI ZBTB7C SLA TJP ATXN MCCC AXL ZSCAN FPR ACOXL CSF1R CYP4F IL15RA SPINK FCGR2A FOXA SAMSN SORL NMT SYT SPI THEM IFITM CAB39L SERPINA PSCA NCF SLC37A RNASE SSH SYNC CBLC RGS TMEM C1QA SUOX AP1S ZNF NFIL TNFRSF TMEM45A PPP1R3C FCER1G P4HTM