Measuring and Modelling the Epithelial Mesenchymal Hybrid State in Cancer: Clinical Implications

Measuring and Modelling the Epithelial Mesenchymal Hybrid State in Cancer: Clinical Implications Mohit Kumar Jolly1,*, Ryan J. Murphy2, Sugandha Bhatia3,4,5, Holly J. Whitfield6,7, Andrew Redfern8, Melissa J. Davis6,7,9,*,#, Erik W. Thompson3,4*,# 1Centre for BioSystems Science and Engineering, Indian Institute of Science, Bangalore 560012, India 2Queensland University of Technology, School of Mathematical Sciences, Brisbane, QLD, 4000, Australia 3Queensland University of Technology, Institute of Health and Biomedical Innovation and School of Biomedical Sciences, Brisbane, QLD, 4102, Australia 4Queensland University of Technology, Translational Research Institute, Brisbane, QLD, 4102, Australia 5The University of Queensland Diamantina Institute, The University of Queensland, Brisbane, QLD, 4102, Australia 6Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, 3052, Australia 7Department of Medical Biology, University of Melbourne, Parkville, 3010, Australia 8 Department of Medicine, School of Medicine, University of Western Australia, Fiona Stanley Hospital Campus, Perth, WA, 6150, Australia. 9Department of Clinical Pathology, Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, Parkville, 3010, Australia

change may vary both due to variable E/M induction depending on the initial E/M status of individual cells as well as uneven distribution of drug through a tumour depending on variations in blood flow and stromal density [Trédan et al., 2007].
The term metastable state was coined to describe the instability of these E/M hybrid cells [Klymkowsky andSavagner, 2009, Jia et al., 2018], although it may not apply equally to all E/M hybrids.
Although a continuum of states have been shown in clinical tumours as discussed above [Karacosta et al., 2019], certain E/M hybrid states appear to be favoured, and these have (i) relative stability and/or Thus, the development of robust and quantifiable methods for estimating the type and extent of the E/M hybrid state in clinical specimens is becoming an important goal.

The E/M hybrid phenotype in experimental systems
EMP-related changes occupy a nonlinear landscape in the multi-dimensional space of cancer progression . The clinical consequences of the physical and functional heterogeneity in cancers resulting from plasticity and changes across the EMP spectrum make it essential to explore the E/M hybrid phenotype, for which model systems including cell line systems, preclinical models (e.g. syngeneic mouse cancers), and patient-derived xenografts ( [Puram et al., 2017]. The E/M hybrid signature correlated with a malignant basal HNSCC subtype and lymph node metastasis, signifying that partial EMT promotes loco-regional invasion In situ, these E/M hybrid cells were seen at the leading edge of tumours associated with cancer-associated fibroblasts [Puram et al., 2017]. Similarly, cells at different positions on the E/M axis were observed to associate with different stromal cell populations in a range of human tumours in a second study [Pastushenko et al., 2018]. The advent of more combinatorial single cell approaches utilising NGS and mass cytometry is now allowing the portrayal of integrated genome,

Deriving signatures with DE
Plastic phenotypes are often quantified using a workflow that combines a perturbation experiment in vitro with a differential expression analysis. To generate an EMT gene expression signature, an EMT program is stimulated using an established EMT inducer such as TGFb [Taube et al., 2010, Du et al., 2016 or EGF [Cursons et al., 2015] to enable the identification of a group of genes that are differentially expressed in response to the stimulus within the given system, and in correspondence with morphological changes.
These differentially expressed genes establish a stimulus-specific gene set for EMT, which can also be separated into "epithelial" and "mesenchymal" components representing the up-and down-regulated genes . Along these lines, Figure   2 displays Cancer Cell Line Encyclopedia (CCLE) breast cancer cell lines and The Cancer Genome Atlas (TCGA) tumour samples across an epithelial-mesenchymal landscape. The landscape shows that basal and claudin-low cell lines have lower epithelial scores but higher mesenchymal scores, while the luminal cell lines are less mesenchymal and more epithelial.
We would expect that samples expressing an E/M hybrid phenotype would be found in the top right quadrant, with both a high mesenchymal and high epithelial score. Importantly, many basal and even some of the more fully mesenchymal claudin-low cell lines exhibit an They found considerable overlap between the HMLER E/M signature and a Lung E/M signature also compiled from combining "epithelial" and "mesenchymal" components derived from a microarray dataset of 93 lung cancer cell lines [Loboda et al., 2011]. This Lung E/M signature was also found to best differentiate two major unsupervised subpopulations of colon cancers, where the mesenchymal sub-population also correlated with poorer outcome, although the hybrid state was not studied [Loboda et al., 2011]. Signatures from scRNA-seq are discussed below.

Methods for applying signatures
The higher-level biological information captured by gene expression signatures can be mapped onto new datasets using gene set analysis methods, revealing the extent to which they express similar molecular phenotypes. Gene set analysis methods (reviewed and benchmarked recently by Geistlinger and colleagues [Geistlinger et al., 2020]) can assess specific hypotheses, such as "Does this group of samples express an EMT program?", or competitively test a group of gene signatures e.g. "Which molecular program is upregulated in these samples?". When applied to differential analysis workflows, EMT gene expression signatures can evaluate the differential enrichment for an EMT program between two groups of samples. Alternatively, single-sample gene set analysis methods such as ssGSEA [Barbie et al., 2009] or Singscore , Bhuva et al., 2020 can be used to identify individual samples that are concordant with gene expression signatures, and estimate the EMP phenotype of a single sample. In recent work, a refinement of the Singscore method has been developed to remove the requirement for whole-transcriptome scale measurement in order to apply EMT signatures [Bhuva et al., 2020]. Bhuva and colleagues demonstrate that by adding as few as five stably expressed anchor genes to the genes included in an EMT signature, resolution of epithelial, mesenchymal and hybrid phenotypes can be achieved in the context of a low-or medium throughput measurement methods such as qRTPCR or Nanostring. This refinement enables the translation of EMT signatures commonly used with RNA sequencing in an experimental research context into clinical application, or application to archival FFPE tissues where whole-transcriptome RNA sequencing is frequently not possible.

Alternative scoring of EMT phenotypes
Similarly, various scoring methods have been developed for the quantification of EMT phenotypes in samples independent of a differential expression workflow. These approaches

Gene set scoring in single-cell data
Single cell RNA sequencing (scRNA-seq) offers a powerful tool to identify epithelial, mesenchymal and E/M hybrid states in mixed cell populations within heterogeneous samples.
Various single-cell specific methods are being developed for the phenotypic quantification of individual cells, as summarised in Table 1

Mapping to references in single-cell data
Lahnemann et al, 2020 has recently reviewed single cell "mapping" or cell type annotation [Lähnemann et al., 2020], where the authors argue in favour of reference-based methods.
Here, unsupervised clustering followed by cell type annotation is considered "referencefree". The marker genes used in these "reference-free" methods are found on a correlative basis, where the functional relevance of the genes in specific phenotypes is not guaranteed

Single-sample gene set testing in single cell data
Single sample gene set testing methods have been developed for bulk analyses that provide continuous phenotype scores for individual samples. If applied to scRNA-seq data these methods would phenotypically characterise individual cells, independent of clustering. for scRNA-seq data, however, no benchmarking or comparison between methods has been performed. There is work being done to identify cell-type specific drop out patterns [Qiu, 2020], suggesting that drop out perhaps cannot be modelled during cell type discovery, and that we need an approach for characterising phenotype that is independent of drop out patterns. By applying single-sample gene set enrichment using EMT gene sets from MSigDB, the authors identified EMT signatures that were associated with the most epithelial and most mesenchymal samples. Although mesenchymal profiles correlated with poorer outcome in some tumours such as ovarian and colorectal cancer, this was not universal, with mesenchymal breast tumours having a better prognosis except in a post-chemotherapy cohort. Again, although this analysis lent itself to the potential study of the hybrid state, this was not explored.

Need for new gene expression signatures
Many of these EMT gene signatures have been derived from microarray datasets and may not best represent the data we now obtain from modern RNA sequencing platforms [

Role of mathematical modelling in identifying the E/M hybrid state(s)
Mathematical models have played a major role in our understanding of the E/M hybrid state.
As experiments typically tend to be performed at a single spatial scale (cell-level or population-level) it can be challenging to interpret results from these different spatial scales when processes also occur over multiple overlapping time scales. Mathematical models are a powerful tool with which to integrate the observations from different spatial and time scales, to not only interpret these experimental results, but to also generate and test new hypotheses that can be experimentally tested. This approach has proven extremely insightful in transforming the perception of EMT from a binary (all-or-none) process to a multi-stable process. Significantly, mathematical modelling studies were among the first to predict the  Figure 4A and 4B, we see parameter regions where more than one phenotype can co-exist, suggesting that isogenic cells exposed to the same dose of EMTinducing signal can yet display 'non-genetic' heterogeneity in their EMT status due to various reasons such as stochasticity in biochemical reactions underlying these networks [Huang, 2009].
Alternative single cell models have focused on larger networks describing the multiple signalling pathways implicated with EMT. While continuous models for large networks can provide information regarding temporal network dynamics, such as phenotypic plasticity and transition rates, it can be experimentally intractable to determine all of the relevant parameters, therefore Boolean-logic approaches, where a gene is considered either active or inactive, have been considered. Using this approach, a study by Steinway et al. [150] identified two stable fixed points corresponding to the epithelial and mesenchymal phenotypes. A subsequent study identified new fixed points that exhibit features of both epithelial and mesenchymal phenotypes thought to be associated to E/M phenotypes and identified putative targets that supress hepatocellular carcinoma EMT. In vitro analysis indicated many of the predictions do supress the TGFß-driven EMT, and thus are potential therapeutic targets [Steinway et al., 2015]. Later work by Font-Clos et al. [55] extended the work of Steinway et al. [150] to identify multiple E/M fixed points between the more stable epithelial and mesenchymal fixed points. Hari et al. [73] have since compared the Boolean-logic approach with RACIPE (Random Circuit Perturbation), where an ensemble of continuous models is generated with random chosen kinetic parameters for a given network topology and steady state solutions are clustered to identify robust dynamical features of the given network. They show that multistability correlates with the net number of positive feedback loops. In contrast to traditional approaches where nodes in a network are targeted to control EMT plasticity, Hari et al. [73] propose breaking the feedback loops by targeting their links rather than nodes as a possible route to restrict the EMT plasticity.

Population dynamics of EMP
Experiments at the population scale have observed epithelial-mesenchymal heterogeneity where cells in the same tissue/population could be either epithelial, mesenchymal, or in an E/M hybrid state, and are able to switch between these states. Recent studies have demonstrated the ability of cells to switch states in the absence of EMT-inducing factors such as TGFß. Ruscetti et al. [138] show that an initial population of E/M hybrid cells can quickly generate a mixture of epithelial, E/M hybrid, and mesenchymal cells. Further, Bhatia et al. [18] demonstrates that an initial population consisting of either epithelial or mesenchymal cells has the ability to regain the phenotypic equilibrium of the parental population. . However, these models lack integration with the core regulatory network models previously discussed. Therefore, recent work has explored how to link the well-characterised dynamics of EMT regulatory networks to experimentally observed population-level behaviour. In an example of integrating experimental data with known interaction and regulatory networks, Cursons et al. [42] used an experimental model of EMP driven by the well-known ZEB1-miR200 network [Gregory et al., 2011]. Exploring plasticity across an EMP landscape, they integrated experimental data with a known miR-mRNA interaction network and known protein interaction networks to characterise a cotargeted regulatory network regulating EMT and MET.
One approach to incorporate core regulatory networks is to consider a population of cells where each cell contains a copy of the core EMT regulatory circuit . The dynamics of the EMT circuit can be calculated for each cell independently to determine the phenotype: epithelial, E/M hybrid, or mesenchymal. Motivated by spontaneous EMT observed in Ruscetti et al. [138], Tripathi et al. [159] developed a model which included cell proliferation, and show that noise in partitioning of signalling pathways during division could be a possible mechanism by which epithelial-mesenchymal heterogeneity can be maintained over extended periods of time involving multiple generations and passages.

Spatiotemporal dynamics and heterogeneity in EMP: Role of cell-cell communication and cell-ECM crosstalk
An alternative approach to link the cell-level EMT behaviour is to explicitly consider spatial structure. Early evidence of EMT in clinical samples arose from differences in spatial

Clinical utilisation of the E/M hybrid state
Advantages of an increasing focus on the E/M hybrid state are two-fold; (i) evidence emerging as described above that E/M hybrid cells associate better with carcinoma progression than frankly mesenchymal cells, and (ii) the increased certainty that the cells being identified are indeed carcinoma cells exhibiting EMP, since such bi-phenotypic cells are seldom observed outside of malignancy, unlike the abundance of cells exhibiting a mesenchymal phenotype, such as carcinoma-associated fibroblasts, endothelial cells, and various immune cells.

Immunohistochemical studies:
A number of studies have associated evidence of mesenchymal change (gain of mesenchymal markers and/or aberrant epithelial markers) in the tumour parenchyma with poor outcomes (e.g. [Domagala et al., 1990]). However, many early studies measured mesenchymal and epithelial marker expression in separate specimens without exploring co-staining, as mentioned above, and so would have potentially classified both fully mesenchymal and E/M hybrid tumours as mesenchymal. Interestingly, one such early study in breast cancer found the worst prognosis from tumours with relatively equal proportions of epithelial and mesenchymal cells rather than a preponderance of either one [Thomas et al., 1999].  [Ono et al., 1996], and Zeb2 expression at the invasive front in primary colorectal cancers correlated with cancer-specific survival in more recent work [Kahlert et al., 2011].

Novel technologies:
The impact of E/M axis positioning, inclusive of the hybrid state, on cancer pathobiology leads to the potential for EMP assessments of individual cancers to have prognostic / predictive clinical significance for patients, in terms of both malignant potential, therapy resistance and eventual survival (see Table 2 for summary of studies to date). Several studies have identified basal-like breast cancers co-expressing classical epithelial (e.g. cytokeratin 5/6) and mesenchymal markers (e.g. vimentin) [Thomas et al., 1999, Cakir et al., 2012, and such coexpression has also been seen in colon cancer , high grade serous ovarian cancer (HGSOC; [Gonzalez et al., 2018]), and lung carcinoma by novel single cell techniques such as DepArray and CyToff, respectively. High resolution, sub-cellular microscopy has also been used to assess E/M hybrid state in a variety of tumours [Godin et al., 2020]. Single cell RNA-seq analysis has also shown that partial EMT was an independent predictor of nodal metastasis, higher grade, and adverse pathologic features e.g. lymphovascular invasion in head and neck squamous cell carcinoma [Puram et al., 2017]. The concept that E/M hybrids contribute to tumour progression prompts the hypothesis that "phenotypic stability factors" that promote the E/M hybrid state, could also correlate with poor prognosis and, by extension, could prove useful therapeutic targets. In reality, this could depend on whether a given marker is just expressed in the hybrid state or whether it actually contributes to maintaining that state. In modelling, OVOL1/2 proteins inhibited EMT but variably expanded the E/M hybrid phenotype  or enhanced the movement to the epithelial state [Watanabe et al., 2014]. Possibly reflecting this variability, they were relatively highly expressed in non-small cell lung carcinoma E/M hybrid cells, which were resistant to EGFR blockade, expression levels only falling in fully mesenchymal cells, although the association of OVOL1/2 with survival was not reported in this study [Fustaino et al., 2017].
However, OVOL2A expression correlated with sensitivity to Palbociclib in breast cancer [Finn et al., 2009]. Further to this, high OVOL2 mRNA levels in breast cancer patients correlated with a better metastasis-free survival (HR 0.415, p=0.0002) and overall survival (HR 0.465, p=0.0046). It could be hypothesized that it is the prevention of full mesenchymal transition rather than the promotion of the E/M hybrid state that drives this association with better prognosis (Watanabe 2014 P-cadherin has also been postulated as an E/M hybrid marker. P-cadherin can interfere with epithelial cell-cell adhesion and promote invasion and metastasis, but only in the presence of E-cadherin. A three-protein EMT signature applied to 500 breast cancers, using E-cadherin and vimentin combined with P-cadherin expression, showed that P-cadherin identified an intermediate state between E and M phenotypes, associating with the expression of breast cancer stem cell markers and a poor prognosis in breast cancer [Ribeiro and Paredes, 2014].
This concurred with an earlier study also showing P-cadherin correlated with poorer 10 years overall survival, disease-specific survival and distant relapse-free interval in univariate analysis, although the association was not seen in multivariate analysis [Turashvili et al., 2011].
Taken together there is relatively good data linking specific factors associated with the hybrid state with treatment resistance and poor outcome. However, whether the pathogenesis of the hybrid state relates to traits of the hybrid state itself, such as cancer stem cell features, or whether it is the ability of hybrid cells to readily access to both E and M phenotypes through plasticity, remains unknown. Further work is required to assess the utility of targeting or manipulating these molecules for therapeutic gain. Collectively, the E/M hybrid phenotype appears to offer considerable promise as a unique state with prognostic potential, and utility in monitoring therapy response, for which specific assays can be developed using different technologies, which are summarised in Table 3.

Conclusions, state of play, and potential needs
Clearly there is growing evidence for a preponderance of E/M hybrid states in carcinoma systems, and that these states often are associated with more aggressive malignant or metastatic capacity. As detailed above, a growing number of computational approaches have been developed to illustrate and characterise E/M hybrid states, and theoretical prediction and support for the stability of particular intermediates has been gleaned from mathematical modelling around some of the well characterised molecular feedback systems. The E/M hybrid state, in a carcinoma context, is somewhat unique, such that opportunities exist in the prognostic and predictive biomarker setting that show promise, and new technologies are being developed to assess this. To our knowledge, whether these E/M hybrid states show specific sensitivity/resistance profiles to cancer therapies remains to be seen, however it is likely that many of the studies implicating EMT in therapy resistance have been carried out with cells that exhibit an E/M hybrid phenotype. Studies that shed further light on the reasons behind the metastatic advantages of the E/M hybrid state, their biomarker potential based on unique co-expression of epithelial and mesenchymal markers, and opportunities for specificity in therapeutic targeting, will be helpful in determining and achieving the exploitation of EMP in cancer care.    The level of resolution obtained in scRNA-seq can be leveraged to investigate the presence or absence of EMT markers, and combinations of different markers, in individual cells. We demonstrate this using breast cancer scRNA-seq data downloaded from GEO (GSE122743) where two cell lines, MCF7 (Fig a,b) and T47D (Fig c,d) were deprived from oestradiol (E2) to study endocrine therapy resistance within CD44 high and low populations ].
Many scRNA-seq analyses apply t-distributed stochastic neighbour embedding (tSNE) to visualise clusters and assess the relative heterogeneity of cell groups and cell types (Fig b,   d). Although this may capture epithelial and mesenchymal populations, scRNA-seq can also identify groups of cells with specific combinations of marker genes, as demonstrated here using upset plots (Fig 1, b. Intersection size indicates the number of cells expressing a given combination of markers). Through visualisations such as these, scRNA-seq grants the ability to pinpoint individual cells that express both epithelial and mesenchymal markers, thereby enabling the characterisation of hybrid phenotypes at a finer resolution and without concerns of heterogeneity