mCOPA: analysis of heterogeneous features in cancer expression data

Wang, Chenwei, Taciroglu, Alperen, Maetschke, Stefan R, Nelson, Colleen C., Ragan, Mark A, & Davis, Melissa J (2012) mCOPA: analysis of heterogeneous features in cancer expression data. Journal of Clinical Bioinformatics, 2(1), p. 22.

Background
Cancer outlier profile analysis (COPA) has proven to be an effective approach to analysing cancer expression data, leading to the discovery of the TMPRSS2 and ETS family gene fusion events in prostate cancer. However, the original COPA algorithm did not identify down-regulated outliers, and the currently available R package implementing the method is similarly restricted to the analysis of over-expressed outliers. Here we present a modified outlier detection method, mCOPA, which contains refinements to the outlier-detection algorithm, identifies both over- and under-expressed outliers, is freely available, and can be applied to any expression dataset.

Results
We compare our method to other feature-selection approaches, and demonstrate that mCOPA frequently selects more-informative features than do differential expression or variance-based feature selection approaches, and is able to recover observed clinical subtypes more consistently. We demonstrate the application of mCOPA to prostate cancer expression data, and explore the use of outliers in clustering, pathway analysis, and the identification of tumour suppressors. We analyse the under-expressed outliers to identify known and novel prostate cancer tumour suppressor genes, validating these against data in Oncomine and the Cancer Gene Index. We also demonstrate how a combination of outlier analysis and pathway analysis can identify molecular mechanisms disrupted in individual tumours.

Conclusions
We demonstrate that mCOPA offers advantages, compared to differential expression or variance, in selecting outlier features, and that the features so selected are better able to assign samples to clinically annotated subtypes. Further, we show that the biology explored by outlier analysis differs from that uncovered in differential expression or variance analysis. mCOPA is an important new tool for the exploration of cancer datasets and the discovery of new cancer subtypes, and can be combined with pathway and functional analysis approaches to discover mechanisms underpinning heterogeneity in cancers.
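The abstract describes outlier detection only in outline. As an illustration, the Python sketch below implements a COPA-style score as it is commonly described: each gene is median-centred across samples and divided by its median absolute deviation (MAD), then summarised by a high percentile. It extends this symmetrically with a low-percentile score so that under-expressed outliers are also ranked. This is not the mCOPA implementation, because the abstract does not specify mCOPA's refinements. The function names, the 95th-percentile cut-off, the zero-MAD guard and the toy data are all assumptions made for illustration. A population-variance ranking is included as a simple stand-in for the variance-based feature selection that the Results compare against.

```python
from statistics import median, pvariance


def percentile(values, q):
    """q-th percentile (0 <= q <= 100) by linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def copa_transform(row):
    """Median-centre one gene's values across samples and divide by the MAD."""
    m = median(row)
    mad = median([abs(x - m) for x in row])
    if mad == 0:
        # Assumption (hypothetical guard): a gene with zero MAD is treated as
        # non-outlying instead of dividing by zero.
        return [0.0] * len(row)
    return [(x - m) / mad for x in row]


def outlier_scores(expr, q=95.0):
    """Return {gene: (over_score, under_score)} for a {gene: [value per sample]} mapping.

    over_score:  q-th percentile of the transformed values; high when a subset
                 of samples strongly over-expresses the gene.
    under_score: negated (100 - q)-th percentile; high when a subset of samples
                 strongly under-expresses the gene.
    The choice q = 95 is an illustrative assumption.
    """
    scores = {}
    for gene, row in expr.items():
        z = copa_transform(row)
        scores[gene] = (percentile(z, q), -percentile(z, 100.0 - q))
    return scores


def rank_by_variance(expr):
    """Baseline for comparison: genes ordered by population variance, highest first."""
    return sorted(expr, key=lambda g: pvariance(expr[g]), reverse=True)


# Invented toy data: 10 samples per gene.
expr = {
    "GENE_UP": [5.0, 5.1, 4.9, 5.0, 5.2, 4.8, 5.0, 5.1, 12.0, 12.5],       # two high samples
    "GENE_DOWN": [5.0, 4.9, 5.1, 5.0, 4.8, 5.2, 5.0, 4.9, 0.5, 0.0],       # two low samples
    "GENE_SPREAD": [0.0, 1.5, 3.0, 4.5, 6.0, 7.5, 9.0, 10.5, 12.0, 13.5],  # broad, no outliers
}

scores = outlier_scores(expr)
over_rank = sorted(scores, key=lambda g: scores[g][0], reverse=True)
under_rank = sorted(scores, key=lambda g: scores[g][1], reverse=True)

print("over-expressed outliers: ", over_rank)
print("under-expressed outliers:", under_rank)
print("by variance:             ", rank_by_variance(expr))
```

In this toy example the variance baseline ranks GENE_SPREAD first, because its values are spread evenly over a wide range. The over-expression score ranks GENE_UP first and the under-expression score ranks GENE_DOWN first, because only a few samples deviate strongly from each gene's median. Using the median and MAD rather than the mean and standard deviation keeps the outlying samples from inflating the scale they are measured against.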

ID Code: 55866
Item Type: Journal Article
Refereed: Yes
Keywords: cancer, outlier, expression data, cluster, subtype, heterogeneous, feature selection
DOI: 10.1186/2043-9113-2-22
ISSN: 2043-9113
Subjects: Australian and New Zealand Standard Research Classification > BIOLOGICAL SCIENCES (060000) > BIOCHEMISTRY AND CELL BIOLOGY (060100) > Bioinformatics (060102)
Divisions: Current > Institutes > Institute of Health and Biomedical Innovation
Copyright Owner: Copyright 2012 BioMed Central
Deposited On: 19 Dec 2012 05:54
Last Modified: 25 Apr 2013 22:11