Seed-based biclustering of gene expression data
Background Accumulated biological research outcomes show that biological functions do not depend on individual genes, but on complex gene networks. Microarray data are widely used to cluster genes according to their expression levels across experimental conditions. However, functionally related genes generally do not show coherent expression across all conditions since any given cellular process is active only under a subset of conditions. Biclustering finds gene clusters that have similar expression levels across a subset of conditions. This paper proposes a seed-based algorithm that identifies coherent genes in an exhaustive, but efficient manner.
Methods In order to find the biclusters in a gene expression dataset, we exhaustively select combinations of genes and conditions as seeds to create candidate bicluster tables. The tables have two columns: (a) a gene set, and (b) the conditions on which the gene set have dissimilar expression levels to the seed. First, the genes with less than the maximum number of dissimilar conditions are identified and a table of these genes is created. Second, the rows that have the same dissimilar conditions are grouped together. Third, the table is sorted in ascending order based on the number of dissimilar conditions. Finally, beginning with the first row of the table, a test is run repeatedly to determine whether the cardinality of the gene set in the row is greater than the minimum threshold number of genes in a bicluster. If so, a bicluster is outputted and the corresponding row is removed from the table. Repeating this process, all biclusters in the table are systematically identified until the table becomes empty.
Conclusions This paper presents a novel biclustering algorithm for the identification of additive biclusters. Since it involves exhaustively testing combinations of genes and conditions, the additive biclusters can be found more readily.
Impact and interest:
Citation counts are sourced monthly from and citation databases.
These databases contain citations from different subsets of available publications and different time periods and thus the citation count from each is usually different. Some works are not in either database and no count is displayed. Scopus includes citations from articles published in 1996 onwards, and Web of Science® generally from 1980 onwards.
Citations counts from theindexing service can be viewed at the linked Google Scholar™ search.
Full-text downloads displays the total number of times this work’s files (e.g., a PDF) have been downloaded from QUT ePrints as well as the number of downloads in the previous 365 days. The count includes downloads for all files if a work has more than one.
|Item Type:||Journal Article|
|Keywords:||bioinformatics, microarray, bicluster, genomics, data mining|
|Subjects:||Australian and New Zealand Standard Research Classification > BIOLOGICAL SCIENCES (060000) > BIOCHEMISTRY AND CELL BIOLOGY (060100) > Bioinformatics (060102)
Australian and New Zealand Standard Research Classification > BIOLOGICAL SCIENCES (060000) > GENETICS (060400) > Gene Expression (incl. Microarray and other genome-wide approaches) (060405)
|Divisions:||Current > QUT Faculties and Divisions > Faculty of Health
Current > Institutes > Institute of Health and Biomedical Innovation
|Copyright Owner:||Copyright 2012 The authors|
|Copyright Statement:||Open access freely available online.|
|Deposited On:||07 Feb 2013 05:59|
|Last Modified:||20 Feb 2015 03:28|
Repository Staff Only: item control page