U.S. Department of Health and Human Services

Scientific Publications by FDA Staff


Entry Details

PLoS One 2014 Oct 27;9(10):e111318

A Composite Model for Subgroup Identification and Prediction via Bicluster Analysis.

Chen HC, Zou W, Lu TP, Chen JJ

Abstract

BACKGROUND: A major challenge in the analysis of large and complex biomedical data is to develop an approach for 1) identifying distinct subgroups in the sampled populations, 2) characterizing the relationships among subgroups, and 3) developing a prediction model that classifies the subgroup memberships of new samples by finding a set of predictors. Each subgroup can represent different pathogen serotypes of microorganisms, different tumor subtypes in cancer patients, or different genetic makeups of patients related to treatment response.

METHODS: This paper proposes a composite model for subgroup identification and prediction using biclusters. A biclustering technique is first used to identify a set of biclusters from the sampled data. For each bicluster, a subgroup-specific binary classifier is built to determine whether a particular sample is inside or outside the bicluster. A composite model, which consists of all the binary classifiers, is then constructed to classify samples into several disjoint subgroups. The proposed composite model does not depend on any specific biclustering algorithm, bicluster pattern, or classification algorithm.

RESULTS: The composite model was shown to have an overall accuracy of 97.4% for a synthetic dataset consisting of four subgroups. The model was applied to two datasets where the samples' subgroup memberships were known. The procedure showed 83.7% accuracy in discriminating lung cancer adenocarcinoma and squamous carcinoma subtypes, and was able to identify 5 serotypes and several subtypes with about 94% accuracy in a pathogen dataset.

CONCLUSION: The composite model presents a novel approach to developing a biclustering-based classification model from unlabeled sampled data. The proposed approach combines unsupervised biclustering and supervised classification techniques to classify samples into disjoint subgroups based on their associated attributes, such as genotypic factors, phenotypic outcomes, efficacy/safety measures, or responses to treatments. The procedure is useful for identifying unknown species or new biomarkers for targeted therapy.
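The pipeline described under METHODS can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the biclusters have already been produced by some biclustering algorithm and are given as (sample indices, feature indices) pairs. The nearest-centroid in/out classifier, the rule of assigning a sample to the bicluster with the highest positive score, and all names (`BiclusterClassifier`, `CompositeModel`, the toy data) are hypothetical choices made for illustration; the abstract does not specify any of them.

```python
from statistics import mean


def centroid(rows, cols):
    """Mean of each selected feature (column) over the given rows."""
    return [mean(row[c] for row in rows) for c in cols]


def sq_dist(row, cols, center):
    """Squared Euclidean distance from a row to a centroid on selected features."""
    return sum((row[c] - m) ** 2 for c, m in zip(cols, center))


class BiclusterClassifier:
    """Binary inside/outside classifier for one bicluster.

    Assumption: a nearest-centroid rule stands in for whatever
    classification algorithm would be used in practice.
    """

    def __init__(self, data, sample_idx, feature_idx):
        members = set(sample_idx)
        inside = [row for i, row in enumerate(data) if i in members]
        outside = [row for i, row in enumerate(data) if i not in members]
        self.cols = list(feature_idx)
        self.c_in = centroid(inside, self.cols)
        self.c_out = centroid(outside, self.cols)

    def score(self, row):
        # Positive when the sample is closer to the "inside" centroid.
        return sq_dist(row, self.cols, self.c_out) - sq_dist(row, self.cols, self.c_in)


class CompositeModel:
    """Combines one binary classifier per bicluster into a subgroup predictor."""

    def __init__(self, data, biclusters):
        self.classifiers = {
            name: BiclusterClassifier(data, samples, features)
            for name, (samples, features) in biclusters.items()
        }

    def predict(self, row):
        # Assumed combination rule: pick the bicluster with the highest
        # score, or None if no classifier places the sample inside.
        scores = {name: clf.score(row) for name, clf in self.classifiers.items()}
        best = max(scores, key=scores.get)
        return best if scores[best] > 0 else None


# Toy data: 4 samples x 4 features, with two hand-specified biclusters.
data = [
    [5.0, 5.1, 0.0, 0.2],
    [4.8, 5.2, 0.1, 0.0],
    [0.1, 0.0, 5.0, 4.9],
    [0.0, 0.2, 5.2, 5.1],
]
biclusters = {"A": ([0, 1], [0, 1]), "B": ([2, 3], [2, 3])}
model = CompositeModel(data, biclusters)

print(model.predict([5.0, 5.0, 0.0, 0.0]))
print(model.predict([0.0, 0.1, 5.1, 5.0]))
print(model.predict([0.0, 0.0, 0.0, 0.0]))
```

Because each classifier uses only the features of its own bicluster, the sketch keeps the property stated in the abstract that the composite step is independent of how the biclusters were found. A sample that no classifier places inside its bicluster is left unassigned here, which is one possible way to flag a candidate new subgroup.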


Category: Journal Article
PubMed ID: #25347824 DOI: 10.1371/journal.pone.0111318
PubMed Central ID: #PMC4210136
Includes FDA Authors from Scientific Area(s): Toxicological Research
Entry Created: 2014-10-28 Entry Last Modified: 2015-02-07