Roederer M, Moore W, Treister A, Hardy RR, Herzenberg LA
Probability binning comparison: a metric for quantitating multivariate distribution differences
Cytometry. 2001 Sep 1;45(1):47-55
PMID: 11598946
URL: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=11598946
Abstract
BACKGROUND: While several algorithms for the comparison of univariate distributions arising from flow cytometric analyses have been developed and studied for many years, algorithms for comparing multivariate distributions remain elusive. Such algorithms could be useful for comparing differences between samples based on several independent measurements, rather than differences based on any single measurement. It is conceivable that distributions could be completely distinct in multivariate space, but unresolvable in any combination of univariate histograms. Multivariate comparisons could also be useful for providing feedback about instrument stability, when only subtle changes in measurements are occurring.
METHODS: We apply a variant of Probability Binning, described in the accompanying article, to multidimensional data. In this approach, hyperrectangles of n dimensions (where n is the number of measurements being compared) comprise the bins used for the chi-squared statistic. These hyperdimensional bins are constructed such that the control sample has the same number of events in each bin; the bins are then applied to the test samples for chi-squared calculations.
RESULTS: Using a Monte Carlo simulation, we determined the distribution of chi-squared values obtained by comparing sets of events from the same distribution; this distribution of chi-squared values was identical to that for the univariate algorithm. Hence, the same formulae can be used to construct a metric, analogous to a t-score, that estimates the probability with which distributions are distinct. As for univariate comparisons, this metric scales with the difference between two distributions, and can be used to rank samples according to similarity to a control. We apply the algorithm to multivariate immunophenotyping data, and demonstrate that it can be used to discriminate distinct samples and to rank samples according to a biologically meaningful difference.
CONCLUSION: Probability binning, as shown here, provides a useful metric for determining the probability with which two or more multivariate distributions represent distinct sets of data. The metric can be used to identify the similarity or dissimilarity of samples. Finally, as demonstrated in the accompanying paper, the algorithm can be used to gate on events in one sample that are different from a control sample, even if those events cannot be distinguished on the basis of any combination of univariate or bivariate displays. Published 2001 Wiley-Liss, Inc.
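The binning step in METHODS can be illustrated with a minimal Python sketch. The abstract does not say how the hyperrectangles are chosen or give the exact form of the statistic, so several details here are assumptions: each bin is split at the median of its highest-variance dimension, the number of bins is a power of two, and the statistic is a chi-squared-like sum over normalized bin frequencies. The function names `build_bins`, `count_in_bins`, and `pb_chi2` are hypothetical, and the t-score-like metric from RESULTS is not reproduced.

```python
import numpy as np

def build_bins(control, n_bins):
    """Split the control sample into hyperrectangles holding near-equal counts.

    Assumption: each bin is halved at the median of its highest-variance
    dimension; n_bins is expected to be a power of two.
    """
    d = control.shape[1]
    # Each entry: (lower bounds, upper bounds, control events inside the bin).
    bins = [(np.full(d, -np.inf), np.full(d, np.inf), control)]
    while len(bins) < n_bins:
        next_bins = []
        for lo, hi, pts in bins:
            dim = int(np.argmax(pts.var(axis=0)))
            cut = float(np.median(pts[:, dim]))
            hi_left = hi.copy()
            hi_left[dim] = cut
            lo_right = lo.copy()
            lo_right[dim] = cut
            next_bins.append((lo, hi_left, pts[pts[:, dim] < cut]))
            next_bins.append((lo_right, hi, pts[pts[:, dim] >= cut]))
        bins = next_bins
    return [(lo, hi) for lo, hi, _ in bins]

def count_in_bins(sample, bins):
    """Count events of `sample` in each half-open hyperrectangle [lo, hi)."""
    counts = np.empty(len(bins), dtype=int)
    for i, (lo, hi) in enumerate(bins):
        inside = np.all((sample >= lo) & (sample < hi), axis=1)
        counts[i] = int(inside.sum())
    return counts

def pb_chi2(control, test, bins):
    """Chi-squared-like statistic on normalized bin frequencies (assumed form)."""
    c = count_in_bins(control, bins) / len(control)
    s = count_in_bins(test, bins) / len(test)
    denom = c + s
    keep = denom > 0  # skip bins that are empty in both samples
    return float(np.sum((c[keep] - s[keep]) ** 2 / denom[keep]))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    control = rng.normal(size=(4096, 3))
    bins = build_bins(control, 64)
    same = rng.normal(size=(4096, 3))
    shifted = rng.normal(loc=0.5, size=(4096, 3))
    print("same distribution:   ", pb_chi2(control, same, bins))
    print("shifted distribution:", pb_chi2(control, shifted, bins))
```

Under these assumptions the statistic is zero when a sample is compared with itself and grows as the test sample drifts away from the control, which is the property the abstract relies on for ranking samples. Heavily tied (discrete) data would need a tie-breaking rule that this sketch omits.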
Notes
21481684
0196-4763
Journal Article
