Publication Repository - Helmholtz-Zentrum Dresden-Rossendorf

Advances in Principal Balances for Compositional Data

Martin-Fernandez, J. A.; Pawlowsky-Glahn, V.; Egozcue, J. J.; Tolosana-Delgado, R.

Abstract

A prior reduction of dimension is often a necessary step when dealing with large-dimensional data sets (geochemical surveys, microarray data, genetic [...]). Compositional data analysis requires selecting an orthonormal basis with which to work on coordinates. In most cases this selection is based on a data-driven criterion. Principal component analysis provides bases that are, in general, functions of all the original parts, each with a different weight, which hinders their interpretation. For interpretative purposes, it would be better to have each basis component as a ratio or balance of the geometric means of two groups of parts, leaving irrelevant parts with a zero weight. This is the role of principal balances, defined as a sequence of orthonormal balances which successively maximize the explained variance in a data set. The new algorithm to compute principal balances requires an exhaustive search over all the possible sets of orthonormal balances. To reduce computational time, the sets of possible partitions for up to 15 parts are stored. Two other suboptimal, but feasible, algorithms are also introduced: (i) a new search for balances following a constrained principal component approach and (ii) the hierarchical cluster analysis of variables. The latter is a new approach based on the relation between the variation matrix and the Aitchison distance. The properties and performance of these three algorithms are illustrated using a typical data set of geochemical compositions and a simulation exercise.
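As an illustration of two notions used in the abstract, the following minimal Python sketch computes a balance between two groups of parts, the variation matrix, and a hierarchical clustering of the parts. It is not the authors' implementation. The function names (`balance`, `variation_matrix`, `cluster_parts`), the use of the square root of the variation matrix as the distance between parts, and the choice of Ward linkage are assumptions made for this example only.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform


def balance(X, num, den):
    """Balance between two groups of parts for each row (composition) of X.

    num, den: lists of column indices (hypothetical argument names).
    Computes sqrt(r*s/(r+s)) * ln(gm(num parts) / gm(den parts)),
    where gm is the geometric mean and r, s are the group sizes.
    """
    r, s = len(num), len(den)
    log_x = np.log(X)
    coef = np.sqrt(r * s / (r + s))
    # The mean of the logs equals the log of the geometric mean.
    return coef * (log_x[:, num].mean(axis=1) - log_x[:, den].mean(axis=1))


def variation_matrix(X):
    """Variation matrix: T[i, j] = var(ln(x_i / x_j)) across samples."""
    log_x = np.log(X)
    diff = log_x[:, :, None] - log_x[:, None, :]  # shape (n, D, D)
    return diff.var(axis=0, ddof=1)


def cluster_parts(X):
    """Hierarchical clustering of parts (columns), not of samples.

    Assumption of this sketch: sqrt(T[i, j]) serves as the distance
    between parts i and j, combined with Ward linkage.
    """
    d = np.sqrt(variation_matrix(X))
    return linkage(squareform(d, checks=False), method="ward")
```

Each merge in the linkage returned by `cluster_parts` joins two groups of parts, and `balance` can then be evaluated for that pair of groups, giving one balance per merge. The input `X` is assumed to be a strictly positive array with samples in rows and parts in columns; zeros would need to be handled before taking logarithms.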

Keywords: Aitchison norm; Cluster analysis; Compositions; Isometric logratio coordinates; Principal component analysis; Simplex

Permalink: https://www.hzdr.de/publications/Publ-18394