Principal component analysis for distributional data with application to particle size distributions


Pavlu, I.; van den Boogaart, K. G.; Tolosana Delgado, R.; Machalova, J.; Hron, K.

Abstract

Particle or grain size distributions often play an important role in understanding processes in the geosciences. Functional data analysis allows multivariate methods such as principal component analysis and discriminant analysis to be applied directly to such distributions. In practice, however, the distributions are usually observed only through samples, so each data point is a distribution that carries a sampling error. This additional sampling error changes the properties of the multivariate variance and thus the values, number and directions of the principal components. The result of the principal component analysis becomes an artefact of the sampling error and can negatively affect subsequent data analysis.

Our contribution presents how to compute this sampling error and how to correct for it in the context of principal component analysis. We demonstrate the effect of the sampling error and the effectiveness of the correction with a simulated dataset. We show how the interpretability and reproducibility of the principal components improve and become independent of the choice of basis. We also demonstrate how the correction improves the interpretability of the results on a grain size distribution dataset from river sediments.
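The effect of sampling error on the principal components can be illustrated with a small simulation. The sketch below is not the authors' method: it is a simplified stand-in that works on raw bin proportions with a plug-in multinomial covariance, and all names and parameter values (D, N, n, the single latent direction) are hypothetical choices made only for illustration. It generates distributions that truly vary along one direction, observes each through a finite number of particles, and compares the eigenvalues of the observed covariance with those obtained after subtracting the average estimated sampling covariance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: D size classes, N samples, n particles counted per sample.
D, N, n = 8, 500, 200

# True distributions vary along ONE direction only (it sums to zero,
# so every row of P_true is still a valid probability vector).
direction = np.linspace(-1.0, 1.0, D)
t = rng.uniform(-0.08, 0.08, size=N)
P_true = 1.0 / D + np.outer(t, direction)

# Each distribution is observed only through a finite sample of particles.
counts = np.array([rng.multinomial(n, p) for p in P_true])
P_obs = counts / n

def sorted_eig(C):
    """Eigenvalues (descending) and matching eigenvectors of a symmetric matrix."""
    vals, vecs = np.linalg.eigh(C)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

# Covariance of the observed proportions = true covariance + sampling covariance.
C_obs = np.cov(P_obs, rowvar=False)

# Plug-in estimate of the multinomial sampling covariance, averaged over samples.
# Dividing by (n - 1) makes the per-sample estimate unbiased.
S = np.zeros((D, D))
for p in P_obs:
    S += (np.diag(p) - np.outer(p, p)) / (n - 1)
S /= N

# Naive correction (an assumption of this sketch): subtract the sampling part.
C_corr = C_obs - S

eig_obs, vec_obs = sorted_eig(C_obs)
eig_corr, vec_corr = sorted_eig(C_corr)

print("observed eigenvalues: ", np.round(eig_obs, 5))
print("corrected eigenvalues:", np.round(eig_corr, 5))

# Alignment of the first corrected component with the true direction.
unit_dir = direction / np.linalg.norm(direction)
print("|cos| with true direction:", abs(vec_corr[:, 0] @ unit_dir))
```

In this toy setting the uncorrected covariance shows several spurious non-zero eigenvalues produced purely by counting noise, whereas after the subtraction the spectrum is dominated by the single component that was actually simulated. The corrected matrix is not guaranteed to be positive semi-definite, so small negative eigenvalues can appear.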

Keywords: Compositional data analysis; Stratigraphy and Sedimentology

  • Lecture (Conference)
    The 22nd annual conference of the IAMG, 05.-12.08.2023, Trondheim, Norway

Permalink: https://www.hzdr.de/publications/Publ-38406