Some considerations on the usage of compositional data in artificial intelligence


Tolosana Delgado, R.

Abstract

Spatial dependence between samples is not the only peculiarity that data science, machine learning, deep learning or artificial intelligence must deal with in the geosciences: data on unconventional multivariate statistical scales are also pervasive. As an example, this contribution discusses the effects of the compositional metric on predictive algorithms, i.e., of the way to compare compositional data, which are data informing on the relative importance of the parts forming a whole.

Most machine learning methods (lasso or ridge regression, partition trees, random forests, artificial neural networks of any depth, support-vector machines (SVM) and other kernel methods, etc.) are typically used to extract a prediction rule for a response from a training data set with known responses. If compositional data play any role, either as explanatory or as response variables, their specific scale may affect the performance of these methods.

The compositional scale is induced by the fact that the information conveyed by compositional data is only relative (of one part to another) and by the need to work with closed and non-closed subcompositions (formed by a subset of the parts, either summing to 100% or not). Both issues are most easily handled by taking an invertible set of (log)ratios of the parts, and several alternative logratio transformations have been proposed in the literature. However, for each of the methods above, either there is an objectively preferred transformation or all transformations provide identical results. In this contribution we review the properties of the compositional scale, the available logratio transformations, and how they interact with the methods mentioned above.
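
As a minimal sketch of what such logratio transformations can look like, the Python snippet below implements three transformations commonly cited in the compositional data literature: the additive (alr), centred (clr) and isometric (ilr) logratio transformations. The abstract does not name specific transformations, so this selection, the function names and the Helmert-type basis used for the ilr are illustrative assumptions, not the author's implementation. All parts are assumed strictly positive.

```python
import numpy as np

def closure(x):
    """Rescale the parts so that each composition sums to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def clr(x):
    """Centred logratio: log of each part over the geometric mean."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=-1, keepdims=True)

def alr(x):
    """Additive logratio: log of each part over the last part."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx[..., :-1] - logx[..., -1:]

def helmert_basis(D):
    """(D-1) x D matrix with orthonormal rows that each sum to zero.

    This is one possible basis choice (an assumption of this sketch);
    any other orthonormal basis of the clr plane would also do.
    """
    V = np.zeros((D - 1, D))
    for i in range(1, D):
        V[i - 1, :i] = 1.0 / i
        V[i - 1, i] = -1.0
        V[i - 1] *= np.sqrt(i / (i + 1.0))
    return V

def ilr(x):
    """Isometric logratio: clr scores expressed in an orthonormal basis."""
    x = np.asarray(x, dtype=float)
    return clr(x) @ helmert_basis(x.shape[-1]).T

def clr_inv(z):
    """Back-transform clr scores to a closed composition."""
    return closure(np.exp(z))

# Two hypothetical three-part compositions, for illustration only.
x = np.array([0.1, 0.3, 0.6])
y = np.array([0.2, 0.2, 0.6])

d_clr = np.linalg.norm(clr(x) - clr(y))
d_ilr = np.linalg.norm(ilr(x) - ilr(y))
d_alr = np.linalg.norm(alr(x) - alr(y))
```

In this sketch the Euclidean distance between clr scores equals the distance between ilr scores, whereas the distance between alr scores is generally different. This suggests one reason why a distance-based method may be sensitive to the choice of transformation, while a method that is invariant under invertible linear maps of its inputs may not be.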

  • Lecture (conference contribution)
    The 22nd annual conference of the IAMG, 05.-12.08.2023, Trondheim, Norway

Permalink: https://www.hzdr.de/publications/Publ-38407