Poisson lognormal models for multivariate count data analysis

Rencontres chaire MMB

Julien Chiquet

MIA Paris-Saclay, INRAE

Mahendra Mariadassou

MaIAGE, INRAE

Stéphane Robin

LPSM, SU

Bastien Batardière

PhD 2022 - 2024

Jeanne Tous

PhD 2024-2026

May 6, 2025

Motivations

Multivariate count data

Routinely gathered in ecology/microbiology/genomics


Data tables

  • Abundances: read counts of species/transcripts \(j\) in sample \(i\)
  • Covariates: value of environmental variable \(k\) in sample \(i\)
  • Offsets: sampling effort for species/transcripts \(j\) in sample \(i\)

Goals

  • understand environmental effects (regression, classification)
  • exhibit patterns of diversity (clustering, dimension reduction)
  • understand between-species interactions (covariance selection)

\(\rightsquigarrow\) In ecology: help understanding underlying mechanisms like dispersion, abiotic/biotic effects

Two illustrative examples

Microcosm (Mariadassou et al. 2023)

Study carried out at INRAE (“Domaine Expérimental du Pin”)

  • microbiotas of 44 lactating cows at 4 body site \(\times\) 4 time points
  • Abundances of \(\approx\) 1200 “species” (Amplicon Sequence Variants)

Tropical freshwater fishes (Trinidad Island)

Collab between Jeanne Tous (PhD) and Anne Magurran (Saint-Andrews)

  • 16 sites (8 streams \(\times\) disturbed or not \(\times\) 19 dates
  • \(n=304\) measures of abundances of \(p=21\) species

                        

Tropical freshwater fishes in Trinidad

Table of Counts/ Abundances (first rows)

Anabhart Andipulc Crenfren Hemitaen Hyporobi Poecreti Astybima Coryaene Coryriis Ancimara Hoplmala Rhamquel Odonpulc Hemiunil Synbmarm Roebdien Dajamont Oreosp Awaobana Gymncara
2 14 0 0 0 187 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 4 3 1 1 197 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 0 44 0 73 156 0 10 0 0 0 0 0 0 0 0 0 0 0

Raw counts

Log-scale

Tropical fishes

Table of covariates/ exogenous variables (first rows)

latitude longitude altitude width depth volume garbage conductivity O2 pH temperature turbidity coarse_gravel fine_gravel leaf_litter cobble sand silt boulders canopy season year stream disturbance
10.70517 -61.31968 448 5.633333 11.00000 3.491128 9 114.7 8.00 0.00 23.8 2 6.667 10.000 0 30.000 3.333 3.333 18.333 1.619789 dry_start 2011 Lopinot no
10.68905 -61.31958 387 10.866667 24.45714 4.123472 15 146.1 7.86 0.00 23.8 2 25.000 11.667 0 25.000 8.333 0.000 30.000 1.669007 dry_start 2011 Lopinot yes
10.65030 -61.22068 59 13.200000 31.22581 4.314058 90 307.7 7.82 7.69 24.5 3 20.000 23.333 0 6.667 11.667 6.667 1.667 1.583577 dry_start 2011 LowerAripo yes

PCA on covariates: no strong temporal/season effect

Tropical fishes

Table of covariates/ exogenous variables (first rows)

latitude longitude altitude width depth volume garbage conductivity O2 pH temperature turbidity coarse_gravel fine_gravel leaf_litter cobble sand silt boulders canopy season year stream disturbance
10.70517 -61.31968 448 5.633333 11.00000 3.491128 9 114.7 8.00 0.00 23.8 2 6.667 10.000 0 30.000 3.333 3.333 18.333 1.619789 dry_start 2011 Lopinot no
10.68905 -61.31958 387 10.866667 24.45714 4.123472 15 146.1 7.86 0.00 23.8 2 25.000 11.667 0 25.000 8.333 0.000 30.000 1.669007 dry_start 2011 Lopinot yes
10.65030 -61.22068 59 13.200000 31.22581 4.314058 90 307.7 7.82 7.69 24.5 3 20.000 23.333 0 6.667 11.667 6.667 1.667 1.583577 dry_start 2011 LowerAripo yes

PCA on covariates: spatial effect

Models for multivariate count data

If we were in a Gaussian world…

The general linear model (Mardia, Kent, and Bibby 1979) would be appropriate! For each sample \(i = 1,\dots,n\),

\[\underbrace{\mathbf{Y}_i}_{\text{abundances}} = \underbrace{\mathbf{x}_i^\top \mathbf{B}}_{\text{covariates}} + \underbrace{\mathbf{o}_i}_{\text{sampling effort}} + \boldsymbol\varepsilon_i, \quad \boldsymbol\varepsilon_i \sim \mathcal{N}(\mathbf{0}_p, \underbrace{\boldsymbol\Sigma}_{\text{between-species dependencies}})\]

null covariance \(\Leftrightarrow\) independence \(\rightsquigarrow\) uncorrelated species/transcripts do not interact

This model gives birth to Principal Component Analysis, Discriminant Analysis, Gaussian Graphical Models, Gaussian Mixture models and many others \(\dots\)

With count data…

There is no generic model for multivariate counts

  • Data transformation (log, \(\sqrt{}\)): quick and dirty
  • Non-Gaussian multivariate distributions (Inouye et al. 2017): do not scale to data dimension yet
  • Latent variable models: interaction occur in a latent (unobserved) layer


Poisson-lognormal Family
Model, inference

Model for multivariate count data

The Poisson Lognormal model (PLN)

PLN (Aitchison and Ho 1989) is a multivariate generalized linear model, where

  • the counts \(\mathbf{Y}_i\in\mathbb{N}^p\) are the response variables
  • the main effect is due to a linear combination of the covariates \(\mathbf{x}_i\in\mathbb{R}^d\)
  • a vector of offsets \(\mathbf{o}_i\in\mathbb{R}^p\) can be specified for each sample

\[ \quad \mathbf{Y}_i | \mathbf{Z}_i \sim^{\text{iid}} \mathcal{P}\left(\exp\{\mathbf{Z}_i\}\right), \qquad \mathbf{Z}_i \sim \mathcal{N}({\underbrace{\mathbf{o}_i + \mathbf{x}_i^\top\mathbf{B}}}_{{\boldsymbol\mu}_i},\boldsymbol\Sigma). \]

Typically, \(n\approx 10s \to 1000s\), \(p\approx 10s \to 1000s\), \(d \approx 1 \to 10\)

Properties: over-dispersion, arbitrary-signed covariances

  • mean: \(\mathbb{E}(Y_{ij}) = \exp \left( o_{ij} + \mathbf{x}_i^\top {\mathbf{B}}_{\cdot j} + \sigma_{jj}/2\right) > 0\)
  • variance: \(\mathbb{V}(Y_{ij}) = \mathbb{E}(Y_{ij}) + \mathbb{E}(Y_{ij})^2 \left( e^{\sigma_{jj}} - 1 \right) > \mathbb{E}(Y_{ij})\)
  • covariance: \(\mathrm{Cov}(Y_{ij}, Y_{ik}) = \mathbb{E}(Y_{ij}) \mathbb{E}(Y_{ik}) \left( e^{\sigma_{jk}} - 1 \right).\)

Natural extensions of PLN (1)

Various tasks of multivariate analysis

  • Dimension Reduction: rank constraint matrix \(\boldsymbol\Sigma\). (Chiquet, Mariadassou, and Robin 2018)

    \[\mathbf{Z}_i \sim \mathcal{N}({\boldsymbol\mu}_i, \boldsymbol\Sigma = \mathbf{C}\mathbf{C}^\top), \quad \mathbf{C} \in \mathcal{M}_{pk} \text{ with orthogonal columns}.\]

  • Classification: maximize separation between groups with means (Chiquet, Mariadassou, and Robin 2021)

\[\mathbf{Z}_i \sim \mathcal{N}(\sum_k {\boldsymbol\mu}_k \mathbf{1}_{\{i\in k\}}, \boldsymbol\Sigma), \quad \text{for known memberships}.\]

\[\mathbf{Z}_i \mid i \in k \sim \mathcal{N}(\boldsymbol\mu_k, \boldsymbol\Sigma_k), \quad \text{for unknown memberships}.\]

\[\mathbf{Z}_i \sim \mathcal{N}({\boldsymbol\mu}_i, \boldsymbol\Sigma = \boldsymbol\Omega^{-1}), \quad \|\boldsymbol\Omega \|_1 < c.\]

Natural extensions of PLN (2)

Recent/on-going extensions

\[\begin{array}{rrl} \text{PLN latent space} & \boldsymbol{Z}_i = (Z_{ij})_{j=1\dots p} & \sim \mathcal{N}(\mathbf{x}_i^\top \mathbf{B}, \mathbf{\Sigma}), \\[1.5ex] \text{excess of zeros} & \boldsymbol{W}_{i} = (W_{ij})_{j=1\dots p} & \sim \otimes_{j=1}^p \cal B(\pi_{ij}), \\[1.5ex] \text{observation space} & Y_{ij} \, | \, W_{ij}, Z_{ij} & \sim^\text{indep} W_{ij}\delta_0 + (1-W_{ij})\mathcal{P} \left(\exp\{o_{ij} + Z_{ij}\}\right),\\ \end{array}\]

\[\mathbf{Z}_i \sim \mathcal{N}({\boldsymbol\mu}_i, \boldsymbol\Sigma), \quad \mathbf{Z}_i| \mathbf{Z}_{i-1} = \boldsymbol{\Phi}\mathbf{Z}_{i-1} + \mathcal{N}({\boldsymbol\mu}_i^\varepsilon, \boldsymbol\Sigma^\varepsilon).\]

  • \({\boldsymbol\mu}_i^\varepsilon = {\boldsymbol\mu}_i - \boldsymbol{\Phi}{\boldsymbol\mu}_{i-1} \boldsymbol{\Phi}\) with
  • \({\boldsymbol\Sigma}^\varepsilon = {\boldsymbol\Sigma} - \boldsymbol{\Phi}{\boldsymbol\Sigma}_{i-1} \boldsymbol{\Phi}\)
  • \(\boldsymbol{\Phi}\) the positive-definite auto-correlation matrix such as \({\boldsymbol\Sigma}^\varepsilon \succeq 0\).

Estimation

Latent-variable models

Estimate \(\theta = (\mathbf{B}, {\boldsymbol\Sigma})\), predict the \(\mathbf{Z}_i\), while the model marginal likelihood is

\[\begin{equation*} p_\theta(\mathbf{Y}_i) = \int_{\mathbb{R}_p} \prod_{j=1}^p p_\theta(Y_{ij} | Z_{ij}) \, p_\theta(\mathbf{Z}_i) \mathrm{d}\mathbf{Z}_i \end{equation*}\]

Direct approach

Expectation-Maximization

\[\begin{equation*} \log p_\theta(\mathbf{Y}) = \mathbb{E}_{p_\theta(\mathbf{Z}\,|\,\mathbf{Y})} [\log p_\theta(\mathbf{Y}, \mathbf{Z})] + \mathcal{H}[p_\theta(\mathbf{Z}\,|\,\mathbf{Y})], \text{ with } \mathcal{H}(p) = -\mathbb{E}_p(\log(p)). \end{equation*}\]

EM requires to evaluate (some moments of) \(p_\theta(\mathbf{Z} \,|\, \mathbf{Y})\), but there is no close form!

Variational Inference

Use a proxy \(q_\psi\) of \(p_\theta(\mathbf{Z}\,|\,\mathbf{Y})\) minimizing a divergence in a class \(\mathcal{Q}\) (e.g, Küllback-Leibler)

\[\begin{equation*} q_\psi(\mathbf{Z})^\star \in \arg\min_{q\in\mathcal{Q}} D\left(q(\mathbf{Z}), p(\mathbf{Z} | \mathbf{Y})\right), \, \text{e.g.}, D(.,.) = KL(., .) = \mathbb{E}_{q_\psi}\left[\log \frac{q(z)}{p(z)}\right]. \end{equation*}\]

and maximize the ELBO (Evidence Lower BOund)

\[\begin{equation*} J(\theta, \psi) = \log p_\theta(\mathbf{Y}) - KL[q_\psi (\mathbf{Z}) || p_\theta(\mathbf{Z} | \mathbf{Y})] = \mathbb{E}_{\psi} [\log p_\theta(\mathbf{Y}, \mathbf{Z})] + \mathcal{H}[q_\psi(\mathbf{Z})] = \frac{1}{n} \sum_{i = 1}^n J_i(\theta, \psi_i) \end{equation*}\]

Resulting Variational EM

  • VE step: optimize \(\boldsymbol{\psi}\) (can be written individually) \[\begin{equation*} \psi_i^{(h)} = \arg \max J_{i}(\theta^{(h)}, \psi_i) \left( = \arg\min_{q_i} KL[q_i(\mathbf{Z}_i) \,||\, p_{\theta^h}(\mathbf{Z}_i\,|\,\mathbf{Y}_i)] \right) \end{equation*}\]

  • M step: optimize \(\theta\) \[\theta^{(h)} = \arg\max \frac{1}{n}\sum_{i=1}^{n}J_{Y_i}(\theta, \psi_i^{(h)})\]

Implementations

See https://pln-team.github.io/repositories/: documentation, vignettes, examples, etc.

PLNmodels Medium scale problems (R/C++ package)

Up to thousands of sites ( \(n \approx 1000s\) ), hundreds of species ( \(p\approx 100s\) )

  • V-EM: exact M-step + optimization for VE-stem
  • algorithm: conservative convex separable approximations (Svanberg 2002)
  • implementation: NLopt nonlinear-optimization library (Johnson 2011)

pyPLNmodels Large scale problems (Python/Pytorch module)

Up to \(n \to 100,000\) and \(p\approx 10,000s\)

  • V-EM: exact M-step + optimization for VE-stem
  • algorithm: Rprop (gradient sign + adaptive variable-specific update) (Riedmiller and Braun 1993)
  • implementation: torch with GPU auto-differentiation (Paszke et al. 2017)

Other optimisation approaches

MCMC techniques on reduced sampling space

Variational Auto-Encoders and PLN-PCA

  • The Decoder is the generative model \(p_{\theta}(\mathbf{Y}_i | \mathbf{Z}_i)\)
  • The Encoder approximates the posterior distribution with \(q_\psi(\mu_i,\sigma^2_i)\)
  • VAE maximize a lower bound of the marginal \(\log p_{\theta}(\mathbf{Y}_i)\)

Variational estimator: properties

M-estimation framework (Van der Vaart 2000)

Let \(\bar{J}_n \; : \theta \mapsto \frac{1}{n}\sum_{i=1}^{n} \arg \max_{\psi}J_i(\theta, \hat{\psi}_i) \stackrel{\Delta}{=} \frac{1}{n}\sum_{i=1}^{n} \bar{J}_i(\theta)\)

Theorem (Batardière, Chiquet, and Mariadassou 2024)

Let \(\hat{\theta}^{\text{ve}} = \arg\max_{\theta} \bar{J}_{n}(\theta)\). If \(\bar{J}_n\) is smooth enough (e.g. when \(\theta\) and \(\psi_i\) are restricted to compact sets), \[ \sqrt{n}(\hat{\theta}^{\text{ve}} - \bar{\theta}) \xrightarrow[]{d} \mathcal{N}(0, V(\bar{\theta})), \quad \text{where } V(\theta) = C(\theta)^{-1} D(\theta) C(\theta)^{-1} \] for \(C(\theta) = \mathbb{E}[\nabla_{\theta\theta} \bar{J}(\theta) ]\) and \(D(\theta) = \mathbb{E}\left[(\nabla_{\theta} \bar{J}(\theta)) (\nabla_{\theta} \bar{J}(\theta)^\intercal \right]\)

Caveat

  • Open question: \(\bar{\theta} = \theta^\star\) ? No formal results as \(\bar{J}\) is untractable but numerical evidence suggests so.
  • For \(\theta = (\mathbf{B}, \boldsymbol\Omega)\), \(\hat{C}_n\) requires the inversion of \(n\) matrices with \(dp^3\) rows/columns…
  • Assume blockwise diagonal matrix \(\hat{C}_n\) by neglecting the cross-terms between \(\mathbf{B}\) and \(\boldsymbol\Sigma\).

Illustration tropical freshwater fishes

\(n=304\) measures of abundances of \(p=21\) species in 16 sites (8 streams \(\times\) disturbed or not \(\times\) 19 dates

                        

Multivariate regression (1)

Assess environnemental/abiotic effects

\[ \quad \mathbf{Y}_i | \mathbf{Z}_i \sim^{\text{iid}} \mathcal{P}\left(\exp\{\mathbf{Z}_i\}\right), \qquad \mathbf{Z}_i \sim \mathcal{N}({\underbrace{\mathbf{o}_i + \mathbf{x}_i^\top\mathbf{B}}}_{{\boldsymbol\mu}_i},\boldsymbol\Sigma). \]

NR_PLN            <- PLN(Abundance ~ 1, data = NR_DATA)
NR_PLN_Stream     <- PLN(Abundance ~ 0 + stream, data = NR_DATA)
NR_PLN_Stream_PC  <- PLN(Abundance ~ 0 + stream + PC1, data = NR_DATA)
NR_PLN_Stream_PC2 <- PLN(Abundance ~ 0 + stream + PC1 + PC2, data = NR_DATA)


Model selection criteria
model nb_param loglik BIC ICL
PLN ~ 1 230 -9393.133 -10050.592 -17212.93
PLN ~ stream 370 -8292.598 -9350.248 -14740.56
PLN ~ stream + PC1 390 -8206.340 -9321.161 -14573.90
PLN ~ stream + PC1 + PC2 410 -8192.004 -9363.995 -14760.17

We continue with NR_PLN_Stream_PC.

Multivariate regression (2)

Coefficients: \(\hat{\mathbf{B}}\)

Multivariate regression (2)

Coefficients: variance \(\hat{\mathbb{V}}(\hat{\mathbf{B}})\) (keep clustering on \(\hat{\mathbf{B}}\))

Multivariate regression (2)

Coefficients: \(z\)-score (keep clustering on \(\hat{\mathbf{B}}\))

Multivariate regression (3)

Mean effect: \(\hat{\boldsymbol{\mu}} = \mathbf{O} + \mathbf{X} \hat{\mathbf{B}}\)

Multivariate regression (4)

Fit: \(\log \hat{\mathbb{E}}(\mathbf{Y}) = \mathbf{O} + \mathbf{X} \hat{\mathbf{B}} + \mathbf{M} + \frac 12 \mathbf{S^2}\)

Multivariate regression (4)

Fit (per site)

Dimension reduction (1)

Exhibit patterns of diversity

\[\quad \mathbf{Y}_i | \mathbf{Z}_i \sim^{\text{iid}} \mathcal{P}\left(\exp\{\mathbf{Z}_i\}\right), \qquad \mathbf{Z}_i \sim \mathcal{N}({\boldsymbol\mu}_i, \boldsymbol\Sigma = \mathbf{C}\mathbf{C}^\top), \quad \mathbf{C} \in \mathcal{M}_{pk} \text{ with orthogonal columns}.\]

Model selection criteria for rank (number of PC)

model nb_param loglik BIC ICL
PLN ~ stream + PC1 390 -8206.340 -9321.161 -14573.897
PLNPCA ~ PC1 159 -6565.395 -7019.898 -6699.009
PLNPCA ~ stream + PC1 270 -5881.358 -6653.157 -6562.650

Dimension reduction (2)

Visualization: do not correct for stream effect

Dimension reduction (2)

Visualization: do not correct for stream effect

Dimension reduction (3)

Visualization: correct for stream effect

Dimension reduction (4)

Coefficients (correct for Stream): \(\hat{\mathbf{B}}^\text{PLNPCA}\)

Dimension reduction (4)

Coefficients: \(\hat{\mathbf{B}}^\text{PLN}\)

Dimension reduction (5)

Mean effect: \(\hat{\boldsymbol{\mu}}^\text{PLNPCA}\)

Dimension reduction (5)

Mean effect: \(\hat{\boldsymbol{\mu}}^\text{PLN}\)

Dimension reduction (6)

Covariance/Correlation (singular): \(\hat{\boldsymbol{\Sigma}}^\text{PLNPCA}\)

Dimension reduction (6)

Covariance/Correlation (singular): \(\hat{\boldsymbol{\Sigma}}^\text{PLN}\)

Discriminant Analysis (1)

Assess environnemental/abiotic effects by classification

Objective

Maximize separation between groups with means (Chiquet, Mariadassou, and Robin 2021) \[ \quad \mathbf{Y}_i | \mathbf{Z}_i \sim^{\text{iid}} \sim \mathcal{N}(\sum_k {\boldsymbol\mu}_k \mathbf{1}_{\{i\in k\}}, \boldsymbol\Sigma), \quad \text{for known memberships}. \]

\(\rightsquigarrow\) Find the linear combination(s) \(\mathbf{Z} u\) maximizing separation between groups


Solution

Find the first eigenvectors of \(\mathbf{\Sigma}_w^{-1} \mathbf{\Sigma}_b\) where

  • \(\mathbf{\Sigma}_w\) is the within-group variance matrix, i.e. the unbiased estimated of \({\boldsymbol\Sigma}\):
  • \(\mathbf{\Sigma}_b\) is the between-group variance matrix, estimated from \(\boldsymbol{\mu_k}\).

Discriminant Analysis (2)

PLN-LDA is a “special” PLN with a grouping variable added to the design matrix + post-treatment

NR_PLNLDA <- PLNLDA(Abundance ~ PC1, grouping = stream, data = NR_DATA)

\(\rightsquigarrow\) Group means are similar to PLN regression coefficients associated with stream

First two discriminant axes

Correlation with original variables

Discriminant Analysis (4)

Group means (PLN-LDA vs PLN)

Classifier

Supervised-learning: provide a classifier to predict group newly observed site:

set.seed(1969)
train_set <- sample.int(nrow(NR_DATA), 200)
test_set  <- setdiff(1:nrow(NR_DATA), train_set)
NR_train <- NR_DATA[train_set, ]
NR_test   <- NR_DATA[test_set, ]
pred_stream <- predict(NR_PLNLDA, newdata = NR_test, type = "response")
aricode::ARI(pred_stream, NR_test$stream)
[1] 0.8908636

Association Network (1)

Inference

NR_PLNNET_all <- PLNnetwork(Abundance ~ PC1, data = NR_DATA, control = PLNnetwork_param(penalize_diagonal = FALSE))
NR_PLNNET_all$stability_selection()
NR_PLNNET_Stream_all <- PLNnetwork(Abundance ~ PC1 + stream, data = NR_DATA, control = PLNnetwork_param(penalize_diagonal = FALSE))
NR_PLNNET_Stream_all$stability_selection()

Chosing the network’s density

Without stream as covariate

With stream as covariate

Association Network (2)

Network selected without stream as covariate

Association Network (2)

Network selected with stream as covariate

Association Network (2)

Precision/Covariance matrices (without stream)

Precision/Covariance matrices (with stream)

Association Network (3)

Model selection criteria

model nb_param loglik BIC ICL
PLN ~ stream + PC1 390 -8206.340 -9321.161 -14573.897
PLNPCA ~ PC1 159 -6565.395 -7019.898 -6699.009
PLNPCA ~ stream + PC1 270 -5881.358 -6653.157 -6562.650
PLNNET ~ PC1 77 -9619.069 -9839.174 -16524.285
PLNNET ~ stream + PC1 211 -8312.991 -8916.138 -14484.045

Fit

Zero-Inflation (1)

\[\begin{array}{rrl} \text{PLN latent space} & \boldsymbol{Z}_i = (Z_{ij})_{j=1\dots p} & \sim \mathcal{N}(\mathbf{x}_i^\top \mathbf{B}, \mathbf{\Sigma}), \\[1.5ex] \text{excess of zeros} & \boldsymbol{W}_{i} = (W_{ij})_{j=1\dots p} & \sim \otimes_{j=1}^p \cal B(\pi_{ij}), \\[1.5ex] \text{observation space} & Y_{ij} \, | \, W_{ij}, Z_{ij} & \sim^\text{indep} W_{ij}\delta_0 + (1-W_{ij})\mathcal{P} \left(\exp\{o_{ij} + Z_{ij}\}\right),\\ \end{array}\]


Modeling of the pure zero component

\[\begin{align*} \pi_{ij} & = \pi \in [0,1] & \text{(single global parameter)} \\ \pi_{ij} & = \pi_j \in [0,1] & \text{(species dependent)} \\ \pi_{ij} & = \pi_i \in [0,1] & \text{(site dependent)} \\ \pi_{ij} & = \mathrm{logit}^{-1}( \boldsymbol X^{0}\boldsymbol B^0)_{ij}, ~ \boldsymbol X^0 \in \mathbb R^{n \times d_0}, ~ \boldsymbol B^0 \in \mathbb R^{d_0\times p} & \text{(site covariates)} \\ \pi_{ij} & = \mathrm{logit}^{-1}(\bar{\boldsymbol{B}}^0 \bar{\boldsymbol{X}}^{0})_{ij}, ~ \bar{\boldsymbol{B}}^0 \in \mathbb R^{n \times d_0}, ~ \bar{\boldsymbol X}^0 \in \mathbb R^{d_0\times p} & \text{(species covariates)} \end{align*}\]

Zero-Inflation (2)

Fit models with covariate effect in ZI/PLN part

NR_ZIPLN  <- ZIPLN(Abundance ~ PC1 | 0 + stream,  data = NR_DATA)
NR_ZIPLNNET_all <- ZIPLNnetwork(Abundance ~ PC1 | 0 + stream,  data = NR_DATA, control=ZIPLNnetwork_param(penalize_diagonal=FALSE))
NR_ZIPLNNET <- NR_ZIPLNNET_all$getBestModel("BIC")

\(\rightsquigarrow\) Stream effect now caught by the ZI component

Model selection criteria

model nb_param loglik BIC ICL
PLN ~ stream + PC1 390 -8206.340 -9321.161 -14573.897
PLNPCA ~ PC1 159 -6565.395 -7019.898 -6699.009
PLNPCA ~ stream + PC1 270 -5881.358 -6653.157 -6562.650
PLNNET ~ PC1 77 -9619.069 -9839.174 -16524.285
PLNNET ~ stream + PC1 211 -8312.991 -8916.138 -14484.045
ZIPLN ~ PC1 | stream 290 -8548.828 -9377.797 -14456.248
ZIPLNNET ~ PC1 | stream 117 -8587.148 -8921.594 -14322.612

Zero-Inflation (2)

Fitted probability of zero-inflation: \(\hat{\boldsymbol{\pi}}\)

\(\rightsquigarrow\) The ZI component can be used as a model for presence/absence (large part explained by the stream)

Zero-Inflation (2)

Coefficient in ZI part (Stream)

Conclusion

Take-home message

  • PLN = generic model for multivariate count data analysis
  • Flexible modeling of the covariance structure, allows for covariates
  • Efficient Variational EM algorithms
  • Hopefully user-friendly tools available

https://computo.sfds.asso.fr, a journal promoting reproducible research in ML and stat.

References

Aitchison, J., and C. H. Ho. 1989. “The Multivariate Poisson-Log Normal Distribution.” Biometrika 76 (4): 643–53.
Baey, Charlotte, Maud Delattre, Estelle Kuhn, Jean-Benoist Leger, and Sarah Lemler. 2023. “Efficient Preconditioned Stochastic Gradient Descent for Estimation in Latent Variable Models.” In International Conference on Machine Learning, 1430–53. PMLR.
Batardiere, Bastien, Joon Kwon, and Julien Chiquet. 2024. “pyPLNmodels: A Python Package to Analyze Multivariate High-Dimensional Count Data.” Journal of Open Source Software 9 (104): 6969.
Batardière, Bastien, Julien Chiquet, François Gindraud, and Mahendra Mariadassou. 2024. “Zero-Inflation in the Multivariate Poisson Lognormal Family.” arXiv Preprint. https://arxiv.org/abs/2405.14711.
Batardière, Bastien, Julien Chiquet, Joon Kwon, and Julien Stoehr. 2025. “Importance Sampling-Based Gradient Method for Dimension Reduction in Poisson Log-Normal Model.” Electronic Journal of Statistics.
Batardière, Bastien, Julien Chiquet, and Mahendra Mariadassou. 2024. “Evaluating Parameter Uncertainty in the Poisson Lognormal Model with Corrected Variational Estimators.” arXiv Preprint arXiv:2411.08524.
Chiquet, Julien, Mahendra Mariadassou, and Stéphane Robin. 2018. “Variational Inference for Probabilistic Poisson PCA.” The Annals of Applied Statistics 12: 2674–98. http://dx.doi.org/10.1214/18-AOAS1177.
———. 2019. “Variational Inference for Sparse Network Reconstruction from Count Data.” In Proceedings of the 19th International Conference on Machine Learning (ICML 2019).
———. 2021. “The Poisson-Lognormal Model as a Versatile Framework for the Joint Analysis of Species Abundances.” Frontiers in Ecology and Evolution 9. https://doi.org/10.3389/fevo.2021.588292.
Inouye, David I, Eunho Yang, Genevera I Allen, and Pradeep Ravikumar. 2017. “A Review of Multivariate Distributions for Count Data Derived from the Poisson Distribution.” Wiley Interdisciplinary Reviews: Computational Statistics 9 (3).
Johnson, Steven G. 2011. The NLopt Nonlinear-Optimization Package. http://ab-initio.mit.edu/nlopt.
Mardia, K. V., J. T. Kent, and J. M. Bibby. 1979. Multivariate Analysis. Academic press.
Mariadassou, Mahendra, Laurent X Nouvel, Fabienne Constant, Diego P Morgavi, Lucie Rault, Sarah Barbey, Emmanuelle Helloin, et al. 2023. “Microbiota Members from Body Sites of Dairy Cows Are Largely Shared Within Individual Hosts Throughout Lactation but Sharing Is Limited in the Herd.” Animal Microbiome 5 (1): 32.
Paszke, Adam, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. 2017. “Automatic Differentiation in Pytorch.”
Riedmiller, Martin, and Heinrich Braun. 1993. “A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm.” In IEEE International Conference on Neural Networks, 586–91. IEEE.
Stoehr, Julien, and Stephane S Robin. 2024. “Composite Likelihood Inference for the Poisson Log-Normal Model.” arXiv Preprint arXiv:2402.14390.
Svanberg, Krister. 2002. “A Class of Globally Convergent Optimization Methods Based on Conservative Convex Separable Approximations.” SIAM Journal on Optimization 12 (2): 555–73.
Tikhonov, Gleb, Øystein H Opedal, Nerea Abrego, Aleksi Lehikoinen, Melinda MJ de Jonge, Jari Oksanen, and Otso Ovaskainen. 2020. “Joint Species Distribution Modelling with the r-Package Hmsc.” Methods in Ecology and Evolution 11 (3): 442–47.
Van der Vaart, Aad W. 2000. Asymptotic Statistics. Vol. 3. Cambridge university press.