Poisson lognormal models for multivariate count data analysis

Rencontres chaire MMB

Julien Chiquet

MIA Paris-Saclay, INRAE

Mahendra Mariadassou

MaIAGE, INRAE

Stéphane Robin

LPSM, SU

Bastien Batardière

PhD 2022 - 2024

Jeanne Tous

PhD 2024-2026

May 6, 2025

Motivations

Multivariate count data

Routinely gathered in ecology/microbiology/genomics

Data tables

Abundances: read counts of species/transcripts \(j\) in sample \(i\)
Covariates: value of environmental variable \(k\) in sample \(i\)
Offsets: sampling effort for species/transcripts \(j\) in sample \(i\)

Goals

understand environmental effects (regression, classification)
exhibit patterns of diversity (clustering, dimension reduction)
understand between-species interactions (covariance selection)

\(\rightsquigarrow\) In ecology: help understanding underlying mechanisms like dispersion, abiotic/biotic effects

Two illustrative examples

Microcosm (Mariadassou et al. 2023)

Study carried out at INRAE (“Domaine Expérimental du Pin”)

microbiotas of 44 lactating cows at 4 body site \(\times\) 4 time points
Abundances of \(\approx\) 1200 “species” (Amplicon Sequence Variants)

Tropical freshwater fishes (Trinidad Island)

Collab between Jeanne Tous (PhD) and Anne Magurran (Saint-Andrews)

16 sites (8 streams \(\times\) disturbed or not \(\times\) 19 dates
\(n=304\) measures of abundances of \(p=21\) species

Tropical freshwater fishes in Trinidad

Table of Counts/ Abundances (first rows)

Anabhart	Andipulc	Crenfren	Hemitaen	Hyporobi	Poecreti	Astybima	Coryriis
2	14	0	0	0	187	0	0
1	4	3	1	1	197	0	0
0	1	0	44	0	73	156	10

Raw counts

Log-scale

Tropical fishes

Table of covariates/ exogenous variables (first rows)

latitude	longitude	altitude	width	depth	volume	garbage	conductivity	O2	pH	temperature	turbidity	coarse_gravel	fine_gravel	cobble	sand	silt	boulders	canopy	season	year	stream	disturbance
10.70517	-61.31968	448	5.633333	11.00000	3.491128	9	114.7	8.00	0.00	23.8	2	6.667	10.000	30.000	3.333	3.333	18.333	1.619789	dry_start	2011	Lopinot	no
10.68905	-61.31958	387	10.866667	24.45714	4.123472	15	146.1	7.86	0.00	23.8	2	25.000	11.667	25.000	8.333	0.000	30.000	1.669007	dry_start	2011	Lopinot	yes
10.65030	-61.22068	59	13.200000	31.22581	4.314058	90	307.7	7.82	7.69	24.5	3	20.000	23.333	6.667	11.667	6.667	1.667	1.583577	dry_start	2011	LowerAripo	yes

PCA on covariates: no strong temporal/season effect

Tropical fishes

Table of covariates/ exogenous variables (first rows)

latitude	longitude	altitude	width	depth	volume	garbage	conductivity	O2	pH	temperature	turbidity	coarse_gravel	fine_gravel	cobble	sand	silt	boulders	canopy	season	year	stream	disturbance
10.70517	-61.31968	448	5.633333	11.00000	3.491128	9	114.7	8.00	0.00	23.8	2	6.667	10.000	30.000	3.333	3.333	18.333	1.619789	dry_start	2011	Lopinot	no
10.68905	-61.31958	387	10.866667	24.45714	4.123472	15	146.1	7.86	0.00	23.8	2	25.000	11.667	25.000	8.333	0.000	30.000	1.669007	dry_start	2011	Lopinot	yes
10.65030	-61.22068	59	13.200000	31.22581	4.314058	90	307.7	7.82	7.69	24.5	3	20.000	23.333	6.667	11.667	6.667	1.667	1.583577	dry_start	2011	LowerAripo	yes

PCA on covariates: spatial effect

Models for multivariate count data

If we were in a Gaussian world…

The general linear model (Mardia, Kent, and Bibby 1979) would be appropriate! For each sample \(i = 1,\dots,n\),

\[\underbrace{\mathbf{Y}_i}_{\text{abundances}} = \underbrace{\mathbf{x}_i^\top \mathbf{B}}_{\text{covariates}} + \underbrace{\mathbf{o}_i}_{\text{sampling effort}} + \boldsymbol\varepsilon_i, \quad \boldsymbol\varepsilon_i \sim \mathcal{N}(\mathbf{0}_p, \underbrace{\boldsymbol\Sigma}_{\text{between-species dependencies}})\]

null covariance \(\Leftrightarrow\) independence \(\rightsquigarrow\) uncorrelated species/transcripts do not interact

This model gives birth to Principal Component Analysis, Discriminant Analysis, Gaussian Graphical Models, Gaussian Mixture models and many others \(\dots\)

With count data…

There is no generic model for multivariate counts

Data transformation (log, \(\sqrt{}\)): quick and dirty
Non-Gaussian multivariate distributions (Inouye et al. 2017): do not scale to data dimension yet
Latent variable models: interaction occur in a latent (unobserved) layer

Poisson-lognormal Family
Model, inference

Model for multivariate count data

The Poisson Lognormal model (PLN)

PLN (Aitchison and Ho 1989) is a multivariate generalized linear model, where

the counts \(\mathbf{Y}_i\in\mathbb{N}^p\) are the response variables
the main effect is due to a linear combination of the covariates \(\mathbf{x}_i\in\mathbb{R}^d\)
a vector of offsets \(\mathbf{o}_i\in\mathbb{R}^p\) can be specified for each sample

\[ \quad \mathbf{Y}_i | \mathbf{Z}_i \sim^{\text{iid}} \mathcal{P}\left(\exp\{\mathbf{Z}_i\}\right), \qquad \mathbf{Z}_i \sim \mathcal{N}({\underbrace{\mathbf{o}_i + \mathbf{x}_i^\top\mathbf{B}}}_{{\boldsymbol\mu}_i},\boldsymbol\Sigma). \]

Typically, \(n\approx 10s \to 1000s\), \(p\approx 10s \to 1000s\), \(d \approx 1 \to 10\)

Properties: over-dispersion, arbitrary-signed covariances

mean: \(\mathbb{E}(Y_{ij}) = \exp \left( o_{ij} + \mathbf{x}_i^\top {\mathbf{B}}_{\cdot j} + \sigma_{jj}/2\right) > 0\)
variance: \(\mathbb{V}(Y_{ij}) = \mathbb{E}(Y_{ij}) + \mathbb{E}(Y_{ij})^2 \left( e^{\sigma_{jj}} - 1 \right) > \mathbb{E}(Y_{ij})\)
covariance: \(\mathrm{Cov}(Y_{ij}, Y_{ik}) = \mathbb{E}(Y_{ij}) \mathbb{E}(Y_{ik}) \left( e^{\sigma_{jk}} - 1 \right).\)

Natural extensions of PLN (1)

Various tasks of multivariate analysis

Dimension Reduction: rank constraint matrix \(\boldsymbol\Sigma\). (Chiquet, Mariadassou, and Robin 2018)

\[\mathbf{Z}_i \sim \mathcal{N}({\boldsymbol\mu}_i, \boldsymbol\Sigma = \mathbf{C}\mathbf{C}^\top), \quad \mathbf{C} \in \mathcal{M}_{pk} \text{ with orthogonal columns}.\]
Classification: maximize separation between groups with means (Chiquet, Mariadassou, and Robin 2021)

\[\mathbf{Z}_i \sim \mathcal{N}(\sum_k {\boldsymbol\mu}_k \mathbf{1}_{\{i\in k\}}, \boldsymbol\Sigma), \quad \text{for known memberships}.\]

Clustering: mixture model in the latent space (Chiquet, Mariadassou, and Robin 2021)

\[\mathbf{Z}_i \mid i \in k \sim \mathcal{N}(\boldsymbol\mu_k, \boldsymbol\Sigma_k), \quad \text{for unknown memberships}.\]

Network inference: sparsity constraint on inverse covariance. (Chiquet, Mariadassou, and Robin 2019)

\[\mathbf{Z}_i \sim \mathcal{N}({\boldsymbol\mu}_i, \boldsymbol\Sigma = \boldsymbol\Omega^{-1}), \quad \|\boldsymbol\Omega \|_1 < c.\]

Natural extensions of PLN (2)

Recent/on-going extensions

Zero Inflation: an ZI version of PLN and PLN-network (Batardière et al. 2024)

\[\begin{array}{rrl} \text{PLN latent space} & \boldsymbol{Z}_i = (Z_{ij})_{j=1\dots p} & \sim \mathcal{N}(\mathbf{x}_i^\top \mathbf{B}, \mathbf{\Sigma}), \\[1.5ex] \text{excess of zeros} & \boldsymbol{W}_{i} = (W_{ij})_{j=1\dots p} & \sim \otimes_{j=1}^p \cal B(\pi_{ij}), \\[1.5ex] \text{observation space} & Y_{ij} \, | \, W_{ij}, Z_{ij} & \sim^\text{indep} W_{ij}\delta_0 + (1-W_{ij})\mathcal{P} \left(\exp\{o_{ij} + Z_{ij}\}\right),\\ \end{array}\]

Time-Series analysis: Vector Auto-Regressive model (Batardiere, Kwon, and Chiquet 2024)

\[\mathbf{Z}_i \sim \mathcal{N}({\boldsymbol\mu}_i, \boldsymbol\Sigma), \quad \mathbf{Z}_i| \mathbf{Z}_{i-1} = \boldsymbol{\Phi}\mathbf{Z}_{i-1} + \mathcal{N}({\boldsymbol\mu}_i^\varepsilon, \boldsymbol\Sigma^\varepsilon).\]

\({\boldsymbol\mu}_i^\varepsilon = {\boldsymbol\mu}_i - \boldsymbol{\Phi}{\boldsymbol\mu}_{i-1} \boldsymbol{\Phi}\) with
\({\boldsymbol\Sigma}^\varepsilon = {\boldsymbol\Sigma} - \boldsymbol{\Phi}{\boldsymbol\Sigma}_{i-1} \boldsymbol{\Phi}\)
\(\boldsymbol{\Phi}\) the positive-definite auto-correlation matrix such as \({\boldsymbol\Sigma}^\varepsilon \succeq 0\).

Estimation

Latent-variable models

Estimate \(\theta = (\mathbf{B}, {\boldsymbol\Sigma})\), predict the \(\mathbf{Z}_i\), while the model marginal likelihood is

\[\begin{equation*} p_\theta(\mathbf{Y}_i) = \int_{\mathbb{R}_p} \prod_{j=1}^p p_\theta(Y_{ij} | Z_{ij}) \, p_\theta(\mathbf{Z}_i) \mathrm{d}\mathbf{Z}_i \end{equation*}\]

Direct approach

Numerical integration (Aitchison and Ho 1989): limited to a couple of variables
Gibbs Markov chain Monte Carlo (MCMC) sampling (Tikhonov et al. 2020). Last for ever
Stochastic gradient ascent (Baey et al. 2023): promising but too generic

Expectation-Maximization

\[\begin{equation*} \log p_\theta(\mathbf{Y}) = \mathbb{E}_{p_\theta(\mathbf{Z}\,|\,\mathbf{Y})} [\log p_\theta(\mathbf{Y}, \mathbf{Z})] + \mathcal{H}[p_\theta(\mathbf{Z}\,|\,\mathbf{Y})], \text{ with } \mathcal{H}(p) = -\mathbb{E}_p(\log(p)). \end{equation*}\]

EM requires to evaluate (some moments of) \(p_\theta(\mathbf{Z} \,|\, \mathbf{Y})\), but there is no close form!

Variational Inference

Use a proxy \(q_\psi\) of \(p_\theta(\mathbf{Z}\,|\,\mathbf{Y})\) minimizing a divergence in a class \(\mathcal{Q}\) (e.g, Küllback-Leibler)

\[\begin{equation*} q_\psi(\mathbf{Z})^\star \in \arg\min_{q\in\mathcal{Q}} D\left(q(\mathbf{Z}), p(\mathbf{Z} | \mathbf{Y})\right), \, \text{e.g.}, D(.,.) = KL(., .) = \mathbb{E}_{q_\psi}\left[\log \frac{q(z)}{p(z)}\right]. \end{equation*}\]

and maximize the ELBO (Evidence Lower BOund)

\[\begin{equation*} J(\theta, \psi) = \log p_\theta(\mathbf{Y}) - KL[q_\psi (\mathbf{Z}) || p_\theta(\mathbf{Z} | \mathbf{Y})] = \mathbb{E}_{\psi} [\log p_\theta(\mathbf{Y}, \mathbf{Z})] + \mathcal{H}[q_\psi(\mathbf{Z})] = \frac{1}{n} \sum_{i = 1}^n J_i(\theta, \psi_i) \end{equation*}\]

Resulting Variational EM

VE step: optimize \(\boldsymbol{\psi}\) (can be written individually) \[\begin{equation*} \psi_i^{(h)} = \arg \max J_{i}(\theta^{(h)}, \psi_i) \left( = \arg\min_{q_i} KL[q_i(\mathbf{Z}_i) \,||\, p_{\theta^h}(\mathbf{Z}_i\,|\,\mathbf{Y}_i)] \right) \end{equation*}\]
M step: optimize \(\theta\) \[\theta^{(h)} = \arg\max \frac{1}{n}\sum_{i=1}^{n}J_{Y_i}(\theta, \psi_i^{(h)})\]

Implementations

See https://pln-team.github.io/repositories/: documentation, vignettes, examples, etc.

`PLNmodels` Medium scale problems (R/C++ package)

Up to thousands of sites ( \(n \approx 1000s\) ), hundreds of species ( \(p\approx 100s\) )

V-EM: exact M-step + optimization for VE-stem
algorithm: conservative convex separable approximations (Svanberg 2002)
implementation: NLopt nonlinear-optimization library (Johnson 2011)

`pyPLNmodels` Large scale problems (Python/Pytorch module)

Up to \(n \to 100,000\) and \(p\approx 10,000s\)

V-EM: exact M-step + optimization for VE-stem
algorithm: Rprop (gradient sign + adaptive variable-specific update) (Riedmiller and Braun 1993)
implementation: torch with GPU auto-differentiation (Paszke et al. 2017)

Other optimisation approaches

MCMC techniques on reduced sampling space

Composite likelihood (Stoehr and Robin 2024)
Importance sampling in PLN-PCA (Batardière et al. 2025)

Variational Auto-Encoders and PLN-PCA

The Decoder is the generative model \(p_{\theta}(\mathbf{Y}_i | \mathbf{Z}_i)\)
The Encoder approximates the posterior distribution with \(q_\psi(\mu_i,\sigma^2_i)\)
VAE maximize a lower bound of the marginal \(\log p_{\theta}(\mathbf{Y}_i)\)

Variational estimator: properties

M-estimation framework (Van der Vaart 2000)

Let \(\bar{J}_n \; : \theta \mapsto \frac{1}{n}\sum_{i=1}^{n} \arg \max_{\psi}J_i(\theta, \hat{\psi}_i) \stackrel{\Delta}{=} \frac{1}{n}\sum_{i=1}^{n} \bar{J}_i(\theta)\)

Theorem (Batardière, Chiquet, and Mariadassou 2024)

Let \(\hat{\theta}^{\text{ve}} = \arg\max_{\theta} \bar{J}_{n}(\theta)\). If \(\bar{J}_n\) is smooth enough (e.g. when \(\theta\) and \(\psi_i\) are restricted to compact sets), \[ \sqrt{n}(\hat{\theta}^{\text{ve}} - \bar{\theta}) \xrightarrow[]{d} \mathcal{N}(0, V(\bar{\theta})), \quad \text{where } V(\theta) = C(\theta)^{-1} D(\theta) C(\theta)^{-1} \] for \(C(\theta) = \mathbb{E}[\nabla_{\theta\theta} \bar{J}(\theta) ]\) and \(D(\theta) = \mathbb{E}\left[(\nabla_{\theta} \bar{J}(\theta)) (\nabla_{\theta} \bar{J}(\theta)^\intercal \right]\)

Caveat

Open question: \(\bar{\theta} = \theta^\star\) ? No formal results as \(\bar{J}\) is untractable but numerical evidence suggests so.
For \(\theta = (\mathbf{B}, \boldsymbol\Omega)\), \(\hat{C}_n\) requires the inversion of \(n\) matrices with \(dp^3\) rows/columns…
Assume blockwise diagonal matrix \(\hat{C}_n\) by neglecting the cross-terms between \(\mathbf{B}\) and \(\boldsymbol\Sigma\).

Illustration tropical freshwater fishes

\(n=304\) measures of abundances of \(p=21\) species in 16 sites (8 streams \(\times\) disturbed or not \(\times\) 19 dates

Multivariate regression (1)

Assess environnemental/abiotic effects

NR_PLN            <- PLN(Abundance ~ 1, data = NR_DATA)
NR_PLN_Stream     <- PLN(Abundance ~ 0 + stream, data = NR_DATA)
NR_PLN_Stream_PC  <- PLN(Abundance ~ 0 + stream + PC1, data = NR_DATA)
NR_PLN_Stream_PC2 <- PLN(Abundance ~ 0 + stream + PC1 + PC2, data = NR_DATA)

Model selection criteria
model	nb_param	loglik	BIC	ICL
PLN ~ 1	230	-9393.133	-10050.592	-17212.93
PLN ~ stream	370	-8292.598	-9350.248	-14740.56
PLN ~ stream + PC1	390	-8206.340	-9321.161	-14573.90
PLN ~ stream + PC1 + PC2	410	-8192.004	-9363.995	-14760.17

We continue with NR_PLN_Stream_PC.

Multivariate regression (2)

Coefficients: \(\hat{\mathbf{B}}\)

Multivariate regression (2)

Coefficients: variance \(\hat{\mathbb{V}}(\hat{\mathbf{B}})\) (keep clustering on \(\hat{\mathbf{B}}\))

Multivariate regression (2)

Coefficients: \(z\)-score (keep clustering on \(\hat{\mathbf{B}}\))

Multivariate regression (3)

Mean effect: \(\hat{\boldsymbol{\mu}} = \mathbf{O} + \mathbf{X} \hat{\mathbf{B}}\)

Multivariate regression (4)

Fit: \(\log \hat{\mathbb{E}}(\mathbf{Y}) = \mathbf{O} + \mathbf{X} \hat{\mathbf{B}} + \mathbf{M} + \frac 12 \mathbf{S^2}\)

Multivariate regression (4)

Fit (per site)

Dimension reduction (1)

Exhibit patterns of diversity

\[\quad \mathbf{Y}_i | \mathbf{Z}_i \sim^{\text{iid}} \mathcal{P}\left(\exp\{\mathbf{Z}_i\}\right), \qquad \mathbf{Z}_i \sim \mathcal{N}({\boldsymbol\mu}_i, \boldsymbol\Sigma = \mathbf{C}\mathbf{C}^\top), \quad \mathbf{C} \in \mathcal{M}_{pk} \text{ with orthogonal columns}.\]

Model selection criteria for rank (number of PC)

model	nb_param	loglik	BIC	ICL
PLN ~ stream + PC1	390	-8206.340	-9321.161	-14573.897
PLNPCA ~ PC1	159	-6565.395	-7019.898	-6699.009
PLNPCA ~ stream + PC1	270	-5881.358	-6653.157	-6562.650

Dimension reduction (2)

Visualization: do not correct for stream effect

Dimension reduction (2)

Visualization: do not correct for stream effect

Dimension reduction (3)

Visualization: correct for stream effect

Dimension reduction (4)

Coefficients (correct for Stream): \(\hat{\mathbf{B}}^\text{PLNPCA}\)

Dimension reduction (4)

Coefficients: \(\hat{\mathbf{B}}^\text{PLN}\)

Dimension reduction (5)

Mean effect: \(\hat{\boldsymbol{\mu}}^\text{PLNPCA}\)

Dimension reduction (5)

Mean effect: \(\hat{\boldsymbol{\mu}}^\text{PLN}\)

Dimension reduction (6)

Covariance/Correlation (singular): \(\hat{\boldsymbol{\Sigma}}^\text{PLNPCA}\)

Dimension reduction (6)

Covariance/Correlation (singular): \(\hat{\boldsymbol{\Sigma}}^\text{PLN}\)

Discriminant Analysis (1)

Assess environnemental/abiotic effects by classification

Objective

Maximize separation between groups with means (Chiquet, Mariadassou, and Robin 2021) \[ \quad \mathbf{Y}_i | \mathbf{Z}_i \sim^{\text{iid}} \sim \mathcal{N}(\sum_k {\boldsymbol\mu}_k \mathbf{1}_{\{i\in k\}}, \boldsymbol\Sigma), \quad \text{for known memberships}. \]

\(\rightsquigarrow\) Find the linear combination(s) \(\mathbf{Z} u\) maximizing separation between groups

Solution

Find the first eigenvectors of \(\mathbf{\Sigma}_w^{-1} \mathbf{\Sigma}_b\) where

\(\mathbf{\Sigma}_w\) is the within-group variance matrix, i.e. the unbiased estimated of \({\boldsymbol\Sigma}\):
\(\mathbf{\Sigma}_b\) is the between-group variance matrix, estimated from \(\boldsymbol{\mu_k}\).

Discriminant Analysis (2)

PLN-LDA is a “special” PLN with a grouping variable added to the design matrix + post-treatment

NR_PLNLDA <- PLNLDA(Abundance ~ PC1, grouping = stream, data = NR_DATA)

\(\rightsquigarrow\) Group means are similar to PLN regression coefficients associated with stream

First two discriminant axes

Correlation with original variables

Discriminant Analysis (4)

Group means (PLN-LDA vs PLN)

Classifier

Supervised-learning: provide a classifier to predict group newly observed site:

set.seed(1969)
train_set <- sample.int(nrow(NR_DATA), 200)
test_set  <- setdiff(1:nrow(NR_DATA), train_set)
NR_train <- NR_DATA[train_set, ]
NR_test   <- NR_DATA[test_set, ]
pred_stream <- predict(NR_PLNLDA, newdata = NR_test, type = "response")
aricode::ARI(pred_stream, NR_test$stream)

[1] 0.8908636

Association Network (1)

Inference

NR_PLNNET_all <- PLNnetwork(Abundance ~ PC1, data = NR_DATA, control = PLNnetwork_param(penalize_diagonal = FALSE))
NR_PLNNET_all$stability_selection()
NR_PLNNET_Stream_all <- PLNnetwork(Abundance ~ PC1 + stream, data = NR_DATA, control = PLNnetwork_param(penalize_diagonal = FALSE))
NR_PLNNET_Stream_all$stability_selection()

Chosing the network’s density

Without stream as covariate

With stream as covariate

Association Network (2)

Network selected without stream as covariate

Association Network (2)

Network selected with stream as covariate

Association Network (2)

Precision/Covariance matrices (without stream)

Precision/Covariance matrices (with stream)

Association Network (3)

Model selection criteria

model	nb_param	loglik	BIC	ICL
PLN ~ stream + PC1	390	-8206.340	-9321.161	-14573.897
PLNPCA ~ PC1	159	-6565.395	-7019.898	-6699.009
PLNPCA ~ stream + PC1	270	-5881.358	-6653.157	-6562.650
PLNNET ~ PC1	77	-9619.069	-9839.174	-16524.285
PLNNET ~ stream + PC1	211	-8312.991	-8916.138	-14484.045

Fit

Zero-Inflation (1)

Modeling of the pure zero component

\[\begin{align*} \pi_{ij} & = \pi \in [0,1] & \text{(single global parameter)} \\ \pi_{ij} & = \pi_j \in [0,1] & \text{(species dependent)} \\ \pi_{ij} & = \pi_i \in [0,1] & \text{(site dependent)} \\ \pi_{ij} & = \mathrm{logit}^{-1}( \boldsymbol X^{0}\boldsymbol B^0)_{ij}, ~ \boldsymbol X^0 \in \mathbb R^{n \times d_0}, ~ \boldsymbol B^0 \in \mathbb R^{d_0\times p} & \text{(site covariates)} \\ \pi_{ij} & = \mathrm{logit}^{-1}(\bar{\boldsymbol{B}}^0 \bar{\boldsymbol{X}}^{0})_{ij}, ~ \bar{\boldsymbol{B}}^0 \in \mathbb R^{n \times d_0}, ~ \bar{\boldsymbol X}^0 \in \mathbb R^{d_0\times p} & \text{(species covariates)} \end{align*}\]

Zero-Inflation (2)

Fit models with covariate effect in ZI/PLN part

NR_ZIPLN  <- ZIPLN(Abundance ~ PC1 | 0 + stream,  data = NR_DATA)
NR_ZIPLNNET_all <- ZIPLNnetwork(Abundance ~ PC1 | 0 + stream,  data = NR_DATA, control=ZIPLNnetwork_param(penalize_diagonal=FALSE))
NR_ZIPLNNET <- NR_ZIPLNNET_all$getBestModel("BIC")

\(\rightsquigarrow\) Stream effect now caught by the ZI component

Model selection criteria

model	nb_param	loglik	BIC	ICL
PLN ~ stream + PC1	390	-8206.340	-9321.161	-14573.897
PLNPCA ~ PC1	159	-6565.395	-7019.898	-6699.009
PLNPCA ~ stream + PC1	270	-5881.358	-6653.157	-6562.650
PLNNET ~ PC1	77	-9619.069	-9839.174	-16524.285
PLNNET ~ stream + PC1	211	-8312.991	-8916.138	-14484.045
ZIPLN ~ PC1 \| stream	290	-8548.828	-9377.797	-14456.248
ZIPLNNET ~ PC1 \| stream	117	-8587.148	-8921.594	-14322.612

Zero-Inflation (2)

Fitted probability of zero-inflation: \(\hat{\boldsymbol{\pi}}\)

\(\rightsquigarrow\) The ZI component can be used as a model for presence/absence (large part explained by the stream)

Zero-Inflation (2)

Coefficient in ZI part (Stream)

Conclusion

Take-home message

PLN = generic model for multivariate count data analysis
Flexible modeling of the covariance structure, allows for covariates
Efficient Variational EM algorithms
Hopefully user-friendly tools available

https://computo.sfds.asso.fr, a journal promoting reproducible research in ML and stat.

References

Aitchison, J., and C. H. Ho. 1989. “The Multivariate Poisson-Log Normal Distribution.” Biometrika 76 (4): 643–53.

Baey, Charlotte, Maud Delattre, Estelle Kuhn, Jean-Benoist Leger, and Sarah Lemler. 2023. “Efficient Preconditioned Stochastic Gradient Descent for Estimation in Latent Variable Models.” In International Conference on Machine Learning, 1430–53. PMLR.

Batardiere, Bastien, Joon Kwon, and Julien Chiquet. 2024. “pyPLNmodels: A Python Package to Analyze Multivariate High-Dimensional Count Data.” Journal of Open Source Software 9 (104): 6969.

Batardière, Bastien, Julien Chiquet, François Gindraud, and Mahendra Mariadassou. 2024. “Zero-Inflation in the Multivariate Poisson Lognormal Family.” arXiv Preprint. https://arxiv.org/abs/2405.14711.

Batardière, Bastien, Julien Chiquet, Joon Kwon, and Julien Stoehr. 2025. “Importance Sampling-Based Gradient Method for Dimension Reduction in Poisson Log-Normal Model.” Electronic Journal of Statistics.

Batardière, Bastien, Julien Chiquet, and Mahendra Mariadassou. 2024. “Evaluating Parameter Uncertainty in the Poisson Lognormal Model with Corrected Variational Estimators.” arXiv Preprint arXiv:2411.08524.

Chiquet, Julien, Mahendra Mariadassou, and Stéphane Robin. 2018. “Variational Inference for Probabilistic Poisson PCA.” The Annals of Applied Statistics 12: 2674–98. http://dx.doi.org/10.1214/18-AOAS1177.

———. 2019. “Variational Inference for Sparse Network Reconstruction from Count Data.” In Proceedings of the 19th International Conference on Machine Learning (ICML 2019).

———. 2021. “The Poisson-Lognormal Model as a Versatile Framework for the Joint Analysis of Species Abundances.” Frontiers in Ecology and Evolution 9. https://doi.org/10.3389/fevo.2021.588292.

Inouye, David I, Eunho Yang, Genevera I Allen, and Pradeep Ravikumar. 2017. “A Review of Multivariate Distributions for Count Data Derived from the Poisson Distribution.” Wiley Interdisciplinary Reviews: Computational Statistics 9 (3).

Johnson, Steven G. 2011. The NLopt Nonlinear-Optimization Package. http://ab-initio.mit.edu/nlopt.

Mardia, K. V., J. T. Kent, and J. M. Bibby. 1979. Multivariate Analysis. Academic press.

Mariadassou, Mahendra, Laurent X Nouvel, Fabienne Constant, Diego P Morgavi, Lucie Rault, Sarah Barbey, Emmanuelle Helloin, et al. 2023. “Microbiota Members from Body Sites of Dairy Cows Are Largely Shared Within Individual Hosts Throughout Lactation but Sharing Is Limited in the Herd.” Animal Microbiome 5 (1): 32.

Paszke, Adam, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. 2017. “Automatic Differentiation in Pytorch.”

Riedmiller, Martin, and Heinrich Braun. 1993. “A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm.” In IEEE International Conference on Neural Networks, 586–91. IEEE.

Stoehr, Julien, and Stephane S Robin. 2024. “Composite Likelihood Inference for the Poisson Log-Normal Model.” arXiv Preprint arXiv:2402.14390.

Svanberg, Krister. 2002. “A Class of Globally Convergent Optimization Methods Based on Conservative Convex Separable Approximations.” SIAM Journal on Optimization 12 (2): 555–73.

Tikhonov, Gleb, Øystein H Opedal, Nerea Abrego, Aleksi Lehikoinen, Melinda MJ de Jonge, Jari Oksanen, and Otso Ovaskainen. 2020. “Joint Species Distribution Modelling with the r-Package Hmsc.” Methods in Ecology and Evolution 11 (3): 442–47.

Van der Vaart, Aad W. 2000. Asymptotic Statistics. Vol. 3. Cambridge university press.

Anabhart	Andipulc	Crenfren	Hemitaen	Hyporobi	Poecreti	Astybima	Coryriis
2	14	0	0	0	187	0	0
1	4	3	1	1	197	0	0
0	1	0	44	0	73	156	10

Anabhart	Andipulc	Crenfren	Hemitaen	Hyporobi	Poecreti	Astybima	Coryriis
2	14	0	0	0	187	0	0
1	4	3	1	1	197	0	0
0	1	0	44	0	73	156	10

Poisson lognormal models for multivariate count data analysis

Motivations

Multivariate count data

Routinely gathered in ecology/microbiology/genomics

Data tables

Goals

Two illustrative examples

Microcosm (Mariadassou et al. 2023)

Tropical freshwater fishes (Trinidad Island)

Tropical freshwater fishes in Trinidad

Table of Counts/ Abundances (first rows)

Tropical fishes

Table of covariates/ exogenous variables (first rows)

PCA on covariates: no strong temporal/season effect

Tropical fishes

Table of covariates/ exogenous variables (first rows)

PCA on covariates: spatial effect

Models for multivariate count data

If we were in a Gaussian world…

With count data…

Poisson-lognormal Family Model, inference

Model for multivariate count data

The Poisson Lognormal model (PLN)

Properties: over-dispersion, arbitrary-signed covariances

Natural extensions of PLN (1)

Various tasks of multivariate analysis

Natural extensions of PLN (2)

Recent/on-going extensions

Estimation

Latent-variable models

Direct approach

Expectation-Maximization

Variational Inference

Resulting Variational EM

Implementations

PLNmodels Medium scale problems (R/C++ package)

pyPLNmodels Large scale problems (Python/Pytorch module)

Other optimisation approaches

MCMC techniques on reduced sampling space

Variational Auto-Encoders and PLN-PCA

Variational estimator: properties

M-estimation framework (Van der Vaart 2000)

Theorem (Batardière, Chiquet, and Mariadassou 2024)

Caveat

Illustration tropical freshwater fishes

Multivariate regression (1)

Multivariate regression (2)

Coefficients: \(\hat{\mathbf{B}}\)

Multivariate regression (2)

Coefficients: variance \(\hat{\mathbb{V}}(\hat{\mathbf{B}})\) (keep clustering on \(\hat{\mathbf{B}}\))

Multivariate regression (2)

Coefficients: \(z\)-score (keep clustering on \(\hat{\mathbf{B}}\))

Multivariate regression (3)

Mean effect: \(\hat{\boldsymbol{\mu}} = \mathbf{O} + \mathbf{X} \hat{\mathbf{B}}\)

Multivariate regression (4)

Fit: \(\log \hat{\mathbb{E}}(\mathbf{Y}) = \mathbf{O} + \mathbf{X} \hat{\mathbf{B}} + \mathbf{M} + \frac 12 \mathbf{S^2}\)

Multivariate regression (4)

Fit (per site)

Dimension reduction (1)

Model selection criteria for rank (number of PC)

Dimension reduction (2)

Visualization: do not correct for stream effect

Dimension reduction (2)

Visualization: do not correct for stream effect

Dimension reduction (3)

Visualization: correct for stream effect

Dimension reduction (4)

Coefficients (correct for Stream): \(\hat{\mathbf{B}}^\text{PLNPCA}\)

Dimension reduction (4)

Coefficients: \(\hat{\mathbf{B}}^\text{PLN}\)

Dimension reduction (5)

Mean effect: \(\hat{\boldsymbol{\mu}}^\text{PLNPCA}\)

Dimension reduction (5)

Mean effect: \(\hat{\boldsymbol{\mu}}^\text{PLN}\)

Dimension reduction (6)

Covariance/Correlation (singular): \(\hat{\boldsymbol{\Sigma}}^\text{PLNPCA}\)

Dimension reduction (6)

Covariance/Correlation (singular): \(\hat{\boldsymbol{\Sigma}}^\text{PLN}\)

Discriminant Analysis (1)

Objective

Poisson-lognormal Family
Model, inference

`PLNmodels` Medium scale problems (R/C++ package)

`pyPLNmodels` Large scale problems (Python/Pytorch module)

Anabhart	Andipulc	Crenfren	Hemitaen	Hyporobi	Poecreti	Astybima	Coryriis
2	14	0	0	0	187	0	0
1	4	3	1	1	197	0	0
0	1	0	44	0	73	156	10