Zero-inflation in the Poisson lognormal family (Batardière et al. 2024)

Rencontres Mathématiques de Rouen

Bastien Batardière

PhD - UMR MIA Paris-Saclay

Julien Chiquet

MIA Paris-Saclay, INRAE

François Gindraud

IR INRIA

Mahendra Mariadassou

MaIAGE, INRAE

June 21, 2024

Motivation: multivariate count data

Routinely gathered in ecology/microbiology/genomics

Data tables

  • Abundances: read counts of species/transcripts \(j\) in sample \(i\)
  • Covariates: value of environmental variable \(k\) in sample \(i\)
  • Offsets: sampling effort for species/transcripts \(j\) in sample \(i\)

Goals

  • understand environmental effects (regression, classification)
  • exhibit patterns of diversity (clustering, dimension reduction)
  • understand between-species interactions (covariance selection)

Illustration: microcosm data set (Mariadassou et al. 2023)

  • Study carried out at INRAE “Domaine Expérimental du Pin”
  • Microbiotas of 44 lactating cows at 4 body sites (mouth, nose, vagina, milk) \(\times\) 4 time points
  • Abundances of \(\approx\) 1200 “species” (Amplicon Sequence Variants) with known taxonomy

Table of Counts

Retrieve microcosm data

# Load the data, keep ASVs present in more than 5% of samples, drop empty samples
microcosm <- readRDS("microcosm_reduced.rds")
microcosm$Abundance <- microcosm$Abundance[, colMeans(microcosm$Abundance > 0) > 0.05]
microcosm <- microcosm[rowSums(microcosm$Abundance) > 0, ]
microcosm$site_time <- droplevels(microcosm$site_time)
# Proportion of zeros in the filtered count table
sum(microcosm$Abundance == 0) / length(microcosm$Abundance)
[1] 0.9013119

Still 90% of zeros!

Data after total-counts normalization

(Figure: normalized abundances on the log scale.)

Background on the Poisson-lognormal family

Model, inference

Model for multivariate count data

The Poisson Lognormal model (PLN)

PLN (Aitchison and Ho 1989) is a multivariate generalized linear model, where

  • the counts \(\mathbf{Y}_i\in\mathbb{N}^p\) are the response variables
  • the main effect is due to a linear combination of the covariates \(\mathbf{x}_i\in\mathbb{R}^d\)
  • a vector of offsets \(\mathbf{o}_i\in\mathbb{R}^p\) can be specified for each sample

\[ \mathbf{Y}_i \,|\, \mathbf{Z}_i \sim^{\text{iid}} \mathcal{P}\left(\exp\{\mathbf{Z}_i\}\right), \qquad \mathbf{Z}_i \sim \mathcal{N}(\underbrace{\mathbf{o}_i + \mathbf{x}_i^\top\mathbf{B}}_{\boldsymbol\mu_i},\boldsymbol\Sigma). \]

Typically, \(n\approx 10s \to 1000s\), \(p\approx 10s \to 1000s\), \(d \approx 1 \to 10\)

Properties: over-dispersion, arbitrary-signed covariances

  • mean: \(\mathbb{E}(Y_{ij}) = \exp \left( o_{ij} + \mathbf{x}_i^\top {\mathbf{B}}_{\cdot j} + \sigma_{jj}/2\right) > 0\)
  • variance: \(\mathbb{V}(Y_{ij}) = \mathbb{E}(Y_{ij}) + \mathbb{E}(Y_{ij})^2 \left( e^{\sigma_{jj}} - 1 \right) > \mathbb{E}(Y_{ij})\)
  • covariance: \(\mathrm{Cov}(Y_{ij}, Y_{ik}) = \mathbb{E}(Y_{ij}) \mathbb{E}(Y_{ik}) \left( e^{\sigma_{jk}} - 1 \right).\)
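These moments are easy to check by simulation; a minimal sketch (no covariates or offsets, arbitrary \(\boldsymbol\mu\) and \(\boldsymbol\Sigma\)):

set.seed(1)
n <- 1e5; p <- 2
mu    <- c(1, 2)
Sigma <- matrix(c(0.5, 0.2, 0.2, 0.3), p, p)
Z <- MASS::mvrnorm(n, mu, Sigma)              # latent Gaussian layer
Y <- matrix(rpois(n * p, exp(Z)), n, p)       # Poisson observation layer
colMeans(Y); exp(mu + diag(Sigma) / 2)        # empirical vs theoretical mean
var(Y[, 1]); mean(Y[, 1])                     # variance > mean: over-dispersion
cov(Y[, 1], Y[, 2])                           # vs E(Y_1) E(Y_2) (exp(sigma_12) - 1)
prod(exp(mu + diag(Sigma) / 2)) * (exp(Sigma[1, 2]) - 1)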

Natural extensions of PLN

Various tasks of multivariate analysis

  • Dimension reduction: rank constraint on \(\boldsymbol\Sigma\) (Chiquet, Mariadassou, and Robin 2018)

    \[\mathbf{Z}_i \sim \mathcal{N}({\boldsymbol\mu}_i, \boldsymbol\Sigma = \mathbf{C}\mathbf{C}^\top), \quad \mathbf{C} \in \mathcal{M}_{pk} \text{ with orthogonal columns}.\]

  • Classification: maximize separation between group means (Chiquet, Mariadassou, and Robin 2021)

    \[\mathbf{Z}_i \sim \mathcal{N}\Big(\sum_k {\boldsymbol\mu}_k \mathbf{1}_{\{i\in k\}}, \boldsymbol\Sigma\Big), \quad \text{for known memberships}.\]

  • Clustering: mixture model in the latent layer

    \[\mathbf{Z}_i \mid i \in k \sim \mathcal{N}(\boldsymbol\mu_k, \boldsymbol\Sigma_k), \quad \text{for unknown memberships}.\]

  • Network inference: sparsity constraint on the inverse covariance (Chiquet, Mariadassou, and Robin 2019)

    \[\mathbf{Z}_i \sim \mathcal{N}({\boldsymbol\mu}_i, \boldsymbol\Sigma = \boldsymbol\Omega^{-1}), \quad \|\boldsymbol\Omega \|_1 < c.\]

Estimation

Estimate \(\theta = (\mathbf{B}, \boldsymbol\Sigma)\) and predict the \(\mathbf{Z}_i\), while the marginal likelihood of the model is

\[\begin{equation*} p_\theta(\mathbf{Y}_i) = \int_{\mathbb{R}^p} \prod_{j=1}^p p_\theta(Y_{ij} | Z_{ij}) \, p_\theta(\mathbf{Z}_i) \,\mathrm{d}\mathbf{Z}_i \end{equation*}\]

Direct approach

Expectation-Maximization

\[\begin{equation*} \log p_\theta(\mathbf{Y}) = \mathbb{E}_{p_\theta(\mathbf{Z}\,|\,\mathbf{Y})} [\log p_\theta(\mathbf{Y}, \mathbf{Z})] + \mathcal{H}[p_\theta(\mathbf{Z}\,|\,\mathbf{Y})], \text{ with } \mathcal{H}(p) = -\mathbb{E}_p(\log(p)). \end{equation*}\]

EM requires evaluating (some moments of) \(p_\theta(\mathbf{Z} \,|\, \mathbf{Y})\), but there is no closed form!

Variational Inference

Use a proxy \(q_\psi\) of \(p_\theta(\mathbf{Z}\,|\,\mathbf{Y})\) minimizing a divergence in a class \(\mathcal{Q}\) (e.g., Kullback-Leibler)

\[\begin{equation*} q_{\psi^\star}(\mathbf{Z}) \in \arg\min_{q\in\mathcal{Q}} D\left(q(\mathbf{Z}), p_\theta(\mathbf{Z} \,|\, \mathbf{Y})\right), \quad \text{e.g. } D(\cdot,\cdot) = KL(\cdot,\cdot) = \mathbb{E}_{q}\left[\log \frac{q(\mathbf{Z})}{p_\theta(\mathbf{Z}\,|\,\mathbf{Y})}\right]. \end{equation*}\]

and maximize the ELBO (Evidence Lower BOund)

\[\begin{equation*} J(\theta, \psi) = \log p_\theta(\mathbf{Y}) - KL[q_\psi (\mathbf{Z}) \,||\, p_\theta(\mathbf{Z} \,|\, \mathbf{Y})] = \mathbb{E}_{q_\psi} [\log p_\theta(\mathbf{Y}, \mathbf{Z})] + \mathcal{H}[q_\psi(\mathbf{Z})] = \sum_{i = 1}^n J_i(\theta, \psi_i) \end{equation*}\]

Resulting Variational EM

  • VE step: optimize \(\boldsymbol{\psi}\) (can be done sample by sample) \[\begin{equation*} \psi_i^{(h)} = \arg \max_{\psi_i} J_{i}(\theta^{(h)}, \psi_i) \left( = \arg\min_{q_i} KL[q_i(\mathbf{Z}_i) \,||\, p_{\theta^{(h)}}(\mathbf{Z}_i\,|\,\mathbf{Y}_i)] \right) \end{equation*}\]

  • M step: optimize \(\theta\) \[\theta^{(h+1)} = \arg\max_{\theta} \frac{1}{n}\sum_{i=1}^{n}J_{i}(\theta, \psi_i^{(h)})\]
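For concreteness, here is a sketch of one per-sample ELBO \(J_i\) under the Gaussian approximation \(q(\mathbf{Z}_i) = \mathcal{N}(\mathbf{m}, \mathrm{diag}(\mathbf{s}^2))\); the function and argument names are illustrative, not any package's internals, and offsets are folded into mu:

# Per-sample PLN ELBO with q(Z_i) = N(m, diag(s2)); mu = o_i + t(B) x_i, Omega = solve(Sigma)
elbo_i <- function(y, mu, Omega, m, s2) {
  A  <- exp(m + s2 / 2)                                  # E_q[exp(Z_ij)]
  ll <- sum(y * m - A - lgamma(y + 1))                   # E_q[log p(y | Z)]
  prior <- 0.5 * as.numeric(determinant(Omega)$modulus) -
    0.5 * as.numeric(t(m - mu) %*% Omega %*% (m - mu) + sum(diag(Omega) * s2))
  entropy <- 0.5 * sum(log(s2)) + length(y) / 2          # 2*pi constants cancel against the prior
  ll + prior + entropy
}

The VE step maximizes \(J_i\) over \((\mathbf{m}, \mathbf{s}^2)\) with any gradient-based routine; the M step then updates \(\theta\) with the \(\psi_i\) held fixed.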

Implementations

See https://pln-team.github.io/repositories/

PLNmodels Medium scale problems (R/C++ package)

  • V-EM: exact M step + numerical optimization for the VE step
  • algorithm: conservative convex separable approximations (Svanberg 2002)
  • implementation: NLopt nonlinear-optimization library (Johnson 2011)

\(\rightsquigarrow\) Up to thousands of sites ( \(n \approx 1000s\) ), hundreds of species ( \(p\approx 100s\) )

pyPLNmodels Large scale problems (Python/Pytorch module)

  • V-EM: exact M step + numerical optimization for the VE step
  • algorithm: Rprop (gradient sign + adaptive variable-specific update) (Riedmiller and Braun 1993)
  • implementation: torch with GPU auto-differentiation (Paszke et al. 2017)

\(\rightsquigarrow\) Up to \(n \approx 100{,}000\) and \(p \approx 10{,}000\)

PLN-PCA and Variational Auto-Encoders

  • The Decoder is the generative model \(p_{\theta}(\mathbf{Y}_i | \mathbf{Z}_i)\)
  • The Encoder performs variational inference, approximating the posterior distribution by \(q_\psi(\mathbf{Z}_i | \mathbf{Y}_i) = \mathcal{N}(\boldsymbol\mu_i, \mathrm{diag}(\boldsymbol\sigma^2_i))\)
  • The VAE maximizes a lower bound of the marginal log-likelihood \(\log p_{\theta}(\mathbf{Y})\)

\[ \log p_{\theta}(\mathbf{Y}_i) \geq \mathbb{E}_{q_{\psi}(\mathbf{Z}_i|\mathbf{Y}_i)}\left[\log p_{\theta}(\mathbf{Y}_i | \mathbf{Z}_i)\right] - D_{KL}(q_{\psi}(\mathbf{Z}_i|\mathbf{Y}_i)\,||\,p_{\theta}(\mathbf{Z}_i)) \]

Variational Estimator: properties

M-estimation framework (Van der Vaart 2000)

Let \(\hat{\psi}_i = \hat{\psi}_i(\theta, \mathbf{Y}_i) = \arg \max_{\psi} J_i(\theta, \psi)\) and consider the stochastic map \(\bar{J}_n\) defined by

\[\bar{J}_n \; : \quad \theta \mapsto \frac{1}{n}\sum_{i=1}^{n} J_i(\theta, \hat{\psi}_i) \stackrel{\Delta}{=} \frac{1}{n}\sum_{i=1}^{n} \bar{J}_i(\theta)\]

M-estimation suggests that \(\hat{\theta}^{\text{ve}} = \arg\max_{\theta} \bar{J}_{n}(\theta)\) should converge to \(\bar{\theta} = \arg\max_{\theta} \bar{J}(\theta)\) where \(\bar{J}(\theta) = \mathbb{E}_{\theta^\star}[\bar{J}_Y(\theta)] = \mathbb{E}_{\theta^\star}[J_Y(\theta, \hat{\psi}(\theta, Y))]\).

Theorem (Westling and McCormick 2015)

In this line, (Westling and McCormick 2015) show that under regularity conditions ensuring that \(\bar{J}_n\) is smooth enough (e.g. when \(\theta\) and \(\psi_i\) are restricted to compact sets), \[ \hat{\theta}^{\text{ve}} \xrightarrow[n \to +\infty]{a.e.} \bar{\theta} \]

Open question: \(\bar{\theta} = \theta^\star\)? No formal results, as \(\bar{J}\) is intractable, but numerical evidence suggests so.

Variance: naïve approach

Proceed as if \(\hat{\theta}^{\text{ve}}\) were an MLE and \(\bar{J}_n\) the log-likelihood.

Variational Fisher Information

For standard PLN, the variational Fisher information matrix (obtained from the Hessian of \(J\)) is

\[I_n(\hat{\theta}^{\text{ve}}) = \begin{pmatrix} \frac{1}{n}(\mathbf{I}_p \otimes \mathbf{X}^\top)\mathrm{diag}(\mathrm{vec}(\mathbf{A}))(\mathbf{I}_p \otimes \mathbf{X}) & \mathbf{0} \\ \mathbf{0} & \frac12\mathbf{\Omega}^{-1} \otimes \mathbf{\Omega}^{-1} \end{pmatrix}\]

where \(A_{ij} = \exp(M_{ij} + S_{ij}^2/2)\).

Confidence intervals and coverage

\(\hat{\mathbb{V}}(B_{kj}) = [n (\mathbf{X}^\top \mathrm{diag}(\mathrm{vec}(\hat{A}_{.j})) \mathbf{X})^{-1}]_{kk}, \qquad \hat{\mathbb{V}}(\Omega_{kl}) = 2\hat{\Omega}_{kk}\hat{\Omega}_{ll}\)

The confidence intervals at confidence level \(1 - \alpha\) are given by

\(B_{kj} = \hat{B}_{kj} \pm \frac{q_{1 - \alpha/2}}{\sqrt{n}} \sqrt{\hat{\mathbb{V}}(B_{kj})}, \qquad \Omega_{kl} = \hat{\Omega}_{kl} \pm \frac{q_{1 - \alpha/2}}{\sqrt{n}} \sqrt{\hat{\mathbb{V}}(\Omega_{kl})}\).
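A sketch of the corresponding computation for one coefficient (names and shapes are assumptions: X is the \(n \times d\) design matrix, A_hat the \(n \times p\) matrix of \(\hat{A}_{ij}\), B_hat the \(d \times p\) estimate):

# Naive Wald interval for B_kj based on the variational Fisher information
ci_Bkj <- function(X, A_hat, B_hat, k, j, alpha = 0.05) {
  n <- nrow(X)
  XtAX_inv <- solve(t(X) %*% (A_hat[, j] * X))   # (X' diag(A_.j) X)^{-1}
  v_hat <- n * XtAX_inv[k, k]                    # hat V(B_kj)
  B_hat[k, j] + c(-1, 1) * qnorm(1 - alpha / 2) * sqrt(v_hat / n)
}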

Variance: sandwich estimator

Theorem (Westling and McCormick 2015)

Under additional regularity conditions (satisfied when \(\theta\) and \(\psi_i\) are restricted to compact sets), we have \[ \sqrt{n}(\hat{\theta}^{\text{ve}} - \bar{\theta}) \xrightarrow[]{d} \mathcal{N}(0, V(\bar{\theta})), \quad \text{where } V(\theta) = C(\theta)^{-1} D(\theta) C(\theta)^{-1} \] for \(C(\theta) = \mathbb{E}[\nabla_{\theta\theta} \bar{J}(\theta)]\) and \(D(\theta) = \mathbb{E}\left[(\nabla_{\theta} \bar{J}(\theta)) (\nabla_{\theta} \bar{J}(\theta))^\intercal \right]\)

Practical computations (chain rule)

\[ \begin{aligned} \hat{C}_n(\theta) & = \frac{1}{n} \sum_{i=1}^n \left[ \nabla_{\theta\theta} J_i - \nabla_{\theta\psi_i} J_i (\nabla_{\psi_i\psi_i} J_i)^{-1} \nabla_{\theta\psi_i} J_i^\intercal \right](\theta, \hat{\psi}_i) \\ \hat{D}_n(\theta) & = \frac{1}{n} \sum_{i=1}^n \left[ \nabla_{\theta} J_i \nabla_{\theta} J_i^\intercal \right](\theta, \hat{\psi}_i) \end{aligned} \]

Caveat

  • For \(\theta = (\mathbf{B}, \boldsymbol\Omega)\), \(\hat{C}_n\) requires the inversion of \(n\) matrices with \((p^2 + p d)\) rows/columns…
  • In practice, the variance can instead be estimated by jackknife resampling (Stoehr and Robin 2024)
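A generic sketch of the jackknife estimator for the variance of \(\hat{\mathbf{B}}\), where fit_fun is a placeholder for any routine returning \(\hat{\mathbf{B}}\) from a count table and a design matrix:

# Leave-one-out (jackknife) variance of B_hat, elementwise
jackknife_varB <- function(Y, X, fit_fun) {
  n  <- nrow(Y)
  Bs <- lapply(seq_len(n), function(i)
    fit_fun(Y[-i, , drop = FALSE], X[-i, , drop = FALSE]))
  Bbar <- Reduce(`+`, Bs) / n
  (n - 1) / n * Reduce(`+`, lapply(Bs, function(B) (B - Bbar)^2))
}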


Handling zeros in multivariate count tables

A zero-inflated PLN

Mixture of PLN and Bernoulli distribution

Use two latent vectors \(\mathbf{W}_i\) and \(\mathbf{Z}_i\) to model excess of zeroes and dependence structure

\[\begin{array}{rrl} \text{PLN latent space} & \boldsymbol{Z}_i = (Z_{ij})_{j=1\dots p} & \sim \mathcal{N}(\mathbf{x}_i^\top \mathbf{B}, \mathbf{\Sigma}), \\[1.5ex] \text{excess of zeros} & \boldsymbol{W}_{i} = (W_{ij})_{j=1\dots p} & \sim \otimes_{j=1}^p \cal B(\pi_{ij}), \\[1.5ex] \text{observation space} & Y_{ij} \, | \, W_{ij}, Z_{ij} & \sim^\text{indep} W_{ij}\delta_0 + (1-W_{ij})\mathcal{P} \left(\exp\{o_{ij} + Z_{ij}\}\right),\\ \end{array}\]

\(\rightsquigarrow\) Better handling of zeros + additional interpretable parameters

Basic properties

Letting \(A_{ij} \triangleq \exp \left( o_{ij} + \mu_{ij} + \sigma_{jj}/2\right)\) with \(\mu_{ij} = \mathbf{x}_i^\intercal \mathbf{B}_j\), then

  • \(\mathbb{E}\left(Y_{ij}\right) = (1-\pi_{ij}) A_{ij} \leq A_{ij}\) (PLN’s mean),
  • \(\mathbb{V}\left(Y_{ij}\right) = (1-\pi_{ij})A_{ij} + (1-\pi_{ij})A_{ij}^2 \left( e^{\sigma_{jj}} - (1-\pi_{ij}) \right)\).
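Both formulas are easy to check by Monte Carlo for a single entry; a sketch with arbitrary values and \(o_{ij} = 0\):

set.seed(2)
n <- 1e6; mu <- 1; sigma2 <- 0.5; pi0 <- 0.3
Z <- rnorm(n, mu, sqrt(sigma2))               # latent Gaussian
W <- rbinom(n, 1, pi0)                        # zero-inflation indicator
Y <- ifelse(W == 1, 0L, rpois(n, exp(Z)))
A <- exp(mu + sigma2 / 2)
c(mean(Y), (1 - pi0) * A)                     # empirical vs theoretical mean
c(var(Y), (1 - pi0) * A + (1 - pi0) * A^2 * (exp(sigma2) - (1 - pi0)))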

ZI-PLN: refinements

Modeling of the pure zero component

\[\begin{align*} \pi_{ij} & = \pi \in [0,1] & \text{(single global parameter)} \\ \pi_{ij} & = \pi_j \in [0,1] & \text{(species dependent)} \\ \pi_{ij} & = \pi_i \in [0,1] & \text{(site dependent)} \\ \pi_{ij} & = \mathrm{logit}^{-1}( \boldsymbol X^{0}\boldsymbol B^0)_{ij}, ~ \boldsymbol X^0 \in \mathbb R^{n \times d_0}, ~ \boldsymbol B^0 \in \mathbb R^{d_0\times p} & \text{(site covariates)} \\ \pi_{ij} & = \mathrm{logit}^{-1}(\bar{\boldsymbol{B}}^0 \bar{\boldsymbol{X}}^{0})_{ij}, ~ \bar{\boldsymbol{B}}^0 \in \mathbb R^{n \times d_0}, ~ \bar{\boldsymbol X}^0 \in \mathbb R^{d_0\times p} & \text{(species covariates)} \end{align*}\]

Proposition 1 (Identifiability of ZI-PLN)  

  • The “single global parameter” ZI-PLN model with parameter \(\mathbf{\theta} = (\mathbf{\Omega}, \mathbf{\mu}, \mathbf{\pi})\) and parameter space \(\mathbb{S}_p^{++} \times \mathbb{R}^p \times (0,1)^{p}\) is identifiable (moment-based proof)
  • The site-covariates zero-inflation model with parameter \(\mathbf{\theta} = (\mathbf{\Omega}, \mathbf{B}, \mathbf{B}^0)\) and parameter space \(\mathbb{S}_p^{++} \times \mathcal{M}_{d,p}(\mathbb{R}) \times \mathcal{M}_{d_0,p}(\mathbb{R})\) is identifiable if and only if the \(n\times d\) and \(n \times d_0\) matrices of covariates \(\mathbf{X}\) and \(\mathbf{X}^0\) are both full rank.

Standard mean-field

Variational approximation breaks all dependencies

\[\begin{equation*} p(\mathbf{Z}_i, \mathbf{W}_i | \mathbf{Y}_i) \approx q_{\psi_i}(\mathbf{Z}_i, \mathbf{W}_i) \triangleq q_{\psi_i}(\mathbf{Z}_i) q_{\psi_i}(\mathbf{W}_i) = \otimes_{j=1}^p q_{\psi_i}(Z_{ij}) q_{\psi_i}(W_{ij}) \end{equation*}\]

with Gaussian and Bernoulli distributions for \(Z_{ij}\) and \(W_{ij}\), then

\[\begin{equation*} q_{\psi_i}(\mathbf{Z}_i, \mathbf{W}_i) = \otimes_{j=1}^p \mathcal N\left( M_{ij}, S_{ij}^2\right) \mathcal B\left(\rho_{ij} \right) \end{equation*}\]

Variational lower bound

Let \(\theta = (\mathbf{B}, \mathbf{B}^0, \mathbf{\Sigma})\) and \(\psi= (\mathbf{M}, \mathbf{S}, \mathbf{R})\), then

\[\begin{align*} J(\theta, \psi) & = \log p_\theta(\mathbf{Y}) - KL(q_\psi(.) \,\|\, p_\theta(. \,|\, \mathbf{Y})) \\ & = \mathbb{E}_{q_{\psi}} \log p_\theta(\mathbf{Z}, \mathbf{W}, \mathbf{Y}) - \mathbb{E}_{q_{\psi}} \log q_\psi(\mathbf{Z}, \mathbf{W}) \\ & = \mathbb{E}_{q_{\psi}} \log p_\theta (\mathbf{Y} | \mathbf{Z}, \mathbf{W}) + \mathbb{E}_{q_{\psi}} \log p_\theta (\mathbf{Z}) + \mathbb{E}_{q_{\psi}} \log p_\theta (\mathbf{W}) \\ & \qquad - \mathbb{E}_{q_{\psi}} \log q_{\psi} (\mathbf{Z}) - \mathbb{E}_{q_{\psi}} \log q_{\psi} (\mathbf{W}) \\ \end{align*}\]

Property: concave in each element of \(\theta, \psi\).

Sparse regularization

Recall that \(\theta = (\mathbf{B}, \mathbf{B}^0, \boldsymbol{\Omega} = \boldsymbol{\Sigma}^{-1})\). Sparsity controls the number of parameters:

\[\begin{equation*} \arg\min_{\theta, \psi} \; -J(\theta, \psi) + \lambda_1 \| \mathbf{B} \|_1 + \lambda_2 \| \boldsymbol{\Omega} \|_1 \left( + \lambda_1 \| \mathbf{B}^0 \|_1 \right) \end{equation*}\]

Alternate optimization

  • (Stochastic) Gradient-descent on \(\mathbf{B}^0, \mathbf{M}, \mathbf{S}\)
  • Closed-form for posterior probabilities \(\mathbf{R}\)
  • Inverse covariance \(\mathbf{\Omega}\)
    • if \(\lambda_2=0\), \(\hat{\mathbf{\Sigma}} = n^{-1} \left[ (\mathbf{M} - \mathbf{XB})^\top(\mathbf{M} - \mathbf{XB}) + \bar{\mathbf{S}}^2 \right]\)
    • if \(\lambda_2 > 0\), \(\ell_1\) penalized MLE ( \(\rightsquigarrow\) Graphical-Lasso with \(\hat{\mathbf{\Sigma}}\) as input)
  • PLN regression coefficients \(\mathbf{B}\) (closed forms sketched below)
    • if \(\lambda_1=0\), \(\hat{\mathbf{B}} = [\mathbf{X}^\top \mathbf{X}]^{-1} \mathbf{X}^\top \mathbf{M}\)
    • if \(\lambda_1 > 0\), vectorize and solve an \(\ell_1\)-penalized least-squares problem
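A sketch of the two closed forms (case \(\lambda_1 = \lambda_2 = 0\)), assuming \(\mathbf{M}\) and \(\mathbf{S}\) are the \(n \times p\) matrices of variational means and standard deviations, \(\mathbf{X}\) the design matrix, and \(\bar{\mathbf{S}}^2\) the diagonal matrix of column sums of \(S_{ij}^2\):

# Closed-form M-step updates when lambda1 = lambda2 = 0
B_hat     <- solve(t(X) %*% X, t(X) %*% M)       # least squares on the variational means
resid     <- M - X %*% B_hat
Sigma_hat <- (t(resid) %*% resid + diag(colSums(S^2))) / nrow(M)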

Initialize with univariate zero-inflated Poisson regression models.

Enhancing variational approximation (1)

Two paths of improvement, breaking fewer dependencies between the latent variables:

\[p(\mathbf{Z}_i, \mathbf{W}_i | \mathbf{Y}_i) \approx q(\mathbf{Z}_i, \mathbf{W}_i) \triangleq \left\{\begin{array}{l} \prod_j q(W_{ij} | Z_{ij}) q(Z_{ij}) \\[1ex] \prod_j q(Z_{ij} | W_{ij}) q(W_{ij}) \\ \end{array}\right.\]

The \(W|Z,Y\) path

One can show that

\[ W_{ij}\,|\, Y_{ij},Z_{ij} \sim \mathcal B \left( \frac{\pi_{ij}}{\pi_{ij} + (1-\pi_{ij})\exp(-e^{\,o_{ij} + Z_{ij}})}\right) \boldsymbol 1_{\{Y_{ij} = 0\}} \]

Sadly, the resulting ELBO involves the intractable entropy term \(\tilde{\mathbb{E}}[\log q_{\psi}(\mathbf{W} | \mathbf{Z})]\)

\(\rightsquigarrow\) requires computing \(\tilde{\mathbb{E}}\left[ -\frac{\log(1+\exp(-U))}{1+\exp(-U)} \right]\) for arbitrary univariate Gaussians \(U\)

Enhancing variational approximation (2)

The \(Z|W,Y\) path

Since \(W_{ij}\) only takes two values, the dependence between \(Z_{ij}\) and \(W_{ij}\) can easily be highlighted:

\[Z_{ij} | W_{ij}, Y_{ij} = \left(Z_{ij}|Y_{ij}, W_{ij} = 1 \right)^{ W_{ij}}\left(Z_{ij}|Y_{ij}, W_{ij} = 0 \right)^{1- W_{ij}}.\] Then \(p(Z_{ij}\,|\, Y_{ij}, W_{ij} = 1) = p(Z_{ij} \,|\, W_{ij} = 1) = p(Z_{ij})\): given \(W_{ij} = 1\), \(Y_{ij} = 0\) is deterministic so it carries no information on \(Z_{ij}\), and \(Z_{ij}\) is independent of \(W_{ij}\).

\(\rightsquigarrow\) Only an approximation of \(Z_{ij} | Y_{ij}, W_{ij} = 0\) is needed.

More accurate variational approximation

\[\begin{aligned} q_{\psi_i}(\boldsymbol Z_i, \boldsymbol W_i) & = q_{\psi_i}(\boldsymbol Z_i | \boldsymbol W_i) q_{\psi_i}(\boldsymbol W_i) \\ & = \otimes_{j = 1}^p \mathcal{N}(\boldsymbol x_i^\top \mathbf{B}_j, \Sigma_{jj})^{W_{ij}} \mathcal{N}(M_{ij}, S_{ij}^2)^{1-W_{ij}}, \quad W_{ij} \sim^\text{indep} \mathcal{B}\left(\rho_{ij}\right)\end{aligned}.\]

Drawback

We lose the closed forms in the M step of the VEM for \(\hat{\mathbf{B}}\) and \(\hat{\boldsymbol{\Sigma}}\) under the corresponding ELBO…

Additional refinement

Optimization using the analytic law of \(W_{ij} \,|\, Y_{ij}\)

Proposition 2 (Distribution of \(W_{ij}| Y_{ij}\)) \[W_{ij} | Y_{ij} \sim \mathcal{B}\left(\frac{\pi_{ij}}{ \varphi\left(\mathbf{x}_i^\top \boldsymbol B_j, \Sigma_{jj}\right) \left(1 - \pi_{ij}\right) + \pi_{ij}}\right) \boldsymbol 1_{\{Y_{ij} = 0\}}\]

with \(\varphi(\mu,\sigma^2) = \mathbb E \left[ \exp(-X)\right], ~ X \sim \mathcal L \mathcal N \left( \mu, \sigma^2\right)\).

Approximation of \(\varphi\)

The function \(\varphi\) is intractable but an approximation (Rojas-Nandayapa 2008) can be computed:

\[\varphi(\mu, \sigma^2)\approx \tilde \varphi(\mu, \sigma^2)= \frac{\exp \left(-\frac{L^2\left(\sigma^2 e^\mu\right)+2 L\left(\sigma^2 e^\mu\right)}{2 \sigma^2}\right)}{\sqrt{1+L\left(\sigma^2 e^\mu\right)}},\] where \(L(\cdot)\) is the (principal branch of the) Lambert \(W\) function, i.e. \(x = L(z) \Leftrightarrow z = x e^{x}\) for \(z > 0\).
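A sketch of \(\tilde\varphi\), using lambertW0() from the CRAN package lamW (an assumption; any Lambert-\(W\) implementation works), with a quick Monte-Carlo sanity check:

# Rojas-Nandayapa approximation of phi(mu, sigma2) = E[exp(-X)], X ~ LogNormal(mu, sigma2)
phi_tilde <- function(mu, sigma2) {
  L <- lamW::lambertW0(sigma2 * exp(mu))   # L solves L * exp(L) = sigma2 * exp(mu)
  exp(-(L^2 + 2 * L) / (2 * sigma2)) / sqrt(1 + L)
}
set.seed(3)
mean(exp(-rlnorm(1e6, meanlog = 1, sdlog = sqrt(0.5))))  # Monte-Carlo estimate of phi(1, 0.5)
phi_tilde(1, 0.5)                                        # closed-form approximation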

Fit various PLN models

Assess environmental effects


PLN           <- PLN(Abundance ~ 1             + offset(log(Offset)), data = microcosm)
PLN_site      <- PLN(Abundance ~ 0 + site      + offset(log(Offset)), data = microcosm)
PLN_time      <- PLN(Abundance ~ 0 + time      + offset(log(Offset)), data = microcosm)
PLN_site_time <- PLN(Abundance ~ 0 + site_time + offset(log(Offset)), data = microcosm)
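
The selection criteria below can then be gathered from the fitted objects; a sketch assuming, as in PLNmodels, that each fit exposes $nb_param, $loglik, $BIC and $ICL:

# Assemble the model-selection table from the four fits
fits <- list("PLN" = PLN, "PLN site" = PLN_site,
             "PLN time" = PLN_time, "PLN site * time" = PLN_site_time)
t(sapply(fits, function(f) c(nb_param = f$nb_param, loglik = f$loglik,
                             BIC = f$BIC, ICL = f$ICL)))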


Model selection criteria

model             nb_param     loglik        BIC        ICL
PLN                  33929  -219757.4  -334775.4  -839241.5
PLN site             34706  -217339.0  -334991.0  -842809.6
PLN time             34706  -217971.7  -335623.7  -838857.6
PLN site * time      37555  -213477.7  -340787.7  -845913.2

Fit various ZI-PLN models


Playing with the ZI component

Try various ways of modeling the probability of being zero:

ZI            <- ZIPLN(Abundance ~ 1 + offset(log(Offset))           , data = microcosm)
ZI_site       <- ZIPLN(Abundance ~ 1 + offset(log(Offset)) | 0 + site,  data = microcosm)
ZI_time       <- ZIPLN(Abundance ~ 1 + offset(log(Offset)) | 0 + time,  data = microcosm)
ZI_site_time  <- ZIPLN(Abundance ~ 1 + offset(log(Offset)) | 0 + site_time,  data = microcosm)


Include covariates both in PLN and ZI components

ZI_site_PLN_site <- ZIPLN(Abundance ~ 0 + site      + offset(log(Offset)) | 0 + site,  data = microcosm)
ZI_time_PLN_time <- ZIPLN(Abundance ~ 0 + time      + offset(log(Offset)) | 0 + time,  data = microcosm)
ZI_PLN_site_time <- ZIPLN(Abundance ~ 0 + site_time + offset(log(Offset)) | 0 + site_time, data = microcosm)

Other combinations are possible (but less effective).

Locate the best model

Model selection criteria
(for each criterion, the two best values were highlighted)

model                    nb_param     loglik        BIC        ICL
PLN                         33929  -219757.4  -334775.4  -839241.5
PLN site                    34706  -217339.0  -334991.0  -842809.6
PLN time                    34706  -217971.7  -335623.7  -838857.6
PLN site * time             37555  -213477.7  -340787.7  -845913.2
ZI                          33930  -224396.1  -339417.4  -595755.9
ZI site                     34188  -210095.5  -325991.5  -533817.0
ZI time                     34188  -216834.5  -332730.4  -565127.5
ZI site * time              34188  -204605.6  -320501.6  -542733.7
ZI site PLN site            35742  -206675.4  -327839.4  -512224.7
ZI time PLN time            35742  -211357.2  -332521.2  -527883.0
ZI and PLN site * time      41440  -192811.7  -333291.7  -503239.1

\(\rightsquigarrow\) OK, let us keep the model with site and time, with main and interaction effects, in both the ZI and PLN components.

Model fits

ZI-PLN

Model fits

PLN seems to do just as well…

Fits of zeros

But zeros are not well predicted

ZI-PLN: latent layer

Latent means of the PLN component (w/wo covariate effects)

PLN: latent layer

Same for standard PLN 😊

ZI PLN

Clustering of the latent means vs probability of zero inflation

ZI PLN

Latent means vs probability of zero inflation

Residual covariance

Reordered by hierarchical clustering with complete linkage

Network analysis

Sparse reconstruction + SBM

# Sparse network reconstruction: l1 penalty on Omega, warm-started from the
# previously fitted ZI-PLN model
network <- ZIPLN(Abundance ~ 0 + site_time + offset(log(Offset)) | 0 + site_time, data = microcosm,
                 control = ZIPLN_param(penalty = 0.2, inception = ZI_PLN_site_time))
# Extract the support of the inferred graph, then cluster it with a stochastic block model
adjacency_mat <- plot(network, plot = FALSE, type = "support", output = "corrplot") %>% as.matrix()
mySBM <- sbm::estimateSimpleSBM(adjacency_mat, dimLabels = "ASV", estimOptions = list(plot = FALSE))

Cluster: means vs covariance

With respect to taxonomy

Conclusion

Take-home message

  • PLN = generic model for multivariate count data analysis
  • Flexible modeling of the covariance structure; allows for covariates
  • Efficient variational EM algorithms
  • More extensions to come (e.g. Kalman-filter-like formulation)

https://computo.sfds.asso.fr, a journal promoting reproducible research in ML and stat.

References

Aitchison, J., and C. H. Ho. 1989. “The Multivariate Poisson-Log Normal Distribution.” Biometrika 76 (4): 643–53.
Baey, Charlotte, Maud Delattre, Estelle Kuhn, Jean-Benoist Leger, and Sarah Lemler. 2023. “Efficient Preconditioned Stochastic Gradient Descent for Estimation in Latent Variable Models.” In International Conference on Machine Learning, 1430–53. PMLR.
Batardière, Bastien, Julien Chiquet, François Gindraud, and Mahendra Mariadassou. 2024. “Zero-Inflation in the Multivariate Poisson Lognormal Family.” arXiv Preprint. https://arxiv.org/abs/2405.14711.
Blei, David M, Alp Kucukelbir, and Jon D McAuliffe. 2017. “Variational Inference: A Review for Statisticians.” Journal of the American Statistical Association 112 (518): 859–77.
Chiquet, Julien, Mahendra Mariadassou, and Stéphane Robin. 2018. “Variational Inference for Probabilistic Poisson PCA.” The Annals of Applied Statistics 12: 2674–98. http://dx.doi.org/10.1214/18-AOAS1177.
———. 2019. “Variational Inference for Sparse Network Reconstruction from Count Data.” In Proceedings of the 19th International Conference on Machine Learning (ICML 2019).
———. 2021. “The Poisson-Lognormal Model as a Versatile Framework for the Joint Analysis of Species Abundances.” Frontiers in Ecology and Evolution 9. https://doi.org/10.3389/fevo.2021.588292.
Johnson, Steven G. 2011. The NLopt Nonlinear-Optimization Package. http://ab-initio.mit.edu/nlopt.
Mariadassou, Mahendra, Laurent X Nouvel, Fabienne Constant, Diego P Morgavi, Lucie Rault, Sarah Barbey, Emmanuelle Helloin, et al. 2023. “Microbiota Members from Body Sites of Dairy Cows Are Largely Shared Within Individual Hosts Throughout Lactation but Sharing Is Limited in the Herd.” Animal Microbiome 5 (1): 32.
Paszke, Adam, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. 2017. “Automatic Differentiation in Pytorch.”
Riedmiller, Martin, and Heinrich Braun. 1993. “A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm.” In IEEE International Conference on Neural Networks, 586–91. IEEE.
Rojas-Nandayapa, Leonardo. 2008. “Risk Probabilities: Asymptotics and Simulation.”
Stoehr, Julien, and Stephane S Robin. 2024. “Composite Likelihood Inference for the Poisson Log-Normal Model.” arXiv Preprint arXiv:2402.14390.
Svanberg, Krister. 2002. “A Class of Globally Convergent Optimization Methods Based on Conservative Convex Separable Approximations.” SIAM Journal on Optimization 12 (2): 555–73.
Tikhonov, Gleb, Øystein H Opedal, Nerea Abrego, Aleksi Lehikoinen, Melinda MJ de Jonge, Jari Oksanen, and Otso Ovaskainen. 2020. “Joint Species Distribution Modelling with the r-Package Hmsc.” Methods in Ecology and Evolution 11 (3): 442–47.
Van der Vaart, Aad W. 2000. Asymptotic Statistics. Vol. 3. Cambridge University Press.
Westling, Ted, and Tyler H. McCormick. 2015. “Beyond Prediction: A Framework for Inference with Variational Approximations in Mixture Models.” https://arxiv.org/abs/1510.08151.