Zero-inflation in the Poisson lognormal family (Batardière et al. 2024)

Rencontres Mathématiques de Rouen

Bastien Batardière

PhD - UMR MIA Paris-Saclay

Julien Chiquet

MIA Paris-Saclay, INRAE

François Gindraud

IR INRIA

Mahendra Mariadassou

MaIAGE, INRAE

June 21, 2024

Motivation: multivariate count data

Routinely gathered in ecology/microbiology/genomics

Data tables

  • Abundances: read counts of species/transcripts \(j\) in sample \(i\)
  • Covariates: value of environmental variable \(k\) in sample \(i\)
  • Offsets: sampling effort for species/transcripts \(j\) in sample \(i\)

Goals

  • understand environmental effects (regression, classification)
  • exhibit patterns of diversity (clustering, dimension reduction)
  • understand between-species interactions (covariance selection)

Illustration: microcosm data set (Mariadassou et al. 2023)

  • Study carried out at INRAE “Domaine Expérimental du Pin”
  • Microbiotas of 44 lactating cows at 4 body sites (mouth, nose, vagina, milk) \(\times\) 4 time points
  • Abundances of \(\approx\) 1200 “species” (Amplicon Sequence Variants) with known taxonomy

Table of Counts

Retrieve microcosm data

# Load the data, keep ASVs present in more than 5% of samples, drop empty samples
microcosm <- readRDS("microcosm_reduced.rds")
microcosm$Abundance <- microcosm$Abundance[, colMeans(microcosm$Abundance > 0) > 0.05]
microcosm <- microcosm[rowSums(microcosm$Abundance) > 0, ]
microcosm$site_time <- droplevels(microcosm$site_time)
# Proportion of zeros in the filtered count table
sum(microcosm$Abundance == 0) / length(microcosm$Abundance)
[1] 0.9013119

Still 90% of zeros!

Data after total-counts normalization

(Figure: normalized abundances on the log scale.)

Background on the Poisson-lognormal family

Model, inference

Model for multivariate count data

The Poisson Lognormal model (PLN)

PLN (Aitchison and Ho 1989) is a multivariate generalized linear model, where

  • the counts \(\mathbf{Y}_i\in\mathbb{N}^p\) are the response variables
  • the main effect is due to a linear combination of the covariates \(\mathbf{x}_i\in\mathbb{R}^d\)
  • a vector of offsets \(\mathbf{o}_i\in\mathbb{R}^p\) can be specified for each sample

\[ \mathbf{Y}_i \,|\, \mathbf{Z}_i \sim^{\text{iid}} \mathcal{P}\left(\exp\{\mathbf{Z}_i\}\right), \qquad \mathbf{Z}_i \sim \mathcal{N}(\underbrace{\mathbf{o}_i + \mathbf{x}_i^\top\mathbf{B}}_{\boldsymbol\mu_i},\boldsymbol\Sigma). \]

Typically, \(n\approx 10s \to 1000s\), \(p\approx 10s \to 1000s\), \(d \approx 1 \to 10\)

Properties: over-dispersion, arbitrary-signed covariances

  • mean: \(\mathbb{E}(Y_{ij}) = \exp \left( o_{ij} + \mathbf{x}_i^\top {\mathbf{B}}_{\cdot j} + \sigma_{jj}/2\right) > 0\)
  • variance: \(\mathbb{V}(Y_{ij}) = \mathbb{E}(Y_{ij}) + \mathbb{E}(Y_{ij})^2 \left( e^{\sigma_{jj}} - 1 \right) > \mathbb{E}(Y_{ij})\)
  • covariance: \(\mathrm{Cov}(Y_{ij}, Y_{ik}) = \mathbb{E}(Y_{ij}) \mathbb{E}(Y_{ik}) \left( e^{\sigma_{jk}} - 1 \right).\)
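These moments are easy to check by simulation; a minimal sketch (no covariates or offsets, arbitrary \(\boldsymbol\mu\) and \(\boldsymbol\Sigma\)):

set.seed(1)
n <- 1e5; p <- 2
mu    <- c(1, 2)
Sigma <- matrix(c(0.5, 0.2, 0.2, 0.3), p, p)
Z <- MASS::mvrnorm(n, mu, Sigma)              # latent Gaussian layer
Y <- matrix(rpois(n * p, exp(Z)), n, p)       # Poisson observation layer
colMeans(Y); exp(mu + diag(Sigma) / 2)        # empirical vs theoretical mean
var(Y[, 1]); mean(Y[, 1])                     # variance > mean: over-dispersion
cov(Y[, 1], Y[, 2])                           # vs E(Y_1) E(Y_2) (exp(sigma_12) - 1)
prod(exp(mu + diag(Sigma) / 2)) * (exp(Sigma[1, 2]) - 1)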

Natural extensions of PLN

Various tasks of multivariate analysis

  • Dimension reduction: rank constraint on \(\boldsymbol\Sigma\) (Chiquet, Mariadassou, and Robin 2018)

    \[\mathbf{Z}_i \sim \mathcal{N}({\boldsymbol\mu}_i, \boldsymbol\Sigma = \mathbf{C}\mathbf{C}^\top), \quad \mathbf{C} \in \mathcal{M}_{pk} \text{ with orthogonal columns}.\]

  • Classification: maximize separation between group means (Chiquet, Mariadassou, and Robin 2021)

    \[\mathbf{Z}_i \sim \mathcal{N}\Big(\sum_k {\boldsymbol\mu}_k \mathbf{1}_{\{i\in k\}}, \boldsymbol\Sigma\Big), \quad \text{for known memberships}.\]

  • Clustering: mixture model in the latent layer

    \[\mathbf{Z}_i \mid i \in k \sim \mathcal{N}(\boldsymbol\mu_k, \boldsymbol\Sigma_k), \quad \text{for unknown memberships}.\]

  • Network inference: sparsity constraint on the inverse covariance (Chiquet, Mariadassou, and Robin 2019)

    \[\mathbf{Z}_i \sim \mathcal{N}({\boldsymbol\mu}_i, \boldsymbol\Sigma = \boldsymbol\Omega^{-1}), \quad \|\boldsymbol\Omega \|_1 < c.\]

Estimation

Estimate \(\theta = (\mathbf{B}, \boldsymbol\Sigma)\) and predict the \(\mathbf{Z}_i\), while the marginal likelihood of the model is

\[\begin{equation*} p_\theta(\mathbf{Y}_i) = \int_{\mathbb{R}^p} \prod_{j=1}^p p_\theta(Y_{ij} | Z_{ij}) \, p_\theta(\mathbf{Z}_i) \,\mathrm{d}\mathbf{Z}_i \end{equation*}\]

Direct approach

Expectation-Maximization

\[\begin{equation*} \log p_\theta(\mathbf{Y}) = \mathbb{E}_{p_\theta(\mathbf{Z}\,|\,\mathbf{Y})} [\log p_\theta(\mathbf{Y}, \mathbf{Z})] + \mathcal{H}[p_\theta(\mathbf{Z}\,|\,\mathbf{Y})], \text{ with } \mathcal{H}(p) = -\mathbb{E}_p(\log(p)). \end{equation*}\]

EM requires evaluating (some moments of) \(p_\theta(\mathbf{Z} \,|\, \mathbf{Y})\), but there is no closed form!

Variational Inference

Use a proxy \(q_\psi\) of \(p_\theta(\mathbf{Z}\,|\,\mathbf{Y})\) minimizing a divergence in a class \(\mathcal{Q}\) (e.g., Kullback-Leibler)

\[\begin{equation*} q_{\psi^\star}(\mathbf{Z}) \in \arg\min_{q\in\mathcal{Q}} D\left(q(\mathbf{Z}), p_\theta(\mathbf{Z} \,|\, \mathbf{Y})\right), \quad \text{e.g. } D(\cdot,\cdot) = KL(\cdot,\cdot) = \mathbb{E}_{q}\left[\log \frac{q(\mathbf{Z})}{p_\theta(\mathbf{Z}\,|\,\mathbf{Y})}\right]. \end{equation*}\]

and maximize the ELBO (Evidence Lower BOund)

\[\begin{equation*} J(\theta, \psi) = \log p_\theta(\mathbf{Y}) - KL[q_\psi (\mathbf{Z}) \,||\, p_\theta(\mathbf{Z} \,|\, \mathbf{Y})] = \mathbb{E}_{q_\psi} [\log p_\theta(\mathbf{Y}, \mathbf{Z})] + \mathcal{H}[q_\psi(\mathbf{Z})] = \sum_{i = 1}^n J_i(\theta, \psi_i) \end{equation*}\]

Resulting Variational EM

  • VE step: optimize \(\boldsymbol{\psi}\) (can be done sample by sample) \[\begin{equation*} \psi_i^{(h)} = \arg \max_{\psi_i} J_{i}(\theta^{(h)}, \psi_i) \left( = \arg\min_{q_i} KL[q_i(\mathbf{Z}_i) \,||\, p_{\theta^{(h)}}(\mathbf{Z}_i\,|\,\mathbf{Y}_i)] \right) \end{equation*}\]

  • M step: optimize \(\theta\) \[\theta^{(h+1)} = \arg\max_{\theta} \frac{1}{n}\sum_{i=1}^{n}J_{i}(\theta, \psi_i^{(h)})\]
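For concreteness, here is a sketch of one per-sample ELBO \(J_i\) under the Gaussian approximation \(q(\mathbf{Z}_i) = \mathcal{N}(\mathbf{m}, \mathrm{diag}(\mathbf{s}^2))\); the function and argument names are illustrative, not any package's internals, and offsets are folded into mu:

# Per-sample PLN ELBO with q(Z_i) = N(m, diag(s2)); mu = o_i + t(B) x_i, Omega = solve(Sigma)
elbo_i <- function(y, mu, Omega, m, s2) {
  A  <- exp(m + s2 / 2)                                  # E_q[exp(Z_ij)]
  ll <- sum(y * m - A - lgamma(y + 1))                   # E_q[log p(y | Z)]
  prior <- 0.5 * as.numeric(determinant(Omega)$modulus) -
    0.5 * as.numeric(t(m - mu) %*% Omega %*% (m - mu) + sum(diag(Omega) * s2))
  entropy <- 0.5 * sum(log(s2)) + length(y) / 2          # 2*pi constants cancel against the prior
  ll + prior + entropy
}

The VE step maximizes \(J_i\) over \((\mathbf{m}, \mathbf{s}^2)\) with any gradient-based routine; the M step then updates \(\theta\) with the \(\psi_i\) held fixed.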

Implementations

See https://pln-team.github.io/repositories/

PLNmodels Medium scale problems (R/C++ package)

  • V-EM: exact M step + numerical optimization for the VE step
  • algorithm: conservative convex separable approximations (Svanberg 2002)
  • implementation: NLopt nonlinear-optimization library (Johnson 2011)

\(\rightsquigarrow\) Up to thousands of sites ( \(n \approx 1000s\) ), hundreds of species ( \(p\approx 100s\) )

pyPLNmodels Large scale problems (Python/Pytorch module)

  • V-EM: exact M step + numerical optimization for the VE step
  • algorithm: Rprop (gradient sign + adaptive variable-specific update) (Riedmiller and Braun 1993)
  • implementation: torch with GPU auto-differentiation (Paszke et al. 2017)

\(\rightsquigarrow\) Up to \(n \approx 100{,}000\) and \(p \approx 10{,}000\)

PLN-PCA and Variational Auto-Encoders

  • The Decoder is the generative model \(p_{\theta}(\mathbf{Y}_i | \mathbf{Z}_i)\)
  • The Encoder performs variational inference, approximating the posterior distribution by \(q_\psi(\mathbf{Z}_i | \mathbf{Y}_i) = \mathcal{N}(\boldsymbol\mu_i, \mathrm{diag}(\boldsymbol\sigma^2_i))\)
  • The VAE maximizes a lower bound of the marginal log-likelihood \(\log p_{\theta}(\mathbf{Y})\)

\[ \log p_{\theta}(\mathbf{Y}_i) \geq \mathbb{E}_{q_{\psi}(\mathbf{Z}_i|\mathbf{Y}_i)}\left[\log p_{\theta}(\mathbf{Y}_i | \mathbf{Z}_i)\right] - D_{KL}(q_{\psi}(\mathbf{Z}_i|\mathbf{Y}_i)\,||\,p_{\theta}(\mathbf{Z}_i)) \]

Variational Estimator: properties

M-estimation framework (Van der Vaart 2000)

Let \(\hat{\psi}_i = \hat{\psi}_i(\theta, \mathbf{Y}_i) = \arg \max_{\psi} J_i(\theta, \psi)\) and consider the stochastic map \(\bar{J}_n\) defined by

\[\bar{J}_n \; : \quad \theta \mapsto \frac{1}{n}\sum_{i=1}^{n} J_i(\theta, \hat{\psi}_i) \stackrel{\Delta}{=} \frac{1}{n}\sum_{i=1}^{n} \bar{J}_i(\theta)\]

M-estimation suggests that \(\hat{\theta}^{\text{ve}} = \arg\max_{\theta} \bar{J}_{n}(\theta)\) should converge to \(\bar{\theta} = \arg\max_{\theta} \bar{J}(\theta)\) where \(\bar{J}(\theta) = \mathbb{E}_{\theta^\star}[\bar{J}_Y(\theta)] = \mathbb{E}_{\theta^\star}[J_Y(\theta, \hat{\psi}(\theta, Y))]\).

Theorem (Westling and McCormick 2015)

In this line, (Westling and McCormick 2015) show that under regularity conditions ensuring that \(\bar{J}_n\) is smooth enough (e.g. when \(\theta\) and \(\psi_i\) are restricted to compact sets), \[ \hat{\theta}^{\text{ve}} \xrightarrow[n \to +\infty]{a.e.} \bar{\theta} \]

Open question: \(\bar{\theta} = \theta^\star\)? No formal results, as \(\bar{J}\) is intractable, but numerical evidence suggests so.

Variance: naïve approach

Proceed as if \(\hat{\theta}^{\text{ve}}\) were an MLE and \(\bar{J}_n\) the log-likelihood.

Variational Fisher Information

For standard PLN, the variational Fisher information matrix (obtained from the Hessian of \(J\)) is

\[I_n(\hat{\theta}^{\text{ve}}) = \begin{pmatrix} \frac{1}{n}(\mathbf{I}_p \otimes \mathbf{X}^\top)\mathrm{diag}(\mathrm{vec}(\mathbf{A}))(\mathbf{I}_p \otimes \mathbf{X}) & \mathbf{0} \\ \mathbf{0} & \frac12\mathbf{\Omega}^{-1} \otimes \mathbf{\Omega}^{-1} \end{pmatrix}\]

where \(A_{ij} = \exp(M_{ij} + S_{ij}^2/2)\).

Confidence intervals and coverage

\(\hat{\mathbb{V}}(B_{kj}) = [n (\mathbf{X}^\top \mathrm{diag}(\mathrm{vec}(\hat{A}_{.j})) \mathbf{X})^{-1}]_{kk}, \qquad \hat{\mathbb{V}}(\Omega_{kl}) = 2\hat{\Omega}_{kk}\hat{\Omega}_{ll}\)

The confidence intervals at confidence level \(1 - \alpha\) are given by

\(B_{kj} = \hat{B}_{kj} \pm \frac{q_{1 - \alpha/2}}{\sqrt{n}} \sqrt{\hat{\mathbb{V}}(B_{kj})}, \qquad \Omega_{kl} = \hat{\Omega}_{kl} \pm \frac{q_{1 - \alpha/2}}{\sqrt{n}} \sqrt{\hat{\mathbb{V}}(\Omega_{kl})}\).
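A sketch of the corresponding computation for one coefficient (names and shapes are assumptions: X is the \(n \times d\) design matrix, A_hat the \(n \times p\) matrix of \(\hat{A}_{ij}\), B_hat the \(d \times p\) estimate):

# Naive Wald interval for B_kj based on the variational Fisher information
ci_Bkj <- function(X, A_hat, B_hat, k, j, alpha = 0.05) {
  n <- nrow(X)
  XtAX_inv <- solve(t(X) %*% (A_hat[, j] * X))   # (X' diag(A_.j) X)^{-1}
  v_hat <- n * XtAX_inv[k, k]                    # hat V(B_kj)
  B_hat[k, j] + c(-1, 1) * qnorm(1 - alpha / 2) * sqrt(v_hat / n)
}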

Variance: sandwich estimator

Theorem (Westling and McCormick 2015)

Under additional regularity conditions (satisfied when \(\theta\) and \(\psi_i\) are restricted to compact sets), we have \[ \sqrt{n}(\hat{\theta}^{\text{ve}} - \bar{\theta}) \xrightarrow[]{d} \mathcal{N}(0, V(\bar{\theta})), \quad \text{where } V(\theta) = C(\theta)^{-1} D(\theta) C(\theta)^{-1} \] for \(C(\theta) = \mathbb{E}[\nabla_{\theta\theta} \bar{J}(\theta)]\) and \(D(\theta) = \mathbb{E}\left[(\nabla_{\theta} \bar{J}(\theta)) (\nabla_{\theta} \bar{J}(\theta))^\intercal \right]\)

Practical computations (chain rule)

\[ \begin{aligned} \hat{C}_n(\theta) & = \frac{1}{n} \sum_{i=1}^n \left[ \nabla_{\theta\theta} J_i - \nabla_{\theta\psi_i} J_i (\nabla_{\psi_i\psi_i} J_i)^{-1} \nabla_{\theta\psi_i} J_i^\intercal \right](\theta, \hat{\psi}_i) \\ \hat{D}_n(\theta) & = \frac{1}{n} \sum_{i=1}^n \left[ \nabla_{\theta} J_i \nabla_{\theta} J_i^\intercal \right](\theta, \hat{\psi}_i) \end{aligned} \]

Caveat

  • For \(\theta = (\mathbf{B}, \boldsymbol\Omega)\), \(\hat{C}_n\) requires the inversion of \(n\) matrices with \((p^2 + p d)\) rows/columns…
  • In practice, the variance can instead be estimated by jackknife resampling (Stoehr and Robin 2024)
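A generic sketch of the jackknife estimator for the variance of \(\hat{\mathbf{B}}\), where fit_fun is a placeholder for any routine returning \(\hat{\mathbf{B}}\) from a count table and a design matrix:

# Leave-one-out (jackknife) variance of B_hat, elementwise
jackknife_varB <- function(Y, X, fit_fun) {
  n  <- nrow(Y)
  Bs <- lapply(seq_len(n), function(i)
    fit_fun(Y[-i, , drop = FALSE], X[-i, , drop = FALSE]))
  Bbar <- Reduce(`+`, Bs) / n
  (n - 1) / n * Reduce(`+`, lapply(Bs, function(B) (B - Bbar)^2))
}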


Handling zeros in multivariate count tables

A zero-inflated PLN

Mixture of PLN and Bernoulli distribution

Use two latent vectors \(\mathbf{W}_i\) and \(\mathbf{Z}_i\) to model excess of zeroes and dependence structure

\[\begin{array}{rrl} \text{PLN latent space} & \boldsymbol{Z}_i = (Z_{ij})_{j=1\dots p} & \sim \mathcal{N}(\mathbf{x}_i^\top \mathbf{B}, \mathbf{\Sigma}), \\[1.5ex] \text{excess of zeros} & \boldsymbol{W}_{i} = (W_{ij})_{j=1\dots p} & \sim \otimes_{j=1}^p \cal B(\pi_{ij}), \\[1.5ex] \text{observation space} & Y_{ij} \, | \, W_{ij}, Z_{ij} & \sim^\text{indep} W_{ij}\delta_0 + (1-W_{ij})\mathcal{P} \left(\exp\{o_{ij} + Z_{ij}\}\right),\\ \end{array}\]

\(\rightsquigarrow\) Better handling of zeros + additional interpretable parameters

Basic properties

Letting \(A_{ij} \triangleq \exp \left( o_{ij} + \mu_{ij} + \sigma_{jj}/2\right)\) with \(\mu_{ij} = \mathbf{x}_i^\intercal \mathbf{B}_j\), then

  • \(\mathbb{E}\left(Y_{ij}\right) = (1-\pi_{ij}) A_{ij} \leq A_{ij}\) (PLN’s mean),
  • \(\mathbb{V}\left(Y_{ij}\right) = (1-\pi_{ij})A_{ij} + (1-\pi_{ij})A_{ij}^2 \left( e^{\sigma_{jj}} - (1-\pi_{ij}) \right)\).
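Both formulas are easy to check by Monte Carlo for a single entry; a sketch with arbitrary values and \(o_{ij} = 0\):

set.seed(2)
n <- 1e6; mu <- 1; sigma2 <- 0.5; pi0 <- 0.3
Z <- rnorm(n, mu, sqrt(sigma2))               # latent Gaussian
W <- rbinom(n, 1, pi0)                        # zero-inflation indicator
Y <- ifelse(W == 1, 0L, rpois(n, exp(Z)))
A <- exp(mu + sigma2 / 2)
c(mean(Y), (1 - pi0) * A)                     # empirical vs theoretical mean
c(var(Y), (1 - pi0) * A + (1 - pi0) * A^2 * (exp(sigma2) - (1 - pi0)))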

ZI-PLN: refinements

Modeling of the pure zero component

\[\begin{align*} \pi_{ij} & = \pi \in [0,1] & \text{(single global parameter)} \\ \pi_{ij} & = \pi_j \in [0,1] & \text{(species dependent)} \\ \pi_{ij} & = \pi_i \in [0,1] & \text{(site dependent)} \\ \pi_{ij} & = \mathrm{logit}^{-1}( \boldsymbol X^{0}\boldsymbol B^0)_{ij}, ~ \boldsymbol X^0 \in \mathbb R^{n \times d_0}, ~ \boldsymbol B^0 \in \mathbb R^{d_0\times p} & \text{(site covariates)} \\ \pi_{ij} & = \mathrm{logit}^{-1}(\bar{\boldsymbol{B}}^0 \bar{\boldsymbol{X}}^{0})_{ij}, ~ \bar{\boldsymbol{B}}^0 \in \mathbb R^{n \times d_0}, ~ \bar{\boldsymbol X}^0 \in \mathbb R^{d_0\times p} & \text{(species covariates)} \end{align*}\]

Proposition 1 (Identifiability of ZI-PLN)  

  • The “single global parameter” ZI-PLN model with parameter \(\mathbf{\theta} = (\mathbf{\Omega}, \mathbf{\mu}, \mathbf{\pi})\) and parameter space \(\mathbb{S}_p^{++} \times \mathbb{R}^p \times (0,1)^{p}\) is identifiable (moment-based proof)
  • The site-covariates zero-inflation model with parameter \(\mathbf{\theta} = (\mathbf{\Omega}, \mathbf{B}, \mathbf{B}^0)\) and parameter space \(\mathbb{S}_p^{++} \times \mathcal{M}_{d,p}(\mathbb{R}) \times \mathcal{M}_{d_0,p}(\mathbb{R})\) is identifiable if and only if the \(n\times d\) and \(n \times d_0\) matrices of covariates \(\mathbf{X}\) and \(\mathbf{X}^0\) are both full rank.

Standard mean-field

Variational approximation breaks all dependencies

\[\begin{equation*} p(\mathbf{Z}_i, \mathbf{W}_i | \mathbf{Y}_i) \approx q_{\psi_i}(\mathbf{Z}_i, \mathbf{W}_i) \triangleq q_{\psi_i}(\mathbf{Z}_i) q_{\psi_i}(\mathbf{W}_i) = \otimes_{j=1}^p q_{\psi_i}(Z_{ij}) q_{\psi_i}(W_{ij}) \end{equation*}\]

with Gaussian and Bernoulli distributions for \(Z_{ij}\) and \(W_{ij}\), then

\[\begin{equation*} q_{\psi_i}(\mathbf{Z}_i, \mathbf{W}_i) = \otimes_{j=1}^p \mathcal N\left( M_{ij}, S_{ij}^2\right) \mathcal B\left(\rho_{ij} \right) \end{equation*}\]

Variational lower bound

Let \(\theta = (\mathbf{B}, \mathbf{B}^0, \mathbf{\Sigma})\) and \(\psi= (\mathbf{M}, \mathbf{S}, \mathbf{R})\), then

\[\begin{align*} J(\theta, \psi) & = \log p_\theta(\mathbf{Y}) - KL(q_\psi(.) \,\|\, p_\theta(. \,|\, \mathbf{Y})) \\ & = \mathbb{E}_{q_{\psi}} \log p_\theta(\mathbf{Z}, \mathbf{W}, \mathbf{Y}) - \mathbb{E}_{q_{\psi}} \log q_\psi(\mathbf{Z}, \mathbf{W}) \\ & = \mathbb{E}_{q_{\psi}} \log p_\theta (\mathbf{Y} | \mathbf{Z}, \mathbf{W}) + \mathbb{E}_{q_{\psi}} \log p_\theta (\mathbf{Z}) + \mathbb{E}_{q_{\psi}} \log p_\theta (\mathbf{W}) \\ & \qquad - \mathbb{E}_{q_{\psi}} \log q_{\psi} (\mathbf{Z}) - \mathbb{E}_{q_{\psi}} \log q_{\psi} (\mathbf{W}) \\ \end{align*}\]

Property: concave in each element of \(\theta, \psi\).

Sparse regularization

Recall that \(\theta = (\mathbf{B}, \mathbf{B}^0, \boldsymbol{\Omega} = \boldsymbol{\Sigma}^{-1})\). Sparsity controls the number of parameters:

\[\begin{equation*} \arg\min_{\theta, \psi} \; -J(\theta, \psi) + \lambda_1 \| \mathbf{B} \|_1 + \lambda_2 \| \boldsymbol{\Omega} \|_1 \left( + \lambda_1 \| \mathbf{B}^0 \|_1 \right) \end{equation*}\]

Alternate optimization

  • (Stochastic) Gradient-descent on \(\mathbf{B}^0, \mathbf{M}, \mathbf{S}\)
  • Closed-form for posterior probabilities \(\mathbf{R}\)
  • Inverse covariance \(\mathbf{\Omega}\)
    • if \(\lambda_2=0\), \(\hat{\mathbf{\Sigma}} = n^{-1} \left[ (\mathbf{M} - \mathbf{XB})^\top(\mathbf{M} - \mathbf{XB}) + \bar{\mathbf{S}}^2 \right]\)
    • if \(\lambda_2 > 0\), \(\ell_1\) penalized MLE ( \(\rightsquigarrow\) Graphical-Lasso with \(\hat{\mathbf{\Sigma}}\) as input)
  • PLN regression coefficients \(\mathbf{B}\) (closed forms sketched below)
    • if \(\lambda_1=0\), \(\hat{\mathbf{B}} = [\mathbf{X}^\top \mathbf{X}]^{-1} \mathbf{X}^\top \mathbf{M}\)
    • if \(\lambda_1 > 0\), vectorize and solve an \(\ell_1\)-penalized least-squares problem
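A sketch of the two closed forms (case \(\lambda_1 = \lambda_2 = 0\)), assuming \(\mathbf{M}\) and \(\mathbf{S}\) are the \(n \times p\) matrices of variational means and standard deviations, \(\mathbf{X}\) the design matrix, and \(\bar{\mathbf{S}}^2\) the diagonal matrix of column sums of \(S_{ij}^2\):

# Closed-form M-step updates when lambda1 = lambda2 = 0
B_hat     <- solve(t(X) %*% X, t(X) %*% M)       # least squares on the variational means
resid     <- M - X %*% B_hat
Sigma_hat <- (t(resid) %*% resid + diag(colSums(S^2))) / nrow(M)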

Initialize with univariate zero-inflated Poisson regression models.

Enhancing variational approximation (1)

Two paths of improvement, breaking fewer dependencies between the latent variables:

\[p(\mathbf{Z}_i, \mathbf{W}_i | \mathbf{Y}_i) \approx q(\mathbf{Z}_i, \mathbf{W}_i) \triangleq \left\{\begin{array}{l} \prod_j q(W_{ij} | Z_{ij}) q(Z_{ij}) \\[1ex] \prod_j q(Z_{ij} | W_{ij}) q(W_{ij}) \\ \end{array}\right.\]

The \(W|Z,Y\) path

One can show that

\[ W_{ij}\,|\, Y_{ij},Z_{ij} \sim \mathcal B \left( \frac{\pi_{ij}}{\pi_{ij} + (1-\pi_{ij})\exp(-e^{\,o_{ij} + Z_{ij}})}\right) \boldsymbol 1_{\{Y_{ij} = 0\}} \]

Sadly, the resulting ELBO involves the intractable entropy term \(\tilde{\mathbb{E}}[\log q_{\psi}(\mathbf{W} | \mathbf{Z})]\)

\(\rightsquigarrow\) requires computing \(\tilde{\mathbb{E}}\left[ -\frac{\log(1+\exp(-U))}{1+\exp(-U)} \right]\) for arbitrary univariate Gaussians \(U\)

Enhancing variational approximation (2)

The \(Z|W,Y\) path

Since \(W_{ij}\) only takes two values, the dependence between \(Z_{ij}\) and \(W_{ij}\) can easily be highlighted:

\[Z_{ij} | W_{ij}, Y_{ij} = \left(Z_{ij}|Y_{ij}, W_{ij} = 1 \right)^{ W_{ij}}\left(Z_{ij}|Y_{ij}, W_{ij} = 0 \right)^{1- W_{ij}}.\] Then \(p(Z_{ij}\,|\, Y_{ij}, W_{ij} = 1) = p(Z_{ij} \,|\, W_{ij} = 1) = p(Z_{ij})\): given \(W_{ij} = 1\), \(Y_{ij} = 0\) is deterministic so it carries no information on \(Z_{ij}\), and \(Z_{ij}\) is independent of \(W_{ij}\).

\(\rightsquigarrow\) Only an approximation of \(Z_{ij} | Y_{ij}, W_{ij} = 0\) is needed.

More accurate variational approximation

\[\begin{aligned} q_{\psi_i}(\boldsymbol Z_i, \boldsymbol W_i) & = q_{\psi_i}(\boldsymbol Z_i | \boldsymbol W_i) q_{\psi_i}(\boldsymbol W_i) \\ & = \otimes_{j = 1}^p \mathcal{N}(\boldsymbol x_i^\top \mathbf{B}_j, \Sigma_{jj})^{W_{ij}} \mathcal{N}(M_{ij}, S_{ij}^2)^{1-W_{ij}}, \quad W_{ij} \sim^\text{indep} \mathcal{B}\left(\rho_{ij}\right)\end{aligned}.\]

Drawback

We lose the closed forms in the M step of the VEM for \(\hat{\mathbf{B}}\) and \(\hat{\boldsymbol{\Sigma}}\) under the corresponding ELBO…

Additional refinement

Optimization using the analytic law of \(W_{ij} \,|\, Y_{ij}\)

Proposition 2 (Distribution of \(W_{ij}| Y_{ij}\)) \[W_{ij} | Y_{ij} \sim \mathcal{B}\left(\frac{\pi_{ij}}{ \varphi\left(\mathbf{x}_i^\top \boldsymbol B_j, \Sigma_{jj}\right) \left(1 - \pi_{ij}\right) + \pi_{ij}}\right) \boldsymbol 1_{\{Y_{ij} = 0\}}\]

with \(\varphi(\mu,\sigma^2) = \mathbb E \left[ \exp(-X)\right], ~ X \sim \mathcal L \mathcal N \left( \mu, \sigma^2\right)\).

Approximation of \(\varphi\)

The function \(\varphi\) is intractable but an approximation (Rojas-Nandayapa 2008) can be computed:

\[\varphi(\mu, \sigma^2)\approx \tilde \varphi(\mu, \sigma^2)= \frac{\exp \left(-\frac{L^2\left(\sigma^2 e^\mu\right)+2 L\left(\sigma^2 e^\mu\right)}{2 \sigma^2}\right)}{\sqrt{1+L\left(\sigma^2 e^\mu\right)}},\] where \(L(\cdot)\) is the (principal branch of the) Lambert \(W\) function, i.e. \(x = L(z) \Leftrightarrow z = x e^{x}\) for \(z > 0\).
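A sketch of \(\tilde\varphi\), using lambertW0() from the CRAN package lamW (an assumption; any Lambert-\(W\) implementation works), with a quick Monte-Carlo sanity check:

# Rojas-Nandayapa approximation of phi(mu, sigma2) = E[exp(-X)], X ~ LogNormal(mu, sigma2)
phi_tilde <- function(mu, sigma2) {
  L <- lamW::lambertW0(sigma2 * exp(mu))   # L solves L * exp(L) = sigma2 * exp(mu)
  exp(-(L^2 + 2 * L) / (2 * sigma2)) / sqrt(1 + L)
}
set.seed(3)
mean(exp(-rlnorm(1e6, meanlog = 1, sdlog = sqrt(0.5))))  # Monte-Carlo estimate of phi(1, 0.5)
phi_tilde(1, 0.5)                                        # closed-form approximation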

Fit various PLN models

Assess environmental effects


PLN           <- PLN(Abundance ~ 1             + offset(log(Offset)), data = microcosm)
PLN_site      <- PLN(Abundance ~ 0 + site      + offset(log(Offset)), data = microcosm)
PLN_time      <- PLN(Abundance ~ 0 + time      + offset(log(Offset)), data = microcosm)
PLN_site_time <- PLN(Abundance ~ 0 + site_time + offset(log(Offset)), data = microcosm)
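
The selection criteria below can then be gathered from the fitted objects; a sketch assuming, as in PLNmodels, that each fit exposes $nb_param, $loglik, $BIC and $ICL:

# Assemble the model-selection table from the four fits
fits <- list("PLN" = PLN, "PLN site" = PLN_site,
             "PLN time" = PLN_time, "PLN site * time" = PLN_site_time)
t(sapply(fits, function(f) c(nb_param = f$nb_param, loglik = f$loglik,
                             BIC = f$BIC, ICL = f$ICL)))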


Model selection criteria

model             nb_param     loglik        BIC        ICL
PLN                  33929  -219757.4  -334775.4  -839241.5
PLN site             34706  -217339.0  -334991.0  -842809.6
PLN time             34706  -217971.7  -335623.7  -838857.6
PLN site * time      37555  -213477.7  -340787.7  -845913.2

Fit various ZI-PLN models


Playing with the ZI component

Try various ways of modeling the probability of being zero:

ZI            <- ZIPLN(Abundance ~ 1 + offset(log(Offset))           , data = microcosm)
ZI_site       <- ZIPLN(Abundance ~ 1 + offset(log(Offset)) | 0 + site,  data = microcosm)
ZI_time       <- ZIPLN(Abundance ~ 1 + offset(log(Offset)) | 0 + time,  data = microcosm)
ZI_site_time  <- ZIPLN(Abundance ~ 1 + offset(log(Offset)) | 0 + site_time,  data = microcosm)


Include covariates both in PLN and ZI components

ZI_site_PLN_site <- ZIPLN(Abundance ~ 0 + site      + offset(log(Offset)) | 0 + site,  data = microcosm)
ZI_time_PLN_time <- ZIPLN(Abundance ~ 0 + time      + offset(log(Offset)) | 0 + time,  data = microcosm)
ZI_PLN_site_time <- ZIPLN(Abundance ~ 0 + site_time + offset(log(Offset)) | 0 + site_time, data = microcosm)

Other combinations are possible (but less effective).

Locate the best model

Model selection criteria
(for each criterion, the two best values were highlighted)

model                    nb_param     loglik        BIC        ICL
PLN                         33929  -219757.4  -334775.4  -839241.5
PLN site                    34706  -217339.0  -334991.0  -842809.6
PLN time                    34706  -217971.7  -335623.7  -838857.6
PLN site * time             37555  -213477.7  -340787.7  -845913.2
ZI                          33930  -224396.1  -339417.4  -595755.9
ZI site                     34188  -210095.5  -325991.5  -533817.0
ZI time                     34188  -216834.5  -332730.4  -565127.5
ZI site * time              34188  -204605.6  -320501.6  -542733.7
ZI site PLN site            35742  -206675.4  -327839.4  -512224.7
ZI time PLN time            35742  -211357.2  -332521.2  -527883.0
ZI and PLN site * time      41440  -192811.7  -333291.7  -503239.1

\(\rightsquigarrow\) OK, let us keep the model with site and time, with main and interaction effects, in both the ZI and PLN components.

Model fits

ZI-PLN

Model fits

PLN seems to do just as well…

Fits of zeros

But zeros are not well predicted

ZI-PLN: latent layer

Latent means of the PLN component (w/wo covariate effects)

PLN: latent layer

Same for standard PLN 😊

ZI PLN

Clustering of the latent means vs probability of zero inflation

ZI PLN

Latent means vs probability of zero inflation

Residual covariance

Reordered by hierarchical clustering with complete linkage

Network analysis

Sparse reconstruction + SBM

# Sparse network reconstruction: l1 penalty on Omega, warm-started from the
# previously fitted ZI-PLN model
network <- ZIPLN(Abundance ~ 0 + site_time + offset(log(Offset)) | 0 + site_time, data = microcosm,
                 control = ZIPLN_param(penalty = 0.2, inception = ZI_PLN_site_time))
# Extract the support of the inferred graph, then cluster it with a stochastic block model
adjacency_mat <- plot(network, plot = FALSE, type = "support", output = "corrplot") %>% as.matrix()
mySBM <- sbm::estimateSimpleSBM(adjacency_mat, dimLabels = "ASV", estimOptions = list(plot = FALSE))

Cluster: means vs covariance

With respect to taxonomy

Conclusion

Take-home message

  • PLN = generic model for multivariate count data analysis
  • Flexible modeling of the covariance structure; allows for covariates
  • Efficient variational EM algorithms
  • More extensions to come (e.g. Kalman-filter-like formulation)

https://computo.sfds.asso.fr, a journal promoting reproducible research in ML and stat.

References

Aitchison, J., and C. H. Ho. 1989. “The Multivariate Poisson-Log Normal Distribution.” Biometrika 76 (4): 643–53.
Baey, Charlotte, Maud Delattre, Estelle Kuhn, Jean-Benoist Leger, and Sarah Lemler. 2023. “Efficient Preconditioned Stochastic Gradient Descent for Estimation in Latent Variable Models.” In International Conference on Machine Learning, 1430–53. PMLR.
Batardière, Bastien, Julien Chiquet, François Gindraud, and Mahendra Mariadassou. 2024. “Zero-Inflation in the Multivariate Poisson Lognormal Family.” arXiv Preprint. https://arxiv.org/abs/2405.14711.
Blei, David M, Alp Kucukelbir, and Jon D McAuliffe. 2017. “Variational Inference: A Review for Statisticians.” Journal of the American Statistical Association 112 (518): 859–77.
Chiquet, Julien, Mahendra Mariadassou, and Stéphane Robin. 2018. “Variational Inference for Probabilistic Poisson PCA.” The Annals of Applied Statistics 12: 2674–98. http://dx.doi.org/10.1214/18-AOAS1177.
———. 2019. “Variational Inference for Sparse Network Reconstruction from Count Data.” In Proceedings of the 19th International Conference on Machine Learning (ICML 2019).
———. 2021. “The Poisson-Lognormal Model as a Versatile Framework for the Joint Analysis of Species Abundances.” Frontiers in Ecology and Evolution 9. https://doi.org/10.3389/fevo.2021.588292.
Johnson, Steven G. 2011. The NLopt Nonlinear-Optimization Package. http://ab-initio.mit.edu/nlopt.
Mariadassou, Mahendra, Laurent X Nouvel, Fabienne Constant, Diego P Morgavi, Lucie Rault, Sarah Barbey, Emmanuelle Helloin, et al. 2023. “Microbiota Members from Body Sites of Dairy Cows Are Largely Shared Within Individual Hosts Throughout Lactation but Sharing Is Limited in the Herd.” Animal Microbiome 5 (1): 32.
Paszke, Adam, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. 2017. “Automatic Differentiation in Pytorch.”
Riedmiller, Martin, and Heinrich Braun. 1993. “A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm.” In IEEE International Conference on Neural Networks, 586–91. IEEE.
Rojas-Nandayapa, Leonardo. 2008. “Risk Probabilities: Asymptotics and Simulation.”
Stoehr, Julien, and Stephane S Robin. 2024. “Composite Likelihood Inference for the Poisson Log-Normal Model.” arXiv Preprint arXiv:2402.14390.
Svanberg, Krister. 2002. “A Class of Globally Convergent Optimization Methods Based on Conservative Convex Separable Approximations.” SIAM Journal on Optimization 12 (2): 555–73.
Tikhonov, Gleb, Øystein H Opedal, Nerea Abrego, Aleksi Lehikoinen, Melinda MJ de Jonge, Jari Oksanen, and Otso Ovaskainen. 2020. “Joint Species Distribution Modelling with the r-Package Hmsc.” Methods in Ecology and Evolution 11 (3): 442–47.
Van der Vaart, Aad W. 2000. Asymptotic Statistics. Vol. 3. Cambridge University Press.
Westling, Ted, and Tyler H. McCormick. 2015. “Beyond Prediction: A Framework for Inference with Variational Approximations in Mixture Models.” https://arxiv.org/abs/1510.08151.