Skip to contents

Preliminaries

This vignette illustrates the ZIPLN and ZIPLNnetwork functions and the methods accompanying the R6 classes ZIPLNfit and ZIPLNnetworkfamily. These models extend PLN and PLNnetwork (see the corresponding vignettes) to count data with an excess of zeros that a (sparse) Poisson lognormal model alone cannot capture.

Requirements

The packages required for the analysis are PLNmodels plus some others for data manipulation and representation:

Data set

We illustrate zero-inflation with the microcosm data set (Mariadassou et al. 2023): the evolution of the microbiota of lactating cows, sampled in 4 body sites (oral, nasal, vaginal, milk) at several time points, with p = 259 taxa, n = 880 samples and an average of 90% zeroes. To keep this vignette fast to compile while remaining representative, we restrict the analysis to the 30 most abundant taxa, which still display substantial zero-inflation:

data(microcosm)
most_abundant <- order(colSums(microcosm$Abundance), decreasing = TRUE)[1:30]
microcosm$Abundance <- microcosm$Abundance[, most_abundant]
mean(microcosm$Abundance == 0)
## [1] 0.8018182

Mathematical background

The zero-inflated PLN model (ZIPLN) (Batardière et al. 2025) combines the Poisson lognormal model (Aitchison and Ho 1989) – see the PLN vignette – with a zero-inflation mechanism: each count YijY_{ij} is either a structural zero (with probability πij\pi_{ij}) or drawn from the usual PLN generative process: latent space 𝐙i𝒩(𝛍,𝚺)zero-inflation Wij(πij)independent of 𝐙iobservation space Yij|Zij,Wij=0𝒫(exp{Zij}),Yij|Wij=1=0\begin{equation} \begin{array}{rcl} \text{latent space } & \mathbf{Z}_i \sim \mathcal{N}\left({\boldsymbol\mu},\boldsymbol\Sigma\right) & \\ \text{zero-inflation } & W_{ij} \sim \mathcal{B}(\pi_{ij}) & \text{independent of } \mathbf{Z}_i\\ \text{observation space } & Y_{ij} \,|\, Z_{ij}, W_{ij} = 0 \sim \mathcal{P}\left(\exp\{Z_{ij}\}\right), & Y_{ij} \,|\, W_{ij} = 1 \;=\; 0 \end{array} \end{equation}

Just like PLN, 𝛍{\boldsymbol\mu} generalizes to 𝐨i+𝐱i𝐁\mathbf{o}_i + \mathbf{x}_i^\top\mathbf{B} to account for offsets and covariates. The zero-inflation probabilities πij\pi_{ij} can be parameterized in several ways, controlled by the zi argument of ZIPLN():

  • "single": a single π\pi shared by all entries (default).
  • "row": one πi\pi_i per sample.
  • "col": one πj\pi_j per species.
  • covariates: logit(πij)=𝐱0,i𝐁0,j\text{logit}(\pi_{ij}) = \mathbf{x}_{0,i}^\top\mathbf{B}_{0,j}, specified with the formula syntax Y ~ PLN effect | ZI effect (see below).

ZIPLNnetwork further adds a sparsity penalty on 𝛀=𝚺1\boldsymbol\Omega = \boldsymbol\Sigma^{-1}, exactly as PLNnetwork does for PLN (see the PLNnetwork vignette and Chiquet et al. (2019)), so that both the excess of zeros and the residual dependency structure between taxa are accounted for. See Tous et al. (2025) for an application to species association networks from count data with structural zeros.

Analysis of microcosm with ZIPLN

Comparing zero-inflation parameterizations

We fit a plain PLN model and the four ZIPLN parameterizations, all with the sampling site as a covariate for the count part, to check how much accounting for zero-inflation improves the fit:

myPLN     <- PLN(Abundance ~ 0 + site + offset(log(Offset)), data = microcosm)
zi_single <- ZIPLN(Abundance ~ 0 + site + offset(log(Offset)), data = microcosm)
zi_row    <- ZIPLN(Abundance ~ 0 + site + offset(log(Offset)), data = microcosm, zi = "row")
zi_col    <- ZIPLN(Abundance ~ 0 + site + offset(log(Offset)), data = microcosm, zi = "col")
zi_site   <- ZIPLN(Abundance ~ 0 + site + offset(log(Offset)) | 0 + site, data = microcosm)
data.frame(
  model  = c("PLN", "ZIPLN (single)", "ZIPLN (row)", "ZIPLN (col)", "ZIPLN (site-dependent)"),
  loglik = c(myPLN$loglik, zi_single$loglik, zi_row$loglik, zi_col$loglik, zi_site$loglik),
  BIC    = c(myPLN$BIC, zi_single$BIC, zi_row$BIC, zi_col$BIC, zi_site$BIC),
  ICL    = c(myPLN$ICL, zi_single$ICL, zi_row$ICL, zi_col$ICL, zi_site$ICL)
) %>% knitr::kable(digits = 1)
model loglik BIC ICL
PLN -51716.2 -53699.3 -100954.0
ZIPLN (single) -48847.5 -50834.0 -89988.5
ZIPLN (row) -46875.9 -51842.2 -89899.3
ZIPLN (col) -48147.4 -50232.3 -87715.3
ZIPLN (site-dependent) -47647.5 -50037.4 -85962.4

Accounting for zero-inflation brings a large improvement over plain PLN, and letting the zero-inflation probability depend on the body site (zi_site) gives the best BIC and ICL among the parameterizations considered – not surprising given how different the four body sites are.

Inspecting the fit

As for PLN, fitted values stay close to the observed counts:

data.frame(
  fitted   = as.vector(fitted(zi_site)),
  observed = as.vector(microcosm$Abundance)
) %>%
  ggplot(aes(x = observed, y = fitted)) +
    geom_point(size = .5, alpha = .25) +
    scale_x_log10(limits = c(1, NA)) +
    scale_y_log10(limits = c(1, NA)) +
    theme_bw() + ggplot2::annotation_logticks()
fitted value vs. observation

fitted value vs. observation

The site-dependent model also lets us recover, for each species, an estimated zero-inflation probability per body site (model_par$Pi), revealing substantial heterogeneity:

one_obs_per_site <- !duplicated(microcosm$site)
pi_hat <- zi_site$model_par$Pi[one_obs_per_site, ]
rownames(pi_hat) <- as.character(microcosm$site[one_obs_per_site])

data.frame(site = rep(rownames(pi_hat), ncol(pi_hat)), zi_prob = as.vector(pi_hat)) %>%
  ggplot(aes(x = site, y = zi_prob)) +
    geom_boxplot() + theme_bw() +
    labs(x = "Body site", y = "Zero-inflation probability (per species)")
Estimated zero-inflation probability by body site, across the 30 species

Estimated zero-inflation probability by body site, across the 30 species

Coefficient matrices for the count and zero-inflation parts can be inspected with coefficients() – here both components share the same site design, so rows of 𝐁\mathbf{B} (count) and 𝐁0\mathbf{B}_0 (zero-inflation) line up one-to-one with body sites:

pheatmap::pheatmap(coefficients(zi_site, "zero"), cluster_rows = FALSE)
Estimated regression coefficients of the zero-inflation component ($B_0$)

Estimated regression coefficients of the zero-inflation component (B0B_0)

pheatmap::pheatmap(coefficients(zi_site, "count"), cluster_rows = FALSE)
Estimated regression coefficients of the count component ($B$)

Estimated regression coefficients of the count component (BB)

Sparse network inference with ZIPLNnetwork

ZIPLNnetwork adjusts the model for a series of penalties controlling the number of edges in the network, just like PLNnetwork, but on top of a zero-inflation component. We keep the default zi = "single" here for simplicity and focus on the network. The default backend = "builtin" is used (it now finds a consistently better ELBO than "nlopt" here, at the cost of being slower); we lower min_ratio to explore a wider, sparser range of the penalty path:

zi_models <- ZIPLNnetwork(Abundance ~ site + offset(log(Offset)), data = microcosm, control = ZIPLNnetwork_param(min_ratio = 0.01))

As for PLNnetwork, a diagnostic plot and the evolution of the criteria along the path are available:

plot(zi_models, "diagnostic")
diagnostic of the ZIPLNnetwork fits

diagnostic of the ZIPLNnetwork fits

plot(zi_models)
evolution of model selection criteria

evolution of model selection criteria

We select a network with getBestModel() and represent it:

zi_net <- getBestModel(zi_models, "EBIC")
plot(zi_net)
sparse residual network between the 30 most abundant taxa

sparse residual network between the 30 most abundant taxa

As for PLNnetwork, a more robust (but more computationally intensive) stability_selection()-based choice of penalty is available – we do not run it here to keep this vignette fast, see the PLNnetwork vignette for an example.

References

Aitchison, J., and C. H. Ho. 1989. “The Multivariate Poisson-Log Normal Distribution.” Biometrika 76 (4): 643–53.
Batardière, Bastien, Julien Chiquet, François Gindraud, and Mahendra Mariadassou. 2025. “Zero-Inflation in the Multivariate Poisson Lognormal Family.” Statistics and Computing 35. https://doi.org/10.1007/s11222-025-10729-0.
Chiquet, Julien, Stephane Robin, and Mahendra Mariadassou. 2019. “Variational Inference for Sparse Network Reconstruction from Count Data.” In Proceedings of the 36th International Conference on Machine Learning, edited by Kamalika Chaudhuri and Ruslan Salakhutdinov, vol. 97. Proceedings of Machine Learning Research. PMLR. http://proceedings.mlr.press/v97/chiquet19a.html.
Mariadassou, Mahendra, Lucie X Nouvel, Fabienne Constant, et al. 2023. “Microbiota Members from Body Sites of Dairy Cows Are Largely Shared Within Individual Hosts Throughout Lactation but Sharing Is Limited in the Herd.” Animal Microbiome 5 (32). https://doi.org/10.1186/s42523-023-00252-w.
Tous, Jeanne, Julien Chiquet, Amy E. Deacon, Ada Fontrodona-Eslava, Douglas F. Fraser, and Anne E. Magurran. 2025. “A JSDM with Zero-Inflation to Improve Inference of Association Networks from Count Community Data with Structural Zeros.” https://doi.org/10.1101/2025.07.24.666553.