All the models in PLNmodels are fitted by variational inference: each one maximises a tractable lower bound of an otherwise intractable log-likelihood. This document describes, model by model, which optimizer is used and, for the home-made "builtin" backend, the mathematical principles behind it.
1 The common variational framework
Every model shares the same Poisson log-normal core. For a count matrix , offsets and covariates , the latent Gaussian vectors drive Poisson emissions .
Because the marginal likelihood integrates out , inference maximises a variational lower bound (the ELBO) obtained with a factorised Gaussian approximation . Writing , and , the bound reads
The variational parameters are the means and variances (one pair per observation and per latent dimension); the model parameters are the regression coefficients and the covariance structure (or ). Optimization alternates a VE-step (update at fixed model parameters) with an M-step (update ), until the ELBO stabilises. The different models change what structure is imposed on (or on the emission), not this general scheme. Throughout, the variances are optimised in the unconstrained parametrisation .
2 Which optimizer for which model
Three backends are available, selected through the backend argument of each *_param() helper:
-
nloptβ a first-order solver from the NLOPT library (the CCSAQ conservative convex separable approximation). Robust and cheap per iteration but needs many iterations. -
builtinβ the home-made second-order optimizer implemented directly in the package (the subject of Β§3). It uses the analytic Hessian of the bound. -
torchβ automatic differentiation with first-order optimizers (Adam, RPROP, β¦). Experimental; typically reaches a lower bound than the other two.
| Model | Imposed structure | Backends | Default | builtin optimizer |
|---|---|---|---|---|
PLN |
full / diagonal / spherical / fixed / genetic | nlopt, builtin, torch | nlopt | joint coordinate Newton (Β§3.1) |
PLNLDA |
PLN with per-group means | (via PLN) |
nlopt | as PLN
|
PLNPCA |
rank- | nlopt, builtin, torch | nlopt | profiled trust-region Newton (Β§3.4) |
PLNnetwork |
sparse () | builtin, nlopt | builtin | Newton VE + graphical-lasso M-step (Β§3.2) |
PLNmixture |
mixture of PLN | builtin, nlopt, torch | builtin | outer mixture-EM + per-component builtin (Β§3.5) |
ZIPLN |
zero-inflation + structure | builtin, nlopt | builtin | joint Newton on (Β§3.3) |
ZIPLNnetwork |
zero-inflation + sparse | builtin, nlopt | builtin | as ZIPLN + graphical-lasso |
The default is nlopt for PLN/PLNPCA (conservative, well tested) and builtin for the penalised / structured models, where the second-order steps pay off most.
3 The builtin backend: mathematical principles
The common thread is that the VE-step is concave and separable, which makes it an ideal target for Newtonβs method with a cheaply invertible Hessian; the M-steps are then either closed-form or a small convex problem.
3.1 PLN and its covariance variants β joint coordinate Newton
At fixed the bound decouples across observations, and for each observation the pair is updated by a Newton step that is joint in but diagonal across the coordinates. Per entry the gradient is
and the Hessian block, using , is
so the joint step is obtained by inverting a matrix in closed form (compute_joint_step_MS in src/covariance_pln.h). Solving together captures the coupling between the mean and its variance and converges in a handful of passes. The cross term is exactly this coupling; a separate fixed point for would ignore it.
The M-step is analytic. The regression coefficients are profiled out, , and the covariance is the empirical variational covariance, , projected onto the requested structure (full, diagonal, spherical, fixed, or the genetic solved by 1-D bisection on ). All variants share one templated implementation (CovTraitsBase), each supplying only its covariance-specific primitives.
3.2 PLNnetwork β Newton VE-step + graphical-lasso M-step
PLNnetwork adds an penalty on the precision to recover a sparse conditional-dependence graph. The VE-step is unchanged (Β§3.1 Newton), and the covariance M-step becomes a graphical lasso:
solved efficiently with glassoFast along the penalty path . Successive penalties are warm-started, and the inception model is only partially converged (maxit_ve = 1, inception_niter = 5) so that the latent means do not over-fit the unpenalised optimum before the sparse grid begins.
3.3 ZIPLN β joint Newton on
Zero-inflated PLN augments the model with Bernoulli βexcess-zeroβ indicators; the variational approximation adds posterior probabilities of being an excess zero. The builtin VE-step is again a Newton step, now joint in , with updated inside the C++ solver (a closed-form logistic update given ). The effective Poisson rate becomes , which reuses exactly the PLN covariance machinery with the substitution . The M-step updates the zero-inflation coefficients and the covariance (full/diagonal/spherical/fixed, or sparse via graphical lasso for ZIPLNnetwork).
3.4 PLNPCA β profiled trust-region Newton
PLNPCA constrains to rank , so the latent is with -dimensional scores and loadings . NaΓ―ve block-coordinate ascent stalls on the saddle points of this non-convex landscape; the builtin optimizer avoids them with a profiled, saddle-aware trust-region Newton.
Profiling. The score block is concave and is solved by a per-observation Newton VE-step (a solve for each , plus a fixed point for ). This defines the profiled objective on the loadings alone, By the envelope theorem its gradient is free β it is simply the partial gradient evaluated at the optimal .
Reduced (Schur) Hessian. Writing (the loadings) and (the profiled scores), the Hessian of is the Schur complement It is never formed. The inner block is block-diagonal per observation (, inverted analytically); the cross/outer terms are applied through analytic directional derivatives of the gradient, so a Hessianβvector product costs only a couple of gradient evaluations plus small solves.
Saddle-aware trust region. The reduced Hessian is indefinite (the landscape is saddle-rich), so is maximised with a trust-region method whose subproblem is solved by a preconditioned Steihaug conjugate gradient. The preconditioner is the analytic diagonal of (a Jacobi metric that absorbs the very different scales of and ), and the conjugate gradient follows negative-curvature directions to the trust-region boundary β precisely the directions that let the iterate escape saddles rather than stall at them. This reaches a variational bound at least as high as nlopt across datasets, in one to two orders of magnitude fewer outer iterations.
Large data
On large problems the reduced Hessian is strongly indefinite and is highly non-quadratic for large steps, so the trust region β not the linear solver β bounds progress: the outer iteration keeps improving the bound but converges slowly and is capped by
maxit_out(default 150) rather than by the gradient tolerance. For the best bound on large data, raise it, e.g.PLNPCA_param(backend = "builtin", config_optim = list(maxit_out = 300)).
3.5 PLNmixture β mixture-EM with builtin components
A PLN mixture with components is fitted by a genuine EM loop at the R level: an E-step updates the posterior class probabilities (a soft-max of the component variational bounds plus the log mixing proportions), and an M-step re-fits each component as a weighted PLN β using the builtin (or nlopt) optimizer of Β§3.1 with observation weights β and updates the shared covariate effects. The mixing proportions are the column means of .
4 Summary
- All models maximise the same kind of Poisson log-normal variational bound; the optimizers differ in how they exploit its structure.
- The
builtinbackend is a second-order method throughout: a joint Newton VE-step everywhere, closed-form or graphical-lasso covariance M-steps, and β forPLNPCAβ a profiled, saddle-aware trust-region Newton on the loadings. -
nloptremains the default forPLN/PLNPCAfor its robustness;builtinis the default for the penalised and structured variants, where its second-order steps and better optima matter most.torchis experimental.
