class: center, middle, inverse, title-slide .title[ # Poisson lognormal models for count data ] .subtitle[ ## Focus on some probabilistic factor models: Poisson LDA and Poisson PCA ] .author[ ### J. Chiquet, M. Mariadassou, S. Robin, B. Batardière + others
MIA Paris-Saclay, AgroParisTech, INRAE, Sorbonne Université
Last update March 29, 2023
] .date[ ###
https://pln-team.github.io/PLNmodels
]

---
class: inverse, middle

# Outline

1. .large[**Framework** of multivariate Poisson lognormal models] <br/>
2. .large[**PLN estimation** with variational inference] <br/>
3. .large[**Focus** on factor analysis with PLN (LDA, PCA)] <br/>
4. .large[**Illustration** on a scRNA dataset]

---
class: inverse, center, middle

# Multivariate Poisson lognormal models <br/> .small[Motivations, Framework]

<!-- STATISTICAL MODEL -->

---
# Generic form of data sets

Routinely gathered in ecology/microbiology/genomics

### Data tables

- .important[Abundances]: read counts of species/transcripts `\(j\)` in sample `\(i\)`
- .important[Covariates]: value of environmental variable `\(k\)` in sample `\(i\)`
- .important[Offsets]: sampling effort for species/transcripts `\(j\)` in sample `\(i\)`

### Need frameworks to model _dependencies between counts_

- understand **environmental effects** <br/> `\(\rightsquigarrow\)` explanatory models (multivariate regression, classification)
- exhibit **patterns of diversity** <br/> `\(\rightsquigarrow\)` summarize the information (clustering, dimension reduction)
- understand **between-species interactions** <br/> `\(\rightsquigarrow\)` 'network' inference (variable/covariance selection)
- correct for technical and **confounding effects** <br/> `\(\rightsquigarrow\)` account for covariates and sampling effort

---
# Illustration in genomics

## scRNA data set

The dataset `scRNA` contains the counts of the 500 most varying transcripts in mixtures of 5 cell lines in human liver (obtained with the standard 10x scRNAseq Chromium protocol). We subsample 500 random cells and keep the 200 most varying genes.

```r
data(scRNA); set.seed(1234)
train_set <- sample.int(nrow(scRNA), 500)
test_set  <- setdiff(1:nrow(scRNA), train_set)
scRNA_train <- scRNA[train_set, ]
scRNA_test  <- scRNA[test_set, ]
scRNA_train$counts <- scRNA_train$counts[, 1:200]
scRNA_test$counts  <- scRNA_test$counts[, 1:200]
```

### Covariates

- `cell_line`: the cell line of the current row (among 5)
- `total_counts`: total number of reads for that cell

---
# Table of Counts

.pull-left[
Matrix view (log-transform + total-counts normalization)
![log-scaled counts](slides_files/figure-html/glance counts-1.png)
]
.pull-right[
Histogram (log-transform + total-counts normalization)
![](slides_files/figure-html/glimpse Abundance-1.png)
]

---
# Models for multivariate count data

### If we were in a Gaussian world...

The .important[general linear model] [MKB79] would be appropriate! For each sample `\(i = 1,\dots,n\)`,

`$$\underbrace{\mathbf{Y}_i}_{\text{abundances}} = \underbrace{\mathbf{x}_i^\top \mathbf{B}}_{\text{covariates}} + \underbrace{\mathbf{o}_i}_{\text{sampling effort}} + \boldsymbol\varepsilon_i, \quad \boldsymbol\varepsilon_i \sim \mathcal{N}(\mathbf{0}_p, \underbrace{\boldsymbol\Sigma}_{\text{between-species dependencies}})$$`

null covariance `\(\Leftrightarrow\)` independence `\(\rightsquigarrow\)` uncorrelated species/transcripts do not interact

.content-box-red[This model gives birth to Principal Component Analysis, Discriminant Analysis, Gaussian Graphical Models, Gaussian Mixture models and many others `\(\dots\)`]

### With count data...
There is no generic model for multivariate counts

- Data transformation (log, `\(\sqrt{}\)`): quick and dirty
- Non-Gaussian multivariate distributions [Ino+17]: do not scale to data dimension yet
- .important[Latent variable models]: interactions occur in a latent (unobserved) layer

---
# The Poisson Lognormal model (PLN)

The PLN model [AH89] is a .important[multivariate generalized linear model], where

- the counts `\(\mathbf{Y}_i\)` are the response variables
- the main effect is due to a linear combination of the covariates `\(\mathbf{x}_i\)`
- a vector of offsets `\(\mathbf{o}_i\)` can be specified for each sample.

.content-box-red[
$$ \mathbf{Y}_i | \mathbf{Z}_i \sim \mathcal{P}\left(\exp\{\mathbf{Z}_i\}\right), \qquad \mathbf{Z}_i \sim \mathcal{N}({\mathbf{o}_i + \mathbf{x}_i^\top\mathbf{B}},\boldsymbol\Sigma) $$
]

.pull-left[The unknown parameters are
- `\(\mathbf{B}\)`, the regression parameters
- `\(\boldsymbol\Sigma\)`, the variance-covariance matrix
]
.pull-right[Stacking all individuals together,
- `\(\mathbf{Y}\)` is the `\(n\times p\)` matrix of counts
- `\(\mathbf{X}\)` is the `\(n\times d\)` design matrix
- `\(\mathbf{O}\)` is the `\(n\times p\)` matrix of offsets
]

### Properties: .small[.important[over-dispersion, arbitrary-signed covariances]]

- mean: `\(\mathbb{E}(Y_{ij}) = \exp \left( o_{ij} + \mathbf{x}_i^\top {\mathbf{B}}_{\cdot j} + \sigma_{jj}/2\right) > 0\)`
- variance: `\(\mathbb{V}(Y_{ij}) = \mathbb{E}(Y_{ij}) + \mathbb{E}(Y_{ij})^2 \left( e^{\sigma_{jj}} - 1 \right) > \mathbb{E}(Y_{ij})\)`
- covariance: `\(\mathrm{Cov}(Y_{ij}, Y_{ik}) = \mathbb{E}(Y_{ij}) \mathbb{E}(Y_{ik}) \left( e^{\sigma_{jk}} - 1 \right)\)`

---
class: inverse, center, middle

# Variational inference for standard PLN<br/> .small[Optimisation]

<!-- VARIATIONAL INFERENCE -->

---
# PLN Inference: general ingredients

Estimate `\(\theta = (\mathbf{B}, \boldsymbol\Sigma)\)` and predict the `\(\mathbf{Z}_i\)`, while the marginal likelihood of the model is

`$$p_\theta(\mathbf{Y}_i) = \int_{\mathbb{R}^p} \prod_{j=1}^p p_\theta(Y_{ij} | Z_{ij}) \, p_\theta(\mathbf{Z}_i) \mathrm{d}\mathbf{Z}_i$$`

### Expectation-Maximization

With `\(\mathcal{H}(p) = -\mathbb{E}_p(\log(p))\)` the entropy of `\(p\)`,

`$$\log p_\theta(\mathbf{Y}) = \mathbb{E}_{p_\theta(\mathbf{Z}\,|\,\mathbf{Y})} [\log p_\theta(\mathbf{Y}, \mathbf{Z})] + \mathcal{H}[p_\theta(\mathbf{Z}\,|\,\mathbf{Y})]$$`

EM requires evaluating (some moments of) `\(p_\theta(\mathbf{Z} \,|\, \mathbf{Y})\)`, but there is no closed form!

### Variational approximation [WJ08]

Use a proxy `\(q_\psi\)` of `\(p_\theta(\mathbf{Z}\,|\,\mathbf{Y})\)` minimizing a divergence in a class `\(\mathcal{Q}\)` .small[(e.g., the Kullback-Leibler divergence)]

`$$q_\psi(\mathbf{Z})^\star = \arg\min_{q\in\mathcal{Q}} D\left(q(\mathbf{Z}), p(\mathbf{Z} \,|\, \mathbf{Y})\right), \quad \text{e.g. } D(\cdot,\cdot) = KL(\cdot,\cdot) = \mathbb{E}_{q}\left[\log \frac{q(\mathbf{Z})}{p(\mathbf{Z}\,|\,\mathbf{Y})}\right].$$`
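To make the intractability concrete, here is a toy Monte Carlo estimate of `\(p_\theta(\mathbf{Y}_i)\)` in R. All sizes and parameter values below are illustrative assumptions (no offsets or covariates): the estimator is easy to write but its variance blows up as `\(p\)` grows, hence the variational proxy.

```r
## Toy Monte Carlo estimate of the marginal likelihood (illustrative values)
library(MASS)
set.seed(1)
p  <- 3                            # toy number of species
mu <- rep(0, p); Sigma <- diag(p)  # assumed latent Gaussian parameters
Y_i <- c(2, 0, 5)                  # one observed count vector
Z <- mvrnorm(1e4, mu, Sigma)       # draws Z_i ~ N(mu, Sigma)
## average the conditional Poisson likelihood over the latent draws
loglik <- rowSums(dpois(matrix(Y_i, 1e4, p, byrow = TRUE), exp(Z), log = TRUE))
mean(exp(loglik))                  # MC estimate of p_theta(Y_i)
```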
---
# Inference: specific ingredients

Consider `\(\mathcal{Q}\)` the class of diagonal multivariate Gaussian distributions:

`$$\Big\{q: \, q(\mathbf{Z}) = \prod_i q_i(\mathbf{Z}_i), \, q_i(\mathbf{Z}_i) = \mathcal{N}\left(\mathbf{Z}_i; \mathbf{m}_i, \mathrm{diag}(\mathbf{s}_i \circ \mathbf{s}_i)\right), \boldsymbol\psi_i = (\mathbf{m}_i, \mathbf{s}_i) \in\mathbb{R}^p\times\mathbb{R}^p \Big\}$$`

and maximize the ELBO (Evidence Lower BOund)

`$$\begin{aligned}J(\theta, \psi) & = \log p_\theta(\mathbf{Y}) - KL[q_\psi (\mathbf{Z}) || p_\theta(\mathbf{Z} | \mathbf{Y})] \\ & = \mathbb{E}_{\psi} [\log p_\theta(\mathbf{Y}, \mathbf{Z})] + \mathcal{H}[q_\psi(\mathbf{Z})] \\ & = \frac{1}{n} \sum_{i = 1}^n J_i(\theta, \psi_i),\end{aligned}$$`

where, letting `\(\mathbf{A}_i = \mathbb{E}_{q_i}[\exp(Z_i)] = \exp\left( \mathbf{o}_i + \mathbf{m}_i + \frac{1}{2}\mathbf{s}^2_i\right)\)`, we have

`$$\begin{aligned} J_i(\theta, \psi_i) = &\mathbf{Y}_i^\intercal(\mathbf{o}_i + \mathbf{m}_i) - \left(\mathbf{A}_i - \frac{1}{2}\log(\mathbf{s}^2_i)\right) ^\intercal \mathbf{1}_p + \frac{1}{2} \log|{\boldsymbol\Omega}| \\ & - \frac{1}{2}(\mathbf{m}_i - \mathbf{B}^\top\mathbf{x}_i)^\intercal \boldsymbol{\Omega} (\mathbf{m}_i - \mathbf{B}^\top\mathbf{x}_i) - \frac{1}{2} \mathrm{diag}(\boldsymbol\Omega)^\intercal\mathbf{s}^2_i + \mathrm{cst} \end{aligned}$$`

---
# Resulting Variational EM

.important[Alternate] until convergence between

- VE step: optimize `\(\boldsymbol{\psi}\)` (can be written individually)
`$$\psi_i^{(h)} = \arg \max_{\psi_i} J_{i}(\theta^{(h)}, \psi_i) \left( = \arg\min_{q_i} KL[q_i(\mathbf{Z}_i) \,||\, p_{\theta^{(h)}}(\mathbf{Z}_i\,|\,\mathbf{Y}_i)] \right)$$`
- M step: optimize `\(\theta\)`
`$$\theta^{(h)} = \arg\max_\theta \frac{1}{n}\sum_{i=1}^{n}J_{i}(\theta, \psi_i^{(h)})$$`

We end up with an `\(M\)`-estimator:

`$$\hat{\theta}^{\text{ve}} = \arg\max_{\theta} \left( \frac{1}{n}\sum_{i=1}^{n} \sup_{\psi_i} J_i(\theta, \psi_i) \right) = \arg\max_{\theta} \underbrace{\left(\frac{1}{n}\sum_{i=1}^{n} \bar{J}_i(\theta) \right)}_{\bar{J}_n(\theta)}$$`

`\(\hat{\theta}^{\text{ve}}\)` is asymptotically unbiased. .important[Variance can be estimated with bootstrap/jackknife.]

---
# Optimization of simple PLN models

### Property of the objective function

The ELBO `\(J(\theta, \psi)\)` is bi-concave, i.e.

- concave wrt `\(\psi = (\mathbf{M}, \mathbf{S})\)` for given `\(\theta\)`
- concave wrt `\(\theta = (\boldsymbol\Sigma, \mathbf{B})\)` for given `\(\psi\)`

but .important[not jointly concave] in general.

### M-step: analytical

`$$\hat{{\mathbf{B}}} = \left(\mathbf{X}^\top \mathbf{X}\right)^{-1} \mathbf{X}^\top \mathbf{M}, \quad \hat{{\boldsymbol\Sigma}} = \frac{1}{n} \left(\mathbf{M}-\mathbf{X}\hat{{\mathbf{B}}}\right)^\top \left(\mathbf{M}-\mathbf{X}\hat{\mathbf{B}}\right) + \frac{1}{n} \mathrm{diag}(\mathbf{1}^\intercal\mathbf{S}^2)$$`

### VE-step: gradient ascent

`$$\frac{\partial J(\psi)}{\partial \mathbf{M}} = \left(\mathbf{Y} - \mathbf{A} - (\mathbf{M} - \mathbf{X}{\mathbf{B}}) \mathbf{\Omega}\right), \qquad \frac{\partial J(\psi)}{\partial \mathbf{S}} = \frac{1}{\mathbf{S}} - \mathbf{S} \circ \mathbf{A} - \mathbf{S} \, \mathrm{D}_{\boldsymbol\Omega}.$$`

`\(\rightsquigarrow\)` Same routine for other PLN variants.
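As a sanity check, the two key quantities above fit in a few lines of R. This is a hedged sketch with illustrative function names (not the `PLNmodels` internals); `Omega` stands for `\(\boldsymbol\Sigma^{-1}\)` and constants are dropped from `\(J_i\)`.

```r
## Hedged sketch: per-sample ELBO J_i and analytical M-step (not PLNmodels code)
elbo_i <- function(Y_i, o_i, x_i, B, Omega, m_i, s2_i) {
  A_i  <- exp(o_i + m_i + s2_i / 2)            # A_i = E_q[exp(Z_i)]
  mu_i <- drop(crossprod(B, x_i))              # B^T x_i
  sum(Y_i * (o_i + m_i)) - sum(A_i) + 0.5 * sum(log(s2_i)) +
    0.5 * as.numeric(determinant(Omega)$modulus) -        # (1/2) log|Omega|
    0.5 * drop(t(m_i - mu_i) %*% Omega %*% (m_i - mu_i)) -
    0.5 * sum(diag(Omega) * s2_i)              # constants dropped
}

m_step <- function(X, M, S2) {
  B_hat <- solve(crossprod(X), crossprod(X, M))           # (X^T X)^{-1} X^T M
  R     <- M - X %*% B_hat
  Sigma_hat <- crossprod(R) / nrow(X) + diag(colSums(S2)) / nrow(X)
  list(B = B_hat, Sigma = Sigma_hat)
}
```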
---
# Implementations

#### Medium-scale problems (R/C++ package)

- **algorithm**: conservative convex separable approximations [Sva02]
- **implementation**: `NLopt` nonlinear-optimization library [Joh11] <br/>
`\(\rightsquigarrow\)` Up to thousands of sites ( `\(n \approx 1000s\)` ), hundreds of species ( `\(p\approx 100s\)` )

#### Large-scale problems (Python/PyTorch module)

- **algorithm**: `Rprop` (gradient sign + adaptive variable-specific update) [RB93]
- **implementation**: `torch` with GPU auto-differentiation [FL22; Pas+17] <br/>
`\(\rightsquigarrow\)` Up to `\(n \approx 100,000\)` and `\(p\approx 10,000s\)`

<div class="figure" style="text-align: center">
<img src="figs/final_n=10000,p=2000.png" alt="n = 10,000, p = 2,000, d = 2 (running time: 1 min 40s)" width="30%" />
<p class="caption">n = 10,000, p = 2,000, d = 2 (running time: 1 min 40s)</p>
</div>

---
# Availability

### Help and documentation

- GitHub group <https://github.com/pln-team>
- PLNmodels website <https://pln-team.github.io/PLNmodels>

### R/C++ package `PLNmodels`

Last stable release on CRAN, development version available on GitHub.

```r
install.packages("PLNmodels")
remotes::install_github("PLN-team/PLNmodels@dev")
```

```r
library(PLNmodels)
packageVersion("PLNmodels")
```

```
## [1] '1.0.1'
```

### Python module `pyPLNmodels`

A PyTorch implementation is available on PyPI: [https://pypi.org/project/pyPLNmodels/](https://pypi.org/project/pyPLNmodels/)

---
# PLN with offsets and covariates

- Cell-line effect is captured in the regression coefficients (groupwise or common mean)
- Spurious effects regarding interactions between genes (full or diagonal covariance)

## Offset: .small[modeling sampling effort]

The predefined offset uses the total sum of reads.

```r
M01_scRNA <- PLN(counts ~ 1 + offset(log(total_counts)), scRNA_train)
M02_scRNA <- PLN(counts ~ 1 + offset(log(total_counts)), scRNA_train,
                 control = PLN_param(covariance = "diagonal"))
```

## Covariates: .small[cell-line effect ('ANOVA'-like)]

The `cell_line` is a natural candidate for explaining a large part of the variance.
```r
M11_scRNA <- PLN(counts ~ 0 + cell_line + offset(log(total_counts)), scRNA_train)
M12_scRNA <- PLN(counts ~ 0 + cell_line + offset(log(total_counts)), scRNA_train,
                 control = PLN_param(covariance = "diagonal"))
```

---
# PLN with offsets and covariates (2)

There is a clear gain in introducing the `cell_line` covariate in the model:

```r
rbind(M01 = M01_scRNA$criteria, M02 = M02_scRNA$criteria,
      M11 = M11_scRNA$criteria, M12 = M12_scRNA$criteria) %>%
  knitr::kable(format = "html")
```

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> </th>
   <th style="text-align:right;"> nb_param </th>
   <th style="text-align:right;"> loglik </th>
   <th style="text-align:right;"> BIC </th>
   <th style="text-align:right;"> ICL </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> M01 </td>
   <td style="text-align:right;"> 20300 </td>
   <td style="text-align:right;"> -203318.4 </td>
   <td style="text-align:right;"> -266396.7 </td>
   <td style="text-align:right;"> -294420.7 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> M02 </td>
   <td style="text-align:right;"> 400 </td>
   <td style="text-align:right;"> -258421.9 </td>
   <td style="text-align:right;"> -259664.8 </td>
   <td style="text-align:right;"> -342792.9 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> M11 </td>
   <td style="text-align:right;"> 21100 </td>
   <td style="text-align:right;"> -199111.2 </td>
   <td style="text-align:right;"> -264675.3 </td>
   <td style="text-align:right;"> -289661.4 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> M12 </td>
   <td style="text-align:right;"> 1200 </td>
   <td style="text-align:right;"> -216457.5 </td>
   <td style="text-align:right;"> -220186.3 </td>
   <td style="text-align:right;"> -275057.6 </td>
  </tr>
</tbody>
</table>

---
# PLN with offsets and covariates (3)

Looking at the coefficients `\(\mathbf{B}\)` associated with `cell_line` brings additional insights:

<img src="slides_files/figure-html/scRNA matrix plot-1.png" style="display: block; margin: auto;" />

---
# Simple torch example in `R`

```r
data("oaks")
system.time(myPLN_torch <- PLN(Abundance ~ 1 + offset(log(Offset)), data = oaks,
                               control = PLN_param(backend = "torch", trace = 0)))
```

```
##    user  system elapsed 
##   3.378   0.438   3.067
```

```r
system.time(myPLN_nlopt <- PLN(Abundance ~ 1 + offset(log(Offset)), data = oaks,
                               control = PLN_param(backend = "nlopt", trace = 0)))
```

```
##    user  system elapsed 
##   7.466   0.613   6.900
```

```r
myPLN_torch$loglik
```

```
## [1] -32224.15
```

```r
myPLN_nlopt$loglik
```

```
## [1] -32097.67
```

---
class: inverse, center, middle

# Probabilistic factor models in PLN<br/> .small[Poisson LDA and Poisson PCA]

---
# Natural extensions of PLN

### Various tasks of multivariate analysis

- .important[Dimension Reduction]: rank constraint on the covariance matrix `\(\boldsymbol\Sigma\)`.
`$$\mathbf{Z}_i \sim \mathcal{N}(\boldsymbol\mu, \boldsymbol\Sigma = \mathbf{C}\mathbf{C}^\top), \quad \mathbf{C} \in \mathcal{M}_{pk} \text{ with orthogonal columns}.$$`
- .important[Classification]: maximize separation between groups with means
`$$\mathbf{Z}_i \sim \mathcal{N}(\sum_k {\boldsymbol\mu}_k \mathbf{1}_{\{i\in k\}}, \boldsymbol\Sigma), \quad \text{for known memberships}.$$`
- .important[Clustering]: mixture model in the latent space
`$$\mathbf{Z}_i \mid i \in k \sim \mathcal{N}(\boldsymbol\mu_k, \boldsymbol\Sigma_k), \quad \text{for unknown memberships}.$$`
- .important[Network inference]: sparsity constraint on the inverse covariance.
`$$\mathbf{Z}_i \sim \mathcal{N}(\boldsymbol\mu, \boldsymbol\Sigma = \boldsymbol\Omega^{-1}), \quad \|\boldsymbol\Omega \|_1 < c.$$`
- .important[Variable selection]: sparsity constraint on the regression coefficients
`$$\mathbf{Z}_i \sim \mathcal{N}(\mathbf{x}_i^\top\mathbf{B}, \boldsymbol\Sigma), \quad \|\mathbf{B} \|_1 < c.$$`

---
# Natural extensions of PLN

### Tasks seen in Factor Analysis

- .important[Dimension Reduction]: rank constraint on the covariance matrix `\(\boldsymbol\Sigma\)`. .important[PCA]
`$$\mathbf{Z}_i \sim \mathcal{N}(\boldsymbol\mu, \boldsymbol\Sigma = \mathbf{C}\mathbf{C}^\top), \quad \mathbf{C} \in \mathcal{M}_{pk} \text{ with orthogonal columns}.$$`
- .important[Classification]: maximize separation between groups with means .important[LDA]
`$$\mathbf{Z}_i \sim \mathcal{N}(\sum_k {\boldsymbol\mu}_k \mathbf{1}_{\{i\in k\}}, \boldsymbol\Sigma), \quad \text{for known memberships}.$$`

---
# Background: Gaussian LDA

### Model

Assume the samples are distributed in `\(K\)` groups and denote by

- `\(\mathbf{G}\)` the group membership matrix
- `\(\mathbf{U} = [{\boldsymbol\mu}_1, \dots, {\boldsymbol\mu}_K]\)` the matrix of group-specific means

The model is

`$$\mathbf{Z}_i \sim \mathcal{N}(\mathbf{G}_i^\top \mathbf{U}, {\boldsymbol\Sigma})$$`

`\(\rightsquigarrow\)` Aim of LDA: find the linear combination(s) `\(\mathbf{Z} u\)` maximizing separation between groups

### Solution

Find the first eigenvectors of `\(\mathbf{\Sigma}_w^{-1} \mathbf{\Sigma}_b\)` where

- `\(\mathbf{\Sigma}_w\)` is the _within_-group variance matrix, i.e. the unbiased estimate of `\({\boldsymbol\Sigma}\)`
- `\(\mathbf{\Sigma}_b\)` is the _between_-group variance matrix, estimated from `\(\mathbf{U}\)`

---
# Poisson lognormal LDA (1)

Similar to standard PLN with

- `\(\mathbf{X} \rightarrow (\mathbf{X}, \mathbf{G})\)`
- `\(\mathbf{B} \rightarrow (\mathbf{B}, \mathbf{U})\)`

### Inference

- Use .important[variational inference] to estimate `\((\mathbf{B}, \mathbf{U})\)`, `\(\mathbf{\Sigma}\)` and `\(\mathbf{Z}_i\)`
- Compute `\(\hat{\mathbf{\Sigma}}_b\)` as
`$$\hat{\mathbf{\Sigma}}_b = \frac{1}{K-1} \sum_k n_k (\hat{{\boldsymbol\mu}}_k - \hat{{\boldsymbol\mu}}_\bullet) (\hat{{\boldsymbol\mu}}_k - \hat{{\boldsymbol\mu}}_\bullet)^\intercal$$`
- Compute the first `\(K-1\)` eigenvectors of `\(\hat{{\boldsymbol\Sigma}}^{-1} \hat{\mathbf{\Sigma}}_b = \mathbf{P}^\top \Lambda \mathbf{P}\)`

---
# Poisson lognormal LDA (2)

### Graphical representation

Mimic Gaussian LDA:

- Center the estimated latent positions `\(\tilde{\mathbf{Z}}\)`
- Compute the estimated coordinates along the discriminant axes
`$$\tilde{\mathbf{Z}}^{LDA} = \tilde{\mathbf{Z}} \mathbf{P} \Lambda^{1/2}$$`

### Prediction

For each group `\(k\)`

- Assume that the new sample `\(\mathbf{Y}_{\text{new}}\)` comes from group `\(k\)`
- Compute the (variational) _likelihood_ `\(p_k = \mathbb{P}(\mathbf{Y}_{\text{new}} | \hat{{\boldsymbol\Sigma}}, \hat{\mathbf{\Sigma}}_b, \hat{{\boldsymbol\mu}}_k)\)`
- Compute the posterior probability `\(\pi_k \propto \frac{n_k p_k}{n}\)`

`\(\rightsquigarrow\)` Assign to the group with highest `\(\pi_k\)`

---
# Discriminant Analysis (scRNA, 1)

Use the `cell_line` variable for grouping (`grouping` is a factor giving the group of each sample)

```r
myLDA_cell_line <- PLNLDA(counts ~ 1 + offset(log(total_counts)),
                          grouping = cell_line, data = scRNA_train)
```

.pull-left[
![](slides_files/figure-html/plot-lda-oaks1-1.png)
]
.pull-right[
![](slides_files/figure-html/plot-lda-oaks2-1.png)
]

---
# Discriminant Analysis (scRNA, 2)

Consider now a diagonal covariance.
```r
myLDA_cell_line_diag <- PLNLDA(counts ~ 1 + offset(log(total_counts)),
                               grouping = cell_line, data = scRNA_train,
                               control = PLN_param(covariance = "diagonal"))
```

.pull-left[
![](slides_files/figure-html/plot-lda-diag-oaks1-1.png)
]
.pull-right[
![](slides_files/figure-html/plot-lda-diag-oaks2-1.png)
]

---
# Discriminant Analysis (scRNA, 3)

We can predict the cell line of new data: let us try either diagonal or fully parametrized covariance.

```r
pred_cell_line      <- predict(myLDA_cell_line     , newdata = scRNA_test, type = "response")
pred_cell_line_diag <- predict(myLDA_cell_line_diag, newdata = scRNA_test, type = "response")
```

The ARI on the test set is impressive .footnote[The problem should be easy, though...]

```r
aricode::ARI(pred_cell_line, scRNA_test$cell_line)
```

```
## [1] 0.9770576
```

```r
aricode::ARI(pred_cell_line_diag, scRNA_test$cell_line)
```

```
## [1] 0.9683799
```

---
# Discriminant Analysis (scRNA, 4)

Let us explore the discriminant groups:

```r
heatmap(exp(myLDA_cell_line$group_means))
```

<img src="slides_files/figure-html/LDA-heatmap-factors-1.png" style="display: block; margin: auto;" />

Indeed, some groups of genes characterize the cell lines well.

---
# Poisson Lognormal PCA

### Model

`$$\begin{array}{rcl} \mathbf{Z}_i & \sim^\text{iid} \mathcal{N}_p({\boldsymbol 0}_p, {\boldsymbol\Sigma}), & \text{rank}({\boldsymbol\Sigma}) = k \ll p \\ \mathbf{Y}_i \,|\, \mathbf{Z}_i & \sim \mathcal{P}(\exp\{\mathbf{O}_i + \mathbf{X}_i \mathbf{B} + \mathbf{Z}_i\}) \end{array}$$`

Recall that `\(\text{rank}({\boldsymbol\Sigma}) = k \quad \Leftrightarrow \quad \exists\, \mathbf{C}\ (p \times k): \boldsymbol\Sigma = \mathbf{C} \mathbf{C}^\intercal\)`.

### Estimation

Variational inference

$$\text{maximize } J({\boldsymbol\beta}, {\boldsymbol\psi}) $$

`\(\rightsquigarrow\)` Still bi-concave in `\({\boldsymbol\beta} = (\mathbf{C}, \mathbf{B})\)` and `\({\boldsymbol\psi} = (\mathbf{M}, \mathbf{S})\)`

---
# Model selection and Visualization

### Number of components/rank `\(k\)` needs to be chosen.

`\(\log p_{\hat{\boldsymbol\beta}}(\mathbf{Y})\)` intractable: use the variational "likelihood" `\(J(\hat{\boldsymbol\beta}, \hat{\boldsymbol\psi})\)`

- BIC `\(\rightsquigarrow\)` `\(\text{vBIC}_k = J(\hat{\boldsymbol\beta}, \tilde{p}) - \frac12 p (d + k) \log(n)\)`
- ICL `\(\rightsquigarrow\)` `\(\text{vICL}_k = \text{vBIC}_k - \mathcal{H}(\tilde{p})\)`

$$ \hat{k} = \arg\max_k \text{vBIC}_k \qquad \text{or} \qquad \hat{k} = \arg\max_k \text{vICL}_k $$

### Visualization

- Gaussian PCA: optimal subspaces nested when `\(k\)` increases.
- PLN-pPCA: non-nested subspaces.

For the selected dimension `\(\hat{k}\)`:

- Compute the estimated latent positions `\(\mathbb{E}_q(\mathbf{Z}_i) = \mathbf{M} \hat{\mathbf{C}}^\top\)`
- Perform PCA on `\(\mathbf{M} \hat{\mathbf{C}}^\top\)`
- Display the result in any dimension `\(k \leq \hat{k}\)`

---
# PCA: Goodness of fit

.important[pPCA:] Cumulative sum of the eigenvalues = \% of variance preserved on the first `\(k\)` components.

### PLN-pPCA

Deviance-based criterion.

- Compute `\(\tilde{\mathbf{Z}}^{(k)} = \mathbf{O} + \mathbf{X} \hat{\mathbf{B}} + \mathbf{M}^{(k)} \left(\hat{\mathbf{C}}^{(k)}\right)^\top\)`
- Take `\(\lambda_{ij}^{(k)} = \exp\left(\tilde{Z}_{ij}^{(k)}\right)\)`
- Define `\(\lambda_{ij}^{\min} = \exp( \tilde{Z}_{ij}^0)\)` and `\(\lambda_{ij}^{\max} = Y_{ij}\)`
- Compute the Poisson log-likelihood `\(\ell_k = \log \mathbb{P}(\mathbf{Y}; \lambda^{(k)})\)`

### Pseudo R²

`$$R_k^2 = \frac{\ell_k - \ell_{\min}}{\ell_{\max} - \ell_{\min}}$$`
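A minimal R sketch of this pseudo-`\(R^2\)`, assuming the intensity matrices have been formed as above (`Y`, `Z_k` and `Z_0` are illustrative names for the counts, the rank-`\(k\)` fit `\(\tilde{\mathbf{Z}}^{(k)}\)` and the null fit `\(\tilde{\mathbf{Z}}^0\)`):

```r
## Hedged sketch of the deviance-based pseudo-R^2 (illustrative, not PLNmodels code)
pseudo_R2 <- function(Y, Z_k, Z_0) {
  poisson_loglik <- function(lambda) sum(dpois(Y, lambda, log = TRUE))
  l_k   <- poisson_loglik(exp(Z_k))  # rank-k model intensities
  l_min <- poisson_loglik(exp(Z_0))  # null model (offsets + covariates only)
  l_max <- poisson_loglik(Y)         # saturated model: lambda_ij = Y_ij
  (l_k - l_min) / (l_max - l_min)
}
```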
---
# A PCA analysis of the scRNA data set (1)

```r
PCA_scRNA <- PLNPCA(counts ~ 1 + offset(log(total_counts)),
                    data = scRNA_train, ranks = c(1, 2, seq(5, 40, 5)))
```

### Model Selection

<img src="slides_files/figure-html/PCA offset vizu cell-line-1.png" style="display: block; margin: auto;" />

---
# A PCA analysis of the scRNA data set (2)

### Biplot

Individual and factor map (40 most contributing genes)

<small>
<img src="slides_files/figure-html/PCA offset vizu tree-1.png" style="display: block; margin: auto;" />
</small>

---
class: inverse, center, middle

# On-going works

---
# Optimisation, Algorithms

With .important[Bastien Batardière, Joon Kwon, Julien Stoehr]

## Exact Maximization

Direct likelihood optimization (SGD with importance sampling)

## Optimization

Optimization guarantees coupling adaptive SGD + variance reduction

---
# PLN LDA: Ongoing Extension

With .important[Nicolas Jouvin]

### Quadratic Discriminant Analysis

- Relax the assumption of common covariance between groups
- Test linear vs quadratic discriminant analysis

### Regularized Discriminant Analysis

- Ridge-like regularized covariance
- Sparse covariance
- Block-wise covariance

`\(\rightsquigarrow\)` better assess the structure of the sub-populations

---
# PLN PCA: Ongoing Extension

With .important[Nicolas Jouvin]

## Mixture of PLN PCA

- Assume several sub-populations in the latent space
- Each sub-population is represented by a different linear combination of the initial variables
- In the PLN setup, several variational approximations are possible

## Functional PCA

PhD of Barbara Bricout (S. Robin, S. Donnet)

Features are time points

---
# Temporal/Spatial dependencies

With .important[Stéphane Robin, Mahendra Mariadassou]

## 'In-row' dependency structure

Consider, e.g., a multivariate Gaussian auto-regressive process on the latent variable `\(\mathbf{Z}\)`

`$$\mathbf{Z}_1 \sim \mathcal{N}(0, {\boldsymbol\Sigma}), \quad \mathbf{Z}_t = \mathbf{A} \mathbf{Z}_{t-1} + {\boldsymbol\varepsilon}_t, \quad ({\boldsymbol\varepsilon}_t) \text{ iid } \sim \mathcal{N}(0, \Delta^{-1})$$`

The counts are described by `\((\mathbf{Y}_t) \in \mathbb{N}^p\)` with time `\(t\geq 0\)`

`$$(\mathbf{Y}_t) \text{ indep. given } \mathbf{Z}_t: \quad \mathbf{Y}_t \sim \mathcal{P}\left(\exp\left\{ \mathbf{x}_t^\top \mathbf{B} + \mathbf{Z}_t\right\}\right)$$`

`\(\rightsquigarrow\)` Camille Mondon's MSc

## Inference

Based on a Kalman filter in the latent layer + variational approximation

---
# A zero-inflated PLN

With .important[Bastien Batardière, Mahendra Mariadassou]

### Motivations

- account for a large amount of zeros, e.g.
with single-cell data,
- try to separate "true" zeros from "technical"/dropout zeros

### The Model

Use two latent vectors `\(\mathbf{W}_i\)` and `\(\mathbf{Z}_i\)` to model the excess of zeroes and the dependence structure

`$$\begin{aligned} \mathbf{Z}_i & \sim \mathcal{N}({\mathbf{o}_i + \mathbf{x}_i^\top\mathbf{B}},\boldsymbol\Sigma) \\ W_{ij} & \sim \mathcal{B}(\text{logit}^{-1}(\mathbf{x}_i^\top{\mathbf{B}}_j^0)) \\ Y_{ij} \, | \, W_{ij}, Z_{ij} & \sim W_{ij}\delta_0 + (1-W_{ij}) \mathcal{P}\left(\exp\{Z_{ij}\}\right) \end{aligned}$$`

The unknown parameters are

- `\(\mathbf{B}\)`, the regression parameters (from the PLN component)
- `\(\mathbf{B}^0\)`, the regression parameters (from the Bernoulli component)
- `\(\boldsymbol\Sigma\)`, the variance-covariance matrix

`\(\rightsquigarrow\)` ZI-PLN is a mixture of PLN and Bernoulli distributions with shared covariates.

---
# ZI-PLN Inference

### Variational approximation 1

`\begin{equation*} p(\mathbf{Z}_i, \mathbf{W}_i \,|\, \mathbf{Y}_i) \approx q_{\psi}(\mathbf{Z}_i, \mathbf{W}_i) \approx q_{\psi_1}(\mathbf{Z}_i) q_{\psi_2}(\mathbf{W}_i) \end{equation*}`

with

`\begin{equation*} q_{\psi_1}(\mathbf{Z}_i) = \mathcal{N}(\mathbf{Z}_i; \mathbf{m}_{i}, \mathrm{diag}(\mathbf{s}_{i} \circ \mathbf{s}_{i})), \qquad q_{\psi_2}(\mathbf{W}_i) = \otimes_{j=1}^p \mathcal{B}(W_{ij}, \pi_{ij}) \end{equation*}`

.important[Tested, works _partially_]: the variational approximation is too rough

### Variational approximation 2

`\begin{equation*} p(\mathbf{Z}_i, \mathbf{W}_i \,|\, \mathbf{Y}_i) \approx q_{\psi}(\mathbf{Z}_i, \mathbf{W}_i) \approx q_{\psi_1}(\mathbf{Z}_i|\mathbf{W}_i) q_{\psi_2}(\mathbf{W}_i) \end{equation*}`

`\begin{equation*} q_{\psi_1}(\mathbf{Z}_i | \mathbf{W}_i) = \text{mixture of Gaussians}, \quad q_{\psi_2}(\mathbf{W}_i) = \otimes_{j=1}^p \mathcal{B}(W_{ij}, \pi_{ij}) \end{equation*}`

---
# Conclusion

## Summary

- PLN = generic model for multivariate count data analysis
- Flexible modeling of the covariance structure, allows for covariates
- Efficient V-EM algorithm

## Advertisement

[https://computo.sfds.asso.fr](https://computo.sfds.asso.fr), a journal promoting reproducible research in ML and statistics.

---
# References

<small>

Aitchison, J. and C. Ho (1989). "The multivariate Poisson-log normal distribution". In: _Biometrika_ 76.4, pp. 643-653.

Chiquet, J., M. Mariadassou, and S. Robin (2018). "Variational inference for probabilistic Poisson PCA". In: _The Annals of Applied Statistics_ 12, pp. 2674-2698. URL: [http://dx.doi.org/10.1214/18-AOAS1177](http://dx.doi.org/10.1214/18-AOAS1177).

Chiquet, J., M. Mariadassou, and S. Robin (2019). "Variational inference for sparse network reconstruction from count data". In: _Proceedings of the 19th International Conference on Machine Learning (ICML 2019)_.

Chiquet, J., M. Mariadassou, and S. Robin (2021). "The Poisson-Lognormal Model as a Versatile Framework for the Joint Analysis of Species Abundances". In: _Frontiers in Ecology and Evolution_ 9. DOI: [10.3389/fevo.2021.588292](https://doi.org/10.3389%2Ffevo.2021.588292).

Facon, B., A. Hafsi, M. C. de la Masselière, et al. (2021). "Joint species distributions reveal the combined effects of host plants, abiotic factors and species competition as drivers of species abundances in fruit flies". In: _Ecology Letters_. DOI: [10.1111/ele.13825](https://doi.org/10.1111%2Fele.13825).

Falbel, D. and J. Luraschi (2022). _torch: Tensors and Neural Networks with 'GPU' Acceleration_. https://torch.mlverse.org/docs, https://github.com/mlverse/torch.

Inouye, D. I., E. Yang, G. I.
Allen, et al. (2017). "A review of multivariate distributions for count data derived from the Poisson distribution". In: _Wiley Interdisciplinary Reviews: Computational Statistics_ 9.3.

Johnson, S. G. (2011). _The NLopt nonlinear-optimization package_. URL: [http://ab-initio.mit.edu/nlopt](http://ab-initio.mit.edu/nlopt).

Lejal, E., J. Chiquet, J. Aubert, et al. (2021). "Temporal patterns in Ixodes ricinus microbial communities: an insight into tick-borne microbe interactions". In: _Microbiome_ 9.153. DOI: [10.1186/s40168-021-01051-8](https://doi.org/10.1186%2Fs40168-021-01051-8).

Mardia, K., J. Kent, and J. Bibby (1979). _Multivariate Analysis_. Academic Press.

Paszke, A., S. Gross, S. Chintala, et al. (2017). "Automatic differentiation in PyTorch".

Riedmiller, M. and H. Braun (1993). "A direct adaptive method for faster backpropagation learning: The RPROP algorithm". In: _IEEE International Conference on Neural Networks_. IEEE, pp. 586-591.

Svanberg, K. (2002). "A class of globally convergent optimization methods based on conservative convex separable approximations". In: _SIAM Journal on Optimization_ 12.2, pp. 555-573.

Wainwright, M. J. and M. I. Jordan (2008). "Graphical Models, Exponential Families, and Variational Inference". In: _Found. Trends Mach. Learn._ 1.1-2, pp. 1-305.

</small>