Basic analysis 🍼

Published

March 27, 2025

This guide introduces the basics of multivariate count data analysis using the pyPLNmodels package. For complex datasets, traditional models like Poisson regression and Zero-Inflated Poisson Regression (ZIP) may fail to capture important insights, such as correlations between counts (i.e. variables). The pyPLNmodels package provides two key models to address these challenges:

The PlnPCA model extends the functionality of the Pln model to handle high-dimensional data, though it may slightly compromise parameter estimation accuracy.

1 Statistical Overview

Both models are built on the following assumptions for a given count matrix \(Y\) (Aitchison and Ho 1989):

\[ Y_{ij}| Z_{ij} \sim \mathcal P(\exp(o_{ij} + Z_{ij})), \quad Z_{i}\sim \mathcal N(X_i^{\top} B, \Sigma),\] where the input data includes:

  • \(Y_{ij}\) (endog): the \(j\)-th count for the \(i\)-th observation
  • \(X_i\) (exog): covariates for the \(i\)-th observation (if available)
  • \(o_i\) (offsets): offset for the \(i\)-th observation (if available)

and the model parameters are:

  • \(B\) (coef): a matrix of regression coefficients
  • \(\Sigma\) (covariance): the covariance matrix of the latent variables \(Z_i\)

These models aim to capture the structure of the data through the latent variables \(Z\).

The Pln model assumes \(\Sigma\) has full rank, while the PlnPCA model assumes \(\Sigma\) has a low rank, which must be specified by the user. A lower A lower rank introduces a trade-off, reducing computational complexity but potentially compromising parameter estimation accuracy.

1.1 Purpose

The pyPLNmodels package is designed to:

  • Estimate the parameters \(B\) and \(\Sigma\)
  • Retrieve the latent variables \(Z\) (which typically contains more information than \(Y\))
  • Visualize the latent variables and their relationships

This is achieved using the input count matrix \(Y\), along with optional covariate matrix \(X\) (defaulting to a vector of 1’s) and offsets \(O\) (defaulting to a matrix of 0’s).

2 Importing Data

In this example, we analyze single-cell RNA-seq data provided by the load_scrna function in the package. Each column in the dataset represents a gene, while each row corresponds to a cell (i.e., an individual). Covariates for cell types (labels) are also included. For simplicity, we limit the analysis to 20 variables (dimensions).

from pyPLNmodels import load_scrna
rna = load_scrna(dim=20)
print('Data: ', rna.keys())
Returning scRNA dataset of size (400, 20)
Data:  dict_keys(['endog', 'labels', 'labels_1hot'])

2.1 Data Structure

2.1.1 Count Matrix (endog)

endog = rna["endog"]
print(endog.head())
     FTL  MALAT1   FTH1  TMSB4X   CD74  SPP1   APOE    B2M   ACTB  HLA-DRA  \
0    5.0    39.0   14.0    39.0   11.0   0.0    0.0   88.0   49.0      0.0
1    8.0    52.0   38.0   186.0   17.0   0.0    0.0  152.0  167.0      3.0
2   76.0    29.0   38.0    71.0   37.0   3.0    2.0   21.0    7.0     12.0
3  354.0   151.0  236.0   184.0  287.0   0.0  238.0  255.0   27.0    243.0
4    0.0    55.0   10.0    37.0    8.0   1.0    0.0   23.0   49.0      1.0

   APOC1  MT-CO1  TMSB10  EEF1A1  HLA-DRB1  HLA-DPA1  HLA-DPB1  MT-CO2  LYZ  \
0    0.0    29.0    22.0    34.0       0.0       0.0       1.0    21.0  0.0
1    0.0   109.0    65.0   128.0       2.0       4.0       1.0   172.0  1.0
2    3.0    46.0     7.0    11.0       0.0       7.0      17.0    16.0  0.0
3  196.0   118.0    45.0    98.0      66.0     115.0     160.0    27.0  6.0
4    0.0    11.0    25.0    10.0       1.0       2.0       2.0     6.0  0.0

   RPLP1
0   14.0
1  124.0
2   20.0
3   40.0
4   18.0  

2.1.2 Cell Type

cell_type = rna["labels"]
print('Possible cell types: ', cell_type.unique())
print(cell_type.head())
Possible cell types:  ['T_cells_CD4+' 'Macrophages' 'T_cells_CD8+']
0    T_cells_CD4+
1    T_cells_CD4+
2     Macrophages
3     Macrophages
4    T_cells_CD4+
Name: standard_true_celltype_v5, dtype: object

3 Model Initialization and Fitting

To analyze the mean values for each cell type, we use cell type as a covariate, with Macrophages as the reference category:

from pyPLNmodels import Pln
pln = Pln.from_formula('endog ~ 1 + labels', data=rna).fit()
Setting the offsets to zero.
Fitting a Pln model with full covariance.
Intializing parameters ...
Initialization finished.
Maximum number of iterations (400)  reached in 7.2 seconds.
Last  criterion = 5.4e-06 . Required tolerance = 1e-06
Upper bound on the fitting time:   0%|          | 0/400 [00:00<?, ?it/s]Upper bound on the fitting time:   3%|▎         | 12/400 [00:00<00:03, 117.57it/s]Upper bound on the fitting time:   6%|▌         | 24/400 [00:00<00:03, 100.29it/s]Upper bound on the fitting time:   9%|▉         | 35/400 [00:00<00:03, 95.36it/s] Upper bound on the fitting time:  12%|█▏        | 47/400 [00:00<00:03, 102.52it/s]Upper bound on the fitting time:  14%|█▍        | 58/400 [00:00<00:03, 104.47it/s]Upper bound on the fitting time:  17%|█▋        | 69/400 [00:00<00:03, 106.13it/s]Upper bound on the fitting time:  20%|██        | 81/400 [00:00<00:02, 108.36it/s]Upper bound on the fitting time:  23%|██▎       | 93/400 [00:00<00:02, 109.86it/s]Upper bound on the fitting time:  26%|██▋       | 105/400 [00:00<00:02, 111.39it/s]Upper bound on the fitting time:  29%|██▉       | 117/400 [00:01<00:02, 112.57it/s]Upper bound on the fitting time:  32%|███▏      | 129/400 [00:01<00:02, 113.45it/s]Upper bound on the fitting time:  35%|███▌      | 141/400 [00:01<00:02, 115.26it/s]Upper bound on the fitting time:  38%|███▊      | 153/400 [00:01<00:02, 115.53it/s]Upper bound on the fitting time:  41%|████▏     | 165/400 [00:01<00:02, 114.84it/s]Upper bound on the fitting time:  44%|████▍     | 177/400 [00:01<00:01, 116.13it/s]Upper bound on the fitting time:  48%|████▊     | 190/400 [00:01<00:01, 118.75it/s]Upper bound on the fitting time:  51%|█████     | 203/400 [00:01<00:01, 120.74it/s]Upper bound on the fitting time:  54%|█████▍    | 216/400 [00:01<00:01, 122.81it/s]Upper bound on the fitting time:  57%|█████▋    | 229/400 [00:02<00:01, 123.70it/s]Upper bound on the fitting time:  60%|██████    | 242/400 [00:02<00:01, 122.67it/s]Upper bound on the fitting time:  64%|██████▍   | 255/400 [00:02<00:01, 119.18it/s]Upper bound on the fitting time:  67%|██████▋   | 267/400 [00:02<00:01, 114.22it/s]Upper bound on the fitting time:  70%|██████▉   | 279/400 [00:02<00:01, 114.61it/s]Upper bound on the fitting time:  73%|███████▎  | 291/400 [00:02<00:01, 106.51it/s]Upper bound on the fitting time:  76%|███████▌  | 302/400 [00:02<00:00, 102.25it/s]Upper bound on the fitting time:  78%|███████▊  | 314/400 [00:02<00:00, 106.28it/s]Upper bound on the fitting time:  82%|████████▏ | 326/400 [00:02<00:00, 109.12it/s]Upper bound on the fitting time:  85%|████████▍ | 339/400 [00:03<00:00, 112.56it/s]Upper bound on the fitting time:  88%|████████▊ | 351/400 [00:03<00:00, 114.24it/s]Upper bound on the fitting time:  91%|█████████ | 364/400 [00:03<00:00, 116.26it/s]Upper bound on the fitting time:  94%|█████████▍| 376/400 [00:03<00:00, 114.83it/s]Upper bound on the fitting time:  97%|█████████▋| 388/400 [00:03<00:00, 115.82it/s]Upper bound on the fitting time: 100%|██████████| 400/400 [00:03<00:00, 116.32it/s]Upper bound on the fitting time: 100%|██████████| 400/400 [00:03<00:00, 112.91it/s]

For more details on formula syntax and model initialization, including handling offsets, refer to the dedicated tutorial.

3.1 Model Summary and Insights

After fitting the model, you can print its configuration and key details:

print(pln)
A multivariate Pln with full covariance.
======================================================================
     Loglike   Dimension    Nb param         BIC         AIC         ICL
   -22468.19          20         270  23277.0352  22738.1875    10228.37

======================================================================
* Useful attributes
    .latent_variables .latent_positions .coef .covariance .precision .model_parameters .latent_parameters .optim_details
* Useful methods
    .transform() .show() .predict() .sigma() .projected_latent_variables() .plot_correlation_circle() .biplot() .viz() .pca_pairplot() .plot_expected_vs_true()
* Additional attributes for Pln are:
    None
* Additional methods for Pln are:
    .get_variance_coef() .get_confidence_interval_coef() .summary() .get_coef_p_values() .plot_regression_forest()

To gain deeper insights into the model parameters and the optimization process, use the .show() method:

pln.show()

Monitoring the norm of each parameters allows to know if the model has converged. If it has not converged, one can refit the model with a lower tolerance (tol) and a bigger number iterations than the default (maxiter=400):

pln.fit(maxiter=1000, tol = 0)

3.2 Exploring Latent Variables

The latent variables \(Z\), which capture the underlying structure of the data, are accessible via the latent_variables attribute, or the .transform() method:

Z = pln.latent_variables
Z = pln.transform()
print('Shape of Z:', Z.shape)
Shape of Z: torch.Size([400, 20])

The effect of covariates on the latent variables can be removed by using the remove_exog_effect keyword:

Z_moins_XB = pln.transform(remove_exog_effect=True)

You can visualize these latent variables using the .viz() method:

pln.viz(colors=cell_type)

To visualize the latent positions without the effect of covariates (i.e., (Z - XB)), set the remove_exog_effect parameter to True:

pln.viz(colors=cell_type, remove_exog_effect=True)

Additionally, you can generate a pair plot of the first Principal Components (PCs) of the latent variables:

pln.pca_pairplot(n_components=4, colors=cell_type)

4 Analyzing Covariate Effects

The model provides insights into the effects of covariates. For example, it may reveal that the mean value for Macrophages is higher compared to T_cells_CD4+ and T_cells_CD8+.

4.1 Confidence Intervals and Statistical Significance

To summarize the model, including confidence intervals and p-values, use the summary method:

pln.summary()
Coefficients and p-values per dimension:

Dimension: FTL
Exog Name                Coefficient         P-value
Intercept                   4.405430           1e-16
labels[T.T_cells_CD4+]       -3.238307         3.4e-14
labels[T.T_cells_CD8+]       -3.691690           1e-16

Dimension: MALAT1
Exog Name                Coefficient         P-value
Intercept                   4.494294           1e-16
labels[T.T_cells_CD4+]        0.047518            0.67
labels[T.T_cells_CD8+]       -0.599757         0.00068

Dimension: FTH1
Exog Name                Coefficient         P-value
Intercept                   3.977555           1e-16
labels[T.T_cells_CD4+]       -2.103544           1e-16
labels[T.T_cells_CD8+]       -3.101902           1e-16

Dimension: TMSB4X
Exog Name                Coefficient         P-value
Intercept                   4.323444           1e-16
labels[T.T_cells_CD4+]       -0.844740         3.6e-10
labels[T.T_cells_CD8+]       -1.232170         4.2e-07

Dimension: CD74
Exog Name                Coefficient         P-value
Intercept                   3.900579           1e-16
labels[T.T_cells_CD4+]       -3.254608           1e-16
labels[T.T_cells_CD8+]       -3.284683           1e-16

Dimension: SPP1
Exog Name                Coefficient         P-value
Intercept                   0.200535            0.21
labels[T.T_cells_CD4+]       -4.434213           1e-16
labels[T.T_cells_CD8+]       -5.076484           1e-16

Dimension: APOE
Exog Name                Coefficient         P-value
Intercept                   0.917977         8.2e-11
labels[T.T_cells_CD4+]       -4.270944           1e-16
labels[T.T_cells_CD8+]       -4.561553           1e-16

Dimension: B2M
Exog Name                Coefficient         P-value
Intercept                   3.903286           1e-16
labels[T.T_cells_CD4+]       -0.207227            0.19
labels[T.T_cells_CD8+]       -0.571121           0.011

Dimension: ACTB
Exog Name                Coefficient         P-value
Intercept                   3.477508           1e-16
labels[T.T_cells_CD4+]       -1.058260         1.3e-05
labels[T.T_cells_CD8+]       -1.433466           1e-16

Dimension: HLA-DRA
Exog Name                Coefficient         P-value
Intercept                   3.569291           1e-16
labels[T.T_cells_CD4+]       -4.681176           1e-16
labels[T.T_cells_CD8+]       -4.322145           1e-16

Dimension: APOC1
Exog Name                Coefficient         P-value
Intercept                   0.537658         0.00021
labels[T.T_cells_CD4+]       -4.180346           1e-16
labels[T.T_cells_CD8+]       -4.951382           1e-16

Dimension: MT-CO1
Exog Name                Coefficient         P-value
Intercept                   3.719430           1e-16
labels[T.T_cells_CD4+]       -0.864605           0.046
labels[T.T_cells_CD8+]       -1.318557           1e-16

Dimension: TMSB10
Exog Name                Coefficient         P-value
Intercept                   3.296484           1e-16
labels[T.T_cells_CD4+]       -1.006131           0.033
labels[T.T_cells_CD8+]       -1.902974           1e-16

Dimension: EEF1A1
Exog Name                Coefficient         P-value
Intercept                   2.992825           1e-16
labels[T.T_cells_CD4+]        0.042000            0.71
labels[T.T_cells_CD8+]       -1.093768         2.4e-15

Dimension: HLA-DRB1
Exog Name                Coefficient         P-value
Intercept                   2.854607           1e-16
labels[T.T_cells_CD4+]       -4.098247           1e-16
labels[T.T_cells_CD8+]       -3.645638           1e-16

Dimension: HLA-DPA1
Exog Name                Coefficient         P-value
Intercept                   2.543715         4.4e-16
labels[T.T_cells_CD4+]       -3.681852           1e-16
labels[T.T_cells_CD8+]       -3.447149           1e-16

Dimension: HLA-DPB1
Exog Name                Coefficient         P-value
Intercept                   2.700437         4.9e-08
labels[T.T_cells_CD4+]       -3.689348           1e-16
labels[T.T_cells_CD8+]       -3.355143           1e-16

Dimension: MT-CO2
Exog Name                Coefficient         P-value
Intercept                   2.992518         8.1e-09
labels[T.T_cells_CD4+]       -0.491286          0.0016
labels[T.T_cells_CD8+]       -0.978666         2.2e-16

Dimension: LYZ
Exog Name                Coefficient         P-value
Intercept                   2.074273           9e-12
labels[T.T_cells_CD4+]       -4.863344           1e-16
labels[T.T_cells_CD8+]       -5.335721           1e-16

Dimension: RPLP1
Exog Name                Coefficient         P-value
Intercept                   3.192550         4.3e-14
labels[T.T_cells_CD4+]       -0.150987            0.28
labels[T.T_cells_CD8+]       -0.957494         2.3e-11

You can also visualize confidence intervals for regression coefficients using the plot_regression_forest method:

pln.plot_regression_forest()


5 Covariance and Variable Visualization

The pyPLNmodels package provides tools to analyze the effects of variables and visualize their relationships using the plot_correlation_circle() and biplot() methods.

5.1 Correlation Circle

Use the plot_correlation_circle() method to visualize the relationships between variables:

pln.plot_correlation_circle(column_names=["FTL", "MALAT1", "FTH1"])

You can access the column names using the column_names_endog attribute of the model.

5.2 Biplot

The biplot() method allows simultaneous visualization of variables and latent variables ((Z)):

pln.biplot(column_names=["FTL", "MALAT1", "FTH1"], colors=cell_type)


6 High-dimension: PlnPCA Model

The PlnPCA model (Chiquet, Mariadassou, and Robin 2018) extends the functionality of the Pln model for high-dimensional data. It uses the same syntax as Pln:

from pyPLNmodels import PlnPCA
high_d_rna = load_scrna(dim=500)
pca = PlnPCA.from_formula('endog ~ 1 + labels', data=high_d_rna, rank=5).fit()
Returning scRNA dataset of size (400, 500)
Setting the offsets to zero.
Fitting a PlnPCA model with 5 principal components.
Intializing parameters ...
Initialization finished.
Maximum number of iterations (400)  reached in 5.7 seconds.
Last  criterion = 5.4e-06 . Required tolerance = 1e-06
Upper bound on the fitting time:   0%|          | 0/400 [00:00<?, ?it/s]Upper bound on the fitting time:   2%|▏         | 9/400 [00:00<00:04, 86.34it/s]Upper bound on the fitting time:   4%|▍         | 18/400 [00:00<00:04, 85.55it/s]Upper bound on the fitting time:   7%|▋         | 27/400 [00:00<00:04, 84.76it/s]Upper bound on the fitting time:   9%|▉         | 36/400 [00:00<00:04, 83.75it/s]Upper bound on the fitting time:  11%|█▏        | 45/400 [00:00<00:04, 84.96it/s]Upper bound on the fitting time:  14%|█▎        | 54/400 [00:00<00:04, 85.13it/s]Upper bound on the fitting time:  16%|█▌        | 63/400 [00:00<00:03, 84.55it/s]Upper bound on the fitting time:  18%|█▊        | 72/400 [00:00<00:03, 82.06it/s]Upper bound on the fitting time:  20%|██        | 81/400 [00:00<00:03, 82.59it/s]Upper bound on the fitting time:  22%|██▎       | 90/400 [00:01<00:03, 83.36it/s]Upper bound on the fitting time:  25%|██▍       | 99/400 [00:01<00:03, 84.16it/s]Upper bound on the fitting time:  27%|██▋       | 108/400 [00:01<00:03, 84.67it/s]Upper bound on the fitting time:  29%|██▉       | 117/400 [00:01<00:03, 85.28it/s]Upper bound on the fitting time:  32%|███▏      | 126/400 [00:01<00:03, 85.38it/s]Upper bound on the fitting time:  34%|███▍      | 135/400 [00:01<00:03, 83.57it/s]Upper bound on the fitting time:  36%|███▌      | 144/400 [00:01<00:03, 84.95it/s]Upper bound on the fitting time:  38%|███▊      | 153/400 [00:01<00:02, 85.99it/s]Upper bound on the fitting time:  40%|████      | 162/400 [00:01<00:02, 83.71it/s]Upper bound on the fitting time:  43%|████▎     | 172/400 [00:02<00:02, 86.44it/s]Upper bound on the fitting time:  45%|████▌     | 181/400 [00:02<00:02, 87.04it/s]Upper bound on the fitting time:  48%|████▊     | 190/400 [00:02<00:02, 86.93it/s]Upper bound on the fitting time:  50%|████▉     | 199/400 [00:02<00:02, 86.94it/s]Upper bound on the fitting time:  52%|█████▏    | 208/400 [00:02<00:02, 86.71it/s]Upper bound on the fitting time:  54%|█████▍    | 217/400 [00:02<00:02, 83.97it/s]Upper bound on the fitting time:  56%|█████▋    | 226/400 [00:02<00:02, 82.02it/s]Upper bound on the fitting time:  59%|█████▉    | 235/400 [00:02<00:02, 82.16it/s]Upper bound on the fitting time:  61%|██████    | 244/400 [00:02<00:01, 83.27it/s]Upper bound on the fitting time:  63%|██████▎   | 253/400 [00:02<00:01, 83.95it/s]Upper bound on the fitting time:  66%|██████▌   | 262/400 [00:03<00:01, 83.94it/s]Upper bound on the fitting time:  68%|██████▊   | 271/400 [00:03<00:01, 84.76it/s]Upper bound on the fitting time:  70%|███████   | 280/400 [00:03<00:01, 85.49it/s]Upper bound on the fitting time:  72%|███████▏  | 289/400 [00:03<00:01, 83.42it/s]Upper bound on the fitting time:  74%|███████▍  | 298/400 [00:03<00:01, 83.77it/s]Upper bound on the fitting time:  77%|███████▋  | 307/400 [00:03<00:01, 84.53it/s]Upper bound on the fitting time:  79%|███████▉  | 316/400 [00:03<00:00, 84.79it/s]Upper bound on the fitting time:  81%|████████▏ | 325/400 [00:03<00:00, 85.24it/s]Upper bound on the fitting time:  84%|████████▎ | 334/400 [00:03<00:00, 85.23it/s]Upper bound on the fitting time:  86%|████████▌ | 343/400 [00:04<00:00, 84.46it/s]Upper bound on the fitting time:  88%|████████▊ | 352/400 [00:04<00:00, 85.21it/s]Upper bound on the fitting time:  90%|█████████ | 361/400 [00:04<00:00, 85.94it/s]Upper bound on the fitting time:  92%|█████████▎| 370/400 [00:04<00:00, 86.14it/s]Upper bound on the fitting time:  95%|█████████▍| 379/400 [00:04<00:00, 85.91it/s]Upper bound on the fitting time:  97%|█████████▋| 388/400 [00:04<00:00, 86.03it/s]Upper bound on the fitting time:  99%|█████████▉| 397/400 [00:04<00:00, 83.08it/s]Upper bound on the fitting time: 100%|██████████| 400/400 [00:04<00:00, 84.52it/s]

⚠️ Note: P-values are not available in the PlnPCA model.

print(pca)
A multivariate PlnPCA with 5 principal components.
======================================================================
     Loglike   Dimension    Nb param         BIC         AIC         ICL
  -330671.62         500        3990 342624.5968  334661.625   339771.13

======================================================================
* Useful attributes
    .latent_variables .latent_positions .coef .covariance .precision .model_parameters .latent_parameters .optim_details
* Useful methods
    .transform() .show() .predict() .sigma() .projected_latent_variables() .plot_correlation_circle() .biplot() .viz() .pca_pairplot() .plot_expected_vs_true()
* Additional attributes for PlnPCA are:
    None
* Additional methods for PlnPCA are:
    None

This model is particularly efficient for high-dimensional datasets, offering significantly reduced computation time compared to Pln:

6.1 Selecting the Rank

The rank is a hyperparameter that can be specified by the user. Alternatively, a data-driven approach to rank selection can be performed using the PlnPCACollection class, which fits multiple models with different ranks:

from pyPLNmodels import PlnPCACollection
pcas = PlnPCACollection.from_formula('endog ~ 1 + labels', data=high_d_rna, ranks=[3, 5, 10, 15]).fit()
Setting the offsets to zero.
Adjusting 4 PlnPCA models.

Fitting a PlnPCA model with 3 principal components.
Intializing parameters ...
Initialization finished.
Maximum number of iterations (400)  reached in 5.9 seconds.
Last  criterion = 5.4e-06 . Required tolerance = 1e-06
Fitting a PlnPCA model with 5 principal components.
Intializing parameters ...
Initialization finished.
Maximum number of iterations (400)  reached in 5.0 seconds.
Last  criterion = 5.4e-06 . Required tolerance = 1e-06
Fitting a PlnPCA model with 10 principal components.
Intializing parameters ...
Initialization finished.
Maximum number of iterations (400)  reached in 5.0 seconds.
Last  criterion = 5.4e-06 . Required tolerance = 1e-06
Fitting a PlnPCA model with 15 principal components.
Intializing parameters ...
Initialization finished.
Maximum number of iterations (400)  reached in 5.4 seconds.
Last  criterion = 5.4e-06 . Required tolerance = 1e-06
======================================================================

DONE!
Best model (lower BIC): rank 15
    Best model(lower AIC): rank  15

======================================================================
Upper bound on the fitting time:   0%|          | 0/400 [00:00<?, ?it/s]Upper bound on the fitting time:   2%|▏         | 8/400 [00:00<00:05, 74.82it/s]Upper bound on the fitting time:   4%|▍         | 16/400 [00:00<00:05, 73.40it/s]Upper bound on the fitting time:   6%|▌         | 24/400 [00:00<00:05, 73.27it/s]Upper bound on the fitting time:   8%|▊         | 32/400 [00:00<00:05, 72.31it/s]Upper bound on the fitting time:  10%|█         | 40/400 [00:00<00:04, 72.14it/s]Upper bound on the fitting time:  12%|█▏        | 48/400 [00:00<00:04, 71.64it/s]Upper bound on the fitting time:  14%|█▍        | 56/400 [00:00<00:04, 72.34it/s]Upper bound on the fitting time:  16%|█▌        | 64/400 [00:00<00:04, 72.15it/s]Upper bound on the fitting time:  18%|█▊        | 72/400 [00:01<00:04, 70.18it/s]Upper bound on the fitting time:  20%|██        | 80/400 [00:01<00:04, 71.48it/s]Upper bound on the fitting time:  22%|██▏       | 88/400 [00:01<00:04, 72.30it/s]Upper bound on the fitting time:  24%|██▍       | 96/400 [00:01<00:04, 71.14it/s]Upper bound on the fitting time:  26%|██▋       | 105/400 [00:01<00:03, 74.80it/s]Upper bound on the fitting time:  28%|██▊       | 114/400 [00:01<00:03, 78.72it/s]Upper bound on the fitting time:  31%|███       | 123/400 [00:01<00:03, 81.14it/s]Upper bound on the fitting time:  33%|███▎      | 132/400 [00:01<00:03, 82.18it/s]Upper bound on the fitting time:  35%|███▌      | 141/400 [00:01<00:03, 84.43it/s]Upper bound on the fitting time:  38%|███▊      | 150/400 [00:01<00:02, 85.95it/s]Upper bound on the fitting time:  40%|███▉      | 159/400 [00:02<00:02, 86.91it/s]Upper bound on the fitting time:  42%|████▏     | 168/400 [00:02<00:02, 87.37it/s]Upper bound on the fitting time:  44%|████▍     | 177/400 [00:02<00:02, 87.46it/s]Upper bound on the fitting time:  46%|████▋     | 186/400 [00:02<00:02, 87.50it/s]Upper bound on the fitting time:  49%|████▉     | 195/400 [00:02<00:02, 88.14it/s]Upper bound on the fitting time:  51%|█████     | 204/400 [00:02<00:02, 88.27it/s]Upper bound on the fitting time:  54%|█████▎    | 214/400 [00:02<00:02, 89.16it/s]Upper bound on the fitting time:  56%|█████▌    | 224/400 [00:02<00:01, 89.70it/s]Upper bound on the fitting time:  58%|█████▊    | 233/400 [00:02<00:01, 89.52it/s]Upper bound on the fitting time:  60%|██████    | 242/400 [00:02<00:01, 87.74it/s]Upper bound on the fitting time:  63%|██████▎   | 251/400 [00:03<00:01, 87.16it/s]Upper bound on the fitting time:  65%|██████▌   | 260/400 [00:03<00:01, 85.44it/s]Upper bound on the fitting time:  67%|██████▋   | 269/400 [00:03<00:01, 86.45it/s]Upper bound on the fitting time:  70%|██████▉   | 278/400 [00:03<00:01, 86.85it/s]Upper bound on the fitting time:  72%|███████▏  | 287/400 [00:03<00:01, 87.32it/s]Upper bound on the fitting time:  74%|███████▍  | 297/400 [00:03<00:01, 87.91it/s]Upper bound on the fitting time:  76%|███████▋  | 306/400 [00:03<00:01, 87.70it/s]Upper bound on the fitting time:  79%|███████▉  | 315/400 [00:03<00:00, 88.02it/s]Upper bound on the fitting time:  81%|████████  | 324/400 [00:03<00:00, 88.04it/s]Upper bound on the fitting time:  83%|████████▎ | 333/400 [00:04<00:00, 86.15it/s]Upper bound on the fitting time:  86%|████████▌ | 342/400 [00:04<00:00, 86.63it/s]Upper bound on the fitting time:  88%|████████▊ | 351/400 [00:04<00:00, 86.96it/s]Upper bound on the fitting time:  90%|█████████ | 360/400 [00:04<00:00, 87.49it/s]Upper bound on the fitting time:  92%|█████████▏| 369/400 [00:04<00:00, 87.31it/s]Upper bound on the fitting time:  94%|█████████▍| 378/400 [00:04<00:00, 87.53it/s]Upper bound on the fitting time:  97%|█████████▋| 387/400 [00:04<00:00, 79.52it/s]Upper bound on the fitting time:  99%|█████████▉| 396/400 [00:04<00:00, 77.05it/s]Upper bound on the fitting time: 100%|██████████| 400/400 [00:04<00:00, 81.86it/s]
Upper bound on the fitting time:   0%|          | 0/400 [00:00<?, ?it/s]Upper bound on the fitting time:   2%|▏         | 8/400 [00:00<00:05, 71.16it/s]Upper bound on the fitting time:   4%|▍         | 16/400 [00:00<00:05, 70.73it/s]Upper bound on the fitting time:   6%|▋         | 25/400 [00:00<00:04, 75.75it/s]Upper bound on the fitting time:   8%|▊         | 33/400 [00:00<00:04, 76.28it/s]Upper bound on the fitting time:  10%|█         | 42/400 [00:00<00:04, 77.95it/s]Upper bound on the fitting time:  12%|█▎        | 50/400 [00:00<00:04, 77.33it/s]Upper bound on the fitting time:  14%|█▍        | 58/400 [00:00<00:04, 76.23it/s]Upper bound on the fitting time:  16%|█▋        | 66/400 [00:00<00:04, 77.17it/s]Upper bound on the fitting time:  19%|█▉        | 75/400 [00:00<00:04, 78.85it/s]Upper bound on the fitting time:  21%|██        | 84/400 [00:01<00:03, 79.84it/s]Upper bound on the fitting time:  23%|██▎       | 93/400 [00:01<00:03, 81.05it/s]Upper bound on the fitting time:  26%|██▌       | 102/400 [00:01<00:03, 82.91it/s]Upper bound on the fitting time:  28%|██▊       | 111/400 [00:01<00:03, 84.08it/s]Upper bound on the fitting time:  30%|███       | 120/400 [00:01<00:03, 85.11it/s]Upper bound on the fitting time:  32%|███▏      | 129/400 [00:01<00:03, 85.78it/s]Upper bound on the fitting time:  34%|███▍      | 138/400 [00:01<00:03, 86.07it/s]Upper bound on the fitting time:  37%|███▋      | 147/400 [00:01<00:02, 86.42it/s]Upper bound on the fitting time:  39%|███▉      | 156/400 [00:01<00:02, 85.72it/s]Upper bound on the fitting time:  41%|████▏     | 165/400 [00:02<00:02, 85.36it/s]Upper bound on the fitting time:  44%|████▎     | 174/400 [00:02<00:02, 86.26it/s]Upper bound on the fitting time:  46%|████▌     | 183/400 [00:02<00:02, 86.30it/s]Upper bound on the fitting time:  48%|████▊     | 192/400 [00:02<00:02, 86.14it/s]Upper bound on the fitting time:  50%|█████     | 201/400 [00:02<00:02, 86.02it/s]Upper bound on the fitting time:  52%|█████▎    | 210/400 [00:02<00:02, 82.90it/s]Upper bound on the fitting time:  55%|█████▍    | 219/400 [00:02<00:02, 77.44it/s]Upper bound on the fitting time:  57%|█████▋    | 227/400 [00:02<00:02, 76.32it/s]Upper bound on the fitting time:  59%|█████▉    | 236/400 [00:02<00:02, 78.84it/s]Upper bound on the fitting time:  61%|██████▏   | 245/400 [00:03<00:01, 78.95it/s]Upper bound on the fitting time:  63%|██████▎   | 253/400 [00:03<00:01, 76.69it/s]Upper bound on the fitting time:  66%|██████▌   | 262/400 [00:03<00:01, 78.18it/s]Upper bound on the fitting time:  68%|██████▊   | 271/400 [00:03<00:01, 80.89it/s]Upper bound on the fitting time:  70%|███████   | 280/400 [00:03<00:01, 82.88it/s]Upper bound on the fitting time:  72%|███████▏  | 289/400 [00:03<00:01, 84.37it/s]Upper bound on the fitting time:  74%|███████▍  | 298/400 [00:03<00:01, 83.60it/s]Upper bound on the fitting time:  77%|███████▋  | 307/400 [00:03<00:01, 81.66it/s]Upper bound on the fitting time:  79%|███████▉  | 316/400 [00:03<00:01, 80.25it/s]Upper bound on the fitting time:  81%|████████▏ | 325/400 [00:04<00:00, 79.36it/s]Upper bound on the fitting time:  84%|████████▎ | 334/400 [00:04<00:00, 80.82it/s]Upper bound on the fitting time:  86%|████████▌ | 343/400 [00:04<00:00, 81.62it/s]Upper bound on the fitting time:  88%|████████▊ | 352/400 [00:04<00:00, 82.67it/s]Upper bound on the fitting time:  90%|█████████ | 361/400 [00:04<00:00, 82.77it/s]Upper bound on the fitting time:  92%|█████████▎| 370/400 [00:04<00:00, 79.56it/s]Upper bound on the fitting time:  94%|█████████▍| 378/400 [00:04<00:00, 74.05it/s]Upper bound on the fitting time:  96%|█████████▋| 386/400 [00:04<00:00, 72.85it/s]Upper bound on the fitting time:  98%|█████████▊| 394/400 [00:04<00:00, 72.82it/s]Upper bound on the fitting time: 100%|██████████| 400/400 [00:04<00:00, 80.09it/s]
Upper bound on the fitting time:   0%|          | 0/400 [00:00<?, ?it/s]Upper bound on the fitting time:   2%|▏         | 8/400 [00:00<00:04, 79.34it/s]Upper bound on the fitting time:   4%|▍         | 17/400 [00:00<00:04, 81.25it/s]Upper bound on the fitting time:   6%|▋         | 26/400 [00:00<00:04, 81.54it/s]Upper bound on the fitting time:   9%|▉         | 35/400 [00:00<00:04, 80.65it/s]Upper bound on the fitting time:  11%|█         | 44/400 [00:00<00:04, 77.33it/s]Upper bound on the fitting time:  13%|█▎        | 53/400 [00:00<00:04, 78.75it/s]Upper bound on the fitting time:  16%|█▌        | 62/400 [00:00<00:04, 79.83it/s]Upper bound on the fitting time:  18%|█▊        | 71/400 [00:00<00:04, 80.56it/s]Upper bound on the fitting time:  20%|██        | 80/400 [00:01<00:03, 80.32it/s]Upper bound on the fitting time:  22%|██▏       | 89/400 [00:01<00:03, 80.81it/s]Upper bound on the fitting time:  24%|██▍       | 98/400 [00:01<00:03, 81.50it/s]Upper bound on the fitting time:  27%|██▋       | 107/400 [00:01<00:03, 81.87it/s]Upper bound on the fitting time:  29%|██▉       | 116/400 [00:01<00:03, 82.01it/s]Upper bound on the fitting time:  31%|███▏      | 125/400 [00:01<00:03, 81.94it/s]Upper bound on the fitting time:  34%|███▎      | 134/400 [00:01<00:03, 82.12it/s]Upper bound on the fitting time:  36%|███▌      | 143/400 [00:01<00:03, 82.12it/s]Upper bound on the fitting time:  38%|███▊      | 152/400 [00:01<00:03, 81.78it/s]Upper bound on the fitting time:  40%|████      | 161/400 [00:01<00:02, 82.00it/s]Upper bound on the fitting time:  42%|████▎     | 170/400 [00:02<00:02, 82.31it/s]Upper bound on the fitting time:  45%|████▍     | 179/400 [00:02<00:02, 80.33it/s]Upper bound on the fitting time:  47%|████▋     | 188/400 [00:02<00:02, 81.15it/s]Upper bound on the fitting time:  49%|████▉     | 197/400 [00:02<00:02, 81.62it/s]Upper bound on the fitting time:  52%|█████▏    | 206/400 [00:02<00:02, 81.82it/s]Upper bound on the fitting time:  54%|█████▍    | 215/400 [00:02<00:02, 81.52it/s]Upper bound on the fitting time:  56%|█████▌    | 224/400 [00:02<00:02, 81.46it/s]Upper bound on the fitting time:  58%|█████▊    | 233/400 [00:02<00:02, 81.79it/s]Upper bound on the fitting time:  60%|██████    | 242/400 [00:02<00:01, 81.85it/s]Upper bound on the fitting time:  63%|██████▎   | 251/400 [00:03<00:01, 80.71it/s]Upper bound on the fitting time:  65%|██████▌   | 260/400 [00:03<00:01, 81.04it/s]Upper bound on the fitting time:  67%|██████▋   | 269/400 [00:03<00:01, 81.37it/s]Upper bound on the fitting time:  70%|██████▉   | 278/400 [00:03<00:01, 81.91it/s]Upper bound on the fitting time:  72%|███████▏  | 287/400 [00:03<00:01, 81.96it/s]Upper bound on the fitting time:  74%|███████▍  | 296/400 [00:03<00:01, 79.39it/s]Upper bound on the fitting time:  76%|███████▌  | 304/400 [00:03<00:01, 77.90it/s]Upper bound on the fitting time:  78%|███████▊  | 312/400 [00:03<00:01, 76.19it/s]Upper bound on the fitting time:  80%|████████  | 320/400 [00:03<00:01, 76.62it/s]Upper bound on the fitting time:  82%|████████▏ | 328/400 [00:04<00:00, 76.60it/s]Upper bound on the fitting time:  84%|████████▍ | 336/400 [00:04<00:00, 76.88it/s]Upper bound on the fitting time:  86%|████████▌ | 344/400 [00:04<00:00, 77.11it/s]Upper bound on the fitting time:  88%|████████▊ | 352/400 [00:04<00:00, 76.94it/s]Upper bound on the fitting time:  90%|█████████ | 360/400 [00:04<00:00, 76.79it/s]Upper bound on the fitting time:  92%|█████████▏| 368/400 [00:04<00:00, 76.71it/s]Upper bound on the fitting time:  94%|█████████▍| 376/400 [00:04<00:00, 77.29it/s]Upper bound on the fitting time:  96%|█████████▋| 385/400 [00:04<00:00, 78.52it/s]Upper bound on the fitting time:  98%|█████████▊| 394/400 [00:04<00:00, 79.40it/s]Upper bound on the fitting time: 100%|██████████| 400/400 [00:05<00:00, 79.99it/s]
Upper bound on the fitting time:   0%|          | 0/400 [00:00<?, ?it/s]Upper bound on the fitting time:   2%|▏         | 9/400 [00:00<00:04, 81.63it/s]Upper bound on the fitting time:   4%|▍         | 18/400 [00:00<00:04, 79.52it/s]Upper bound on the fitting time:   6%|▋         | 26/400 [00:00<00:04, 78.83it/s]Upper bound on the fitting time:   8%|▊         | 34/400 [00:00<00:05, 72.70it/s]Upper bound on the fitting time:  10%|█         | 42/400 [00:00<00:05, 69.90it/s]Upper bound on the fitting time:  12%|█▎        | 50/400 [00:00<00:04, 71.23it/s]Upper bound on the fitting time:  14%|█▍        | 58/400 [00:00<00:04, 71.51it/s]Upper bound on the fitting time:  16%|█▋        | 66/400 [00:00<00:04, 72.43it/s]Upper bound on the fitting time:  18%|█▊        | 74/400 [00:01<00:04, 73.86it/s]Upper bound on the fitting time:  20%|██        | 82/400 [00:01<00:04, 74.41it/s]Upper bound on the fitting time:  22%|██▎       | 90/400 [00:01<00:04, 75.36it/s]Upper bound on the fitting time:  24%|██▍       | 98/400 [00:01<00:03, 76.21it/s]Upper bound on the fitting time:  26%|██▋       | 106/400 [00:01<00:03, 76.38it/s]Upper bound on the fitting time:  28%|██▊       | 114/400 [00:01<00:03, 76.86it/s]Upper bound on the fitting time:  30%|███       | 122/400 [00:01<00:03, 76.88it/s]Upper bound on the fitting time:  32%|███▎      | 130/400 [00:01<00:03, 75.39it/s]Upper bound on the fitting time:  34%|███▍      | 138/400 [00:01<00:03, 73.14it/s]Upper bound on the fitting time:  36%|███▋      | 146/400 [00:01<00:03, 70.52it/s]Upper bound on the fitting time:  38%|███▊      | 154/400 [00:02<00:03, 67.96it/s]Upper bound on the fitting time:  40%|████      | 161/400 [00:02<00:03, 66.97it/s]Upper bound on the fitting time:  42%|████▏     | 168/400 [00:02<00:03, 66.04it/s]Upper bound on the fitting time:  44%|████▍     | 175/400 [00:02<00:03, 66.85it/s]Upper bound on the fitting time:  46%|████▌     | 183/400 [00:02<00:03, 69.65it/s]Upper bound on the fitting time:  48%|████▊     | 191/400 [00:02<00:02, 71.68it/s]Upper bound on the fitting time:  50%|████▉     | 199/400 [00:02<00:02, 71.98it/s]Upper bound on the fitting time:  52%|█████▏    | 207/400 [00:02<00:02, 72.25it/s]Upper bound on the fitting time:  54%|█████▍    | 215/400 [00:02<00:02, 74.01it/s]Upper bound on the fitting time:  56%|█████▌    | 223/400 [00:03<00:02, 75.47it/s]Upper bound on the fitting time:  58%|█████▊    | 231/400 [00:03<00:02, 75.91it/s]Upper bound on the fitting time:  60%|█████▉    | 239/400 [00:03<00:02, 76.49it/s]Upper bound on the fitting time:  62%|██████▏   | 247/400 [00:03<00:02, 74.96it/s]Upper bound on the fitting time:  64%|██████▍   | 255/400 [00:03<00:01, 75.45it/s]Upper bound on the fitting time:  66%|██████▌   | 263/400 [00:03<00:01, 75.09it/s]Upper bound on the fitting time:  68%|██████▊   | 271/400 [00:03<00:01, 76.09it/s]Upper bound on the fitting time:  70%|██████▉   | 279/400 [00:03<00:01, 76.57it/s]Upper bound on the fitting time:  72%|███████▏  | 287/400 [00:03<00:01, 76.99it/s]Upper bound on the fitting time:  74%|███████▍  | 295/400 [00:04<00:01, 76.65it/s]Upper bound on the fitting time:  76%|███████▌  | 303/400 [00:04<00:01, 77.37it/s]Upper bound on the fitting time:  78%|███████▊  | 311/400 [00:04<00:01, 77.58it/s]Upper bound on the fitting time:  80%|███████▉  | 319/400 [00:04<00:01, 78.00it/s]Upper bound on the fitting time:  82%|████████▏ | 327/400 [00:04<00:00, 77.55it/s]Upper bound on the fitting time:  84%|████████▍ | 335/400 [00:04<00:00, 74.65it/s]Upper bound on the fitting time:  86%|████████▌ | 343/400 [00:04<00:00, 72.87it/s]Upper bound on the fitting time:  88%|████████▊ | 351/400 [00:04<00:00, 73.54it/s]Upper bound on the fitting time:  90%|████████▉ | 359/400 [00:04<00:00, 75.01it/s]Upper bound on the fitting time:  92%|█████████▏| 367/400 [00:04<00:00, 75.91it/s]Upper bound on the fitting time:  94%|█████████▍| 375/400 [00:05<00:00, 76.71it/s]Upper bound on the fitting time:  96%|█████████▌| 383/400 [00:05<00:00, 76.74it/s]Upper bound on the fitting time:  98%|█████████▊| 391/400 [00:05<00:00, 76.93it/s]Upper bound on the fitting time: 100%|█████████▉| 399/400 [00:05<00:00, 77.29it/s]Upper bound on the fitting time: 100%|██████████| 400/400 [00:05<00:00, 74.34it/s]

Use the .show() method to explore insights about the optimal rank:

pcas.show()

⚠️ Note: The best model typically correspond to the largest rank, which may not always be desirable.

The best model can be accessed using the .best_model() method. The selection criterion can be AIC, BIC, or ICL:

best_model = pcas.best_model(criterion="BIC")
print(best_model)
A multivariate PlnPCA with 15 principal components.
======================================================================
     Loglike   Dimension    Nb param         BIC         AIC         ICL
   -281440.0         500        8895 308087.0386    290335.0   299427.15

======================================================================
* Useful attributes
    .latent_variables .latent_positions .coef .covariance .precision .model_parameters .latent_parameters .optim_details
* Useful methods
    .transform() .show() .predict() .sigma() .projected_latent_variables() .plot_correlation_circle() .biplot() .viz() .pca_pairplot() .plot_expected_vs_true()
* Additional attributes for PlnPCA are:
    None
* Additional methods for PlnPCA are:
    None

6.2 Accessing Individual Models

All individual models in the collection can be accessed with the rank as the key:

pca_5 = pcas[5]
print(pca_5)
A multivariate PlnPCA with 5 principal components.
======================================================================
     Loglike   Dimension    Nb param         BIC         AIC         ICL
  -331103.12         500        3990 343056.0968  335093.125   340204.75

======================================================================
* Useful attributes
    .latent_variables .latent_positions .coef .covariance .precision .model_parameters .latent_parameters .optim_details
* Useful methods
    .transform() .show() .predict() .sigma() .projected_latent_variables() .plot_correlation_circle() .biplot() .viz() .pca_pairplot() .plot_expected_vs_true()
* Additional attributes for PlnPCA are:
    None
* Additional methods for PlnPCA are:
    None

You can also iterate through the collection:

for pca in pcas.values():
    print(pca)
A multivariate PlnPCA with 3 principal components.
======================================================================
     Loglike   Dimension    Nb param         BIC         AIC         ICL
  -363021.88         500        2997 372000.0846  366018.875   370292.93

======================================================================
* Useful attributes
    .latent_variables .latent_positions .coef .covariance .precision .model_parameters .latent_parameters .optim_details
* Useful methods
    .transform() .show() .predict() .sigma() .projected_latent_variables() .plot_correlation_circle() .biplot() .viz() .pca_pairplot() .plot_expected_vs_true()
* Additional attributes for PlnPCA are:
    None
* Additional methods for PlnPCA are:
    None
A multivariate PlnPCA with 5 principal components.
======================================================================
     Loglike   Dimension    Nb param         BIC         AIC         ICL
  -331103.12         500        3990 343056.0968  335093.125   340204.75

======================================================================
* Useful attributes
    .latent_variables .latent_positions .coef .covariance .precision .model_parameters .latent_parameters .optim_details
* Useful methods
    .transform() .show() .predict() .sigma() .projected_latent_variables() .plot_correlation_circle() .biplot() .viz() .pca_pairplot() .plot_expected_vs_true()
* Additional attributes for PlnPCA are:
    None
* Additional methods for PlnPCA are:
    None
A multivariate PlnPCA with 10 principal components.
======================================================================
     Loglike   Dimension    Nb param         BIC         AIC         ICL
  -295765.38         500        6455 315102.8268  302220.375    309371.6

======================================================================
* Useful attributes
    .latent_variables .latent_positions .coef .covariance .precision .model_parameters .latent_parameters .optim_details
* Useful methods
    .transform() .show() .predict() .sigma() .projected_latent_variables() .plot_correlation_circle() .biplot() .viz() .pca_pairplot() .plot_expected_vs_true()
* Additional attributes for PlnPCA are:
    None
* Additional methods for PlnPCA are:
    None
A multivariate PlnPCA with 15 principal components.
======================================================================
     Loglike   Dimension    Nb param         BIC         AIC         ICL
   -281440.0         500        8895 308087.0386    290335.0   299427.15

======================================================================
* Useful attributes
    .latent_variables .latent_positions .coef .covariance .precision .model_parameters .latent_parameters .optim_details
* Useful methods
    .transform() .show() .predict() .sigma() .projected_latent_variables() .plot_correlation_circle() .biplot() .viz() .pca_pairplot() .plot_expected_vs_true()
* Additional attributes for PlnPCA are:
    None
* Additional methods for PlnPCA are:
    None

6.3 Additional Information

For further details, print the model to explore its methods and attributes.

References

Aitchison, John, and CH Ho. 1989. “The Multivariate Poisson-Log Normal Distribution.” Biometrika 76 (4): 643–53.
Batardière, Bastien, Joon Kwon, and Julien Chiquet. 2024. “pyPLNmodels: A Python Package to Analyze Multivariate High-Dimensional Count Data.” Journal of Open Source Software.
Chiquet, J, M Mariadassou, and S Robin. 2018. “Variational Inference for Probabilistic Poisson PCA.” The Annals of Applied Statistics.