from pyPLNmodels import load_scrna
= load_scrna(dim=20)
rna print('Data: ', rna.keys())
Returning scRNA dataset of size (400, 20)
Data: dict_keys(['endog', 'labels', 'labels_1hot'])
This guide introduces the basics of multivariate count data analysis using the pyPLNmodels
package. For complex datasets, traditional models like Poisson regression and Zero-Inflated Poisson Regression (ZIP) may fail to capture important insights, such as correlations between counts (i.e. variables). The pyPLNmodels
package provides two key models to address these challenges:
Pln
model (documentation)PlnPCA
model (documentation)The PlnPCA
model extends the functionality of the Pln
model to handle high-dimensional data, though it may slightly compromise parameter estimation accuracy.
Both models are built on the following assumptions for a given count matrix \(Y\) (Aitchison and Ho 1989):
\[ Y_{ij}| Z_{ij} \sim \mathcal P(\exp(o_{ij} + Z_{ij})), \quad Z_{i}\sim \mathcal N(X_i^{\top} B, \Sigma),\] where the input data includes:
endog
): the \(j\)-th count for the \(i\)-th observationexog
): covariates for the \(i\)-th observation (if available)offsets
): offset for the \(i\)-th observation (if available)and the model parameters are:
coef
): a matrix of regression coefficientscovariance
): the covariance matrix of the latent variables \(Z_i\)These models aim to capture the structure of the data through the latent variables \(Z\).
The Pln
model assumes \(\Sigma\) has full rank, while the PlnPCA
model assumes \(\Sigma\) has a low rank, which must be specified by the user. A lower A lower rank introduces a trade-off, reducing computational complexity but potentially compromising parameter estimation accuracy.
The pyPLNmodels
package is designed to:
This is achieved using the input count matrix \(Y\), along with optional covariate matrix \(X\) (defaulting to a vector of 1’s) and offsets \(O\) (defaulting to a matrix of 0’s).
In this example, we analyze single-cell RNA-seq data provided by the load_scrna
function in the package. Each column in the dataset represents a gene, while each row corresponds to a cell (i.e., an individual). Covariates for cell types (labels
) are also included. For simplicity, we limit the analysis to 20 variables (dimensions).
from pyPLNmodels import load_scrna
= load_scrna(dim=20)
rna print('Data: ', rna.keys())
Returning scRNA dataset of size (400, 20)
Data: dict_keys(['endog', 'labels', 'labels_1hot'])
endog
)= rna["endog"]
endog print(endog.head())
FTL MALAT1 FTH1 TMSB4X CD74 SPP1 APOE B2M ACTB HLA-DRA \
0 5.0 39.0 14.0 39.0 11.0 0.0 0.0 88.0 49.0 0.0
1 8.0 52.0 38.0 186.0 17.0 0.0 0.0 152.0 167.0 3.0
2 76.0 29.0 38.0 71.0 37.0 3.0 2.0 21.0 7.0 12.0
3 354.0 151.0 236.0 184.0 287.0 0.0 238.0 255.0 27.0 243.0
4 0.0 55.0 10.0 37.0 8.0 1.0 0.0 23.0 49.0 1.0
APOC1 MT-CO1 TMSB10 EEF1A1 HLA-DRB1 HLA-DPA1 HLA-DPB1 MT-CO2 LYZ \
0 0.0 29.0 22.0 34.0 0.0 0.0 1.0 21.0 0.0
1 0.0 109.0 65.0 128.0 2.0 4.0 1.0 172.0 1.0
2 3.0 46.0 7.0 11.0 0.0 7.0 17.0 16.0 0.0
3 196.0 118.0 45.0 98.0 66.0 115.0 160.0 27.0 6.0
4 0.0 11.0 25.0 10.0 1.0 2.0 2.0 6.0 0.0
RPLP1
0 14.0
1 124.0
2 20.0
3 40.0
4 18.0
= rna["labels"]
cell_type print('Possible cell types: ', cell_type.unique())
print(cell_type.head())
Possible cell types: ['T_cells_CD4+' 'Macrophages' 'T_cells_CD8+']
0 T_cells_CD4+
1 T_cells_CD4+
2 Macrophages
3 Macrophages
4 T_cells_CD4+
Name: standard_true_celltype_v5, dtype: object
To analyze the mean values for each cell type, we use cell type as a covariate, with Macrophages
as the reference category:
from pyPLNmodels import Pln
= Pln.from_formula('endog ~ 1 + labels', data=rna).fit() pln
Setting the offsets to zero.
Fitting a Pln model with full covariance.
Intializing parameters ...
Initialization finished.
Maximum number of iterations (400) reached in 7.2 seconds.
Last criterion = 5.4e-06 . Required tolerance = 1e-06
Upper bound on the fitting time: 0%| | 0/400 [00:00<?, ?it/s]Upper bound on the fitting time: 3%|▎ | 12/400 [00:00<00:03, 117.57it/s]Upper bound on the fitting time: 6%|▌ | 24/400 [00:00<00:03, 100.29it/s]Upper bound on the fitting time: 9%|▉ | 35/400 [00:00<00:03, 95.36it/s] Upper bound on the fitting time: 12%|█▏ | 47/400 [00:00<00:03, 102.52it/s]Upper bound on the fitting time: 14%|█▍ | 58/400 [00:00<00:03, 104.47it/s]Upper bound on the fitting time: 17%|█▋ | 69/400 [00:00<00:03, 106.13it/s]Upper bound on the fitting time: 20%|██ | 81/400 [00:00<00:02, 108.36it/s]Upper bound on the fitting time: 23%|██▎ | 93/400 [00:00<00:02, 109.86it/s]Upper bound on the fitting time: 26%|██▋ | 105/400 [00:00<00:02, 111.39it/s]Upper bound on the fitting time: 29%|██▉ | 117/400 [00:01<00:02, 112.57it/s]Upper bound on the fitting time: 32%|███▏ | 129/400 [00:01<00:02, 113.45it/s]Upper bound on the fitting time: 35%|███▌ | 141/400 [00:01<00:02, 115.26it/s]Upper bound on the fitting time: 38%|███▊ | 153/400 [00:01<00:02, 115.53it/s]Upper bound on the fitting time: 41%|████▏ | 165/400 [00:01<00:02, 114.84it/s]Upper bound on the fitting time: 44%|████▍ | 177/400 [00:01<00:01, 116.13it/s]Upper bound on the fitting time: 48%|████▊ | 190/400 [00:01<00:01, 118.75it/s]Upper bound on the fitting time: 51%|█████ | 203/400 [00:01<00:01, 120.74it/s]Upper bound on the fitting time: 54%|█████▍ | 216/400 [00:01<00:01, 122.81it/s]Upper bound on the fitting time: 57%|█████▋ | 229/400 [00:02<00:01, 123.70it/s]Upper bound on the fitting time: 60%|██████ | 242/400 [00:02<00:01, 122.67it/s]Upper bound on the fitting time: 64%|██████▍ | 255/400 [00:02<00:01, 119.18it/s]Upper bound on the fitting time: 67%|██████▋ | 267/400 [00:02<00:01, 114.22it/s]Upper bound on the fitting time: 70%|██████▉ | 279/400 [00:02<00:01, 114.61it/s]Upper bound on the fitting time: 73%|███████▎ | 291/400 [00:02<00:01, 106.51it/s]Upper bound on the fitting time: 76%|███████▌ | 302/400 [00:02<00:00, 102.25it/s]Upper bound on the fitting time: 78%|███████▊ | 314/400 [00:02<00:00, 106.28it/s]Upper bound on the fitting time: 82%|████████▏ | 326/400 [00:02<00:00, 109.12it/s]Upper bound on the fitting time: 85%|████████▍ | 339/400 [00:03<00:00, 112.56it/s]Upper bound on the fitting time: 88%|████████▊ | 351/400 [00:03<00:00, 114.24it/s]Upper bound on the fitting time: 91%|█████████ | 364/400 [00:03<00:00, 116.26it/s]Upper bound on the fitting time: 94%|█████████▍| 376/400 [00:03<00:00, 114.83it/s]Upper bound on the fitting time: 97%|█████████▋| 388/400 [00:03<00:00, 115.82it/s]Upper bound on the fitting time: 100%|██████████| 400/400 [00:03<00:00, 116.32it/s]Upper bound on the fitting time: 100%|██████████| 400/400 [00:03<00:00, 112.91it/s]
For more details on formula syntax and model initialization, including handling offsets, refer to the dedicated tutorial.
After fitting the model, you can print its configuration and key details:
print(pln)
A multivariate Pln with full covariance.
======================================================================
Loglike Dimension Nb param BIC AIC ICL
-22468.19 20 270 23277.0352 22738.1875 10228.37
======================================================================
* Useful attributes
.latent_variables .latent_positions .coef .covariance .precision .model_parameters .latent_parameters .optim_details
* Useful methods
.transform() .show() .predict() .sigma() .projected_latent_variables() .plot_correlation_circle() .biplot() .viz() .pca_pairplot() .plot_expected_vs_true()
* Additional attributes for Pln are:
None
* Additional methods for Pln are:
.get_variance_coef() .get_confidence_interval_coef() .summary() .get_coef_p_values() .plot_regression_forest()
To gain deeper insights into the model parameters and the optimization process, use the .show()
method:
pln.show()
Monitoring the norm of each parameters allows to know if the model has converged. If it has not converged, one can refit the model with a lower tolerance (tol
) and a bigger number iterations than the default (maxiter=400
):
=1000, tol = 0) pln.fit(maxiter
The latent variables \(Z\), which capture the underlying structure of the data, are accessible via the latent_variables
attribute, or the .transform()
method:
= pln.latent_variables
Z = pln.transform()
Z print('Shape of Z:', Z.shape)
Shape of Z: torch.Size([400, 20])
The effect of covariates on the latent variables can be removed by using the remove_exog_effect
keyword:
= pln.transform(remove_exog_effect=True) Z_moins_XB
You can visualize these latent variables using the .viz()
method:
=cell_type) pln.viz(colors
To visualize the latent positions without the effect of covariates (i.e., (Z - XB)), set the remove_exog_effect
parameter to True
:
=cell_type, remove_exog_effect=True) pln.viz(colors
Additionally, you can generate a pair plot of the first Principal Components (PCs) of the latent variables:
=4, colors=cell_type) pln.pca_pairplot(n_components
The model provides insights into the effects of covariates. For example, it may reveal that the mean value for Macrophages is higher compared to T_cells_CD4+ and T_cells_CD8+.
To summarize the model, including confidence intervals and p-values, use the summary
method:
pln.summary()
Coefficients and p-values per dimension:
Dimension: FTL
Exog Name Coefficient P-value
Intercept 4.405430 1e-16
labels[T.T_cells_CD4+] -3.238307 3.4e-14
labels[T.T_cells_CD8+] -3.691690 1e-16
Dimension: MALAT1
Exog Name Coefficient P-value
Intercept 4.494294 1e-16
labels[T.T_cells_CD4+] 0.047518 0.67
labels[T.T_cells_CD8+] -0.599757 0.00068
Dimension: FTH1
Exog Name Coefficient P-value
Intercept 3.977555 1e-16
labels[T.T_cells_CD4+] -2.103544 1e-16
labels[T.T_cells_CD8+] -3.101902 1e-16
Dimension: TMSB4X
Exog Name Coefficient P-value
Intercept 4.323444 1e-16
labels[T.T_cells_CD4+] -0.844740 3.6e-10
labels[T.T_cells_CD8+] -1.232170 4.2e-07
Dimension: CD74
Exog Name Coefficient P-value
Intercept 3.900579 1e-16
labels[T.T_cells_CD4+] -3.254608 1e-16
labels[T.T_cells_CD8+] -3.284683 1e-16
Dimension: SPP1
Exog Name Coefficient P-value
Intercept 0.200535 0.21
labels[T.T_cells_CD4+] -4.434213 1e-16
labels[T.T_cells_CD8+] -5.076484 1e-16
Dimension: APOE
Exog Name Coefficient P-value
Intercept 0.917977 8.2e-11
labels[T.T_cells_CD4+] -4.270944 1e-16
labels[T.T_cells_CD8+] -4.561553 1e-16
Dimension: B2M
Exog Name Coefficient P-value
Intercept 3.903286 1e-16
labels[T.T_cells_CD4+] -0.207227 0.19
labels[T.T_cells_CD8+] -0.571121 0.011
Dimension: ACTB
Exog Name Coefficient P-value
Intercept 3.477508 1e-16
labels[T.T_cells_CD4+] -1.058260 1.3e-05
labels[T.T_cells_CD8+] -1.433466 1e-16
Dimension: HLA-DRA
Exog Name Coefficient P-value
Intercept 3.569291 1e-16
labels[T.T_cells_CD4+] -4.681176 1e-16
labels[T.T_cells_CD8+] -4.322145 1e-16
Dimension: APOC1
Exog Name Coefficient P-value
Intercept 0.537658 0.00021
labels[T.T_cells_CD4+] -4.180346 1e-16
labels[T.T_cells_CD8+] -4.951382 1e-16
Dimension: MT-CO1
Exog Name Coefficient P-value
Intercept 3.719430 1e-16
labels[T.T_cells_CD4+] -0.864605 0.046
labels[T.T_cells_CD8+] -1.318557 1e-16
Dimension: TMSB10
Exog Name Coefficient P-value
Intercept 3.296484 1e-16
labels[T.T_cells_CD4+] -1.006131 0.033
labels[T.T_cells_CD8+] -1.902974 1e-16
Dimension: EEF1A1
Exog Name Coefficient P-value
Intercept 2.992825 1e-16
labels[T.T_cells_CD4+] 0.042000 0.71
labels[T.T_cells_CD8+] -1.093768 2.4e-15
Dimension: HLA-DRB1
Exog Name Coefficient P-value
Intercept 2.854607 1e-16
labels[T.T_cells_CD4+] -4.098247 1e-16
labels[T.T_cells_CD8+] -3.645638 1e-16
Dimension: HLA-DPA1
Exog Name Coefficient P-value
Intercept 2.543715 4.4e-16
labels[T.T_cells_CD4+] -3.681852 1e-16
labels[T.T_cells_CD8+] -3.447149 1e-16
Dimension: HLA-DPB1
Exog Name Coefficient P-value
Intercept 2.700437 4.9e-08
labels[T.T_cells_CD4+] -3.689348 1e-16
labels[T.T_cells_CD8+] -3.355143 1e-16
Dimension: MT-CO2
Exog Name Coefficient P-value
Intercept 2.992518 8.1e-09
labels[T.T_cells_CD4+] -0.491286 0.0016
labels[T.T_cells_CD8+] -0.978666 2.2e-16
Dimension: LYZ
Exog Name Coefficient P-value
Intercept 2.074273 9e-12
labels[T.T_cells_CD4+] -4.863344 1e-16
labels[T.T_cells_CD8+] -5.335721 1e-16
Dimension: RPLP1
Exog Name Coefficient P-value
Intercept 3.192550 4.3e-14
labels[T.T_cells_CD4+] -0.150987 0.28
labels[T.T_cells_CD8+] -0.957494 2.3e-11
You can also visualize confidence intervals for regression coefficients using the plot_regression_forest
method:
pln.plot_regression_forest()
The pyPLNmodels
package provides tools to analyze the effects of variables and visualize their relationships using the plot_correlation_circle()
and biplot()
methods.
Use the plot_correlation_circle()
method to visualize the relationships between variables:
=["FTL", "MALAT1", "FTH1"]) pln.plot_correlation_circle(column_names
You can access the column names using the column_names_endog
attribute of the model.
The biplot()
method allows simultaneous visualization of variables and latent variables ((Z)):
=["FTL", "MALAT1", "FTH1"], colors=cell_type) pln.biplot(column_names
PlnPCA
ModelThe PlnPCA
model (Chiquet, Mariadassou, and Robin 2018) extends the functionality of the Pln
model for high-dimensional data. It uses the same syntax as Pln
:
from pyPLNmodels import PlnPCA
= load_scrna(dim=500)
high_d_rna = PlnPCA.from_formula('endog ~ 1 + labels', data=high_d_rna, rank=5).fit() pca
Returning scRNA dataset of size (400, 500)
Setting the offsets to zero.
Fitting a PlnPCA model with 5 principal components.
Intializing parameters ...
Initialization finished.
Maximum number of iterations (400) reached in 5.7 seconds.
Last criterion = 5.4e-06 . Required tolerance = 1e-06
Upper bound on the fitting time: 0%| | 0/400 [00:00<?, ?it/s]Upper bound on the fitting time: 2%|▏ | 9/400 [00:00<00:04, 86.34it/s]Upper bound on the fitting time: 4%|▍ | 18/400 [00:00<00:04, 85.55it/s]Upper bound on the fitting time: 7%|▋ | 27/400 [00:00<00:04, 84.76it/s]Upper bound on the fitting time: 9%|▉ | 36/400 [00:00<00:04, 83.75it/s]Upper bound on the fitting time: 11%|█▏ | 45/400 [00:00<00:04, 84.96it/s]Upper bound on the fitting time: 14%|█▎ | 54/400 [00:00<00:04, 85.13it/s]Upper bound on the fitting time: 16%|█▌ | 63/400 [00:00<00:03, 84.55it/s]Upper bound on the fitting time: 18%|█▊ | 72/400 [00:00<00:03, 82.06it/s]Upper bound on the fitting time: 20%|██ | 81/400 [00:00<00:03, 82.59it/s]Upper bound on the fitting time: 22%|██▎ | 90/400 [00:01<00:03, 83.36it/s]Upper bound on the fitting time: 25%|██▍ | 99/400 [00:01<00:03, 84.16it/s]Upper bound on the fitting time: 27%|██▋ | 108/400 [00:01<00:03, 84.67it/s]Upper bound on the fitting time: 29%|██▉ | 117/400 [00:01<00:03, 85.28it/s]Upper bound on the fitting time: 32%|███▏ | 126/400 [00:01<00:03, 85.38it/s]Upper bound on the fitting time: 34%|███▍ | 135/400 [00:01<00:03, 83.57it/s]Upper bound on the fitting time: 36%|███▌ | 144/400 [00:01<00:03, 84.95it/s]Upper bound on the fitting time: 38%|███▊ | 153/400 [00:01<00:02, 85.99it/s]Upper bound on the fitting time: 40%|████ | 162/400 [00:01<00:02, 83.71it/s]Upper bound on the fitting time: 43%|████▎ | 172/400 [00:02<00:02, 86.44it/s]Upper bound on the fitting time: 45%|████▌ | 181/400 [00:02<00:02, 87.04it/s]Upper bound on the fitting time: 48%|████▊ | 190/400 [00:02<00:02, 86.93it/s]Upper bound on the fitting time: 50%|████▉ | 199/400 [00:02<00:02, 86.94it/s]Upper bound on the fitting time: 52%|█████▏ | 208/400 [00:02<00:02, 86.71it/s]Upper bound on the fitting time: 54%|█████▍ | 217/400 [00:02<00:02, 83.97it/s]Upper bound on the fitting time: 56%|█████▋ | 226/400 [00:02<00:02, 82.02it/s]Upper bound on the fitting time: 59%|█████▉ | 235/400 [00:02<00:02, 82.16it/s]Upper bound on the fitting time: 61%|██████ | 244/400 [00:02<00:01, 83.27it/s]Upper bound on the fitting time: 63%|██████▎ | 253/400 [00:02<00:01, 83.95it/s]Upper bound on the fitting time: 66%|██████▌ | 262/400 [00:03<00:01, 83.94it/s]Upper bound on the fitting time: 68%|██████▊ | 271/400 [00:03<00:01, 84.76it/s]Upper bound on the fitting time: 70%|███████ | 280/400 [00:03<00:01, 85.49it/s]Upper bound on the fitting time: 72%|███████▏ | 289/400 [00:03<00:01, 83.42it/s]Upper bound on the fitting time: 74%|███████▍ | 298/400 [00:03<00:01, 83.77it/s]Upper bound on the fitting time: 77%|███████▋ | 307/400 [00:03<00:01, 84.53it/s]Upper bound on the fitting time: 79%|███████▉ | 316/400 [00:03<00:00, 84.79it/s]Upper bound on the fitting time: 81%|████████▏ | 325/400 [00:03<00:00, 85.24it/s]Upper bound on the fitting time: 84%|████████▎ | 334/400 [00:03<00:00, 85.23it/s]Upper bound on the fitting time: 86%|████████▌ | 343/400 [00:04<00:00, 84.46it/s]Upper bound on the fitting time: 88%|████████▊ | 352/400 [00:04<00:00, 85.21it/s]Upper bound on the fitting time: 90%|█████████ | 361/400 [00:04<00:00, 85.94it/s]Upper bound on the fitting time: 92%|█████████▎| 370/400 [00:04<00:00, 86.14it/s]Upper bound on the fitting time: 95%|█████████▍| 379/400 [00:04<00:00, 85.91it/s]Upper bound on the fitting time: 97%|█████████▋| 388/400 [00:04<00:00, 86.03it/s]Upper bound on the fitting time: 99%|█████████▉| 397/400 [00:04<00:00, 83.08it/s]Upper bound on the fitting time: 100%|██████████| 400/400 [00:04<00:00, 84.52it/s]
⚠️ Note: P-values are not available in the PlnPCA
model.
print(pca)
A multivariate PlnPCA with 5 principal components.
======================================================================
Loglike Dimension Nb param BIC AIC ICL
-330671.62 500 3990 342624.5968 334661.625 339771.13
======================================================================
* Useful attributes
.latent_variables .latent_positions .coef .covariance .precision .model_parameters .latent_parameters .optim_details
* Useful methods
.transform() .show() .predict() .sigma() .projected_latent_variables() .plot_correlation_circle() .biplot() .viz() .pca_pairplot() .plot_expected_vs_true()
* Additional attributes for PlnPCA are:
None
* Additional methods for PlnPCA are:
None
This model is particularly efficient for high-dimensional datasets, offering significantly reduced computation time compared to Pln
:
The rank is a hyperparameter that can be specified by the user. Alternatively, a data-driven approach to rank selection can be performed using the PlnPCACollection
class, which fits multiple models with different ranks:
from pyPLNmodels import PlnPCACollection
= PlnPCACollection.from_formula('endog ~ 1 + labels', data=high_d_rna, ranks=[3, 5, 10, 15]).fit() pcas
Setting the offsets to zero.
Adjusting 4 PlnPCA models.
Fitting a PlnPCA model with 3 principal components.
Intializing parameters ...
Initialization finished.
Maximum number of iterations (400) reached in 5.9 seconds.
Last criterion = 5.4e-06 . Required tolerance = 1e-06
Fitting a PlnPCA model with 5 principal components.
Intializing parameters ...
Initialization finished.
Maximum number of iterations (400) reached in 5.0 seconds.
Last criterion = 5.4e-06 . Required tolerance = 1e-06
Fitting a PlnPCA model with 10 principal components.
Intializing parameters ...
Initialization finished.
Maximum number of iterations (400) reached in 5.0 seconds.
Last criterion = 5.4e-06 . Required tolerance = 1e-06
Fitting a PlnPCA model with 15 principal components.
Intializing parameters ...
Initialization finished.
Maximum number of iterations (400) reached in 5.4 seconds.
Last criterion = 5.4e-06 . Required tolerance = 1e-06
======================================================================
DONE!
Best model (lower BIC): rank 15
Best model(lower AIC): rank 15
======================================================================
Upper bound on the fitting time: 0%| | 0/400 [00:00<?, ?it/s]Upper bound on the fitting time: 2%|▏ | 8/400 [00:00<00:05, 74.82it/s]Upper bound on the fitting time: 4%|▍ | 16/400 [00:00<00:05, 73.40it/s]Upper bound on the fitting time: 6%|▌ | 24/400 [00:00<00:05, 73.27it/s]Upper bound on the fitting time: 8%|▊ | 32/400 [00:00<00:05, 72.31it/s]Upper bound on the fitting time: 10%|█ | 40/400 [00:00<00:04, 72.14it/s]Upper bound on the fitting time: 12%|█▏ | 48/400 [00:00<00:04, 71.64it/s]Upper bound on the fitting time: 14%|█▍ | 56/400 [00:00<00:04, 72.34it/s]Upper bound on the fitting time: 16%|█▌ | 64/400 [00:00<00:04, 72.15it/s]Upper bound on the fitting time: 18%|█▊ | 72/400 [00:01<00:04, 70.18it/s]Upper bound on the fitting time: 20%|██ | 80/400 [00:01<00:04, 71.48it/s]Upper bound on the fitting time: 22%|██▏ | 88/400 [00:01<00:04, 72.30it/s]Upper bound on the fitting time: 24%|██▍ | 96/400 [00:01<00:04, 71.14it/s]Upper bound on the fitting time: 26%|██▋ | 105/400 [00:01<00:03, 74.80it/s]Upper bound on the fitting time: 28%|██▊ | 114/400 [00:01<00:03, 78.72it/s]Upper bound on the fitting time: 31%|███ | 123/400 [00:01<00:03, 81.14it/s]Upper bound on the fitting time: 33%|███▎ | 132/400 [00:01<00:03, 82.18it/s]Upper bound on the fitting time: 35%|███▌ | 141/400 [00:01<00:03, 84.43it/s]Upper bound on the fitting time: 38%|███▊ | 150/400 [00:01<00:02, 85.95it/s]Upper bound on the fitting time: 40%|███▉ | 159/400 [00:02<00:02, 86.91it/s]Upper bound on the fitting time: 42%|████▏ | 168/400 [00:02<00:02, 87.37it/s]Upper bound on the fitting time: 44%|████▍ | 177/400 [00:02<00:02, 87.46it/s]Upper bound on the fitting time: 46%|████▋ | 186/400 [00:02<00:02, 87.50it/s]Upper bound on the fitting time: 49%|████▉ | 195/400 [00:02<00:02, 88.14it/s]Upper bound on the fitting time: 51%|█████ | 204/400 [00:02<00:02, 88.27it/s]Upper bound on the fitting time: 54%|█████▎ | 214/400 [00:02<00:02, 89.16it/s]Upper bound on the fitting time: 56%|█████▌ | 224/400 [00:02<00:01, 89.70it/s]Upper bound on the fitting time: 58%|█████▊ | 233/400 [00:02<00:01, 89.52it/s]Upper bound on the fitting time: 60%|██████ | 242/400 [00:02<00:01, 87.74it/s]Upper bound on the fitting time: 63%|██████▎ | 251/400 [00:03<00:01, 87.16it/s]Upper bound on the fitting time: 65%|██████▌ | 260/400 [00:03<00:01, 85.44it/s]Upper bound on the fitting time: 67%|██████▋ | 269/400 [00:03<00:01, 86.45it/s]Upper bound on the fitting time: 70%|██████▉ | 278/400 [00:03<00:01, 86.85it/s]Upper bound on the fitting time: 72%|███████▏ | 287/400 [00:03<00:01, 87.32it/s]Upper bound on the fitting time: 74%|███████▍ | 297/400 [00:03<00:01, 87.91it/s]Upper bound on the fitting time: 76%|███████▋ | 306/400 [00:03<00:01, 87.70it/s]Upper bound on the fitting time: 79%|███████▉ | 315/400 [00:03<00:00, 88.02it/s]Upper bound on the fitting time: 81%|████████ | 324/400 [00:03<00:00, 88.04it/s]Upper bound on the fitting time: 83%|████████▎ | 333/400 [00:04<00:00, 86.15it/s]Upper bound on the fitting time: 86%|████████▌ | 342/400 [00:04<00:00, 86.63it/s]Upper bound on the fitting time: 88%|████████▊ | 351/400 [00:04<00:00, 86.96it/s]Upper bound on the fitting time: 90%|█████████ | 360/400 [00:04<00:00, 87.49it/s]Upper bound on the fitting time: 92%|█████████▏| 369/400 [00:04<00:00, 87.31it/s]Upper bound on the fitting time: 94%|█████████▍| 378/400 [00:04<00:00, 87.53it/s]Upper bound on the fitting time: 97%|█████████▋| 387/400 [00:04<00:00, 79.52it/s]Upper bound on the fitting time: 99%|█████████▉| 396/400 [00:04<00:00, 77.05it/s]Upper bound on the fitting time: 100%|██████████| 400/400 [00:04<00:00, 81.86it/s]
Upper bound on the fitting time: 0%| | 0/400 [00:00<?, ?it/s]Upper bound on the fitting time: 2%|▏ | 8/400 [00:00<00:05, 71.16it/s]Upper bound on the fitting time: 4%|▍ | 16/400 [00:00<00:05, 70.73it/s]Upper bound on the fitting time: 6%|▋ | 25/400 [00:00<00:04, 75.75it/s]Upper bound on the fitting time: 8%|▊ | 33/400 [00:00<00:04, 76.28it/s]Upper bound on the fitting time: 10%|█ | 42/400 [00:00<00:04, 77.95it/s]Upper bound on the fitting time: 12%|█▎ | 50/400 [00:00<00:04, 77.33it/s]Upper bound on the fitting time: 14%|█▍ | 58/400 [00:00<00:04, 76.23it/s]Upper bound on the fitting time: 16%|█▋ | 66/400 [00:00<00:04, 77.17it/s]Upper bound on the fitting time: 19%|█▉ | 75/400 [00:00<00:04, 78.85it/s]Upper bound on the fitting time: 21%|██ | 84/400 [00:01<00:03, 79.84it/s]Upper bound on the fitting time: 23%|██▎ | 93/400 [00:01<00:03, 81.05it/s]Upper bound on the fitting time: 26%|██▌ | 102/400 [00:01<00:03, 82.91it/s]Upper bound on the fitting time: 28%|██▊ | 111/400 [00:01<00:03, 84.08it/s]Upper bound on the fitting time: 30%|███ | 120/400 [00:01<00:03, 85.11it/s]Upper bound on the fitting time: 32%|███▏ | 129/400 [00:01<00:03, 85.78it/s]Upper bound on the fitting time: 34%|███▍ | 138/400 [00:01<00:03, 86.07it/s]Upper bound on the fitting time: 37%|███▋ | 147/400 [00:01<00:02, 86.42it/s]Upper bound on the fitting time: 39%|███▉ | 156/400 [00:01<00:02, 85.72it/s]Upper bound on the fitting time: 41%|████▏ | 165/400 [00:02<00:02, 85.36it/s]Upper bound on the fitting time: 44%|████▎ | 174/400 [00:02<00:02, 86.26it/s]Upper bound on the fitting time: 46%|████▌ | 183/400 [00:02<00:02, 86.30it/s]Upper bound on the fitting time: 48%|████▊ | 192/400 [00:02<00:02, 86.14it/s]Upper bound on the fitting time: 50%|█████ | 201/400 [00:02<00:02, 86.02it/s]Upper bound on the fitting time: 52%|█████▎ | 210/400 [00:02<00:02, 82.90it/s]Upper bound on the fitting time: 55%|█████▍ | 219/400 [00:02<00:02, 77.44it/s]Upper bound on the fitting time: 57%|█████▋ | 227/400 [00:02<00:02, 76.32it/s]Upper bound on the fitting time: 59%|█████▉ | 236/400 [00:02<00:02, 78.84it/s]Upper bound on the fitting time: 61%|██████▏ | 245/400 [00:03<00:01, 78.95it/s]Upper bound on the fitting time: 63%|██████▎ | 253/400 [00:03<00:01, 76.69it/s]Upper bound on the fitting time: 66%|██████▌ | 262/400 [00:03<00:01, 78.18it/s]Upper bound on the fitting time: 68%|██████▊ | 271/400 [00:03<00:01, 80.89it/s]Upper bound on the fitting time: 70%|███████ | 280/400 [00:03<00:01, 82.88it/s]Upper bound on the fitting time: 72%|███████▏ | 289/400 [00:03<00:01, 84.37it/s]Upper bound on the fitting time: 74%|███████▍ | 298/400 [00:03<00:01, 83.60it/s]Upper bound on the fitting time: 77%|███████▋ | 307/400 [00:03<00:01, 81.66it/s]Upper bound on the fitting time: 79%|███████▉ | 316/400 [00:03<00:01, 80.25it/s]Upper bound on the fitting time: 81%|████████▏ | 325/400 [00:04<00:00, 79.36it/s]Upper bound on the fitting time: 84%|████████▎ | 334/400 [00:04<00:00, 80.82it/s]Upper bound on the fitting time: 86%|████████▌ | 343/400 [00:04<00:00, 81.62it/s]Upper bound on the fitting time: 88%|████████▊ | 352/400 [00:04<00:00, 82.67it/s]Upper bound on the fitting time: 90%|█████████ | 361/400 [00:04<00:00, 82.77it/s]Upper bound on the fitting time: 92%|█████████▎| 370/400 [00:04<00:00, 79.56it/s]Upper bound on the fitting time: 94%|█████████▍| 378/400 [00:04<00:00, 74.05it/s]Upper bound on the fitting time: 96%|█████████▋| 386/400 [00:04<00:00, 72.85it/s]Upper bound on the fitting time: 98%|█████████▊| 394/400 [00:04<00:00, 72.82it/s]Upper bound on the fitting time: 100%|██████████| 400/400 [00:04<00:00, 80.09it/s]
Upper bound on the fitting time: 0%| | 0/400 [00:00<?, ?it/s]Upper bound on the fitting time: 2%|▏ | 8/400 [00:00<00:04, 79.34it/s]Upper bound on the fitting time: 4%|▍ | 17/400 [00:00<00:04, 81.25it/s]Upper bound on the fitting time: 6%|▋ | 26/400 [00:00<00:04, 81.54it/s]Upper bound on the fitting time: 9%|▉ | 35/400 [00:00<00:04, 80.65it/s]Upper bound on the fitting time: 11%|█ | 44/400 [00:00<00:04, 77.33it/s]Upper bound on the fitting time: 13%|█▎ | 53/400 [00:00<00:04, 78.75it/s]Upper bound on the fitting time: 16%|█▌ | 62/400 [00:00<00:04, 79.83it/s]Upper bound on the fitting time: 18%|█▊ | 71/400 [00:00<00:04, 80.56it/s]Upper bound on the fitting time: 20%|██ | 80/400 [00:01<00:03, 80.32it/s]Upper bound on the fitting time: 22%|██▏ | 89/400 [00:01<00:03, 80.81it/s]Upper bound on the fitting time: 24%|██▍ | 98/400 [00:01<00:03, 81.50it/s]Upper bound on the fitting time: 27%|██▋ | 107/400 [00:01<00:03, 81.87it/s]Upper bound on the fitting time: 29%|██▉ | 116/400 [00:01<00:03, 82.01it/s]Upper bound on the fitting time: 31%|███▏ | 125/400 [00:01<00:03, 81.94it/s]Upper bound on the fitting time: 34%|███▎ | 134/400 [00:01<00:03, 82.12it/s]Upper bound on the fitting time: 36%|███▌ | 143/400 [00:01<00:03, 82.12it/s]Upper bound on the fitting time: 38%|███▊ | 152/400 [00:01<00:03, 81.78it/s]Upper bound on the fitting time: 40%|████ | 161/400 [00:01<00:02, 82.00it/s]Upper bound on the fitting time: 42%|████▎ | 170/400 [00:02<00:02, 82.31it/s]Upper bound on the fitting time: 45%|████▍ | 179/400 [00:02<00:02, 80.33it/s]Upper bound on the fitting time: 47%|████▋ | 188/400 [00:02<00:02, 81.15it/s]Upper bound on the fitting time: 49%|████▉ | 197/400 [00:02<00:02, 81.62it/s]Upper bound on the fitting time: 52%|█████▏ | 206/400 [00:02<00:02, 81.82it/s]Upper bound on the fitting time: 54%|█████▍ | 215/400 [00:02<00:02, 81.52it/s]Upper bound on the fitting time: 56%|█████▌ | 224/400 [00:02<00:02, 81.46it/s]Upper bound on the fitting time: 58%|█████▊ | 233/400 [00:02<00:02, 81.79it/s]Upper bound on the fitting time: 60%|██████ | 242/400 [00:02<00:01, 81.85it/s]Upper bound on the fitting time: 63%|██████▎ | 251/400 [00:03<00:01, 80.71it/s]Upper bound on the fitting time: 65%|██████▌ | 260/400 [00:03<00:01, 81.04it/s]Upper bound on the fitting time: 67%|██████▋ | 269/400 [00:03<00:01, 81.37it/s]Upper bound on the fitting time: 70%|██████▉ | 278/400 [00:03<00:01, 81.91it/s]Upper bound on the fitting time: 72%|███████▏ | 287/400 [00:03<00:01, 81.96it/s]Upper bound on the fitting time: 74%|███████▍ | 296/400 [00:03<00:01, 79.39it/s]Upper bound on the fitting time: 76%|███████▌ | 304/400 [00:03<00:01, 77.90it/s]Upper bound on the fitting time: 78%|███████▊ | 312/400 [00:03<00:01, 76.19it/s]Upper bound on the fitting time: 80%|████████ | 320/400 [00:03<00:01, 76.62it/s]Upper bound on the fitting time: 82%|████████▏ | 328/400 [00:04<00:00, 76.60it/s]Upper bound on the fitting time: 84%|████████▍ | 336/400 [00:04<00:00, 76.88it/s]Upper bound on the fitting time: 86%|████████▌ | 344/400 [00:04<00:00, 77.11it/s]Upper bound on the fitting time: 88%|████████▊ | 352/400 [00:04<00:00, 76.94it/s]Upper bound on the fitting time: 90%|█████████ | 360/400 [00:04<00:00, 76.79it/s]Upper bound on the fitting time: 92%|█████████▏| 368/400 [00:04<00:00, 76.71it/s]Upper bound on the fitting time: 94%|█████████▍| 376/400 [00:04<00:00, 77.29it/s]Upper bound on the fitting time: 96%|█████████▋| 385/400 [00:04<00:00, 78.52it/s]Upper bound on the fitting time: 98%|█████████▊| 394/400 [00:04<00:00, 79.40it/s]Upper bound on the fitting time: 100%|██████████| 400/400 [00:05<00:00, 79.99it/s]
Upper bound on the fitting time: 0%| | 0/400 [00:00<?, ?it/s]Upper bound on the fitting time: 2%|▏ | 9/400 [00:00<00:04, 81.63it/s]Upper bound on the fitting time: 4%|▍ | 18/400 [00:00<00:04, 79.52it/s]Upper bound on the fitting time: 6%|▋ | 26/400 [00:00<00:04, 78.83it/s]Upper bound on the fitting time: 8%|▊ | 34/400 [00:00<00:05, 72.70it/s]Upper bound on the fitting time: 10%|█ | 42/400 [00:00<00:05, 69.90it/s]Upper bound on the fitting time: 12%|█▎ | 50/400 [00:00<00:04, 71.23it/s]Upper bound on the fitting time: 14%|█▍ | 58/400 [00:00<00:04, 71.51it/s]Upper bound on the fitting time: 16%|█▋ | 66/400 [00:00<00:04, 72.43it/s]Upper bound on the fitting time: 18%|█▊ | 74/400 [00:01<00:04, 73.86it/s]Upper bound on the fitting time: 20%|██ | 82/400 [00:01<00:04, 74.41it/s]Upper bound on the fitting time: 22%|██▎ | 90/400 [00:01<00:04, 75.36it/s]Upper bound on the fitting time: 24%|██▍ | 98/400 [00:01<00:03, 76.21it/s]Upper bound on the fitting time: 26%|██▋ | 106/400 [00:01<00:03, 76.38it/s]Upper bound on the fitting time: 28%|██▊ | 114/400 [00:01<00:03, 76.86it/s]Upper bound on the fitting time: 30%|███ | 122/400 [00:01<00:03, 76.88it/s]Upper bound on the fitting time: 32%|███▎ | 130/400 [00:01<00:03, 75.39it/s]Upper bound on the fitting time: 34%|███▍ | 138/400 [00:01<00:03, 73.14it/s]Upper bound on the fitting time: 36%|███▋ | 146/400 [00:01<00:03, 70.52it/s]Upper bound on the fitting time: 38%|███▊ | 154/400 [00:02<00:03, 67.96it/s]Upper bound on the fitting time: 40%|████ | 161/400 [00:02<00:03, 66.97it/s]Upper bound on the fitting time: 42%|████▏ | 168/400 [00:02<00:03, 66.04it/s]Upper bound on the fitting time: 44%|████▍ | 175/400 [00:02<00:03, 66.85it/s]Upper bound on the fitting time: 46%|████▌ | 183/400 [00:02<00:03, 69.65it/s]Upper bound on the fitting time: 48%|████▊ | 191/400 [00:02<00:02, 71.68it/s]Upper bound on the fitting time: 50%|████▉ | 199/400 [00:02<00:02, 71.98it/s]Upper bound on the fitting time: 52%|█████▏ | 207/400 [00:02<00:02, 72.25it/s]Upper bound on the fitting time: 54%|█████▍ | 215/400 [00:02<00:02, 74.01it/s]Upper bound on the fitting time: 56%|█████▌ | 223/400 [00:03<00:02, 75.47it/s]Upper bound on the fitting time: 58%|█████▊ | 231/400 [00:03<00:02, 75.91it/s]Upper bound on the fitting time: 60%|█████▉ | 239/400 [00:03<00:02, 76.49it/s]Upper bound on the fitting time: 62%|██████▏ | 247/400 [00:03<00:02, 74.96it/s]Upper bound on the fitting time: 64%|██████▍ | 255/400 [00:03<00:01, 75.45it/s]Upper bound on the fitting time: 66%|██████▌ | 263/400 [00:03<00:01, 75.09it/s]Upper bound on the fitting time: 68%|██████▊ | 271/400 [00:03<00:01, 76.09it/s]Upper bound on the fitting time: 70%|██████▉ | 279/400 [00:03<00:01, 76.57it/s]Upper bound on the fitting time: 72%|███████▏ | 287/400 [00:03<00:01, 76.99it/s]Upper bound on the fitting time: 74%|███████▍ | 295/400 [00:04<00:01, 76.65it/s]Upper bound on the fitting time: 76%|███████▌ | 303/400 [00:04<00:01, 77.37it/s]Upper bound on the fitting time: 78%|███████▊ | 311/400 [00:04<00:01, 77.58it/s]Upper bound on the fitting time: 80%|███████▉ | 319/400 [00:04<00:01, 78.00it/s]Upper bound on the fitting time: 82%|████████▏ | 327/400 [00:04<00:00, 77.55it/s]Upper bound on the fitting time: 84%|████████▍ | 335/400 [00:04<00:00, 74.65it/s]Upper bound on the fitting time: 86%|████████▌ | 343/400 [00:04<00:00, 72.87it/s]Upper bound on the fitting time: 88%|████████▊ | 351/400 [00:04<00:00, 73.54it/s]Upper bound on the fitting time: 90%|████████▉ | 359/400 [00:04<00:00, 75.01it/s]Upper bound on the fitting time: 92%|█████████▏| 367/400 [00:04<00:00, 75.91it/s]Upper bound on the fitting time: 94%|█████████▍| 375/400 [00:05<00:00, 76.71it/s]Upper bound on the fitting time: 96%|█████████▌| 383/400 [00:05<00:00, 76.74it/s]Upper bound on the fitting time: 98%|█████████▊| 391/400 [00:05<00:00, 76.93it/s]Upper bound on the fitting time: 100%|█████████▉| 399/400 [00:05<00:00, 77.29it/s]Upper bound on the fitting time: 100%|██████████| 400/400 [00:05<00:00, 74.34it/s]
Use the .show()
method to explore insights about the optimal rank:
pcas.show()
⚠️ Note: The best model typically correspond to the largest rank, which may not always be desirable.
The best model can be accessed using the .best_model()
method. The selection criterion can be AIC
, BIC
, or ICL
:
= pcas.best_model(criterion="BIC")
best_model print(best_model)
A multivariate PlnPCA with 15 principal components.
======================================================================
Loglike Dimension Nb param BIC AIC ICL
-281440.0 500 8895 308087.0386 290335.0 299427.15
======================================================================
* Useful attributes
.latent_variables .latent_positions .coef .covariance .precision .model_parameters .latent_parameters .optim_details
* Useful methods
.transform() .show() .predict() .sigma() .projected_latent_variables() .plot_correlation_circle() .biplot() .viz() .pca_pairplot() .plot_expected_vs_true()
* Additional attributes for PlnPCA are:
None
* Additional methods for PlnPCA are:
None
All individual models in the collection can be accessed with the rank as the key:
= pcas[5]
pca_5 print(pca_5)
A multivariate PlnPCA with 5 principal components.
======================================================================
Loglike Dimension Nb param BIC AIC ICL
-331103.12 500 3990 343056.0968 335093.125 340204.75
======================================================================
* Useful attributes
.latent_variables .latent_positions .coef .covariance .precision .model_parameters .latent_parameters .optim_details
* Useful methods
.transform() .show() .predict() .sigma() .projected_latent_variables() .plot_correlation_circle() .biplot() .viz() .pca_pairplot() .plot_expected_vs_true()
* Additional attributes for PlnPCA are:
None
* Additional methods for PlnPCA are:
None
You can also iterate through the collection:
for pca in pcas.values():
print(pca)
A multivariate PlnPCA with 3 principal components.
======================================================================
Loglike Dimension Nb param BIC AIC ICL
-363021.88 500 2997 372000.0846 366018.875 370292.93
======================================================================
* Useful attributes
.latent_variables .latent_positions .coef .covariance .precision .model_parameters .latent_parameters .optim_details
* Useful methods
.transform() .show() .predict() .sigma() .projected_latent_variables() .plot_correlation_circle() .biplot() .viz() .pca_pairplot() .plot_expected_vs_true()
* Additional attributes for PlnPCA are:
None
* Additional methods for PlnPCA are:
None
A multivariate PlnPCA with 5 principal components.
======================================================================
Loglike Dimension Nb param BIC AIC ICL
-331103.12 500 3990 343056.0968 335093.125 340204.75
======================================================================
* Useful attributes
.latent_variables .latent_positions .coef .covariance .precision .model_parameters .latent_parameters .optim_details
* Useful methods
.transform() .show() .predict() .sigma() .projected_latent_variables() .plot_correlation_circle() .biplot() .viz() .pca_pairplot() .plot_expected_vs_true()
* Additional attributes for PlnPCA are:
None
* Additional methods for PlnPCA are:
None
A multivariate PlnPCA with 10 principal components.
======================================================================
Loglike Dimension Nb param BIC AIC ICL
-295765.38 500 6455 315102.8268 302220.375 309371.6
======================================================================
* Useful attributes
.latent_variables .latent_positions .coef .covariance .precision .model_parameters .latent_parameters .optim_details
* Useful methods
.transform() .show() .predict() .sigma() .projected_latent_variables() .plot_correlation_circle() .biplot() .viz() .pca_pairplot() .plot_expected_vs_true()
* Additional attributes for PlnPCA are:
None
* Additional methods for PlnPCA are:
None
A multivariate PlnPCA with 15 principal components.
======================================================================
Loglike Dimension Nb param BIC AIC ICL
-281440.0 500 8895 308087.0386 290335.0 299427.15
======================================================================
* Useful attributes
.latent_variables .latent_positions .coef .covariance .precision .model_parameters .latent_parameters .optim_details
* Useful methods
.transform() .show() .predict() .sigma() .projected_latent_variables() .plot_correlation_circle() .biplot() .viz() .pca_pairplot() .plot_expected_vs_true()
* Additional attributes for PlnPCA are:
None
* Additional methods for PlnPCA are:
None
For further details, print the model to explore its methods and attributes.