from pyPLNmodels import load_scrna
= load_scrna(dim = 20)
data = data["endog"] endog
Returning scRNA dataset of size (400, 20)
The PlnNetwork
model in the pyPLNmodels package is designed for sparse network inference from multivariate count data, using a penalized precision matrix. It allows for network structure learning between variables, such as genes in single-cell RNA sequencing datasets.
This tutorial demonstrates how to:
Initialize a PlnNetwork
(documentation) model from count data or using a formula interface
Fit the model with regularization on the precision matrix
Visualize inferred network structures and latent variables
Gives tools for selecting the optimal penalty
Compared to the Pln
model, a constraint is imposed on the precision matrix \(\Sigma^{-1}\) of the latent space to enforce sparsity (Chiquet, Robin, and Mariadassou (2019)):
\[ \begin{align} Z_i &\sim \mathcal{N}(X_i^{\top} B, \Sigma), \quad \|\Sigma^{-1}\|_1 \leq C \\ Y_{ij} \mid Z_{ij} &\sim \mathcal{P}(\exp(o_{ij} + Z_{ij})), \end{align} \] where \(C\) is a penalty parameter. See Chiquet, Robin, and Mariadassou (2019) for more details.
For visualization purposes, we use only 20 variables (i.e. genes).
from pyPLNmodels import load_scrna
= load_scrna(dim = 20)
data = data["endog"] endog
Returning scRNA dataset of size (400, 20)
A penalty needs to be specified for the model. The greater the penalty, the sparser the precision matrix will be and the lower the number of linked variables. We use a value of \(200\) for now.
from pyPLNmodels import PlnNetwork
= PlnNetwork(endog, penalty=200).fit() net
Setting the offsets to zero.
Fitting a PlnNetwork model with penalty 200.
Intializing parameters ...
Initialization finished.
Maximum number of iterations (400) reached in 6.4 seconds.
Last criterion = 5.4e-06 . Required tolerance = 1e-06
Upper bound on the fitting time: 0%| | 0/400 [00:00<?, ?it/s]Upper bound on the fitting time: 3%|▎ | 12/400 [00:00<00:03, 116.59it/s]Upper bound on the fitting time: 6%|▌ | 24/400 [00:00<00:03, 116.57it/s]Upper bound on the fitting time: 9%|▉ | 36/400 [00:00<00:03, 113.43it/s]Upper bound on the fitting time: 12%|█▏ | 48/400 [00:00<00:03, 115.91it/s]Upper bound on the fitting time: 15%|█▌ | 60/400 [00:00<00:02, 116.04it/s]Upper bound on the fitting time: 18%|█▊ | 72/400 [00:00<00:02, 116.01it/s]Upper bound on the fitting time: 21%|██ | 84/400 [00:00<00:02, 115.28it/s]Upper bound on the fitting time: 24%|██▍ | 96/400 [00:00<00:02, 115.73it/s]Upper bound on the fitting time: 27%|██▋ | 108/400 [00:00<00:02, 116.31it/s]Upper bound on the fitting time: 30%|███ | 120/400 [00:01<00:02, 117.37it/s]Upper bound on the fitting time: 33%|███▎ | 132/400 [00:01<00:02, 114.87it/s]Upper bound on the fitting time: 36%|███▌ | 144/400 [00:01<00:02, 114.69it/s]Upper bound on the fitting time: 39%|███▉ | 156/400 [00:01<00:02, 114.92it/s]Upper bound on the fitting time: 42%|████▏ | 168/400 [00:01<00:02, 113.17it/s]Upper bound on the fitting time: 45%|████▌ | 180/400 [00:01<00:01, 111.13it/s]Upper bound on the fitting time: 48%|████▊ | 193/400 [00:01<00:01, 114.34it/s]Upper bound on the fitting time: 51%|█████▏ | 205/400 [00:01<00:01, 115.95it/s]Upper bound on the fitting time: 54%|█████▍ | 217/400 [00:01<00:01, 115.05it/s]Upper bound on the fitting time: 57%|█████▋ | 229/400 [00:01<00:01, 116.04it/s]Upper bound on the fitting time: 60%|██████ | 241/400 [00:02<00:01, 116.80it/s]Upper bound on the fitting time: 63%|██████▎ | 253/400 [00:02<00:01, 116.63it/s]Upper bound on the fitting time: 66%|██████▋ | 266/400 [00:02<00:01, 117.86it/s]Upper bound on the fitting time: 70%|██████▉ | 278/400 [00:02<00:01, 117.10it/s]Upper bound on the fitting time: 72%|███████▎ | 290/400 [00:02<00:00, 115.55it/s]Upper bound on the fitting time: 76%|███████▌ | 302/400 [00:02<00:00, 114.02it/s]Upper bound on the fitting time: 78%|███████▊ | 314/400 [00:02<00:00, 114.17it/s]Upper bound on the fitting time: 82%|████████▏ | 326/400 [00:02<00:00, 115.37it/s]Upper bound on the fitting time: 85%|████████▍ | 339/400 [00:02<00:00, 117.19it/s]Upper bound on the fitting time: 88%|████████▊ | 351/400 [00:03<00:00, 114.98it/s]Upper bound on the fitting time: 91%|█████████ | 363/400 [00:03<00:00, 116.16it/s]Upper bound on the fitting time: 94%|█████████▍| 375/400 [00:03<00:00, 117.26it/s]Upper bound on the fitting time: 97%|█████████▋| 387/400 [00:03<00:00, 116.69it/s]Upper bound on the fitting time: 100%|█████████▉| 399/400 [00:03<00:00, 115.31it/s]Upper bound on the fitting time: 100%|██████████| 400/400 [00:03<00:00, 115.41it/s]
net.viz_network()
The network can be accessed as a dictionnary through the network
attribute:
= net.network
network print(network)
print("Genes associated with APOE: ", network["APOE"])
{'FTL': ['FTH1', 'SPP1', 'APOE', 'B2M', 'ACTB', 'HLA-DRA', 'APOC1', 'MT-CO1', 'TMSB10', 'HLA-DRB1', 'HLA-DPA1', 'HLA-DPB1', 'LYZ'], 'MALAT1': ['HLA-DRA'], 'FTH1': ['FTL', 'TMSB4X', 'SPP1', 'APOE', 'ACTB', 'HLA-DRA', 'APOC1', 'TMSB10', 'EEF1A1', 'HLA-DRB1', 'LYZ', 'RPLP1'], 'TMSB4X': ['FTH1', 'HLA-DRA', 'EEF1A1', 'HLA-DRB1', 'HLA-DPA1', 'MT-CO2', 'LYZ', 'RPLP1'], 'CD74': ['SPP1', 'APOE', 'B2M', 'ACTB', 'HLA-DRA', 'APOC1', 'MT-CO1', 'TMSB10', 'HLA-DRB1', 'HLA-DPA1', 'HLA-DPB1', 'LYZ'], 'SPP1': ['FTL', 'FTH1', 'CD74', 'APOE', 'HLA-DRA', 'APOC1', 'LYZ'], 'APOE': ['FTL', 'FTH1', 'CD74', 'SPP1', 'HLA-DRA', 'APOC1', 'MT-CO1', 'TMSB10', 'EEF1A1', 'HLA-DRB1', 'HLA-DPA1', 'HLA-DPB1', 'LYZ'], 'B2M': ['FTL', 'CD74', 'HLA-DRA', 'TMSB10', 'HLA-DRB1', 'RPLP1'], 'ACTB': ['FTL', 'FTH1', 'CD74', 'HLA-DRA', 'MT-CO1', 'EEF1A1', 'HLA-DRB1', 'HLA-DPA1', 'MT-CO2', 'RPLP1'], 'HLA-DRA': ['FTL', 'MALAT1', 'FTH1', 'TMSB4X', 'CD74', 'SPP1', 'APOE', 'B2M', 'ACTB', 'APOC1', 'MT-CO1', 'TMSB10', 'EEF1A1', 'HLA-DRB1', 'HLA-DPA1', 'HLA-DPB1', 'MT-CO2', 'LYZ', 'RPLP1'], 'APOC1': ['FTL', 'FTH1', 'CD74', 'SPP1', 'APOE', 'HLA-DRA', 'MT-CO1', 'TMSB10', 'HLA-DPB1', 'LYZ'], 'MT-CO1': ['FTL', 'CD74', 'APOE', 'ACTB', 'HLA-DRA', 'APOC1', 'TMSB10', 'EEF1A1', 'LYZ'], 'TMSB10': ['FTL', 'FTH1', 'CD74', 'APOE', 'B2M', 'HLA-DRA', 'APOC1', 'MT-CO1', 'EEF1A1', 'HLA-DRB1', 'LYZ', 'RPLP1'], 'EEF1A1': ['FTH1', 'TMSB4X', 'APOE', 'ACTB', 'HLA-DRA', 'MT-CO1', 'TMSB10', 'HLA-DRB1', 'HLA-DPA1', 'HLA-DPB1', 'RPLP1'], 'HLA-DRB1': ['FTL', 'FTH1', 'TMSB4X', 'CD74', 'APOE', 'B2M', 'ACTB', 'HLA-DRA', 'TMSB10', 'EEF1A1', 'HLA-DPA1', 'HLA-DPB1', 'LYZ'], 'HLA-DPA1': ['FTL', 'TMSB4X', 'CD74', 'APOE', 'ACTB', 'HLA-DRA', 'EEF1A1', 'HLA-DRB1', 'HLA-DPB1', 'MT-CO2', 'LYZ'], 'HLA-DPB1': ['FTL', 'CD74', 'APOE', 'HLA-DRA', 'APOC1', 'EEF1A1', 'HLA-DRB1', 'HLA-DPA1', 'MT-CO2', 'LYZ'], 'MT-CO2': ['TMSB4X', 'ACTB', 'HLA-DRA', 'HLA-DPA1', 'HLA-DPB1'], 'LYZ': ['FTL', 'FTH1', 'TMSB4X', 'CD74', 'SPP1', 'APOE', 'HLA-DRA', 'APOC1', 'MT-CO1', 'TMSB10', 'HLA-DRB1', 'HLA-DPA1', 'HLA-DPB1', 'RPLP1'], 'RPLP1': ['FTH1', 'TMSB4X', 'B2M', 'ACTB', 'HLA-DRA', 'TMSB10', 'EEF1A1', 'LYZ']}
Genes associated with APOE: ['FTL', 'FTH1', 'CD74', 'SPP1', 'HLA-DRA', 'APOC1', 'MT-CO1', 'TMSB10', 'EEF1A1', 'HLA-DRB1', 'HLA-DPA1', 'HLA-DPB1', 'LYZ']
PlnNetworkCollection
for selecting the optimal penaltyThe penalty is an hyperparameter that needs to be tuned. To explore multiple penalty solutions, use PlnNetworkCollection
:
from pyPLNmodels import PlnNetworkCollection
= PlnNetworkCollection(endog, penalties=[0.1,10,1000]).fit() collection
Setting the offsets to zero.
Adjusting 3 PlnNetwork models.
Fitting a PlnNetwork model with penalty 0.1.
Intializing parameters ...
Initialization finished.
Maximum number of iterations (400) reached in 3.5 seconds.
Last criterion = 5.4e-06 . Required tolerance = 1e-06
Fitting a PlnNetwork model with penalty 10.
Intializing parameters ...
Initialization finished.
Maximum number of iterations (400) reached in 3.6 seconds.
Last criterion = 5.4e-06 . Required tolerance = 1e-06
Fitting a PlnNetwork model with penalty 1000.
Intializing parameters ...
Initialization finished.
Maximum number of iterations (400) reached in 3.5 seconds.
Last criterion = 5.4e-06 . Required tolerance = 1e-06
======================================================================
DONE!
Best model (lower BIC): penalty 0.1
Best model(lower AIC): penalty 0.1
======================================================================
Upper bound on the fitting time: 0%| | 0/400 [00:00<?, ?it/s]Upper bound on the fitting time: 3%|▎ | 13/400 [00:00<00:03, 127.94it/s]Upper bound on the fitting time: 6%|▋ | 26/400 [00:00<00:03, 117.24it/s]Upper bound on the fitting time: 10%|▉ | 38/400 [00:00<00:03, 116.78it/s]Upper bound on the fitting time: 12%|█▎ | 50/400 [00:00<00:02, 117.88it/s]Upper bound on the fitting time: 16%|█▌ | 62/400 [00:00<00:02, 118.51it/s]Upper bound on the fitting time: 19%|█▉ | 75/400 [00:00<00:02, 119.49it/s]Upper bound on the fitting time: 22%|██▏ | 88/400 [00:00<00:02, 120.32it/s]Upper bound on the fitting time: 25%|██▌ | 101/400 [00:00<00:02, 118.44it/s]Upper bound on the fitting time: 28%|██▊ | 113/400 [00:00<00:02, 118.78it/s]Upper bound on the fitting time: 32%|███▏ | 126/400 [00:01<00:02, 118.69it/s]Upper bound on the fitting time: 35%|███▍ | 139/400 [00:01<00:02, 119.98it/s]Upper bound on the fitting time: 38%|███▊ | 152/400 [00:01<00:02, 119.94it/s]Upper bound on the fitting time: 41%|████ | 164/400 [00:01<00:02, 117.19it/s]Upper bound on the fitting time: 44%|████▍ | 176/400 [00:01<00:01, 117.39it/s]Upper bound on the fitting time: 47%|████▋ | 188/400 [00:01<00:01, 118.01it/s]Upper bound on the fitting time: 50%|█████ | 200/400 [00:01<00:01, 117.87it/s]Upper bound on the fitting time: 53%|█████▎ | 212/400 [00:01<00:01, 118.08it/s]Upper bound on the fitting time: 56%|█████▋ | 225/400 [00:01<00:01, 119.19it/s]Upper bound on the fitting time: 59%|█████▉ | 237/400 [00:02<00:01, 116.58it/s]Upper bound on the fitting time: 62%|██████▏ | 249/400 [00:02<00:01, 116.48it/s]Upper bound on the fitting time: 65%|██████▌ | 261/400 [00:02<00:01, 116.88it/s]Upper bound on the fitting time: 68%|██████▊ | 273/400 [00:02<00:01, 116.25it/s]Upper bound on the fitting time: 71%|███████▏ | 285/400 [00:02<00:00, 117.01it/s]Upper bound on the fitting time: 74%|███████▍ | 297/400 [00:02<00:00, 117.22it/s]Upper bound on the fitting time: 77%|███████▋ | 309/400 [00:02<00:00, 115.31it/s]Upper bound on the fitting time: 80%|████████ | 322/400 [00:02<00:00, 116.97it/s]Upper bound on the fitting time: 84%|████████▎ | 334/400 [00:02<00:00, 117.59it/s]Upper bound on the fitting time: 86%|████████▋ | 346/400 [00:02<00:00, 118.17it/s]Upper bound on the fitting time: 90%|████████▉ | 358/400 [00:03<00:00, 117.88it/s]Upper bound on the fitting time: 92%|█████████▎| 370/400 [00:03<00:00, 118.12it/s]Upper bound on the fitting time: 96%|█████████▌| 382/400 [00:03<00:00, 113.44it/s]Upper bound on the fitting time: 98%|█████████▊| 394/400 [00:03<00:00, 100.98it/s]Upper bound on the fitting time: 100%|██████████| 400/400 [00:03<00:00, 115.18it/s]
Upper bound on the fitting time: 0%| | 0/400 [00:00<?, ?it/s]Upper bound on the fitting time: 2%|▏ | 8/400 [00:00<00:05, 74.48it/s]Upper bound on the fitting time: 4%|▍ | 16/400 [00:00<00:05, 72.80it/s]Upper bound on the fitting time: 6%|▌ | 24/400 [00:00<00:04, 75.31it/s]Upper bound on the fitting time: 8%|▊ | 34/400 [00:00<00:04, 82.78it/s]Upper bound on the fitting time: 11%|█ | 44/400 [00:00<00:04, 88.45it/s]Upper bound on the fitting time: 14%|█▍ | 55/400 [00:00<00:03, 94.60it/s]Upper bound on the fitting time: 17%|█▋ | 67/400 [00:00<00:03, 102.62it/s]Upper bound on the fitting time: 20%|█▉ | 79/400 [00:00<00:02, 107.20it/s]Upper bound on the fitting time: 23%|██▎ | 91/400 [00:00<00:02, 108.80it/s]Upper bound on the fitting time: 26%|██▌ | 103/400 [00:01<00:02, 110.81it/s]Upper bound on the fitting time: 29%|██▉ | 116/400 [00:01<00:02, 114.16it/s]Upper bound on the fitting time: 32%|███▏ | 129/400 [00:01<00:02, 116.13it/s]Upper bound on the fitting time: 36%|███▌ | 142/400 [00:01<00:02, 118.19it/s]Upper bound on the fitting time: 39%|███▉ | 155/400 [00:01<00:02, 118.90it/s]Upper bound on the fitting time: 42%|████▏ | 167/400 [00:01<00:01, 117.54it/s]Upper bound on the fitting time: 45%|████▍ | 179/400 [00:01<00:01, 117.67it/s]Upper bound on the fitting time: 48%|████▊ | 192/400 [00:01<00:01, 118.37it/s]Upper bound on the fitting time: 51%|█████ | 204/400 [00:01<00:01, 116.86it/s]Upper bound on the fitting time: 54%|█████▍ | 216/400 [00:01<00:01, 117.02it/s]Upper bound on the fitting time: 57%|█████▋ | 228/400 [00:02<00:01, 117.00it/s]Upper bound on the fitting time: 60%|██████ | 240/400 [00:02<00:01, 117.58it/s]Upper bound on the fitting time: 63%|██████▎ | 253/400 [00:02<00:01, 118.36it/s]Upper bound on the fitting time: 66%|██████▋ | 266/400 [00:02<00:01, 119.48it/s]Upper bound on the fitting time: 70%|██████▉ | 278/400 [00:02<00:01, 119.60it/s]Upper bound on the fitting time: 72%|███████▎ | 290/400 [00:02<00:00, 119.62it/s]Upper bound on the fitting time: 76%|███████▌ | 302/400 [00:02<00:00, 117.82it/s]Upper bound on the fitting time: 78%|███████▊ | 314/400 [00:02<00:00, 117.69it/s]Upper bound on the fitting time: 82%|████████▏ | 326/400 [00:02<00:00, 118.16it/s]Upper bound on the fitting time: 85%|████████▍ | 339/400 [00:03<00:00, 119.42it/s]Upper bound on the fitting time: 88%|████████▊ | 351/400 [00:03<00:00, 113.86it/s]Upper bound on the fitting time: 91%|█████████ | 363/400 [00:03<00:00, 115.07it/s]Upper bound on the fitting time: 94%|█████████▍| 375/400 [00:03<00:00, 114.35it/s]Upper bound on the fitting time: 97%|█████████▋| 387/400 [00:03<00:00, 115.63it/s]Upper bound on the fitting time: 100%|█████████▉| 399/400 [00:03<00:00, 116.86it/s]Upper bound on the fitting time: 100%|██████████| 400/400 [00:03<00:00, 112.12it/s]
Upper bound on the fitting time: 0%| | 0/400 [00:00<?, ?it/s]Upper bound on the fitting time: 3%|▎ | 12/400 [00:00<00:03, 118.97it/s]Upper bound on the fitting time: 6%|▌ | 24/400 [00:00<00:03, 102.87it/s]Upper bound on the fitting time: 9%|▉ | 35/400 [00:00<00:03, 104.60it/s]Upper bound on the fitting time: 12%|█▏ | 47/400 [00:00<00:03, 108.85it/s]Upper bound on the fitting time: 15%|█▍ | 59/400 [00:00<00:03, 111.82it/s]Upper bound on the fitting time: 18%|█▊ | 71/400 [00:00<00:02, 114.00it/s]Upper bound on the fitting time: 21%|██ | 83/400 [00:00<00:02, 114.80it/s]Upper bound on the fitting time: 24%|██▍ | 95/400 [00:00<00:02, 115.25it/s]Upper bound on the fitting time: 27%|██▋ | 107/400 [00:00<00:02, 112.24it/s]Upper bound on the fitting time: 30%|██▉ | 119/400 [00:01<00:02, 112.11it/s]Upper bound on the fitting time: 33%|███▎ | 131/400 [00:01<00:02, 113.37it/s]Upper bound on the fitting time: 36%|███▌ | 143/400 [00:01<00:02, 113.19it/s]Upper bound on the fitting time: 39%|███▉ | 155/400 [00:01<00:02, 114.86it/s]Upper bound on the fitting time: 42%|████▏ | 168/400 [00:01<00:01, 117.22it/s]Upper bound on the fitting time: 45%|████▌ | 180/400 [00:01<00:01, 116.24it/s]Upper bound on the fitting time: 48%|████▊ | 192/400 [00:01<00:01, 116.81it/s]Upper bound on the fitting time: 51%|█████▏ | 205/400 [00:01<00:01, 117.82it/s]Upper bound on the fitting time: 55%|█████▍ | 218/400 [00:01<00:01, 119.37it/s]Upper bound on the fitting time: 57%|█████▊ | 230/400 [00:02<00:01, 119.36it/s]Upper bound on the fitting time: 60%|██████ | 242/400 [00:02<00:01, 118.99it/s]Upper bound on the fitting time: 64%|██████▎ | 254/400 [00:02<00:01, 116.78it/s]Upper bound on the fitting time: 66%|██████▋ | 266/400 [00:02<00:01, 117.40it/s]Upper bound on the fitting time: 70%|██████▉ | 278/400 [00:02<00:01, 117.37it/s]Upper bound on the fitting time: 72%|███████▎ | 290/400 [00:02<00:00, 117.41it/s]Upper bound on the fitting time: 76%|███████▌ | 302/400 [00:02<00:00, 117.88it/s]Upper bound on the fitting time: 78%|███████▊ | 314/400 [00:02<00:00, 116.76it/s]Upper bound on the fitting time: 82%|████████▏ | 326/400 [00:02<00:00, 116.89it/s]Upper bound on the fitting time: 84%|████████▍ | 338/400 [00:02<00:00, 117.80it/s]Upper bound on the fitting time: 88%|████████▊ | 350/400 [00:03<00:00, 116.88it/s]Upper bound on the fitting time: 90%|█████████ | 362/400 [00:03<00:00, 114.64it/s]Upper bound on the fitting time: 94%|█████████▎| 374/400 [00:03<00:00, 115.93it/s]Upper bound on the fitting time: 96%|█████████▋| 386/400 [00:03<00:00, 115.17it/s]Upper bound on the fitting time: 100%|█████████▉| 398/400 [00:03<00:00, 115.45it/s]Upper bound on the fitting time: 100%|██████████| 400/400 [00:03<00:00, 115.29it/s]
=(8, 8)) collection.show(figsize
This displays the BIC, AIC and log-likelihood criteria for each model, helping to identify the most suitable one. Additionally, the number of links (i.e., non-zero elements in the precision matrix) is provided. You may consider using the ELBOW method to select the optimal model based on the number of links.
The best model can be chosen according to the BIC or AIC:
= collection.best_model("BIC") # Could be also AIC.
best_net print(best_net)
A multivariate PlnNetwork with penalty 0.1.
======================================================================
Loglike Dimension Nb param BIC AIC ICL Nb edges
-22940.33 20 229 23626.3528 23169.3301 10467.28 189
======================================================================
* Useful attributes
.latent_variables .latent_positions .coef .covariance .precision .model_parameters .latent_parameters .optim_details
* Useful methods
.transform() .show() .predict() .sigma() .projected_latent_variables() .plot_correlation_circle() .biplot() .viz() .pca_pairplot() .plot_expected_vs_true()
* Additional attributes for PlnNetwork are:
.network .nb_zeros_precision
* Additional methods for PlnNetwork are:
.viz_network()
One can also access each individual model in the collection, using the penalty as a key:
print(collection[10])
A multivariate PlnNetwork with penalty 10.
======================================================================
Loglike Dimension Nb param BIC AIC ICL Nb edges
-23560.48 20 174 24081.7398 23734.4824 10591.98 134
======================================================================
* Useful attributes
.latent_variables .latent_positions .coef .covariance .precision .model_parameters .latent_parameters .optim_details
* Useful methods
.transform() .show() .predict() .sigma() .projected_latent_variables() .plot_correlation_circle() .biplot() .viz() .pca_pairplot() .plot_expected_vs_true()
* Additional attributes for PlnNetwork are:
.network .nb_zeros_precision
* Additional methods for PlnNetwork are:
.viz_network()
and loop through the collection:
for net in collection.values():
print(net)