from pyPLNmodels import load_crossover
= load_crossover(chromosome_numbers=[1, 5, 8])
data print('Data: ', data.keys())
Returning crossover dataset of size (474, 4)
Data: dict_keys(['endog', 'chrom', 'chrom_1hot', 'offsets', 'location'])
The Poisson lognormal (PLN) model can be extended to capture temporal (or similarly, 1D spatial dependencies) by structuring the latent space. This leads to the PlnAR
(PLN autoregressive, documentation) model, where an autocorrelation matrix \(\Phi\) encodes how latent variables evolve across time (or space).
Depending on the desired complexity, several types of autocorrelation structures for \(\Phi\) can be used:
full
: Each variable depends on all others (\(\Phi\) is a full matrix).diagonal
: Each variable follows its own independent temporal dynamics (\(\Phi\) is diagonal).spherical
: All variables share a common temporal pattern (\(\Phi\) is a scalar multiple of the identity).The mathematical model is given by:
\[ \begin{align} Z_{i} & \sim \mathcal{N}(X_i^{\top} B, \Sigma)\\ Z_{i}|Z_{i-1} & \sim \Phi Z_{i-1} + \mathcal{N}(\mu^{\epsilon}_i, \Sigma^{\epsilon}) \\ Y_{ij} \mid Z_{ij} &\sim \mathcal{P}(\exp(o_{ij} + Z_{ij})), \end{align} \]
with:
The matrix \(\Phi\) is constrained such that \(\Sigma^{\epsilon}\) remains positive definite.
⚠️ Note: Setting \(\Phi\) to diagonal
enforces a diagonal covariance structure for the correlation matrix \(\Sigma\).
The PlnAR
package provides tools to analyze dependencies between consecutive time points or spatial sites by estimating the autoregression matrix \(\Phi\).
We illustrate the package using the crossover dataset (Petit et al. (2017)), which contains recombination patterns in sheep over the genome. This dataset captures male meiotic recombination maps for the Lacaune breed, combining historical data from Lacaune and Soay sheep.
from pyPLNmodels import load_crossover
= load_crossover(chromosome_numbers=[1, 5, 8])
data print('Data: ', data.keys())
Returning crossover dataset of size (474, 4)
Data: dict_keys(['endog', 'chrom', 'chrom_1hot', 'offsets', 'location'])
print(data["endog"].head())
nco_Lacaune_M nco_Lacaune_F nco_Soay_F nco_Soay_M
detailed_location
chrom1Loc:0-1 97 0 9 52
chrom1Loc:1-2 279 2 15 156
chrom1Loc:2-3 205 6 21 112
chrom1Loc:3-4 249 2 31 124
chrom1Loc:4-5 446 11 69 180
The dataset also includes chromosome identifiers:
print(data["chrom"].unique())
['1' '5' '8']
We start by fitting a model with a diagonal autoregressive structure:
from pyPLNmodels import PlnAR
= PlnAR.from_formula("endog ~ 1", data=data, ar_type="diagonal").fit() ar_diag
Taking the offsets from the data given.
Fitting a PlnAR model with autoregressive type diagonal.
Intializing parameters ...
Initialization finished.
Maximum number of iterations (400) reached in 6.4 seconds.
Last criterion = 5.34e-06 . Required tolerance = 1e-06
Upper bound on the fitting time: 0%| | 0/400 [00:00<?, ?it/s]Upper bound on the fitting time: 2%|▎ | 10/400 [00:00<00:03, 99.97it/s]Upper bound on the fitting time: 5%|▌ | 21/400 [00:00<00:03, 101.49it/s]Upper bound on the fitting time: 8%|▊ | 33/400 [00:00<00:03, 105.63it/s]Upper bound on the fitting time: 11%|█ | 44/400 [00:00<00:03, 97.38it/s] Upper bound on the fitting time: 14%|█▎ | 54/400 [00:00<00:04, 82.69it/s]Upper bound on the fitting time: 16%|█▌ | 63/400 [00:00<00:04, 77.74it/s]Upper bound on the fitting time: 18%|█▊ | 71/400 [00:00<00:04, 74.41it/s]Upper bound on the fitting time: 20%|██ | 80/400 [00:00<00:04, 76.46it/s]Upper bound on the fitting time: 22%|██▏ | 89/400 [00:01<00:03, 78.56it/s]Upper bound on the fitting time: 24%|██▍ | 98/400 [00:01<00:03, 79.15it/s]Upper bound on the fitting time: 26%|██▋ | 106/400 [00:01<00:03, 76.30it/s]Upper bound on the fitting time: 29%|██▉ | 116/400 [00:01<00:03, 81.55it/s]Upper bound on the fitting time: 32%|███▏ | 127/400 [00:01<00:03, 88.07it/s]Upper bound on the fitting time: 34%|███▍ | 138/400 [00:01<00:02, 93.51it/s]Upper bound on the fitting time: 37%|███▋ | 149/400 [00:01<00:02, 98.02it/s]Upper bound on the fitting time: 40%|████ | 161/400 [00:01<00:02, 102.05it/s]Upper bound on the fitting time: 43%|████▎ | 172/400 [00:01<00:02, 104.19it/s]Upper bound on the fitting time: 46%|████▌ | 183/400 [00:02<00:02, 104.74it/s]Upper bound on the fitting time: 48%|████▊ | 194/400 [00:02<00:02, 100.59it/s]Upper bound on the fitting time: 51%|█████▏ | 205/400 [00:02<00:01, 102.95it/s]Upper bound on the fitting time: 54%|█████▍ | 216/400 [00:02<00:01, 104.87it/s]Upper bound on the fitting time: 57%|█████▋ | 227/400 [00:02<00:01, 105.23it/s]Upper bound on the fitting time: 60%|█████▉ | 238/400 [00:02<00:01, 105.95it/s]Upper bound on the fitting time: 62%|██████▏ | 249/400 [00:02<00:01, 105.31it/s]Upper bound on the fitting time: 65%|██████▌ | 260/400 [00:02<00:01, 106.07it/s]Upper bound on the fitting time: 68%|██████▊ | 271/400 [00:02<00:01, 107.13it/s]Upper bound on the fitting time: 70%|███████ | 282/400 [00:02<00:01, 107.97it/s]Upper bound on the fitting time: 73%|███████▎ | 293/400 [00:03<00:00, 108.04it/s]Upper bound on the fitting time: 76%|███████▌ | 304/400 [00:03<00:00, 108.30it/s]Upper bound on the fitting time: 79%|███████▉ | 315/400 [00:03<00:00, 106.89it/s]Upper bound on the fitting time: 82%|████████▏ | 326/400 [00:03<00:00, 107.19it/s]Upper bound on the fitting time: 84%|████████▍ | 337/400 [00:03<00:00, 107.65it/s]Upper bound on the fitting time: 87%|████████▋ | 349/400 [00:03<00:00, 108.91it/s]Upper bound on the fitting time: 90%|█████████ | 360/400 [00:03<00:00, 108.89it/s]Upper bound on the fitting time: 93%|█████████▎| 371/400 [00:03<00:00, 108.59it/s]Upper bound on the fitting time: 96%|█████████▌| 382/400 [00:03<00:00, 108.37it/s]Upper bound on the fitting time: 98%|█████████▊| 393/400 [00:03<00:00, 107.16it/s]Upper bound on the fitting time: 100%|██████████| 400/400 [00:04<00:00, 98.65it/s]
For more information on how to include covariates and offsets, see the dedicated tutorial.
⚠️ Note: A diagonal \(\Phi\) implies a diagonal covariance structure \(\Sigma\).
ar_diag.show()
The output indicates, for instance, that the variable nco_Soay_F
depends approximately 80% on its past values (see the graph on the bottom left).
To explore latent dynamics, extract the latent variables \(Z\):
= ar_diag.latent_variables z
Visualize the temporal structure via the viz_dims
method:
ar_diag.viz_dims(=["nco_Lacaune_M", "nco_Lacaune_F", "nco_Soay_F", "nco_Soay_M"],
column_names=data["chrom"]
colors )
Column names of the endogenous variables can be accessed via the column_names_endog
attribute.
Other structures for \(\Phi\) include the full
and spherical
autoregression.
Each variable depends on all others (default behavior):
= PlnAR.from_formula("endog ~ 1", data=data, ar_type="full").fit()
ar_full ar_full.show()
Taking the offsets from the data given.
Fitting a PlnAR model with autoregressive type full.
Intializing parameters ...
Initialization finished.
Maximum number of iterations (400) reached in 5.2 seconds.
Last criterion = 5.34e-06 . Required tolerance = 1e-06
Upper bound on the fitting time: 0%| | 0/400 [00:00<?, ?it/s]Upper bound on the fitting time: 2%|▏ | 9/400 [00:00<00:04, 81.34it/s]Upper bound on the fitting time: 4%|▍ | 18/400 [00:00<00:04, 81.24it/s]Upper bound on the fitting time: 7%|▋ | 27/400 [00:00<00:04, 81.79it/s]Upper bound on the fitting time: 9%|▉ | 36/400 [00:00<00:04, 82.34it/s]Upper bound on the fitting time: 11%|█▏ | 45/400 [00:00<00:04, 82.43it/s]Upper bound on the fitting time: 14%|█▎ | 54/400 [00:00<00:04, 82.34it/s]Upper bound on the fitting time: 16%|█▌ | 63/400 [00:00<00:04, 82.60it/s]Upper bound on the fitting time: 18%|█▊ | 72/400 [00:00<00:03, 82.11it/s]Upper bound on the fitting time: 20%|██ | 81/400 [00:00<00:03, 82.37it/s]Upper bound on the fitting time: 22%|██▎ | 90/400 [00:01<00:03, 82.70it/s]Upper bound on the fitting time: 25%|██▍ | 99/400 [00:01<00:03, 82.46it/s]Upper bound on the fitting time: 27%|██▋ | 108/400 [00:01<00:03, 82.78it/s]Upper bound on the fitting time: 29%|██▉ | 117/400 [00:01<00:03, 83.17it/s]Upper bound on the fitting time: 32%|███▏ | 126/400 [00:01<00:03, 82.32it/s]Upper bound on the fitting time: 34%|███▍ | 135/400 [00:01<00:03, 82.34it/s]Upper bound on the fitting time: 36%|███▌ | 144/400 [00:01<00:03, 82.13it/s]Upper bound on the fitting time: 38%|███▊ | 153/400 [00:01<00:02, 82.85it/s]Upper bound on the fitting time: 40%|████ | 162/400 [00:02<00:03, 74.17it/s]Upper bound on the fitting time: 42%|████▎ | 170/400 [00:02<00:03, 71.75it/s]Upper bound on the fitting time: 44%|████▍ | 178/400 [00:02<00:03, 70.51it/s]Upper bound on the fitting time: 46%|████▋ | 186/400 [00:02<00:03, 70.23it/s]Upper bound on the fitting time: 48%|████▊ | 194/400 [00:02<00:02, 69.68it/s]Upper bound on the fitting time: 50%|█████ | 202/400 [00:02<00:02, 69.62it/s]Upper bound on the fitting time: 52%|█████▏ | 209/400 [00:02<00:02, 69.04it/s]Upper bound on the fitting time: 55%|█████▍ | 218/400 [00:02<00:02, 73.26it/s]Upper bound on the fitting time: 57%|█████▋ | 227/400 [00:02<00:02, 76.65it/s]Upper bound on the fitting time: 59%|█████▉ | 236/400 [00:03<00:02, 78.28it/s]Upper bound on the fitting time: 61%|██████▏ | 245/400 [00:03<00:01, 79.66it/s]Upper bound on the fitting time: 63%|██████▎ | 253/400 [00:03<00:01, 79.06it/s]Upper bound on the fitting time: 65%|██████▌ | 261/400 [00:03<00:01, 78.85it/s]Upper bound on the fitting time: 68%|██████▊ | 270/400 [00:03<00:01, 80.49it/s]Upper bound on the fitting time: 70%|██████▉ | 279/400 [00:03<00:01, 81.41it/s]Upper bound on the fitting time: 72%|███████▏ | 288/400 [00:03<00:01, 82.43it/s]Upper bound on the fitting time: 74%|███████▍ | 297/400 [00:03<00:01, 82.60it/s]Upper bound on the fitting time: 76%|███████▋ | 306/400 [00:03<00:01, 82.21it/s]Upper bound on the fitting time: 79%|███████▉ | 315/400 [00:03<00:01, 82.45it/s]Upper bound on the fitting time: 81%|████████ | 324/400 [00:04<00:00, 83.07it/s]Upper bound on the fitting time: 83%|████████▎ | 333/400 [00:04<00:00, 82.42it/s]Upper bound on the fitting time: 86%|████████▌ | 342/400 [00:04<00:00, 76.84it/s]Upper bound on the fitting time: 88%|████████▊ | 350/400 [00:04<00:00, 77.63it/s]Upper bound on the fitting time: 90%|████████▉ | 358/400 [00:04<00:00, 77.51it/s]Upper bound on the fitting time: 92%|█████████▏| 367/400 [00:04<00:00, 79.09it/s]Upper bound on the fitting time: 94%|█████████▍| 376/400 [00:04<00:00, 79.82it/s]Upper bound on the fitting time: 96%|█████████▋| 385/400 [00:04<00:00, 80.52it/s]Upper bound on the fitting time: 98%|█████████▊| 394/400 [00:04<00:00, 80.77it/s]Upper bound on the fitting time: 100%|██████████| 400/400 [00:05<00:00, 79.13it/s]
All variables share a common temporal trend:
= PlnAR.from_formula("endog ~ 1", data=data, ar_type="spherical").fit()
ar_spher ar_spher.show()
Taking the offsets from the data given.
Fitting a PlnAR model with autoregressive type spherical.
Intializing parameters ...
Initialization finished.
Maximum number of iterations (400) reached in 4.5 seconds.
Last criterion = 5.34e-06 . Required tolerance = 1e-06
Upper bound on the fitting time: 0%| | 0/400 [00:00<?, ?it/s]Upper bound on the fitting time: 2%|▎ | 10/400 [00:00<00:04, 92.67it/s]Upper bound on the fitting time: 5%|▌ | 20/400 [00:00<00:04, 91.26it/s]Upper bound on the fitting time: 8%|▊ | 30/400 [00:00<00:04, 90.85it/s]Upper bound on the fitting time: 10%|█ | 40/400 [00:00<00:03, 90.67it/s]Upper bound on the fitting time: 12%|█▎ | 50/400 [00:00<00:03, 92.08it/s]Upper bound on the fitting time: 15%|█▌ | 60/400 [00:00<00:03, 92.79it/s]Upper bound on the fitting time: 18%|█▊ | 70/400 [00:00<00:03, 94.15it/s]Upper bound on the fitting time: 20%|██ | 80/400 [00:00<00:03, 92.96it/s]Upper bound on the fitting time: 22%|██▎ | 90/400 [00:00<00:03, 92.52it/s]Upper bound on the fitting time: 25%|██▌ | 100/400 [00:01<00:03, 91.47it/s]Upper bound on the fitting time: 28%|██▊ | 110/400 [00:01<00:03, 92.12it/s]Upper bound on the fitting time: 30%|███ | 120/400 [00:01<00:03, 92.59it/s]Upper bound on the fitting time: 32%|███▎ | 130/400 [00:01<00:02, 93.17it/s]Upper bound on the fitting time: 35%|███▌ | 140/400 [00:01<00:02, 91.92it/s]Upper bound on the fitting time: 38%|███▊ | 150/400 [00:01<00:02, 92.79it/s]Upper bound on the fitting time: 40%|████ | 160/400 [00:01<00:02, 93.33it/s]Upper bound on the fitting time: 42%|████▎ | 170/400 [00:01<00:02, 86.26it/s]Upper bound on the fitting time: 45%|████▍ | 179/400 [00:01<00:02, 86.18it/s]Upper bound on the fitting time: 47%|████▋ | 188/400 [00:02<00:02, 87.18it/s]Upper bound on the fitting time: 50%|████▉ | 198/400 [00:02<00:02, 88.24it/s]Upper bound on the fitting time: 52%|█████▏ | 208/400 [00:02<00:02, 89.15it/s]Upper bound on the fitting time: 55%|█████▍ | 218/400 [00:02<00:02, 90.88it/s]Upper bound on the fitting time: 57%|█████▋ | 228/400 [00:02<00:01, 92.46it/s]Upper bound on the fitting time: 60%|█████▉ | 238/400 [00:02<00:01, 93.85it/s]Upper bound on the fitting time: 62%|██████▏ | 248/400 [00:02<00:01, 93.18it/s]Upper bound on the fitting time: 64%|██████▍ | 258/400 [00:02<00:01, 92.51it/s]Upper bound on the fitting time: 67%|██████▋ | 268/400 [00:02<00:01, 91.99it/s]Upper bound on the fitting time: 70%|██████▉ | 278/400 [00:03<00:01, 93.19it/s]Upper bound on the fitting time: 72%|███████▏ | 288/400 [00:03<00:01, 94.57it/s]Upper bound on the fitting time: 74%|███████▍ | 298/400 [00:03<00:01, 95.09it/s]Upper bound on the fitting time: 77%|███████▋ | 308/400 [00:03<00:00, 93.75it/s]Upper bound on the fitting time: 80%|███████▉ | 318/400 [00:03<00:00, 94.42it/s]Upper bound on the fitting time: 82%|████████▏ | 328/400 [00:03<00:00, 94.66it/s]Upper bound on the fitting time: 84%|████████▍ | 338/400 [00:03<00:00, 95.52it/s]Upper bound on the fitting time: 87%|████████▋ | 348/400 [00:03<00:00, 96.16it/s]Upper bound on the fitting time: 90%|████████▉ | 358/400 [00:03<00:00, 95.93it/s]Upper bound on the fitting time: 92%|█████████▏| 368/400 [00:03<00:00, 94.36it/s]Upper bound on the fitting time: 94%|█████████▍| 378/400 [00:04<00:00, 93.85it/s]Upper bound on the fitting time: 97%|█████████▋| 388/400 [00:04<00:00, 91.97it/s]Upper bound on the fitting time: 100%|█████████▉| 398/400 [00:04<00:00, 93.35it/s]Upper bound on the fitting time: 100%|██████████| 400/400 [00:04<00:00, 92.34it/s]
One can check that the spherical, diagonal and autoregressive models gives similar outcomes.
Access the estimated autoregression parameters:
print("Spherical autoregressive parameter:", ar_spher.ar_coef.shape)
print("Diagonal autoregressive parameter:", ar_diag.ar_coef.shape)
print("Full autoregressive parameter:", ar_full.ar_coef.shape)
Spherical autoregressive parameter: torch.Size([1])
Diagonal autoregressive parameter: torch.Size([4])
Full autoregressive parameter: torch.Size([4, 4])
This concludes the tutorial to temporal modeling with the PlnAR
model.