from pyPLNmodels import load_crossover
data = load_crossover(chromosome_numbers=[1, 5, 8])
print('Data: ', data.keys())Returning crossover dataset of size (474, 4)
Data: dict_keys(['endog', 'chrom', 'chrom_1hot', 'offsets', 'location'])
The Poisson lognormal (PLN) model can be extended to capture temporal (or similarly, 1D spatial dependencies) by structuring the latent space. This leads to the PlnAR (PLN autoregressive, documentation) model, where an autocorrelation matrix \(\Phi\) encodes how latent variables evolve across time (or space).
Depending on the desired complexity, several types of autocorrelation structures for \(\Phi\) can be used:
full: Each variable depends on all others (\(\Phi\) is a full matrix).diagonal: Each variable follows its own independent temporal dynamics (\(\Phi\) is diagonal).spherical: All variables share a common temporal pattern (\(\Phi\) is a scalar multiple of the identity).The mathematical model is given by:
\[ \begin{align} Z_{i} & \sim \mathcal{N}(X_i^{\top} B, \Sigma)\\ Z_{i}|Z_{i-1} & \sim \Phi Z_{i-1} + \mathcal{N}(\mu^{\epsilon}_i, \Sigma^{\epsilon}) \\ Y_{ij} \mid Z_{ij} &\sim \mathcal{P}(\exp(o_{ij} + Z_{ij})), \end{align} \]
with:
The matrix \(\Phi\) is constrained such that \(\Sigma^{\epsilon}\) remains positive definite.
⚠️ Note: Setting \(\Phi\) to diagonal enforces a diagonal covariance structure for the correlation matrix \(\Sigma\).
The PlnAR package provides tools to analyze dependencies between consecutive time points or spatial sites by estimating the autoregression matrix \(\Phi\).
We illustrate the package using the crossover dataset (Petit et al. (2017)), which contains recombination patterns in sheep over the genome. This dataset captures male meiotic recombination maps for the Lacaune breed, combining historical data from Lacaune and Soay sheep.
from pyPLNmodels import load_crossover
data = load_crossover(chromosome_numbers=[1, 5, 8])
print('Data: ', data.keys())Returning crossover dataset of size (474, 4)
Data: dict_keys(['endog', 'chrom', 'chrom_1hot', 'offsets', 'location'])
print(data["endog"].head()) nco_Lacaune_M nco_Lacaune_F nco_Soay_F nco_Soay_M
detailed_location
chrom1Loc:0-1 97 0 9 52
chrom1Loc:1-2 279 2 15 156
chrom1Loc:2-3 205 6 21 112
chrom1Loc:3-4 249 2 31 124
chrom1Loc:4-5 446 11 69 180
The dataset also includes chromosome identifiers:
print(data["chrom"].unique())['1' '5' '8']
We start by fitting a model with a diagonal autoregressive structure:
from pyPLNmodels import PlnAR
ar_diag = PlnAR.from_formula("endog ~ 1", data=data, ar_type="diagonal").fit()Taking the offsets from the data given.
Fitting a PlnAR model with autoregressive type diagonal.
Intializing parameters ...
Initialization finished.
Upper bound on the fitting time: 0%| | 0/400 [00:00<?, ?it/s]Upper bound on the fitting time: 18%|█▊ | 72/400 [00:00<00:00, 718.35it/s]Upper bound on the fitting time: 37%|███▋ | 149/400 [00:00<00:00, 743.45it/s]Upper bound on the fitting time: 56%|█████▌ | 224/400 [00:00<00:00, 744.79it/s]Upper bound on the fitting time: 75%|███████▍ | 299/400 [00:00<00:00, 739.19it/s]Upper bound on the fitting time: 94%|█████████▎| 374/400 [00:00<00:00, 742.32it/s]Upper bound on the fitting time: 100%|██████████| 400/400 [00:00<00:00, 740.01it/s]
Maximum number of iterations (400) reached in 1.4 seconds.
Last criterion = 5.34e-06 . Required tolerance = 1e-06
For more information on how to include covariates and offsets, see the dedicated tutorial.
⚠️ Note: A diagonal \(\Phi\) implies a diagonal covariance structure \(\Sigma\).
ar_diag.show()The output indicates, for instance, that the variable nco_Soay_F depends approximately 80% on its past values (see the graph on the bottom left).
To explore latent dynamics, extract the latent variables \(Z\):
z = ar_diag.latent_variablesVisualize the temporal structure via the viz_dims method:
ar_diag.viz_dims(
column_names=["nco_Lacaune_M", "nco_Lacaune_F", "nco_Soay_F", "nco_Soay_M"],
colors=data["chrom"]
)Column names of the endogenous variables can be accessed via the column_names_endog attribute.
Other structures for \(\Phi\) include the full and spherical autoregression.
Each variable depends on all others (default behavior):
ar_full = PlnAR.from_formula("endog ~ 1", data=data, ar_type="full").fit()
ar_full.show()Taking the offsets from the data given.
Fitting a PlnAR model with autoregressive type full.
Intializing parameters ...
Initialization finished.
Upper bound on the fitting time: 0%| | 0/400 [00:00<?, ?it/s]Upper bound on the fitting time: 12%|█▏ | 49/400 [00:00<00:00, 487.55it/s]Upper bound on the fitting time: 24%|██▍ | 98/400 [00:00<00:00, 481.37it/s]Upper bound on the fitting time: 37%|███▋ | 147/400 [00:00<00:00, 475.81it/s]Upper bound on the fitting time: 49%|████▉ | 195/400 [00:00<00:00, 473.89it/s]Upper bound on the fitting time: 61%|██████ | 243/400 [00:00<00:00, 469.13it/s]Upper bound on the fitting time: 72%|███████▎ | 290/400 [00:00<00:00, 463.60it/s]Upper bound on the fitting time: 84%|████████▍ | 337/400 [00:00<00:00, 462.21it/s]Upper bound on the fitting time: 96%|█████████▋| 386/400 [00:00<00:00, 469.28it/s]Upper bound on the fitting time: 100%|██████████| 400/400 [00:00<00:00, 469.89it/s]
Maximum number of iterations (400) reached in 0.9 seconds.
Last criterion = 5.34e-06 . Required tolerance = 1e-06
All variables share a common temporal trend:
ar_spher = PlnAR.from_formula("endog ~ 1", data=data, ar_type="spherical").fit()
ar_spher.show()Taking the offsets from the data given.
Fitting a PlnAR model with autoregressive type spherical.
Intializing parameters ...
Initialization finished.
Upper bound on the fitting time: 0%| | 0/400 [00:00<?, ?it/s]Upper bound on the fitting time: 14%|█▎ | 54/400 [00:00<00:00, 534.81it/s]Upper bound on the fitting time: 27%|██▋ | 108/400 [00:00<00:00, 513.65it/s]Upper bound on the fitting time: 40%|████ | 160/400 [00:00<00:00, 506.06it/s]Upper bound on the fitting time: 53%|█████▎ | 213/400 [00:00<00:00, 515.20it/s]Upper bound on the fitting time: 67%|██████▋ | 269/400 [00:00<00:00, 529.22it/s]Upper bound on the fitting time: 81%|████████▏ | 325/400 [00:00<00:00, 538.58it/s]Upper bound on the fitting time: 95%|█████████▌| 380/400 [00:00<00:00, 541.21it/s]Upper bound on the fitting time: 100%|██████████| 400/400 [00:00<00:00, 531.33it/s]
Maximum number of iterations (400) reached in 0.8 seconds.
Last criterion = 5.34e-06 . Required tolerance = 1e-06
One can check that the spherical, diagonal and autoregressive models gives similar outcomes.
Access the estimated autoregression parameters:
print("Spherical autoregressive parameter:", ar_spher.ar_coef.shape)
print("Diagonal autoregressive parameter:", ar_diag.ar_coef.shape)
print("Full autoregressive parameter:", ar_full.ar_coef.shape)Spherical autoregressive parameter: torch.Size([1])
Diagonal autoregressive parameter: torch.Size([4])
Full autoregressive parameter: torch.Size([4, 4])
This concludes the tutorial to temporal modeling with the PlnAR model.