Time-series count data ⏱️

Published

March 27, 2025

1 Introduction

The Poisson lognormal (PLN) model can be extended to capture temporal (or similarly, 1D spatial dependencies) by structuring the latent space. This leads to the PlnAR (PLN autoregressive, documentation) model, where an autocorrelation matrix \(\Phi\) encodes how latent variables evolve across time (or space).

Depending on the desired complexity, several types of autocorrelation structures for \(\Phi\) can be used:

full: Each variable depends on all others (\(\Phi\) is a full matrix).
diagonal: Each variable follows its own independent temporal dynamics (\(\Phi\) is diagonal).
spherical: All variables share a common temporal pattern (\(\Phi\) is a scalar multiple of the identity).

1.1 Statistical background

The mathematical model is given by:

\[ \begin{align} Z_{i} & \sim \mathcal{N}(X_i^{\top} B, \Sigma)\\ Z_{i}|Z_{i-1} & \sim \Phi Z_{i-1} + \mathcal{N}(\mu^{\epsilon}_i, \Sigma^{\epsilon}) \\ Y_{ij} \mid Z_{ij} &\sim \mathcal{P}(\exp(o_{ij} + Z_{ij})), \end{align} \]

with:

\(\mu^{\epsilon}_i = \mu_i - \Phi \mu_{i-1}\)
\(\Sigma^{\epsilon} = \Sigma - \Phi \Sigma \Phi\)
\(\mu_i = X_i^{\top} B\)
\(\Phi \in \mathcal{S}^{p}_+\)

The matrix \(\Phi\) is constrained such that \(\Sigma^{\epsilon}\) remains positive definite.

⚠️ Note: Setting \(\Phi\) to diagonal enforces a diagonal covariance structure for the correlation matrix \(\Sigma\).

1.2 Purpose

The PlnAR package provides tools to analyze dependencies between consecutive time points or spatial sites by estimating the autoregression matrix \(\Phi\).

2 Data

We illustrate the package using the crossover dataset (Petit et al. (2017)), which contains recombination patterns in sheep over the genome. This dataset captures male meiotic recombination maps for the Lacaune breed, combining historical data from Lacaune and Soay sheep.

from pyPLNmodels import load_crossover
data = load_crossover(chromosome_numbers=[1, 5, 8])
print('Data: ', data.keys())

Returning crossover dataset of size (474, 4)
Data:  dict_keys(['endog', 'chrom', 'chrom_1hot', 'offsets', 'location'])

2.1 Endogenous variables

print(data["endog"].head())

                   nco_Lacaune_M  nco_Lacaune_F  nco_Soay_F  nco_Soay_M
detailed_location
chrom1Loc:0-1                 97              0           9          52
chrom1Loc:1-2                279              2          15         156
chrom1Loc:2-3                205              6          21         112
chrom1Loc:3-4                249              2          31         124
chrom1Loc:4-5                446             11          69         180

The dataset also includes chromosome identifiers:

print(data["chrom"].unique())

['1' '5' '8']

3 Model Initialization and Fitting

We start by fitting a model with a diagonal autoregressive structure:

from pyPLNmodels import PlnAR
ar_diag = PlnAR.from_formula("endog ~ 1", data=data, ar_type="diagonal").fit()

Taking the offsets from the data given.
Fitting a PlnAR model with autoregressive type diagonal.
Intializing parameters ...
Initialization finished.

Upper bound on the fitting time:   0%|          | 0/400 [00:00<?, ?it/s]Upper bound on the fitting time:  18%|█▊        | 72/400 [00:00<00:00, 718.35it/s]Upper bound on the fitting time:  37%|███▋      | 149/400 [00:00<00:00, 743.45it/s]Upper bound on the fitting time:  56%|█████▌    | 224/400 [00:00<00:00, 744.79it/s]Upper bound on the fitting time:  75%|███████▍  | 299/400 [00:00<00:00, 739.19it/s]Upper bound on the fitting time:  94%|█████████▎| 374/400 [00:00<00:00, 742.32it/s]Upper bound on the fitting time: 100%|██████████| 400/400 [00:00<00:00, 740.01it/s]

Maximum number of iterations (400)  reached in 1.4 seconds.
Last  criterion = 5.34e-06 . Required tolerance = 1e-06

For more information on how to include covariates and offsets, see the dedicated tutorial.

⚠️ Note: A diagonal \(\Phi\) implies a diagonal covariance structure \(\Sigma\).

4 Temporal structure

ar_diag.show()

The output indicates, for instance, that the variable nco_Soay_F depends approximately 80% on its past values (see the graph on the bottom left).

5 Latent variables and visualization

To explore latent dynamics, extract the latent variables \(Z\):

z = ar_diag.latent_variables

Visualize the temporal structure via the viz_dims method:

ar_diag.viz_dims(
    column_names=["nco_Lacaune_M", "nco_Lacaune_F", "nco_Soay_F", "nco_Soay_M"],
    colors=data["chrom"]
)

Column names of the endogenous variables can be accessed via the column_names_endog attribute.

6 Full and Spherical Autoregression

Other structures for \(\Phi\) include the full and spherical autoregression.

6.1 Full autoregression

Each variable depends on all others (default behavior):

ar_full = PlnAR.from_formula("endog ~ 1", data=data, ar_type="full").fit()
ar_full.show()

Taking the offsets from the data given.
Fitting a PlnAR model with autoregressive type full.
Intializing parameters ...
Initialization finished.

Upper bound on the fitting time:   0%|          | 0/400 [00:00<?, ?it/s]Upper bound on the fitting time:  12%|█▏        | 49/400 [00:00<00:00, 487.55it/s]Upper bound on the fitting time:  24%|██▍       | 98/400 [00:00<00:00, 481.37it/s]Upper bound on the fitting time:  37%|███▋      | 147/400 [00:00<00:00, 475.81it/s]Upper bound on the fitting time:  49%|████▉     | 195/400 [00:00<00:00, 473.89it/s]Upper bound on the fitting time:  61%|██████    | 243/400 [00:00<00:00, 469.13it/s]Upper bound on the fitting time:  72%|███████▎  | 290/400 [00:00<00:00, 463.60it/s]Upper bound on the fitting time:  84%|████████▍ | 337/400 [00:00<00:00, 462.21it/s]Upper bound on the fitting time:  96%|█████████▋| 386/400 [00:00<00:00, 469.28it/s]Upper bound on the fitting time: 100%|██████████| 400/400 [00:00<00:00, 469.89it/s]

Maximum number of iterations (400)  reached in 0.9 seconds.
Last  criterion = 5.34e-06 . Required tolerance = 1e-06

6.2 Spherical autoregression

All variables share a common temporal trend:

ar_spher = PlnAR.from_formula("endog ~ 1", data=data, ar_type="spherical").fit()
ar_spher.show()

Taking the offsets from the data given.
Fitting a PlnAR model with autoregressive type spherical.
Intializing parameters ...
Initialization finished.

Upper bound on the fitting time:   0%|          | 0/400 [00:00<?, ?it/s]Upper bound on the fitting time:  14%|█▎        | 54/400 [00:00<00:00, 534.81it/s]Upper bound on the fitting time:  27%|██▋       | 108/400 [00:00<00:00, 513.65it/s]Upper bound on the fitting time:  40%|████      | 160/400 [00:00<00:00, 506.06it/s]Upper bound on the fitting time:  53%|█████▎    | 213/400 [00:00<00:00, 515.20it/s]Upper bound on the fitting time:  67%|██████▋   | 269/400 [00:00<00:00, 529.22it/s]Upper bound on the fitting time:  81%|████████▏ | 325/400 [00:00<00:00, 538.58it/s]Upper bound on the fitting time:  95%|█████████▌| 380/400 [00:00<00:00, 541.21it/s]Upper bound on the fitting time: 100%|██████████| 400/400 [00:00<00:00, 531.33it/s]

Maximum number of iterations (400)  reached in 0.8 seconds.
Last  criterion = 5.34e-06 . Required tolerance = 1e-06

One can check that the spherical, diagonal and autoregressive models gives similar outcomes.

7 Autoregression Coefficients

Access the estimated autoregression parameters:

print("Spherical autoregressive parameter:", ar_spher.ar_coef.shape)
print("Diagonal autoregressive parameter:", ar_diag.ar_coef.shape)
print("Full autoregressive parameter:", ar_full.ar_coef.shape)

Spherical autoregressive parameter: torch.Size([1])
Diagonal autoregressive parameter: torch.Size([4])
Full autoregressive parameter: torch.Size([4, 4])

This concludes the tutorial to temporal modeling with the PlnAR model.

References

Batardière, Bastien, Joon Kwon, and Julien Chiquet. 2024. “pyPLNmodels: A Python Package to Analyze Multivariate High-Dimensional Count Data.” Journal of Open Source Software.

Petit, Morgane, Jean-Michel Astruc, Julien Sarry, Laurence Drouilhet, Stéphane Fabre, Carole Moreno, and Bertrand Servin. 2017. “Variation in Recombination Rate and Its Genetic Determinism in Sheep Populations.” Genetics 207 (2): 767–84. https://doi.org/10.1534/genetics.117.300123.