ZIPln
ZIPln Documentation
- class pyPLNmodels.ZIPln(endog, *, exog=None, exog_inflation=None, offsets=None, compute_offsets_method='zero', add_const=True, add_const_inflation=True)
Zero-Inflated Pln (ZIPln) class. Like a Pln but adds zero-inflation. Fitting such a model is slower than fitting a Pln. For more details, see Batardière, Chiquet, Gindraud, Mariadassou (2024) “Zero-inflation in the Multivariate Poisson Lognormal Family.”
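To make the idea of "a Pln with zero-inflation" concrete, here is a minimal simulation sketch using only the standard library. It assumes a simplified version of the model: independent dimensions (diagonal covariance), no covariates, no offsets, and a fixed zero-inflation probability per variable. The function names and parameter values are hypothetical and are not part of pyPLNmodels.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson sample with Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_zipln(n_samples, means, std_devs, zero_probs, seed=0):
    """Simulate counts: Gaussian latent -> Poisson count -> optional forced zero."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_samples):
        row = []
        for mean, std, zero_prob in zip(means, std_devs, zero_probs):
            latent = rng.gauss(mean, std)                  # Gaussian latent variable
            count = sample_poisson(math.exp(latent), rng)  # Poisson log-normal count
            if rng.random() < zero_prob:                   # Bernoulli zero-inflation
                count = 0
            row.append(count)
        counts.append(row)
    return counts


# Hypothetical settings: the second variable is heavily zero-inflated.
counts = simulate_zipln(200, means=[1.0, 0.5], std_devs=[0.5, 0.3], zero_probs=[0.1, 0.9])
zero_fraction = [sum(row[j] == 0 for row in counts) / len(counts) for j in range(2)]
```

A plain Pln has no mechanism for the extra zeros of the second variable, which is the situation the class targets.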
Examples
>>> from pyPLNmodels import ZIPln, Pln, load_microcosm
>>> data = load_microcosm()  # microcosm dataset is highly zero-inflated (96% of zeros)
>>> zi = ZIPln.from_formula("endog ~ 1 + site", data)
>>> zi.fit()
>>> zi.viz(colors=data["site"])
>>> # Here Pln is not appropriate:
>>> pln = Pln.from_formula("endog ~ 1 + site", data)
>>> pln.fit()
>>> pln.viz(colors=data["site"])
>>> # Different covariates can be given for the zero-inflation part:
>>> zi_diff = ZIPln.from_formula("endog ~ 1 + site | 1 + time", data)
>>> zi_diff.fit()
>>> zi_diff.viz(colors=data["site"])
>>> # Or take all the covariates:
>>> zi_all = ZIPln.from_formula("endog ~ 1 + site*time | 1 + site*time", data)
>>> zi_all.fit()
- Parameters:
endog (Tensor | ndarray | DataFrame | None)
exog (Tensor | ndarray | DataFrame | Series | None)
exog_inflation (Tensor | ndarray | DataFrame | Series | None)
offsets (Tensor | ndarray | DataFrame | None)
compute_offsets_method ({'logsum', 'zero'})
add_const (bool)
add_const_inflation (bool)
- __init__(endog, *, exog=None, exog_inflation=None, offsets=None, compute_offsets_method='zero', add_const=True, add_const_inflation=True)
Initializes the ZIPln class, which is a Pln model with zero-inflation.
- Parameters:
endog (Union[torch.Tensor, np.ndarray, pd.DataFrame]) – The count data.
exog (Union[torch.Tensor, np.ndarray, pd.DataFrame, pd.Series], optional(keyword-only)) – The covariate data. Defaults to None.
exog_inflation (Union[torch.Tensor, np.ndarray, pd.DataFrame], optional(keyword-only)) – The covariate data for the inflation part. Defaults to None.
offsets (Union[torch.Tensor, np.ndarray, pd.DataFrame], optional(keyword-only)) – The offsets data. Defaults to None.
compute_offsets_method (str, optional(keyword-only)) –
- Method to compute offsets if not provided. Options are:
"zero", which sets the offsets to zero.
"logsum", which takes the logarithm of the sum (per row) of the counts.
Ignored if offsets is not None. Defaults to "zero".
add_const (bool, optional(keyword-only)) – Whether to add a column of ones in the exog. Defaults to True.
add_const_inflation (bool, optional(keyword-only)) – Whether to add a column of ones in the exog_inflation. Defaults to True.
- Return type:
A ZIPln object
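The two values of compute_offsets_method can be illustrated with a small standard-library sketch. It assumes that one offset is computed per sample (row) and repeated across all columns; that layout, and the helper name compute_offsets, are assumptions made for illustration and not the library's code.

```python
import math


def compute_offsets(counts, method="zero"):
    """Return one offset per sample, repeated over all columns."""
    if method == "zero":
        return [[0.0] * len(row) for row in counts]
    if method == "logsum":
        # log of the total count of each row (sample); requires a positive row sum
        return [[math.log(sum(row))] * len(row) for row in counts]
    raise ValueError(f"Unknown method: {method!r}")


counts = [[1, 0, 3], [0, 10, 0]]
zero_offsets = compute_offsets(counts, "zero")
logsum_offsets = compute_offsets(counts, "logsum")
```

With "logsum", samples with a larger total count receive a larger offset, which acts as a sequencing-depth style normalization.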
- classmethod from_formula(formula, data, *, compute_offsets_method='zero')
Create an instance from a formula and data.
- Parameters:
formula (str) – The formula.
data (dict) – The data dictionary. Each value can be either a torch.Tensor, np.ndarray, pd.DataFrame or pd.Series. The categorical exogenous variables should be 1-dimensional.
compute_offsets_method (str, optional(keyword-only)) –
- Method to compute offsets if not provided. Options are:
"zero", which sets the offsets to zero.
"logsum", which takes the logarithm of the sum (per row) of the counts.
Ignored if data["offsets"] is not None. Defaults to "zero".
- Return type:
ZIPln
Examples
>>> from pyPLNmodels import ZIPln, load_microcosm
>>> data = load_microcosm()
>>> # same covariates for the zero inflation and the gaussian component
>>> zi_same = ZIPln.from_formula("endog ~ 1 + site", data=data)
>>> # different covariates
>>> zi_different = ZIPln.from_formula("endog ~ 1 + site | 1 + time", data=data)
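The role of the "|" separator in the formulas above can be sketched as follows. This is a hypothetical string-splitting helper written for illustration; it is not the parser used by pyPLNmodels and it does not build any design matrix.

```python
def split_formula(formula):
    """Split 'endog ~ rhs' or 'endog ~ rhs | rhs_inflation' into its parts."""
    response, _, right_hand_side = formula.partition("~")
    if not right_hand_side:
        raise ValueError("The formula must contain '~'.")
    count_part, separator, inflation_part = right_hand_side.partition("|")
    if not separator:
        # no '|': the same covariates serve both components
        inflation_part = count_part
    return response.strip(), count_part.strip(), inflation_part.strip()


same = split_formula("endog ~ 1 + site")
different = split_formula("endog ~ 1 + site | 1 + time")
```

The part before "|" describes the covariates of the Gaussian component, the part after it those of the zero-inflation component.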
- fit(*, maxiter=400, lr=0.01, tol=1e-06, verbose=False)
Fit the model using variational inference. A lower tol (tolerance) gives a more accurate fit, possibly at the cost of more iterations.
- Parameters:
maxiter (int, optional(keyword-only)) – The maximum number of iterations. Defaults to 400.
lr (float, optional(keyword-only)) – The learning rate. Defaults to 0.01.
tol (float, optional(keyword-only)) – The tolerance for convergence. Defaults to 1e-6.
verbose (bool, optional(keyword-only)) – Whether to print training progress. Defaults to False.
- Raises:
ValueError – If maxiter is not an int.
- Return type:
ZIPln object
Examples
>>> from pyPLNmodels import ZIPln, load_scrna
>>> data = load_scrna()
>>> zi = ZIPln.from_formula("endog ~ 1", data)
>>> zi.fit()
>>> print(zi)

>>> from pyPLNmodels import ZIPln, load_scrna
>>> data = load_scrna()
>>> zi = ZIPln.from_formula("endog ~ 1 | 1 + labels", data)
>>> zi.fit(maxiter=500, verbose=True)
>>> print(zi)
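How maxiter, lr and tol interact can be seen on a toy problem. The sketch below is a generic gradient-ascent loop on a one-dimensional concave function; the stopping rule (change of the objective below tol) is an assumption chosen for illustration, since the exact criterion used by fit is not described here.

```python
def gradient_ascent(objective, gradient, start, *, maxiter=400, lr=0.01, tol=1e-6):
    """Maximize `objective`; stop after `maxiter` steps or when it changes by less than `tol`."""
    if not isinstance(maxiter, int):
        raise ValueError("maxiter must be an int.")
    x, previous = start, objective(start)
    for iteration in range(1, maxiter + 1):
        x = x + lr * gradient(x)           # one gradient step of size lr
        current = objective(x)
        if abs(current - previous) < tol:  # tolerance-based stopping rule
            return x, iteration
        previous = current
    return x, maxiter


# Toy concave objective with its maximum at x = 3.
best_x, n_iter = gradient_ascent(
    lambda x: -(x - 3.0) ** 2, lambda x: -2.0 * (x - 3.0), 0.0, lr=0.1
)
```

Lowering tol makes the loop run longer and end closer to the optimum; lowering maxiter can stop it before the tolerance is reached.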
- property latent_prob
The probabilities that the zero inflation variable is 0.
- property list_of_parameters_needing_gradient
The list of all the parameters of the model that need to be updated at each iteration.
- property dict_model_parameters
The parameters of the model.
- property dict_latent_parameters
The latent parameters of the model.
- property latent_variables
The (conditional) mean of the latent variables. This is the best approximation of latent variables. This variable is supposed to be more meaningful than the counts (endog).
- property latent_prob_variables
The (conditional) probabilities of the latent probability variables.
- transform(remove_exog_effect=True)
Returns the latent variables. Can be seen as a normalization of the given counts.
- Parameters:
remove_exog_effect (bool, optional) – Whether to remove the mean induced by the exogenous variables. Defaults to True.
- Returns:
The transformed endogenous variables (latent variables of the model).
- Return type:
torch.Tensor
Examples
>>> from pyPLNmodels import ZIPln, load_microcosm
>>> data = load_microcosm()
>>> zi = ZIPln.from_formula("endog ~ 1", data=data)
>>> zi.fit()
>>> transformed_endog = zi.transform()
>>> print(transformed_endog.shape)
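What "removing the mean induced by the exogenous variables" amounts to can be sketched with plain lists. The sketch assumes the induced mean is the matrix product of exog and coef, which is the usual convention for this model family but is an assumption here; the helper name and the numbers are hypothetical.

```python
def remove_exog_effect(latent_mean, exog, coef):
    """Subtract exog @ coef (the covariate-induced mean) from each latent mean row."""
    centered = []
    for latent_row, exog_row in zip(latent_mean, exog):
        induced = [
            sum(x * coef[k][j] for k, x in enumerate(exog_row))
            for j in range(len(latent_row))
        ]
        centered.append([m - e for m, e in zip(latent_row, induced)])
    return centered


latent_mean = [[2.0, 1.0], [3.5, 0.0]]   # shape (n_samples=2, dim=2)
exog = [[1.0, 0.0], [1.0, 1.0]]          # intercept + one binary covariate
coef = [[2.0, 0.5], [1.0, -1.0]]         # shape (nb_cov=2, dim=2)
positions = remove_exog_effect(latent_mean, exog, coef)
```

After the subtraction, what remains is the variation of each sample that the covariates do not explain.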
- property number_of_parameters
Returns the number of parameters of the model.
- pca_pairplot(n_components=3, colors=None)
Generates a scatter matrix plot based on Principal Component Analysis (PCA) on the latent variables.
- Parameters:
n_components (int, optional) – The number of components to consider for plotting. Defaults to 3. Cannot be greater than 6.
colors (np.ndarray, optional) – An array with one label for each sample in the endog property of the object. Defaults to None.
- Raises:
ValueError – If the number of components requested is greater than the number of variables in the dataset.
Examples
>>> from pyPLNmodels import ZIPln, load_microcosm
>>> data = load_microcosm()
>>> zi = ZIPln.from_formula("endog ~ 1", data=data)
>>> zi.fit()
>>> zi.pca_pairplot(n_components=5)
>>> zi.pca_pairplot(n_components=5, colors=data["time"])
- pca_pairplot_prob(n_components=3, colors=None)
Generates a scatter matrix plot based on Principal Component Analysis (PCA) on the latent variables associated with the zero inflation (i.e. the Bernoulli variables). This may not be very informative.
- Parameters:
n_components (int (optional)) – The number of components to consider for plotting. Defaults to 3. Cannot be greater than 6.
colors (np.ndarray (optional)) – An array with one label for each sample in the endog property of the object. Defaults to None.
- plot_correlation_circle(column_names, column_index=None, title='')
Visualizes variables using PCA and plots a correlation circle. If the endog has been given as a pd.DataFrame, the column_names have been stored and may be indicated with the column_names argument. Else, one should provide the indices of variables.
- Parameters:
column_names (List[str]) – A list of variable names to visualize. If column_index is None, the variables plotted are the ones in column_names. If column_index is not None, this only serves as a legend. Check the attribute column_names_endog.
column_index (Optional[List[int]], optional) – A list of indices corresponding to the variables that should be plotted. If None, the indices are determined based on column_names_endog given the column_names, by default None. If not None, should have the same length as column_names.
title (str) – An additional title for the plot.
- Raises:
ValueError – If column_index is None and column_names_endog is not set (it is set only when the model has been initialized with a pd.DataFrame as endog).
ValueError – If the length of column_index is different from the length of column_names.
Examples
>>> from pyPLNmodels import ZIPln, load_microcosm
>>> data = load_microcosm()
>>> zi = ZIPln.from_formula("endog ~ 1", data=data)
>>> zi.fit()
>>> zi.plot_correlation_circle(column_names=["ASV_315", "ASV_749"])
>>> zi.plot_correlation_circle(column_names=["A", "B"], column_index=[0, 2])
- biplot(column_names, *, column_index=None, colors=None, title='')
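The rules relating column_names, column_index and column_names_endog can be summarized in a short sketch. The helper resolve_column_index and the stored names are hypothetical; the code only mirrors the behaviour described in the parameter and error descriptions and is not taken from the library.

```python
def resolve_column_index(column_names, column_index=None, column_names_endog=None):
    """Return the indices of the variables to plot, following the documented rules."""
    if column_index is None:
        if column_names_endog is None:
            raise ValueError("column_names_endog is not set; provide column_index.")
        # look each requested name up among the stored endog column names
        return [list(column_names_endog).index(name) for name in column_names]
    if len(column_index) != len(column_names):
        raise ValueError("column_index and column_names must have the same length.")
    return list(column_index)  # column_names then only serves as a legend


stored = ["ASV_1", "ASV_315", "ASV_749"]  # hypothetical stored column names
by_name = resolve_column_index(["ASV_315", "ASV_749"], column_names_endog=stored)
by_index = resolve_column_index(["A", "B"], column_index=[0, 2])
```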
Visualizes variables using the correlation circle along with the pca transformed samples. If the endog has been given as a pd.DataFrame, the column_names have been stored and may be indicated with the column_names argument. Else, one should provide the indices of variables.
- Parameters:
column_names (List[str]) – A list of variable names to visualize. If column_index is None, the variables plotted are the ones in column_names. If column_index is not None, this only serves as a legend. Check the attribute column_names_endog.
column_index (Optional[List[int]], optional keyword-only) – A list of indices corresponding to the variables that should be plotted. If None, the indices are determined based on column_names_endog given the column_names, by default None. If not None, should have the same length as column_names.
title (str optional, keyword-only) – An additional title for the plot.
colors (list, optional, keyword-only) – The labels to color the samples, of size n_samples.
- Raises:
ValueError – If column_index is None and column_names_endog is not set (it is set only when the model has been initialized with a pd.DataFrame as endog).
ValueError – If the length of column_index is different from the length of column_names.
Examples
>>> from pyPLNmodels import ZIPln, load_microcosm
>>> data = load_microcosm()
>>> zi = ZIPln.from_formula("endog ~ 1", data=data)
>>> zi.fit()
>>> zi.biplot(column_names=["ASV_315", "ASV_749"])
>>> zi.biplot(column_names=["A", "B"], column_index=[0, 2], colors=data["time"])
- property nb_cov_inflation
Number of covariates associated with the zero inflation.
- property exog_inflation
Property representing the exogenous variables (covariates) associated with the zero inflation.
- Returns:
The exogenous variables of the zero inflation variable.
- Return type:
torch.Tensor
- predict_prob_inflation(array_like=None)
- property coef_inflation
Property representing the regression coefficients associated with the zero-inflation component, of size (nb_cov_inflation, dim).
- Returns:
The coefficients.
- Return type:
torch.Tensor
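The shapes involved in the zero-inflation regression can be illustrated as below. The sketch assumes a logistic (sigmoid) link between exog_inflation @ coef_inflation and the zero-inflation probability; that link, the helper names and the numbers are assumptions for illustration, not a description of the library's internals.

```python
import math


def sigmoid(value):
    """Numerically stable logistic function."""
    if value >= 0:
        return 1.0 / (1.0 + math.exp(-value))
    exp_value = math.exp(value)
    return exp_value / (1.0 + exp_value)


def inflation_probabilities(exog_inflation, coef_inflation):
    """Return sigmoid(exog_inflation @ coef_inflation), shape (n_samples, dim)."""
    dim = len(coef_inflation[0])
    return [
        [sigmoid(sum(x * coef_inflation[k][j] for k, x in enumerate(row))) for j in range(dim)]
        for row in exog_inflation
    ]


exog_inflation = [[1.0, 0.0], [1.0, 1.0]]    # intercept + one covariate
coef_inflation = [[0.0, 2.0], [1.0, -2.0]]   # shape (nb_cov_inflation=2, dim=2)
probs = inflation_probabilities(exog_inflation, coef_inflation)
```

Each row of coef_inflation corresponds to one inflation covariate and each column to one variable of endog, matching the documented size (nb_cov_inflation, dim).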
- viz(*, ax=None, colors=None, show_cov=False, remove_exog_effect=False)
Visualize the latent variables. One can remove the effect of exogenous variables with the remove_exog_effect boolean variable.
- Parameters:
ax (matplotlib.axes.Axes, optional) – The axes on which to plot, by default None.
colors (list, optional) – The labels to color the samples, of size n_samples.
show_cov (bool, optional) – Whether to show covariances, by default False.
remove_exog_effect (bool, optional) – Whether to remove the effect of exogenous variables. Defaults to False.
Examples
>>> from pyPLNmodels import ZIPln, load_microcosm
>>> data = load_microcosm()
>>> zi = ZIPln.from_formula("endog ~ 1 + site", data=data)
>>> zi.fit()
>>> zi.viz()
>>> zi.viz(colors=data["site"])
>>> zi.viz(show_cov=True)
>>> zi.viz(remove_exog_effect=True, colors=data["site"])
- viz_prob(*, ax=None, colors=None)
Visualize the latent variables.
- Parameters:
ax (matplotlib.axes.Axes, optional) – The axes on which to plot, by default None.
colors (list, optional) – The labels to color the samples, of size n_samples.
Examples
>>> from pyPLNmodels import ZIPln, load_microcosm
>>> data = load_microcosm()
>>> zi = ZIPln.from_formula("endog ~ 1 + site", data=data)
>>> zi.fit()
>>> zi.viz_prob()
>>> zi.viz_prob(colors=data["site"])
- plot_expected_vs_true(ax=None, colors=None)
Plot the predicted value of the endog against the endog.
- Parameters:
ax (Optional[matplotlib.axes.Axes], optional) – The matplotlib axis to use. If None, the current axis is used, by default None.
colors (Optional[Any], optional) – The labels to color the samples, of size n_samples. By default None (no colors).
- Returns:
The matplotlib axis.
- Return type:
matplotlib.axes.Axes
See also
pyPLNmodels.Pln.pca_pairplot(), pyPLNmodels.PlnPCA.pca_pairplot(), pyPLNmodels.Pln.biplot(), pyPLNmodels.PlnPCA.biplot()
- property latent_positions
The (conditional) mean of the latent variables with the effect of covariates removed.
- property AIC
Akaike Information Criterion (AIC).
- property BIC
Bayesian Information Criterion (BIC) of the model.
- property ICL
Integrated Completed Likelihood criterion.
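For readers unfamiliar with these criteria, the textbook definitions of AIC and BIC are sketched below. Note that loglike is documented here as an alias for the ELBO, and the library may use a different sign or scaling convention, so these functions are illustrative only; ICL is not covered. The numeric inputs are hypothetical.

```python
import math


def aic(loglike, number_of_parameters):
    """Textbook AIC: 2k - 2 log L (lower is better)."""
    return 2.0 * number_of_parameters - 2.0 * loglike


def bic(loglike, number_of_parameters, n_samples):
    """Textbook BIC: k log(n) - 2 log L (lower is better)."""
    return number_of_parameters * math.log(n_samples) - 2.0 * loglike


# Hypothetical values standing in for loglike, number_of_parameters and n_samples.
aic_value = aic(loglike=-1200.0, number_of_parameters=50)
bic_value = bic(loglike=-1200.0, number_of_parameters=50, n_samples=400)
```

Both criteria trade goodness of fit against the number of parameters; BIC penalizes additional parameters more strongly as the number of samples grows.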
- property coef
Property representing the regression coefficients of size (nb_cov, dim). If no exogenous (exog) is available, returns None.
- Returns:
The coefficients or None if no coefficients are given in the model.
- Return type:
torch.Tensor or None
- property covariance
Property representing the covariance of the model.
- Returns:
The covariance.
- Return type:
torch.Tensor
- property dim
Number of dimensions (i.e. variables) of the dataset.
- property elbo
Returns the last elbo computed.
- property endog
Property representing the endogenous variables (counts).
- Returns:
The endogenous variables.
- Return type:
torch.Tensor
- property entropy
Entropy of the latent variables.
- property exog
Property representing the exogenous variables (covariates).
- Returns:
The exogenous variables or None if no covariates are given in the model.
- Return type:
torch.Tensor or None
- property latent_mean
Property representing the latent mean conditionally on the observed counts, i.e. the conditional mean of the latent variable of each sample.
- Returns:
The latent mean.
- Return type:
torch.Tensor
- property latent_parameters
Alias for dict_latent_parameters.
- property latent_sqrt_variance
Property representing the latent square root variance conditionally on the observed counts, i.e. the square root variance of the latent variable of each sample.
- Returns:
The square root of the latent variance.
- Return type:
torch.Tensor
- property latent_variance
Property representing the latent variance conditionally on the observed counts, i.e. the conditional variance of the latent variable of each sample.
- property loglike
Alias for elbo.
- property marginal_mean
The marginal mean of the model, i.e. the mean of the gaussian latent variable.
- property model_parameters
Alias for dict_model_parameters.
- property n_samples
Number of samples in the dataset.
- property nb_cov: int
The number of exogenous variables.
- property offsets
Property representing the offsets.
- Returns:
The offsets.
- Return type:
torch.Tensor
- property optim_details
Property representing the optimization details.
- Returns:
The dictionary of optimization details.
- Return type:
dict
- property precision
Property representing the precision of the model, that is the inverse covariance matrix.
- Returns:
The precision matrix of size (dim, dim).
- Return type:
torch.Tensor
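The relation between covariance and precision can be checked on a small example. The sketch below inverts a hypothetical 2 x 2 covariance matrix by hand and verifies that the product with its inverse is the identity; it uses plain lists rather than torch tensors and is not the library's implementation.

```python
def invert_2x2(matrix):
    """Invert a 2 x 2 matrix given as nested lists."""
    (a, b), (c, d) = matrix
    determinant = a * d - b * c
    if determinant == 0:
        raise ValueError("The matrix is singular.")
    return [[d / determinant, -b / determinant], [-c / determinant, a / determinant]]


def matmul(left, right):
    """Multiply two matrices given as nested lists."""
    return [
        [sum(left[i][k] * right[k][j] for k in range(len(right))) for j in range(len(right[0]))]
        for i in range(len(left))
    ]


covariance = [[2.0, 0.5], [0.5, 1.0]]    # hypothetical covariance, shape (dim, dim)
precision = invert_2x2(covariance)       # inverse covariance matrix
identity = matmul(covariance, precision)
```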
- predict(array_like=None)
- projected_latent_variables(rank=2, remove_exog_effect=False)
Perform PCA on latent variables and return the projected variables.
- Parameters:
rank (int, optional) – The number of principal components to compute, by default 2.
remove_exog_effect (bool, optional) – Whether to remove the effect of exogenous variables. Defaults to False.
- Returns:
The projected variables.
- Return type:
numpy.ndarray
- remove_zero_columns = True
- show(savefig=False, name_file='', figsize=(10, 10))
Display the model parameters, norm evolution of the parameters and the criterion.
- Parameters:
savefig (bool, optional) – If True, the figure will be saved to a file. Default is False.
name_file (str, optional) – The name of the file to save the figure. Only used if savefig is True. Default is an empty string.
figsize (tuple of two positive floats, optional) – Size of the figure that will be created. Defaults to (10, 10).
- sigma()
Covariance of the model.
- optim: torch.optim.Optimizer
List of methods and attributes
Public Data Attributes:
| latent_prob | The probabilities that the zero inflation variable is 0. |
| list_of_parameters_needing_gradient | The list of all the parameters of the model that need to be updated at each iteration. |
| dict_model_parameters | The parameters of the model. |
| dict_latent_parameters | The latent parameters of the model. |
| latent_variables | The (conditional) mean of the latent variables. |
| latent_prob_variables | The (conditional) probabilities of the latent probability variables. |
| number_of_parameters | Returns the number of parameters of the model. |
| nb_cov_inflation | Number of covariates associated with the zero inflation. |
| exog_inflation | Property representing the exogenous variables (covariates) associated with the zero inflation. |
| coef_inflation | Property representing the regression coefficients associated with the zero-inflation component, of size (nb_cov_inflation, dim). |
| latent_positions | The (conditional) mean of the latent variables with the effect of covariates removed. |
| entropy | Entropy of the latent variables. |
Inherited from BaseModel
| list_of_parameters_needing_gradient | The list of all the parameters of the model that need to be updated at each iteration. |
| dict_model_parameters | The parameters of the model. |
| model_parameters | Alias for dict_model_parameters. |
| dict_latent_parameters | The latent parameters of the model. |
| latent_parameters | Alias for dict_latent_parameters. |
| n_samples | Number of samples in the dataset. |
| dim | Number of dimensions (i.e. variables) of the dataset. |
| endog | Property representing the endogenous variables (counts). |
| exog | Property representing the exogenous variables (covariates). |
| nb_cov | The number of exogenous variables. |
| offsets | Property representing the offsets. |
| latent_mean | Property representing the latent mean conditionally on the observed counts, i.e. the conditional mean of the latent variable of each sample. |
| latent_variance | Property representing the latent variance conditionally on the observed counts, i.e. the conditional variance of the latent variable of each sample. |
| latent_sqrt_variance | Property representing the latent square root variance conditionally on the observed counts, i.e. the square root variance of the latent variable of each sample. |
| coef | Property representing the regression coefficients of size (nb_cov, dim). |
| covariance | Property representing the covariance of the model. |
| precision | Property representing the precision of the model, that is the inverse covariance matrix. |
| marginal_mean | The marginal mean of the model, i.e. the mean of the gaussian latent variable. |
| latent_variables | The (conditional) mean of the latent variables. |
| latent_positions | The (conditional) mean of the latent variables with the effect of covariates removed. |
| elbo | Returns the last elbo computed. |
| loglike | Alias for elbo. |
| BIC | Bayesian Information Criterion (BIC) of the model. |
| ICL | Integrated Completed Likelihood criterion. |
| AIC | Akaike Information Criterion (AIC). |
| number_of_parameters | Returns the number of parameters of the model. |
| entropy | Entropy of the latent variables. |
| optim_details | Property representing the optimization details. |
Public Methods:
| __init__() | Initializes the ZIPln class, which is a Pln model with zero-inflation. |
| from_formula() | Create an instance from a formula and data. |
| fit() | Fit the model using variational inference. |
| compute_elbo() | Compute the elbo of the current parameters. |
| transform() | Returns the latent variables. |
| pca_pairplot() | Generates a scatter matrix plot based on Principal Component Analysis (PCA) on the latent variables. |
| pca_pairplot_prob() | Generates a scatter matrix plot based on Principal Component Analysis (PCA) on the latent variables associated with the zero inflation (i.e. the Bernoulli variables). |
| plot_correlation_circle() | Visualizes variables using PCA and plots a correlation circle. |
| biplot() | Visualizes variables using the correlation circle along with the pca transformed samples. |
| viz() | Visualize the latent variables. |
| viz_prob() | Visualize the latent variables. |
| plot_expected_vs_true() | Plot the predicted value of the endog against the endog. |
Inherited from BaseModel
| __init__() | Initializes the model class. |
| from_formula() | Create an instance from a formula and data. |
| fit() | Fit the model using variational inference. |
| show() | Display the model parameters, norm evolution of the parameters and the criterion. |
| plot_correlation_circle() | Visualizes variables using PCA and plots a correlation circle. |
| biplot() | Visualizes variables using the correlation circle along with the pca transformed samples. |
| compute_elbo() | Compute the elbo of the current parameters. |
| projected_latent_variables() | Perform PCA on latent variables and return the projected variables. |
| transform() | Returns the latent variables. |
| viz() | Visualize the latent variables. |
| __repr__() | Generate the string representation of the model. |
| sigma() | Covariance of the model. |
| pca_pairplot() | Generates a scatter matrix plot based on Principal Component Analysis (PCA) on the latent variables. |
| plot_expected_vs_true() | Plot the predicted value of the endog against the endog. |