ZIPlnPCACollection

class pyPLNmodels.ZIPlnPCACollection(endog, *, exog=None, exog_inflation=None, offsets=None, compute_offsets_method='zero', add_const=True, add_const_inflation=True, ranks=(3, 5), use_closed_form_prob=True)

A collection of ZIPlnPCA (zero-inflated PLN-PCA) models, each with a different number of components. The number of components is also called the number of principal components (PCs), or the rank of the covariance matrix. For more details, see B. Bricout ?
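As a rough illustration of what "rank" means here: a covariance matrix of rank q can be written as Sigma = C C^T, where C is a p x q matrix of components. The pure-Python sketch below builds such a matrix from a made-up 4 x 2 matrix C and checks its rank. The helper names gram and matrix_rank are hypothetical, and this is not how ZIPlnPCA computes anything internally.

```python
# Sketch: a rank-q covariance matrix built from p x q "components" C.
# Helper names are hypothetical; matrices are plain lists of rows.


def gram(c):
    """Return C @ C^T for a matrix C given as a list of rows."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in c] for ri in c]


def matrix_rank(m, eps=1e-9):
    """Rank via Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in m]
    n_rows, n_cols = len(a), len(a[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Largest pivot candidate at or below the current row.
        pivot = max(range(rank, n_rows), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < eps:
            continue  # no usable pivot in this column
        a[rank], a[pivot] = a[pivot], a[rank]
        for r in range(rank + 1, n_rows):
            factor = a[r][col] / a[rank][col]
            a[r] = [x - factor * y for x, y in zip(a[r], a[rank])]
        rank += 1
    return rank


# p = 4 variables, q = 2 components (made-up values).
components = [[1.0, 0.0], [0.5, 1.0], [0.0, 2.0], [1.0, 1.0]]
sigma = gram(components)
print(len(sigma), matrix_rank(sigma))  # 4 2: a 4 x 4 matrix of rank 2
```

However many variables p there are, the rank of C C^T cannot exceed the number of columns q, which is why the rank and the number of components are interchangeable here.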

Examples

>>> from pyPLNmodels import ZIPlnPCACollection, load_scrna
>>> data = load_scrna()
>>> zipcas = ZIPlnPCACollection.from_formula("endog ~ 1", data=data, ranks=[5, 8, 12])
>>> zipcas.fit()
>>> print(zipcas)
>>> zipcas.show()
>>> print(zipcas.best_model())
>>> print(zipcas[5])

Parameters:
  • endog (Tensor | ndarray | DataFrame | None)

  • exog (Tensor | ndarray | DataFrame | Series | None)

  • exog_inflation (Tensor | ndarray | DataFrame | Series | None)

  • offsets (Tensor | ndarray | DataFrame | None)

  • compute_offsets_method ({'logsum', 'zero'})

  • add_const (bool)

  • add_const_inflation (bool)

  • ranks (Iterable[int] | None)

  • use_closed_form_prob (bool)

PlnModel

alias of ZIPlnPCA

__init__(endog, *, exog=None, exog_inflation=None, offsets=None, compute_offsets_method='zero', add_const=True, add_const_inflation=True, ranks=(3, 5), use_closed_form_prob=True)

Initializes the collection.

Parameters:
  • endog (Union[torch.Tensor, np.ndarray, pd.DataFrame]) – The count data.

  • exog (Union[torch.Tensor, np.ndarray, pd.DataFrame], optional (keyword-only)) – The covariate data. Defaults to None.

  • offsets (Union[torch.Tensor, np.ndarray, pd.DataFrame], optional (keyword-only)) – The offsets data. Defaults to None.

  • compute_offsets_method (str, optional (keyword-only)) –

    Method to compute offsets if not provided. Options are:
    • "zero", which sets the offsets to zero.

    • "logsum", which takes the logarithm of the sum (per row) of the counts.

    Ignored if offsets is not None. Defaults to "zero".

  • add_const (bool, optional (keyword-only)) – Whether to add a column of ones to exog. Defaults to True.

  • ranks (Iterable[int], optional (keyword-only)) – The ranks (numbers of PCs) to fit, one model per rank. Defaults to (3, 5).

  • exog_inflation (Tensor | ndarray | DataFrame | Series | None) – The covariate data for the zero-inflation part. Defaults to None.

  • add_const_inflation (bool) – Whether to add a column of ones to exog_inflation. Defaults to True.

  • use_closed_form_prob (bool) – Whether to use a closed-form expression for the latent probability (see latent_prob). Defaults to True.

Return type:

ZIPlnPCACollection

Examples

>>> from pyPLNmodels import ZIPlnPCACollection, load_scrna
>>> data = load_scrna()
>>> zipcas = ZIPlnPCACollection(endog=data["endog"], ranks=[4, 6, 8])
>>> zipcas.fit()
>>> print(zipcas.best_model())

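The two compute_offsets_method options can be sketched in plain Python as below. The sketch assumes the offsets have the same n x p shape as endog, with the "logsum" value repeated across the columns of each row; compute_offsets is a hypothetical name, not the library's implementation. A row whose counts sum to zero has no finite log-sum.

```python
import math


def compute_offsets(endog, method="zero"):
    """Hypothetical helper mirroring the documented options.

    endog is a list of rows (n samples x p variables) of counts.
    Returns an n x p list of offsets.
    """
    if method == "zero":
        return [[0.0] * len(row) for row in endog]
    if method == "logsum":
        # Log of each row total, repeated over that row's columns;
        # math.log raises ValueError if a row sums to zero.
        return [[math.log(sum(row))] * len(row) for row in endog]
    raise ValueError(f"unknown method: {method!r}")


counts = [[1, 2, 1], [0, 5, 5]]
print(compute_offsets(counts, "zero"))    # [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(compute_offsets(counts, "logsum"))  # rows of log(4) and log(10)
```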
classmethod from_formula(formula, data, *, compute_offsets_method='zero', ranks=(3, 5), use_closed_form_prob=True)

Create an instance from a formula and data.

Parameters:
  • formula (str) – The formula, for example "endog ~ 1".

  • data (dict) – The data dictionary. Each value can be a torch.Tensor, np.ndarray, pd.DataFrame or pd.Series. Categorical exogenous variables should be 1-dimensional.

  • compute_offsets_method (str, optional (keyword-only)) –

    Method to compute offsets if not provided. Options are:
    • "zero", which sets the offsets to zero.

    • "logsum", which takes the logarithm of the sum (per row) of the counts.

    Ignored if data["offsets"] is not None. Defaults to "zero".

  • ranks (Iterable[int], optional (keyword-only)) – The ranks (numbers of PCs) to test. Defaults to (3, 5).

  • use_closed_form_prob (bool) – Whether to use a closed-form expression for the latent probability (see latent_prob). Defaults to True.
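In "endog ~ 1", the right-hand side 1 requests an intercept only. When a categorical variable is used instead, formula interfaces commonly expand a 1-dimensional column of labels into indicator (dummy) columns, dropping one reference level when an intercept is present. The sketch below shows that convention in plain Python. It is an assumption about typical formula handling, not a description of how from_formula encodes variables, and treatment_coding is a hypothetical name.

```python
def treatment_coding(labels):
    """Expand a 1-D list of category labels into an intercept + dummy design.

    The first level in sorted order is the reference level and gets no column.
    Returns (column_names, design_rows).
    """
    levels = sorted(set(labels))
    names = ["Intercept"] + [f"label[{lvl}]" for lvl in levels[1:]]
    rows = [[1.0] + [1.0 if lab == lvl else 0.0 for lvl in levels[1:]] for lab in labels]
    return names, rows


cell_type = ["B", "T", "B", "NK"]  # made-up labels
names, design = treatment_coding(cell_type)
print(names)   # ['Intercept', 'label[NK]', 'label[T]']
print(design)  # [[1.0, 0.0, 0.0], [1.0, 0.0, 1.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
```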

fit(maxiter=400, lr=0.01, tol=1e-06, verbose=False)

Fit each model in the collection.

Parameters:
  • maxiter (int, optional) – The maximum number of iterations, by default 400.

  • lr (float, optional) – The learning rate, by default 0.01.

  • tol (float, optional) – The tolerance of the stopping criterion, by default 1e-6.

  • verbose (bool, optional) – Whether to print verbose output, by default False.

Return type:

ZIPlnPCACollection

Examples

>>> from pyPLNmodels import ZIPlnPCACollection, load_scrna
>>> data = load_scrna()
>>> zipcas = ZIPlnPCACollection(endog=data["endog"], ranks=[4, 6, 8])
>>> zipcas.fit()

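The maxiter, lr and tol parameters follow the usual pattern of an iterative gradient-based optimizer. The toy sketch below maximizes a one-dimensional concave function and stops either after maxiter steps or when the objective changes by less than tol between iterations. That stopping rule is an assumption; the exact criterion used by fit is not specified here, and gradient_ascent is a hypothetical name.

```python
def gradient_ascent(objective, grad, x0, maxiter=400, lr=0.01, tol=1e-6):
    """Maximize objective from x0; return (x, number_of_iterations)."""
    x = x0
    previous = objective(x)
    for iteration in range(1, maxiter + 1):
        x = x + lr * grad(x)  # one gradient step
        current = objective(x)
        if abs(current - previous) < tol:  # objective has stalled
            return x, iteration
        previous = current
    return x, maxiter  # iteration budget exhausted


# Toy concave objective with its maximum at x = 3.
def objective(x):
    return -(x - 3.0) ** 2


def grad(x):
    return -2.0 * (x - 3.0)


x_hat, n_iter = gradient_ascent(objective, grad, x0=0.0)
print(round(x_hat, 2), n_iter)
```

A larger lr takes bigger steps and usually stops sooner, but a step that is too large can overshoot the maximum; a smaller tol runs longer for a more precise optimum.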
property coef_inflation: Dict[int, Tensor]

Property representing the inflation coefficients (coef_inflation) of each model in the collection.

Returns:

The inflation coefficients for each model.

Return type:

Dict[int, torch.Tensor]
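In zero-inflated Poisson log-normal models, the inflation coefficients typically enter through a logistic link: the probability that entry (i, j) is a structural zero is pi_ij = sigmoid(x_i^T B_j), where x_i is row i of exog_inflation and B_j is column j of the inflation coefficients. The sketch below assumes exactly that parameterization and a coefficient matrix of shape (nb_cov_inflation, dim). Check the ZIPlnPCA model documentation before relying on either assumption; inflation_probability is a hypothetical name.

```python
import math


def inflation_probability(exog_inflation, coef_inflation):
    """pi[i][j] = sigmoid(sum_k x[i][k] * B[k][j]) (assumed logistic link).

    exog_inflation: n x nb_cov_inflation list of rows
    coef_inflation: nb_cov_inflation x dim list of rows
    """
    dim = len(coef_inflation[0])
    probs = []
    for x in exog_inflation:
        row = []
        for j in range(dim):
            eta = sum(xk * coef_inflation[k][j] for k, xk in enumerate(x))
            row.append(1.0 / (1.0 + math.exp(-eta)))
        probs.append(row)
    return probs


# Intercept-only inflation part (one column of ones), two variables.
exog_infl = [[1.0], [1.0]]
coef_infl = [[0.0, math.log(3.0)]]
print(inflation_probability(exog_infl, coef_infl))  # approx [[0.5, 0.75], [0.5, 0.75]]
```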

property latent_prob: Dict[int, Tensor]

Property representing the latent probabilities (latent_prob) of each model in the collection.

Returns:

The latent probabilities for each model.

Return type:

Dict[int, torch.Tensor]
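A zero count can come either from the inflation component or from the Poisson log-normal part. By Bayes' rule, the posterior probability that an observed zero is a structural zero is pi / (pi + (1 - pi) * p0), where pi is the inflation probability and p0 is the probability of a zero under the non-inflated part; positive counts get probability 0. The sketch below encodes that rule with hypothetical inputs. How latent_prob is actually obtained (for example, whether use_closed_form_prob switches to such a closed-form expression) is an assumption here.

```python
def posterior_inflation_prob(count, pi, p0):
    """Posterior probability that an entry is a structural (inflated) zero.

    count: observed count
    pi: prior inflation probability for this entry
    p0: probability of observing zero under the non-inflated count model
    """
    if count > 0:
        return 0.0  # positive counts cannot come from the point mass at zero
    return pi / (pi + (1.0 - pi) * p0)


print(posterior_inflation_prob(3, pi=0.2, p0=0.5))  # 0.0
print(posterior_inflation_prob(0, pi=0.2, p0=0.5))  # about 0.333
```

Because p0 is at most 1, an observed zero never lowers the probability of inflation below its prior value pi.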

best_model(criterion='BIC')[source]

Get the best model according to the specified criterion.

Parameters:

criterion (str, optional) – The criterion to use (‘AIC’ or ‘BIC’), by default ‘BIC’.

Returns:

The best model.

Return type:

ZIPlnPCA

Examples

>>> from pyPLNmodels import ZIPlnPCACollection, load_scrna
>>> data = load_scrna()
>>> zipcas = ZIPlnPCACollection(endog=data["endog"], ranks=[4, 6, 8])
>>> zipcas.fit()
>>> print(zipcas.best_model())

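Since AIC and BIC are returned as dictionaries keyed by rank, picking the best model amounts to taking the key with the optimal score. The sketch below assumes the usual lower-is-better convention for information criteria; if the criteria are reported on a higher-is-better scale, the selection flips, which the flag makes explicit. The scores are made up and best_rank is a hypothetical helper.

```python
def best_rank(scores, lower_is_better=True):
    """Return the key (here, a rank) with the optimal criterion value."""
    choose = min if lower_is_better else max
    return choose(scores, key=scores.get)


bic = {4: 1520.3, 6: 1498.7, 8: 1503.1}  # made-up BIC values per rank
print(best_rank(bic))                         # 6
print(best_rank(bic, lower_is_better=False))  # 4
```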
property nb_cov_inflation: int

The number of exogenous variables for the inflation part.

property AIC: Dict[int, float]

Property representing the AIC scores of the models in the collection.

Returns:

The AIC scores of the models.

Return type:

Dict[int, float]

property BIC: Dict[int, float]

Property representing the BIC scores of the models in the collection.

Returns:

The BIC scores of the models.

Return type:

Dict[int, float]

property ICL: Dict[int, float]

Property representing the ICL scores of the models in the collection.

Returns:

The ICL scores of the models.

Return type:

Dict[int, float]
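For reference, the textbook forms of the three criteria for a model with log-likelihood log L, k free parameters and n samples are AIC = -2 log L + 2k, BIC = -2 log L + k log n, and ICL = BIC + 2H, where H >= 0 is an entropy term for the latent variables. The sketch below computes them on that lower-is-better scale. The library may scale or sign them differently (for instance halved, or negated so that higher is better), and what H measures for these models is an assumption.

```python
import math


def information_criteria(loglike, n_params, n_samples, entropy=0.0):
    """Textbook AIC, BIC and ICL on the lower-is-better (-2 log L) scale."""
    aic = -2.0 * loglike + 2.0 * n_params
    bic = -2.0 * loglike + n_params * math.log(n_samples)
    icl = bic + 2.0 * entropy  # extra penalty for uncertain latent variables
    return {"AIC": aic, "BIC": bic, "ICL": icl}


print(information_criteria(loglike=-100.0, n_params=10, n_samples=100, entropy=1.5))
```

BIC penalizes each extra parameter by log n instead of 2, so for n larger than about 7 it favours smaller ranks than AIC does.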

property coef: Dict[int, Tensor]

Property representing the coefficients of each model in the collection.

Returns:

The coefficients.

Return type:

Dict[int, torch.Tensor]

property components: Dict[int, Tensor]

Property representing the components of each model in the collection.

Returns:

The components.

Return type:

Dict[int, torch.Tensor]

property dim: int

Number of dimensions (i.e. variables) of the dataset.

property endog: Tensor

Property representing the endogenous variables (counts).

Returns:

The endogenous variables.

Return type:

torch.Tensor

property exog: Tensor

Property representing the exogenous variables (covariates).

Returns:

The exogenous variables or None if no covariates are given in the model.

Return type:

torch.Tensor or None

get(key, default)

Get the model with the specified key, or return a default value if the key does not exist.

Parameters:
  • key (Any) – The key to search for.

  • default (Any) – The default value to return if the key does not exist.

Returns:

The model with the specified key, or the default value if the key does not exist.

Return type:

Any

property grid: List[float]

Property representing the grid given in initialization.

Returns:

The grid.

Return type:

List[float]

items()

Get the key-value pairs of the models in the collection.

Returns:

The key-value pairs of the models.

Return type:

ItemsView

keys()

Get the grid of the collection.

Returns:

The grid of the collection.

Return type:

KeysView

property latent_mean: Dict[int, Tensor]

Property representing the latent mean, for each model in the collection.

Returns:

The latent means.

Return type:

Dict[int, torch.Tensor]

property latent_variance: Dict[int, Tensor]

Property representing the latent variance, for each model in the collection.

Returns:

The latent variances.

Return type:

Dict[int, torch.Tensor]
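The latent mean and variance are typically the parameters of Gaussian (variational) distributions over the latent variables. A fact often used with them in Poisson log-normal models is that if Z ~ N(m, s^2), then E[exp(Z)] = exp(m + s^2 / 2). The sketch below checks that identity by simulation with Python's random module. The values are arbitrary, lognormal_mean is a hypothetical name, and whether latent_mean and latent_variance live in the p-dimensional space or the rank-q space for these PCA models is not stated here.

```python
import math
import random


def lognormal_mean(m, s2):
    """E[exp(Z)] for Z ~ N(m, s2), with s2 the variance."""
    return math.exp(m + s2 / 2.0)


rng = random.Random(0)
m, s2 = 0.5, 0.25
draws = [math.exp(rng.gauss(m, math.sqrt(s2))) for _ in range(100_000)]
mc_estimate = sum(draws) / len(draws)
print(lognormal_mean(m, s2), mc_estimate)  # both close to exp(0.625), about 1.868
```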

property loglike: Dict[int, float]

Property representing the log-likelihoods of the models in the collection.

Returns:

The log-likelihoods of the models.

Return type:

Dict[int, float]

property n_samples

Number of samples in the dataset.

property nb_cov: int

The number of exogenous variables.

property offsets: Tensor

Property representing the offsets.

Returns:

The offsets.

Return type:

torch.Tensor

property ranks

Property representing the ranks (of the covariance matrix) of each model in the collection.

Returns:

The ranks.

Return type:

List[int]

show(figsize=(10, 10))

Show a plot with the BIC scores, AIC scores, and negative log-likelihoods of the models. Also show the percentage of explained variance.

Parameters:

figsize (tuple of two positive floats) – Size of the figure to create, by default (10, 10).
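In PCA terms, an explained-variance percentage is each component's share of the total variance. The sketch below computes per-component and cumulative percentages from a list of component variances (for example, eigenvalues of C C^T). Which variances the plot actually uses is an assumption, the values are made up, and explained_variance_percentage is a hypothetical name.

```python
from itertools import accumulate


def explained_variance_percentage(variances):
    """Per-component and cumulative percentages of the total variance."""
    total = sum(variances)
    per_component = [100.0 * v / total for v in variances]
    return per_component, list(accumulate(per_component))


per_pc, cumulative = explained_variance_percentage([6.0, 3.0, 1.0])
print(per_pc)      # [60.0, 30.0, 10.0]
print(cumulative)  # [60.0, 90.0, 100.0]
```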

values()

Get the models in the collection.

Returns:

The models in the collection.

Return type:

ValuesView
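The get, keys, items and values methods, together with __getitem__, __len__, __iter__ and __contains__ listed further down, make the collection behave like a read-only dictionary from rank to fitted model. The sketch below reproduces that behaviour with Python's collections.abc.Mapping, which derives get, keys, items, values and __contains__ from three abstract methods. RankCollection is hypothetical, and whether the real Collection class builds on Mapping is not stated.

```python
from collections.abc import Mapping


class RankCollection(Mapping):
    """Hypothetical read-only mapping from rank to model."""

    def __init__(self, models_by_rank):
        self._models = dict(models_by_rank)

    def __getitem__(self, rank):
        return self._models[rank]

    def __iter__(self):
        return iter(self._models)

    def __len__(self):
        return len(self._models)


# Strings stand in for fitted models.
collection = RankCollection({4: "model(rank=4)", 6: "model(rank=6)", 8: "model(rank=8)"})
print(list(collection.keys()))            # [4, 6, 8]
print(collection[6], 6 in collection)     # model(rank=6) True
print(collection.get(5, "no such rank"))  # no such rank
```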

List of methods and attributes

Public Data Attributes:

coef_inflation

Property representing the coef_inflation, for each model in the collection.

latent_prob

Property representing the latent_prob, for each model in the collection.

nb_cov_inflation

The number of exogenous variables for the inflation part.

Inherited from PlnPCACollection

components

Property representing the components of each model in the collection.

ranks

Property representing the ranks (of the covariance matrix) of each model in the collection.

latent_mean

Property representing the latent mean, for each model in the collection.

latent_variance

Property representing the latent variance, for each model in the collection.

Inherited from Collection

exog

Property representing the exogenous variables (covariates).

offsets

Property representing the offsets.

endog

Property representing the endogenous variables (counts).

n_samples

Number of samples in the dataset.

grid

Property representing the grid given in initialization.

coef

Property representing the coefficients of the collection.

dim

Number of dimensions (i.e. variables) of the dataset.

nb_cov

The number of exogenous variables.

BIC

Property representing the BIC scores of the models in the collection.

ICL

Property representing the ICL scores of the models in the collection.

AIC

Property representing the AIC scores of the models in the collection.

loglike

Property representing the log-likelihoods of the models in the collection.

PlnModel

Public Methods:

__init__(endog, *[, exog, exog_inflation, ...])

Initializes the collection.

from_formula(formula, data, *[, ...])

Create an instance from a formula and data.

fit([maxiter, lr, tol, verbose])

Fit each model in the collection.

best_model([criterion])

Get the best model according to the specified criterion.

Inherited from PlnPCACollection

__init__(endog, *[, exog, offsets, ...])

Initializes the collection.

from_formula(formula, data, *[, ...])

Create an instance from a formula and data.

fit([maxiter, lr, tol, verbose])

Fit each model in the collection.

best_model([criterion])

Get the best model according to the specified criterion.

show([figsize])

Show a plot with BIC scores, AIC scores, and negative log-likelihoods of the models.

Inherited from Collection

__init__(endog, grid, *[, exog, offsets, ...])

Initializes the collection.

from_formula(formula, data, grid, *[, ...])

Create an instance from a formula and data.

values()

Get the models in the collection.

items()

Get the key-value pairs of the models in the collection.

__getitem__(grid_value)

Model with the specified grid_value.

__len__()

Number of models in the collection.

__iter__()

Iterate over the models in the collection.

__contains__(grid_value)

Check if a model with the specified grid_value exists in the collection.

keys()

Get the grid of the collection.

get(key, default)

Get the model with the specified key, or return a default value if the key does not exist.

fit([maxiter, lr, tol, verbose])

Fit each model in the collection.

best_model([criterion])

Get the best model according to the specified criterion.

show([figsize])

Show a plot with BIC scores, AIC scores, and negative log-likelihoods of the models.

__repr__()

Return a string representation of the Collection object.

Private Data Attributes:

_grid_value_name

_abc_impl

Inherited from PlnPCACollection

_grid_value_name

_abc_impl

Inherited from Collection

_useful_methods_strings

_useful_attributes_string

_abc_impl

Inherited from ABC

_abc_impl

Private Methods:

_instantiate_model(grid_value)

_set_column_names(model)

_init_next_model_with_current_model(...)

Initialize the next PlnModel model with the parameters of the current PlnModel model.

Inherited from PlnPCACollection

_instantiate_model(grid_value)

_init_next_model_with_current_model(...)

Initialize the next PlnModel model with the parameters of the current PlnModel model.

Inherited from Collection

_init_models(grid)

Method for initializing the models.

_is_right_instance(grid_value)

_set_column_names(model)

_instantiate_model(grid_value)

_print_beginning_message()

_init_next_model_with_current_model(...)

Initialize the next PlnModel model with the parameters of the current PlnModel model.

_print_ending_message()

_best_grid_value(criterion)
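_init_next_model_with_current_model initializes the next model with the parameters of the current one. One plausible way to warm-start a higher-rank PCA model is to copy the p x q components of the current fit into the first q columns and fill the remaining columns with small random values. The sketch below shows that idea as an assumption, not a description of the library's procedure; widen_components is a hypothetical name.

```python
import random


def widen_components(components, new_rank, scale=1e-3, seed=0):
    """Pad a p x q components matrix to p x new_rank (hypothetical warm start).

    Existing columns are copied unchanged; new columns get small Gaussian noise.
    """
    old_rank = len(components[0])
    if new_rank < old_rank:
        raise ValueError("new_rank must be at least the current rank")
    rng = random.Random(seed)
    return [
        list(row) + [rng.gauss(0.0, scale) for _ in range(new_rank - old_rank)]
        for row in components
    ]


current = [[1.0, 0.0], [0.5, 1.0], [0.0, 2.0]]  # p = 3, q = 2 (made up)
widened = widen_components(current, new_rank=4)
print(len(widened), len(widened[0]))  # 3 4
```

Starting from the lower-rank solution keeps the first components where the previous fit left them, so the optimizer only has to learn the extra directions instead of starting from scratch.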