ZIPlnPCACollection

For an in-depth tutorial on the ZIPlnPCACollection model, see the zero-inflation tutorial.

ZIPlnPCACollection Documentation

class pyPLNmodels.ZIPlnPCACollection(endog, *, exog=None, exog_inflation=None, offsets=None, compute_offsets_method='zero', add_const=True, add_const_inflation=True, ranks=(3, 5), use_closed_form_prob=True)

A collection of ZIPlnPCA models, each with a different number of components. The number of components is also referred to as the number of principal components (PCs), or the rank of the covariance matrix. For more details, see B. Bricout ?

Examples

>>> from pyPLNmodels import ZIPlnPCACollection, load_scrna
>>> data = load_scrna()
>>> zipcas = ZIPlnPCACollection.from_formula("endog ~ 1", data=data, ranks=[5, 8, 12])
>>> zipcas.fit()
>>> print(zipcas)
>>> zipcas.show()
>>> print(zipcas.best_model())
>>> print(zipcas[5])

Examples

>>> from pyPLNmodels import ZIPlnPCACollection, load_scrna
>>> data = load_scrna()
>>> zipcas = ZIPlnPCACollection(endog=data["endog"], ranks=[4, 6, 8])
>>> zipcas.fit()
>>> print(zipcas.best_model())

classmethod from_formula(formula, data, *, compute_offsets_method='zero', ranks=(3, 5), use_closed_form_prob=True)

Create an instance from a formula and data.

Parameters:

formula (str) – The formula, for example "endog ~ 1".
data (dict) – The data dictionary. Each value can be either a torch.Tensor, np.ndarray, pd.DataFrame or pd.Series. The categorical exogenous variables should be 1-dimensional.
compute_offsets_method (str, optional(keyword-only)) –
Method to compute offsets if not provided. Options are:
- "zero": sets all offsets to zero.
- "logsum": uses the logarithm of the sum of the counts of each row (sample).
Ignored if data["offsets"] is not None.
ranks (Iterable[int], optional(keyword-only)) – The ranks (or numbers of PCs) to be tested. By default (3, 5).
use_closed_form_prob (bool, optional(keyword-only)) – By default True.
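The two `compute_offsets_method` options can be illustrated in plain Python. This is a minimal sketch, not the library's implementation: `compute_offsets` is a hypothetical helper, the counts are nested lists rather than tensors, and it assumes that for "logsum" the log of each row total is repeated across every column so that the offsets have the same shape as the counts.

```python
import math
from typing import List


def compute_offsets(counts: List[List[int]], method: str = "zero") -> List[List[float]]:
    """Hypothetical sketch of the "zero" and "logsum" offset options.

    counts is an n_samples x dim matrix given as nested lists.
    """
    if method == "zero":
        # Every offset is 0: no per-sample normalization.
        return [[0.0] * len(row) for row in counts]
    if method == "logsum":
        # Log of the total count of each row (sample), repeated across its columns.
        # Assumes every row has at least one nonzero count.
        return [[math.log(sum(row))] * len(row) for row in counts]
    raise ValueError(f"Unknown method: {method!r}")


counts = [[1, 0, 3], [10, 5, 5]]
print(compute_offsets(counts, "zero"))    # [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(compute_offsets(counts, "logsum"))  # row 0 holds log(4), row 1 holds log(20)
```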

fit(maxiter=1000, lr=0.01, tol=9.999999999999999e-10, verbose=False)

Fit each model in the collection.

Parameters:

maxiter (int, optional) – The maximum number of iterations, by default 1000.
lr (float, optional) – The learning rate, by default 0.01.
tol (float, optional) – The tolerance, by default 1e-9.
verbose (bool, optional) – Whether to print verbose output, by default False.

Return type:

ZIPlnPCACollection

Examples

>>> from pyPLNmodels import ZIPlnPCACollection, load_scrna
>>> data = load_scrna()
>>> zipcas = ZIPlnPCACollection(endog=data["endog"], ranks=[4, 6, 8])
>>> zipcas.fit()
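To make the roles of `maxiter`, `lr` and `tol` concrete, here is a toy gradient-ascent loop on a one-dimensional concave function. It is only an analogy: the plain gradient step and the stopping rule (stop once the objective changes by less than `tol` between two iterations) are assumptions, and the optimizer and convergence criterion that pyPLNmodels applies to its own objective may differ. `toy_fit` is a hypothetical name.

```python
from typing import Callable, Tuple


def toy_fit(
    objective: Callable[[float], float],
    gradient: Callable[[float], float],
    x0: float,
    maxiter: int = 1000,
    lr: float = 0.01,
    tol: float = 1e-9,
) -> Tuple[float, int]:
    """Plain gradient ascent: stop after maxiter steps, or earlier once the
    objective changes by less than tol between two iterations."""
    x = x0
    previous = objective(x)
    for iteration in range(1, maxiter + 1):
        x = x + lr * gradient(x)  # ascent step scaled by the learning rate
        current = objective(x)
        if abs(current - previous) < tol:
            return x, iteration  # converged before reaching maxiter
        previous = current
    return x, maxiter


# Maximize f(x) = -(x - 3)^2, whose maximum is at x = 3.
x_hat, n_iter = toy_fit(lambda x: -(x - 3) ** 2, lambda x: -2 * (x - 3), x0=0.0)
print(x_hat, n_iter)
```

A larger `lr` takes bigger steps, a smaller `tol` demands a more stable objective before stopping, and `maxiter` caps the work if the tolerance is never reached.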

property coef_inflation: Dict[int, Tensor]

Property representing the coefficients of the inflation part (coef_inflation), for each model in the collection.

Returns:: The inflation coefficients for each model.
Return type:: Dict[int, torch.Tensor]
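As an illustration of what `coef_inflation` parameterizes, the sketch below assumes a logistic link, so that the zero-inflation probability of sample i and variable j is sigmoid(exog_inflation[i] · coef_inflation[:, j]). The shapes (n_samples x nb_cov_inflation for `exog_inflation`, nb_cov_inflation x dim for `coef_inflation`) and the helper names are assumptions for illustration, not taken from the library.

```python
import math
from typing import List


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def inflation_probabilities(exog_inflation: List[List[float]],
                            coef_inflation: List[List[float]]) -> List[List[float]]:
    """Hypothetical helper: pi[i][j] = sigmoid(sum_k exog_inflation[i][k] * coef_inflation[k][j])."""
    n_cov = len(coef_inflation)
    dim = len(coef_inflation[0])
    return [
        [sigmoid(sum(row[k] * coef_inflation[k][j] for k in range(n_cov))) for j in range(dim)]
        for row in exog_inflation
    ]


# Intercept-only inflation part (one constant covariate), 2 samples and 2 variables.
exog_inflation = [[1.0], [1.0]]
coef_inflation = [[0.0, math.log(3.0)]]
print(inflation_probabilities(exog_inflation, coef_inflation))  # roughly 0.5 and 0.75 per row
```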

property latent_prob: Dict[int, Tensor]

Property representing the latent probabilities (latent_prob), for each model in the collection.

Returns:: The latent probabilities for each model.
Return type:: Dict[int, torch.Tensor]
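The idea behind a latent probability of zero inflation can be sketched on a plain zero-inflated Poisson with a fixed rate: a positive count cannot come from the inflation component, while a zero count is split between the two components by Bayes' rule. In a ZIPlnPCA model the Poisson rate depends on latent Gaussian variables, so `latent_prob` is presumably a model-based counterpart of this formula rather than this exact expression. `structural_zero_probability` is a hypothetical helper.

```python
import math


def structural_zero_probability(count: int, pi: float, poisson_rate: float) -> float:
    """Posterior probability that `count` comes from the zero-inflation component
    of a plain zero-inflated Poisson with inflation probability pi."""
    if count > 0:
        # A positive count cannot be a structural zero.
        return 0.0
    # P(Y = 0) = pi + (1 - pi) * exp(-rate); Bayes' rule gives the share due to inflation.
    poisson_zero = (1.0 - pi) * math.exp(-poisson_rate)
    return pi / (pi + poisson_zero)


print(structural_zero_probability(4, pi=0.3, poisson_rate=2.0))  # 0.0
print(structural_zero_probability(0, pi=0.3, poisson_rate=2.0))
```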

best_model(criterion='BIC')

Get the best model according to the specified criterion.

Parameters:: criterion (str, optional) – The criterion to use (‘AIC’ or ‘BIC’), by default ‘BIC’.
Returns:: The best model.
Return type:: ZIPlnPCA

Examples

>>> from pyPLNmodels import ZIPlnPCACollection, load_scrna
>>> data = load_scrna()
>>> zipcas = ZIPlnPCACollection(endog=data["endog"], ranks=[4, 6, 8])
>>> zipcas.fit()
>>> print(zipcas.best_model())
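Selecting the best model amounts to picking the rank whose criterion value is optimal. The sketch below assumes that lower scores are better (the convention of -loglike + penalty); if the scores were stored with the opposite sign, `min` would have to be replaced by `max`. `best_rank` is a hypothetical helper and the scores are made up for illustration.

```python
from typing import Dict


def best_rank(scores: Dict[int, float]) -> int:
    """Return the rank with the smallest criterion value (ties go to the smallest rank)."""
    return min(sorted(scores), key=lambda rank: scores[rank])


bic = {4: 1520.3, 6: 1498.7, 8: 1503.1}  # made-up scores, for illustration only
print(best_rank(bic))  # 6
```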

property nb_cov_inflation: int: The number of exogenous variables for the inflation part.

property AIC: Dict[int, float]

Property representing the AIC scores of the models in the collection.

Returns:: The AIC scores of the models.
Return type:: Dict[int, float]

property BIC: Dict[int, float]

Property representing the BIC scores of the models in the collection.

Returns:: The BIC scores of the models.
Return type:: Dict[int, float]
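For reference, one common convention for these criteria, assumed here and not confirmed by this page, uses halved forms in which lower is better: AIC = -loglike + k and BIC = -loglike + (k / 2) log(n), where k is the number of parameters and n is `n_samples`. They differ from the textbook forms (2k - 2 log L and k log n - 2 log L) by a factor of 2, which does not change which model is selected.

```python
import math


def aic(loglike: float, n_params: int) -> float:
    # Halved AIC: -log L + k (lower is better).
    return -loglike + n_params


def bic(loglike: float, n_params: int, n_samples: int) -> float:
    # Halved BIC: -log L + (k / 2) * log(n) (lower is better).
    return -loglike + 0.5 * n_params * math.log(n_samples)


print(aic(-1000.0, 50))       # 1050.0
print(bic(-1000.0, 50, 100))  # 1000 + 25 * log(100)
```

The BIC penalty grows with log(n), so with many samples it favors smaller ranks more strongly than AIC does.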

property ICL: Dict[int, float]

Property representing the ICL scores of the models in the collection.

Returns:: The ICL scores of the models.
Return type:: Dict[int, float]

property coef: Dict[int, Tensor]

Property representing the coefficients of each model in the collection.

Returns:: The coefficients.
Return type:: Dict[int, torch.Tensor]

property components: Dict[int, Tensor]

Property representing the components of each model in the collection.

Returns:: The components.
Return type:: Dict[int, torch.Tensor]
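In the PLN-PCA parameterization, the covariance of the latent Gaussian layer is assumed to be components @ components.T, where `components` has shape (dim, rank); this is why the rank of the covariance matrix is at most the number of components. A small pure-Python sketch, with a hypothetical helper name:

```python
from typing import List

Matrix = List[List[float]]


def covariance_from_components(components: Matrix) -> Matrix:
    """Sigma = C @ C.T for a components matrix C of shape (dim, rank)."""
    dim = len(components)
    rank = len(components[0])
    return [
        [sum(components[i][k] * components[j][k] for k in range(rank)) for j in range(dim)]
        for i in range(dim)
    ]


# dim = 3 variables, rank = 1 component: Sigma has rank at most 1.
C = [[1.0], [2.0], [0.0]]
print(covariance_from_components(C))
# [[1.0, 2.0, 0.0], [2.0, 4.0, 0.0], [0.0, 0.0, 0.0]]
```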

property dim: int: Number of dimensions (i.e. variables) of the dataset.

property endog: Tensor

Property representing the endogenous variables (counts).

Returns:: The endogenous variables.
Return type:: torch.Tensor

property exog: Tensor

Property representing the exogenous variables (covariates).

Returns:: The exogenous variables or None if no covariates are given in the model.
Return type:: torch.Tensor or None

get(key, default)

Get the model with the specified key, or return a default value if the key does not exist.

Parameters:

key (Any) – The key to search for.
default (Any) – The default value to return if the key does not exist.

Returns:

The model with the specified key, or the default value if the key does not exist.

Return type:

Any

property grid: List[float]

Property representing the grid given at initialization.

Returns:: The grid.
Return type:: List[float]

items()

Get the key-value pairs of the models in the collection.

Returns:: The key-value pairs of the models.
Return type:: ItemsView

keys()

Get the grid of the collection.

Returns:: The grid of the collection.
Return type:: KeysView

property latent_mean: Dict[int, Tensor]

Property representing the latent mean, for each model in the collection.

Returns:: The latent means.
Return type:: Dict[int, torch.Tensor]

property latent_variance: Dict[int, Tensor]

Property representing the latent variance, for each model in the collection.

Returns:: The latent variances.
Return type:: Dict[int, torch.Tensor]

property loglike: Dict[int, float]

Property representing the log-likelihoods of the models in the collection.

Returns:: The log-likelihoods of the models.
Return type:: Dict[int, float]

property n_samples: Number of samples in the dataset.

property nb_cov: int: The number of exogenous variables.

property offsets: Tensor

Property representing the offsets.

Returns:: The offsets.
Return type:: torch.Tensor

property ranks

Property representing the ranks (of the covariance matrix) of each model in the collection.

Returns:: The ranks.
Return type:: List[int]

show(figsize=(10, 10))

Show a plot with BIC scores, AIC scores, and negative log-likelihoods of the models. Also show the percentage of explained variance.

Parameters:: figsize (tuple of two positive floats) – Size of the figure that will be created. By default (10, 10).

values()

Models in the collection, as a values view.

Returns:: The models in the collection.
Return type:: ValuesView
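The collection behaves like a dictionary keyed by rank. The hypothetical `ToyCollection` below mimics the documented methods (`keys`, `values`, `items`, `get`, indexing, `in`, `len` and iteration) on plain strings to show what each returns; it is not the pyPLNmodels class. Following the method list, iteration is assumed to yield the models rather than the ranks.

```python
from typing import Any, Dict, ItemsView, Iterator, KeysView, ValuesView


class ToyCollection:
    """Hypothetical stand-in mimicking the dict-like interface documented above.

    Models are stored by rank; here each "model" is just a string.
    """

    def __init__(self, models: Dict[int, Any]) -> None:
        self._models = dict(models)

    def __getitem__(self, grid_value: int) -> Any:
        return self._models[grid_value]

    def __len__(self) -> int:
        return len(self._models)

    def __iter__(self) -> Iterator[Any]:
        # As documented, iteration yields the models themselves, not the ranks.
        return iter(self._models.values())

    def __contains__(self, grid_value: int) -> bool:
        return grid_value in self._models

    def keys(self) -> KeysView:
        return self._models.keys()

    def values(self) -> ValuesView:
        return self._models.values()

    def items(self) -> ItemsView:
        return self._models.items()

    def get(self, key: int, default: Any) -> Any:
        return self._models.get(key, default)


collection = ToyCollection({4: "model_rank_4", 6: "model_rank_6", 8: "model_rank_8"})
print(list(collection.keys()))           # [4, 6, 8]
print(collection[6])                     # model_rank_6
print(collection.get(5, None))           # None
print(5 in collection, len(collection))  # False 3
```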

List of methods and attributes

Public Data Attributes:

`coef_inflation`	Property representing the coef_inflation, for each model in the collection.
`latent_prob`	Property representing the latent_prob, for each model in the collection.
`nb_cov_inflation`	The number of exogenous variables for the inflation part.

Inherited from PlnPCACollection

`components`	Property representing the components of each model in the collection.
`ranks`	Property representing the ranks (of the covariance matrix) of each model in the collection.
`latent_mean`	Property representing the latent mean, for each model in the collection.
`latent_variance`	Property representing the latent variance, for each model in the collection.

Inherited from Collection

`exog`	Property representing the exogenous variables (covariates).
`offsets`	Property representing the offsets.
`endog`	Property representing the endogenous variables (counts).
`n_samples`	Number of samples in the dataset.
`grid`	Property representing the grid given in initialization.
`coef`	Property representing the coefficients of the collection.
`dim`	Number of dimensions (i.e. variables) of the dataset.
`nb_cov`	The number of exogenous variables.
`BIC`	Property representing the BIC scores of the models in the collection.
`ICL`	Property representing the ICL scores of the models in the collection.
`AIC`	Property representing the AIC scores of the models in the collection.
`loglike`	Property representing the log-likelihoods of the models in the collection.
`PlnModel`

Public Methods:

`__init__`(endog, *[, exog, exog_inflation, ...])	Initializes the collection.
`from_formula`(formula, data, *[, ...])	Create an instance from a formula and data.
`fit`([maxiter, lr, tol, verbose])	Fit each model in the collection.
`best_model`([criterion])	Get the best model according to the specified criterion.

Inherited from PlnPCACollection

`__init__`(endog, *[, exog, offsets, ...])	Initializes the collection.
`from_formula`(formula, data, *[, ...])	Create an instance from a formula and data.
`fit`([maxiter, lr, tol, verbose])	Fit each model in the collection.
`best_model`([criterion])	Get the best model according to the specified criterion.
`show`([figsize])	Show a plot with BIC scores, AIC scores, and negative log-likelihoods of the models.

Inherited from Collection

`__init__`(endog, grid, *[, exog, offsets, ...])	Initializes the collection.
`from_formula`(formula, data, grid, *[, ...])	Create an instance from a formula and data.
`values`()	Models in the collection, as a values view.
`items`()	Get the key-value pairs of the models in the collection.
`__getitem__`(grid_value)	Model with the specified grid_value.
`__len__`()	Number of models in the collection.
`__iter__`()	Iterate over the models in the collection.
`__contains__`(grid_value)	Check if a model with the specified grid_value exists in the collection.
`keys`()	Get the grid of the collection.
`get`(key, default)	Get the model with the specified key, or return a default value if the key does not exist.
`fit`([maxiter, lr, tol, verbose])	Fit each model in the collection.
`best_model`([criterion])	Get the best model according to the specified criterion.
`show`([figsize])	Show a plot with BIC scores, AIC scores, and negative log-likelihoods of the models.
`__repr__`()	Return a string representation of the Collection object.

Private Data Attributes:

`_grid_value_name`
`_abc_impl`

Inherited from PlnPCACollection

`_grid_value_name`
`_abc_impl`

Inherited from Collection

`_useful_methods_strings`
`_useful_attributes_string`
`_name`
`_abc_impl`

Inherited from ABC

`_abc_impl`

Private Methods:

`_instantiate_model`(grid_value)
`_set_column_names`(model)
`_init_next_model_with_current_model`(...)	Initialize the next PlnModel model with the parameters of the current PlnModel model.

Inherited from PlnPCACollection

`_instantiate_model`(grid_value)
`_init_next_model_with_current_model`(...)	Initialize the next PlnModel model with the parameters of the current PlnModel model.

Inherited from Collection

`_init_models`(grid)	Method for initializing the models.
`_is_right_instance`(grid_value)
`_set_column_names`(model)
`_instantiate_model`(grid_value)
`_print_beginning_message`()
`_init_next_model_with_current_model`(...)	Initialize the next PlnModel model with the parameters of the current PlnModel model.
`_print_ending_message`()
`_best_grid_value`(criterion)