PlnMixtureCollection Documentation

class pyPLNmodels.PlnMixtureCollection(endog, *, exog=None, offsets=None, compute_offsets_method='zero', add_const=False, n_clusters=(2, 3, 4))[source]

A collection of PlnMixture models, each with a different number of clusters. For more details, see: J. Chiquet, M. Mariadassou, S. Robin: “The Poisson-Lognormal Model as a Versatile Framework for the Joint Analysis of Species Abundances.”

Examples

>>> from pyPLNmodels import PlnMixtureCollection, load_scrna
>>> data = load_scrna()
>>> mixtures = PlnMixtureCollection.from_formula("endog ~ 0", data = data, n_clusters = [2,3,4])
>>> mixtures.fit()
>>> print(mixtures)
>>> mixtures.show()
>>> print(mixtures.best_model())
>>> print(mixtures[3])
Parameters:
  • endog (Tensor | ndarray | DataFrame)

  • exog (Tensor | ndarray | DataFrame | None)

  • offsets (Tensor | ndarray | DataFrame | None)

  • compute_offsets_method ({'logsum', 'zero'})

  • add_const (bool)

  • n_clusters (Iterable[int] | None)

PlnModel

alias of PlnMixture

__init__(endog, *, exog=None, offsets=None, compute_offsets_method='zero', add_const=False, n_clusters=(2, 3, 4))[source]

Initializes the collection.

Parameters:
  • endog (Union[torch.Tensor, np.ndarray, pd.DataFrame]) – The count data.

  • exog (Union[torch.Tensor, np.ndarray, pd.DataFrame], optional(keyword-only)) – The covariate data. Defaults to None.

  • offsets (Union[torch.Tensor, np.ndarray, pd.DataFrame], optional(keyword-only)) – The offsets data. Defaults to None.

  • compute_offsets_method (str, optional(keyword-only)) –

    Method to compute offsets if not provided. Options are:
    • "zero", which sets the offsets to zero.

    • "logsum", which takes the logarithm of the sum of the counts in each row.

    Ignored if offsets is not None.

  • add_const (bool, optional(keyword-only)) – Whether to add a column of ones to exog. Defaults to False.

  • n_clusters (Iterable[int], optional(keyword-only)) – The numbers of clusters to test. Defaults to (2, 3, 4).

Return type:

PlnMixtureCollection
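The two compute_offsets_method options can be illustrated with a small pure-Python sketch. The helper `compute_offsets` below is hypothetical and is not the pyPLNmodels implementation; in particular, returning offsets with the same shape as the counts is an assumption made for illustration.

```python
import math

def compute_offsets(counts, method="zero"):
    """Hypothetical helper mimicking the two documented options.

    Assumption: offsets have the same shape as the count matrix.
    """
    if method == "zero":
        # Every offset is zero.
        return [[0.0] * len(row) for row in counts]
    if method == "logsum":
        # Log of the total count of each row, repeated across the row.
        # A row summing to zero would make math.log raise ValueError.
        return [[math.log(sum(row))] * len(row) for row in counts]
    raise ValueError(f"unknown method: {method!r}")

counts = [[1, 3, 0], [2, 2, 4]]  # 2 samples, 3 variables (made-up data)
zero_offsets = compute_offsets(counts, "zero")
logsum_offsets = compute_offsets(counts, "logsum")
```

With "logsum", samples with larger total counts receive larger offsets, which is a common way to account for differing sampling effort.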

Examples

>>> from pyPLNmodels import PlnMixtureCollection, load_scrna
>>> data = load_scrna()
>>> mixtures = PlnMixtureCollection(endog = data["endog"], n_clusters = [2,3,4])
>>> mixtures.fit()
>>> print(mixtures.best_model())
classmethod from_formula(formula, data, *, compute_offsets_method='zero', n_clusters=(2, 3, 4))[source]

Create an instance from a formula and data.

Parameters:
  • formula (str) – The formula.

  • data (dict) – The data dictionary. Each value can be either a torch.Tensor, np.ndarray, pd.DataFrame or pd.Series. The categorical exogenous variables should be 1-dimensional.

  • compute_offsets_method (str, optional(keyword-only)) –

    Method to compute offsets if not provided. Options are:
    • "zero", which sets the offsets to zero.

    • "logsum", which takes the logarithm of the sum of the counts in each row.

    Ignored if data["offsets"] is not None.

  • n_clusters (Iterable[int], optional(keyword-only)) – The numbers of clusters (or components in Kmeans) to test. Defaults to (2, 3, 4).

property n_clusters

Property representing the number of clusters of each model in the collection.

Returns:

The number of clusters.

Return type:

List[int]

fit(maxiter=400, lr=0.01, tol=1e-06, verbose=False)[source]

Fit each model in the collection.

Parameters:
  • maxiter (int, optional) – The maximum number of iterations to be done, by default 400.

  • lr (float, optional(keyword-only)) – The learning rate, by default 0.01.

  • tol (float, optional(keyword-only)) – The tolerance, by default 1e-6.

  • verbose (bool, optional(keyword-only)) – Whether to print verbose output, by default False.

Return type:

PlnMixtureCollection

Examples

>>> from pyPLNmodels import PlnMixtureCollection, load_scrna
>>> data = load_scrna()
>>> mixtures = PlnMixtureCollection(endog = data["endog"], n_clusters = [2,3,4])
>>> mixtures.fit()
property latent_means: Dict[int, Tensor]

Property representing the latent means, for each model in the collection.

Returns:

The latent means.

Return type:

Dict[int, torch.Tensor]

property latent_variances: Dict[int, Tensor]

Property representing the latent variances, for each model in the collection.

Returns:

The latent variances.

Return type:

Dict[int, torch.Tensor]

best_model(criterion='BIC')[source]

Get the best model according to the specified criterion.

Parameters:

criterion (str, optional) – The criterion to use (‘AIC’ or ‘BIC’), by default ‘BIC’.

Returns:

The best model.

Return type:

PlnMixture
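Selecting a model by criterion amounts to picking the grid value with the best score in a dictionary such as the ones returned by the BIC or AIC properties. The sketch below uses a hypothetical helper, `best_grid_value`, and made-up scores. Whether higher or lower values are better depends on the sign convention the library uses, so the direction is left as an explicit argument rather than assumed.

```python
def best_grid_value(scores, higher_is_better):
    """Return the key (number of clusters) with the best score.

    Hypothetical helper: the direction must match the library's convention.
    """
    pick = max if higher_is_better else min
    return pick(scores, key=scores.get)

# Made-up scores keyed by number of clusters.
bic = {2: -1520.4, 3: -1498.7, 4: -1503.2}

best_if_higher = best_grid_value(bic, higher_is_better=True)
best_if_lower = best_grid_value(bic, higher_is_better=False)
```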

Examples

>>> from pyPLNmodels import PlnMixtureCollection, load_scrna
>>> data = load_scrna()
>>> mixtures = PlnMixtureCollection(endog = data["endog"], n_clusters = [2,3,4])
>>> mixtures.fit()
>>> print(mixtures.best_model())
property WCSS: Dict[int, int]

Compute the Within-Cluster Sum of Squares on the latent positions for each model in the collection.

The lower the better, but increasing n_clusters generally decreases the metric, so a trade-off (for example, with the elbow method) must be applied.

Returns:

The WCSS scores of the models.

Return type:

Dict[int, float]
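For reference, the textbook Within-Cluster Sum of Squares is the sum of squared Euclidean distances between each point and the centroid of its cluster. The sketch below computes it in pure Python on made-up points. The function `wcss` is illustrative only; the library computes its score on latent positions and may use a different scaling or sign convention.

```python
def wcss(points, labels):
    """Textbook WCSS: squared distances to each cluster's centroid, summed."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # Centroid is the coordinate-wise mean of the cluster members.
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    return total

# Two well-separated pairs of points (made-up data).
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
one_cluster = wcss(points, [0, 0, 0, 0])   # 104.0
two_clusters = wcss(points, [0, 0, 1, 1])  # 4.0
```

Under this definition the value drops sharply once the number of clusters matches the structure of the data and flattens afterwards, which is what the elbow method looks for.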

property silhouette: Dict[int, int]

Compute the silhouette score on the latent_positions for each model in the collection. See sklearn.metrics.silhouette_score for more information.

The higher the better.

Returns:

The silhouette scores of the models.

Return type:

Dict[int, float]
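As a reminder of what the score measures, the silhouette coefficient of a point is (b - a) / max(a, b), where a is the mean distance to the other points of its own cluster and b is the smallest mean distance to the points of any other cluster. The pure-Python function `mean_silhouette` below is an illustrative sketch with Euclidean distance on made-up points; it is not the library's code, and it needs at least two clusters.

```python
import math

def mean_silhouette(points, labels):
    """Average silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's distance to itself is 0, hence the len(own) - 1).
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = mean_silhouette(points, [0, 0, 1, 1])  # clusters match the two groups
bad = mean_silhouette(points, [0, 1, 0, 1])   # clusters cut across the groups
```

The labelling that matches the two groups scores close to 1, while the labelling that cuts across them scores below 0.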

show(figsize=(15, 10))[source]

Show a plot with BIC scores, AIC scores, and negative log-likelihoods of the models. Also shows two clustering criteria.

Parameters:

figsize (tuple of two positive floats) – Size of the figure that will be created. Defaults to (15, 10).

property AIC: Dict[int, int]

Property representing the AIC scores of the models in the collection.

Returns:

The AIC scores of the models.

Return type:

Dict[int, float]

property BIC: Dict[int, int]

Property representing the BIC scores of the models in the collection.

Returns:

The BIC scores of the models.

Return type:

Dict[int, float]

property ICL: Dict[int, int]

Property representing the ICL scores of the models in the collection.

Returns:

The ICL scores of the models.

Return type:

Dict[int, float]

property coef: Dict[int, Tensor]

Property representing the coefficients of the collection.

Returns:

The coefficients.

Return type:

Dict[int, torch.Tensor]

property dim: int

Number of dimensions (i.e. variables) of the dataset.

property endog: Tensor

Property representing the endogenous variables (counts).

Returns:

The endogenous variables.

Return type:

torch.Tensor

property exog: Tensor

Property representing the exogenous variables (covariates).

Returns:

The exogenous variables or None if no covariates are given in the model.

Return type:

torch.Tensor or None

get(key, default)

Get the model with the specified key, or return a default value if the key does not exist.

Parameters:
  • key (Any) – The key to search for.

  • default (Any) – The default value to return if the key does not exist.

Returns:

The model with the specified key, or the default value if the key does not exist.

Return type:

Any

property grid: List[float]

Property representing the grid given in initialization.

Returns:

The grid.

Return type:

List[float]

items()

Get the key-value pairs of the models in the collection.

Returns:

The key-value pairs of the models.

Return type:

ItemsView

keys()

Get the grid of the collection.

Returns:

The grid of the collection.

Return type:

KeysView

property loglike: Dict[int, float]

Property representing the log-likelihoods of the models in the collection.

Returns:

The log-likelihoods of the models.

Return type:

Dict[int, float]

property n_samples

Number of samples in the dataset.

property nb_cov: int

The number of exogenous variables.

property offsets: Tensor

Property representing the offsets.

Returns:

The offsets.

Return type:

torch.Tensor

values()

Models in the collection as a list.

Returns:

The models in the collection.

Return type:

ValuesView
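The accessors get, items, keys and values follow the familiar dictionary pattern, keyed by grid value (here, the number of clusters). The sketch below uses a plain dict with placeholder strings as a stand-in for a fitted collection; it only illustrates the access pattern and is not the class itself. One difference to keep in mind: iterating over a dict yields keys, whereas the collection's `__iter__` is documented as iterating over the models.

```python
# Plain dict standing in for a fitted collection; the strings are
# placeholders for the fitted models.
models = {
    2: "model with 2 clusters",
    3: "model with 3 clusters",
    4: "model with 4 clusters",
}

grid = list(models.keys())        # [2, 3, 4]
has_three = 3 in models           # True
fallback = models.get(5, None)    # None, since 5 is not in the grid
pairs = list(models.items())      # (grid value, model) pairs
```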

List of methods and attributes

Public Data Attributes:

n_clusters

Property representing the number of clusters of each model in the collection.

latent_means

Property representing the latent means, for each model in the collection.

latent_variances

Property representing the latent variances, for each model in the collection.

WCSS

Compute the Within-Cluster Sum of Squares on the latent positions for each model in the collection.

silhouette

Compute the silhouette score on the latent_positions for each model in the collection.

Inherited from Collection

exog

Property representing the exogenous variables (covariates).

offsets

Property representing the offsets.

endog

Property representing the endogenous variables (counts).

n_samples

Number of samples in the dataset.

grid

Property representing the grid given in initialization.

coef

Property representing the coefficients of the collection.

dim

Number of dimensions (i.e. variables) of the dataset.

nb_cov

The number of exogenous variables.

BIC

Property representing the BIC scores of the models in the collection.

ICL

Property representing the ICL scores of the models in the collection.

AIC

Property representing the AIC scores of the models in the collection.

loglike

Property representing the log-likelihoods of the models in the collection.

PlnModel

Public Methods:

__init__(endog, *[, exog, offsets, ...])

Initializes the collection.

from_formula(formula, data, *[, ...])

Create an instance from a formula and data.

fit([maxiter, lr, tol, verbose])

Fit each model in the collection.

best_model([criterion])

Get the best model according to the specified criterion.

show([figsize])

Show a plot with BIC scores, AIC scores, and negative log-likelihoods of the models.

Inherited from Collection

__init__(endog, grid, *[, exog, offsets, ...])

Initializes the collection.

from_formula(formula, data, grid, *[, ...])

Create an instance from a formula and data.

values()

Models in the collection as a list.

items()

Get the key-value pairs of the models in the collection.

__getitem__(grid_value)

Model with the specified grid_value.

__len__()

Number of models in the collection.

__iter__()

Iterate over the models in the collection.

__contains__(grid_value)

Check if a model with the specified grid_value exists in the collection.

keys()

Get the grid of the collection.

get(key, default)

Get the model with the specified key, or return a default value if the key does not exist.

fit([maxiter, lr, tol, verbose])

Fit each model in the collection.

best_model([criterion])

Get the best model according to the specified criterion.

show([figsize])

Show a plot with BIC scores, AIC scores, and negative log-likelihoods of the models.

__repr__()

Return a string representation of the Collection object.

Private Data Attributes:

_grid_value_name

_abc_impl

Inherited from Collection

_useful_methods_strings

_useful_attributes_string

_abc_impl

Inherited from ABC

_abc_impl

Private Methods:

_instantiate_model(grid_value)

_init_next_model_with_current_model(...)

Initialize the next PlnModel model with the parameters of the current PlnModel model.

Inherited from Collection

_init_models(grid)

Method for initializing the models.

_is_right_instance(grid_value)

_set_column_names(model)

_instantiate_model(grid_value)

_print_beginning_message()

_init_next_model_with_current_model(...)

Initialize the next PlnModel model with the parameters of the current PlnModel model.

_print_ending_message()

_best_grid_value(criterion)