PlnMixtureCollection

For an in-depth tutorial on the PlnMixtureCollection model, see the clustering tutorial.

PlnMixtureCollection Documentation

class pyPLNmodels.PlnMixtureCollection(endog, *, exog=None, offsets=None, compute_offsets_method='zero', add_const=False, n_clusters=(2, 3, 4))[source]

A collection of PlnMixture models, each with a different number of clusters. For more details, see: J. Chiquet, M. Mariadassou, S. Robin: “The Poisson-Lognormal Model as a Versatile Framework for the Joint Analysis of Species Abundances.”

Examples

>>> from pyPLNmodels import PlnMixtureCollection, load_scrna
>>> data = load_scrna()
>>> mixtures = PlnMixtureCollection.from_formula("endog ~ 0", data = data, n_clusters = [2,3,4])
>>> mixtures.fit()
>>> print(mixtures)
>>> mixtures.show()
>>> print(mixtures.best_model())
>>> print(mixtures[3])

Examples

>>> from pyPLNmodels import PlnMixtureCollection, load_scrna
>>> data = load_scrna()
>>> mixtures = PlnMixtureCollection(endog = data["endog"], n_clusters = [2,3,4])
>>> mixtures.fit()
>>> print(mixtures.best_model())

classmethod from_formula(formula, data, *, compute_offsets_method='zero', n_clusters=(2, 3, 4))[source]

Create an instance from a formula and data.

Parameters:

formula (str) – The formula.
data (dict) – The data dictionary. Each value can be either a torch.Tensor, np.ndarray, pd.DataFrame or pd.Series. The categorical exogenous variables should be 1-dimensional.
compute_offsets_method (str, optional(keyword-only)) –
Method to compute offsets if not provided. Options are:
- "zero", which sets the offsets to zero.
- "logsum", which takes the logarithm of the sum (per row) of the counts.
Ignored if data["offsets"] is not None.
n_clusters (Iterable[int], optional(keyword-only)) – The numbers of clusters (or components in Kmeans) to be tested, by default (2, 3, 4).
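
To illustrate the two compute_offsets_method options, here is a minimal pure-Python sketch. The helper name `compute_offsets` is hypothetical and not part of pyPLNmodels, and the choice to repeat the per-row log-sum across every column is an assumption; the library's tensor-based implementation may differ.

```python
import math


def compute_offsets(counts, method="zero"):
    """Hypothetical helper (not the pyPLNmodels API) mimicking the documented options."""
    if method == "zero":
        # Every offset is set to zero.
        return [[0.0 for _ in row] for row in counts]
    if method == "logsum":
        # Log of the row total, repeated for each column (assumed layout).
        return [[math.log(sum(row))] * len(row) for row in counts]
    raise ValueError(f"Unknown method: {method!r}")


# Toy count matrix: 2 samples (rows), 3 variables (columns).
counts = [[1, 2, 3], [0, 4, 6]]
offsets_zero = compute_offsets(counts, "zero")
offsets_logsum = compute_offsets(counts, "logsum")
```

Note that this sketch would fail on a row whose counts sum to zero, since the logarithm of zero is undefined.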

property n_clusters

Property representing the number of clusters of each model in the collection.

Returns:: The number of clusters.
Return type:: List[int]

fit(maxiter=400, lr=0.01, tol=1e-06, verbose=False)[source]

Fit each model in the collection.

Parameters:

maxiter (int, optional) – The maximum number of iterations to be done, by default 400.
lr (float, optional(keyword-only)) – The learning rate, by default 0.01.
tol (float, optional(keyword-only)) – The tolerance, by default 1e-6.
verbose (bool, optional(keyword-only)) – Whether to print verbose output, by default False.

Return type:

PlnMixtureCollection

Examples

>>> from pyPLNmodels import PlnMixtureCollection, load_scrna
>>> data = load_scrna()
>>> mixtures = PlnMixtureCollection(endog = data["endog"], n_clusters = [2,3,4])
>>> mixtures.fit()

property latent_means: Dict[int, Tensor]

Property representing the latent means, for each model in the collection.

Returns:: The latent means.
Return type:: Dict[int, torch.Tensor]

property latent_variances: Dict[int, Tensor]

Property representing the latent variances, for each model in the collection.

Returns:: The latent variances.
Return type:: Dict[int, torch.Tensor]

best_model(criterion='BIC')[source]

Get the best model according to the specified criterion.

Parameters:: criterion (str, optional) – The criterion to use (‘AIC’, ‘BIC’ or ‘silhouette’), by default ‘BIC’.
Return type:: PlnMixture

Examples

>>> from pyPLNmodels import PlnMixtureCollection, load_scrna
>>> data = load_scrna()
>>> mixtures = PlnMixtureCollection(endog = data["endog"], n_clusters = [2,3,4])
>>> mixtures.fit()
>>> print(mixtures.best_model())

Returns:: The best model.
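
Selecting a best model amounts to picking the key (number of clusters) with the best score under the chosen criterion. The sketch below is only an illustration: `pick_best` is a hypothetical helper, the scores are made-up values, and it assumes information criteria are minimized while the silhouette score is maximized. Check the sign convention used by the library before relying on this.

```python
def pick_best(scores, maximize=False):
    """Hypothetical helper: return the key whose score is best."""
    chooser = max if maximize else min
    return chooser(scores, key=scores.get)


# Made-up scores keyed by number of clusters (for illustration only).
bic = {2: 1520.3, 3: 1498.7, 4: 1503.1}
silhouette = {2: 0.55, 3: 0.41, 4: 0.48}

best_by_bic = pick_best(bic)  # assumes a lower BIC is better
best_by_silhouette = pick_best(silhouette, maximize=True)
```

With these values the two criteria disagree (3 clusters versus 2), which is why the choice of criterion matters.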

property WCSS: Dict[int, int]

Compute the Within-Cluster Sum of Squares on the latent positions for each model in the collection.

The lower the better, but increasing n_clusters typically decreases the metric, so a trade-off (with the elbow method, for example) must be applied.

Returns:: The WCSS scores of the models.
Return type:: Dict[int, float]
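
As a rough illustration of what a within-cluster sum of squares measures, the pure-Python sketch below sums the squared distances from each point to its cluster centroid. The function `wcss` and the toy data are assumptions made for this example; the library computes the metric on the latent positions of each fitted model.

```python
def wcss(points, labels):
    """Within-cluster sum of squared distances to each cluster centroid."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # Centroid = coordinate-wise mean of the cluster members.
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    return total


# Two well-separated groups of toy 2-D points.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
wcss_one_cluster = wcss(points, [0, 0, 0, 0])
wcss_two_clusters = wcss(points, [0, 0, 1, 1])
```

On this toy data, splitting the points into their two natural groups brings the value down from 104 to 4.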

property silhouette: Dict[int, int]

Compute the silhouette score on the latent_positions for each model in the collection. See sklearn.metrics.silhouette_score for more information.

The higher the better.

Returns:: The silhouette scores of the models.
Return type:: Dict[int, float]
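
For intuition, the silhouette coefficient of a point compares its mean distance to its own cluster (a) with its mean distance to the nearest other cluster (b), as (b - a) / max(a, b). The following is a minimal pure-Python sketch with Euclidean distance; the function and the toy data are assumptions for illustration and not the code path used by the library.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient with Euclidean distance (needs >= 2 clusters)."""
    n = len(points)
    scores = []
    for i in range(n):
        # Accumulate distances from point i to the members of each cluster.
        sums, counts = {}, {}
        for j in range(n):
            if i == j:
                continue
            d = math.dist(points[i], points[j])
            sums[labels[j]] = sums.get(labels[j], 0.0) + d
            counts[labels[j]] = counts.get(labels[j], 0) + 1
        own = labels[i]
        if own not in counts:
            # Singleton cluster: the coefficient is 0 by convention.
            scores.append(0.0)
            continue
        a = sums[own] / counts[own]
        b = min(sums[c] / counts[c] for c in sums if c != own)
        scores.append((b - a) / max(a, b))
    return sum(scores) / n


# Toy 1-D points forming two obvious groups.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
good = silhouette_score(points, [0, 0, 1, 1])  # labels match the groups
bad = silhouette_score(points, [0, 1, 0, 1])   # labels cut across the groups
```

The labeling that matches the two groups scores close to 0.9, while the labeling that cuts across them scores below zero.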

show(figsize=(15, 10))[source]

Show a plot with BIC scores, AIC scores, and negative log-likelihoods of the models. Also shows two clustering criteria.

Parameters:: figsize (tuple of two positive floats) – Size of the figure that will be created, by default (15, 10).

property AIC: Dict[int, int]

Property representing the AIC scores of the models in the collection.

Returns:: The AIC scores of the models.
Return type:: Dict[int, float]

property BIC: Dict[int, int]

Property representing the BIC scores of the models in the collection.

Returns:: The BIC scores of the models.
Return type:: Dict[int, float]

property ICL: Dict[int, int]

Property representing the ICL scores of the models in the collection.

Returns:: The ICL scores of the models.
Return type:: Dict[int, float]

property coef: Dict[int, Tensor]

Property representing the coefficients of the collection.

Returns:: The coefficients.
Return type:: Dict[int, torch.Tensor]

property dim: int: Number of dimensions (i.e. variables) of the dataset.

property endog: Tensor

Property representing the endogenous variables (counts).

Returns:: The endogenous variables.
Return type:: torch.Tensor

property exog: Tensor

Property representing the exogenous variables (covariates).

Returns:: The exogenous variables or None if no covariates are given in the model.
Return type:: torch.Tensor or None

get(key, default)

Get the model with the specified key, or return a default value if the key does not exist.

Parameters:

key (Any) – The key to search for.
default (Any) – The default value to return if the key does not exist.

Returns:

The model with the specified key, or the default value if the key does not exist.

Return type:

Any

property grid: List[float]

Property representing the grid given in initialization.

Returns:: The grid.
Return type:: List[float]

items()

Get the key-value pairs of the models in the collection.

Returns:: The key-value pairs of the models.
Return type:: ItemsView

keys()

Get the grid of the collection.

Returns:: The grid of the collection.
Return type:: KeysView

property loglike: Dict[int, float]

Property representing the log-likelihoods of the models in the collection.

Returns:: The log-likelihoods of the models.
Return type:: Dict[int, float]

property n_samples: Number of samples in the dataset.

property nb_cov: int: The number of exogenous variables.

property offsets: Tensor

Property representing the offsets.

Returns:: The offsets.
Return type:: torch.Tensor

values()

Models in the collection as a list.

Returns:: The models in the collection.
Return type:: ValuesView
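
The accessors get, keys, items and values follow the usual dictionary conventions. The sketch below uses a hypothetical stand-in class, `ToyCollection`, with strings in place of fitted models, purely to show the expected behavior; it is not the library's implementation.

```python
class ToyCollection:
    """Hypothetical stand-in exposing dictionary-style accessors."""

    def __init__(self, models):
        self._models = dict(models)

    def __getitem__(self, key):
        return self._models[key]

    def __contains__(self, key):
        return key in self._models

    def get(self, key, default):
        # Return the model for `key`, or `default` if the key is absent.
        return self._models.get(key, default)

    def keys(self):
        return self._models.keys()

    def items(self):
        return self._models.items()

    def values(self):
        return self._models.values()


toy = ToyCollection({2: "model with 2 clusters", 3: "model with 3 clusters"})
found = toy.get(3, None)
missing = toy.get(7, "no such model")
grid = list(toy.keys())
```

Here `toy.get(7, "no such model")` returns the default instead of raising, whereas `toy[7]` would raise a KeyError.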

List of methods and attributes

Public Data Attributes:

`n_clusters`	Property representing the number of clusters of each model in the collection.
`latent_means`	Property representing the latent means, for each model in the collection.
`latent_variances`	Property representing the latent variances, for each model in the collection.
`WCSS`	Compute the Within-Cluster Sum of Squares on the latent positions for each model in the collection.
`silhouette`	Compute the silhouette score on the latent_positions for each model in the collection.

Inherited from Collection

`exog`	Property representing the exogenous variables (covariates).
`offsets`	Property representing the offsets.
`endog`	Property representing the endogenous variables (counts).
`n_samples`	Number of samples in the dataset.
`grid`	Property representing the grid given in initialization.
`coef`	Property representing the coefficients of the collection.
`dim`	Number of dimensions (i.e. variables) of the dataset.
`nb_cov`	The number of exogenous variables.
`BIC`	Property representing the BIC scores of the models in the collection.
`ICL`	Property representing the ICL scores of the models in the collection.
`AIC`	Property representing the AIC scores of the models in the collection.
`loglike`	Property representing the log-likelihoods of the models in the collection.
`PlnModel`

Public Methods:

`__init__`(endog, *[, exog, offsets, ...])	Initializes the collection.
`from_formula`(formula, data, *[, ...])	Create an instance from a formula and data.
`fit`([maxiter, lr, tol, verbose])	Fit each model in the collection.
`best_model`([criterion])	Get the best model according to the specified criterion.
`show`([figsize])	Show a plot with BIC scores, AIC scores, and negative log-likelihoods of the models.

Inherited from Collection

`__init__`(endog, grid, *[, exog, offsets, ...])	Initializes the collection.
`from_formula`(formula, data, grid, *[, ...])	Create an instance from a formula and data.
`values`()	Models in the collection as a list.
`items`()	Get the key-value pairs of the models in the collection.
`__getitem__`(grid_value)	Model with the specified grid_value.
`__len__`()	Number of models in the collection.
`__iter__`()	Iterate over the models in the collection.
`__contains__`(grid_value)	Check if a model with the specified grid_value exists in the collection.
`keys`()	Get the grid of the collection.
`get`(key, default)	Get the model with the specified key, or return a default value if the key does not exist.
`fit`([maxiter, lr, tol, verbose])	Fit each model in the collection.
`best_model`([criterion])	Get the best model according to the specified criterion.
`show`([figsize])	Show a plot with BIC scores, AIC scores, and negative log-likelihoods of the models.
`__repr__`()	Return a string representation of the Collection object.

Private Data Attributes:

`_grid_value_name`
`_abc_impl`

Inherited from Collection

`_useful_methods_strings`
`_useful_attributes_string`
`_name`
`_abc_impl`

Inherited from ABC

_abc_impl

Private Methods:

`_instantiate_model`(grid_value)
`_init_next_model_with_current_model`(...)	Initialize the next PlnModel model with the parameters of the current PlnModel model.
`_best_grid_value`(criterion)

Inherited from Collection

`_init_models`(grid)	Method for initializing the models.
`_is_right_instance`(grid_value)
`_set_column_names`(model)
`_instantiate_model`(grid_value)
`_print_beginning_message`()
`_init_next_model_with_current_model`(...)	Initialize the next PlnModel model with the parameters of the current PlnModel model.
`_print_ending_message`()
`_best_grid_value`(criterion)