ZIPlnPCACollection
ZIPlnPCACollection Documentation
- class pyPLNmodels.ZIPlnPCACollection(endog, *, exog=None, exog_inflation=None, offsets=None, compute_offsets_method='zero', add_const=True, add_const_inflation=True, ranks=(3, 5), use_closed_form_prob=True)[source]
A collection of ZIPlnPCA models, each with a different number of components. The number of components can also be referred to as the number of PCs, or the rank of the covariance matrix. For more details, see B. Bricout ?
Examples
>>> from pyPLNmodels import ZIPlnPCACollection, load_scrna
>>> data = load_scrna()
>>> zipcas = ZIPlnPCACollection.from_formula("endog ~ 1", data=data, ranks=[5, 8, 12])
>>> zipcas.fit()
>>> print(zipcas)
>>> zipcas.show()
>>> print(zipcas.best_model())
>>> print(zipcas[5])
See also
ZIPlnPCA, pyPLNmodels.ZIPlnPCACollection.from_formula(), pyPLNmodels.ZIPlnPCACollection.__init__(), PlnNetworkCollection, PlnPCACollection, PlnMixtureCollection
- Parameters:
endog (Tensor | ndarray | DataFrame | None)
exog (Tensor | ndarray | DataFrame | Series | None)
exog_inflation (Tensor | ndarray | DataFrame | Series | None)
offsets (Tensor | ndarray | DataFrame | None)
compute_offsets_method ({'logsum', 'zero'})
add_const (bool)
add_const_inflation (bool)
ranks (Iterable[int] | None)
use_closed_form_prob (bool)
- __init__(endog, *, exog=None, exog_inflation=None, offsets=None, compute_offsets_method='zero', add_const=True, add_const_inflation=True, ranks=(3, 5), use_closed_form_prob=True)[source]
Initializes the collection.
- Parameters:
endog (Union[torch.Tensor, np.ndarray, pd.DataFrame]) – The count data.
exog (Union[torch.Tensor, np.ndarray, pd.DataFrame], optional(keyword-only)) – The covariate data. Defaults to None.
offsets (Union[torch.Tensor, np.ndarray, pd.DataFrame], optional(keyword-only)) – The offsets data. Defaults to None.
compute_offsets_method (str, optional(keyword-only)) –
- Method to compute offsets if not provided. Options are:
"zero", which sets the offsets to zero.
"logsum", which takes the logarithm of the sum (per row) of the counts.
Ignored if offsets is not None.
add_const (bool, optional(keyword-only)) – Whether to add a column of ones to exog. Defaults to True.
ranks (Iterable[int], optional(keyword-only)) – The ranks (numbers of PCs) to fit, one model per rank. Defaults to (3, 5).
exog_inflation (Tensor | ndarray | DataFrame | Series | None)
add_const_inflation (bool)
use_closed_form_prob (bool)
Examples
>>> from pyPLNmodels import ZIPlnPCACollection, load_scrna
>>> data = load_scrna()
>>> zipcas = ZIPlnPCACollection(endog=data["endog"], ranks=[4, 6, 8])
>>> zipcas.fit()
>>> print(zipcas.best_model())
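The "logsum" option described for compute_offsets_method takes, for each row of the count matrix, the logarithm of the row total. A minimal pure-Python sketch of that rule follows. The helper logsum_offsets is hypothetical and written only for illustration; it is not a pyPLNmodels function, and this page does not say how the library shapes or broadcasts the resulting offsets.

```python
import math

def logsum_offsets(counts):
    """Return log(row total) for each row of a count matrix (list of lists).

    Hypothetical helper illustrating the "logsum" rule; not part of pyPLNmodels.
    """
    offsets = []
    for row in counts:
        total = sum(row)
        if total <= 0:
            # log(0) is undefined, so an all-zero row has no finite offset.
            raise ValueError("each row needs a positive total count")
        offsets.append(math.log(total))
    return offsets

# Toy counts, made up for this example: 2 samples, 3 variables.
toy_counts = [[1, 2, 3], [0, 4, 0]]
offsets = logsum_offsets(toy_counts)
```

With "zero", by contrast, every offset would simply be 0, so the counts are modelled without any per-sample scaling.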
- classmethod from_formula(formula, data, *, compute_offsets_method='zero', ranks=(3, 5), use_closed_form_prob=True)[source]
Create an instance from a formula and data.
- Parameters:
formula (str) – The formula.
data (dict) – The data dictionary. Each value can be either a torch.Tensor, np.ndarray, pd.DataFrame or pd.Series. The categorical exogenous variables should be 1-dimensional.
compute_offsets_method (str, optional(keyword-only)) –
- Method to compute offsets if not provided. Options are:
"zero", which sets the offsets to zero.
"logsum", which takes the logarithm of the sum (per row) of the counts.
Ignored if data["offsets"] is not None.
ranks (Iterable[int], optional(keyword-only)) – The ranks (numbers of PCs) to be tested. Defaults to (3, 5).
use_closed_form_prob (bool)
- fit(maxiter=400, lr=0.01, tol=1e-06, verbose=False)[source]
Fit each model in the collection.
- Parameters:
maxiter (int, optional) – The maximum number of iterations to be done, by default 400.
lr (float, optional(keyword-only)) – The learning rate, by default 0.01.
tol (float, optional(keyword-only)) – The tolerance, by default 1e-6.
verbose (bool, optional(keyword-only)) – Whether to print verbose output, by default False.
- Return type:
Collection
Examples
>>> from pyPLNmodels import ZIPlnPCACollection, load_scrna
>>> data = load_scrna()
>>> zipcas = ZIPlnPCACollection(endog=data["endog"], ranks=[4, 6, 8])
>>> zipcas.fit()
- property coef_inflation: Dict[int, Tensor]
Property representing the coef_inflation, for each model in the collection.
- Returns:
The coef inflation for each model.
- Return type:
Dict[int, torch.Tensor]
- property latent_prob: Dict[int, Tensor]
Property representing the latent_prob, for each model in the collection.
- Returns:
The latent_prob for each model.
- Return type:
Dict[int, torch.Tensor]
- best_model(criterion='BIC')[source]
Get the best model according to the specified criterion.
- Parameters:
criterion (str, optional) – The criterion to use (‘AIC’ or ‘BIC’), by default ‘BIC’.
- Returns:
The best model.
- Return type:
Any
Examples
>>> from pyPLNmodels import ZIPlnPCACollection, load_scrna
>>> data = load_scrna()
>>> zipcas = ZIPlnPCACollection(endog=data["endog"], ranks=[4, 6, 8])
>>> zipcas.fit()
>>> print(zipcas.best_model())
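Choosing a model by criterion amounts to comparing one score per rank. The sketch below works on a plain dict shaped like the AIC and BIC properties (rank mapped to score). It assumes that a lower score is better, which this page does not state. Both pick_best_rank and the scores are hypothetical, made up for illustration, and this is not the library's implementation of best_model.

```python
def pick_best_rank(scores, lower_is_better=True):
    """Return the rank (dict key) whose score is best.

    ASSUMPTION: lower is better by default; pass lower_is_better=False
    if the criterion in use follows the opposite convention.
    """
    if not scores:
        raise ValueError("scores must contain at least one rank")
    chooser = min if lower_is_better else max
    # Compare ranks by their associated score.
    return chooser(scores, key=scores.get)

# Made-up scores for ranks 4, 6 and 8 (not real results).
toy_bic = {4: 1520.3, 6: 1498.7, 8: 1510.2}
best_rank = pick_best_rank(toy_bic)
```

In practice, calling best_model(criterion="AIC") or best_model(criterion="BIC") on a fitted collection returns the model itself rather than its rank.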
- property nb_cov_inflation: int
The number of exogenous variables for the inflation part.
- property AIC: Dict[int, int]
Property representing the AIC scores of the models in the collection.
- Returns:
The AIC scores of the models.
- Return type:
Dict[int, float]
- property BIC: Dict[int, int]
Property representing the BIC scores of the models in the collection.
- Returns:
The BIC scores of the models.
- Return type:
Dict[int, float]
- property ICL: Dict[int, int]
Property representing the ICL scores of the models in the collection.
- Returns:
The ICL scores of the models.
- Return type:
Dict[int, float]
- property coef: Dict[int, Tensor]
Property representing the coefficients of the collection.
- Returns:
The coefficients.
- Return type:
Dict[float, torch.Tensor]
- property components: Dict[float, Tensor]
Property representing the components of each model in the collection.
- Returns:
The components.
- Return type:
Dict[int, torch.Tensor]
- property dim: int
Number of dimensions (i.e. variables) of the dataset.
- property endog: Tensor
Property representing the endogenous variables (counts).
- Returns:
The endogenous variables.
- Return type:
torch.Tensor
- property exog: Tensor
Property representing the exogenous variables (covariates).
- Returns:
The exogenous variables or None if no covariates are given in the model.
- Return type:
torch.Tensor or None
- get(key, default)
Get the model with the specified key, or return a default value if the key does not exist.
- Parameters:
key (Any) – The key to search for.
default (Any) – The default value to return if the key does not exist.
- Returns:
The model with the specified key, or the default value if the key does not exist.
- Return type:
Any
- property grid: List[float]
Property representing the grid given in initialization.
- Returns:
The grid.
- Return type:
List[float]
- items()
Get the key-value pairs of the models in the collection.
- Returns:
The key-value pairs of the models.
- Return type:
ItemsView
- keys()
Get the grid of the collection.
- Returns:
The grid of the collection.
- Return type:
KeysView
- property latent_mean: Dict[int, Tensor]
Property representing the latent mean, for each model in the collection.
- Returns:
The latent means.
- Return type:
Dict[int, torch.Tensor]
- property latent_variance: Dict[int, Tensor]
Property representing the latent variance, for each model in the collection.
- Returns:
The latent variances.
- Return type:
Dict[int, torch.Tensor]
- property loglike: Dict[int, float]
Property representing the log-likelihoods of the models in the collection.
- Returns:
The log-likelihoods of the models.
- Return type:
Dict[int, float]
- property n_samples
Number of samples in the dataset.
- property nb_cov: int
The number of exogenous variables.
- property offsets: Tensor
Property representing the offsets.
- Returns:
The offsets.
- Return type:
torch.Tensor
- property ranks
Property representing the ranks (of the covariance matrix) of each model in the collection.
- Returns:
The ranks.
- Return type:
List[int]
- show(figsize=(10, 10))
Show a plot with BIC scores, AIC scores, and negative log-likelihoods of the models. Also show the explained variance percentage.
- Parameters:
figsize (tuple of two positive floats) – Size of the figure that will be created. Defaults to (10, 10).
- values()
Models in the collection as a list.
- Returns:
The models in the collection.
- Return type:
ValuesView
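get, items, keys and values follow the standard Python mapping interface, with ranks as keys and models as values. The sketch below uses a plain dict as a stand-in for a collection so that it runs without pyPLNmodels; the string values are placeholders for model objects, not real output.

```python
# Stand-in for a collection: rank -> model (placeholder strings here).
stand_in = {4: "model with rank 4", 6: "model with rank 6", 8: "model with rank 8"}

# keys() gives the grid of ranks.
ranks = list(stand_in.keys())

# get() returns the fallback when the key is absent (no rank 10 here).
missing = stand_in.get(10, None)

# items() pairs each rank with its model.
pairs = [(rank, model) for rank, model in stand_in.items()]

# values() gives the models only.
n_models = len(list(stand_in.values()))
```

Note that the get signature on this page lists default without a default value, so it is safest to pass it explicitly.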
List of methods and attributes
Public Data Attributes:
coef_inflation – Property representing the coef_inflation, for each model in the collection.
latent_prob – Property representing the latent_prob, for each model in the collection.
nb_cov_inflation – The number of exogenous variables for the inflation part.
Inherited from PlnPCACollection
components – Property representing the components of each model in the collection.
ranks – Property representing the ranks (of the covariance matrix) of each model in the collection.
latent_mean – Property representing the latent mean, for each model in the collection.
latent_variance – Property representing the latent variance, for each model in the collection.
Inherited from Collection
exog – Property representing the exogenous variables (covariates).
offsets – Property representing the offsets.
endog – Property representing the endogenous variables (counts).
n_samples – Number of samples in the dataset.
grid – Property representing the grid given in initialization.
coef – Property representing the coefficients of the collection.
dim – Number of dimensions (i.e. variables) of the dataset.
nb_cov – The number of exogenous variables.
BIC – Property representing the BIC scores of the models in the collection.
ICL – Property representing the ICL scores of the models in the collection.
AIC – Property representing the AIC scores of the models in the collection.
loglike – Property representing the log-likelihoods of the models in the collection.
Public Methods:
__init__() – Initializes the collection.
from_formula() – Create an instance from a formula and data.
fit() – Fit each model in the collection.
best_model() – Get the best model according to the specified criterion.
Inherited from PlnPCACollection
__init__() – Initializes the collection.
from_formula() – Create an instance from a formula and data.
fit() – Fit each model in the collection.
best_model() – Get the best model according to the specified criterion.
show() – Show a plot with BIC scores, AIC scores, and negative log-likelihoods of the models.
Inherited from Collection
__init__() – Initializes the collection.
from_formula() – Create an instance from a formula and data.
values() – Models in the collection as a list.
items() – Get the key-value pairs of the models in the collection.
collection[grid_value] – Model with the specified grid_value.
len(collection) – Number of models in the collection.
iter(collection) – Iterate over the models in the collection.
grid_value in collection – Check if a model with the specified grid_value exists in the collection.
keys() – Get the grid of the collection.
get() – Get the model with the specified key, or return a default value if the key does not exist.
fit() – Fit each model in the collection.
best_model() – Get the best model according to the specified criterion.
show() – Show a plot with BIC scores, AIC scores, and negative log-likelihoods of the models.
print(collection) – Return a string representation of the Collection object.