The function `PLNPCA()` produces a collection of models that are instances of the class `PLNPCAfit`.
This class comes with a set of methods, some of which are useful for the user: see the documentation for the methods inherited from `PLNfit` and the `plot()` methods for PCA visualization.

See also: the function `PLNPCA()` and the class `PLNPCAfamily`.

Superclass: `PLNmodels::PLNfit` -> `PLNPCAfit`

`rank`

the dimension of the current model

`vcov_model`

character: the model used for the residual covariance

`nb_param`

number of parameters in the current PLN model

`entropy`

entropy of the variational distribution

`latent_pos`

a matrix: values of the latent position vector (Z) without covariates effects or offset

`model_par`

a list with the matrices associated with the estimated parameters of the pPCA model: B (covariates), Sigma (covariance), Omega (precision) and C (loadings)

`percent_var`

the percent of variance explained by each axis

`corr_circle`

a matrix of correlations to plot the correlation circles

`scores`

a matrix of scores to plot the individual factor maps (a.k.a. principal components)

`rotation`

a matrix of rotation of the latent space

`eig`

description of the eigenvalues, similar to percent_var but for use with external methods

`var`

a list of data frames with PCA results for the variables: `coord` (coordinates of the variables), `cor` (correlation between variables and dimensions), `cos2` (squared cosine of the variables) and `contrib` (contributions of the variables to the axes)

`ind`

a list of data frames with PCA results for the individuals: `coord` (coordinates of the individuals), `cos2` (squared cosine of the individuals), `contrib` (contributions of individuals to an axis inertia) and `dist` (distance of individuals to the origin).

`call`

Hacky binding for compatibility with factoextra functions
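As a minimal sketch of how the fields listed above can be read from a fitted model, the snippet below fits a small collection on the `trichoptera` data shipped with the package and inspects a few of them. The object names `fits` and `fit` are illustrative, and `ranks = 1:3` is an assumption made only to keep the fit quick.

```r
library(PLNmodels)

# Fit a small collection of PLNPCA models (ranks 1 to 3) on the trichoptera data
data(trichoptera)
trichoptera <- prepare_data(trichoptera$Abundance, trichoptera$Covariate)
fits <- PLNPCA(Abundance ~ 1 + offset(log(Offset)), data = trichoptera, ranks = 1:3)

# Extract a single PLNPCAfit object from the collection
fit <- getBestModel(fits)

# Fields are read with `$`; they are values, not functions to call
fit$rank              # dimension of the latent space for this model
fit$nb_param          # number of parameters
fit$percent_var       # percent of variance explained by each axis
head(fit$scores)      # first rows of the individual scores
names(fit$model_par)  # names of the estimated parameter matrices
```

Because these are fields of an R6 object, they reflect the state of the fitted model at the time they are read.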

`new()`

Initialize a `PLNPCAfit` object

`PLNPCAfit$new(rank, responses, covariates, offsets, weights, formula, control)`

`rank`

rank of the PCA (or equivalently, dimension of the latent space)

`responses`

the matrix of responses (called Y in the model). Will usually be extracted from the corresponding field in `PLNfamily`

`covariates`

design matrix (called X in the model). Will usually be extracted from the corresponding field in `PLNfamily`

`offsets`

offset matrix (called O in the model). Will usually be extracted from the corresponding field in `PLNfamily`

`weights`

an optional vector of observation weights to be used in the fitting process.

`formula`

model formula used for fitting, extracted from the formula in the upper-level call

`control`

a list for controlling the optimization. See details.

`update()`

Update a `PLNPCAfit` object

```
PLNPCAfit$update(
B = NA,
Sigma = NA,
Omega = NA,
C = NA,
M = NA,
S = NA,
Z = NA,
A = NA,
Ji = NA,
R2 = NA,
monitoring = NA
)
```

`B`

matrix of regression coefficients

`Sigma`

variance-covariance matrix of the latent variables

`Omega`

precision matrix of the latent variables. Inverse of Sigma.

`C`

matrix of PCA loadings (in the latent space)

`M`

matrix of mean vectors for the variational approximation

`S`

matrix of variance vectors for the variational approximation

`Z`

matrix of latent vectors (includes covariates and offset effects)

`A`

matrix of fitted values

`Ji`

vector of variational lower bounds of the log-likelihoods (one value per sample)

`R2`

approximate R^2 goodness-of-fit criterion

`monitoring`

a list with optimization monitoring quantities

`optimize()`

Call to the C++ optimizer and update of the relevant fields

`responses`

the matrix of responses (called Y in the model). Will usually be extracted from the corresponding field in `PLNfamily`

`covariates`

design matrix (called X in the model). Will usually be extracted from the corresponding field in `PLNfamily`

`offsets`

offset matrix (called O in the model). Will usually be extracted from the corresponding field in `PLNfamily`

`weights`

an optional vector of observation weights to be used in the fitting process.

`config`

part of the `control` argument which configures the optimizer

`optimize_vestep()`

Result of one call to the VE step of the optimization procedure: optimal variational parameters (M, S) and corresponding log likelihood values for fixed model parameters (C, B). Intended to position new data in the latent space for further use with PCA.

```
PLNPCAfit$optimize_vestep(
covariates,
offsets,
responses,
weights = rep(1, self$n),
control = PLNPCA_param(backend = "nlopt")
)
```

`covariates`

design matrix (called X in the model). Will usually be extracted from the corresponding field in `PLNfamily`

`offsets`

offset matrix (called O in the model). Will usually be extracted from the corresponding field in `PLNfamily`

`responses`

the matrix of responses (called Y in the model). Will usually be extracted from the corresponding field in `PLNfamily`

`weights`

an optional vector of observation weights to be used in the fitting process.

`control`

a list for controlling the optimization. See details.

`project()`

Project new samples into the PCA space using one VE step

`PLNPCAfit$project(newdata, control = PLNPCA_param(), envir = parent.frame())`

`newdata`

A data frame in which to look for variables, offsets and counts with which to predict.

`control`

a list for controlling the optimization. See `PLN()` for details.

`envir`

Environment in which the projection is evaluated

`postTreatment()`

Update R2, fisher, std_err fields and set up visualization

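`responses`

the matrix of responses (called Y in the model). Will usually be extracted from the corresponding field in `PLNfamily`

`covariates`

design matrix (called X in the model). Will usually be extracted from the corresponding field in `PLNfamily`

`offsets`

offset matrix (called O in the model). Will usually be extracted from the corresponding field in `PLNfamily`

`weights`

an optional vector of observation weights to be used in the fitting process.

`config`

part of the `control` argument which configures the optimizer

`nullModel`

null model used for approximate R2 computations. Defaults to a GLM model with the same design matrix but no latent variable.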

The list of parameters `config` controls the post-treatment processing, with the following entries:

`jackknife`: boolean indicating whether jackknife should be performed to evaluate bias and variance of the model parameters. Default is FALSE.

`bootstrap`: integer indicating the number of bootstrap resamples generated to evaluate the variance of the model parameters. Default is 0 (inactivated).

`variational_var`: boolean indicating whether the variational Fisher information matrix should be computed to estimate the variance of the model parameters (highly underestimated). Default is FALSE.

`rsquared`: boolean indicating whether an approximation of R2 based on deviance should be computed. Default is TRUE.

`trace`: integer for verbosity. Should be > 1 to see output in post-treatments.
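For illustration, the entries above can be collected in a plain named list. This is only a sketch: the name `post_config` is hypothetical, the values are the documented defaults except for `trace`, which is set to 2 here as an arbitrary choice so that post-treatment output would be visible, and how the list is handed to `postTreatment()` through the `control` argument is not shown.

```r
# Hypothetical post-treatment configuration built from the documented entries
post_config <- list(
  jackknife       = FALSE,  # no jackknife estimate of bias and variance
  bootstrap       = 0,      # 0 bootstrap resamples, i.e. inactivated
  variational_var = FALSE,  # skip the variational Fisher information matrix
  rsquared        = TRUE,   # compute the deviance-based approximation of R2
  trace           = 2       # arbitrary value > 1, chosen for illustration
)

# Inspect the structure of the list
str(post_config)
```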

`plot_individual_map()`

Plot the factorial map of the PCA

```
PLNPCAfit$plot_individual_map(
axes = 1:min(2, self$rank),
main = "Individual Factor Map",
plot = TRUE,
cols = "default"
)
```

`axes`

numeric, the axes to use for the plot. Default is `1:min(2, rank)`.

`main`

character. A title for the plot. Default is "Individual Factor Map".

`plot`

logical. Should the plot be displayed or sent back as a ggplot object?

`cols`

a character, factor or numeric to define the color associated with the individuals. By default, all individuals receive the default color of the current palette.

`plot_correlation_circle()`

Plot the correlation circle for the specified axes of a `PLNPCAfit` object

```
PLNPCAfit$plot_correlation_circle(
axes = 1:min(2, self$rank),
main = "Variable Factor Map",
cols = "default",
plot = TRUE
)
```

`axes`

numeric, the axes to use for the plot. Default is `1:min(2, rank)`.

`main`

character. A title for the plot. Default is "Variable Factor Map".

`cols`

a character, factor or numeric to define the color associated with the variables. By default, all variables receive the default color of the current palette.

`plot`

logical. Should the plot be displayed or sent back as a ggplot object?

`plot_PCA()`

Plot a summary of the `PLNPCAfit` object

```
PLNPCAfit$plot_PCA(
nb_axes = min(3, self$rank),
ind_cols = "ind_cols",
var_cols = "var_cols",
plot = TRUE
)
```

`nb_axes`

scalar: the number of axes to be considered. The default is `min(3, rank)`.

`ind_cols`

a character, factor or numeric to define the color associated with the individuals. By default, all individuals receive the default color of the current palette.

`var_cols`

a character, factor or numeric to define the color associated with the variables. By default, all variables receive the default color of the current palette.

`plot`

logical. Should the plot be displayed or sent back as a ggplot object?
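The plotting methods above might be used as in the following sketch. It assumes a single model of rank 3 (an arbitrary choice so that at least two axes exist) and sets `plot = FALSE` so that the plot is sent back rather than displayed; the names `fit`, `p_ind` and `p_var` are illustrative.

```r
library(PLNmodels)

# Fit a single rank-3 model so that two axes are available for plotting
data(trichoptera)
trichoptera <- prepare_data(trichoptera$Abundance, trichoptera$Covariate)
fits <- PLNPCA(Abundance ~ 1 + offset(log(Offset)), data = trichoptera, ranks = 3)
fit <- getBestModel(fits)

# Individual factor map on the first two axes, returned instead of displayed
p_ind <- fit$plot_individual_map(axes = 1:2, plot = FALSE)

# Correlation circle on the same axes, also returned instead of displayed
p_var <- fit$plot_correlation_circle(axes = 1:2, plot = FALSE)

# The returned objects can be printed later to draw them
print(p_ind)
print(p_var)
```

Returning the objects rather than displaying them immediately makes it possible to customize or save them before drawing.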

```
library(PLNmodels)
data(trichoptera)
trichoptera <- prepare_data(trichoptera$Abundance, trichoptera$Covariate)
myPCAs <- PLNPCA(Abundance ~ 1 + offset(log(Offset)), data = trichoptera, ranks = 1:5)
#>
#> Initialization...
#>
#> Adjusting 5 PLN models for PCA analysis.
#> Rank approximation = 3
#> Rank approximation = 2
#> Rank approximation = 4
#> Rank approximation = 1
#> Rank approximation = 5
#> Post-treatments
#> DONE!
myPCA <- getBestModel(myPCAs)
class(myPCA)
#> [1] "PLNPCAfit" "PLNfit" "PCA" "R6"
print(myPCA)
#> Poisson Lognormal with rank constrained for PCA (rank = 3)
#> ==================================================================
#> nb_param loglik BIC ICL
#> 65 -641.638 -768.122 -824.467
#> ==================================================================
#> * Useful fields
#> $model_par, $latent, $latent_pos, $var_par, $optim_par
#> $loglik, $BIC, $ICL, $loglik_vec, $nb_param, $criteria
#> * Useful S3 methods
#> print(), coef(), sigma(), vcov(), fitted()
#> predict(), predict_cond(), standard_error()
#> * Additional fields for PCA
#> $percent_var, $corr_circle, $scores, $rotation, $eig, $var, $ind
#> * Additional S3 methods for PCA
#> plot.PLNPCAfit()
```