Fits latent class models for data with multiple groups. The group variable itself may or may not be given a latent class structure.
glca(
formula,
group = NULL,
data = NULL,
nclass = 3,
ncluster = NULL,
std.err = TRUE,
measure.inv = TRUE,
coeff.inv = TRUE,
init.param = NULL,
n.init = 10,
decreasing = FALSE,
testiter = 50,
maxiter = 5000,
eps = 1e-06,
na.rm = FALSE,
seed = NULL,
verbose = TRUE
)
a formula for specifying manifest items and covariates using the "item" function.
an optional vector specifying the group to which each observation belongs. When a group variable is given, group covariates can be incorporated.
a data frame containing the manifest items, covariates, and group variable.
number of level-1 (individual-level) latent classes.
number of level-2 (group-level) latent classes. When group and ncluster (>1) are both given, a multilevel latent class model is fitted.
a logical value indicating whether to calculate standard errors for the estimates.
a logical value indicating whether measurement invariance is assumed across groups.
a logical value indicating whether coefficient invariance is assumed across groups (random intercept model).
a set of model parameters to be used as user-defined initial values for the EM algorithm. It should be a list of named parameters with the same structure as param in the glca output. By default, initial parameters are randomly generated.
number of randomly generated initial parameter sets, used to reduce the risk of converging to a local maximum.
a logical value indicating whether to reorder the parameters in descending order of the response probability for the first category of the first manifest item.
number of iterations in the EM algorithm for each initial parameter set. The initial parameter set that provides the largest log-likelihood will be selected for estimating the model.
maximum number of iterations for the EM algorithm.
a convergence tolerance value. When the largest absolute difference between the previous and current estimates is less than eps, the algorithm stops updating and convergence is considered to be reached.
a logical value indicating whether to delete rows that have at least one missing manifest item. If na.rm = FALSE, missing values are handled under the missing-at-random (MAR) assumption.
By default, the set of initial parameters is drawn randomly. Because the same value of seed guarantees that the same initial parameters are drawn, this argument can be used to make estimation results reproducible.
a logical value indicating whether glca should print the estimation procedure to the screen.
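The stopping rule described for eps can be sketched in base R as follows. The helper name has_converged and the flat parameter vectors are hypothetical and chosen for illustration only; this mirrors the criterion as worded in the argument description and is not taken from the package source.

```r
# Hypothetical helper illustrating the stopping rule described for `eps`:
# stop when the largest absolute change between successive estimates
# falls below the tolerance.
has_converged <- function(old, new, eps = 1e-06) {
  max(abs(unlist(new) - unlist(old))) < eps
}

old <- c(0.3, 0.7)
new <- c(0.3000004, 0.6999996)

has_converged(old, new)             # largest change is about 4e-07
has_converged(old, c(0.31, 0.69))   # largest change is about 0.01
```

A smaller eps demands more iterations before the rule is met, so it interacts with maxiter: a fit can stop at maxiter without satisfying the tolerance.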
glca returns an object of class "glca".
The function summary prints parameter estimates, and the function glca.gof gives goodness-of-fit measures for the model.
An object of class "glca" is a list containing the following components:
the matched call.
the terms object used.
a list describing the model.
a list of names of data.
a list of data used for fitting.
a list of parameter estimates.
a list of standard errors for estimates.
a list of logistic regression coefficients for prevalence of level-1 class.
a data.frame or a list of posterior probabilities of each individual for latent classes and of each group for latent clusters.
a list of goodness-of-fit measures.
a list containing information about convergence.
The function glca implements LCA with two types of latent categorical variables (i.e., level-1 and level-2 latent classes). The level-1 (individual-level) latent class is identified by the association among the individuals' responses to multiple manifest items, whereas the level-2 (group-level) latent class is identified by the prevalence of level-1 latent classes within each group. The function glca can handle two types of covariates: level-1 and level-2 covariates. Covariates that vary across individuals are treated as level-1 covariates. When group and ncluster (>1) are given, covariates that vary across groups are treated as level-2 covariates. Both types of covariates affect the level-1 class prevalence.
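The two-level structure described above can be written out as a mixture model. The equations below are the standard textbook formulation of LCA and of nonparametric multilevel LCA under measurement invariance, in the spirit of the Vermunt (2003) reference, and are not copied from the package source. The index symbols (C classes, W clusters, M items, r_m categories for item m, n_g members in group g) are notation assumed here for illustration; rho, gamma, and delta follow the parameter names used in this page.

```latex
% Single-level LCA: marginal probability of a response pattern y
P(\mathbf{Y}_i = \mathbf{y})
  = \sum_{c=1}^{C} \gamma_c
    \prod_{m=1}^{M} \prod_{k=1}^{r_m} \rho_{mk \mid c}^{\, I(y_m = k)}

% Nonparametric multilevel LCA: groups belong to one of W latent clusters,
% and the level-1 class prevalence depends on the cluster
P(\mathbf{Y}_g = \mathbf{y}_g)
  = \sum_{w=1}^{W} \delta_w
    \prod_{i=1}^{n_g} \left[ \sum_{c=1}^{C} \gamma_{c \mid w}
    \prod_{m=1}^{M} \prod_{k=1}^{r_m} \rho_{mk \mid c}^{\, I(y_{igm} = k)} \right]
```

Here I(.) is the indicator function, so each item contributes only the probability of the category actually observed.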
The formula should consist of an ~ operator between two sides. Manifest items should be given on the LHS of the formula using the item function, and covariates should be specified on the RHS. For example,
item(y1, y2, y3) ~ 1
item(y1, y2, y3) ~ x1 + x2
where the first formula specifies an LCA with three manifest variables (y1, y2, and y3) and no covariates, and the second formula includes two covariates (x1 and x2). The two types of covariates (i.e., level-1 and level-2 covariates) are detected automatically by glca.
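The two-sided structure of such a formula can be inspected in base R without fitting anything. The sketch below only illustrates how the LHS and RHS separate; it does not show how glca parses the formula internally, and the variable names manifest_items and covariates are chosen here for illustration.

```r
# A formula is an unevaluated language object, so this runs in base R
# even though item() itself is provided by the glca package.
f <- item(y1, y2, y3) ~ x1 + x2

lhs <- f[[2]]   # the call item(y1, y2, y3)
rhs <- f[[3]]   # the expression x1 + x2

manifest_items <- all.vars(lhs)   # variable names on the LHS
covariates     <- all.vars(rhs)   # variable names on the RHS

manifest_items   # "y1" "y2" "y3"
covariates       # "x1" "x2"
```

For a formula with ~ 1 on the RHS, all.vars() returns an empty character vector, which corresponds to a model with no covariates.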
The parameters estimated by glca are rho, gamma, delta, and beta. The set of item response probabilities for each level-1 class is rho. The sets of prevalences for level-1 and level-2 classes are gamma and delta, respectively. The prevalence of the level-1 class (i.e., gamma) can be modeled by logistic regression on level-1 and/or level-2 covariates. The set of logistic regression coefficients is beta in the glca output.
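To make the link between beta and gamma concrete, the sketch below converts logistic regression coefficients into class prevalences for one covariate value. It assumes a baseline-category logit with the last class as the reference, which is what the "Class 1/3" and "Class 2/3" column labels in the example output suggest but which is not stated explicitly in this page. The helper class_prevalence and the AGE value of 40 are hypothetical; the coefficients are copied from the summary(lcr) output in the examples.

```r
# Hypothetical helper: class prevalences from baseline-category logit
# coefficients, assuming the last class is the reference (coefficients 0).
class_prevalence <- function(beta, x) {
  eta <- c(drop(x %*% beta), 0)   # linear predictors; reference class gets 0
  exp(eta) / sum(exp(eta))        # softmax over the classes
}

# Coefficients as printed by summary(lcr):
# columns are Class 1/3 and Class 2/3, rows are the intercept and AGE.
beta <- cbind("Class 1/3" = c(0.1210, -0.0156),
              "Class 2/3" = c(0.6065, -0.0066))

x <- c(1, 40)   # intercept term and an illustrative AGE of 40
gamma <- class_prevalence(beta, x)
names(gamma) <- paste("Class", 1:3)
round(gamma, 3)
```

The resulting prevalences are positive and sum to one by construction, so changing the covariate value shifts probability mass between classes rather than changing the total.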
Vermunt, J.K. (2003) Multilevel latent class models. Sociological Methodology, 33, 213–239. doi:10.1111/j.0081-1750.2003.t01-1-00131.x
Collins, L.M. and Lanza, S.T. (2009) Latent Class and Latent Transition Analysis: With Applications in the Social, Behavioral, and Health Sciences. John Wiley & Sons Inc.
##
## Example 1. GSS dataset
##
data("gss08")
# LCA
lca = glca(item(DEFECT, HLTH, RAPE, POOR, SINGLE, NOMORE) ~ 1,
data = gss08, nclass = 3, n.init = 1)
#> Manifest items :
#> DEFECT HLTH RAPE POOR SINGLE NOMORE
#>
#> Deleted observation(s) :
#> 3 observation(s) for missing all manifest items
#> 0 observation(s) for missing at least 1 covariates
#>
#> Latent class analysis Fitting...
#>
#> . 123 iteration
#>
#> Converged at 123 iteration (loglik :-687.4486)
summary(lca)
#>
#> Call:
#> glca(formula = item(DEFECT, HLTH, RAPE, POOR, SINGLE, NOMORE) ~
#> 1, data = gss08, nclass = 3, n.init = 1)
#>
#> Manifest items : DEFECT HLTH RAPE POOR SINGLE NOMORE
#>
#> Categories for manifest items :
#> Y = 1 Y = 2
#> DEFECT YES NO
#> HLTH YES NO
#> RAPE YES NO
#> POOR YES NO
#> SINGLE YES NO
#> NOMORE YES NO
#>
#> Model : Latent class analysis
#>
#> Number of latent classes : 3
#> Number of observations : 352
#> Number of parameters : 20
#>
#> log-likelihood : -687.4486
#> G-squared : 29.82695
#> AIC : 1414.897
#> BIC : 1492.17
#>
#> Marginal prevalences for latent classes :
#> Class 1 Class 2 Class 3
#> 0.34467 0.19138 0.46396
#>
#> Item-response probabilities (Y = 1) :
#> DEFECT HLTH RAPE POOR SINGLE NOMORE
#> Class 1 0.8275 0.9453 0.7960 0.0638 0.0390 0.1344
#> Class 2 0.0466 0.3684 0.0949 0.0000 0.0000 0.0000
#> Class 3 1.0000 1.0000 1.0000 0.9813 0.9284 0.9657
#>
#> Item-response probabilities (Y = 2) :
#> DEFECT HLTH RAPE POOR SINGLE NOMORE
#> Class 1 0.1725 0.0547 0.2040 0.9362 0.9610 0.8656
#> Class 2 0.9534 0.6316 0.9051 1.0000 1.0000 1.0000
#> Class 3 0.0000 0.0000 0.0000 0.0187 0.0716 0.0343
# LCA with covariate(s)
lcr = glca(item(DEFECT, HLTH, RAPE, POOR, SINGLE, NOMORE) ~ AGE,
data = gss08, nclass = 3, n.init = 1)
#> Manifest items :
#> DEFECT HLTH RAPE POOR SINGLE NOMORE
#> Covariates (Level 1) :
#> AGE
#>
#> Deleted observation(s) :
#> 3 observation(s) for missing all manifest items
#> 0 observation(s) for missing at least 1 covariates
#>
#> Latent class analysis Fitting...
#>
#> . 117 iteration
#>
#> Converged at 117 iteration (loglik :-686.4118)
summary(lcr)
#>
#> Call:
#> glca(formula = item(DEFECT, HLTH, RAPE, POOR, SINGLE, NOMORE) ~
#> AGE, data = gss08, nclass = 3, n.init = 1)
#>
#> Manifest items : DEFECT HLTH RAPE POOR SINGLE NOMORE
#> Covariates (Level 1) : AGE
#>
#> Categories for manifest items :
#> Y = 1 Y = 2
#> DEFECT YES NO
#> HLTH YES NO
#> RAPE YES NO
#> POOR YES NO
#> SINGLE YES NO
#> NOMORE YES NO
#>
#> Model : Latent class analysis
#>
#> Number of latent classes : 3
#> Number of observations : 352
#> Number of parameters : 22
#>
#> log-likelihood : -686.4118
#> G-squared : 589.4148
#> AIC : 1416.824
#> BIC : 1501.824
#>
#> Marginal prevalences for latent classes :
#> Class 1 Class 2 Class 3
#> 0.18809 0.46397 0.34794
#>
#> Logistic regression coefficients :
#> Class 1/3 Class 2/3
#> (Intercept) 0.1210 0.6065
#> AGE -0.0156 -0.0066
#> Item-response probabilities (Y = 1) :
#> DEFECT HLTH RAPE POOR SINGLE NOMORE
#> Class 1 0.0239 0.3719 0.1016 0.0000 0.0000 0.0000
#> Class 2 1.0000 1.0000 1.0000 0.9813 0.9283 0.9656
#> Class 3 0.8338 0.9399 0.7852 0.0628 0.0386 0.1330
#>
#> Item-response probabilities (Y = 2) :
#> DEFECT HLTH RAPE POOR SINGLE NOMORE
#> Class 1 0.9761 0.6281 0.8984 1.0000 1.0000 1.0000
#> Class 2 0.0000 0.0000 0.0000 0.0187 0.0717 0.0344
#> Class 3 0.1662 0.0601 0.2148 0.9372 0.9614 0.8670
coef(lcr)
#> Class 1 / 3 :
#> Odds Ratio Coefficient Std. Error t value Pr(>|t|)
#> (Intercept) 1.12860 0.12098 0.51570 0.235 0.815
#> AGE 0.98453 -0.01559 0.01077 -1.447 0.149
#>
#> Class 2 / 3 :
#> Odds Ratio Coefficient Std. Error t value Pr(>|t|)
#> (Intercept) 1.834068 0.606537 0.364407 1.664 0.097 .
#> AGE 0.993465 -0.006557 0.007060 -0.929 0.354
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
# Multiple-group LCA (MGLCA)
mglca = glca(item(DEFECT, HLTH, RAPE, POOR, SINGLE, NOMORE) ~ 1,
group = DEGREE, data = gss08, nclass = 3, n.init = 1)
#> Manifest items :
#> DEFECT HLTH RAPE POOR SINGLE NOMORE
#> Grouping variable : DEGREE
#>
#> Deleted observation(s) :
#> 3 observation(s) for missing all manifest items
#> 0 observation(s) for missing at least 1 covariates
#>
#> Multiple-group latent class analysis Fitting...
#>
#> 91 iteration
#>
#> Converged at 91 iteration (loglik :-672.4138)
summary(mglca)
#>
#> Call:
#> glca(formula = item(DEFECT, HLTH, RAPE, POOR, SINGLE, NOMORE) ~
#> 1, group = DEGREE, data = gss08, nclass = 3, n.init = 1)
#>
#> Manifest items : DEFECT HLTH RAPE POOR SINGLE NOMORE
#> Grouping variable : DEGREE
#>
#> Categories for manifest items :
#> Y = 1 Y = 2
#> DEFECT YES NO
#> HLTH YES NO
#> RAPE YES NO
#> POOR YES NO
#> SINGLE YES NO
#> NOMORE YES NO
#>
#> Model : Multiple-group latent class analysis
#>
#> Number of latent classes : 3
#> Number of groups : 4
#> Number of observations : 352
#> Number of parameters : 26
#>
#> log-likelihood : -672.4138
#> G-squared : 87.85135
#> AIC : 1396.828
#> BIC : 1497.282
#>
#> Marginal prevalences for latent classes :
#> Class 1 Class 2 Class 3
#> 0.34468 0.46148 0.19384
#>
#> Class prevalences by group :
#> Class 1 Class 2 Class 3
#> <= HS 0.51954 0.16850 0.31196
#> HIGH SCHOOL 0.34862 0.44393 0.20745
#> COLLEGE 0.29846 0.55348 0.14806
#> GRADUATE 0.20439 0.71186 0.08375
#>
#> Item-response probabilities (Y = 1) :
#> DEFECT HLTH RAPE POOR SINGLE NOMORE
#> Class 1 0.8318 0.9470 0.7999 0.0690 0.0433 0.1389
#> Class 2 1.0000 1.0000 1.0000 0.9835 0.9309 0.9681
#> Class 3 0.0507 0.3724 0.0995 0.0000 0.0000 0.0000
#>
#> Item-response probabilities (Y = 2) :
#> DEFECT HLTH RAPE POOR SINGLE NOMORE
#> Class 1 0.1682 0.0530 0.2001 0.9310 0.9567 0.8611
#> Class 2 0.0000 0.0000 0.0000 0.0165 0.0691 0.0319
#> Class 3 0.9493 0.6276 0.9005 1.0000 1.0000 1.0000
# Multiple-group LCA with covariate(s) (MGLCR)
mglcr = glca(item(DEFECT, HLTH, RAPE, POOR, SINGLE, NOMORE) ~ SEX,
group = DEGREE, data = gss08, nclass = 3, n.init = 1)
#> Manifest items :
#> DEFECT HLTH RAPE POOR SINGLE NOMORE
#> Grouping variable : DEGREE
#> Covariates (Level 1) :
#> SEX
#>
#> Deleted observation(s) :
#> 3 observation(s) for missing all manifest items
#> 0 observation(s) for missing at least 1 covariates
#>
#> Multiple-group latent class analysis Fitting...
#>
#> 98 iteration
#>
#> Converged at 98 iteration (loglik :-666.7097)
summary(mglcr)
#>
#> Call:
#> glca(formula = item(DEFECT, HLTH, RAPE, POOR, SINGLE, NOMORE) ~
#> SEX, group = DEGREE, data = gss08, nclass = 3, n.init = 1)
#>
#> Manifest items : DEFECT HLTH RAPE POOR SINGLE NOMORE
#> Grouping variable : DEGREE
#> Covariates (Level 1) : SEX
#>
#> Categories for manifest items :
#> Y = 1 Y = 2
#> DEFECT YES NO
#> HLTH YES NO
#> RAPE YES NO
#> POOR YES NO
#> SINGLE YES NO
#> NOMORE YES NO
#>
#> Model : Multiple-group latent class analysis
#>
#> Number of latent classes : 3
#> Number of groups : 4
#> Number of observations : 352
#> Number of parameters : 28
#>
#> log-likelihood : -666.7097
#> G-squared : 149.9656
#> AIC : 1389.419
#> BIC : 1497.601
#>
#> Marginal prevalences for latent classes :
#> Class 1 Class 2 Class 3
#> 0.33996 0.19860 0.46144
#>
#> Class prevalences by group :
#> Class 1 Class 2 Class 3
#> <= HS 0.51010 0.32143 0.16848
#> HIGH SCHOOL 0.34339 0.21275 0.44386
#> COLLEGE 0.29616 0.15036 0.55347
#> GRADUATE 0.20237 0.08580 0.71183
#>
#> Logistic regression coefficients :
#> Group : <= HS
#> Class 1/3 Class 2/3
#> (Intercept) 1.0634 -0.0811
#> SEXFEMALE 0.0834 1.1140
#>
#> Group : HIGH SCHOOL
#> Class 1/3 Class 2/3
#> (Intercept) -0.2979 -1.424
#> SEXFEMALE 0.0834 1.114
#>
#> Group : COLLEGE
#> Class 1/3 Class 2/3
#> (Intercept) -0.6686 -2.0173
#> SEXFEMALE 0.0834 1.1140
#>
#> Group : GRADUATE
#> Class 1/3 Class 2/3
#> (Intercept) -1.2872 -2.6492
#> SEXFEMALE 0.0834 1.1140
#>
#> Item-response probabilities (Y = 1) :
#> DEFECT HLTH RAPE POOR SINGLE NOMORE
#> Class 1 0.8342 0.9488 0.8086 0.0700 0.0440 0.1409
#> Class 2 0.0649 0.3825 0.0989 0.0000 0.0000 0.0000
#> Class 3 1.0000 1.0000 1.0000 0.9836 0.9309 0.9682
#>
#> Item-response probabilities (Y = 2) :
#> DEFECT HLTH RAPE POOR SINGLE NOMORE
#> Class 1 0.1658 0.0512 0.1914 0.9300 0.9560 0.8591
#> Class 2 0.9351 0.6175 0.9011 1.0000 1.0000 1.0000
#> Class 3 0.0000 0.0000 0.0000 0.0164 0.0691 0.0318
coef(mglcr)
#> Coefficients :
#>
#> Class 1 / 3 :
#> Odds Ratio Coefficient Std. Error t value Pr(>|t|)
#> SEXFEMALE 1.08693 0.08335 0.06815 1.223 0.222
#>
#> Class 2 / 3 :
#> Odds Ratio Coefficient Std. Error t value Pr(>|t|)
#> SEXFEMALE 3.04651 1.11400 0.09062 12.29 <2e-16 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
# \donttest{
##
## Example 2. NYTS dataset
##
data("nyts18")
# Multilevel LCA (MLCA)
mlca = glca(item(ECIGT, ECIGAR, ESLT, EELCIGT, EHOOKAH) ~ 1,
group = SCH_ID, data = nyts18, nclass = 3, ncluster = 2, n.init = 1)
#> Manifest items :
#> ECIGT ECIGAR ESLT EELCIGT EHOOKAH
#> Grouping variable : SCH_ID
#>
#> Deleted observation(s) :
#> 0 observation(s) for missing all manifest items
#> 0 observation(s) for missing at least 1 covariates
#>
#> Nonparametric multilevel latent class analysis Fitting...
#>
#> .. 230 iteration
#>
#> Converged at 230 iteration (loglik :-1955.487)
summary(mlca)
#>
#> Call:
#> glca(formula = item(ECIGT, ECIGAR, ESLT, EELCIGT, EHOOKAH) ~
#> 1, group = SCH_ID, data = nyts18, nclass = 3, ncluster = 2,
#> n.init = 1)
#>
#> Manifest items : ECIGT ECIGAR ESLT EELCIGT EHOOKAH
#> Grouping variable : SCH_ID
#>
#> Categories for manifest items :
#> Y = 1 Y = 2
#> ECIGT Yes No
#> ECIGAR Yes No
#> ESLT Yes No
#> EELCIGT Yes No
#> EHOOKAH Yes No
#>
#> Model : Nonparametric multilevel latent class analysis
#>
#> Number of latent classes : 3
#> Number of latent clusters : 2
#> Number of groups : 45
#> Number of observations : 1734
#> Number of parameters : 20
#>
#> log-likelihood : -1955.487
#> G-squared : 768.5035
#> AIC : 3950.973
#> BIC : 4060.137
#>
#> Marginal prevalences for latent classes :
#> Class 1 Class 2 Class 3
#> 0.76960 0.05961 0.17079
#>
#> Marginal prevalences for latent clusters :
#> Cluster 1 Cluster 2
#> 0.6207 0.3793
#>
#> Class prevalences by cluster :
#> Class 1 Class 2 Class 3
#> Cluster 1 0.92994 0.00876 0.06130
#> Cluster 2 0.51176 0.14137 0.34687
#>
#> Item-response probabilities (Y = 1) :
#> ECIGT ECIGAR ESLT EELCIGT EHOOKAH
#> Class 1 0.0062 0.0043 0.0088 0.0413 0.0057
#> Class 2 0.9112 0.9750 0.5651 0.9778 0.5363
#> Class 3 0.3488 0.2006 0.1236 0.7783 0.0443
#>
#> Item-response probabilities (Y = 2) :
#> ECIGT ECIGAR ESLT EELCIGT EHOOKAH
#> Class 1 0.9938 0.9957 0.9912 0.9587 0.9943
#> Class 2 0.0888 0.0250 0.4349 0.0222 0.4637
#> Class 3 0.6512 0.7994 0.8764 0.2217 0.9557
#>
# MLCA with covariate(s) (MLCR)
# (SEX: level-1 covariate, SCH_LEV: level-2 covariate)
mlcr = glca(item(ECIGT, ECIGAR, ESLT, EELCIGT, EHOOKAH) ~ SEX + SCH_LEV,
group = SCH_ID, data = nyts18, nclass = 3, ncluster = 2, n.init = 1)
#> Manifest items :
#> ECIGT ECIGAR ESLT EELCIGT EHOOKAH
#> Grouping variable : SCH_ID
#> Covariates (Level 2) :
#> SCH_LEV
#> Covariates (Level 1) :
#> SEX
#>
#> Deleted observation(s) :
#> 0 observation(s) for missing all manifest items
#> 0 observation(s) for missing at least 1 covariates
#>
#> Nonparametric multilevel latent class analysis Fitting...
#>
#> .. 240 iteration
#>
#> Converged at 240 iteration (loglik :-1921.19)
coef(mlcr)
#>
#> Level 1 Coefficients :
#>
#> Class 1 / 3 :
#> Odds Ratio Coefficient Std. Error t value Pr(>|t|)
#> SEXFemale 1.6859 0.5223 0.2098 2.489 0.0129 *
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Class 2 / 3 :
#> Odds Ratio Coefficient Std. Error t value Pr(>|t|)
#> SEXFemale 0.8514 -0.1609 0.1951 -0.825 0.41
#>
#>
#> Level 2 Coefficients :
#>
#> Class 1 / 3 :
#> Odds Ratio Coefficient Std. Error t value Pr(>|t|)
#> SCH_LEVMiddle School 0.1318 -2.0264 1.5545 -1.304 0.193
#>
#> Class 2 / 3 :
#> Odds Ratio Coefficient Std. Error t value Pr(>|t|)
#> SCH_LEVMiddle School 19.8234 2.9869 0.4142 7.212 8.27e-13 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
# }
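The information criteria and odds ratios printed in Example 1 can be checked by hand. The sketch below uses the usual definitions, AIC = -2 logL + 2p and BIC = -2 logL + p log(n), with the values copied from the summary(lca) and coef(lcr) output. These definitions reproduce the printed numbers, which suggests, but does not confirm, that the package computes them the same way.

```r
# Values copied from the summary(lca) output in Example 1.
loglik <- -687.4486   # log-likelihood
npar   <- 20          # number of parameters
nobs   <- 352         # number of observations

aic <- -2 * loglik + 2 * npar
bic <- -2 * loglik + log(nobs) * npar
round(c(AIC = aic, BIC = bic), 3)

# The odds ratio reported by coef() is the exponentiated coefficient,
# shown here for AGE in the Class 1 / 3 comparison of coef(lcr).
odds_ratio_age <- exp(-0.01559)
round(odds_ratio_age, 5)
```

Because BIC penalizes each parameter by log(n) rather than 2, it favors smaller models than AIC whenever n exceeds about 7, which is worth keeping in mind when the two criteria disagree across the models fitted above.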