Fits latent class models with multiple groups, with or without a latent class structure for the group variable.

glca(
  formula,
  group = NULL,
  data = NULL,
  nclass = 3,
  ncluster = NULL,
  std.err = TRUE,
  measure.inv = TRUE,
  coeff.inv = TRUE,
  init.param = NULL,
  n.init = 10,
  decreasing = FALSE,
  testiter = 50,
  maxiter = 5000,
  eps = 1e-06,
  na.rm = FALSE,
  seed = NULL,
  verbose = TRUE
)

Arguments

formula

a formula specifying the manifest items (via the "item" function) and the covariates.

group

an optional vector specifying the group membership of each observation. When a group variable is given, group-level covariates can be incorporated.

data

a data frame containing the manifest items, covariates, and group variable.

nclass

number of level-1 (individual-level) latent classes.

ncluster

number of level-2 (group-level) latent classes. When group and ncluster (> 1) are both given, a multilevel latent class model is fitted.

std.err

a logical value indicating whether to calculate standard errors for the estimates.

measure.inv

a logical value indicating whether measurement invariance is assumed across groups.

coeff.inv

a logical value indicating whether the regression coefficients are assumed to be invariant across groups (random intercept model).

init.param

a set of model parameters to be used as user-defined initial values for the EM algorithm. It should be a named list with the same structure as the param component of the glca output. By default, initial parameters are randomly generated.

n.init

number of randomly generated initial parameter sets, used to reduce the risk of converging to a local maximum.

decreasing

a logical value indicating whether to reorder the parameters in descending order of the response probability for the first category of the first manifest item.

testiter

number of iterations in the EM algorithm for each initial parameter set. The initial parameter set that provides the largest log-likelihood will be selected for estimating the model.

maxiter

maximum number of iterations for the EM algorithm.

eps

a convergence tolerance value. When the largest absolute difference between the previous and current estimates is less than eps, the algorithm stops updating and convergence is considered to be reached.

na.rm

a logical value indicating whether to delete rows that have at least one missing manifest item. If na.rm = FALSE, missing responses are handled under the missing at random (MAR) assumption.

seed

By default, the set of initial parameters is drawn randomly. Because the same value of seed guarantees that the same initial parameters are drawn, this argument can be used to make estimation results reproducible.

verbose

a logical value indicating whether glca should print the estimation procedure onto the screen.
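
How n.init, testiter, maxiter, and eps work together can be pictured with a toy EM routine. The sketch below is not the glca implementation: it fits a two-class model to simulated binary items in base R, and the helpers random_start, em_step, and run_em are hypothetical names introduced only for illustration. It runs a few short EM trials from random starts, keeps the start with the largest log-likelihood, and then iterates until the largest absolute parameter change falls below eps.

```r
# Toy multi-start EM for a latent class model with binary items.
# Illustration only; this is NOT the code used inside glca.
set.seed(1)

# Simulate 500 respondents, 4 binary items, 2 latent classes.
n <- 500; m <- 4
true_class <- sample(1:2, n, replace = TRUE, prob = c(0.6, 0.4))
true_rho <- rbind(rep(0.9, m), rep(0.2, m))        # P(item = 1 | class)
y <- matrix(rbinom(n * m, 1, true_rho[true_class, ]), n, m)

# Random starting values (hypothetical helper).
random_start <- function(nclass, m) {
  list(gamma = rep(1 / nclass, nclass),
       rho = matrix(runif(nclass * m, 0.1, 0.9), nclass, m))
}

# One EM update. Returns updated parameters plus the log-likelihood
# evaluated at the input parameters `p`.
em_step <- function(p, y) {
  logjoint <- sapply(seq_along(p$gamma), function(k) {
    drop(log(p$gamma[k]) + y %*% log(p$rho[k, ]) +
           (1 - y) %*% log(1 - p$rho[k, ]))
  })
  mx <- apply(logjoint, 1, max)                    # for numerical stability
  lik <- exp(logjoint - mx)
  post <- lik / rowSums(lik)                       # E-step: posterior probabilities
  gamma <- colMeans(post)                          # M-step: class prevalences
  rho <- t(post) %*% y / colSums(post)             # M-step: item-response probabilities
  list(gamma = gamma,
       rho = pmin(pmax(rho, 1e-8), 1 - 1e-8),      # keep away from 0 and 1
       loglik = sum(mx + log(rowSums(lik))))
}

# Run up to `iters` EM updates; stop early once the largest absolute
# change in any parameter is below `eps`.
run_em <- function(p, y, iters, eps = 0) {
  for (it in seq_len(iters)) {
    new <- em_step(p, y)
    change <- max(abs(c(new$gamma - p$gamma, new$rho - p$rho)))
    p <- new
    if (change < eps) break
  }
  c(p, list(iterations = it))
}

n_init <- 5; testiter <- 20; maxiter <- 5000; eps <- 1e-6

# Short trial runs from several random starts, then keep the best one.
starts <- lapply(seq_len(n_init),
                 function(i) run_em(random_start(2, m), y, testiter))
best <- starts[[which.max(sapply(starts, `[[`, "loglik"))]]

# Full run from the selected start, using the eps stopping rule.
fit <- run_em(best[c("gamma", "rho")], y, maxiter, eps)
print(fit$iterations)
print(round(fit$gamma, 3))
```

With a single start (n.init = 1, as in the examples on this page) the trial phase has nothing to choose between, so larger values of n.init are generally the safer choice when run time allows.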

Value

glca returns an object of class "glca".

The summary function prints the parameter estimates, and the glca.gof function gives goodness-of-fit measures for the model.

An object of class "glca" is a list containing the following components:

call

the matched call.

terms

the terms object used.

model

a list describing the model.

var.names

a list of names of data.

datalist

a list of data used for fitting.

param

a list of parameter estimates.

std.err

a list of standard errors for estimates.

coefficient

a list of logistic regression coefficients for the level-1 class prevalences.

posterior

a data.frame or a list of posterior probabilities of each individual for the latent classes and of each group for the latent clusters.

gof

a list of goodness of fit measures.

convergence

a list containing information about convergence.
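
As a hedged sketch of how the returned list might be inspected, the code below fits the basic model from the Examples section and looks at a few of the components listed above. It uses only names documented on this page; the internal layout of each component is not assumed, so names() and str() are used to explore it. The block does nothing if the glca package is not installed.

```r
# Components documented above that we want to look at.
components <- c("param", "std.err", "posterior", "gof", "convergence")

if (requireNamespace("glca", quietly = TRUE)) {
  library(glca)
  data("gss08", package = "glca")

  fit <- glca(item(DEFECT, HLTH, RAPE, POOR, SINGLE, NOMORE) ~ 1,
              data = gss08, nclass = 3, n.init = 1,
              seed = 1, verbose = FALSE)

  print(class(fit))                      # expected to include "glca"
  print(components %in% names(fit))      # which documented components are present
  print(names(fit$param))                # names of the estimated parameter sets
  str(fit$gof, max.level = 1)            # goodness-of-fit measures
  str(fit$convergence, max.level = 1)    # convergence information
}
```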

Details

glca fits latent class models with two types of latent categorical variables: level-1 and level-2 latent classes. Level-1 (individual-level) latent classes are identified by the associations among individuals' responses to multiple manifest items, whereas level-2 (group-level) latent classes are identified by the prevalences of the level-1 latent classes within each group. glca can handle two types of covariates: level-1 and level-2 covariates. Covariates that vary across individuals are treated as level-1 covariates. When group and ncluster (> 1) are given, covariates that vary across groups but are constant within each group are treated as level-2 covariates. Both types of covariates affect the level-1 class prevalences.

The formula should have a ~ operator separating two sides. Manifest items are given on the left-hand side using the item function, and covariates are given on the right-hand side. For example,
item(y1, y2, y3) ~ 1
item(y1, y2, y3) ~ x1 + x2
where the first formula specifies an LCA with three manifest variables (y1, y2, and y3) and no covariates, and the second formula includes two covariates (x1 and x2). The two types of covariates (level-1 and level-2) are detected automatically by glca.
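
Because a formula is stored unevaluated, it can be created and inspected in base R without calling item itself. The following sketch (variable names y1, y2, y3, x1, x2 are the placeholders from the text above) shows how the two sides separate, and how the same formula could be assembled from character vectors when the item list is long.

```r
# A formula is not evaluated when it is created, so item() need not exist yet.
f <- item(y1, y2, y3) ~ x1 + x2

items <- all.vars(f[[2]])        # left-hand side: manifest items
covariates <- all.vars(f[[3]])   # right-hand side: covariates
print(items)
print(covariates)

# Assemble the same formula from character vectors.
f2 <- as.formula(paste0(
  "item(", paste(items, collapse = ", "), ") ~ ",
  paste(covariates, collapse = " + ")
))
print(f2)
```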

The parameters estimated by glca are rho, gamma, delta, and beta. The set of item-response probabilities for each level-1 class is rho. The sets of prevalences for the level-1 and level-2 classes are gamma and delta, respectively. The level-1 class prevalences (i.e., gamma) can be modeled by logistic regression on level-1 and/or level-2 covariates. The set of logistic regression coefficients is reported as beta in the glca output.
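
In symbols, these parameters can be related as follows. The notation is illustrative and follows the standard latent class and nonparametric multilevel latent class formulation (Vermunt, 2003); the indices i, g, m, k, c, and w are assumptions introduced here, not names used by the package. The last class is taken as the reference category, which matches the "Class 1/3" and "Class 2/3" labels in the example output.

```latex
% y_{im}: response of individual i to item m (m = 1..M), categories k = 1..r_m
% c = 1..C: level-1 classes;  w = 1..W: level-2 classes (clusters)

% Single-level latent class model
P(Y_i = y_i) = \sum_{c=1}^{C} \gamma_c
               \prod_{m=1}^{M} \prod_{k=1}^{r_m} \rho_{mk \mid c}^{\,I(y_{im} = k)}

% Class prevalence as a logistic regression on covariates x_i (class C as reference)
\gamma_c(x_i) = \frac{\exp(x_i^\top \beta_c)}
                     {1 + \sum_{c'=1}^{C-1} \exp(x_i^\top \beta_{c'})},
\qquad c = 1, \dots, C-1

% Multilevel model: individuals i = 1..n_g nested in group g
P(Y_g = y_g) = \sum_{w=1}^{W} \delta_w \prod_{i=1}^{n_g}
               \left[ \sum_{c=1}^{C} \gamma_{c \mid w}
               \prod_{m=1}^{M} \prod_{k=1}^{r_m} \rho_{mk \mid c}^{\,I(y_{gim} = k)} \right]
```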

References

Vermunt, J.K. (2003) Multilevel latent class models. Sociological Methodology, 33, 213–239. doi:10.1111/j.0081-1750.2003.t01-1-00131.x

Collins, L.M. and Lanza, S.T. (2009) Latent Class and Latent Transition Analysis: With Applications in the Social, Behavioral, and Health Sciences. John Wiley & Sons Inc.

Examples

##
## Example 1. GSS dataset
##
data("gss08")
# LCA
lca = glca(item(DEFECT, HLTH, RAPE, POOR, SINGLE, NOMORE) ~ 1,
            data = gss08, nclass = 3, n.init = 1)
#> Manifest items :
#>  DEFECT HLTH RAPE POOR SINGLE NOMORE 
#> 
#> Deleted observation(s) : 
#> 3 observation(s) for missing all manifest items
#> 0 observation(s) for missing at least 1 covariates
#> 
#> Latent class analysis Fitting...
#> 
#> . 123 iteration 
#> 
#> Converged at 123 iteration (loglik :-687.4486)
summary(lca)
#> 
#> Call:
#> glca(formula = item(DEFECT, HLTH, RAPE, POOR, SINGLE, NOMORE) ~ 
#>     1, data = gss08, nclass = 3, n.init = 1)
#> 
#> Manifest items : DEFECT HLTH RAPE POOR SINGLE NOMORE 
#> 
#> Categories for manifest items :
#>        Y = 1 Y = 2
#> DEFECT   YES    NO
#> HLTH     YES    NO
#> RAPE     YES    NO
#> POOR     YES    NO
#> SINGLE   YES    NO
#> NOMORE   YES    NO
#> 
#> Model : Latent class analysis 
#> 
#> Number of latent classes : 3 
#> Number of observations : 352 
#> Number of parameters : 20 
#> 
#> log-likelihood : -687.4486 
#>      G-squared : 29.82695 
#>            AIC : 1414.897 
#>            BIC : 1492.17 
#> 
#> Marginal prevalences for latent classes :
#> Class 1 Class 2 Class 3 
#> 0.34467 0.19138 0.46396 
#> 
#> Item-response probabilities (Y = 1) :
#>         DEFECT   HLTH   RAPE   POOR SINGLE NOMORE
#> Class 1 0.8275 0.9453 0.7960 0.0638 0.0390 0.1344
#> Class 2 0.0466 0.3684 0.0949 0.0000 0.0000 0.0000
#> Class 3 1.0000 1.0000 1.0000 0.9813 0.9284 0.9657
#> 
#> Item-response probabilities (Y = 2) :
#>         DEFECT   HLTH   RAPE   POOR SINGLE NOMORE
#> Class 1 0.1725 0.0547 0.2040 0.9362 0.9610 0.8656
#> Class 2 0.9534 0.6316 0.9051 1.0000 1.0000 1.0000
#> Class 3 0.0000 0.0000 0.0000 0.0187 0.0716 0.0343

# LCA with covariate(s)
lcr = glca(item(DEFECT, HLTH, RAPE, POOR, SINGLE, NOMORE) ~ AGE,
           data = gss08, nclass = 3, n.init = 1)
#> Manifest items :
#>  DEFECT HLTH RAPE POOR SINGLE NOMORE 
#> Covariates (Level 1) : 
#>  AGE 
#> 
#> Deleted observation(s) : 
#> 3 observation(s) for missing all manifest items
#> 0 observation(s) for missing at least 1 covariates
#> 
#> Latent class analysis Fitting...
#> 
#> . 117 iteration 
#> 
#> Converged at 117 iteration (loglik :-686.4118)
summary(lcr)
#> 
#> Call:
#> glca(formula = item(DEFECT, HLTH, RAPE, POOR, SINGLE, NOMORE) ~ 
#>     AGE, data = gss08, nclass = 3, n.init = 1)
#> 
#> Manifest items : DEFECT HLTH RAPE POOR SINGLE NOMORE 
#> Covariates (Level 1) : AGE 
#> 
#> Categories for manifest items :
#>        Y = 1 Y = 2
#> DEFECT   YES    NO
#> HLTH     YES    NO
#> RAPE     YES    NO
#> POOR     YES    NO
#> SINGLE   YES    NO
#> NOMORE   YES    NO
#> 
#> Model : Latent class analysis 
#> 
#> Number of latent classes : 3 
#> Number of observations : 352 
#> Number of parameters : 22 
#> 
#> log-likelihood : -686.4118 
#>      G-squared : 589.4148 
#>            AIC : 1416.824 
#>            BIC : 1501.824 
#> 
#> Marginal prevalences for latent classes :
#> Class 1 Class 2 Class 3 
#> 0.18809 0.46397 0.34794 
#> 
#> Logistic regression coefficients :
#>             Class 1/3 Class 2/3
#> (Intercept)    0.1210    0.6065
#> AGE           -0.0156   -0.0066
#> Item-response probabilities (Y = 1) :
#>         DEFECT   HLTH   RAPE   POOR SINGLE NOMORE
#> Class 1 0.0239 0.3719 0.1016 0.0000 0.0000 0.0000
#> Class 2 1.0000 1.0000 1.0000 0.9813 0.9283 0.9656
#> Class 3 0.8338 0.9399 0.7852 0.0628 0.0386 0.1330
#> 
#> Item-response probabilities (Y = 2) :
#>         DEFECT   HLTH   RAPE   POOR SINGLE NOMORE
#> Class 1 0.9761 0.6281 0.8984 1.0000 1.0000 1.0000
#> Class 2 0.0000 0.0000 0.0000 0.0187 0.0717 0.0344
#> Class 3 0.1662 0.0601 0.2148 0.9372 0.9614 0.8670
coef(lcr)
#> Class 1 / 3 :
#>             Odds Ratio Coefficient  Std. Error  t value  Pr(>|t|)
#> (Intercept)    1.12860     0.12098     0.51570    0.235     0.815
#> AGE            0.98453    -0.01559     0.01077   -1.447     0.149
#> 
#> Class 2 / 3 :
#>             Odds Ratio Coefficient  Std. Error  t value  Pr(>|t|)  
#> (Intercept)   1.834068    0.606537    0.364407    1.664     0.097 .
#> AGE           0.993465   -0.006557    0.007060   -0.929     0.354  
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 


# Multiple-group LCA (MGLCA)
mglca = glca(item(DEFECT, HLTH, RAPE, POOR, SINGLE, NOMORE) ~ 1,
             group = DEGREE, data = gss08, nclass = 3, n.init = 1)
#> Manifest items :
#>  DEFECT HLTH RAPE POOR SINGLE NOMORE 
#> Grouping variable : DEGREE 
#> 
#> Deleted observation(s) : 
#> 3 observation(s) for missing all manifest items
#> 0 observation(s) for missing at least 1 covariates
#> 
#> Multiple-group latent class analysis Fitting...
#> 
#>  91 iteration 
#> 
#> Converged at 91 iteration (loglik :-672.4138)
summary(mglca)
#> 
#> Call:
#> glca(formula = item(DEFECT, HLTH, RAPE, POOR, SINGLE, NOMORE) ~ 
#>     1, group = DEGREE, data = gss08, nclass = 3, n.init = 1)
#> 
#> Manifest items : DEFECT HLTH RAPE POOR SINGLE NOMORE 
#> Grouping variable : DEGREE 
#> 
#> Categories for manifest items :
#>        Y = 1 Y = 2
#> DEFECT   YES    NO
#> HLTH     YES    NO
#> RAPE     YES    NO
#> POOR     YES    NO
#> SINGLE   YES    NO
#> NOMORE   YES    NO
#> 
#> Model : Multiple-group latent class analysis 
#> 
#> Number of latent classes : 3 
#> Number of groups : 4 
#> Number of observations : 352 
#> Number of parameters : 26 
#> 
#> log-likelihood : -672.4138 
#>      G-squared : 87.85135 
#>            AIC : 1396.828 
#>            BIC : 1497.282 
#> 
#> Marginal prevalences for latent classes :
#> Class 1 Class 2 Class 3 
#> 0.34468 0.46148 0.19384 
#> 
#> Class prevalences by group :
#>             Class 1 Class 2 Class 3
#> <= HS       0.51954 0.16850 0.31196
#> HIGH SCHOOL 0.34862 0.44393 0.20745
#> COLLEGE     0.29846 0.55348 0.14806
#> GRADUATE    0.20439 0.71186 0.08375
#> 
#> Item-response probabilities (Y = 1) :
#>         DEFECT   HLTH   RAPE   POOR SINGLE NOMORE
#> Class 1 0.8318 0.9470 0.7999 0.0690 0.0433 0.1389
#> Class 2 1.0000 1.0000 1.0000 0.9835 0.9309 0.9681
#> Class 3 0.0507 0.3724 0.0995 0.0000 0.0000 0.0000
#> 
#> Item-response probabilities (Y = 2) :
#>         DEFECT   HLTH   RAPE   POOR SINGLE NOMORE
#> Class 1 0.1682 0.0530 0.2001 0.9310 0.9567 0.8611
#> Class 2 0.0000 0.0000 0.0000 0.0165 0.0691 0.0319
#> Class 3 0.9493 0.6276 0.9005 1.0000 1.0000 1.0000

# Multiple-group LCA with covariate(s) (MGLCR)
mglcr = glca(item(DEFECT, HLTH, RAPE, POOR, SINGLE, NOMORE) ~ SEX,
             group = DEGREE, data = gss08, nclass = 3, n.init = 1)
#> Manifest items :
#>  DEFECT HLTH RAPE POOR SINGLE NOMORE 
#> Grouping variable : DEGREE 
#> Covariates (Level 1) : 
#>  SEX 
#> 
#> Deleted observation(s) : 
#> 3 observation(s) for missing all manifest items
#> 0 observation(s) for missing at least 1 covariates
#> 
#> Multiple-group latent class analysis Fitting...
#> 
#>  98 iteration 
#> 
#> Converged at 98 iteration (loglik :-666.7097)
summary(mglcr)
#> 
#> Call:
#> glca(formula = item(DEFECT, HLTH, RAPE, POOR, SINGLE, NOMORE) ~ 
#>     SEX, group = DEGREE, data = gss08, nclass = 3, n.init = 1)
#> 
#> Manifest items : DEFECT HLTH RAPE POOR SINGLE NOMORE 
#> Grouping variable : DEGREE 
#> Covariates (Level 1) : SEX 
#> 
#> Categories for manifest items :
#>        Y = 1 Y = 2
#> DEFECT   YES    NO
#> HLTH     YES    NO
#> RAPE     YES    NO
#> POOR     YES    NO
#> SINGLE   YES    NO
#> NOMORE   YES    NO
#> 
#> Model : Multiple-group latent class analysis 
#> 
#> Number of latent classes : 3 
#> Number of groups : 4 
#> Number of observations : 352 
#> Number of parameters : 28 
#> 
#> log-likelihood : -666.7097 
#>      G-squared : 149.9656 
#>            AIC : 1389.419 
#>            BIC : 1497.601 
#> 
#> Marginal prevalences for latent classes :
#> Class 1 Class 2 Class 3 
#> 0.33996 0.19860 0.46144 
#> 
#> Class prevalences by group :
#>             Class 1 Class 2 Class 3
#> <= HS       0.51010 0.32143 0.16848
#> HIGH SCHOOL 0.34339 0.21275 0.44386
#> COLLEGE     0.29616 0.15036 0.55347
#> GRADUATE    0.20237 0.08580 0.71183
#> 
#> Logistic regression coefficients :
#> Group : <= HS 
#>             Class 1/3 Class 2/3
#> (Intercept)    1.0634   -0.0811
#> SEXFEMALE      0.0834    1.1140
#> 
#> Group : HIGH SCHOOL 
#>             Class 1/3 Class 2/3
#> (Intercept)   -0.2979    -1.424
#> SEXFEMALE      0.0834     1.114
#> 
#> Group : COLLEGE 
#>             Class 1/3 Class 2/3
#> (Intercept)   -0.6686   -2.0173
#> SEXFEMALE      0.0834    1.1140
#> 
#> Group : GRADUATE 
#>             Class 1/3 Class 2/3
#> (Intercept)   -1.2872   -2.6492
#> SEXFEMALE      0.0834    1.1140
#> 
#> Item-response probabilities (Y = 1) :
#>         DEFECT   HLTH   RAPE   POOR SINGLE NOMORE
#> Class 1 0.8342 0.9488 0.8086 0.0700 0.0440 0.1409
#> Class 2 0.0649 0.3825 0.0989 0.0000 0.0000 0.0000
#> Class 3 1.0000 1.0000 1.0000 0.9836 0.9309 0.9682
#> 
#> Item-response probabilities (Y = 2) :
#>         DEFECT   HLTH   RAPE   POOR SINGLE NOMORE
#> Class 1 0.1658 0.0512 0.1914 0.9300 0.9560 0.8591
#> Class 2 0.9351 0.6175 0.9011 1.0000 1.0000 1.0000
#> Class 3 0.0000 0.0000 0.0000 0.0164 0.0691 0.0318
coef(mglcr)
#> Coefficients :
#> 
#> Class 1 / 3 :
#>           Odds Ratio Coefficient  Std. Error  t value  Pr(>|t|)
#> SEXFEMALE    1.08693     0.08335     0.06815    1.223     0.222
#> 
#> Class 2 / 3 :
#>           Odds Ratio Coefficient  Std. Error  t value  Pr(>|t|)    
#> SEXFEMALE    3.04651     1.11400     0.09062    12.29    <2e-16 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 

# \donttest{
##
## Example 2. NYTS dataset
##
data("nyts18")
# Multilevel LCA (MLCA)
mlca = glca(item(ECIGT, ECIGAR, ESLT, EELCIGT, EHOOKAH) ~ 1,
            group = SCH_ID, data = nyts18, nclass = 3, ncluster = 2, n.init = 1)
#> Manifest items :
#>  ECIGT ECIGAR ESLT EELCIGT EHOOKAH 
#> Grouping variable : SCH_ID 
#> 
#> Deleted observation(s) : 
#> 0 observation(s) for missing all manifest items
#> 0 observation(s) for missing at least 1 covariates
#> 
#> Nonparametric multilevel latent class analysis Fitting...
#> 
#> .. 230 iteration 
#> 
#> Converged at 230 iteration (loglik :-1955.487)
summary(mlca)
#> 
#> Call:
#> glca(formula = item(ECIGT, ECIGAR, ESLT, EELCIGT, EHOOKAH) ~ 
#>     1, group = SCH_ID, data = nyts18, nclass = 3, ncluster = 2, 
#>     n.init = 1)
#> 
#> Manifest items : ECIGT ECIGAR ESLT EELCIGT EHOOKAH 
#> Grouping variable : SCH_ID 
#> 
#> Categories for manifest items :
#>         Y = 1 Y = 2
#> ECIGT     Yes    No
#> ECIGAR    Yes    No
#> ESLT      Yes    No
#> EELCIGT   Yes    No
#> EHOOKAH   Yes    No
#> 
#> Model : Nonparametric multilevel latent class analysis 
#> 
#> Number of latent classes : 3 
#> Number of latent clusters : 2 
#> Number of groups : 45 
#> Number of observations : 1734 
#> Number of parameters : 20 
#> 
#> log-likelihood : -1955.487 
#>      G-squared : 768.5035 
#>            AIC : 3950.973 
#>            BIC : 4060.137 
#> 
#> Marginal prevalences for latent classes :
#> Class 1 Class 2 Class 3 
#> 0.76960 0.05961 0.17079 
#> 
#> Marginal prevalences for latent clusters :
#> Cluster 1 Cluster 2 
#>    0.6207    0.3793 
#> 
#> Class prevalences by cluster :
#>           Class 1 Class 2 Class 3
#> Cluster 1 0.92994 0.00876 0.06130
#> Cluster 2 0.51176 0.14137 0.34687
#> 
#> Item-response probabilities (Y = 1) :
#>          ECIGT ECIGAR   ESLT EELCIGT EHOOKAH
#> Class 1 0.0062 0.0043 0.0088  0.0413  0.0057
#> Class 2 0.9112 0.9750 0.5651  0.9778  0.5363
#> Class 3 0.3488 0.2006 0.1236  0.7783  0.0443
#> 
#> Item-response probabilities (Y = 2) :
#>          ECIGT ECIGAR   ESLT EELCIGT EHOOKAH
#> Class 1 0.9938 0.9957 0.9912  0.9587  0.9943
#> Class 2 0.0888 0.0250 0.4349  0.0222  0.4637
#> Class 3 0.6512 0.7994 0.8764  0.2217  0.9557
#> 

# MLCA with covariate(s) (MLCR)
# (SEX: level-1 covariate, SCH_LEV: level-2 covariate)
mlcr = glca(item(ECIGT, ECIGAR, ESLT, EELCIGT, EHOOKAH) ~ SEX + SCH_LEV,
            group = SCH_ID, data = nyts18, nclass = 3, ncluster = 2, n.init = 1)
#> Manifest items :
#>  ECIGT ECIGAR ESLT EELCIGT EHOOKAH 
#> Grouping variable : SCH_ID 
#> Covariates (Level 2) : 
#>  SCH_LEV 
#> Covariates (Level 1) : 
#>  SEX 
#> 
#> Deleted observation(s) : 
#> 0 observation(s) for missing all manifest items
#> 0 observation(s) for missing at least 1 covariates
#> 
#> Nonparametric multilevel latent class analysis Fitting...
#> 
#> .. 240 iteration 
#> 
#> Converged at 240 iteration (loglik :-1921.19)
coef(mlcr)
#> 
#> Level 1 Coefficients :
#> 
#> Class 1 / 3 :
#>           Odds Ratio Coefficient  Std. Error  t value  Pr(>|t|)  
#> SEXFemale     1.6859      0.5223      0.2098    2.489    0.0129 *
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Class 2 / 3 :
#>           Odds Ratio Coefficient  Std. Error  t value  Pr(>|t|)
#> SEXFemale     0.8514     -0.1609      0.1951   -0.825      0.41
#> 
#> 
#> Level 2 Coefficients :
#> 
#> Class 1 / 3 :
#>                      Odds Ratio Coefficient  Std. Error  t value  Pr(>|t|)
#> SCH_LEVMiddle School     0.1318     -2.0264      1.5545   -1.304     0.193
#> 
#> Class 2 / 3 :
#>                      Odds Ratio Coefficient  Std. Error  t value  Pr(>|t|)    
#> SCH_LEVMiddle School    19.8234      2.9869      0.4142    7.212  8.27e-13 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
# }
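
The "Number of parameters" values printed in the summaries above can be reproduced by hand. The counting rule in this sketch is an assumption inferred from the printed output under the default settings measure.inv = TRUE and coeff.inv = TRUE; it is not taken from the package source, and n_rho is a hypothetical helper.

```r
# Free item-response probabilities: each class has (categories - 1) free
# probabilities per item. `ncat` holds the number of categories of each item.
n_rho <- function(nclass, ncat) nclass * sum(ncat - 1)

ncat_gss  <- rep(2, 6)   # six YES/NO items in gss08
ncat_nyts <- rep(2, 5)   # five Yes/No items in nyts18

counts <- c(
  # gamma (C - 1) + rho
  lca   = (3 - 1) + n_rho(3, ncat_gss),
  # intercept and AGE slope for each non-reference class + rho
  lcr   = (3 - 1) * (1 + 1) + n_rho(3, ncat_gss),
  # group-specific gamma for 4 groups + rho shared across groups
  mglca = 4 * (3 - 1) + n_rho(3, ncat_gss),
  # group-specific intercepts + SEX slope shared across groups + rho
  mglcr = 4 * (3 - 1) + (3 - 1) * 1 + n_rho(3, ncat_gss),
  # delta (W - 1) + cluster-specific gamma for 2 clusters + rho
  mlca  = (2 - 1) + 2 * (3 - 1) + n_rho(3, ncat_nyts)
)
print(counts)
# Printed summaries above report 20, 22, 26, 28, and 20 parameters, respectively.
```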