### Introduction

### Methods

### Three-level linear mixed effect model

*i*be the index of the level-3 unit with size

*L*(

*i*=1,···,

*L*),

*j*be the index of the level-2 unit with size

*M*

*(*

_{i}*j*=1,…,

*M*

*), and*

_{i}*k*be the index of the level-1 unit with size

*N*

*(*

_{ij}*k*=1,…,

*N*

*). We consider a three-level model, where a level-1 unit is nested within a level-2 unit that is also nested within a level-3 unit. Let*

_{ij}*y*

*be the response variable for level-1 unit*

_{ijk}*k*in the level-2 unit

*j*and level-3 unit

*i*,

*x*

*be the associated covariate variable observed at level-1,*

_{ijk}*x*

*be the associated covariate variable at level-2, and*

_{ij}*x*

*be the associated covariate at level-3. Then a general three-level linear mixed effect model is:*

_{i}##### (1)

*iid*) and capture within-cluster residual variations,

*v**=(*

_{ij}*v*

*,*

_{ij}*v*′

*) are*

_{ij}*iid*random effects at level-2 and captures between-cluster intercept differences at level-2,

*u**=(*

_{i}*u*

*,*

_{i}*u*′

*) are*

_{i}*iid*random effects at level-3 and captures between-cluster intercept differences at level-3.

*α*

_{11}=

*v*′

*=*

_{ij}*γ*

_{11}=

*u*′

*=0. Therefore, the model in Eq. (1) can be rewritten as*

_{i}*β*

_{0}(=

*γ*

_{00}) is the intercept, and

*β*

_{1}(=

*γ*

_{01}),

*β*

_{2}(=

*α*

_{01}), and

*β*

_{3}(=

*γ*

_{10}) are the slope coefficients for the fixed effects at level-3, level-2, and level-1, respectively. Considering random intercepts only in the model, we assume that level-3 and level-2 residuals,

*u*

*and*

_{i}*v*

*, respectively, are multivariate normal with zero means and variances of*

_{ij}**=(**

*y**y*

_{111},

*y*

_{112},…,

*y*

_{L},

*M*

_{L},

*N*

_{LM}) is a block-diagonal covariance matrix given by

*i*th block-diagonal matrix of

*I*_{(ni}is an

*n*

*×*

_{i}*n*

*identity matrix,*

_{i}

*J*_{(}

_{n}_{i)}is an

*n*

*×*

_{i}*n*

*matrix of 1s, and and*

_{i}*n*

*=*

_{i}*M*

*×*

_{i}*N*

_{iM}_{i}is a size of a block-diagonal matrix associates with

*i*th level-3 cluster. The off-diagonal elements are zero.

### Analysis of incomplete multilevel data using single imputation

### Childhood to Adolescence Transition Study

### Simulation data

*β*

_{0}=0.5,

*β*

_{1}=

*β*

_{2}=

*β*

_{3}=0.3, the random effects at level-2 and level-3 of

*σ*

*=1.4 and*

_{u}*σ*

*=1, respectively, and*

_{v}*σ*

*=1.4 for*

_{e}*i*=1,…,

*L*,

*j*=0,1,…,M and

*k*=1,…,

*N*

*. We considered various cluster sizes at each level (Table 1). The number of level-3 clusters varied as*

_{ij}*L*=30,50, and 100. For each level-3 cluster, we assumed

*M*=10,30 clusters. The number of level-1 clusters within each level-2 cluster was randomly determined as

*N*

*~*

_{ij}*Unif*[15,40]. For all cases, the covariates were generated as

*x*

*~*

_{i}*N*(0,1),

*x*

*~exp(1)+1, and*

_{ij}*x*

*~*

_{ijk}*N*(1,1.2

^{2}). Intraclass correlations were set to 0.398 and 0.602 for level-3 and level-2, respectively. For each case, we generated 500 replicates.

*M*level-2 clusters and its related variables such as

*x*

*. The missing data rate in level-2 clusters was set to 10, 25, 50, and 75% of*

_{ij}*M*level-2 clusters.

### Results

### Childhood to Adolescence Transition Study

### Simulation data

*u*

*and*

_{i}*v*

*. M2 considers level-1 and level-3 clusters by assuming that level-1 units are directly connected to level-3 clusters. Therefore, M2 does not include the random effect term of*

_{ij}*v*

*. In summary, M3 only estimates*

_{ij}_{2}of the level-2 associated covariate with missing clusters since M2 ignored the level-2 clusters and partially incorporated the date structure of level-1 and level-3 clusters. Since the M1 did not consider the hierarchical data structure, we expected that it poorly estimated

*β*

_{1}and β

_{2}.

_{2}of the level-2 associated covariate with missing clusters. Figs. 1 and 2 show the MSEs and the coverage probabilities for β

_{2}, respectively, and compared those values by the three models. The MSEs by the M3 ranged from 0.00039 to 0.00557, while those by the M1 and M2 were 0.0011–0.0639 and 0.0004–0.037, respectively (Fig. 1). The coverage probabilities for β

_{2}by the M3 were approximately 0.814–0.964 (Fig. 2). The M1 and M2 models did poorly estimate β

_{2}and their coverage probabilities low as 0.27–0.406 and 0.392–0.498, respectively (Fig. 2). The MSEs for β

_{2}seemed to increase with the missing rate (Fig. 1); however, the coverage probabilities did not change as the missing rate increased.

*β*

_{0}, β

_{1,}and β

_{3}similar to Fig. 1. The differences in MSEs among the three models were small, but M3 performed the best overall. Moreover, MSEs tended to increase with missing rates. Similar to Fig. 2, the coverage probabilities for

*β*

_{0}and β

_{1}were around 0.95 from the M3, while we observed small coverage probabilities from M1 (e.g., 0.06–0.302 for β

_{1}) and comparable coverage probabilities from M2 (e.g., 0.926–0.96 for β

_{1}). The three models yielded similar coverage probabilities for

*β*

_{3}.

*σ*

*,*

_{u}*σ*

*and*

_{v}*σ*

*. However, M2 does not estimate the variance*

_{e}*σ*

*and*

_{u}*σ*

*. We expected that the M1 and M2 models could overestimate variance components by failing to explain the 3-level data structure.*

_{v}*σ*

*from M3 worsened as the missing rate increased for each case of L and M values. At the same missing rate, the larger value of M, the better the coverage. However, more level-3 clusters did not improve MSE and coverage probability. This is because each level-3 cluster contains unobserved level-2 clusters, and hence the number of unobserved level-2 clusters increases as L increases.*

_{v}*σ*

*and*

_{u}*σ*

*from M1 were unbiased with small MSEs and their coverage probabilities were close to the nominal level of 95% (0.918–0.96 for*

_{e}*σ*

*; 0.936–0.974 for*

_{u}*σ*

*). However, the coverage probabilities of M1 and M2 models were very low (e.g., 0–0.344 for*

_{e}*σ*

*and 0–0.006 for*

_{u}*σ*

*from the 2-level model), and*

_{e}*σ*

*and*

_{u}*σe*were overestimated by M1 and M2 models (see Fig. 5 for the average estimates for

*σe*).