Title: | Practical Multilevel Modeling |
---|---|
Description: | Convenience functions and datasets to be used with Practical Multilevel Modeling using R. The package includes functions for calculating group means, group mean centered variables, and displaying some basic missing data information. A function for computing robust standard errors for linear mixed models based on Liang and Zeger (1986) <doi:10.1093/biomet/73.1.13> and Bell and 'McCaffrey' (2002) <https://www150.statcan.gc.ca/n1/en/pub/12-001-x/2002002/article/9058-eng.pdf?st=NxMjN1YZ> is included as well as a function for checking for level-one homoskedasticity (Raudenbush & Bryk, 2002, ISBN:076191904X). |
Authors: | Francis Huang [aut, cre] |
Maintainer: | Francis Huang <[email protected]> |
License: | GPL-2 |
Version: | 0.3.2 |
Built: | 2024-11-22 03:51:09 UTC |
Source: | https://github.com/flh3/mlmusingr |
Dataset of 60 observations from 3 clusters.
cdata.ex
cdata.ex
A wide data frame of 60 observations. Used for discussing within and between group effects.
The predictor.
The outcome of interest.
Example data used to investigate missing data (this is the complete dataset).
data(engage)
data(engage)
A data frame with 528 observations from 40 groups and 7 variables:
Student engagement.
Student motivation.
Student grade point average.
Student grade level (6-8; a factor).
School level rural variable indicator; 1 = yes/0 = no.
Percent of students eligible for free or reduced price meals at the school.
School indicator (clustering variable).
Example data used to investigate missing data (this has missing data).
data(engage.miss)
data(engage.miss)
A data frame with 528 observations from 40 groups and 7 variables:
Student engagement.
Student motivation.
Student grade point average.
Student grade level (6-8; a factor).
School level rural variable indicator; 1 = yes/0 = no.
Percent of students eligible for free or reduced price meals at the school.
School indicator (clustering variable).
Helper function used to obtain supporting fit statistics for multilevel models. The R2s are computed using the 'performance' package.
## S3 method for class 'CR2' glance(x, ...)
## S3 method for class 'CR2' glance(x, ...)
x |
A 'CR2' object. |
... |
Unused, included for generic consistency only. |
glance
returns one row with the columns:
nobs |
the number of observations |
sigma |
the square root of the estimated residual variance |
logLik |
the data's log-likelihood under the model |
AIC |
Akaike Information Criterion |
BIC |
Bayesian Information Criterion |
r2.marginal |
marginal R2 based on fixed effects only using method of Nakagawa and Schielzeth (2013) |
r2.conditional |
conditional R2 based on fixed and random effects using method of Nakagawa and Schielzeth (2013) |
Also referred to as centering within cluster (or within context) or demeaning the variable.
By default, uses na.rm = TRUE
when computing group means.
group_center(x, grp)
group_center(x, grp)
x |
Variable to center (e.g., |
grp |
Cluster/grouping variable (e.g., |
A vector of group-mean centered variables.
data(mtcars) #create a group centered variable mtcars$mpg.gpc <- group_center(mtcars$mpg, mtcars$cyl)
data(mtcars) #create a group centered variable mtcars$mpg.gpc <- group_center(mtcars$mpg, mtcars$cyl)
Computes the group means of a variable by a specified cluster/group. Can also be used with factors that have two levels.
group_mean(x, grp, lm = FALSE)
group_mean(x, grp, lm = FALSE)
x |
Variable to compute the mean for (e.g., |
grp |
Cluster/grouping variable (e.g., |
lm |
Compute reliability (lambda) adjusted means. |
Outputs a vector of group means.
data(mtcars) #create a group mean aggregated variable mtcars$mpg.barj <- group_mean(mtcars$mpg, mtcars$cyl)
data(mtcars) #create a group mean aggregated variable mtcars$mpg.barj <- group_mean(mtcars$mpg, mtcars$cyl)
This dataset has a three-level, hierarchical structure with patients nested within doctors within hospitals. The simulation code can be found at <https://stats.idre.ucla.edu/r/codefragments/mesimulation/#setup>.
data(hdp)
data(hdp)
A data frame with 8,525 rows and 17 variables:
Continuous in years but recorded at a higher degree of accuracy.
Binary, married/living with partner or single.
Binary (yes/no), does the patient have a family history (Hx) of cancer?
Categorical with three levels, current smoker, former smoker, never smoked.
Binary (female/male).
Categorical with four levels, stages 1-4.
Count number of days patients stayed in the hospital after surgery.
Continuous, white blood count. Roughly 3,000 is low, 10,000 is middle, and 30,000 per microliter is high.
Continuous, red blood count.
Body mass index given by the formula ().
Continuous, interleukin 6, a proinflammatory cytokine commonly examined as an indicator of inflammation, cannot be lower than zero.
Continuous, C-reactive protein, a protein in the blood also used as an indicator of inflammation. It is also impacted by BMI.
Hospital identifier.
Doctor identifier
Years as a doctor.
Whether the school doctor trained at was high quality or not.
Cancer in remission? 1 = yes, 0 = no.
https://stats.oarc.ucla.edu/r/codefragments/mesimulation/
Based on Raudenbush and Bryk (2002) and Hoffman (2007). A statistically significant Chisq indicates heteroskedasticity. Output shows the H statistic, degrees of freedom, and p value.
Htest(newdata, fml, group)
Htest(newdata, fml, group)
newdata |
data to be used. |
fml |
level 1 formula. |
group |
grouping variable (in quotes). |
Returns a data frame which contains:
H |
The computed H statistic. |
df |
The degrees of freedom. |
p |
The p-value (< .05 indicates heteroskedasticity is present). |
Hoffman, L. (2007). Multilevel models for examining individual differences in within-person variation and covariation over time. Multivariate Behavioral Research, 42(4), 609–629.
Raudenbush, S., & Bryk, A. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). Sage.
set.seed(123) x1 <- rnorm(400) y <- x1 * .3 + rnorm(400) gr <- rep(1:20, each = 20) dat <- data.frame(x1, y, gr) Htest(dat, y ~ x1, 'gr') #no violation y <- x1 * .3 + rnorm(400, 0, sqrt(x1^2)) #add violation dat <- data.frame(x1, y, gr) Htest(dat, y ~ x1, 'gr')
set.seed(123) x1 <- rnorm(400) y <- x1 * .3 + rnorm(400) gr <- rep(1:20, each = 20) dat <- data.frame(x1, y, gr) Htest(dat, y ~ x1, 'gr') #no violation y <- x1 * .3 + rnorm(400, 0, sqrt(x1^2)) #add violation dat <- data.frame(x1, y, gr) Htest(dat, y ~ x1, 'gr')
From Imbens and Kolesar (2016).
MatSqrtInverse(A)
MatSqrtInverse(A)
A |
The matrix object. |
Amount of missing data per variable
nmiss(dat)
nmiss(dat)
dat |
Data frame that you want to inspect. |
By default, this function will print the following items to the console
The percent of missing data per variable.
The percent of complete cases (range: 0 to 1).
Suggested number of datasets to impute when using multiple imputation.
data(mtcars) mtcars[c(2:3), 4] <- NA #create NAs nmiss(mtcars)
data(mtcars) mtcars[c(2:3), 4] <- NA #create NAs nmiss(mtcars)
Example data for mixPV.
data(pisa2012)
data(pisa2012)
A data frame with 3136 rows and 14 variables:
Plausible value #1 for mathematics
Plausible value #2 for mathematics
Plausible value #3 for mathematics
Plausible value #4 for mathematics
Plausible value #5 for mathematics
Index of economic, social, and cultural status.
School identifier
Maths interest- Look forward to lessons.
Student gender.
Final student weight (total).
School weight.
Shortage- Maths teachers
Student weight (conditional).
Random noise.
https://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2014028
Example data for testing the need for a random intercept. Illustrates the need to adjust the p values for a modified LRT.
data(ri_test1)
data(ri_test1)
A data frame with 900 observations from 30 groups and 4 variables:
The outcome variable.
A level-2 predictor.
A level-1 predictor
The cluster identifier
Example data for testing the need for a random intercept. LRT results show that a random slope is not warranted.
data(ri_test2)
data(ri_test2)
A data frame with 3,000 observations from 30 groups and 4 variables:
The outcome variable.
A level-2 predictor.
A level-1 predictor
The cluster identifier
Function to compute the CR2/CR0 cluster robust standard errors (SE) with Bell and McCaffrey (2002) degrees of freedom (dof) adjustments. Suitable even with a low number of clusters. The model based (mb) and cluster robust standard errors are shown for comparison purposes.
robust_mixed(m1, digits = 3, type = "CR2", satt = TRUE, Gname = NULL)
robust_mixed(m1, digits = 3, type = "CR2", satt = TRUE, Gname = NULL)
m1 |
The |
digits |
Number of decimal places to display. |
type |
Type of cluster robust standard error to use ("CR2" or "CR0"). |
satt |
If Satterthwaite degrees of freedom are to be computed (if not, between-within df are used). |
Gname |
Group/cluster name if more than two levels of clustering (does not work with lme). |
A data frame (results
) with the cluster robust adjustments with p-values.
Estimate |
The regression coefficient. |
mb.se |
The model-based (regular, unadjusted) SE. |
cr.se |
The cluster robust standard error. |
df |
degrees of freedom: Satterthwaite or between-within. |
p.val |
p-value using CR0/CR2 standard error. |
stars |
stars showing statistical significance. |
Francis Huang, [email protected]
Bixi Zhang, [email protected]
Bell, R., & McCaffrey, D. (2002). Bias reduction in standard errors for linear regression with multi-stage samples. Survey Methodology, 28, 169-182. (link)
Liang, K.Y., & Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika, 73(1), 13-22. (link)
require(lme4) data(sch29, package = 'MLMusingR') robust_mixed(lmer(math ~ male + minority + mses + mhmwk + (1|schid), data = sch29))
require(lme4) data(sch29, package = 'MLMusingR') robust_mixed(lmer(math ~ male + minority + mses + mhmwk + (1|schid), data = sch29))
Function to compute empirical degrees of freedom based on Bell and McCaffrey (2002).
satdf(m1, type = "none", Vinv2, Vm2, br2, Gname = NULL)
satdf(m1, type = "none", Vinv2, Vm2, br2, Gname = NULL)
m1 |
The |
type |
The type of cluster robust correction used (i.e., CR2 or none). |
Vinv2 |
Inverse of the variance matrix. |
Vm2 |
The variance matrix. |
br2 |
The bread component. |
Gname |
The group (clustering variable) name' |
Francis Huang, [email protected]
Bixi Zhang, [email protected]
For examining the association between amount homework done per week and math outcome.
data(sch29)
data(sch29)
A data frame with 648 rows and 8 variables:
The school identifier (the grouping variable)
Student-level socioeconomic status
Total amount of time the student spent on homework per week. 1 = None, 2 = Less than one hour, 3 = 1 hour, 4 = 2 hours, 5 = 3 hours, 6 = 4-6 hours, 7 = 7 - 9 hours, 8 = 10 or more
Mathematics score.
Dummy coded gender, 1 = male, 0 = female
Dummy coded minority status, 1 = yes, 0 = no
Aggregated socioeconomic status at the school level
Aggregated time spent on homework at the school level
https://nces.ed.gov/pubs92/92030.pdf
Data from 8465 students from 100 schools in Virginia
data(suspend)
data(suspend)
Dataset:
School identifier
Percent minority enrollment at school
1 = male, 0 = female
Whether the student was suspended (1 = yes) in the school year or not (0 = no). Self reported.
If the student got into one or more fights (1 = yes) in the school year
Students self-reported GPA; 1 = D to 4 = A
Example data to be used for centering
data(thai)
data(thai)
A data frame with 6606 rows and 18 variables:
First plausible value in mathematics.
Index of economic, social, and cultural status.
Highest parent occupational status.
Student gender. 1 = Female, 2 = Male.
Mathematics interest.
Mathematics self-efficacy.
School identifier
Spoke another language at home other than Thai. 1 = yes, 0 = no.
How many books at home.
Highest parental education in years.
Student weight.
Plausible value #1 for reading.
Plausible value #2 for reading.
Plausible value #3 for reading.
Plausible value #4 for reading.
Plausible value #5 for reading.
Private school. 1 = yes, 0 = no.
Total school enrolment.
https://gpseducation.oecd.org/CountryProfile?primaryCountry=THA
Example data to be used for centering
data(thai)
data(thai)
A data frame with 4271 rows and 7 variables:
First plausible value in mathematics.
Index of economic, social, and cultural status.
Mathematics interest.
School identifier
Spoke another language at home other than Thai. 1 = yes, 0 = no.
Private school. 1 = yes, 0 = no.
Total school enrolment.
https://gpseducation.oecd.org/CountryProfile?primaryCountry=THA
Tidy a CR2 object
## S3 method for class 'CR2' tidy(x, conf.int = FALSE, conf.level = 0.95, ...)
## S3 method for class 'CR2' tidy(x, conf.int = FALSE, conf.level = 0.95, ...)
x |
A 'CR2' object. |
conf.int |
Logical indicating whether or not to include a confidence interval in the tidied output. Defaults to FALSE. |
conf.level |
The confidence level to use for the confidence interval if conf.int = TRUE. Must be strictly greater than 0 and less than 1. Defaults to 0.95, which corresponds to a 95 percent confidence interval. |
... |
Unused, included for generic consistency only. |
A tidy [tibble::tibble()] summarizing component-level information about the model
A dataset containing 30 observations with reading scores taken in the fall kindergarten, spring kindergarten, and spring first grade
wide
wide
A wide data frame of 30 observations:
Factor indicating student identification
treatment or control
1 = female, 0 = male
Reading scores in fall kindergarten
Reading scores in spring kindergarten
Reading scores in spring first grade