Title: | Compute Cluster Robust Standard Errors with Degrees of Freedom Adjustments |
---|---|
Description: | Estimate different types of cluster robust standard errors (CR0, CR1, CR2) with degrees of freedom adjustments. Standard errors are computed based on 'Liang and Zeger' (1986) <doi:10.1093/biomet/73.1.13> and Bell and 'McCaffrey' <https://www150.statcan.gc.ca/n1/en/pub/12-001-x/2002002/article/9058-eng.pdf?st=NxMjN1YZ>. Functions used in Huang and Li <doi:10.3758/s13428-021-01627-0>, Huang, 'Wiedermann', and 'Zhang' <doi:10.1080/00273171.2022.2077290>, and Huang, 'Zhang', and Li (forthcoming: Journal of Research on Educational Effectiveness). |
Authors: | Francis Huang [aut, cre] |
Maintainer: | Francis Huang <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.2.1 |
Built: | 2025-01-31 04:09:08 UTC |
Source: | https://github.com/flh3/cr2 |
Function to compute the CR0, CR1, CR2 cluster robust standard errors (SE) with Bell and McCaffrey (2002) degrees of freedom (df) adjustments. Useful when dealing with datasets with a few clusters. Shows output using different CR types and degrees of freedom choices (for comparative purposes only). For linear and logistic regression models (as well as other GLMs). Computes the BRL-S2 variant.
clustSE(mod, clust = NULL, digits = 3, ztest = FALSE)
clustSE(mod, clust = NULL, digits = 3, ztest = FALSE)
mod |
The |
clust |
The cluster variable (with quotes). |
digits |
Number of decimal places to display. |
ztest |
If a normal approximation should be used as the naive degrees of freedom. If FALSE, the between-within degrees of freedom will be used. |
A data frame with the CR adjustments with p-values.
estimate |
The regression coefficient. |
se.unadj |
The model-based (regular, unadjusted) SE. |
CR0 |
Cluster robust SE based on Liang & Zeger (1986). |
CR1 |
Cluster robust SE (using an adjustment based on number of clusters). |
CR2 |
Cluster robust SE based on Bell and McCaffrey (2002). |
tCR2 |
t statistic based on CR2. |
dfn |
Degrees of freedom(naive): can be infinite (z) or between-within (default). User specified. |
dfBM |
Degrees of freedom based on Bell and McCaffrey (2002). |
pv.unadj |
p value based on model-based standard errors. |
CR0pv |
p value based on CR0 SE with dfBM. |
CR0pv.n |
p value based on CR0 SE with naive df. |
CR1pv |
p value based on CR1 SE with dfBM. |
CR1pv.n |
p value based on CR1 SE with naive df. |
CR2pv |
p value based on CR2 SE with dfBM. |
CR2pv.n |
p value based on CR2 SE with naive df. |
Bell, R., & McCaffrey, D. (2002). Bias reduction in standard errors for linear regression with multi-stage samples. Survey Methodology, 28, 169-182. (link)
Liang, K.Y., & Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika, 73(1), 13–22. doi:10.1093/biomet/73.1.13
clustSE(lm(mpg ~ am + wt, data = mtcars), 'cyl') data(sch25) clustSE(lm(math ~ ses + minority + mses + mhmwk, data = sch25), 'schid')
clustSE(lm(mpg ~ am + wt, data = mtcars), 'cyl') data(sch25) clustSE(lm(math ~ ses + minority + mses + mhmwk, data = sch25), 'schid')
Synthetic dataset used in the manuscript in the Journal of Research on Educational Effectiveness.
data(crct)
data(crct)
A data frame with 4233 rows and 12 variables:
Unique school identifier (the grouping variable).
School type (elementary, middle, or high school).
Treatment indicator. 1 = intervention; 0 = control.
Office disciplinary referral outcome.
Office disciplinary referral (baseline).
School enrollment size (to the nearest hundred).
Student is female: 1 = yes.
Dummy code for school type; middle school.
Dummy code for school type; elementary school.
Dummy code for school type; high school.
Dummy code for student race/ethnicity; Black student.
Dummy code for student race/ethnicity; Hispanic student.
Function to extract V matrix.
getV(x)
getV(x)
x |
lme4 object |
V matrix (weight) for multilevel models
Helper function used to obtain supporting fit statistics for multilevel models. The R2s are computed using the performance
package.
## S3 method for class 'CR2' glance(x, ...)
## S3 method for class 'CR2' glance(x, ...)
x |
A |
... |
Unused, included for generic consistency only. |
glance
returns one row with the columns:
nobs |
the number of observations |
sigma |
the square root of the estimated residual variance |
logLik |
the data's log-likelihood under the model |
AIC |
Akaike Information Criterion |
BIC |
Bayesian Information Criterion |
r2.marginal |
marginal R2 based on fixed effects only using method of Nakagawa and Schielzeth (2013) |
r2.conditional |
conditional R2 based on fixed and random effects using method of Nakagawa and Schielzeth (2013) |
For investigating heteroskedasticity.
data(gpadat)
data(gpadat)
A data frame with 8,956 rows and 18 variables:
Grade point average. 1 = D ... 4 = A.
Gender. Female = 1.
Student race/ethnicity (factor).
Disability status (1 = yes/0 = no).
Free/reduced price lunch status.
Dummy coded race (White).
Dummy coded race (Asian).
Dummy coded race (Black).
Dummy coded race (Hispanic).
Dummy coded race (Other).
Group-aggregated Asian variable.
Group-aggregated Black variable.
Group-aggregated Hispanic variable.
Group-aggregated Other variable.
Group-aggregated female variable.
Group-aggregated disability variable.
Group-aggregated frpl variable.
School identifier (cluster variable).
From Imbens and Kolesar (2016).
MatSqrtInverse(A)
MatSqrtInverse(A)
A |
The matrix object. |
Returns a matrix.
Function to detect heteroscedasticity in two-level random intercept models.
Uses a generalization of the Breusch-Pagan-type (using squared residuals)
and Levene-type test (using the absolute value of residuals). Note: this will
not tell you if including random slopes are warranted (for that, use the
robust_mixed
) function and compare differences in model-based and
robust standard errors.
ncvMLM(mx, bp = TRUE)
ncvMLM(mx, bp = TRUE)
mx |
The |
bp |
Computes a Breusch-Pagan-type test ( |
A p-value (p < .05 suggests heteroskedasticity).
Huang, F., Wiedermann, W., & Zhang, B. (2022). Accounting for Heteroskedasticity Resulting from Between-group Differences in Multilevel Models. Multivariate Behavioral Research.
require(lme4) data(sch25) ncvMLM(lmer(math ~ byhomewk + male + ses + (1|schid), data = sch25)) #supported ncvMLM(lmer(math ~ byhomewk + male + ses + minority + (1|schid), data = sch25)) #hetero
require(lme4) data(sch25) ncvMLM(lmer(math ~ byhomewk + male + ses + (1|schid), data = sch25)) #supported ncvMLM(lmer(math ~ byhomewk + male + ses + minority + (1|schid), data = sch25)) #hetero
Function to compute the CR2/CR0 cluster robust standard errors (SE) with Bell and McCaffrey (2002) degrees of freedom (dof) adjustments. Suitable even with a low number of clusters. The model based (mb) and cluster robust standard errors are shown for comparison purposes.
robust_mixed(m1, digits = 3, type = "CR2", satt = TRUE, Gname = NULL)
robust_mixed(m1, digits = 3, type = "CR2", satt = TRUE, Gname = NULL)
m1 |
The |
digits |
Number of decimal places to display. |
type |
Type of cluster robust standard error to use ("CR2" or "CR0"). |
satt |
If Satterthwaite degrees of freedom are to be computed (if not, between-within df are used). |
Gname |
Group/cluster name if more than two levels of clustering (does not work with lme). |
A data frame (results
) with the cluster robust adjustments with p-values.
Estimate |
The regression coefficient. |
mb.se |
The model-based (regular, unadjusted) SE. |
cr.se |
The cluster robust standard error. |
df |
degrees of freedom: Satterthwaite or between-within. |
p.val |
p-value using CR0/CR2 standard error. |
stars |
stars showing statistical significance. |
Francis Huang, [email protected]
Bixi Zhang, [email protected]
Bell, R., & McCaffrey, D. (2002). Bias reduction in standard errors for linear regression with multi-stage samples. Survey Methodology, 28, 169-182. (link)
Liang, K.Y., & Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika, 73(1), 13-22. (link)
require(lme4) data(sch25, package = 'CR2') robust_mixed(lmer(math ~ male + minority + mses + mhmwk + (1|schid), data = sch25))
require(lme4) data(sch25, package = 'CR2') robust_mixed(lmer(math ~ male + minority + mses + mhmwk + (1|schid), data = sch25))
Function to compute empirical degrees of freedom based on Bell and McCaffrey (2002).
satdf(m1, type = "none", Vinv2, Vm2, br2, Gname = NULL)
satdf(m1, type = "none", Vinv2, Vm2, br2, Gname = NULL)
m1 |
The |
type |
The type of cluster robust correction used (i.e., CR2 or none). |
Vinv2 |
Inverse of the variance matrix. |
Vm2 |
The variance matrix. |
br2 |
The bread component. |
Gname |
The group (clustering variable) name' |
Returns a vector of degrees of freedom.
Francis Huang, [email protected]
Bixi Zhang, [email protected]
For examining the association between amount homework done per week and math outcome.
data(sch25)
data(sch25)
A data frame with 546 rows and 8 variables:
The school identifier (the grouping variable)
Student-level socioeconomic status
Total amount of time the student spent on homework per week. 1 = None, 2 = Less than one hour, 3 = 1 hour, 4 = 2 hours, 5 = 3 hours, 6 = 4-6 hours, 7 = 7 - 9 hours, 8 = 10 or more
Mathematics score.
Dummy coded gender, 1 = male, 0 = female
Dummy coded minority status, 1 = yes, 0 = no
Aggregated socioeconomic status at the school level
Aggregated time spent on homework at the school level
https://nces.ed.gov/pubs92/92030.pdf
Tidy a CR2 object
## S3 method for class 'CR2' tidy(x, conf.int = FALSE, conf.level = 0.95, ...)
## S3 method for class 'CR2' tidy(x, conf.int = FALSE, conf.level = 0.95, ...)
x |
A |
conf.int |
Logical indicating whether or not to include a confidence interval in the tidied output. Defaults to FALSE. |
conf.level |
The confidence level to use for the confidence interval if conf.int = TRUE. Must be strictly greater than 0 and less than 1. Defaults to 0.95, which corresponds to a 95 percent confidence interval. |
... |
Unused, included for generic consistency only. |
A tidy tibble::tibble()
summarizing component-level
information about the model