Interfaces for estimating standard ANOVAs with any number or
combination of within-subjects or between-subjects variables (the
ANOVA functions are aov_car(), aov_ez(), and aov_4() which all
fit the same model but differ in the way to specify the ANOVA
model).
Function mixed() provides an interface for mixed models analysis
(estimated via lme4lmer or glmer) that automatically obtains
p-values for fixed effects model terms (i.e., main effects and
interactions).
afex_plot() visualizes results from factorial experiments
combining estimated marginal means and uncertainties associated with
the estimated means in the foreground with a depiction of the raw
data in the background.
All afex model objects (i.e., ANOVA and mixed models) can be
passed to emmeans for follow-up/post-hoc/planned contrast
analysis.
afex is available from CRAN so the current stable version can be
installed directly via: install.packages("afex")
To install the latest development version you will need the
devtools package:
devtools::install_github("singmann/afex@master")
ANOVA functionality
To calculate an ANOVA, afex requires the data to be in the long format
(i.e., one row per data point/observation). An ANOVA can then be
calculated via one of three functions that only differ in how the model
components are specified, but not in the output. Note that in contrast
to base lm or aov, afex ANOVA functions always require the
specification of a subject identifier column (the id-column), because in
case there are multiple observations per participant and cell of the
design, these multiple observations are aggregated (i.e., averaged) per
default.
In aov_ez the columns containing id variable, dependent variable,
and factors need to be specified as character vectors.
aov_car behaves similar to standard aov and requires the ANOVA to
be specified as a formula containing an Error term (at least to
identify the id variable).
aov_4 allows the ANOVA to be specified via a formula similar to
lme4::lmer (with one random effects term).
The following code provides a simple example for an ANOVA with both
between- and within-subject factors. For this we use the
lexical-decision and word naming latencies reported by Freeman,
Heathcote, Chalmers, and Hockley (2010), see also ?fhch2010. As is
commonly done, we use the natural logarithm of the response times,
log_rt, as dependent variable. As independent variable we will
consider the between-subjects factor task ("naming" or "lexdec")
as well as the within-subjects-factors stimulus ("word" or
"nonword") and length (with 3 levels, 3, 4, or 5 letters).
# estimate mixed ANOVA on the full design:
aov_ez("id", "log_rt", fhch, between = "task", within = c("stimulus", "length"))
aov_car(log_rt ~ task * stimulus * length + Error(id/(stimulus * length)),
data = fhch)
## equivalent: aov_car(log_rt ~ task + Error(id/(stimulus * length)), data = fhch)
aov_4(log_rt ~ task * stimulus * length + (stimulus * length|id), data = fhch)
## equivalent: aov_4(log_rt ~ task + (stimulus * length|id), data = fhch)
# the three calls return the same ANOVA table:
#> Warning: More than one observation per design cell, aggregating data using `fun_aggregate = mean`.
#> To turn off this warning, pass `fun_aggregate = mean` explicitly.
#> Contrasts set to contr.sum for the following variables: task
#> Anova Table (Type 3 tests)
#>
#> Response: log_rt
#> Effect df MSE F ges p.value
#> 1 task 1, 43 0.23 13.38 *** .221 <.001
#> 2 stimulus 1, 43 0.01 173.25 *** .173 <.001
#> 3 task:stimulus 1, 43 0.01 87.56 *** .096 <.001
#> 4 length 1.83, 78.64 0.00 18.55 *** .008 <.001
#> 5 task:length 1.83, 78.64 0.00 1.02 <.001 .358
#> 6 stimulus:length 1.70, 72.97 0.00 1.91 <.001 .162
#> 7 task:stimulus:length 1.70, 72.97 0.00 1.21 <.001 .298
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1
#>
#> Sphericity correction method: GG
Plotting with afex_plot
ANOVA models can be used for plotting via afex_plot:
a <- aov_ez("id", "log_rt", fhch, between = "task", within = c("stimulus", "length"))
#> Warning: More than one observation per design cell, aggregating data using `fun_aggregate = mean`.
#> To turn off this warning, pass `fun_aggregate = mean` explicitly.
afex_plot(a, "task", "stimulus", "length")
#> Warning: Panel(s) show a mixed within-between-design.
#> Error bars do not allow comparisons across all means.
#> Suppress error bars with: error = "none"
afex_plot returns a ggplot2 plot object which allows simple
customization:
library("ggplot2")
afex_plot(a, "task", "stimulus", "length") +
theme_bw()
#> Warning: Panel(s) show a mixed within-between-design.
#> Error bars do not allow comparisons across all means.
#> Suppress error bars with: error = "none"
Follow-up Tests with emmeans
Follow-up tests with emmeans need to be specified in two steps.
Decide which factors of model should be involved in tests. Use these
factors to set-up reference grid of marginal means using
emmeans().
Specify set of tests on reference grid from step 1. Either custom
contrasts as a list and using contrast() or a convenience
function such as pairs().
suppressPackageStartupMessages(library("emmeans"))
## set up reference grid using only length
em1 <- emmeans(a, "length")
em1
#> length emmean SE df lower.CL upper.CL
#> X4 -0.1087 0.0299 43 -0.169 -0.04834
#> X5 -0.0929 0.0296 43 -0.153 -0.03310
#> X6 -0.0653 0.0290 43 -0.124 -0.00679
#>
#> Results are averaged over the levels of: task, stimulus
#> Confidence level used: 0.95
## test all pairwise comparisons on reference grid:
pairs(em1)
#> contrast estimate SE df t.ratio p.value
#> X4 - X5 -0.0159 0.00768 43 -2.065 0.1092
#> X4 - X6 -0.0434 0.00782 43 -5.555 <.0001
#> X5 - X6 -0.0276 0.00602 43 -4.583 0.0001
#>
#> Results are averaged over the levels of: task, stimulus
#> P value adjustment: tukey method for comparing a family of 3 estimates
## only test specified tests
con <- list(
"4vs5" = c(-1, 1, 0),
"5vs6" = c(0, -1, 1)
)
contrast(em1, con, adjust = "holm")
#> contrast estimate SE df t.ratio p.value
#> 4vs5 0.0159 0.00768 43 2.065 0.0449
#> 5vs6 0.0276 0.00602 43 4.583 0.0001
#>
#> Results are averaged over the levels of: task, stimulus
#> P value adjustment: holm method for 2 tests
Mixed Models
Function mixed() fits a mixed model with lme4::lmer (or
lme4::glmer if a family argument is passed) and then calculates
p-values for fixed effects model terms using a variety of methods. The
formula to mixed needs to be the same as in a call to lme4::lmer.
The default method for calculation of p-values is 'S'
(Satterthwaite) which only works for linear mixed models (i.e., no
family argument). A similar method that provides a somewhat better
control of Type I errors for small data sets is 'KR' (Kenward-Roger),
but it can require considerable RAM and time. Other methods are ,
similar to 'KR' but requires less RAM), 'PB' (parametric bootstrap),
and 'LRT' (likelihood-ratio test).
More examples are provided in the
vignette,
here we use the same example data as above, the lexical decision and
word naming latencies collected by Freeman et al. (2010). To avoid long
computation times we only consider the two factors task and length
(omitting stimulus is probably not a fully sensible model). Because
mixed models easily allow it, we will consider crossed-random effects
for participants (id) and items (tem).
For the random-effects grouping factors we begin with the maximal random
effect structure justified by the design (see Barr, Levy, Scheepers, &
Tily, 2013). In this case this is by-subject random intercepts and
by-subjects random slopes for stimulus and by-item random intercepts
and by-item random slopes for task.
m1 <- mixed(log_rt ~ task * length + (length | id) + (task | item),
fhch)
#> Contrasts set to contr.sum for the following variables: task, length, id, item
#> boundary (singular) fit: see help('isSingular')
Fitting this model produces a critical convergence warning, that the fit
is singular. This warning usually indicates that the data does not
provide enough information for the request random effect parameters. In
a real analysis it would therefore be a good idea to iteratively reduce
the random effect structure until the warning disappears. A good first
step would be to remove the correlations among random effect terms as
shown below.
This warning is also shown if we simply print the model object, but not
if we call the nice() method.
If we call the anova() method a slightly different output is shown in
which the p-values are not rounded in the same way and the warning is
shown again.
We can also get the default lme4 output if we call the summary
method. However, note that in contrast to the previous methods, results
are shown for factor-levels and not model-terms which is usually not
interpretable for factors with more than two levels. This is the case
for length here. The problem is that factors with k levels are
mapped to k−1 parameters and at the same time the intercept represent
the (unweighted) grand mean. This means that factor-levels cannot be
mapped in a 1-to-1 manner to the parameters and thus cannot be uniquely
interpreted.
Because of the singular fit warning, we reduce the random effect
structure. Usually a good starting point is removing the correlations
among the random effects parameters. This can be done in afex::mixed
even for factors by combining the double bar notation || with
expand_re = TRUE. We do so for both random effects terms.
m2 <- mixed(log_rt ~ task * length + (length || id) + (task || item),
fhch, expand_re = TRUE)
#> Contrasts set to contr.sum for the following variables: task, length, id, item
#> boundary (singular) fit: see help('isSingular')
However, the singular fit warning remains. We therefore inspect the
random effect estimates to see which random effect parameter is
estimated to be near to zero.
summary(m2)$varcor
#> Groups Name Std.Dev.
#> item re2.task1 0.101196
#> item.1 (Intercept) 0.106845
#> id re1.length2 0.000000
#> id.1 re1.length1 0.012291
#> id.2 (Intercept) 0.193395
#> Residual 0.304374
As shown above, one parameter of the by-participant random slope for
length is estimated to be almost zero, re1.length2. We therefore
remove the by-participant random slope for length in the next model
which does not show any convergence warnings.
Objects returned by mixed can be used for plotting with afex_plot.
However, two things need to be considered.
The id argument of afex_plot allows specifying over which random
effects grouping factors the data plotted in the background should be
averaged over. Per default this uses all random effects grouping
factors. In the present case this would mean that all data points are
shown resulting in a very busy plot. When choosing only one of the
random effects grouping factor, data points in the background show
average response for each level of that factor. For example, when
setting id = "id" here each data point in the background shows the
mean log_rt of one participant (i.e., level of id).
Estimated marginal means in the foreground are estimated via emmeans
which per default attempts to estimate the degrees of freedom using
the expensive Kenward-Roger method unless the number of data points is
high (as here). This can produce quite some status messages (not shown
here). Use emmeans::emm_options(lmer.df = "asymptotic") to suppress
this calculation.
library("ggplot2")
## all data points shown
afex_plot(m3, "task", "length") +
theme_bw()
## data points show IDs
afex_plot(m3, "task", "length", id = "id") +
theme_bw()
## data points show items
afex_plot(m3, "task", "length", id = "item") +
theme_bw()
Follow-up Tests with emmeans
Follow-up tests with emmeans need to be specified in two steps.
Decide which factors of model should be involved in tests. Use these
factors to set-up reference grid of marginal means using
emmeans().
Specify set of tests on reference grid from step 1. Either custom
contrasts as a list and using contrast() or a convenience
function such as pairs().
For mixed models, emmeans attempts to estimate the degrees of freedom.
The method can be set via emm_options(lmer.df = ...). Here we use
"asymptotic" which does not estimate the degrees of freedom, but sets
them to infinity.
library("emmeans")
emm_options(lmer.df = "asymptotic")
## set up reference grid using only length
em2 <- emmeans(m3, "length")
#> NOTE: Results may be misleading due to involvement in interactions
em2
#> length emmean SE df asymp.LCL asymp.UCL
#> 4 -0.1099 0.0304 Inf -0.169 -0.05040
#> 5 -0.0924 0.0304 Inf -0.152 -0.03296
#> 6 -0.0642 0.0304 Inf -0.124 -0.00469
#>
#> Results are averaged over the levels of: task
#> Degrees-of-freedom method: asymptotic
#> Confidence level used: 0.95
## test all pairwise comparisons on reference grid:
pairs(em2)
#> contrast estimate SE df z.ratio p.value
#> length4 - length5 -0.0175 0.0126 Inf -1.384 0.3495
#> length4 - length6 -0.0457 0.0126 Inf -3.618 0.0009
#> length5 - length6 -0.0282 0.0126 Inf -2.238 0.0649
#>
#> Results are averaged over the levels of: task
#> Degrees-of-freedom method: asymptotic
#> P value adjustment: tukey method for comparing a family of 3 estimates
## only test specified tests
con <- list(
"4vs5" = c(-1, 1, 0),
"5vs6" = c(0, -1, 1)
)
contrast(em2, con, adjust = "holm")
#> contrast estimate SE df z.ratio p.value
#> 4vs5 0.0175 0.0126 Inf 1.384 0.1665
#> 5vs6 0.0282 0.0126 Inf 2.238 0.0504
#>
#> Results are averaged over the levels of: task
#> Degrees-of-freedom method: asymptotic
#> P value adjustment: holm method for 2 tests
References
Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random
effects structure for confirmatory hypothesis testing: Keep it maximal.
Journal of Memory and Language, 68(3), 255-278.
https://doi.org/10.1016/j.jml.2012.11.001
Freeman, E., Heathcote, A., Chalmers, K., & Hockley, W. (2010). Item
effects in recognition memory for words. Journal of Memory and
Language, 62(1), 1-18. https://doi.org/10.1016/j.jml.2009.09.004
Code of Conduct
Please note that afex is released with a Contributor Code of
Conduct.
By contributing to this project, you agree to abide by its terms.
afex: Analysis of Factorial EXperiments
The main functionalities provided by
afexare:aov_car(),aov_ez(), andaov_4()which all fit the same model but differ in the way to specify the ANOVA model).mixed()provides an interface for mixed models analysis (estimated vialme4lmerorglmer) that automatically obtains p-values for fixed effects model terms (i.e., main effects and interactions).afex_plot()visualizes results from factorial experiments combining estimated marginal means and uncertainties associated with the estimated means in the foreground with a depiction of the raw data in the background.afexmodel objects (i.e., ANOVA and mixed models) can be passed toemmeansfor follow-up/post-hoc/planned contrast analysis.For
afexsupport visit: afex.singmann.scienceInstallation
afexis available from CRAN so the current stable version can be installed directly via:install.packages("afex")To install the latest development version you will need the
devtoolspackage:devtools::install_github("singmann/afex@master")ANOVA functionality
To calculate an ANOVA,
afexrequires the data to be in the long format (i.e., one row per data point/observation). An ANOVA can then be calculated via one of three functions that only differ in how the model components are specified, but not in the output. Note that in contrast to baselmoraov,afexANOVA functions always require the specification of a subject identifier column (the id-column), because in case there are multiple observations per participant and cell of the design, these multiple observations are aggregated (i.e., averaged) per default.aov_ezthe columns containing id variable, dependent variable, and factors need to be specified as character vectors.aov_carbehaves similar to standardaovand requires the ANOVA to be specified as a formula containing anErrorterm (at least to identify the id variable).aov_4allows the ANOVA to be specified via a formula similar tolme4::lmer(with one random effects term).A further overview is provided by the vignette.
The following code provides a simple example for an ANOVA with both between- and within-subject factors. For this we use the lexical-decision and word naming latencies reported by Freeman, Heathcote, Chalmers, and Hockley (2010), see also
?fhch2010. As is commonly done, we use the natural logarithm of the response times,log_rt, as dependent variable. As independent variable we will consider the between-subjects factortask("naming"or"lexdec") as well as the within-subjects-factorsstimulus("word"or"nonword") andlength(with 3 levels, 3, 4, or 5 letters).Plotting with
afex_plotANOVA models can be used for plotting via
afex_plot:afex_plotreturns aggplot2plot object which allows simple customization:Follow-up Tests with
emmeansFollow-up tests with
emmeansneed to be specified in two steps.emmeans().listand usingcontrast()or a convenience function such aspairs().Mixed Models
Function
mixed()fits a mixed model withlme4::lmer(orlme4::glmerif afamilyargument is passed) and then calculates p-values for fixed effects model terms using a variety of methods. The formula tomixedneeds to be the same as in a call tolme4::lmer. The default method for calculation of p-values is'S'(Satterthwaite) which only works for linear mixed models (i.e., nofamilyargument). A similar method that provides a somewhat better control of Type I errors for small data sets is'KR'(Kenward-Roger), but it can require considerable RAM and time. Other methods are , similar to'KR'but requires less RAM),'PB'(parametric bootstrap), and'LRT'(likelihood-ratio test).More examples are provided in the vignette, here we use the same example data as above, the lexical decision and word naming latencies collected by Freeman et al. (2010). To avoid long computation times we only consider the two factors
taskandlength(omittingstimulusis probably not a fully sensible model). Because mixed models easily allow it, we will consider crossed-random effects for participants (id) and items (tem).For the random-effects grouping factors we begin with the maximal random effect structure justified by the design (see Barr, Levy, Scheepers, & Tily, 2013). In this case this is by-subject random intercepts and by-subjects random slopes for
stimulusand by-item random intercepts and by-item random slopes fortask.Fitting this model produces a critical convergence warning, that the fit is singular. This warning usually indicates that the data does not provide enough information for the request random effect parameters. In a real analysis it would therefore be a good idea to iteratively reduce the random effect structure until the warning disappears. A good first step would be to remove the correlations among random effect terms as shown below.
This warning is also shown if we simply print the model object, but not if we call the
nice()method.If we call the
anova()method a slightly different output is shown in which the p-values are not rounded in the same way and the warning is shown again.We can also get the default
lme4output if we call thesummarymethod. However, note that in contrast to the previous methods, results are shown for factor-levels and not model-terms which is usually not interpretable for factors with more than two levels. This is the case forlengthhere. The problem is that factors with k levels are mapped to k−1 parameters and at the same time the intercept represent the (unweighted) grand mean. This means that factor-levels cannot be mapped in a 1-to-1 manner to the parameters and thus cannot be uniquely interpreted.Reducing the Random Effect Structure
Because of the singular fit warning, we reduce the random effect structure. Usually a good starting point is removing the correlations among the random effects parameters. This can be done in
afex::mixedeven for factors by combining the double bar notation||withexpand_re = TRUE. We do so for both random effects terms.However, the singular fit warning remains. We therefore inspect the random effect estimates to see which random effect parameter is estimated to be near to zero.
As shown above, one parameter of the by-participant random slope for
lengthis estimated to be almost zero,re1.length2. We therefore remove the by-participant random slope forlengthin the next model which does not show any convergence warnings.Plotting with
afex_plotObjects returned by
mixedcan be used for plotting withafex_plot. However, two things need to be considered.idargument ofafex_plotallows specifying over which random effects grouping factors the data plotted in the background should be averaged over. Per default this uses all random effects grouping factors. In the present case this would mean that all data points are shown resulting in a very busy plot. When choosing only one of the random effects grouping factor, data points in the background show average response for each level of that factor. For example, when settingid = "id"here each data point in the background shows the meanlog_rtof one participant (i.e., level ofid).emmeanswhich per default attempts to estimate the degrees of freedom using the expensive Kenward-Roger method unless the number of data points is high (as here). This can produce quite some status messages (not shown here). Useemmeans::emm_options(lmer.df = "asymptotic")to suppress this calculation.Follow-up Tests with
emmeansFollow-up tests with
emmeansneed to be specified in two steps.emmeans().listand usingcontrast()or a convenience function such aspairs().For mixed models,
emmeansattempts to estimate the degrees of freedom. The method can be set viaemm_options(lmer.df = ...). Here we use"asymptotic"which does not estimate the degrees of freedom, but sets them to infinity.References
Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255-278. https://doi.org/10.1016/j.jml.2012.11.001
Freeman, E., Heathcote, A., Chalmers, K., & Hockley, W. (2010). Item effects in recognition memory for words. Journal of Memory and Language, 62(1), 1-18. https://doi.org/10.1016/j.jml.2009.09.004
Code of Conduct
Please note that
afexis released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.