- design your experiment well and execute it well: then you needn’t worry too much in advance about statistics
- don’t, and you’re doomed: statistics can’t save you
- randomization, control, replication
| Source of confusion | Features of an experimental design that reduce or eliminate confusion |
|---|---|
| Temporal change | Control treatments |
| Procedure effects | Control treatments |
| Experimenter bias | Randomized assignment of experimental units to treatments; randomization in conduct of other procedures; “blind” procedures |
| Experimenter-generated variability (random error) | Replication of treatments |
| Initial or inherent variability among experimental units | Replication of treatments; interspersion of treatments; concomitant observations |
| Nondemonic intrusion | Replication of treatments; interspersion of treatments |
| Demonic intrusion | Eternal vigilance, exorcism, human sacrifices, etc. |
- how big does your experiment need to be? (Lakens 2022)
- power: probability of detecting an effect of a particular size, if one exists
- more generally: how much information? what kinds of mistakes? (Gelman and Carlin 2014; see the simulation sketch below)
- underpowered studies
  - failure is likely
  - cheating is likely
  - significance filter \(\to\) biased estimates
- overpowered studies waste time, lives, $
- pseudoreplication (Hurlbert 1984; Davies and Gray 2015): confounding sampling units with treatment units
if you can’t guess an effect size, you shouldn’t be doing an experiment
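A minimal simulation sketch of the Gelman and Carlin (2014) point about mistakes beyond power, and of the significance filter: in an underpowered two-group comparison, the estimates that pass the \(p < 0.05\) filter exaggerate the true effect (type M error) and occasionally get its sign wrong (type S error). The true effect, residual SD, and sample sizes below are arbitrary illustrative choices, not values from these notes.

```r
set.seed(101)
true_delta <- 0.3   ## small true difference between group means (illustrative)
sd_y <- 1           ## residual SD (illustrative)
n <- 15             ## per-group sample size: deliberately underpowered
nsim <- 2000
est <- pval <- numeric(nsim)
for (i in 1:nsim) {
    y1 <- rnorm(n, mean = 0,          sd = sd_y)
    y2 <- rnorm(n, mean = true_delta, sd = sd_y)
    tt <- t.test(y2, y1, var.equal = TRUE)
    est[i]  <- mean(y2) - mean(y1)
    pval[i] <- tt$p.value
}
sig <- pval < 0.05
mean(sig)                                 ## power: low
mean(abs(est[sig])) / true_delta          ## type M: exaggeration ratio among "significant" results
mean(sign(est[sig]) != sign(true_delta))  ## type S: wrong-sign rate among "significant" results
```

With settings like these the exaggeration ratio comes out well above 1: conditioning on significance biases the published estimates upward.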
apropos("^power\\.") ## base-R functions
## [1] "power.anova.test" "power.prop.test" "power.t.test"
a1 <- available.packages(repos="https://cran.rstudio.com")
pow_pkgs <- grepv("power", rownames(a1), ignore.case=TRUE)
length(pow_pkgs)
## [1] 60
head(pow_pkgs, 10)
## [1] "agpower" "BayesianPower" "BayesPower" "CoRpower"
## [5] "crt2power" "depower" "easypower" "ecopower"
## [9] "extraSuperpower" "InteractionPoweR"
pwr package

```r
library("pwr")
apropos("^pwr")
## [1] "pwr.2p.test" "pwr.2p2n.test" "pwr.anova.test" "pwr.chisq.test"
## [5] "pwr.f2.test" "pwr.norm.test" "pwr.p.test" "pwr.r.test"
## [9] "pwr.t.test" "pwr.t2n.test"
```
also: library("sos"); findFn("{power analysis}")
```r
dd <- read.csv("../data/ants.csv")      ## example data
power.t.test(n=10, delta=2, sd=1)       ## n = sample size per group; computes power
power.t.test(power=0.8, delta=2, sd=1)  ## specify power instead; solves for n
```
```r
nvec <- 2:15
powfun <- function(n) {
    power.t.test(n, delta=2, sd=1)$power
}
powvec <- sapply(nvec, powfun)
plot(nvec, powvec, type="b",
     xlab="sample size (each group)",
     ylab="power",
     main="delta = 2, sd = 1",
     ylim=c(0,1))
```
From Russ Lenth’s FAQs: for a canned “medium” effect size, you’ll choose the same \(n\) regardless of the accuracy or reliability of your instrument, or the narrowness or diversity of your subjects.
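To make this concrete (the numbers are invented for illustration): base R’s `power.t.test` works on the raw scale, so the required \(n\) depends on both the raw effect and the measurement/subject variability, whereas a fixed standardized effect collapses the two:

```r
## same raw effect (delta = 1) with a precise vs. a noisy instrument
power.t.test(delta = 1, sd = 0.5, power = 0.8)$n  ## roughly 5 per group
power.t.test(delta = 1, sd = 2,   power = 0.8)$n  ## roughly 64 per group
## a canned "medium" effect (delta/sd = 0.5) gives the same n in both cases
power.t.test(delta = 0.5, sd = 1,  power = 0.8)$n
power.t.test(delta = 5,   sd = 10, power = 0.8)$n  ## identical: only delta/sd matters
```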
avoid: retrospective power analysis

- i.e. running a power analysis after the experiment, plugging in the observed effect size
- tautological: high \(p\)-value \(\leftrightarrow\) low power
- essentially useless
- instead:
  - show confidence intervals (see the sketch after this list)
  - (if necessary) pretend you’re doing prospective analysis
  - push back: Thomas (1997), Gerard, Smith, and Weerakkody (1998), Hoenig and Heisey (2001)
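A minimal sketch of the “show confidence intervals” alternative (the data below are simulated placeholders, not the ants data used earlier): report an interval estimate for the effect rather than a post hoc power value.

```r
set.seed(101)
sim_dat <- data.frame(trt  = rep(c("control", "treated"), each = 10),
                      resp = rnorm(20))
m <- lm(resp ~ trt, data = sim_dat)
confint(m)["trttreated", ]  ## 95% CI for the treatment effect
```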
what to do about bad news?
- simplify the question
- use simpler designs (e.g. low/high vs continuous range)
- push treatments harder
- ask a different question
what if your analysis is more complex?
- simplify
- simulate
- see chap 5 (Bolker 2008)
- much more flexible
  - e.g. simulate effects of lack of balance
  - endpoints other than power (e.g. CV); see the sketch below
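As a minimal sketch (reusing the design and parameter values of the linear-regression example in the next section), simulation can target an endpoint other than power, here the coefficient of variation of the estimated slope:

```r
set.seed(101)
N <- 20; a <- 2; b <- 1; sd_y <- 1   ## same values as the example below
x <- runif(N, min = 0, max = 2)      ## fixed design points
nsim <- 1000
slope <- numeric(nsim)
for (i in 1:nsim) {
    y <- rnorm(N, mean = a + b * x, sd = sd_y)  ## simulate a response
    slope[i] <- coef(lm(y ~ x))["x"]            ## fitted slope
}
sd(slope) / mean(slope)  ## CV of the slope estimate under this design
```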
simulation (linear regression example)
```r
## experimental design
N <- 20; x_min <- 0; x_max <- 2
x <- runif(N, min=x_min, max=x_max)
## model world
a <- 2; b <- 1; sd_y <- 1
## setup
nsim <- 1000; pval <- numeric(nsim); set.seed(101)
for (i in 1:nsim) {
    y_det <- a + b * x                            ## deterministic y
    y <- rnorm(N, mean = y_det, sd = sd_y)        ## add noise
    m <- lm(y ~ x)
    pval[i] <- coef(summary(m))["x", "Pr(>|t|)"]  ## extract p-value
}
mean(pval < 0.05)                                 ## estimated power
## [1] 0.688
```

power of clarity
references
Bolker, Benjamin M. 2008. Ecological Models and Data in R. Princeton University Press.

Davies, G. Matt, and Alan Gray. 2015. “Don’t Let Spurious Accusations of Pseudoreplication Limit Our Ability to Learn from Natural Experiments (and Other Messy Kinds of Ecological Monitoring).” Ecology and Evolution, October. https://doi.org/10.1002/ece3.1782.

Faul, Franz, Edgar Erdfelder, Axel Buchner, and Albert-Georg Lang. 2009. “Statistical Power Analyses Using G*Power 3.1: Tests for Correlation and Regression Analyses.” Behavior Research Methods 41 (4): 1149–60. https://doi.org/10.3758/BRM.41.4.1149.

Gelman, Andrew, and John Carlin. 2014. “Beyond Power Calculations: Assessing Type S (Sign) and Type M (Magnitude) Errors.” Perspectives on Psychological Science 9 (6): 641–51. https://doi.org/10.1177/1745691614551642.

Gerard, Patrick D., David R. Smith, and Govinda Weerakkody. 1998. “Limits of Retrospective Power Analysis.” The Journal of Wildlife Management 62 (2): 801–7. http://www.jstor.org/stable/3802357.

Hoenig, John M., and Dennis M. Heisey. 2001. “The Abuse of Power.” The American Statistician 55 (1): 19–24. https://doi.org/10.1198/000313001300339897.

Hurlbert, Stuart H. 1984. “Pseudoreplication and the Design of Ecological Field Experiments.” Ecological Monographs 54 (2): 187–211. https://doi.org/10.2307/1942661.

Lakens, Daniël. 2022. “Sample Size Justification.” Collabra: Psychology 8 (1): 33267. https://doi.org/10.1525/collabra.33267.

Lenth, R. V. 2006. “Java Applets for Power and Sample Size [Computer Software].” http://www.stat.uiowa.edu/~rlenth/Power.

Thomas, Len. 1997. “Retrospective Power Analysis.” Conservation Biology 11 (1): 276–80. https://doi.org/10.1046/j.1523-1739.1997.96102.x.