The key frequentist question is:
Permutation tests provide a natural, intuitive way to test a wide range of such hypotheses:
Permutation tests are a rock-solid, conceptually simple way to get P values
They allow flexibility in constructing statistics of interest
They provide a straightforward way of dealing with certain types of structure
We don’t want to focus on P values!
It is often possible to associate confidence intervals with permutation tests, but:
Additional assumptions are (almost) always needed
Flexibility is dramatically reduced
This is not always well supported
Use a permutation P value as a sanity check on your more-powerful, assumption-based approach
If P values look pretty similar (especially if they’re not too small), your main approach may be fine
I’ve never seen somebody use a statistical journal to track criteria for this – you could be a trend-setter
Choose a statistic that reflects the effect you are trying to measure
Compare the observed value of the statistic with a null distribution, generated by interchanging things that should be interchangeable under your null hypothesis
Enumerate all of the possibilities (if possible)
Simulate at least 1999 possibilities if you can (use fewer only if it’s really necessary)
Use a classical analytical approximation (from a package)
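When the data set is small, enumerating every possibility is feasible. The following is a minimal sketch for a two-group comparison of means, not a definitive implementation. The function name `exact_permutation_p`, the data values, and the small floating-point tolerance are all assumptions made for illustration.

```python
from itertools import combinations

def exact_permutation_p(treatment, control):
    """One-tailed exact P value for mean(treatment) - mean(control).

    Every way of splitting the pooled values into groups of the
    original sizes is treated as equally likely under the null.
    """
    pooled = list(treatment) + list(control)
    n_treat = len(treatment)
    n_ctrl = len(control)
    total = sum(pooled)
    observed = sum(treatment) / n_treat - sum(control) / n_ctrl

    as_extreme = 0
    n_splits = 0
    for idx in combinations(range(len(pooled)), n_treat):
        treat_sum = sum(pooled[i] for i in idx)
        diff = treat_sum / n_treat - (total - treat_sum) / n_ctrl
        # ">=" so that ties are counted; the tolerance (an assumption)
        # guards against floating-point rounding in the comparison.
        if diff >= observed - 1e-9:
            as_extreme += 1
        n_splits += 1
    return as_extreme / n_splits

# Hypothetical measurements, for illustration only.
treatment = [12.1, 14.3, 13.8, 15.2]
control = [10.4, 11.9, 12.5, 11.1]
print(exact_permutation_p(treatment, control))
```

With 4 and 4 observations there are only 70 splits, so enumeration is instant; the count grows very quickly with sample size, which is when simulation becomes the practical choice.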
“Ties” (permutations with a statistic equal to the observed statistic) “count” against significance
Opinion: The best way to do a two-tailed test is to calculate a one-tailed P value for the observed effect, and then double the P value
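The recipe above can be sketched as a simulated test. This is an illustrative sketch under stated assumptions: the function name `permutation_p`, the seed, and the data are hypothetical, and counting the observed arrangement as one of the possibilities (so that 1999 shuffles give a denominator of 2000) is a common convention assumed here rather than something stated in the notes.

```python
import random
from statistics import mean

def permutation_p(treatment, control, n_perm=1999, seed=1):
    """Return (one_tailed, two_tailed) permutation P values
    for the difference in group means."""
    rng = random.Random(seed)
    pooled = list(treatment) + list(control)
    n_treat = len(treatment)
    observed = mean(treatment) - mean(control)
    # Test in the direction of the observed effect.
    sign = 1 if observed >= 0 else -1

    # Assumption: the observed arrangement counts as one possibility.
    as_extreme = 1
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_treat]) - mean(pooled[n_treat:])
        # ">=" means ties count against significance.
        if sign * diff >= sign * observed - 1e-9:
            as_extreme += 1

    one_tailed = as_extreme / (n_perm + 1)
    # Two-tailed P value by doubling, capped at 1.
    two_tailed = min(1.0, 2 * one_tailed)
    return one_tailed, two_tailed

# Hypothetical measurements, for illustration only.
treatment = [12.1, 14.3, 13.8, 15.2]
control = [10.4, 11.9, 12.5, 11.1]
print(permutation_p(treatment, control))
```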
We have assumed nothing about distributions of ant nests
What is the best way to interpret a significant permutation result?
If the difference in growth had been significant, we would conclude that the difference is due to something real about the systems (i.e., not due to chance).
If we want to conclude that mean growth is greater in the treatment group, we already need to assume something about distributions!
We can use any statistic we want, and get a valid test
Means tend to have more power than medians
Transformations that make the data more normal also tend to increase power
Using the geometric mean is equivalent to what transformation?
We are not allowed to try different statistics until one works. Why not?
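To illustrate that the permutation machinery does not depend on the statistic, here is a hedged sketch in which the statistic is passed in as a function. The names `perm_test`, `mean_diff`, and `median_diff`, and the data, are hypothetical; the statistic should be chosen before looking at the results.

```python
import random
from statistics import mean, median

def perm_test(group_a, group_b, statistic, n_perm=1999, seed=1):
    """One-tailed permutation P value for any two-group statistic.

    `statistic` takes two lists and returns a number; larger values
    are treated as more extreme.
    """
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = statistic(group_a, group_b)
    hits = 1  # assumption: the observed arrangement is counted too
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if statistic(pooled[:n_a], pooled[n_a:]) >= observed - 1e-9:
            hits += 1
    return hits / (n_perm + 1)

def mean_diff(a, b):
    return mean(a) - mean(b)

def median_diff(a, b):
    return median(a) - median(b)

# Hypothetical growth measurements, for illustration only.
treated = [2.3, 3.1, 2.9, 4.0, 3.6]
untreated = [2.0, 2.4, 2.2, 3.0, 2.7]
print(perm_test(treated, untreated, mean_diff))
print(perm_test(treated, untreated, median_diff))
```

Only the `statistic` argument changes between the two calls; the shuffling and counting are identical.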
You can test anything, if you can:
Measure it with a statistic
Come up with a permutation approach that reflects a scientific question
We measure correlations between a species of algae and nitrogen and phosphorus levels in natural ponds. Thus, we have a data frame showing N, P and A (for algae).
What kinds of tests could we do to see whether the algae are correlated with nutrient levels?
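One possible test, sketched below under stated assumptions, shuffles the algae values across ponds while leaving a nutrient column in place, and uses the Pearson correlation as the statistic. This is only one of several reasonable choices. The function names and the pond values are hypothetical, and the test shown is one-tailed for a positive correlation with N alone.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_perm_p(nutrient, algae, n_perm=1999, seed=1):
    """Return (observed r, one-tailed P) by shuffling algae across ponds."""
    rng = random.Random(seed)
    observed = pearson(nutrient, algae)
    shuffled = list(algae)
    hits = 1  # assumption: the observed arrangement is counted too
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any pond-level link to the nutrient
        if pearson(nutrient, shuffled) >= observed - 1e-9:
            hits += 1
    return observed, hits / (n_perm + 1)

# Hypothetical pond data, for illustration only.
N = [0.5, 1.2, 0.8, 2.1, 1.7, 2.6, 0.3, 1.9]
A = [3.1, 4.0, 3.3, 6.2, 5.1, 6.8, 2.5, 5.0]
print(correlation_perm_p(N, A))
```

The same function could be called with the P column in place of N; what to shuffle when N and P are themselves correlated is a separate design question that this sketch does not settle.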
Observe behavior of different individual animals. Evaluate observed statistics of (for example) tendency of bachelor zebras to wander off from groups
Classic tests don’t account for individual propensities
Switch whole “timelines” from one individual to another
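Switching whole timelines can be sketched by shuffling the individual-level labels while leaving each individual's observations intact. The sketch below is hypothetical throughout: the data layout (one list of 0/1 "wandered off" records per animal), the names `wander_gap` and `timeline_perm_p`, and the statistic are all assumptions made for illustration.

```python
import random
from statistics import mean

def wander_gap(timelines, labels):
    """Wandering rate over all bachelor observations minus the rate
    over all other observations."""
    bach = [obs for t, b in zip(timelines, labels) if b for obs in t]
    rest = [obs for t, b in zip(timelines, labels) if not b for obs in t]
    return mean(bach) - mean(rest)

def timeline_perm_p(timelines, is_bachelor, n_perm=1999, seed=1):
    """One-tailed P value; each timeline moves as a single unit."""
    rng = random.Random(seed)
    observed = wander_gap(timelines, is_bachelor)
    labels = list(is_bachelor)
    hits = 1  # assumption: the observed arrangement is counted too
    for _ in range(n_perm):
        # Reassign labels to individuals; observations within an
        # individual are never separated from one another.
        rng.shuffle(labels)
        if wander_gap(timelines, labels) >= observed - 1e-9:
            hits += 1
    return hits / (n_perm + 1)

# Hypothetical records: 1 = wandered off during that observation.
timelines = [
    [1, 1, 0, 1, 1],
    [1, 0, 1, 1],
    [0, 1, 1, 1, 0, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0, 0],
]
is_bachelor = [True, True, True, False, False, False]
print(timeline_perm_p(timelines, is_bachelor))
```

Shuffling individual observations instead would treat repeated records from the same animal as independent, which is exactly the individual propensity that this design is meant to respect.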
If your model is “linear enough” you can get confidence intervals by essentially shifting your null distribution to be centered around the observed mean!
Pers. comm., Dushoff
but you can also test it in your case with simulations
One-sample test (assuming symmetry)
Two-sample test (assuming a shift relationship)
Regression slopes (assuming a linear relationship)
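For the two-sample case, one reading of the shifting idea is sketched below. It is a sketch under stated assumptions, not a definitive method: the name `shift_ci` and the data are hypothetical, and removing the observed shift from the treatment group before permuting (so that the groups are exchangeable under the shift model) is an assumption of this sketch rather than a procedure given in the notes.

```python
import random
from statistics import mean

def shift_ci(treatment, control, level=0.95, n_perm=1999, seed=1):
    """Approximate confidence interval for a shift in means, built by
    recentring a permutation null distribution on the observed value."""
    rng = random.Random(seed)
    observed = mean(treatment) - mean(control)
    n_treat = len(treatment)

    # Assumption: subtract the observed shift so the groups are
    # exchangeable if the shift model holds.
    aligned = [x - observed for x in treatment] + list(control)

    null = []
    for _ in range(n_perm):
        rng.shuffle(aligned)
        null.append(mean(aligned[:n_treat]) - mean(aligned[n_treat:]))
    null.sort()

    # Positions of the lower and upper quantiles in the sorted null.
    alpha = 1 - level
    lo_idx = max(0, round((alpha / 2) * (n_perm + 1)) - 1)
    hi_idx = min(n_perm - 1, round((1 - alpha / 2) * (n_perm + 1)) - 1)

    # Shift the null distribution so it is centred on the observed value.
    return observed + null[lo_idx], observed + null[hi_idx]

# Hypothetical measurements, for illustration only.
treatment = [12.1, 14.3, 13.8, 15.2, 13.0, 14.7]
control = [10.4, 11.9, 12.5, 11.1, 12.0, 10.8]
print(shift_ci(treatment, control))
```

How well such an interval covers the true value depends on the shift assumption being reasonable, which can be checked for a given setting by simulation.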
As with traditional models, it’s good to consider transformations before you test.
General applicability
Conceptual clarity
Flexibility
Fewer assumptions
Hard to implement
May take a lot of computer time
Often hard to obtain confidence intervals