Permutational Multivariate Analysis of Variance (PERMANOVA)

Permutational multivariate analysis of variance (PERMANOVA) is a geometric partitioning of variation across a multivariate data cloud, defined explicitly in the space of a chosen dissimilarity measure, in response to one or more factors in an analysis of variance design. Statistical inferences are made in a distribution-free setting using permutational algorithms. The PERMANOVA framework is readily extended to accommodate random effects, hierarchical models, mixed models, quantitative covariates, repeated measures, unbalanced and/or asymmetrical designs, and, most recently, heterogeneous dispersions among groups. Plots to accompany PERMANOVA models include ordinations of either fitted or residualized distance matrices, including multivariate analogues to main effects and interaction plots, to visualize results.

1 Introduction

PERMANOVA is an acronym for “permutational multivariate analysis of variance”¹. It is best described as a geometric partitioning of multivariate variation in the space of a chosen dissimilarity measure according to a given ANOVA design, with p-values obtained using appropriate distribution‐free permutation techniques (see Permutation Based Inference; Linear Models: Permutation Methods). The method is semiparametric, motivated by the desire to perform a classical partitioning, as in ANOVA (hence allowing tests and estimation of sizes of main effects, interaction terms, hierarchical structures, random components in mixed models, etc.), while simultaneously retaining important robust statistical properties of rank-based nonparametric multivariate methods, such as the analysis of similarities (ANOSIM²), namely, (1) the flexibility to base the analysis on a dissimilarity measure of choice (such as Bray–Curtis, Jaccard, etc.) and (2) distribution-free inferences achieved by permutations, with no assumption of multivariate normality. Thus, PERMANOVA opens the door for formal partitioning of multivariate data in response to complex experimental designs in a wide variety of contexts: there may be more response variables than sampling units, data may be severely non-normal, zero‐inflated, ordinal or qualitative (e.g., responses to questionnaires, DNA/RNA sequences, allele frequencies, amino acids, or protein data). Although originally motivated by ecological studies, where variables usually consist of counts of abundances (or percentage cover, frequencies, or biomass) for a large number of species, PERMANOVA is now used across many fields, including chemistry, social sciences, agriculture, medicine, genetics, psychology, economics, and more.

2 Derivation

2.1 The Pseudo F Statistic

Let Y be a matrix of N rows (sampling units) by p columns (variables). Let D = {d_ij}, i = 1,…, N; j = 1,…, N consist of the distances or dissimilarities between every pair (i, j) of sampling units. The first major milestone in the development of PERMANOVA was the basic achievement of direct partitioning of dissimilarity-based multivariate spaces in response to multiway ANOVA designs^3-9. Most similarities S = {s_ij} can also be reexpressed¹⁰ as dissimilarities d_ij = 1.0 − s_ij. A direct partitioning of distance matrices having Euclidean metric properties^10-12 was extended to semimetric dissimilarities^{6, 9}, which need only satisfy¹³ symmetry (d_ij = d_ji), non-negativity (d_ij ≥ 0), and d_ii = 0 for all i, j.

Consider matrix $A = \{a_{i j}\} = \{- \frac{1}{2} d_{i j}^{2}\}$ . Centering the elements of this matrix by row-means, ${\overline{a}}_{i \cdot}$ , column-means, ${\overline{a}}_{\cdot j}$ , and the overall mean, ${\overline{a}}_{\cdot \cdot}$ , gives Gower's matrix G = {g_ij} = { $a_{ij} - {\overline{a}}_{i \cdot} - {\overline{a}}_{\cdot j} + {\overline{a}}_{\cdot \cdot}$ }. Let X be an (N × g) design matrix whose columns are indicators coding for groups (levels) of an ANOVA factor in a one-way design (e.g., an intercept and orthogonal contrasts⁷). We can then obtain a projection (hat) matrix H = X[X′X]^{− 1}X′, derived from the normal equations used in linear regression^{14, 15}.

The among-group sum-of-squares (SS_A) and the within-group (or residual) sum-of-squares (SS_R) for PERMANOVA, corresponding to a direct geometric partitioning (Figure 1) of the total variation (SS_T), are achieved as SS_T = tr(G); SS_A = tr(HG); and SS_R = tr[(I − H)G], where I is an (N × N) identity matrix, “tr” indicates the trace of a matrix, and the usual ANOVA identity: SS_T = SS_A + SS_R holds. The PERMANOVA pseudo F statistic for testing the null hypothesis of no differences in the positions of the group centroids in the space of the chosen dissimilarity measure is given by

$F = \frac{SS_{A}}{SS_{R}} \cdot \frac{N - g}{g - 1}$
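The trace formulation above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function names, the toy data, and the choice to code X as g group indicators (which spans the same column space as an intercept plus contrasts) are assumptions made here.

```python
import numpy as np

def gower_center(D):
    """Gower's matrix G: double-centre A = -0.5 * D**2."""
    A = -0.5 * np.asarray(D, dtype=float) ** 2
    return A - A.mean(axis=1, keepdims=True) - A.mean(axis=0, keepdims=True) + A.mean()

def hat_matrix(groups):
    """Projection matrix H for a one-way design coded by group indicators."""
    groups = np.asarray(groups)
    X = (groups[:, None] == np.unique(groups)[None, :]).astype(float)
    return X @ np.linalg.inv(X.T @ X) @ X.T

def pseudo_f(D, groups):
    """Return (F, SS_A, SS_R) for a one-way PERMANOVA."""
    N, g = len(groups), len(np.unique(groups))
    G, H = gower_center(D), hat_matrix(groups)
    ss_a = np.trace(H @ G)                    # among-group sum-of-squares
    ss_r = np.trace((np.eye(N) - H) @ G)      # residual sum-of-squares
    return (ss_a / ss_r) * (N - g) / (g - 1), ss_a, ss_r

# Hypothetical toy data: 3 groups of 4 units, 3 variables, Euclidean distance
rng = np.random.default_rng(0)
groups = np.repeat([0, 1, 2], 4)
Y = rng.normal(size=(12, 3)) + groups[:, None]
D = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
F, ss_a, ss_r = pseudo_f(D, groups)
print(f"SS_A={ss_a:.3f}  SS_R={ss_r:.3f}  pseudo F={F:.3f}")
```

Any symmetric dissimilarity matrix with a zero diagonal could be passed in place of the Euclidean D used here.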

Figure 1. Schematic diagram of geometric partitioning for PERMANOVA, shown for g = 3 groups of n = 10 sampling units per group in two-dimensional (bivariate, p = 2) Euclidean space. The total variation in the data cloud (SS_T) is the sum of two parts: SS_T = SS_A + SS_R, where the residual (within-group) sum-of-squares (SS_R) is the sum of the squared distances from individual sampling units (replicates) to their own group centroid (colored dotted lines), and the among-group sum-of-squares (SS_A) is the sum of the squared distances from individual group centroids to the overall centroid (solid black lines), each weighted by the number of sampling units in the group.

When Euclidean distances are used, the PERMANOVA sums-of-squares are each equal to the sum of the classical univariate sums-of-squares across the original variables^{7, 16}; that is, $SS_{A} = \sum_{k = 1}^{p} SS_{A}^{[k]}$, where $SS_{A}^{[k]}$ is the univariate among-group sum-of-squares for variable k = 1,…, p (and similarly for each of SS_R and SS_T). Hence, pseudo F on Euclidean distances is the same as the F statistic used in classical redundancy analysis (RDA)^{17, 18}. Distance-based redundancy analysis (dbRDA^{7, 8}), equivalent to PERMANOVA, is an RDA on the orthogonal principal coordinates (PCOs) of matrix G based on a chosen (potentially non-Euclidean) dissimilarity measure. However, direct partitioning of matrix G obviates the need for any eigenvector decomposition into PCOs or any corrections for negative eigenvalues that might arise for semimetric (or non-Euclidean-embeddable) dissimilarities⁸. Furthermore, PERMANOVA on one response variable using Euclidean distance yields the classical univariate F statistic¹⁹. PERMANOVA can therefore also be used for univariate ANOVA, with p-values obtained by permutation²⁰, thus avoiding the assumption of normality.
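The two Euclidean equivalences stated above can be checked numerically. The sketch below uses simulated data (an assumption made here) and compares the PERMANOVA sums-of-squares with the summed univariate sums-of-squares, and the single-variable pseudo F with the classical F ratio computed by hand.

```python
import numpy as np

rng = np.random.default_rng(1)
groups = np.repeat([0, 1, 2], 5)              # g = 3 groups, n = 5 each
Y = rng.normal(size=(15, 4)) + 0.5 * groups[:, None]
N, g, p = Y.shape[0], 3, Y.shape[1]

# Classical univariate sums of squares, one value per variable k
grand = Y.mean(axis=0)
ss_a_uni = np.zeros(p)
ss_r_uni = np.zeros(p)
for l in range(g):
    Yl = Y[groups == l]
    ss_a_uni += len(Yl) * (Yl.mean(axis=0) - grand) ** 2
    ss_r_uni += ((Yl - Yl.mean(axis=0)) ** 2).sum(axis=0)

def permanova_ss(Ysub):
    """SS_A and SS_R from Euclidean distances among the rows of Ysub."""
    D = np.linalg.norm(Ysub[:, None, :] - Ysub[None, :, :], axis=-1)
    A = -0.5 * D ** 2
    G = A - A.mean(axis=1, keepdims=True) - A.mean(axis=0, keepdims=True) + A.mean()
    X = (groups[:, None] == np.arange(g)[None, :]).astype(float)
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    return np.trace(H @ G), np.trace((np.eye(N) - H) @ G)

ss_a, ss_r = permanova_ss(Y)                  # all p variables together
ss_a1, ss_r1 = permanova_ss(Y[:, :1])         # first variable only
F_perm = (ss_a1 / ss_r1) * (N - g) / (g - 1)
F_classic = (ss_a_uni[0] / (g - 1)) / (ss_r_uni[0] / (N - g))
print(np.isclose(ss_a, ss_a_uni.sum()), np.isclose(F_perm, F_classic))
```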

PERMANOVA partitioning (Figure 1) can also be obtained directly using sums of squared dissimilarities⁹. Specifically,

$SS_{T} = \frac{1}{N} \sum_{i = 1}^{N - 1} \sum_{j = i + 1}^{N} d_{ij}^{2}$

and, for the one-way case,

$SS_{W} = \sum_{ℓ = 1}^{g} W_{ℓ}$,

where W_ℓ is the within-group sum-of-squares for group ℓ:

$W_{ℓ} = \frac{1}{n_{ℓ}} \sum_{i = 1}^{N - 1} \sum_{j = i + 1}^{N} ϵ_{ij}^{[ℓ]} d_{ij}^{2}$,

n_ℓ is the sample size for group ℓ (thus, $N = \sum_{ℓ = 1}^{g} n_{ℓ}$), and $ϵ_{ij}^{[ℓ]}$ is an indicator such that $ϵ_{ij}^{[ℓ]} = 1$ if samples i and j are both in group ℓ, else $ϵ_{ij}^{[ℓ]} = 0$. For the one-way design, SS_W coincides with the residual sum-of-squares SS_R defined above. Then, SS_A = SS_T − SS_W, and the partitioning required for the construction of pseudo F is achieved.
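The direct route from squared dissimilarities needs no matrix centering at all. The sketch below is a minimal illustration on simulated counts with Bray–Curtis dissimilarities; the helper names and the data are assumptions made here, not part of the original method description.

```python
import numpy as np

def bray_curtis(Y):
    """Pairwise Bray-Curtis dissimilarities among rows of a non-negative matrix."""
    num = np.abs(Y[:, None, :] - Y[None, :, :]).sum(axis=-1)
    den = (Y[:, None, :] + Y[None, :, :]).sum(axis=-1)
    return num / den

def ss_from_distances(D, groups):
    """SS_T, SS_W and SS_A directly from squared dissimilarities."""
    D2 = np.triu(D, k=1) ** 2                          # each pair (i < j) once
    ss_t = D2.sum() / len(groups)
    ss_w = 0.0
    for l in np.unique(groups):
        idx = groups == l
        ss_w += D2[np.ix_(idx, idx)].sum() / idx.sum() # W_l for group l
    return ss_t, ss_w, ss_t - ss_w

rng = np.random.default_rng(2)
groups = np.repeat([0, 1, 2], [3, 4, 5])               # unbalanced on purpose
Y = rng.poisson(3.0, size=(12, 6)) + 1.0               # +1 keeps every row non-empty
ss_t, ss_w, ss_a = ss_from_distances(bray_curtis(Y), groups)
print(f"SS_T={ss_t:.4f}  SS_W={ss_w:.4f}  SS_A={ss_a:.4f}")
```

The same three quantities should agree with tr(G), tr[(I − H)G], and tr(HG) computed from the centered matrix, including for unbalanced groups.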

2.2 Inference

The second major milestone, achieved more-or-less simultaneously, was the development of appropriate permutation algorithms^{21, 22} for achieving rigorous tests of individual terms in complex ANOVA designs, specifically by conditioning on other terms in the model^{23, 24} and using expectations of mean squares to identify correct permutable units^{25, 26}. In some cases (e.g., tests of interactions), only asymptotically exact p-values are readily obtainable, for example, via permutation of residuals under a reduced model²⁷, although synchronized permutations²⁸ might be possible.
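For the simplest case only, a one-way design with unrestricted permutation of group labels, a permutation test might be sketched as follows. The conditioning and restricted permutation schemes needed for complex designs are not attempted here; the function names, the default of 999 permutations, and the convention of counting the observed arrangement among the permutations are assumptions of this sketch.

```python
import numpy as np

def pseudo_f_from_d2(D2, groups):
    """One-way pseudo F from a matrix of squared dissimilarities."""
    N = len(groups)
    labels = np.unique(groups)
    g = len(labels)
    upper = np.triu(D2, k=1)
    ss_t = upper.sum() / N
    ss_w = sum(upper[np.ix_(groups == l, groups == l)].sum() / (groups == l).sum()
               for l in labels)
    return ((ss_t - ss_w) / ss_w) * (N - g) / (g - 1)

def permanova_pvalue(D, groups, n_perm=999, seed=0):
    """Permutation p-value obtained by shuffling group labels among units."""
    rng = np.random.default_rng(seed)
    groups = np.asarray(groups)
    D2 = np.asarray(D, dtype=float) ** 2
    f_obs = pseudo_f_from_d2(D2, groups)
    f_perm = np.array([pseudo_f_from_d2(D2, rng.permutation(groups))
                       for _ in range(n_perm)])
    # The observed arrangement counts as one of the possible permutations
    p = (np.sum(f_perm >= f_obs) + 1) / (n_perm + 1)
    return f_obs, p

# Hypothetical toy data with a clear shift among three groups
rng = np.random.default_rng(3)
groups = np.repeat([0, 1, 2], 6)
Y = rng.normal(size=(18, 4)) + 1.5 * groups[:, None]
D = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
f_obs, p = permanova_pvalue(D, groups)
print(f"pseudo F = {f_obs:.3f}, permutation p = {p:.4f}")
```

Because the dissimilarities are fixed and only the labels move, the squared matrix is computed once and reused across permutations.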

For cases where there are too few possible permutations to achieve a precise p-value for inferences at a suitably small level of significance, approximate p-values can be obtained using Monte Carlo random draws from the asymptotic permutation distribution²⁹. Specifically, each of the numerator and denominator of pseudo F is asymptotically distributed as a linear form in chi‐square random variables, with coefficients being the eigenvalues from a PCO of matrix G²⁹.

3 Assumptions

3.1 Exchangeability and the Linear Model

PERMANOVA makes no explicit assumptions regarding either the distributions of original variables in Y or the distributions of dissimilarities in D. For a given test, PERMANOVA assumes only exchangeability^{30, 31} of permutable units under a true null hypothesis. PERMANOVA is only “nonparametric”⁹ for the one-way case. Inferences remain distribution-free, but PERMANOVA applies a linear model to the dissimilarity space; interactions are defined by reference to additive main effects. Indeed, a key motivation for the development of PERMANOVA was to perform tests for interaction^{7, 8}; PERMANOVA is a classical partitioning in the Euclidean space defined by the full set of PCO axes^{7, 8}. Consequently, the dissimilarity measure will have an important bearing on results, much more so than for rank-based tests, such as ANOSIM². Hence, in practice, values in matrix D should be neither over-ridden with repeated values at either an upper or lower bound nor dominated by undefined or erratic values, as can occur for sparse data^{32, 33}. Judicious choice of dissimilarity measure^{32, 34} or pooling of small-scale replicates³³ is often well advised.

3.2 Homogeneity of Multivariate Dispersions

Both ANOSIM and the Mantel test are extremely sensitive to differences in dispersions among groups; however, PERMANOVA (like ANOVA) is very robust to heterogeneity for balanced designs but not unbalanced designs³⁵. Furthermore, PERMANOVA, unlike ANOSIM or traditional MANOVA, is not sensitive to differences in correlation structure (shape) among groups^35-37.

A test for homogeneity of multivariate dispersions (PERMDISP) in the space of the chosen dissimilarity measure can be done, either to accompany PERMANOVA or in its own right³⁸. This test compares within-group spread among groups using the average value of the distances from individual observations to their own group centroid^38-41. The specific directions of distances are not taken into account, so PERMDISP, like PERMANOVA, does not identify differences in the shapes of data clouds among groups, only their relative spread.
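The quantity compared by such a test, the distance from each observation to its own group centroid, can be obtained from Gower's matrix without constructing ordination axes. The sketch below computes only these distances and their group averages (not the test itself), on Euclidean toy data; clipping negative squared distances to zero is a simplification assumed here and would not be appropriate for strongly non-Euclidean dissimilarities.

```python
import numpy as np

def distances_to_centroids(D, groups):
    """Distance from each unit to its own group centroid, computed from G."""
    groups = np.asarray(groups)
    A = -0.5 * np.asarray(D, dtype=float) ** 2
    G = A - A.mean(axis=1, keepdims=True) - A.mean(axis=0, keepdims=True) + A.mean()
    z = np.empty(len(groups))
    for l in np.unique(groups):
        idx = np.flatnonzero(groups == l)
        Gl = G[np.ix_(idx, idx)]
        # squared distance of unit i to its centroid:
        # g_ii - 2 * mean_j(g_ij) + mean_jk(g_jk), with j, k in the same group
        z2 = np.diag(Gl) - 2.0 * Gl.mean(axis=1) + Gl.mean()
        z[idx] = np.sqrt(np.clip(z2, 0.0, None))   # clip guards against tiny negatives
    return z

# Hypothetical toy data: the second group is three times as dispersed
rng = np.random.default_rng(4)
groups = np.repeat([0, 1], 8)
Y = rng.normal(size=(16, 3)) * np.where(groups == 0, 1.0, 3.0)[:, None]
D = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
z = distances_to_centroids(D, groups)
mean_dispersion = {int(l): float(z[groups == l].mean()) for l in np.unique(groups)}
print(mean_dispersion)
```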

4 Extensions

Anderson et al.⁴² have recently provided a modification to pseudo F that allows for heterogeneity in multivariate dispersions. Specifically, to test the null hypothesis of no differences in the centroids among groups, given any differences in within-group dispersions, a modified PERMANOVA pseudo F statistic⁴² is

$F_{2} = \frac{\mathrm{tr}(HG)}{\sum_{ℓ = 1}^{g} \left(1 - \frac{n_{ℓ}}{N}\right) V_{ℓ}}$

where the within-group dispersion for group ℓ is V_ℓ = W_ℓ/(n_ℓ − 1). This is a direct multivariate analogue to the solution to the Behrens–Fisher problem suggested by Brown and Forsythe⁴³; F₂ is equivalent to the usual PERMANOVA pseudo F statistic for balanced (but not for unbalanced) designs. An approximate p-value is obtained using either permutations or separate-sample bootstrapped residuals⁴².
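A small numerical sketch can illustrate the stated equivalence for balanced designs. It computes both statistics from squared dissimilarities for the same simulated data under a balanced and an unbalanced allocation; the function name and data are assumptions made here.

```python
import numpy as np

def pseudo_f_and_f2(D, groups):
    """Usual pseudo F and the dispersion-robust F_2 for a one-way design."""
    groups = np.asarray(groups)
    labels = np.unique(groups)
    N, g = len(groups), len(labels)
    upper = np.triu(np.asarray(D, dtype=float) ** 2, k=1)
    ss_t = upper.sum() / N
    n = np.array([(groups == l).sum() for l in labels])
    W = np.array([upper[np.ix_(groups == l, groups == l)].sum() / (groups == l).sum()
                  for l in labels])
    ss_a = ss_t - W.sum()                     # equals tr(HG)
    V = W / (n - 1)                           # within-group dispersions V_l
    F = (ss_a / W.sum()) * (N - g) / (g - 1)
    F2 = ss_a / np.sum((1 - n / N) * V)
    return F, F2

rng = np.random.default_rng(5)
Y = rng.normal(size=(18, 3))
D = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
F_bal, F2_bal = pseudo_f_and_f2(D, np.repeat([0, 1, 2], 6))           # balanced
F_unb, F2_unb = pseudo_f_and_f2(D, np.repeat([0, 1, 2], [3, 5, 10]))  # unbalanced
print(f"balanced:   F={F_bal:.4f}  F2={F2_bal:.4f}")
print(f"unbalanced: F={F_unb:.4f}  F2={F2_unb:.4f}")
```

With equal n_ℓ, the weights (1 − n_ℓ/N) reduce to (g − 1)/g and the denominator becomes (g − 1)·SS_W/(N − g), which is why the two statistics coincide in the balanced case.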

From a PERMANOVA partitioning, direct multivariate dissimilarity-based analogues to familiar univariate statistical constructs in classical ANOVA and regression models are easily identified. This includes partitioning for unbalanced designs (e.g., Type I, Type II, or Type III SS⁴⁴), pairwise comparisons or a priori contrasts¹, inclusion of quantitative covariates¹, measurements of precision³³, or information criteria for model selection (such as AIC, AIC_c, or BIC^{1, 45}).

Importantly, PERMANOVA models can include random effects, interaction terms, and hierarchical (nested) structures, with the attendant logical inferences, but in a distribution-free setting. Construction and interpretation of PERMANOVA models rely heavily on the notion of expectations of mean squares (EMS⁴⁶). For example, classical EMS^47-49 are used to correctly construct pseudo F statistics¹, identify correct permutable units for a given null hypothesis²⁶, and estimate components of variation⁴⁶.

Components of variation in PERMANOVA models are calculated using direct analogues to univariate unbiased ANOVA estimators of variance components⁴⁶ and expressed in units of dissimilarity⁵⁰. To maintain a distribution-free approach, bootstrapping can be used (each term requiring a specific algorithm within the context of the full multifactor model⁵¹) to estimate and compare the sizes of these components⁵².

All of these constructs boil down to their univariate counterparts for one variable in Euclidean space but, importantly, allow for broad utility when considered for high‐dimensional multivariate systems based on a dissimilarity measure of choice. In ecology, PERMANOVA facilitates the analysis of beta diversity (variation in community structure) across multiple spatial or temporal scales, which is quantified directly by such components of variation⁵³.

5 Plots to Accompany PERMANOVA

Patterns among sampling units can be visualized by a suitable ordination of matrix D, including (i) PCO analysis⁵⁴; (ii) metric multidimensional scaling (mMDS^{55, 56}); or (iii) nonmetric MDS (nMDS⁵⁷). Patterns may be readily apparent in such plots for small-to-modest sampling designs, but additional plots described here – specifically, dissimilarity-based analogues to main-effects plots, interaction plots, and residual plots – can greatly improve one's understanding of “what PERMANOVA sees” when performing the partitioning and associated tests.

5.1 Distances among Centroids

The number of individual sampling units in complex ANOVA designs may become quite large. Thus, visualizing patterns across factor levels can be difficult; stress (see Shepard Diagram) in mMDS or nMDS plots of replicates often exceeds the rule-of-thumb threshold for interpretability (i.e., stress > 0.2). Typically, interest lies in examining the relative positions of group centroids in the space of the dissimilarity measure; this is precisely as in univariate analysis, where plots of means are generally more useful than are plots of raw data. This is achieved by ordination of distances among centroids.

If D contains Euclidean distances, then the distances among centroids are equivalent to Euclidean distances among the arithmetic averages calculated separately for each variable. This equivalence does not hold, however, for non-Euclidean dissimilarities. Distances among centroids based on some other chosen dissimilarity measure are calculated as follows: (i) obtain the full set of PCO axes from matrix G, with each axis having been scaled by the square root of the absolute value of its respective eigenvalue; (ii) calculate arithmetic averages for each group separately along each PCO axis; (iii) for every pair of centroids (ℓ, ℓ′), (ℓ = 1,…, g) and (ℓ′ = 1,…, g), calculate Euclidean distances separately in each of the two sets: one based on PCO axes corresponding to non-negative eigenvalues $(d_{ℓ ℓ^{'}}^{+})$ and one based on those corresponding to negative eigenvalues $(d_{ℓ ℓ^{'}}^{-})$, if any; and (iv) the (g × g) matrix of distances among centroids in the space of the dissimilarity measure is then $D^{[C]} = \{d_{ℓ ℓ^{'}}^{[C]}\}$, where $d_{ℓ ℓ^{'}}^{[C]} = \sqrt{{(d_{ℓ ℓ^{'}}^{+})}^{2} - {(d_{ℓ ℓ^{'}}^{-})}^{2}}$.
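Steps (i) to (iv) can be sketched as below, using an eigendecomposition of G and simulated counts with Bray–Curtis dissimilarities. The sketch assumes that each PCO axis is the eigenvector multiplied by the square root of the absolute value of its eigenvalue, which is the scaling under which inter-point distances are reproduced; function names and data are illustrative only.

```python
import numpy as np

def bray_curtis(Y):
    """Pairwise Bray-Curtis dissimilarities among rows of a non-negative matrix."""
    num = np.abs(Y[:, None, :] - Y[None, :, :]).sum(axis=-1)
    den = (Y[:, None, :] + Y[None, :, :]).sum(axis=-1)
    return num / den

def centroid_d2_via_pco(D, groups):
    """Squared distances among group centroids via steps (i)-(iv)."""
    groups = np.asarray(groups)
    A = -0.5 * D ** 2
    G = A - A.mean(axis=1, keepdims=True) - A.mean(axis=0, keepdims=True) + A.mean()
    lam, vec = np.linalg.eigh(G)                       # (i) eigen-decomposition of G
    Q = vec * np.sqrt(np.abs(lam))                     #     axes scaled by sqrt(|eigenvalue|)
    C = np.array([Q[groups == l].mean(axis=0)          # (ii) centroids along each axis
                  for l in np.unique(groups)])
    diff2 = (C[:, None, :] - C[None, :, :]) ** 2
    pos = lam >= 0
    d2_plus = diff2[:, :, pos].sum(axis=-1)            # (iii) non-negative eigenvalues
    d2_minus = diff2[:, :, ~pos].sum(axis=-1)          #       negative eigenvalues
    return d2_plus - d2_minus                          # (iv) value under the square root

rng = np.random.default_rng(6)
groups = np.repeat([0, 1, 2], 5)
Y = rng.poisson(4.0, size=(15, 8)) + 1.0
Y[groups == 2, :3] += 5.0                              # shift the third group
d2 = centroid_d2_via_pco(bray_curtis(Y), groups)
D_C = np.sqrt(np.clip(d2, 0.0, None))                  # clip guards against rounding error
print(np.round(D_C, 4))
```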

Distances among centroids can also be calculated directly from matrix G. For example, consider a one-way ANOVA model with n_ℓ sampling units in group ℓ and $N = \sum_{ℓ = 1}^{g} n_{ℓ}$ . Let (i ∈ ℓ) denote an indicator for the subset of the i = 1,…, N observations that occur in group ℓ. An “averaged” (g × g) Gower matrix can be obtained as $\overline{G} = \{{\overline{g}}_{ℓ ℓ^{'}}\}$ , where ${\overline{g}}_{ℓ ℓ^{'}} = \frac{1}{n_{ℓ} n_{ℓ^{'}}} \sum_{(i \in ℓ)} \sum_{(j \in ℓ^{'})} g_{i j}$ . A suitable back-transformation yields the desired distances among every pair of centroids: $d_{ℓ ℓ^{'}}^{[C]} = \sqrt{{\overline{g}}_{ℓ ℓ} - 2 {\overline{g}}_{ℓ ℓ^{'}} + {\overline{g}}_{ℓ^{'} ℓ^{'}}}$ , obviating the need for PCO axes.
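The direct calculation from G amounts to two matrix products and a back-transformation. In this minimal sketch (names and simulated data are assumptions), Euclidean distances are used so that the result can be compared with distances among the variable-wise group means.

```python
import numpy as np

def centroid_distances(D, groups):
    """(g x g) distances among group centroids computed directly from G."""
    groups = np.asarray(groups)
    A = -0.5 * np.asarray(D, dtype=float) ** 2
    G = A - A.mean(axis=1, keepdims=True) - A.mean(axis=0, keepdims=True) + A.mean()
    X = (groups[:, None] == np.unique(groups)[None, :]).astype(float)
    M = X / X.sum(axis=0)                     # column l holds 1/n_l for units in group l
    G_bar = M.T @ G @ M                       # averaged Gower matrix
    diag = np.diag(G_bar)
    d2 = diag[:, None] - 2.0 * G_bar + diag[None, :]
    return np.sqrt(np.clip(d2, 0.0, None))    # clip guards against rounding error

rng = np.random.default_rng(7)
groups = np.repeat([0, 1, 2], [4, 6, 5])
Y = rng.normal(size=(15, 3)) + 2.0 * groups[:, None]
D = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
D_C = centroid_distances(D, groups)

# For Euclidean distances this should match distances among variable-wise means
means = np.array([Y[groups == l].mean(axis=0) for l in range(3)])
D_means = np.linalg.norm(means[:, None, :] - means[None, :, :], axis=-1)
print(np.allclose(D_C, D_means, atol=1e-6))
```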

5.2 “Main Effects” Plots, “Interaction” Plots, and “Residual” Plots

In multifactor ANOVA, interest lies in visualizing differences among factor-level centroids as well as the amount of variation attributable to different factors. One may calculate distances among centroids corresponding to the main-effect levels for each factor in turn and place all of these in a single ordination. Such a plot of “main effects,” particularly using metric MDS to preserve the original dissimilarity scale, shows the relative importance of factors, generally reflecting the relative sizes of estimated components of variation from the PERMANOVA partitioning.

Another natural ordination plot of interest is the multivariate direct analogue to an “interaction” plot in the space of the dissimilarity measure. One can do metric or nonmetric MDS (or PCO) on distances among centroids that correspond to individual cells defined by combinations of factor levels (averaging replicates within cells). Such a plot will not only show effects of individual factors but also (potentially) provide insights into the nature of any interactions detected between the factors.

A factor of primary interest may be statistically significant in a PERMANOVA partitioning of the full model, but its effects may be totally obscured by some other dominant factor(s) or covariate(s) in an ordination. Although plots of distances among centroids may shed some light on minor effects, a dissimilarity-based multivariate analogue to a residual plot, from which the variation due to dominant (but perhaps nuisance) factors or covariates has been removed, is desirable.

An (N × N) “residualized” Gower matrix $G^{[R]} = \{g_{i j}^{[R]}\}$ is attainable directly as G^[R] = (I − H)G(I − H), where H is the usual “hat” matrix calculated on some model matrix X containing all terms (factors, covariates, etc.) the effects of which one wishes to remove. A “residualized” distance matrix is then obtained as $D^{[R]} = \{d_{i j}^{[R]}\}$ , where $d_{i j}^{[R]} = \sqrt{g_{i i}^{[R]} - 2 g_{i j}^{[R]} + g_{j j}^{[R]}}$ . Any of the usual ordination methods (PCO, mMDS, or nMDS) may then be constructed in the usual way from this matrix of residualized distances.
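A residualized distance matrix could be computed along the following lines. The paired toy design, the use of a pseudoinverse to form H, and the clipping of negative squared distances are assumptions of this sketch; with semimetric dissimilarities, genuinely negative values can arise and would need more careful handling.

```python
import numpy as np

def residualized_distances(D, X):
    """Distances after removing the effects coded in model matrix X."""
    N = D.shape[0]
    A = -0.5 * np.asarray(D, dtype=float) ** 2
    G = A - A.mean(axis=1, keepdims=True) - A.mean(axis=0, keepdims=True) + A.mean()
    H = X @ np.linalg.pinv(X)                 # hat matrix; pinv tolerates rank deficiency
    R = np.eye(N) - H
    G_res = R @ G @ R                         # residualized Gower matrix
    diag = np.diag(G_res)
    d2 = diag[:, None] - 2.0 * G_res + diag[None, :]
    return np.sqrt(np.clip(d2, 0.0, None))    # clip guards against rounding error

# Hypothetical paired design: 6 blocks, 2 units per block, strong block effects
rng = np.random.default_rng(8)
block = np.repeat(np.arange(6), 2)
Y = rng.normal(size=(12, 4)) + 3.0 * rng.normal(size=(6, 4))[block]
D = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
X = (block[:, None] == np.arange(6)[None, :]).astype(float)   # block indicators
D_res = residualized_distances(D, X)
print(D_res.shape)
```

The resulting matrix can be passed to any ordination routine in place of D.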

6 Ecological Examples

6.1 Four-Factor Mixed Model: Okura Estuary

Consider a study of potential effects of sedimentation on intertidal soft-sediment fauna in the Okura estuary, near Auckland, New Zealand⁵⁸. Sites were classified a priori from earlier hydrological studies as having a high, medium, or low probability of sediment deposition. At each of 15 sites (5 within each of these 3 depositional types of environments), n = 6 sediment cores (13 cm diameter × 15 cm deep) were sampled randomly. Sampling was repeated a total of six times in 2001–2002: once after a relatively dry period and once 7–10 days after a heavy rainfall event in each of the three seasons: winter, spring, and summer. Abundances of p = 73 taxa were recorded from a total of N = 540 cores in a four-factor design: Season (fixed with three levels: winter, spring, or summer), Rainfall (fixed with two levels: rain or dry), Deposition (fixed with three levels: high, medium, or low), and Site (random with 15 levels, nested within Deposition).

Ordination (nMDS) done directly on the full dissimilarity matrix (Bray–Curtis on fourth-root transformed data) has high stress (∼0.20), and although one may discern a pattern of differences between assemblages from different depositional environments, there is a lot of overlap (Figure 2), and any labeling scheme attempting to show season or rainfall effects is simply uninterpretable, due to high residual variation.

PERMANOVA partitioning shows that spatial effects are the strongest, with Deposition, Site, and the Residual contributing the largest components of variation to the overall model (Table 1). Small-scale spatiotemporal variation, identifiable from statistically significant high-order interactions of temporal factors with Site (Table 1, p < 0.001), was also apparent. Although there was no evidence for a three-way interaction (Season × Rainfall × Deposition; p > 0.48), effects of depositional environments varied across seasons (Season × Deposition; p < 0.01), and seasonal effects differed between weather conditions (Season × Rainfall; p < 0.001).

Table 1. PERMANOVA Partitioning and Analysis of Soft-Sediment Assemblages (73 Taxa) from the Okura Estuary, Based on Fourth-Root Transformed Abundances and Bray–Curtis Dissimilarities

Source	df	SS	MS	Pseudo F	p	Component	Var	SD
Season	2	25 092	12 546	8.419	0.0001	Fixed	61.42	7.84
Rainfall	1	3965	3965	2.940	0.0138	Fixed	9.69	3.11
Deposition	2	217 580	108 790	5.443	0.0003	Fixed	493.34	22.21
Site (Deposition)	12	239 850	19 987	23.050	0.0001	Random	531.12	23.05
Season × Rainfall	2	11 206	5603	4.236	0.0001	Fixed	47.58	6.90
Season × Deposition	4	11 126	2782	1.867	0.0051	Fixed	21.52	4.64
Rainfall × Deposition	2	3400	1700	1.261	0.2574	Fixed	3.91	1.98
Season × Site(Deposition)	24	35 763	1490	1.719	0.0001	Random	51.92	7.21
Rainfall × Site(Deposition)	12	16 182	1349	1.555	0.0004	Random	26.74	5.17
Season × Rainfall × Deposition	4	5232	1308	0.989	0.4807	Fixed	0.00	0.00
Season × Rainfall × Site(Deposition)	24	31 745	1323	1.525	0.0001	Random	75.58	8.69
Residual	450	390 200	867	–	–	Random	867.12	29.45
Total	539	991 340	–	–	–		–	–

Pseudo F statistics were calculated for each term using direct analogues to univariate expectations of mean squares (EMS); p-values were obtained using 9999 permutations under a reduced model. Each term is identified as contributing either a fixed or random component to the overall model; “Var” gives the estimated sizes of components of variation, based on multivariate analogues to the classical ANOVA unbiased estimators; “SD” gives the square root of these values, so is in Bray–Curtis units. Nb: Estimates of components of variation (Var) were calculated after pooling (removing) the term “Season × Rainfall × Deposition,” which originally had a negative estimate, so its contribution was set to zero⁵⁹.

A main-effects plot clearly shows that depositional effects are the strongest, especially contrasting high versus either medium or low depositional areas along mMDS axis 1 (Figure 3a). Seasonal effects are apparent along mMDS axis 2 (from winter to spring to summer), while the contrast of rain versus dry was relatively much smaller (Figure 3a). This mirrors the relative sizes of these main effects in the PERMANOVA partitioning: Deposition, followed by Season, then Rainfall (Table 1). An interaction plot of the 18 Season × Rainfall × Deposition cell centroids not only shows the same pattern but also shows that seasonal effects (along mMDS axis 2) are much larger in high depositional environments (right-hand side of the plot) than for either the medium or low depositional environments. Pairwise comparisons (not shown here) further support these general conclusions.

6.2 Hierarchical Design with a Quantitative Covariate: Kelp Holdfasts

Anderson et al.⁵⁰ studied organisms colonizing holdfasts of the kelp, Ecklonia radiata; samples were collected according to a spatially structured hierarchical sampling design along the northeast coast of New Zealand. There were four locations (separated by hundreds of kilometers from north to south: Berghan Point, Home Point, Leigh, and Hahei); two sites within each location (separated by hundreds of meters to kilometers); two areas within each site (separated by tens of meters); and n = 5 replicate holdfasts (separated by meters) within each area. A total of 351 taxa from 15 different phyla were enumerated/quantified from these N = 80 holdfasts.

Interest lies in quantifying multivariate variation in assemblage structure at each spatial scale (beta diversity^{34, 53}) while taking into account natural variation in the sizes of holdfasts; the volume of each holdfast was measured using water displacement. Bray–Curtis dissimilarities were calculated on fourth-root transformed values.

PERMANOVA partitioning using a sequential (Type I) sum of squares showed the greatest component of variation was the residual (smallest scale), followed by locations, areas, and then sites; the latter two scales were comparable in their effect sizes (Table 2). Volume, although statistically significant (p < 0.001), was less important than any of the spatial factors. For this example, stress is relatively high even for a “main-effects” plot of distances among centroids (Figure 4), reflecting the high dimensionality of this multivariate system. Nevertheless, the distinctiveness of the four locations is quite apparent: the centroids for sites within them appear like satellites around each one, while area-level centroids appear, in turn, as satellites around each site centroid (Figure 4). Similar patterns were shown in an nMDS plot (stress = 0.12) and in a three-dimensional mMDS plot (stress = 0.13) of these distances (not shown). The “balance” of the two satellites (i.e., being opposite and equidistant) around any particular centroid in this fully nested design was seen more clearly in the 3D mMDS plot.

Table 2. PERMANOVA Partitioning and Analysis of Invertebrate Assemblages (351 Taxa) Colonizing Holdfasts of the Kelp, Ecklonia radiata, Using Type I Sums of Squares for this Spatially Hierarchical Nested Sampling Design along the North-Eastern Coast of New Zealand, and Including the Quantitative Covariate of the Volume for Each Holdfast (Measured by Water Displacement)

Source	df	SS	MS	Pseudo F	p	Component	Var	SD
Volume	1	8720	8720	2.467	0.0003	Fixed	64.81	8.05
Location	3	23 256	7752	3.289	0.0001	Random	304.47	17.45
Site (Location)	4	10 139	2535	1.798	0.0002	Random	113.28	10.64
Area (Site(Location))	8	11 286	1411	1.750	0.0001	Random	121.63	11.03
Residual	63	50 778	806	–	–	Random	806.00	28.39
Total	79	104 180	–	–	–		–	–

The analysis was based on Bray–Curtis dissimilarities of fourth-root transformed values and pseudo F statistics were calculated for each term using direct analogues to univariate expectations of mean squares (EMS); p-values were obtained using 9999 permutations under a reduced model. Each term is identified as contributing either a fixed or random component to the overall model; “Var” gives the estimated sizes of components of variation, based on multivariate analogues to the classical ANOVA unbiased estimators; “SD” gives the square root of these values, so is in Bray–Curtis units. Nb: Interactions of each of the ANOVA factors (Location, Site, or Area) with Volume were omitted from the model, as the p-values for all of these interaction terms were >0.23.

6.3 Randomized Block Design: Plankton

A study by Winsor and Clarke^{60, 61} investigated the catch of several types of plankton using two nets hauled horizontally, one being 2 m below the other. Ten hauls were made with the pair of nets at depth positions of 29 and 31 m, respectively. Data were transformed to logarithms of the catch numbers for each of p = 5 plankton types.

This yielded a randomized block design. Ordination of individual sampling units showed no strong effect of position on these plankton assemblages – the two different symbols are quite well mixed (Figure 5a). However, PERMANOVA clearly detected a significant effect of “Position” (Table 3a). The key is PERMANOVA's ability to partition, hence partial out, the (random) effects due to different hauls (equivalently, to identify the “paired” nature of the nets in this design), thereby diminishing the residual and yielding greater power for the factor of interest: the position of the nets (upper vs lower). A useful plot here is the nMDS plot of residualized distances, having removed the (dominating) effects of hauls, in which the contrast (separation) between plankton assemblages in upper versus lower nets is now quite obvious (Figure 5b). Not surprisingly (given Figure 5a), no statistically significant effects of “position” would be detected if variation among hauls were to be ignored (Table 3b).

Table 3. PERMANOVA Partitioning and Analyses Based on Bray–Curtis Dissimilarities of Log-Transformed Abundances for p = 5 Classes of Plankton Obtained from 10 Different Hauls of Paired Nets at Two Depth Positions: Upper (29 m) and Lower (31 m)

Source	df	SS	MS	Pseudo F	p	Component	Var	SD
(a) Including Hauls
Haul	9	434.59	48.29	6.605	0.0001	Random	20.49	4.53
Position	1	43.11	43.11	5.897	0.0270	Fixed	3.58	1.89
Residual	9	65.79	7.31	–	–	Random	7.31	2.70
Total	19	543.49	–	–	–		–	–
(b) Ignoring Hauls
Position	1	43.11	43.11	1.551	0.2167	Fixed	1.53	1.24
Residual	18	500.38	27.80	–	–	Random	27.80	5.27
Total	19	543.49	–	–	–		–	–

First, an analysis is done (a) that takes into account the variation among hauls (as in a randomized block design); a second (erroneous) analysis (b) is done as a simple one-way design, treating the hauls merely as replicates, so ignoring the pairing of the nets. Pseudo F statistics were calculated for each term using direct analogues to univariate expectations of mean squares (EMS); p-values were obtained using 9999 permutations under a reduced model for analysis (a) and 9999 raw data permutations for analysis (b). Each term is identified as contributing either a fixed or random component to the overall model; “Var” gives the estimated sizes of components of variation, based on multivariate analogues to the classical ANOVA unbiased estimators and “SD” gives the square root of these values, so is in Bray–Curtis units.
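As a check on the “Var” and “SD” columns of Table 3, the values can be reproduced by hand. The worked arithmetic below assumes the classical unbiased ANOVA estimators for a two-factor design without replication, with a = 2 positions and b = 10 hauls; these formulas are not spelled out in the text and are stated here as an assumption.

```latex
\begin{aligned}
\text{(a) Haul:} \quad & \frac{MS_{\text{Haul}} - MS_{\text{Res}}}{a}
  = \frac{48.29 - 7.31}{2} = 20.49, & \sqrt{20.49} &\approx 4.53 \\
\text{(a) Position:} \quad & \frac{MS_{\text{Position}} - MS_{\text{Res}}}{b}
  = \frac{43.11 - 7.31}{10} = 3.58, & \sqrt{3.58} &\approx 1.89 \\
\text{(b) Position:} \quad & \frac{MS_{\text{Position}} - MS_{\text{Res}}}{b}
  = \frac{43.11 - 27.80}{10} \approx 1.53, & \sqrt{1.53} &\approx 1.24
\end{aligned}
```

Ignoring hauls in analysis (b) inflates the residual mean square from 7.31 to 27.80, which is what shrinks both the pseudo F ratio and the estimated component for Position.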

7 Conclusions

PERMANOVA provides a useful statistical tool for the analysis of multivariate data on the basis of Euclidean distances or non-Euclidean-embeddable dissimilarity measures. Its utility, in many ways, simply mirrors that of classical ANOVA⁶², yet, as a geometric partitioning, it extends to something much broader and even more widely applicable, allowing rigorous meaningful analysis of high-dimensional systems, even those having variables with extremely non-normal or overdispersed behavior. It is not restricted by distributional assumptions and has recently been extended also to accommodate heterogeneity of within-group dispersions.

Acknowledgments

This work was supported by a James Cook Fellowship from the Royal Society of New Zealand. PERMANOVA would not exist were it not for B. H. McArdle, P. Legendre, K. R. Clarke, C. J. F. ter Braak, J. Robinson, and A. J. Underwood. I especially thank R. N. Gorley, without whom many of the most useful and elegant extensions to PERMANOVA and associated ordinations could not have been so readily achieved.

Related Articles

Multiresponse Permutation Procedures; Permutation Tests: Multivariate; Similarity, Dissimilarity, and Distance, Measures of; Multivariate Analysis of Variance (MANOVA); Redundancy Analysis; Multivariate Behrens–Fisher Problem; Mantel and Valand's Nonparametric MANOVA; Computer Intensive Sampling Methods in Ecology; Analysis of Variance Through Examples.

References

1. Anderson, M.J., Gorley, R.N., and Clarke, K.R. (2008) PERMANOVA+ for PRIMER: Guide to Software and Statistical Methods, PRIMER-E, Plymouth, UK.
2. Clarke, K.R. (1993) Non-parametric multivariate analyses of changes in community structure. Aust. J. Ecol., 18, 117–143. doi:10.1111/j.1442-9993.1993.tb00438.x
3. McArdle, B.H. (1990) Detecting and Displaying Impacts of Biological Monitoring: Spatial Problems and Partial Solutions. Proceedings of Invited Papers, XVth International Biometrics Conference, IBC, Budapest, Hungary, pp. 249–255.
4. McArdle, B.H. (1994) BACI for Community Ecologists: Permutation Tests for Interaction Terms in Multivariate Analysis of Variance on Dissimilarity Matrices. Invited paper presented at the 4th Conference of The International Environmetrics Society (TIES), Burlington, Ontario, Canada.
5. Pillar, V.D.P. and Orlóci, L. (1996) On randomization testing in vegetation science: multifactor comparisons of relevé groups. J. Veg. Sci., 7, 585–592. doi:10.2307/3236308
6. Gower, J.C. and Krzanowski, W.J. (1999) Analysis of distance for structured multivariate data and extensions to multivariate analysis of variance. Appl. Stat., 48, 505–519. doi:10.1111/1467-9876.00168
7. Legendre, P. and Anderson, M.J. (1999) Distance-based redundancy analysis: testing multispecies responses in multifactorial ecological experiments. Ecol. Monogr., 69, 1–24. doi:10.1890/0012-9615(1999)069[0001:DBRATM]2.0.CO;2
8. McArdle, B.H. and Anderson, M.J. (2001) Fitting multivariate models to community data: a comment on distance-based redundancy analysis. Ecology, 82, 290–297. doi:10.1890/0012-9658(2001)082[0290:FMMTCD]2.0.CO;2
9. Anderson, M.J. (2001) A new method for non-parametric multivariate analysis of variance. Austral Ecol., 26, 32–46. doi:10.1111/j.1442-9993.2001.01070.pp.x
10. Edgington, E.S. (1995) Randomization Tests, 3rd edn, Marcel Dekker, New York.
11. Edgington, E.S. and Onghena, P. (2007) Randomization Tests, 4th edn, CRC Press, Taylor & Francis Group, LLC, Boca Raton, FL, USA. doi:10.1201/9781420011814
12. Excoffier, L., Smouse, P.E., and Quattro, J.M. (1992) Analysis of molecular variance inferred from metric distances among DNA haplotypes: application to human mitochondrial DNA restriction data. Genetics, 131, 479–491. doi:10.1111/j.1365-2656.2006.01186.x
13. Gower, J.C. and Legendre, P. (1986) Metric and Euclidean properties of dissimilarity coefficients. J. Classification, 3, 5–48. doi:10.1007/BF01896809
14. Johnson, R.A. and Wichern, D.W. (1992) Applied Multivariate Statistical Analysis, 3rd edn, Prentice-Hall, Englewood Cliffs, NJ, USA.
15. Neter, J., Kutner, M.H., Nachtsheim, C.J., and Wasserman, W. (1996) Applied Linear Statistical Models, 4th edn, Irwin, Chicago, IL, USA.
16. Verdonschot, P.F.M. and ter Braak, C.J.F. (1994) An experimental manipulation of oligochaete communities in mesocosms treated with chlorpyrifos or nutrient additions: multivariate analyses with Monte Carlo permutation tests. Hydrobiologia, 278, 251–266. doi:10.1007/BF00142333
17. Legendre, P. and Legendre, L. (2012) Numerical Ecology, 3rd English edn, Elsevier, Amsterdam, the Netherlands. doi:10.1016/B978-0-444-53868-0.50018-6
18. Gittins, R. (1985) Canonical Analysis: A Review with Applications in Ecology, Springer-Verlag, Berlin, Germany. doi:10.1007/978-3-642-69878-1
19. Fisher, R.A. (1924) On a distribution yielding the error functions of several well-known statistics. Proc. Int. Congr. Math. Toronto, 2, 805–813.
20. Anderson, M.J. and Millar, R.B. (2004) Spatial variation and effects of habitat on temperate reef fish assemblages in northeastern New Zealand. J. Exp. Mar. Biol. Ecol., 305, 191–221. doi:10.1016/j.jembe.2003.12.011
21. Manly, B.F.J. (1997) Randomization, Bootstrap and Monte Carlo Methods in Biology, 2nd edn, Chapman & Hall, London, UK.
22. Manly, B.F.J. (2006) Randomization, Bootstrap and Monte Carlo Methods in Biology, 3rd edn, Chapman & Hall, London, UK. doi:10.1111/j.1365-2745.2005.01082.x
23. Anderson, M.J. and Legendre, P. (1999) An empirical comparison of permutation methods for tests of partial regression coefficients in a linear model. J. Stat. Comput. Simul., 62, 271–303. doi:10.1080/00949659908811936
24. Anderson, M.J. and Robinson, J. (2001) Permutation tests for linear models. Aust. N. Z. J. Stat., 43, 75–88. doi:10.1111/1467-842X.00156
25. Anderson, M.J. (2001) Permutation tests for univariate or multivariate analysis of variance and regression. Can. J. Fish. Aquat. Sci., 58, 626–639. doi:10.1139/f01-004
26. Anderson, M.J. and ter Braak, C.J.F. (2003) Permutation tests for multi-factorial analysis of variance. J. Stat. Comput. Simul., 73, 85–113. doi:10.1080/00949650215733
27. Freedman, D. and Lane, D. (1983) A nonstochastic interpretation of reported significance levels. J. Bus. Econom. Stat., 1, 292–298. doi:10.2307/1391660
28. Pesarin, F. (2001) Multivariate Permutation Tests with Applications in Biostatistics, John Wiley & Sons, New York, USA.
29. Anderson, M.J. and Robinson, J. (2003) Generalized discriminant analysis based on distances. Aust. N. Z. J. Stat., 45, 301–318. doi:10.1111/1467-842X.00285
30. Fisher, R.A. (1935) Design of Experiments, Oliver & Boyd, Edinburgh, Scotland.
31. Kempthorne, O. (1966) Some aspects of experimental inference. J. Am. Stat. Assoc., 61, 11–34. doi:10.1080/01621459.1966.10502007
32. Clarke, K.R., Somerfield, P.J., and Chapman, M.G. (2006) On resemblance measures for ecological studies, including taxonomic dissimilarities and a zero-adjusted Bray-Curtis coefficient for denuded assemblages. J. Exp. Mar. Biol. Ecol., 330, 55–80. doi:10.1016/j.jembe.2005.12.017
33. Anderson, M.J. and Santana-Garcon, J. (2015) Measures of precision for dissimilarity-based multivariate analysis of ecological communities. Ecol. Lett., 18, 66–73. doi:10.1111/ele.12385
34. Anderson, M.J., Ellingsen, K.E., and McArdle, B.H. (2006) Multivariate dispersion as a measure of beta diversity. Ecol. Lett., 9, 683–693. doi:10.1111/j.1461-0248.2006.00926.x
35. Anderson, M.J. and Walsh, D.C.I. (2013) What null hypothesis are you testing? PERMANOVA, ANOSIM and the Mantel test in the face of heterogeneous dispersions. Ecol. Monogr., 83, 557–574. doi:10.1890/12-2010.1
36. Mardia, K.V., Kent, J.T., and Bibby, J.M. (1979) Multivariate Analysis, Academic Press, London, UK.
37. Seber, G.A.F. (1984) Multivariate Observations, John Wiley & Sons, New York, USA. doi:10.1002/9780470316641
38. Anderson, M.J. (2006) Distance-based tests for homogeneity of multivariate dispersions. Biometrics, 62, 245–253. doi:10.1111/j.1541-0420.2005.00440.x
39. Levene, H. (1960) Robust tests for equality of variances, in Contributions to Probability and Statistics (eds I. Olkin, S.G. Ghurye, W. Hoeffding, W.G. Madow, and H.B. Mann), Stanford University Press, Stanford, California, USA, pp. 278–292.
40. van Valen, L. (1978) The statistics of variation. Evol. Theory, 4, 33–43. (Erratum: Evol. Theory, 4, 202.)
41. Manly, B.F.J. (1994) Multivariate Statistical Methods: A Primer, 2nd edn, Chapman and Hall, Boca Raton, FL, USA.
42. Anderson, M.J., Walsh, D.C.I., Clarke, K.R., et al. (2017) Some solutions to the multivariate Behrens–Fisher problem for dissimilarity-based analyses. Aust. N. Z. J. Stat., 59, 57–79. doi:10.1111/anzs.12176
43. Brown, M.B. and Forsythe, A.B. (1974) The small sample behaviour of some statistics which test the equality of several means. Technometrics, 16, 129–132. doi:10.1080/00401706.1974.10489158
44. Searle, S.R. (1987) Linear Models for Unbalanced Data, John Wiley & Sons, New York, USA.
45. Burnham, K.P. and Anderson, D.R. (2002) Model Selection and Multi-model Inference: A Practical Information-theoretic Approach, 2nd edn, Springer, New York.
46. Searle, S.R., Casella, G., and McCulloch, C.E. (1992) Variance Components, John Wiley & Sons, New York, USA. doi:10.1002/9780470316856
47. Cornfield, J. and Tukey, J.W. (1956) Average values of mean squares in factorials. Ann. Math. Stat., 27, 907–949. doi:10.1214/aoms/1177728067
48. Hartley, H.O. (1967) Expectations, variances and covariances of ANOVA mean squares by ‘synthesis’. Biometrics, 23, 105–114. doi:10.2307/2528284
49. Rao, J.N.K. (1968) On expectations, variances, and covariances of ANOVA mean squares by ‘synthesis’. Biometrics, 24, 963–978. doi:10.2307/2528883
50. Anderson, M.J., Diebel, C.E., Blom, W.M., and Landers, T.J. (2005) Consistency and variation in kelp holdfast assemblages: spatial patterns of biodiversity for the major phyla at different taxonomic resolutions. J. Exp. Mar. Biol. Ecol., 320, 35–56. doi:10.1016/j.jembe.2004.12.023
51. Davison, A.C. and Hinkley, D.V. (1997) Bootstrap Methods and their Application, Cambridge University Press, Cambridge, UK. doi:10.1017/CBO9780511802843
52. Terlizzi, A., Anderson, M.J., Fraschetti, S., and Benedetti-Cecchi, L. (2007) Scales of spatial variation in Mediterranean subtidal sessile assemblages at different depths. Mar. Ecol. Prog. Ser., 332, 25–39. doi:10.3354/meps332025
53. Anderson, M.J., Crist, T.O., Chase, J.M., et al. (2011) Navigating the multiple meanings of β diversity: a roadmap for the practicing ecologist. Ecol. Lett., 14, 19–28. doi:10.1111/j.1461-0248.2010.01552.x
54. Gower, J.C. (1966) Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika, 53, 325–338. doi:10.1093/biomet/53.3-4.325
55. Sammon, J.W. (1969) A nonlinear mapping for data structure analysis. IEEE Trans. Comput., 18, 401–409. doi:10.1109/T-C.1969.222678
56. Borg, I. and Groenen, P.J.F. (2005) Modern Multidimensional Scaling, 2nd edn, Springer, New York.
57. Kruskal, J.B. and Wish, M. (1978) Multidimensional Scaling, Sage Publications, Beverly Hills, CA, USA. doi:10.4135/9781412985130
58. Anderson, M.J., Ford, R.B., Feary, D.A., and Honeywill, C. (2004) Quantitative measures of sedimentation in an estuarine system and its relationship with intertidal soft-sediment infauna. Mar. Ecol. Prog. Ser., 272, 33–48. doi:10.3354/meps272033
59. Fletcher, D.J. and Underwood, A.J. (2002) How to cope with negative estimates of components of variance in ecological field studies. J. Exp. Mar. Biol. Ecol., 273, 89–95. doi:10.1016/S0022-0981(02)00142-9
60. Winsor, C.P. and Clarke, G.L. (1940) A statistical study of variation in the catch of plankton nets. J. Mar. Res., 3, 1–34.
61. Snedecor, G.W. (1946) Statistical Methods, 4th edn, Iowa State College Press, Ames, IA, USA.
62. Gelman, A. (2005) Analysis of variance – why it is more important than ever. Ann. Stat., 33, 1–53. doi:10.1214/009053604000001048

Wiley StatsRef: Statistics Reference Online
