Robustness of CCA



Like all analytical techniques, CCA makes certain assumptions about nature. In particular, the interpretation of species scores as species "optima" assumes that the abundance of a species is a symmetrical unimodal function of position along environmental gradients (ter Braak 1987). Of course, we cannot assume a priori that species are distributed in this way. Therefore, the true test of CCA is whether it performs well when the assumptions are violated. In a study of simulated species distributions, I determined that CCA performs quite well even when the distributions of species along environmental gradients were skewed (Palmer 1993).
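The link between species scores and "optima" can be illustrated with a minimal sketch. It assumes a hypothetical gradient of 201 evenly spaced sites and Gaussian response curves; the function names, optimum, and tolerance values are illustrative choices, not taken from any of the cited studies. The abundance-weighted average of site positions recovers the mode of a symmetric curve, but is pulled toward the long tail of a skewed one.

```python
import numpy as np

# Hypothetical gradient: 201 evenly spaced sites from 0 to 100 (assumed units).
gradient = np.linspace(0.0, 100.0, 201)

def gaussian_response(x, optimum, tolerance, maximum=50.0):
    """Symmetric unimodal (Gaussian) abundance curve along a gradient."""
    return maximum * np.exp(-0.5 * ((x - optimum) / tolerance) ** 2)

def weighted_average(x, abundance):
    """Abundance-weighted mean position: the weighted-averaging 'optimum'."""
    return float(np.sum(x * abundance) / np.sum(abundance))

# Symmetric species: mode at 40, tolerance 8.
symmetric = gaussian_response(gradient, optimum=40.0, tolerance=8.0)
wa_symmetric = weighted_average(gradient, symmetric)

# Skewed species: same mode, but a steep left flank and a long right tail.
skewed = np.where(gradient < 40.0,
                  gaussian_response(gradient, 40.0, 4.0),
                  gaussian_response(gradient, 40.0, 16.0))
wa_skewed = weighted_average(gradient, skewed)

print("symmetric: mode 40, weighted average", round(wa_symmetric, 2))
print("skewed:    mode 40, weighted average", round(wa_skewed, 2))
```

This concerns only the reading of a single score as an optimum; it does not by itself say how well the ordination as a whole performs.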

One particular case where the unimodal model fails is where the primary differences between sites are due to differences in the species pool caused by vicariance events. If different valleys, for example, have different species because of their geomorphological history rather than their environments, we would not expect valleys (and species) to be neatly arranged along a gradient. In general, CCA will be relatively successful in placing such sites according to their similarity, but we cannot interpret the axes as gradients.

The traditional view of CA and CCA as "unimodal methods" has been questioned by ter Braak and Smilauer (1998), who demonstrate that CA and CCA have two "faces": a unimodal face and a linear face. Therefore, we do not need to be too concerned whether our underlying model is linear or unimodal before applying CA and CCA. However, we do need to be aware that the linear face of CA/CCA focusses on species composition (i.e. relative data), rather than overall trends in abundance. When overall trends in abundance are of interest, PCA and RDA would be more appropriate.

There will always be some inherent variability or "noise" in vegetation data, even for plots with identical environmental conditions (Gauch 1982). This noise could arise from errors in data collection, stochastic variation in the location of individuals within a stand, or site-specific variation in history. Alternatively, noise can be considered unexplained or residual variance in species composition. Regardless of its cause, noise in species abundance does not seriously affect the performance of CCA (Palmer 1993).

Noise in environmental variables, however, is another matter. In regression, it is assumed that the independent variables are measured without error, and that the dependent variable has some error associated with it. Since CCA is a special case of multiple linear regression, noise in the environmental variables (i.e. the independent variables) can affect CCA. According to McCune (1997), these effects can be quite strong.  This is, of course, not surprising, as the CCA axes are linear combinations of the environmental variables - they will be a function of the noise as well as of the "true" or noiseless values of the variables.
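The regression mechanism behind this can be sketched with a single predictor. This is an ordinary least-squares illustration, not a CCA; the sample size, slope, and noise levels are assumed values chosen only to make the effect visible. When measurement noise is added to the independent variable, the fitted slope shrinks toward zero, even though the true relationship is unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000  # assumed number of samples

# "True" environmental variable and a response that depends on it linearly.
true_env = rng.normal(0.0, 1.0, n)
response = 2.0 * true_env + rng.normal(0.0, 0.5, n)

# The same variable as measured, with added measurement noise (assumed SD = 1).
noisy_env = true_env + rng.normal(0.0, 1.0, n)

def ols_slope(x, y):
    """Slope of the least-squares regression of y on x."""
    xc = x - x.mean()
    return float(np.dot(xc, y - y.mean()) / np.dot(xc, xc))

slope_clean = ols_slope(true_env, response)
slope_noisy = ols_slope(noisy_env, response)

print("slope using the noiseless variable:", round(slope_clean, 3))
print("slope using the noisy variable:    ", round(slope_noisy, 3))
```

Here the noise variance equals the variance of the true variable, so the expected slope is roughly halved.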

It is arguable that in constrained ordination, the species scores and the environmental variable scores are much more interesting than the sample scores.  This is because our samples in gradient analysis are merely 'tools' to uncover the relationship between species and the environment (and because sample scores are simple functions of the environmental variables, they do not offer much unique information).   It has not yet been investigated how severely noise in environmental variables affects species scores and environmental scores.

Strong correlation among variables often poses a problem for multivariate analysis. For example, soil pH is often strongly correlated with soil calcium, and because of this, species distributions along a calcium gradient will be almost identical to distributions along a pH gradient, even if one of the two variables was ecologically unimportant. Fortunately, the arrangement of plots and species in a CCA diagram is not much affected by strong intercorrelations.

Unfortunately, neither CCA nor any other analytical method is able to tell you exactly which variable is the "real" one.

It is possible that highly correlated variables explain slightly different aspects of species composition. For example, even though soil pH and soil calcium are highly correlated, it is possible that the slight amount of variation present in calcium at a fixed level of pH is strong enough to cause differences in species composition. CCA is capable of detecting independent effects of such highly intercorrelated variables.

CCA's indirect gradient analysis analogue, Correspondence Analysis (CA), suffers from a mathematical artifact known as the arch effect. This led to the development of Detrended Correspondence Analysis (DCA), which eliminates the arch effect (Gauch 1982). Fortunately, it appears that CCA may not be seriously affected by the arch effect (Palmer 1993), so it is not in general advisable to perform a Detrended Canonical Correspondence Analysis (DCCA). However, since I wrote my 1993 paper, I have become aware (i.e. my colleagues finally overrode my stubbornness) that there are circumstances in which the arch effect can arise as an artifact.
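The arch effect in CA can be reproduced with a small simulation. The sketch below is for unconstrained CA, not CCA. It assumes 50 sites and 30 species with Gaussian response curves along one simulated gradient, and computes CA through a singular value decomposition of the standardized residuals; all names and parameter values are illustrative. Although there is only one real gradient, the second axis turns out to be close to a quadratic function of the first.

```python
import numpy as np

def correspondence_analysis(Y):
    """CA site scores (principal coordinates) and eigenvalues via SVD."""
    P = Y / Y.sum()
    r = P.sum(axis=1)                      # site (row) masses
    c = P.sum(axis=0)                      # species (column) masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected) # standardized residuals
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    site_scores = (U / np.sqrt(r)[:, None]) * s
    return site_scores, s ** 2

# Assumed single gradient: 50 sites, 30 species, tolerance of 10 units.
sites = np.linspace(0.0, 100.0, 50)
optima = np.linspace(-10.0, 110.0, 30)
Y = np.exp(-0.5 * ((sites[:, None] - optima[None, :]) / 10.0) ** 2)

scores, eigenvalues = correspondence_analysis(Y)
axis1, axis2 = scores[:, 0], scores[:, 1]

# How well is axis 2 described by a quadratic in axis 1?
coef = np.polyfit(axis1, axis2, 2)
fitted = np.polyval(coef, axis1)
r2_quadratic = 1.0 - np.sum((axis2 - fitted) ** 2) / np.sum((axis2 - axis2.mean()) ** 2)

print("first two eigenvalues:", np.round(eigenvalues[:2], 3))
print("R^2 of axis 2 as a quadratic in axis 1:", round(float(r2_quadratic), 3))
```

A scatter plot of axis2 against axis1 shows the arch directly.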

CCA eigenvalues can in some senses be interpreted as variances. This has led to the practice of variance partitioning (Borcard et al. 1992, Økland and Eilertsen 1994), which is directly analogous to ANOVA. I believe there are serious problems in doing so, because the "arch effect" has some variance associated with it. This arch variance may be in principle impossible to factor out. See variance explained and variance partitioning for more details.

Stepwise CCA is a very useful tool for determining a limited subset of environmental variables that explains species composition well. However, users are warned not to read too much into the "significance" of environmental variables in stepwise CCA or any other stepwise analysis. Indeed, purely random variables will frequently turn up "significant" in a stepwise CCA. See Reducing the number of environmental variables for more details.

McCune (1997) pointed out that adding purely random variables will increase the "variation explained" by CCA.  This is exactly the same situation that occurs with multiple regression, and should not be considered a problem.
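The multiple regression analogue is easy to check numerically. The sketch below is ordinary least squares rather than CCA, with an assumed sample of 40 plots, one real predictor, and up to ten purely random ones; the names and values are illustrative. Because each model is nested within the next, R-squared cannot decrease as random variables are added.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40  # assumed number of plots

# One predictor with a real effect, plus ten purely random variables.
x_real = rng.normal(size=n)
y = 1.5 * x_real + rng.normal(size=n)
random_vars = rng.normal(size=(n, 10))

def r_squared(X, y):
    """R^2 of a least-squares regression of y on X, with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return float(1.0 - resid @ resid / np.sum((y - y.mean()) ** 2))

# Add the random variables one at a time and record R^2 after each step.
r2_values = []
for k in range(0, 11):
    X = np.column_stack([x_real, random_vars[:, :k]])
    r2_values.append(r_squared(X, y))

for k, r2 in enumerate(r2_values):
    print(k, "random variables added: R^2 =", round(r2, 3))
```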

Certainly, there are many other limitations to CCA. Almost all of these problems are shared with any multiple regression method, and therefore with any form of direct gradient analysis.


References cited

(see also suggested references for self-education)

Borcard, D., P. Legendre, and P. Drapeau. 1992. Partialling out the spatial component of ecological variation. Ecology 73:1045-55.

Gauch, H. G., Jr. 1982. Multivariate Analysis and Community Structure. Cambridge University Press, Cambridge.

McCune, B. 1997. Influence of noisy environmental data on canonical correspondence analysis. Ecology 78:2617-2623.

Palmer, M. W. 1993. Putting things in even better order: the advantages of canonical correspondence analysis. Ecology 74:2215-30.

ter Braak, C. J. F. 1987. Unimodal models to relate species to environment. Agricultural mathematics group, the Netherlands.

ter Braak, C. J. F., and P. Smilauer. 1998. CANOCO Reference Manual and User's Guide to Canoco for Windows: Software for Canonical Community Ordination (version 4). Microcomputer Power, Ithaca, NY, USA. 352 pp.

Økland, R. H., and O. Eilertsen. 1994. Canonical correspondence analysis with variation partitioning: some comments and an application. J. Veg. Sci. 5:117-26.



This page was created and is maintained by Michael Palmer.