A GLOSSARY OF ORDINATION-RELATED TERMS


Are there other terms I should add to the list? Can you provide any additional or better definitions? If so, please let me know at: mike.palmer@okstate.edu

Note: Pielou (1984) has a good glossary. Although they do not contain glossaries, Legendre and Legendre (1998) and ter Braak and Šmilauer (1998) have good definitions of key terms.

The Glossary

Abundance - any measure of the amount of an organism. Can include density, biomass, frequency, cover, presence/absence, etc. See species abundances in ordination.

Arch effect - a distortion or artifact in an ordination diagram, in which the second axis is an arched function of the first axis. It is caused by the unimodal distribution of species along gradients. The arch appears in Correspondence Analysis and other ordination techniques. One of the main purposes of Detrended Correspondence Analysis is to remove the arch effect. Principal Components Analysis creates a more serious artifact called the horseshoe effect.

Axis - I haven't the foggiest idea how to define this. Any suggestions? Axes are the basic structure of the Cartesian coordinate system, and are usually portrayed as being at right angles (i.e. orthogonal) to each other, though non-Euclidean coordinate systems also exist. With respect to the use of more exotic coordinate systems in ordination, see Camiz (1991).

Beta Diversity - Also called species turnover or differentiation diversity. Beta diversity is a measure of how different samples are from each other, and/or how far apart they are on gradients of species composition. Alternatively, it is a measure of the "length" of an ecological gradient or ordination axis, in terms of species composition. Total beta diversity can be compared among gradients, but not per unit (e.g. one cannot compare whether the rate of change is higher along a pH gradient than along a moisture gradient, but the total change along the gradients can be assessed). An axis or gradient with high beta diversity will have completely different species compositions (i.e. share no species) at opposite ends (indeed, the ends might be completely different from the middle). An axis or gradient with low beta diversity will be similar in species composition at both ends. Some ordination techniques (e.g. PCA) behave best at low beta diversity, and others (e.g. DCA, CCA) behave best at high beta diversity. Beta diversity is one of the most misunderstood concepts in community ecology, and it has been defined in numerous ways in the past. The rescaling algorithm of DCA provides a measure of beta diversity. For more discussion, see Explorations in Coenospace.

BACI Design - "Before - After - Control - Impact" design. Refers to a study in which quadrats (or other samples) are studied through time, and some of the quadrats are subjected to an experimental treatment or treatments. Ideally, the statistics used for a BACI design will be able to distinguish the effects of treatment and time.

Biplot - an ordination diagram which simultaneously plots species scores and sample scores. It is only relevant for ordination techniques such as CCA, DCA, CA, PCA, and RDA. Distance-based ordination techniques do not result in the simultaneous ordination of both species and samples.

Biplot arrow - a representation of a variable (usually an environmental variable) on a biplot. The arrow points in the direction of maximum correlation, and the length of the arrow is related to the strength of the correlation. In general, the longer the arrow, the more highly related that variable is to species composition.

Bootstrap - A reasonably new computer-intensive method to obtain confidence intervals, to estimate parameters, or in some cases to test hypotheses. The bootstrap is considered a "Resampling method", and is allied to the Jackknife and to randomization tests. Introductions to bootstrapping for ecologists are given in Manly (1993) and Potvin and Roff (1993). Knox and Peet (1989) apply bootstrapping to DCA.
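
As a rough illustration of the resampling idea, the following Python sketch computes a percentile bootstrap confidence interval for a mean. The function name, the cover values, and the choice of the percentile method are assumptions made for this example; they are not taken from the references above.

```python
import random

def bootstrap_mean_ci(data, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean (illustrative)."""
    rng = random.Random(seed)
    n = len(data)
    means = []
    for _ in range(n_boot):
        # Resample n observations with replacement from the original data.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    means.sort()
    # Take the alpha/2 and 1 - alpha/2 percentiles of the bootstrap means.
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical percent cover values for one species in eight quadrats.
cover = [12.0, 15.5, 9.0, 20.0, 14.5, 11.0, 18.0, 13.5]
low, high = bootstrap_mean_ci(cover)
print(low, high)
```

Other statistics (for example, an ordination eigenvalue) could in principle be substituted for the mean, at a much higher computational cost.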

Bray-Curtis Ordination - A synonym of polar ordination

CA - The acronym for Correspondence Analysis

CANOCO - A computer program, modified from DECORANA by C.J.F. ter Braak, which performs a wide variety of ordinations such as Canonical Correspondence Analysis, Correspondence Analysis, Detrended Correspondence Analysis, etc.

CANODRAW - A computer program written by Petr Šmilauer designed to graph the output of CANOCO. CANODRAW's functions are now incorporated into CANOCO.

CanoImp - A computer program written by Petr Šmilauer that converts spreadsheet blocks, copied into the Windows clipboard, into proper format for input into CANOCO. CanoImp is the console version, and WCanoImp is the Windows 95 version. These functions are now seamlessly integrated into CANOCO.

Canonical Analysis - A term which appears in the literature a number of times, often with different meanings. It has been used as a synonym for canonical correlation and Canonical Correspondence Analysis. It is probably best reserved as a generic term referring to any method which links one set of variables to at least one other set of variables, and would thus include canonical correspondence analysis, canonical correlation, redundancy analysis, etc.

Canonical Correlation - Often confused with Canonical Correspondence Analysis. It is a technique which finds the linear combination of one set of variables which is maximally correlated with a linear combination of another set of variables. Canonical correlation is closely related to PCA. See Gittens (1987) for an ecological application.

Canonical Correspondence Analysis - A widely used method for direct gradient analysis, best developed by C.J.F. ter Braak (see Jongman et al. 1987, ter Braak and Prentice 1988, and many of the links on the ordination web page). CCA assumes that species have unimodal distributions along environmental gradients.

CANOPOST - A Windows program that takes the results of CANODRAW and produces publication-quality output.

Categorical Variable - A variable that is represented by several different types; for example: lake/river/stream, farm/pasture/unmanaged, pitfall trap/fence trap/direct sighting. For most multivariate analyses, categorical variables must be converted to k-1 dummy variables (where k = the number of categories). See Environmental variables in CCA.
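
The conversion to k-1 dummy variables can be sketched in Python as follows. The function name and the convention of dropping the alphabetically first category as the reference are assumptions for illustration only; any category may serve as the reference.

```python
def dummy_code(values, reference=None):
    """Convert a categorical variable into k-1 dummy (0/1) variables."""
    categories = sorted(set(values))
    if reference is None:
        reference = categories[0]  # assumed convention: first category is the reference
    kept = [c for c in categories if c != reference]
    # One column per retained category: 1 if the observation belongs to it, else 0.
    rows = [[1 if v == c else 0 for c in kept] for v in values]
    return kept, rows

habitat = ["lake", "river", "stream", "river", "lake"]
names, dummies = dummy_code(habitat)
print(names)    # the two retained categories
print(dummies)  # "lake" observations are rows of zeros
```

With three categories, two columns suffice, because membership in the reference category is implied when all dummy variables are zero.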

CCA - The Acronym for Canonical Correspondence Analysis

Centroid - the (weighted) mean of a multivariate data set. Can be represented by a vector. For many ordination techniques, the centroid is a vector of zeros (that is, the scores are centered and standardized). In a direct gradient analysis, a categorical variable is often best represented by a centroid in the ordination diagram. See Centroids and Inertia.

Classification - The act of putting things in groups. Most commonly in community ecology, the "things" are samples or communities. Classification can be completely subjective, or it can be objective and computer-assisted (even if arbitrary). Hierarchical classification means that the groups are nested within other groups. There are two general kinds of hierarchical classification: divisive and agglomerative. A divisive method starts with the entire set of samples, and progressively divides it into smaller and smaller groups. An agglomerative method starts with small groups of few samples, and progressively groups them into larger and larger clusters, until the entire data set is included in a single cluster. Pielou (1984) gives a good introduction to various classification methods.

Clustering - sometimes simply a synonym of classification, but more usually referring to agglomerative classification.

Coenocline - a simultaneous portrayal of all species response curves along an environmental gradient (presumably, an important one). This is probably the most common category of graphs in all of community ecology. Ecological continuum is a synonym. See Explorations in Coenospace.

Coenoplane - a simultaneous portrayal of all species response surfaces along two (presumably important) environmental gradients.

Coenospace - a simultaneous portrayal of all species response surfaces along an unspecified number of gradients. It is difficult for mere mortals to visualize more than three such gradients simultaneously. Fortunately, there are rarely more than three important axes or dimensions in most ecological data sets. The concept of coenospace is closely allied to Hutchinson's multidimensional niche.

Correlation - A method which determines the strength of the relationship between variables, and/or a means to test whether the relationship is stronger than expected due to the null hypothesis. Usually, we are interested in the relationship between two variables, x and y. The correlation coefficient r is one measure of the strength of the relationship.

Correlation Coefficient - usually abbreviated r. A number which reflects the strength of the relationship between two variables. It varies from -1 (for a perfect negative relationship) to +1 (for a perfect positive relationship). If variables are standardized to have zero mean and a unit standard deviation, then r will also be the slope of the relationship. The value r² is known as the coefficient of determination; it varies between 0 and 1. The coefficient of determination is loosely interpreted as "the proportion of variance in y which can be explained by x".
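
The claim that r equals the slope for standardized variables can be checked numerically. The Python sketch below uses made-up data and hypothetical helper names; it is meant only as a demonstration, not as a replacement for a statistical package.

```python
import math

def mean(v):
    return sum(v) / len(v)

def correlation(x, y):
    """Pearson correlation coefficient r."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def standardize(v):
    """Rescale to zero mean and unit standard deviation."""
    m = mean(v)
    sd = math.sqrt(sum((a - m) ** 2 for a in v) / (len(v) - 1))
    return [(a - m) / sd for a in v]

def slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical data
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r = correlation(x, y)
print(r, r ** 2)                              # r and coefficient of determination
print(slope(standardize(x), standardize(y)))  # matches r up to rounding
```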

Correlation Matrix - a square, symmetric matrix consisting of nothing but correlation coefficients. The rows and the columns represent the variables. The diagonal elements are all equal to 1, for the simple reason that the correlation coefficient of a variable with itself equals 1. The correlation matrices given in CANOCO usually differ slightly from those calculated in basic statistical packages. This is because CANOCO uses weighted correlations (i.e. samples with a higher summed abundance of all species will have more influence in the calculation).

Correspondence Analysis - An eigenanalysis-based ordination method, also known as reciprocal averaging. See Correspondence Analysis.

Correspondence Analysis has been discovered independently by different scientists.
Reciprocal Averaging means that sample scores are calculated as a weighted average (or centroid) of species scores, and species scores are calculated as a weighted average (or centroid) of sample scores, and iterations continue until there is no change. However, other algorithms are possible.
Correspondence Analysis simultaneously ordinates species and samples. There are as many axes as there are species or samples, whichever is less.
The number of axes worth interpreting is a matter of taste, but the size of eigenvalues can be a guide.
Correspondence Analysis maximizes the correlation between species scores and sample scores. The eigenvalue is equal to the correlation coefficient. The eigenvectors are either species scores or sample scores.
An eigenvalue of 1.0 implies that one sample (or group of samples) shares no species with all other samples.
One can put new points in a Correspondence Analysis without affecting the ordination.
As with all the other eigenanalysis techniques, it is possible to define "passive samples" or "passive species".
Correspondence Analysis has a problem: the arch effect. This effect is caused by nonlinearity of species response curves.
The arch is not as serious as the horseshoe effect of PCA, because the ends of the gradient are not convoluted.
Another related problem of Correspondence Analysis is that the ends of the gradient are compressed.
Detrended Correspondence Analysis was designed to correct for the arch effect and gradient compression, as described above.
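
The reciprocal averaging iteration described above can be sketched in a few lines of Python. This is a minimal illustration for the first axis only. The function name, the starting scores, the weighted centering and scaling step, and the small data matrix are assumptions of this sketch, and it presumes every sample and every species has a nonzero total.

```python
def reciprocal_averaging(Y, n_iter=200, tol=1e-12):
    """First CA axis by reciprocal averaging. Y[i][j] = abundance of species j in sample i."""
    n, m = len(Y), len(Y[0])
    row_tot = [sum(row) for row in Y]
    col_tot = [sum(Y[i][j] for i in range(n)) for j in range(m)]
    grand = sum(row_tot)

    def standardize(x):
        # Center and scale sample scores, weighting each sample by its total abundance.
        # Centering removes the trivial solution in which all scores are equal.
        mu = sum(r * v for r, v in zip(row_tot, x)) / grand
        x = [v - mu for v in x]
        sd = (sum(r * v * v for r, v in zip(row_tot, x)) / grand) ** 0.5
        return [v / sd for v in x], sd

    x, _ = standardize([float(i) for i in range(n)])  # arbitrary starting scores
    eig = 0.0
    for _ in range(n_iter):
        # Species scores: weighted averages of sample scores.
        u = [sum(Y[i][j] * x[i] for i in range(n)) / col_tot[j] for j in range(m)]
        # Sample scores: weighted averages of species scores.
        x_new = [sum(Y[i][j] * u[j] for j in range(m)) / row_tot[i] for i in range(n)]
        # The shrinkage in spread at each pass estimates the eigenvalue.
        x_new, eig = standardize(x_new)
        converged = max(abs(a - b) for a, b in zip(x_new, x)) < tol
        x = x_new
        if converged:
            break
    return x, u, eig

# Hypothetical data: samples 0-1 and samples 2-3 share no species.
Y = [[1, 1, 0, 0],
     [2, 1, 0, 0],
     [0, 0, 1, 3],
     [0, 0, 2, 1]]
sample_scores, species_scores, eigenvalue = reciprocal_averaging(Y)
print(eigenvalue)  # close to 1.0, because the two groups share no species
```

The example data were chosen to illustrate the point about an eigenvalue of 1.0: the two groups of samples have no species in common, so the first axis simply separates them.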

Covariable - refers to a variable (in the context of DGA, an environmental variable) which for some reason the investigator wishes to "factor out". This is usually either a nuisance variable, or an important variable which is not of immediate interest. A covariable can be used to specify a block effect or site effect (in which case it is usually a dummy variable), if treatments are of most interest. See Partial Analysis or Partial Ordination.

Covariance Matrix - a square, symmetric matrix in which the rows and columns are variables, and the entries are covariances. The diagonal elements (i.e. the covariance between a variable and itself) will equal the variances.

DCA - The acronym for Detrended Correspondence Analysis

DECORANA - A computer program, now outdated by CANOCO, for performing Detrended Correspondence Analysis and Correspondence Analysis.

Detrended Canonical Correspondence Analysis (DCCA) - The detrended form of Canonical Correspondence Analysis.

Detrended Correspondence Analysis (DCA) - an eigenanalysis-based ordination technique derived from correspondence analysis (Hill and Gauch 1980). DCA performs detrending to counteract the arch effect, a defect of correspondence analysis. DCA also (optionally) performs rescaling of ordination axes, so that the spacing of sample (and species) scores along the axes is scaled in units of beta diversity. See Detrended Correspondence Analysis.

Detrending - A method employed in DCA and DCCA to remove the arch effect. Axes are divided into segments, and the sample scores of higher axes are reassigned to be centered around the centroid. See Detrended Correspondence Analysis for a brief description. More thorough descriptions are given in Gauch 1982, Pielou 1984 and Kent and Coker 1992.

Dimension - This is a difficult term to define precisely in a comprehensible way. However, it is possible to grasp at a more intuitive level. It is the number of axes in a Cartesian coordinate system or the number of variables (unless some variables are linear combinations of other variables). Even though there are often a large number of dimensions, there are usually only a small number of important dimensions. A related concept to dimension is the rank of a matrix. The rank is "the number of dimensions of a space in which the data points lie" (Pielou 1984).

Direct Gradient Analysis - Any gradient analysis in which the important gradients are known and measured. Direct gradient analysis is commonly performed using nonlinear regression, or using a technique such as Canonical Correspondence Analysis. In contrast, see indirect gradient analysis.

Discriminant Analysis - A technique related to ordination, which is used in many fields other than ecology. Digby and Kempton (1987) provide a good discussion. Discriminant Analysis tells us whether a particular set of variables is useful in discriminating previously delineated groups. Canonical Variates Analysis (CVA) is a form of discriminant analysis which is actually a special case of Canonical Correspondence Analysis in which the classes are coded as dummy variables.

Dissimilarity Matrix - see distance matrix.

Distance Decay - the property by which two nearby points have more similar characteristics than two distant points. Distance decay violates the basic statistical assumption that samples are independent, and is therefore a special case of pseudoreplication. Distance decay can be quantified using geostatistics.

Distance Matrix - A square and (usually) symmetric matrix in which the rows and the columns represent (usually) samples. The entries represent some index of the difference between samples; the measure could be Euclidean distance, Manhattan (City Block) Distance, Bray-Curtis dissimilarity, the Jaccard coefficient, or any of a huge number of possibilities. The diagonal elements (the difference between a sample and itself) are usually zero. Distance matrices are necessary prerequisites for distance-based ordination methods such as Polar Ordination and Nonmetric Multidimensional Scaling. Distance matrices are closely related to (and easily converted to) similarity matrices.
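
As an illustration, the following Python sketch builds a distance matrix from a small, made-up sample-by-species table using Bray-Curtis dissimilarity (the sum of absolute differences divided by the sum of all abundances in the two samples). The helper names are hypothetical, and the conversion to similarity as 1 minus dissimilarity applies to measures bounded between 0 and 1, as Bray-Curtis is.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples of species abundances."""
    numerator = sum(abs(p - q) for p, q in zip(a, b))
    denominator = sum(p + q for p, q in zip(a, b))
    return numerator / denominator

def distance_matrix(samples, measure):
    """Square matrix of pairwise distances between samples."""
    n = len(samples)
    return [[measure(samples[i], samples[j]) for j in range(n)] for i in range(n)]

# Hypothetical abundances: rows are samples, columns are species.
samples = [[10, 0, 5],
           [8, 2, 5],
           [0, 12, 1]]
D = distance_matrix(samples, bray_curtis)
S = [[1 - d for d in row] for row in D]  # corresponding similarity matrix
for row in D:
    print(row)
```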

Downweighting - An option in many ordination programs to dampen the effects of rare species. Downweighting gives weights to species which are related to their abundances. Correspondence Analysis and its derivatives are sensitive to rare species which occur in species-poor areas (see, e.g. ter Braak 1987); downweighting reduces but does not eliminate this problem.

Dummy Variable - a binary variable of 1's and 0's, which is one if the observation belongs to a category and zero if it does not. See also categorical variable and environmental variables in CCA.

Eigenanalysis - the process of finding eigenvectors and eigenvalues. See eigenvector and eigenvalue.

Eigenvalue - a central concept in linear algebra (i.e. matrix algebra). A semitechnical definition is as follows:

"Let A be a p by p matrix and w a p-element vector. If it is true that Aw = λw for some scalar λ, then w is an eigenvector of A and λ is the corresponding eigenvalue. That is, an eigenvector of a matrix is a vector such that when we multiply the matrix by the vector we get the vector back again except that it has been multiplied by a particular constant, called the eigenvalue." - Cliff (1987).
The process of finding the eigenvectors and eigenvalues of a matrix is known as eigenanalysis. For a square matrix, there are as many eigenvectors and eigenvalues as there are rows and columns in the matrix. The eigenvalues are usually ranked from highest to lowest, and termed the first, second, third, etc. eigenvalues or latent roots.
Why do we care about the eigenvectors and eigenvalues? It turns out that many ordination techniques are based on eigenanalysis. For example, PCA is based upon an eigenanalysis of the correlation or covariance matrix. For eigenanalysis-based methods, the sample scores (or species scores) are typically the eigenvectors of some matrix, and eigenvalues measure the strength of an ordination axis.
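
To make the definition concrete, the Python sketch below finds the largest eigenvalue and its eigenvector for a small correlation matrix by power iteration, then confirms that multiplying the matrix by the vector returns the vector scaled by the eigenvalue. Power iteration is only one of several possible algorithms; the matrix and function names are invented for this example, and the sketch assumes the starting vector is not orthogonal to the dominant eigenvector.

```python
import math

def mat_vec(A, w):
    """Multiply square matrix A by vector w."""
    return [sum(a * b for a, b in zip(row, w)) for row in A]

def power_iteration(A, n_iter=200):
    """Dominant eigenvalue and unit-length eigenvector of A (illustrative)."""
    w = [1.0] + [0.0] * (len(A) - 1)  # arbitrary starting vector
    for _ in range(n_iter):
        Aw = mat_vec(A, w)
        norm = math.sqrt(sum(v * v for v in Aw))
        w = [v / norm for v in Aw]    # rescale to unit length each pass
    # Rayleigh quotient: w'Aw for a unit vector w gives the eigenvalue.
    lam = sum(a * b for a, b in zip(w, mat_vec(A, w)))
    return lam, w

# Hypothetical correlation matrix for two variables with r = 0.5.
R = [[1.0, 0.5],
     [0.5, 1.0]]
lam, w = power_iteration(R)
print(lam, w)
print(mat_vec(R, w), [lam * v for v in w])  # the two vectors agree
```

For this matrix the two eigenvalues are 1.5 and 0.5, and they sum to 2, the number of variables.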

Eigenvector - a central concept in linear algebra. Sample scores are often eigenvectors. See eigenanalysis.

Environmental Gradient - a spatially varying aspect of the environment which is expected to be related to species composition. Environmental gradients are the x-axes for coenoclines. I do not know whether gradients which vary only through time could properly be called environmental gradients, though to some degree they can be treated as such in ordination methods. Differences in resource use within a site cannot be considered gradients. Human-imposed effects can be considered environmental gradients.

Environmental Variable - a measure of the environment which is presumably related to an environmental gradient. Environmental variables can be continuous, or they can be represented by dummy variables.

Euclidean Distance - the straight line distance between two points in a Cartesian coordinate system. The Euclidean distance can be determined using the Pythagorean Theorem. In two dimensions, the Euclidean distance is [(x₁-x₂)² + (y₁-y₂)²]^0.5. Usually, the points represent samples and the axes of the Cartesian coordinate system represent the abundances of species. Gauch (1982) has a good description of the various kinds of data space.
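
The same formula extends to any number of dimensions (species). A minimal Python sketch, with hypothetical abundance values:

```python
import math

def euclidean(a, b):
    """Straight-line distance between two points with any number of coordinates."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

print(euclidean([0, 0], [3, 4]))  # the classic 3-4-5 right triangle

# Two hypothetical samples described by the abundances of four species.
sample_a = [10, 0, 5, 2]
sample_b = [8, 2, 5, 0]
print(euclidean(sample_a, sample_b))
```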

Exploratory Analysis - a general term for an analysis in which the chief objective is to find pattern in the data. Often, exploratory analysis conflicts with hypothesis testing. For example, stepwise regression is permissible in exploratory analysis, but can cause serious problems if you are interested in testing hypotheses. See Hypothesis-Driven and Exploratory Data Analysis.

Factor Analysis - This is a term which has been variously defined. In some treatments it seems to be a synonym of ordination. Sometimes (as in some statistical software) it includes principal components analysis. The following discussion is from Morrison (1967); my comments are in brackets.

"It would seem clear that a new class of techniques [Morrison had just finished discussing partial correlation, multiple correlation, and canonical correlation] will be required for picking apart the dependence structure when the responses are symmetric in nature or no a priori patterns of causality are available [amongst all variables of interest]. Those methods fall under the general heading of factor analysis, for by them one attempts to descry those hidden factors which have generated the dependence or variation in the responses. That is, the observable, or manifest variates are represented as functions of a smaller number of latent factor variates."

According to Gauch (1982), there is a subtle distinction between ordination and factor analysis, which appears consistent with Morrison:

"Factor analysis is similar to principal components analysis, except that instead of trying to account for as much of the total variance as possible, only correlations between variables are of interest as reflecting putative underlying causes or factors"

FORTRAN - one of the earliest computer languages in widespread use in ecology. Most ordination programs were originally written in FORTRAN, including the Cornell Ecology Programs such as DECORANA.

Fuzzy Sets and Fuzzy Set Ordination - Fuzzy sets are sets which allow grades of membership. For example, the set of all high-elevation plots may include no plots at sea level, and all plots on mountain tops, but what about plots at intermediate elevations? Classical set theory would have us define an arbitrary elevation or threshold, above which all plots must belong, and below which no plots belong. Fuzzy set theory would allow a plot to belong with 25% membership (for a relatively low elevation) or 75% (for a relatively high elevation). Fuzzy set theory is currently being used in robotics, computer vision, and artificial intelligence. The application of fuzzy set theory to ecology was developed by Roberts (1986). Fuzzy set ordination is probably best classified as a direct gradient analysis technique, and it bears strong similarities to polar ordination.

Gaussian Curve - The simplest model for a unimodal species response curve (see explorations in coenospace). It has only three parameters, and the equation is:

y = Ae^{-(x-B)^2/C}

where A is the maximum height of the curve, B is the modal location of the curve, and C is a measure of the breadth of the curve (often called niche breadth, tolerance, or standard deviation). The curve is bell-shaped. The difference between a Gaussian Curve and a Normal Distribution is that the latter is a statistical distribution, and hence the area under the curve is constrained to be one, and the y-axis represents frequency.
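
The equation can be evaluated directly, as in the Python sketch below. The parameter values are invented for illustration. Note that with the equation exactly as written above, C appears in the denominator without being squared; if one prefers to think of breadth as a tolerance t in standard deviation units, the corresponding value would be C = 2t², which is an assumption about the parameterization rather than something stated in this glossary.

```python
import math

def gaussian_response(x, A, B, C):
    """Species abundance at gradient position x: y = A * exp(-(x - B)^2 / C)."""
    return A * math.exp(-((x - B) ** 2) / C)

# Hypothetical species: maximum abundance 50, optimum at pH 6.0.
A, B = 50.0, 6.0
t = 0.5          # assumed tolerance in standard deviation units
C = 2 * t ** 2   # equivalent breadth parameter under that assumption

for ph in [4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5]:
    print(ph, round(gaussian_response(ph, A, B, C), 2))
```

The curve peaks at the mode B with height A and falls off symmetrically on either side.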

Gaussian Ordination - A little-used ordination technique which arranges samples along ordination axes such that the fit of the species response curves to the Gaussian curve is maximized. The fit can be measured by r². Ter Braak (1987) shows that CCA can be considered (asymptotically) a special case of Gaussian Ordination.

Geostatistics - A body of analytical techniques for the study of spatial pattern. Geostatistics were largely developed for the mining industry, but they are now widely used in ecology. For an introduction to geostatistics in ecology, see Burrough (1987). There are two interrelated components to geostatistics: variography and spatial interpolation (kriging). See Rossi et al. (1992) for a good introduction to geostatistical applications in ecology.

Gradient - see Environmental Gradient

Gradient Analysis - the study of species distributions along gradients. See direct gradient analysis and indirect gradient analysis.

Guttman Effect - A synonym for the horseshoe effect.

Horseshoe Effect - a distortion in ordination diagrams. It is more extreme than the arch effect because the ends of the first gradient are involuted. The horseshoe effect can be observed for very long gradients in PCA.

Indirect Gradient Analysis - A gradient analysis in which the gradients are unknown a priori, and are inferred from species composition data. The species tell us what the gradients are. Usually performed using an ordination technique such as Detrended Correspondence Analysis.

Inertia - a measure of the total amount of variance in a data set. It is directly related to the physical concept of inertia, which is the tendency for an object in motion to stay in motion, and the tendency for an object at rest to stay at rest. For weighted averaging methods such as DCA and CCA, the inertia is related to the spread of species modes (or optima) in ordination space, rather than the variance in species abundance.

Iteration - Often, a mathematical operation must be repeated again and again (using the output of the operation as the input into the operation the next time around). Each instance of this operation is termed an iteration.

Jackknife - A (usually) computer-intensive method to estimate parameters, and/or to gauge uncertainty in these estimates. The name is derived from the method that each observation is removed (i.e. cut with the knife) one at a time (or two at a time for the second-order Jackknife, and so on) in order to get a feeling for the spread of data. See Manly (1993) and Dixon (1993) for reviews of the use of the Jackknife and similar methods in ecology.
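
A first-order jackknife can be sketched in Python as follows. The function name and data are hypothetical. For the sample mean, the jackknife standard error coincides with the familiar s/√n, which offers a convenient check on the code.

```python
import math

def jackknife_se(data, statistic):
    """First-order jackknife standard error of a statistic."""
    n = len(data)
    # Recompute the statistic n times, each time leaving one observation out.
    leave_one_out = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    centre = sum(leave_one_out) / n
    return math.sqrt((n - 1) / n * sum((v - centre) ** 2 for v in leave_one_out))

def mean(v):
    return sum(v) / len(v)

# Hypothetical species richness counts from eight quadrats.
richness = [12, 15, 9, 20, 14, 11, 18, 13]
print(jackknife_se(richness, mean))
```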

Kriging - a method of spatial interpolation based upon geostatistics. By "spatial interpolation", we mean estimating the value of a variable at an unsampled location based upon measured values of the same variable at known locations. The most common application of kriging is mapping. For example, one can use kriging to produce maps of DCA Axis 1 across a landscape even if the landscape is incompletely sampled. CANODRAW is capable of performing some kinds of kriging, mostly for the purpose of drawing isoclines in ordination space. See Burrough (1987) for a brief introduction to kriging.

Latent Root - another name for eigenvalue.

Latent Value - another name for eigenvalue.

Linear Combination - a linear combination of a set of variables is a new variable (y_i) which can be expressed as follows: y_i = Σ_j (b_j x_ij), where b_j is the "coefficient" of variable j, and x_ij is the value of observation i of variable j. In multiple regression, predicted values of the dependent variable are linear combinations of the independent (or explanatory) variables. In CCA and RDA, sample scores are linear combinations of the environmental variables.
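
In code, a linear combination is simply a weighted sum across the variables for each observation. The Python sketch below uses invented environmental measurements and coefficients.

```python
def linear_combination(X, b):
    """y_i = sum over j of b_j * x_ij, for each observation (row) i of X."""
    return [sum(bj * xij for bj, xij in zip(b, row)) for row in X]

# Hypothetical data: rows are samples; columns are pH and soil moisture (%).
X = [[6.5, 10.0],
     [7.0, 20.0],
     [5.5, 30.0]]
b = [2.0, 0.5]  # hypothetical coefficients for pH and moisture
y = linear_combination(X, b)
print(y)
```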

Linear Least Squares - the principle or method by which the fit of a function to data is such that the sum of the squared residuals is minimized. In linear regression, the function is a line.
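
For a straight line, the least squares solution has a simple closed form. The Python sketch below fits a line to made-up data and shows that nudging the slope away from the fitted value increases the sum of squared residuals; the function names are illustrative only.

```python
def fit_line(x, y):
    """Intercept and slope that minimize the sum of squared residuals."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope

def sum_sq_residuals(x, y, intercept, slope):
    """Sum of squared differences between observed and predicted y."""
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))

x = [0.0, 1.0, 2.0, 3.0]   # hypothetical data
y = [1.1, 2.9, 5.2, 6.8]
intercept, slope = fit_line(x, y)
best = sum_sq_residuals(x, y, intercept, slope)
worse = sum_sq_residuals(x, y, intercept, slope + 0.1)
print(intercept, slope, best, worse)
```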

Mantel Test - a method for comparing matrices to each other, also called "matrix correlation". See Legendre and Fortin (1989) for an introduction to Mantel tests for spatial pattern. Significance can be evaluated using randomization methods.

Matrix - a set of numbers arranged in rows and columns. "An n by m matrix is a rectangular array of elements with n rows and m columns in which not only is the value of an element important, but also its position in the array" (Burden et al. 1981). It is very common to encounter a matrix with the same number of rows as columns; this is called a square matrix. A square symmetric matrix is one which is identical if you "transpose" the matrix (i.e. switch the rows and the columns). The correlation matrix is an example of a square symmetric matrix.

MCPT - an acronym for Monte Carlo Permutation Test
MDS - an acronym for Multidimensional Scaling, but perhaps it is better to avoid this acronym since it has been variously used in the past. See Terminology in Ordination.

Monotonic Distributions - describes species response curves in which species only increase along environmental gradients, or only decrease along environmental gradients. A monotonic distribution can be linear or more complex. Also, species with unimodal distributions may appear to have monotonic distributions if only a small portion of the gradient is sampled (that is, if the sampled gradient is short). See Explorations in Coenospace. If most species have a monotonic distribution, then it is best to use PCA and RDA, but if most species have unimodal distributions, then it is best to use DCA and CCA.

Monte Carlo Tests - a synonym of randomization tests (at least as commonly used by ecologists). A Monte Carlo permutation test is when the actual data values are maintained, but they are randomly permuted in order to obtain the distribution of the test statistic. Exactly how they are permuted depends on the null hypothesis to be tested. In the simplest use of Monte Carlo permutation tests in CCA, the values for the environmental variables are randomly reassigned to the values for the species data.
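
The logic of a permutation test can be sketched in Python. As a stand-in for an ordination statistic, this example uses the absolute correlation between one environmental variable and one species abundance; that substitution, the function names, and the data are all assumptions for illustration. In CCA the test statistic would instead be based on the ordination itself.

```python
import math
import random

def correlation(x, y):
    """Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permutation_test(env, abundance, n_perm=999, seed=0):
    """P-value for the null hypothesis of no relationship (illustrative)."""
    rng = random.Random(seed)
    observed = abs(correlation(env, abundance))
    shuffled = list(env)
    count = 0
    for _ in range(n_perm):
        # Randomly reassign environmental values among samples;
        # the data values themselves are kept, only their pairing changes.
        rng.shuffle(shuffled)
        if abs(correlation(shuffled, abundance)) >= observed - 1e-12:
            count += 1
    # The observed arrangement counts as one of the possible permutations.
    return (count + 1) / (n_perm + 1)

env = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]  # hypothetical gradient
abundance = [2 * v + 1 for v in env]                        # perfectly related
p = permutation_test(env, abundance)
print(p)
```

With 999 permutations the smallest attainable p-value is 0.001, which is why the number of permutations is usually reported along with the result.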

MRPP - an acronym for Multiresponse Permutation Procedure.

Multicollinearity - describes the situation in which a number of variables (or perhaps all of them) are highly correlated with each other. This is often considered a problem, and indeed it makes inferential statistics difficult. But it can also be considered a blessing, because redundant data are useful in identifying patterns.

Multidimensional Scaling - nowadays, this is often a synonym for nonmetric multidimensional scaling, but it previously referred to Principal Coordinates Analysis.

Multiple Regression - See multiple regression. A method (usually based on the least squares principle) which attempts to describe or "fit" a measured dependent variable as a function of multiple measured independent variables.

Multiresponse Permutation Procedure - usually abbreviated MRPP. A randomization test that evaluates differences in species composition, based on some distance measure. See, for example, Biondini et al. (1988).

Multiscale Ordination - an ordination method which analyses species composition at multiple spatial scales simultaneously. See, for example, Ver Hoef and Glenn-Lewin (1989).

Multivariate Analysis - any analysis which attempts to simultaneously examine the behavior of more than one dependent variable. A multiple regression is not considered a multivariate analysis, since only one dependent (response) variable is studied at a time. Ordination, classification, canonical correlation, and factor analysis are considered multivariate methods. Why use a multivariate analysis instead of multiple univariate analyses? For several reasons:

If there are numerous variables (for example, hundreds of species) multiple univariate analyses are tedious, and the problem of multiple comparisons emerges
Multivariate methods take advantage of joint structure (e.g. intercorrelations) between variables.
Some multivariate methods provide statistical tests of all response variables simultaneously.
If one is interested in community ecology, one must be interested in all species simultaneously, rather than one at a time.
Of course, if one only has a few variables (e.g. 1-3) then it is somewhat artificial to force a multivariate analysis to process the data. As with all of statistics, one should use the simplest analysis possible to answer the question posed.

NMDS - an acronym for Nonmetric MultiDimensional Scaling

Noise - This term is very difficult to define, but in general it refers to chance variation in nature which interferes with our ability to see pattern and infer processes. In its simplest form, noise is the same thing as statistical error (e.g. the error term in a regression). See Gauch (1982) for a more thorough discussion. Ideally, an ordination method will represent real, important gradients as its first, second, third, etc. axes. Axes which predominantly summarize noise should be among the last axes.

Nominal Variable - A variable which can be represented as a binary: yes/no, on/off, present/absent. A Nominal variable is usually summarized by a dummy variable.

Nonmetric Multidimensional Scaling (NMDS) - The most widely used distance-based ordination method. The user needs to prespecify the number of dimensions, and then the method will minimize the stress (a measure of poorness of fit between the ordination and measured ecological distances). See also distance matrix.

Normal Equations - The equations by which the solutions to regression problems are found. The "normal" comes from the concept of "normal lines" in physics, i.e. vectors which are at right angles, and therefore uncorrelated.

Ordination - The simplest definition is "Putting Things in Order", which explains the titles of a series of papers (Wartenberg et al. 1987, Peet et al. 1988, Jackson and Somers 1991, Palmer 1993a). For some opinions on what makes a good ordination method, see The ideal ordination method. The origin of the term "ordination" in ecology is attributed to Goodall (1954).

"Ordination is the collective term for multivariate techniques that arrange sites along axes on the basis of data on species composition" (ter Braak 1987)
"The term 'ordination' derives from early attempts to order a group of objects, for example in time or along an environmental gradient. Nowadays the term is used more generally and refers to an 'ordering' in any number of dimensions (preferably few) that approximates some pattern of response of the set of objects. The usual objective of ordination is to help generate hypotheses about the relationship between the species composition at a site and the underlying environmental gradients" (Digby and Kempton 1987)
"Ordination - The ordering of a set of data points with respect to one or more axes. Alternatively, the displaying of a swarm of data points in a two or three-dimensional coordinate frame so as to make the relationships among the points in many-dimensional space visible on inspection" (Pielou 1984).

Orthogonal - At right angles to, or completely uncorrelated with. Usually in ordination, axes are orthogonal to each other. Two orthogonal variables will have a correlation coefficient (and, for that matter, a covariance) equal to exactly zero. If two orthogonal variables are standardized, the sum of the products of the variables will equal zero. In many ordinations, it may appear that two axes are correlated with each other (this often crops up in DCA). However, note that it is the WEIGHTED correlations that will equal zero, so a single sample with a high weight (i.e. high abundance of all species combined) can counteract the effects of a number of samples with low weight.
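As a small numerical illustration of the statement about standardized variables, the following Python sketch standardizes two variables and checks that the sum of their products is zero. The data values and function names are invented for illustration, and the correlations here are unweighted.

```python
import math

def standardize(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def correlation(x, y):
    """Pearson correlation: the mean product of the standardized variables."""
    zx, zy = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)

# Two hypothetical sets of axis scores, constructed to be orthogonal.
axis1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
axis2 = [2.0, -1.0, -2.0, -1.0, 2.0]

# Sum of products of the standardized variables (zero for orthogonal variables).
sum_of_products = sum(a * b for a, b in zip(standardize(axis1), standardize(axis2)))
r = correlation(axis1, axis2)
```

In a weighted method such as correspondence analysis, the sample weights would enter each of these sums, which is why unweighted scores from such methods can look correlated.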

Partial Analysis - an analysis (e.g. regression, correlation, ANOVA, ordination) in which the effects of covariables are "factored out" or nullified. Examples of partial analysis include partial correlation, partial DCA, partial CCA, ANCOVA, etc. See Partial Ordination.

PC-ORD - A computer program developed by Bruce McCune which provides a wide variety of statistical tests and analyses.

PCA - The acronym for "Principal Components Analysis"

PCoA - The acronym for "Principal Coordinates Analysis"

Permutation Test - a special case of randomization test

Phytosociology - following Kent and Coker (1992), "[the process of] recognizing and defining plant communities". According to some (such as Kent and Coker), the discipline requires a Clementsian world view. However, some would argue that phytosociology is possible within a Gleasonian framework, and that it is necessary for mapping vegetation. Worldwide, the Braun-Blanquet method is the most widely practiced kind of phytosociology. In it, communities are given Latin names just like species are in the Botanical Code and the Zoological Code.

Polar Ordination - Also known as Bray-Curtis ordination. See distance-based ordination methods. One of the first ordination methods to be widely used in ecology. Two sites are chosen as endpoints for each axis (or artificial endpoints can be established), and all the other sites are ordinated relative to these endpoints, based upon their similarity to these endpoints.
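A minimal sketch of how sites might be placed between two endpoints is given below. It assumes the Bray-Curtis dissimilarity as the distance measure and the usual geometric projection x = (D^2 + dA^2 - dB^2) / (2D), where D is the distance between the endpoints and dA, dB are a site's distances to each endpoint. The site names and abundances are hypothetical.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors (0 = identical, 1 = no shared species)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / sum(x + y for x, y in zip(a, b))

def polar_axis(sites, end_a, end_b):
    """Score every site along the axis defined by the two endpoint sites."""
    d_ab = bray_curtis(sites[end_a], sites[end_b])
    scores = {}
    for name, abund in sites.items():
        d_a = bray_curtis(abund, sites[end_a])
        d_b = bray_curtis(abund, sites[end_b])
        # Project the site onto the line joining the endpoints.
        scores[name] = (d_ab ** 2 + d_a ** 2 - d_b ** 2) / (2 * d_ab)
    return scores

# Hypothetical abundances of four species in four sites.
sites = {
    "wet":   [10, 5, 0, 0],
    "moist": [4, 6, 3, 0],
    "mesic": [0, 3, 6, 2],
    "dry":   [0, 0, 4, 9],
}
scores = polar_axis(sites, "wet", "dry")
```

With these choices the first endpoint scores zero, the second endpoint scores D, and the remaining sites fall in between according to their similarity to each endpoint.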

Principal Components - The axes of a Principal Components Analysis. The first Principal Component will, ideally, represent the dominant gradient. The second Component will be orthogonal to the first, and will explain some of the residual variation. The third will be orthogonal to the first and second components, and so on.

Principal Components Analysis - see Eigenanalysis-based ordination methods and Principal Components Analysis. Principal Components Analysis (PCA) is an ordination technique which involves an eigenanalysis of the correlation matrix or the covariance matrix. PCA suffers from a serious problem for gradient analysis: the horseshoe effect. This problem is caused by unimodality in the species response curve.

Principal Coordinates Analysis (PCoA) - A distance-based ordination method in which the distances between sites in the ordination diagram are maximally correlated with the ecological distances. Almost any distance matrix can be used (see Similarity, Difference and Distance), but if the distance measure is Euclidean, PCoA = PCA.

Procrustes Analysis - See Procrustes rotation

Procrustes Rotation - Suppose you had the same objects arranged in two different coordinate systems (e.g. based on two different ordination procedures, or based on different years of data). How can you figure out how well the different ordinations correspond to each other?

The most obvious solution is to test whether the first axis of one ordination is correlated with the first axis of the second, and so on with the second axis, third axis, and higher axes. However, this can be problematic. For example, if the first axis of one technique is highly correlated with the second axis of another, and vice versa, you might still have two ordinations which are very similar, but just rotated.
Procrustes rotation solves this problem. Procrustes rotates both data sets, and expands and contracts axes, such that the distance between data points in the two data sets is minimized.
"Procrustes was the leader of a band of robbers in Greek mythology. He was in the habit of putting his victims in a bed - whether they fit in this bed or not. If they were too long for it, he performed radical surgery on their legs to improve the fit. If they were too short, he stretched their limbs so that they became the right length. One of the most important fundamentals of multivariate analysis is respect for the data. The investigator should therefore try to ensure that his rotation to a target is not literally Procrustean. When tempted by the very human desire to confirm our expectations, it may help to remember Procrustes's fate: The hero Theseus "fitted" Procrustes to his own bed as Procrustes had fitted others" - Cliff (1987).
A good discussion of Procrustes rotation of ordination diagrams is given in Digby and Kempton (1987).
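The idea can be sketched for the two-dimensional case, where the best rotation angle has a simple closed form. The Python below is a simplified illustration rather than a full Procrustes implementation: it centers both configurations, scales each to unit sum of squares, and rotates one onto the other, but does not consider reflections or a further scaling step. All names and coordinates are hypothetical.

```python
import math

def center_and_scale(points):
    """Translate to the centroid and scale to unit sum of squares."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    centered = [(x - cx, y - cy) for x, y in points]
    size = math.sqrt(sum(x * x + y * y for x, y in centered))
    return [(x / size, y / size) for x, y in centered]

def procrustes_2d(target, other):
    """Rotate `other` onto `target`; return the fitted points and the residual sum of squares."""
    a = center_and_scale(target)
    b = center_and_scale(other)
    # Closed-form least-squares rotation angle in two dimensions (no reflection).
    num = sum(ay * bx - ax * by for (ax, ay), (bx, by) in zip(a, b))
    den = sum(ax * bx + ay * by for (ax, ay), (bx, by) in zip(a, b))
    theta = math.atan2(num, den)
    c, s = math.cos(theta), math.sin(theta)
    fitted = [(c * bx - s * by, s * bx + c * by) for bx, by in b]
    rss = sum((ax - fx) ** 2 + (ay - fy) ** 2 for (ax, ay), (fx, fy) in zip(a, fitted))
    return fitted, rss

# Ordination B is ordination A rotated by 40 degrees, enlarged, and shifted.
ordination_a = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0), (0.0, 3.0)]
angle = math.radians(40)
ordination_b = [(5 + 2.5 * (x * math.cos(angle) - y * math.sin(angle)),
                 -1 + 2.5 * (x * math.sin(angle) + y * math.cos(angle)))
                for x, y in ordination_a]
_, rss_rotated = procrustes_2d(ordination_a, ordination_b)

# Ordination C has two sites in swapped positions, so no rotation fits exactly.
ordination_c = [(0.0, 0.0), (1.0, 2.0), (1.0, 0.0), (0.0, 3.0)]
_, rss_different = procrustes_2d(ordination_a, ordination_c)
```

A residual near zero means the two ordinations differ only by rotation, size, and position; a larger residual indicates a real difference in the arrangement of the sites.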

Pseudoreplication: a term popularized by Hurlbert (1984). It refers to (usually) field data in which samples are not independent. A hypothetical extreme example is a study of the diatoms in a polluted lake and an unpolluted lake. If 1000 samples are taken from each lake, we cannot consider these to be true replicates to test for a pollution effect. This is because we do not know whether the lakes are different for reasons other than pollution (actually, we do know: no two lakes can be identical!). A less extreme example of pseudoreplication is ordinary spatial dependence. Pseudoreplicated data are rampant in ecology, and the problem is to some degree unavoidable.

Q-Mode - Q-mode and R-mode refer to ordinations of sites and species, respectively. These terms are most used in the context of correspondence analysis and related methods. It turns out that R-mode and Q-mode analyses give identical results in correspondence analysis; see Digby and Kempton (1987) and ter Braak (1987).

R-Mode - see Q-Mode

RA - an acronym for "Reciprocal Averaging"

Randomization Test - See also Randomization Tests. The purpose of inferential statistics is to evaluate whether a number which summarizes something of interest is greater than (or less than) one would expect just due to chance (i.e. if H0 is true). This number can be one of the well-known parametric statistics (t, F, chi-squared, r, etc.), or nonparametric statistics (Mann-Whitney U, Spearman r, etc.), BUT

Sometimes there is no theory relating a statistic to a distribution, or
The problem is too difficult for nonparametric statistics to be developed, or
Distributional assumptions cannot be met.

Often it is possible to get around this problem by the use of randomization tests (also called Monte Carlo tests or permutation tests). These are related to, but not the same thing as, the Bootstrap and the Jackknife methods. Randomization tests are good for statistical inference, but not so good for developing confidence intervals or for model building.

The procedure for a randomization test is:

1. Devise a test statistic which is large if your hypothesized process is strong, and small if it is weak (you could do it the other way around, but let us ignore this for now).
2. Define your null hypothesis.
3. Create a new data set consisting of your data, randomly rearranged. Exactly how it is rearranged depends on your null hypothesis.
4. Calculate your test statistic for this data set, and compare it to your true value.
5. Repeat steps 3 and 4 many times (preferably several hundred).
6. If your true test statistic is greater than 95% of the random values, then you can reject the null hypothesis at p<0.05. (Be careful about whether you are performing a one-tailed vs. two-tailed test; if the latter, you will need to use a 97.5% cutoff.)
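The procedure above can be sketched in Python for one simple case: a one-tailed test of whether the mean of one group exceeds the mean of another. The data, the function names, the choice of 999 randomizations, and the convention of counting the observed value among the random ones when computing the p-value are all assumptions made for illustration.

```python
import random

def mean(values):
    return sum(values) / len(values)

def randomization_test(group_a, group_b, n_random=999, seed=1):
    """One-tailed randomization test for mean(group_a) > mean(group_b)."""
    rng = random.Random(seed)
    # Test statistic: the difference in group means (large if the effect is strong).
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_as_large = 0
    for _ in range(n_random):
        # Under the null hypothesis the group labels are exchangeable, so shuffle them.
        rng.shuffle(pooled)
        # Recompute the statistic for the rearranged data and compare with the true value.
        stat = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if stat >= observed:
            n_as_large += 1
    # Proportion of values (counting the observed one) at least as large as the observed.
    p_value = (n_as_large + 1) / (n_random + 1)
    return observed, p_value

# Hypothetical species richness in burned and unburned plots.
richness_burned = [12, 15, 14, 16, 13, 17, 15, 14]
richness_unburned = [9, 11, 10, 8, 12, 10, 9, 11]
observed, p_value = randomization_test(richness_burned, richness_unburned)
```

How the data are rearranged is the part that depends on the null hypothesis; shuffling group labels, as here, is appropriate only when the observations are exchangeable between groups.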

This method may seem somewhat magical, or even circular - how can you get any information out of randomness? It is because you are answering the question directly: "How likely is it that if the null hypothesis were true, I would observe a value this extreme just due to chance?" It is worth knowing that Fisher used randomization tests to test the value of the t-test, F-tests, etc. Many people are now promoting the use of randomization tests even when parametric and nonparametric tests exist. Statistics educators are beginning to use randomization tests as the introduction to statistics, because in many ways they are easier to grasp. See Manly (1992) for more information about randomization tests in ecology.

RDA - an acronym for Redundancy Analysis

Reciprocal Averaging - another name for Correspondence Analysis, OR one particular algorithm for obtaining the Correspondence Analysis solution.

Redundancy - The property of data with much repeated information. Ecological data are typically quite redundant. For example, a stream which has a certain fish species which likes fast currents is likely to have other fish species which like fast currents. Also, it is somewhat less likely to have species which like slow currents. A statistical tendency for certain groups of species to be either negatively or positively associated causes this redundancy, and the most common cause of redundancy is that species have particular environmental requirements. If redundancy did not exist, multivariate methods would fail.

Redundancy Analysis (RDA) - a multivariate direct gradient analysis method in which species are presumed to have linear relationships to environmental gradients (i.e. linear species response curves). Like CCA, the results of RDA can be expressed in a triplot, i.e. a plot of sample scores, species scores, and environmental arrows. Unlike CCA, the species scores in RDA are most accurately represented by arrows (that is, the direction in which that species is increasing in abundance).

Regression - A function (alternatively, a method by which this function is found) relating one or more dependent variables to one or more independent variables. CCA and multiple regression are only two of many kinds of regression. The original use of the term "regression" refers to "regression back to the mean". For example, it was observed that the sons of tall fathers tended to be shorter, on average, than their fathers. This regression effect poses some serious problems for ecological monitoring (See Palmer 1993b).

Regression Coefficient - A parameter which is estimated in most kinds of regression. In multiple regression, CCA, and RDA, there is a regression coefficient associated with each independent variable.

Rescaling - The stretching and compression of coenoclines to a standardized beta diversity. The ability to rescale allows us to convert skewed species response curves to symmetrical ones. It can also allow us to change leptokurtic (tightly peaked) or platykurtic (flat-topped) curves to normal ones. This is why we sometimes don't need to care too much if the Gaussian model doesn't work perfectly. A form of rescaling is performed in Detrended Correspondence Analysis (optionally).

Residual - the observed value minus the expected, predicted, or modeled value. In least-squares regression methods, a line is fit to data such that the sum of the squares of the residuals is minimized.
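A short Python sketch of residuals from a least-squares line follows. The soil pH and richness values are invented, and the function names are hypothetical; the point is only that the fitted line gives a smaller sum of squared residuals than another line.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def residuals(x, y, intercept, slope):
    """Observed minus predicted value for each observation."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def sum_of_squares(values):
    return sum(v * v for v in values)

soil_ph = [4.5, 5.0, 5.5, 6.0, 6.5, 7.0]
richness = [8.0, 11.0, 10.0, 14.0, 15.0, 17.0]

a, b = fit_line(soil_ph, richness)
res = residuals(soil_ph, richness, a, b)
rss_fit = sum_of_squares(res)
# A line with a different slope gives a larger sum of squared residuals.
rss_other = sum_of_squares(residuals(soil_ph, richness, a, b + 0.5))
```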

Sample Score - same as site score and stand score. A coordinate along an ordination axis specifying the location of a sample. It is the goal of an ecologist to determine whether sample scores are related to environmental gradients. Ideally, sample scores represent the position of communities along the coenocline.

Segments - In DCA, axes are divided into segments prior to detrending. It is thought that the choice of number of segments will have a large impact on the results of DCA. Also, there has been a minor bug reported in the detrending algorithm of DECORANA and CANOCO. See the ordination web page for links related to this bug.

Semivariogram - see variogram

Similarity Index - A measure of the similarity of species composition between two samples. Examples include the Sørensen coefficient and the Jaccard coefficient. Most similarity indices have values of zero for samples that share absolutely no species, and 1 or 100% for samples which have identical species composition.
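For presence/absence data, the two coefficients named above can be computed as in the following Python sketch. It uses the standard presence/absence forms (Jaccard: shared species divided by all species in either sample; Sørensen: twice the shared species divided by the sum of the two species counts). The plot contents are hypothetical.

```python
def jaccard(sample_a, sample_b):
    """Shared species divided by the total number of species in either sample."""
    a, b = set(sample_a), set(sample_b)
    return len(a & b) / len(a | b)

def sorensen(sample_a, sample_b):
    """Twice the shared species divided by the sum of the two species counts."""
    a, b = set(sample_a), set(sample_b)
    return 2 * len(a & b) / (len(a) + len(b))

plot_1 = {"Quercus alba", "Carya ovata", "Acer rubrum", "Cornus florida"}
plot_2 = {"Quercus alba", "Acer rubrum", "Pinus taeda"}
plot_3 = {"Salix nigra", "Populus deltoides"}

j_12 = jaccard(plot_1, plot_2)    # 2 shared out of 5 species in total
s_12 = sorensen(plot_1, plot_2)   # 2 * 2 shared out of 4 + 3 species
j_13 = jaccard(plot_1, plot_3)    # no shared species
```

Both indices give zero for samples with no species in common and one for samples with identical species lists.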

Similarity Matrix - a square and (usually) symmetric matrix in which the entries are similarities between samples. Similarity matrices are easily produced from, or converted into, distance matrices. The diagonal entries are usually 1 or 100%, meaning a sample is usually 100% similar to itself.

Singular Matrix - a square matrix which cannot be inverted. In multivariate methods, a singular matrix can occur if one variable is precisely a linear combination of the other variables. This may occur if data are expressed on a percentage basis, or if there is a categorical variable expressed as a series of dummy variables. See environmental variables in CCA. Most multivariate methods are not able to cope with singular matrices; this is the matrix equivalent of dividing by zero. CANOCO is able to recognize and make corrections for singular matrices, but many other software packages are not.
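The percentage case can be illustrated with a small Python sketch. It builds the matrix of sums of squares and cross-products for three hypothetical soil texture variables that sum to 100% in every sample, and shows that its determinant is exactly zero, whereas dropping one variable removes the problem. Exact fractions are used so that rounding does not obscure the result; the data and function names are invented.

```python
from fractions import Fraction

def cross_product_matrix(rows):
    """Sums of squares and cross-products of the column-centered data (exact arithmetic)."""
    n = len(rows)
    n_vars = len(rows[0])
    means = [Fraction(sum(r[j] for r in rows), n) for j in range(n_vars)]
    centered = [[r[j] - means[j] for j in range(n_vars)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) for j in range(n_vars)]
            for i in range(n_vars)]

def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

# Hypothetical percent sand, silt, and clay; each row sums to 100.
texture = [(60, 30, 10), (20, 50, 30), (45, 35, 20), (10, 30, 60), (70, 20, 10)]
det_all_three = det3(cross_product_matrix(texture))

# Dropping clay leaves two variables that are not exact functions of each other.
sand_silt = [(sand, silt) for sand, silt, _ in texture]
det_two = det2(cross_product_matrix(sand_silt))
```

A zero determinant means the matrix cannot be inverted; leaving out one of the percentage variables (or one dummy variable per categorical variable) is a common remedy.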

Singular Value Decomposition - a way of manipulating matrices which is similar to, and ultimately equivalent to, eigenanalysis. This is the approach illustrated in Digby and Kempton (1987).

Site Score - same as sample score.

Spatial Autocorrelation - a synonym for spatial dependence.

Spatial Dependence - the value of a variable at a given point depends on the value of that variable at other points. Spatial dependence violates the basic assumption of most statistics that observations are independent. It thus can lead to pseudoreplication. The most common form of spatial dependence is distance decay. Although problematic in some senses, spatial dependence is indispensable for geostatistics and spatial interpolation techniques such as kriging.

Species Response Curve - a graphical portrayal of the abundance or performance of a species as a function of an environmental gradient. Some ordination methods assume that species response curves are linear, others assume they are unimodal. See, for example, ter Braak and Prentice (1988). A species response surface relates the species abundance to two or more gradients simultaneously.

Species Score - A coordinate along an ordination axis specifying the location of a species. In weighted-averaging ordination methods such as CA, CCA, and DCA, the species score represents the centroid of the species, or the mode of the unimodal species response curve. Species scores help one interpret ordination axes in indirect gradient analysis.

Stand Score - a synonym of sample score.

Standardization - a way of scaling variables so that different variables, measured in different units, can be compared. See also the end of Basic Statistical Concepts.

The most common forms of standardization include ranking, logarithmic transformations, placing on a 0-1 scale (according to the formula [x-min]/[max-min]; this is used in Fuzzy Set Ordination), and subtracting the mean and dividing by the standard deviation. The last two kinds of standardization produce variables which are perfectly correlated (r=1) with the raw data. The last kind is by far the most common, and unless otherwise stated, is what should be assumed when you hear "standardized variables".
Standardization is a form of transformation, but not all transformations are standardizations. For example, a square-root transformation retains units (e.g. if the raw data are in km, the square root transform will result in units of the square root of km). However, true standardization results in dimensionless numbers. See transformation. Also see Schneider (1994) for a more detailed (though difficult) discussion about units.
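The last two kinds of standardization described above can be sketched in Python as follows. The elevation values are hypothetical, and the sample (n - 1) standard deviation is an assumption; the check at the end confirms that both standardized versions remain perfectly correlated with the raw data.

```python
import math

def range_scale(values):
    """Place values on a 0-1 scale: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Subtract the mean and divide by the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def pearson_r(x, y):
    """Pearson correlation coefficient between two variables."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

elevation_m = [120.0, 340.0, 560.0, 410.0, 980.0]
scaled = range_scale(elevation_m)
standardized = z_score(elevation_m)

r_scaled = pearson_r(elevation_m, scaled)
r_standardized = pearson_r(elevation_m, standardized)
```

Both results are dimensionless: the metres cancel in the division, which is what distinguishes these from a transformation such as the square root.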

Stepwise Analysis - A multiple regression method (including RDA and CCA, which are special cases of multiple regression) in which explanatory (independent) variables are selected on the basis of whether they explain a "significant" amount of variation in your dependent variable(s). There are several flavors of stepwise analysis:

Forward selection (implemented in CANOCO), in which variables are entered one at a time, until no more variables explain significant variation
Backwards selection, in which nonsignificant variables are dropped one at a time.
Combined Analysis, which contains elements of both forward and backward selection.

There are serious problems to the use of stepwise analysis coupled with inferential statistics (see Hypothesis Driven and Exploratory Data Analysis).

STRESS - 1) a measure of the optimality of an ordination solution (i.e. the relationship between the similarity in species composition and the closeness in ordination space), used as part of the algorithm of NMDS. 2) What one often feels while performing multivariate analyses

t-Value Biplot - An infrequently used biplot in RDA and CCA of species scores and environmental variable scores in low-dimensional (conventionally, 2-dimensional) space, in which it is possible to infer the strength of the relationship between species and the environment.

Tongue Effect - a possible statistical artifact in DCA, in which one end of the first axis is artificially compressed along the second axis. The importance of this effect is disputed, since many believe that most real data sets should exhibit "true" compression along a secondary axis. That is, the most important secondary gradient is different at opposite ends of the first gradient.

Trace - the sum of the diagonal elements of a square, symmetric matrix. In a correlation matrix, since the diagonals must equal one, the trace equals the number of variables. The sum of the eigenvalues of a matrix will usually equal the trace of the matrix. In the context of ordination, inertia is a synonym for trace. In CCA, the trace will be related to the amount of variation explained by all ordination axes. CANOCO performs a randomization test on the trace statistic, to test whether the measured variables significantly explain species composition.
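For a two-variable correlation matrix, the relationship between trace and eigenvalues can be checked by hand or with the short Python sketch below. The correlation of 0.6 is arbitrary, and the closed-form eigenvalue formula used here applies only to symmetric 2 x 2 matrices.

```python
import math

def trace(matrix):
    """Sum of the diagonal elements of a square matrix."""
    return sum(matrix[i][i] for i in range(len(matrix)))

def eigenvalues_2x2_symmetric(matrix):
    """Eigenvalues of a symmetric 2 x 2 matrix [[a, b], [b, d]], largest first."""
    a, b, d = matrix[0][0], matrix[0][1], matrix[1][1]
    mid = (a + d) / 2
    half_gap = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    return mid + half_gap, mid - half_gap

# Correlation matrix for two variables with r = 0.6.
corr = [[1.0, 0.6], [0.6, 1.0]]
eig1, eig2 = eigenvalues_2x2_symmetric(corr)
total = eig1 + eig2
```

Here the eigenvalues are 1 + r and 1 - r, so their sum equals the trace, which in turn equals the number of variables.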

Transformation - A mathematical operation performed on a variable (e.g. species abundances or environmental variables), usually with the goal of making that variable more useful in a subsequent analysis. Transformations are performed for a number of purposes, including:

To make the variable conform to a statistical distribution (usually the normal distribution) for hypothesis tests that require such a distribution
To dampen the effects of outliers
To put different variables on a more common footing (see standardization)
To make a measured variable more biologically meaningful
For more discussion about transformations, see Species Abundances in Ordination, On the transformation of species abundances, and Environmental variables in CCA.

Triplot - In CCA and RDA, we have three sets of scores (species scores, sample scores, and environmental variable scores). If we decide to plot all three simultaneously, we have a triplot. Sample scores and species scores are usually indicated by symbols or labelled points. Continuous environmental variables are indicated by arrows, while categorical variables are indicated by points on their centroids.

TWINSPAN - The Acronym, Algorithm, AND Computer Program for Two Way Indicator Species Analysis - A classification method derived from Correspondence Analysis.

Unimodal Distribution - A distribution with one mode. In the case of species response curves, a unimodal distribution means the species has one optimal environmental condition. If any aspect of the environment is greater or lesser than this optimum, the species will perform more poorly (i.e. it will have a lesser abundance). Some ordination techniques (such as DCA and CCA) perform best when species have unimodal distributions, others (such as PCA and RDA) perform better when species have monotonic distributions along gradients (i.e. the species either increase or decrease, but not both, as a function of environmental factors).
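The contrast between unimodal and monotonic responses can be sketched in Python. The Gaussian curve is used here only as one convenient unimodal shape; the optimum, tolerance, maximum, and the linear coefficients are hypothetical values chosen for illustration.

```python
import math

def gaussian_response(x, optimum, tolerance, maximum):
    """Unimodal (Gaussian) abundance at gradient position x."""
    return maximum * math.exp(-((x - optimum) ** 2) / (2 * tolerance ** 2))

def count_modes(values):
    """Number of interior local maxima in a sequence of values."""
    return sum(1 for i in range(1, len(values) - 1)
               if values[i - 1] < values[i] > values[i + 1])

# Positions along a hypothetical gradient from 0 to 10.
gradient = [i * 0.5 for i in range(21)]
unimodal = [gaussian_response(x, optimum=5.0, tolerance=1.5, maximum=40.0) for x in gradient]
monotonic = [2.0 + 3.0 * x for x in gradient]

peak_position = gradient[unimodal.index(max(unimodal))]
```

The unimodal curve rises to a single peak at the optimum and falls on either side; the monotonic curve has no interior peak at all.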

Variogram - Also known as semivariogram. A plot of variance as a function of distance of separation. The variogram will tell you expected variance at large scales (i.e. the SILL), the amount of variance at infinitesimally small spatial scales (the NUGGET) and the spatial scale at which samples can be considered independent (the RANGE). The variogram is used in the field of geostatistics.
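An empirical variogram for evenly spaced samples along a transect can be sketched as below. This assumes the usual semivariance estimator (half the mean squared difference between observations a given lag apart), which is not spelled out in the definition above; the transect values and function names are hypothetical.

```python
def semivariance(values, lag):
    """Half the mean squared difference between observations `lag` steps apart."""
    pairs = [(values[i], values[i + lag]) for i in range(len(values) - lag)]
    return sum((a - b) ** 2 for a, b in pairs) / (2 * len(pairs))

def empirical_variogram(values, max_lag):
    """Semivariance for every lag from 1 to max_lag."""
    return {lag: semivariance(values, lag) for lag in range(1, max_lag + 1)}

# Hypothetical measurements at equal intervals along a transect.
transect = [3.0, 3.4, 3.9, 4.1, 4.0, 3.2, 2.8, 2.5, 2.9, 3.6, 4.2, 4.4]
variogram = empirical_variogram(transect, max_lag=4)
```

Plotting the semivariance against lag gives the variogram; with spatially dependent data such as these, nearby samples differ less than distant ones, so the curve rises with distance before levelling off.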

Variography - the art of interpreting variograms. Variography, a branch of geostatistics, tells you the spatial scales at which your variable(s) varies, whether you have nested scales of variation, and whether you have unresolved "nugget" variation at fine spatial scales. Variograms were developed as part of geostatistics. See Rossi et al. (1992) for an introduction to the use of geostatistics in ecology.

Vector - a matrix which consists of either one row (i.e. a row vector), or one column (a column vector). "A vector is a set of values, commonly denoting a point in a multidimensional space.." - ter Braak (1987)

WCanoImp - The Windows95 version of CanoImp, for inputting data from a spreadsheet.

Weighted average - an average, except that different observations are given differing importances or "weights". In ordination, the weights are typically the abundances (perhaps transformed) of species. The weighted average environment of a species (e.g. the weighted average soil pH) can be used as an index of that species' environmental preference. Weighted averages are intrinsic to Correspondence Analysis and related methods.
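A weighted average soil pH for a single species might be computed as in this minimal Python sketch. The pH values and abundances are invented for illustration.

```python
def weighted_average(values, weights):
    """Average of `values` in which each value counts in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Soil pH of four plots, and the abundance of one hypothetical species in each.
soil_ph = [4.5, 5.0, 6.0, 7.0]
abundance = [0, 10, 30, 10]

species_ph = weighted_average(soil_ph, abundance)
plain_mean_ph = sum(soil_ph) / len(soil_ph)
```

The weighted average (6.0) sits closer to the plot where the species is most abundant than the unweighted mean of the plots (5.625) does, which is why it serves as an index of the species' preference.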

X -
Y -
Z - The symbol used to represent a "regionalized variable", which is a variable that varies spatially. Regionalized variables are used in geostatistics.

Literature Cited

Biondini, M.E., P.W. Mielke, and K.J. Berry. 1988. Data dependent permutation for the analysis of ecological data. Vegetatio 75:161-168.

Burrough, P.A. 1987. Spatial aspects of ecological data. Pp. 213-251 in Jongman, R.H., C.J.F. ter Braak and O.F.R. van Tongeren, editors. Data Analysis in Community Ecology. Pudoc, Wageningen, The Netherlands.

Camiz, S. 1991. Reflections on spaces and relationships in ecological data analysis: effects, problems, possible solution. Coenoses 6: 3-14.

Cliff, N. 1987. Analyzing Multivariate Data. Harcourt Brace Jovanovich, San Diego

Digby, P. G. N., and R. A. Kempton. 1987. Population and Community Biology Series: Multivariate Analysis of Ecological Communities. Chapman and Hall, London.

Dixon, P. M. 1993. The bootstrap and the jackknife: describing the precision of ecological indices. Pages 290-318 in S. M. Scheiner and J. Gurevitch, editors. Design and Analysis of Ecological Experiments. Chapman and Hall, New York.

Draper, N.R. & Smith, H. 1981. Applied Regression Analysis. 2nd ed. Wiley, NY.

Gauch, H. G., Jr. 1982. Multivariate Analysis in Community Structure. Cambridge University Press, Cambridge.

Gittins, R. 1981. Towards the analysis of vegetation succession. Vegetatio 46:37-59.

Goodall, D.W. 1954. Objective methods for the classification of vegetation. III. An essay on the use of factor analysis. Australian Journal of Botany 1:39-63.

Hill, M.O. and H.G. Gauch, Jr. 1980. Detrended Correspondence Analysis: an improved ordination technique. Vegetatio 42:47-58.

Hurlbert, S.H. 1984. Pseudoreplication and the design of ecological field experiments. Ecological Monographs 54:187-211.

Jackson, D. A., and K. M. Somers. 1991. Putting things in order: the ups and downs of detrended correspondence analysis. Am. Nat. 137:704-12.

Jongman, R. H. G., C. J. F. ter Braak, and O. F. R. van Tongeren, editors. 1987. Data Analysis in Community and Landscape Ecology. Pudoc, Wageningen, The Netherlands.

Kent, M., and P. Coker. 1992. Vegetation description and analysis: a practical approach. Belhaven Press, London.

Knox, R. G. and R. K. Peet. 1989. Bootstrapped ordination: a method for estimating sampling effects in indirect gradient analysis. Vegetatio 80: 153-165.

Legendre, P. and M.-J. Fortin. 1989. Spatial pattern and ecological analysis. Vegetatio 80:107-138.

Legendre, P. and L. Legendre. 1998. Numerical Ecology. 2nd English edition. Elsevier, Amsterdam. 853 pages.

Manly, B.F.J. 1992. Randomization and Monte Carlo methods in biology. Chapman and Hall, New York. 281 pp.

Manly, B. F. J. 1993. A review of computer intensive multivariate methods in ecology. Pages 307 in G. P. Patil and C. R. Rao, editors. Multivariate Environmental Statistics. Elsevier, .

Morrison, D.F. 1967. Multivariate Statistical Methods. McGraw-Hill, New York. 415pp.

Økland, R. H. 1990. Vegetation ecology: theory, methods and applications with reference to Fennoscandia. Sommerfeltia Supplement 1:1-233.

Palmer, M. W. 1993a. Putting things in even better order: the advantages of canonical correspondence analysis. Ecology 74:2215-30.

Palmer, M. W. 1993b. Potential biases in site and species selection for ecological monitoring. Environmental Monitoring and Assessment 26:277-282.

Peet, R. K., R. G. Knox, J. S. Case, and R. B. Allen. 1988. Putting things in order: the advantages of detrended correspondence analysis. Am. Nat. 131:924-34.

Pielou, E. C. 1984. The Interpretation of Ecological Data: A Primer on Classification and Ordination. Wiley, New York.

Potvin, C. and D. A. Roff. 1993. Distribution-free and robust statistical methods: viable alternatives to parametric statistics? Ecology 74: 1617-1628.

Roberts, D. 1986. Ordination on the basis of fuzzy set theory. Vegetatio 66:123-131

Rossi, RE, DJ Mulla, AG Journel, and EH Franz. 1992. Geostatistical tools for modeling and interpreting ecological spatial dependence. Ecol. Monogr. 62(2):277-314.

Schneider, D. C. 1994. Quantitative Ecology: Spatial and Temporal Scaling. Academic Press, New York.

ter Braak, C. J. F. 1987. Ordination. P. 91-173 in Jongman, R.H., C.J.F. ter Braak and O.F.R. van Tongeren, editors. Data Analysis in Community Ecology. Pudoc, Wageningen, The Netherlands.

ter Braak, C. J. F., and I. C. Prentice. 1988. A theory of gradient analysis. Adv. Ecol. Res. 18:271-313.

ter Braak, C. J. F., and P. Šmilauer. 1998. CANOCO Reference Manual and User's Guide to Canoco for Windows: Software for Canonical Community Ordination (version 4). Microcomputer Power (Ithaca, NY USA) 352 pp.

Ver Hoef, J.M. and D.C. Glenn-Lewin. 1989. Multiscale ordination: a method for detecting pattern at several scales. Vegetatio 82: 59-67.

Wartenberg, D., S. Ferson, and F. J. Rohlf. 1987. Putting things in order: a critique of detrended correspondence analysis. Am. Nat. 129:434-48.

This page was created and is maintained by Michael Palmer.