Are there other terms I should add to the list? Can you provide any additional or better definitions? If so, please let me know at: firstname.lastname@example.org
Abundance - any measure of the amount of an organism. Can include density, biomass, frequency, cover, presence/absence, etc. See species abundances in ordination.
Arch effect - a distortion or artifact in an ordination diagram, in which the second axis is an arched function of the first axis. It is caused by the unimodal distribution of species along gradients. The arch appears in Correspondence Analysis and other ordination techniques. One of the main purposes of Detrended Correspondence Analysis is to remove the arch effect. Principal Components Analysis creates a more serious artifact called the horseshoe effect.
Axis - I haven't the foggiest idea how to define this. Any suggestions? Axes are the basic structure of the Cartesian coordinate system, and are usually portrayed as being at right angles (i.e. orthogonal) to each other, though non-Euclidean coordinate systems also exist. With respect to the use of more exotic coordinate systems in ordination, see Camiz (1991)
Beta Diversity - Also called species turnover or differentiation diversity. Beta diversity is a measure of how different samples are from each other, and/or how far apart they are on gradients of species composition. Alternatively, it is a measure of the "length" of an ecological gradient or ordination axis, in terms of species composition. Total beta diversity can be compared among gradients, but not per unit (e.g. one cannot compare whether the rate of change is higher along a pH gradient than along a moisture gradient, but the total change along the gradients can be assessed). An axis or gradient with high beta diversity will have completely different species compositions (i.e. share no species) at opposite ends (indeed, the ends might be completely different from the middle). An axis or gradient with low beta diversity will be similar in species composition at both ends. Some ordination techniques (e.g. PCA) behave best at low beta diversity, and others (e.g. DCA, CCA) behave best at high beta diversity. Beta diversity is one of the most misunderstood concepts in community ecology, and it has been defined in numerous ways in the past. The rescaling algorithm of DCA provides a measure of beta diversity. For more discussion, see Explorations in Coenospace.
BACI Design - "Before-After-Control-Impact" design. Refers to a study in which quadrats (or other samples) are studied through time, and some of the quadrats are subjected to an experimental treatment or treatments. Ideally, the statistics used for a BACI design will be able to distinguish the effects of treatment and time.
Biplot - an ordination diagram which simultaneously plots species scores and sample scores. It is only relevant for ordination techniques such as CCA, DCA, CA, PCA, and RDA. Distance-based ordination techniques do not result in the simultaneous ordination of both species and samples.
Biplot arrow - a representation of a variable (usually an environmental variable) on a biplot. The arrow points in the direction of maximum correlation, and the length of the arrow is related to the strength of the correlation. In general, the longer the arrow, the more highly related that variable is to species composition.
Bootstrap - A reasonably new computer-intensive method to obtain confidence intervals, to estimate parameters, or in some cases to test hypotheses. The bootstrap is considered a "resampling method", and is allied to the Jackknife and to randomization tests. Introductions to bootstrapping for ecologists are given in Manly (1993) and Potvin and Roff (1993). Knox and Peet (1989) apply bootstrapping to DCA.
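As an illustration of the resampling idea, here is a minimal sketch of a percentile bootstrap confidence interval for a mean. The function name `bootstrap_ci`, the richness data, and the choice of the percentile method are assumptions made for this example; other bootstrap variants exist.

```python
import random
import statistics

def bootstrap_ci(values, stat=statistics.mean, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic (illustrative)."""
    rng = random.Random(seed)
    n = len(values)
    # Resample the data with replacement and recompute the statistic each time.
    boot_stats = sorted(
        stat([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lower = boot_stats[int((alpha / 2) * n_boot)]
    upper = boot_stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical species richness counts from ten quadrats.
richness = [12, 15, 9, 20, 14, 11, 17, 13, 16, 10]
low, high = bootstrap_ci(richness)
print(f"mean = {statistics.mean(richness):.1f}, approx. 95% CI = ({low:.1f}, {high:.1f})")
```

The fixed seed only makes the sketch reproducible; in practice the number of resamples is a judgment call.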
Bray-Curtis Ordination - A synonym of polar ordination
CA - The acronym for Correspondence Analysis
CANOCO - A computer program, modified from DECORANA by C.J.F. ter Braak, which performs a wide variety of ordinations such as Canonical Correspondence Analysis, Correspondence Analysis, Detrended Correspondence Analysis, etc.
CanoImp - A computer program written by Petr Šmilauer that converts spreadsheet blocks, copied into the Windows clipboard, into proper format for input into CANOCO. Canoimp is the console version, and WCanoImp is the Windows95 version. These functions are now seamlessly integrated into CANOCO.
Canonical Analysis - A term which appears in the literature a number of times, often with different meanings. It has been used as a synonym for canonical correlation and Canonical Correspondence Analysis. It is probably best reserved as a generic term referring to any method which links one set of variables to at least one other set of variables, and would thus include canonical correspondence analysis, canonical correlation, redundancy analysis, etc.
Canonical Correlation - Often confused with Canonical Correspondence Analysis. It is a technique which finds the linear combination of one set of variables which is maximally correlated with a linear combination of another set of variables. Canonical correlation is closely related to PCA. See Gittens (1987) for an ecological application.
Canonical Correspondence Analysis - A widely used method for direct gradient analysis, best developed by C.J.F. ter Braak (see Jongman et al. 1987, ter Braak and Prentice 1988, and many of the links on the ordination web page). CCA assumes that species have unimodal distributions along environmental gradients.
CANOPOST - A Windows program that takes the results of CANODRAW and produces publication-quality output.
Categorical Variable - A variable that is represented by several different types; for example: lake/river/stream, farm/pasture/unmanaged, pitfall trap/fence trap/direct sighting. For most multivariate analyses, categorical variables must be converted to k-1 dummy variables (where k = the number of categories). See Environmental variables in CCA
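The k-1 dummy coding can be sketched as follows. The helper `dummy_code` and the choice of the alphabetically first category as the reference level are assumptions of this example, not a convention of any particular ordination program.

```python
def dummy_code(values):
    """Convert a categorical variable into k-1 dummy (0/1) variables."""
    categories = sorted(set(values))
    # The first category is the reference level: it is coded as all zeros.
    reference, coded = categories[0], categories[1:]
    rows = [[1 if v == c else 0 for c in coded] for v in values]
    return reference, coded, rows

# Hypothetical habitat type recorded at five sites.
habitat = ["lake", "river", "stream", "river", "lake"]
reference, columns, table = dummy_code(habitat)
print("reference category:", reference)
print("dummy variables:", columns)
for site, row in zip(habitat, table):
    print(site, row)
```

With three categories, two dummy variables suffice, because membership in the reference category is implied when both are zero.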
CCA - The Acronym for Canonical Correspondence Analysis
Centroid - the (weighted) mean of a multivariate data set. Can be represented by a vector. For many ordination techniques, the centroid is a vector of zeros (that is, the scores are centered and standardized). In a direct gradient analysis, a categorical variable is often best represented by a centroid in the ordination diagram. See Centroids and Inertia.
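A weighted centroid is simply a weighted mean taken separately on each axis. The sketch below uses made-up sample scores and weights; `weighted_centroid` is a hypothetical helper name.

```python
def weighted_centroid(points, weights):
    """Weighted mean of a set of points, computed one dimension at a time."""
    total = sum(weights)
    n_dims = len(points[0])
    return [
        sum(w * p[d] for p, w in zip(points, weights)) / total
        for d in range(n_dims)
    ]

# Hypothetical sample scores on two ordination axes, with sample weights.
scores = [(1.0, 2.0), (3.0, 0.0), (-1.0, 4.0)]
weights = [1.0, 2.0, 1.0]
centroid = weighted_centroid(scores, weights)
print(centroid)
```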
Classification - The act of putting things in groups. Most commonly in community ecology, the "things" are samples or communities. Classification can be completely subjective, or it can be objective and computer-assisted (even if arbitrary). Hierarchical classification means that the groups are nested within other groups. There are two general kinds of hierarchical classification: divisive and agglomerative. A divisive method starts with the entire set of samples, and progressively divides it into smaller and smaller groups. An agglomerative method starts with small groups of few samples, and progressively groups them into larger and larger clusters, until the entire data set is contained in a single cluster. Pielou (1984) gives a good introduction to various classification methods.
Clustering - sometimes simply a synonym of classification, but more usually referring to agglomerative classification.
Coenocline - a simultaneous portrayal of all species response curves along an environmental gradient (presumably, an important one). This is probably the most common category of graphs in all of community ecology. Ecological continuum is a synonym. See Explorations in Coenospace.
Coenoplane - a simultaneous portrayal of all species response surfaces along two (presumably important) environmental gradients.
Coenospace - a simultaneous portrayal of all species response surfaces along an unspecified number of gradients. It is difficult for mere mortals to visualize more than three such gradients simultaneously. Fortunately, there are rarely more than three important axes or dimensions in most ecological data sets. The concept of coenospace is closely allied to Hutchinson's multidimensional niche.
Correlation - A method which determines the strength of the relationship between variables, and/or a means to test whether the relationship is stronger than expected due to the null hypothesis. Usually, we are interested in the relationship between two variables, x and y. The correlation coefficient r is one measure of the strength of the relationship.
Correlation Coefficient - usually abbreviated r. A number which reflects the strength of the relationship between two variables. It varies from -1 (for a perfect negative relationship) to +1 (for a perfect positive relationship). If variables are standardized to have zero mean and unit standard deviation, then r will also be the slope of the relationship. The value $r^2$ is known as the coefficient of determination; it varies between 0 and 1. The coefficient of determination is loosely interpreted as "the proportion of variance in y which can be explained by x".
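The claim that r equals the regression slope for standardized variables can be checked numerically. This is a minimal sketch with invented data; the function names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def standardize(v):
    """Rescale a variable to zero mean and unit (sample) standard deviation."""
    n = len(v)
    m = sum(v) / n
    sd = math.sqrt(sum((a - m) ** 2 for a in v) / (n - 1))
    return [(a - m) / sd for a in v]

def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

# Hypothetical soil pH and percent cover of one species at six sites.
ph = [4.5, 5.0, 5.5, 6.0, 6.5, 7.0]
cover = [10.0, 14.0, 13.0, 20.0, 22.0, 27.0]

r = pearson_r(ph, cover)
slope = ols_slope(standardize(ph), standardize(cover))
print(f"r = {r:.4f}, slope of standardized variables = {slope:.4f}, r^2 = {r * r:.4f}")
```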
Correlation Matrix - a square, symmetric matrix consisting of nothing but correlation coefficients. The rows and the columns represent the variables. The diagonal elements are all equal to 1, for the simple reason that the correlation coefficient of a variable with itself equals 1. The correlation matrices given in CANOCO usually differ slightly from those calculated in basic statistical packages. This is because CANOCO uses weighted correlations (i.e. samples with a higher summed abundance of all species will have more influence in the calculation).
Covariable - refers to a variable (in the context of DGA, an environmental variable) which for some reason the investigator wishes to "factor out". This is usually either a nuisance variable, or an important variable which is not of immediate interest. A covariable can be used to specify a block effect or site effect (in which case it is usually a dummy variable), if treatments are of most interest. See Partial Analysis or Partial Ordination
Covariance Matrix - a square, symmetric matrix in which the rows and columns are variables, and the entries are covariances. The diagonal elements (i.e. the covariance between a variable and itself) will equal the variances.
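The following sketch builds a small sample covariance matrix and confirms that its diagonal holds the variances. The data and the helper `covariance_matrix` are assumptions for illustration, and the n-1 denominator (sample covariance) is one common convention.

```python
import statistics

def covariance_matrix(variables):
    """Sample covariance matrix; each inner list holds one variable's values."""
    n = len(variables[0])
    means = [sum(v) / n for v in variables]

    def cov(i, j):
        return sum(
            (a - means[i]) * (b - means[j])
            for a, b in zip(variables[i], variables[j])
        ) / (n - 1)

    k = len(variables)
    return [[cov(i, j) for j in range(k)] for i in range(k)]

# Hypothetical measurements of three environmental variables at five sites.
ph = [4.5, 5.2, 6.1, 6.8, 7.4]
moisture = [80.0, 65.0, 50.0, 42.0, 30.0]
elevation = [120.0, 340.0, 200.0, 410.0, 150.0]

cov_matrix = covariance_matrix([ph, moisture, elevation])
for row in cov_matrix:
    print([round(value, 3) for value in row])
print("variance of pH:", round(statistics.variance(ph), 3))
```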
DCA - The acronym for Detrended Correspondence Analysis
Detrended Correspondence Analysis (DCA) - an eigenanalysis-based ordination technique derived from correspondence analysis (Hill and Gauch 1980). DCA performs detrending to counteract the arch effect, a defect of correspondence analysis. DCA also (optionally) performs rescaling of ordination axes, so that the spacing of sample (and species) scores along the axes is scaled in units of beta diversity. See Detrended Correspondence Analysis.
Detrending - A method employed in DCA and DCCA to remove the arch effect. Axes are divided into segments, and the sample scores of higher axes are reassigned to be centered around the centroid. See Detrended Correspondence Analysis for a brief description. More thorough descriptions are given in Gauch 1982, Pielou 1984 and Kent and Coker 1992.
Dimension - This is a difficult term to define precisely in a comprehensible way. However, it is possible to grasp at a more intuitive level. It is the number of axes in a Cartesian coordinate system or the number of variables (unless some variables are linear combinations of other variables). Even though there are often a large number of dimensions, there are usually only a small number of important dimensions. A related concept to dimension is the rank of a matrix. The rank is "the number of dimensions of a space in which the data points lie" (Pielou 1984)
Direct Gradient Analysis - Any gradient analysis in which the important gradients are known and measured. Direct gradient analysis is commonly performed using nonlinear regression, or using a technique such as Canonical Correspondence Analysis. In contrast, see indirect gradient analysis.
Discriminant Analysis - A technique related to ordination, which is used in many fields other than ecology. Digby and Kempton (1987) provide a good discussion. Discriminant Analysis tells us whether a particular set of variables is useful in discriminating previously delineated groups. Canonical Variates Analysis (CVA) is a form of discriminant analysis which is actually a special case of Canonical Correspondence Analysis in which the classes are coded as dummy variables.
Dissimilarity Matrix - see distance matrix.
Distance Decay - the property by which two nearby points have more similar characteristics than two distant points. Distance decay violates the basic statistical assumption that samples are independent, and is therefore a special case of pseudoreplication. Distance decay can be quantified using geostatistics.
Distance Matrix - A square and (usually) symmetric matrix in which the rows and the columns represent (usually) samples. The entries represent some index of the difference between samples; the measure could be Euclidean distance, Manhattan (City Block) Distance, Bray-Curtis dissimilarity, the Jaccard coefficient, or any of a huge number of possibilities. The diagonal elements (the difference between a sample and itself) are usually zero. Distance matrices are necessary prerequisites for distance-based ordination methods such as Polar Ordination and Nonmetric Multidimensional Scaling. Distance matrices are closely related to (and easily converted to) similarity matrices.
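As a small illustration, the sketch below builds a Bray-Curtis dissimilarity matrix from invented abundance data and converts it to a similarity matrix as 1 minus the dissimilarity. The function names are hypothetical, and the conversion shown is only one of several in use.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples of species abundances."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return numerator / denominator

def distance_matrix(samples, metric):
    """Square matrix of pairwise distances between samples."""
    return [[metric(a, b) for b in samples] for a in samples]

# Hypothetical abundances of three species (columns) in three samples (rows).
samples = [
    [10, 0, 5],
    [8, 2, 5],
    [0, 12, 1],
]

distances = distance_matrix(samples, bray_curtis)
# For a dissimilarity bounded between 0 and 1, similarity can be taken as 1 - distance.
similarities = [[1 - d for d in row] for row in distances]

for row in distances:
    print([round(d, 3) for d in row])
```

The diagonal is zero and the matrix is symmetric, as described in the entry.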
Downweighting - An option in many ordination programs to dampen the effects of rare species. Downweighting gives weights to species which are related to their abundances. Correspondence Analysis and its derivatives are sensitive to rare species which occur in species-poor areas (see, e.g. ter Braak 1987); downweighting reduces but does not eliminate this problem.
Eigenvector - a central concept in linear algebra. Sample scores are often eigenvectors. See eigenanalysis.
Environmental Gradient - a spatially varying aspect of the environment which is expected to be related to species composition. Environmental gradients are the x-axes for coenoclines. I do not know whether gradients which vary only through time could properly be called environmental gradients, though to some degree they can be treated as such in ordination methods. Differences in resource use within a site cannot be considered gradients. Human-imposed effects can be considered environmental gradients.
Environmental Variable - a measure of the environment which is presumably related to an environmental gradient. Environmental variables can be continuous, or they can be represented by dummy variables.
Euclidean Distance - the straight line distance between two points in a Cartesian coordinate system. The Euclidean distance can be determined using the Pythagorean Theorem. In two dimensions, the Euclidean distance is $\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$. Usually, the points represent samples and the axes of the Cartesian coordinate system represent the abundances of species. Gauch (1982) has a good description of the various kinds of data space.
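The two-dimensional formula extends to any number of species axes by summing the squared differences over all dimensions. A minimal sketch, with a hypothetical function name and invented abundances:

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two points with any number of dimensions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Two dimensions: the classic 3-4-5 right triangle.
print(euclidean_distance((0, 0), (3, 4)))

# Two hypothetical samples described by the abundances of three species.
sample_1 = [1, 2, 3]
sample_2 = [4, 6, 3]
print(euclidean_distance(sample_1, sample_2))
```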
Exploratory Analysis - a general term for an analysis in which the chief objective is to find pattern in the data. Often, exploratory analysis conflicts with hypothesis testing. For example, stepwise regression is permissible in exploratory analysis, but can cause serious problems if you are interested in testing hypotheses. See Hypothesis-Driven and Exploratory Data Analysis.
Factor Analysis - This is a term which has been variously defined. In some treatments it seems to be a synonym of ordination. Sometimes (as in some statistical software) it includes principal components analysis. The following discussion is from Morrison (1967); my comments are in brackets.
According to Gauch (1982), there is a subtle distinction between ordination and factor analysis, which appears consistent with Morrison:
FORTRAN - one of the earliest computer languages in widespread use in ecology. Most ordination programs were originally written in FORTRAN, including the Cornell Ecology Programs such as DECORANA.
Fuzzy Sets and Fuzzy Set Ordination - Fuzzy sets are sets which allow grades of membership. For example, the set of all high-elevation plots may include no plots at sea level, and all plots on mountain tops, but what about plots at intermediate elevations? Classical set theory would have us define an arbitrary elevation or threshold, above which all plots must belong, and below which no plots belong. Fuzzy set theory would allow a plot to belong with 25% membership (for a relatively low elevation) or 75% (for a relatively high elevation). Fuzzy set theory is currently being used in robotics, computer vision, and artificial intelligence. The application of fuzzy set theory to ecology was developed by Roberts (1986). Fuzzy set ordination is probably best classified as a direct gradient analysis technique, and it bears strong similarities to polar ordination.
Gaussian Curve - a curve described by the equation
$y = A e^{-(x-B)^2 / C}$
where A is the maximum height of the curve, B is the modal location of the curve, and C is a measure of the breadth of the curve (often called niche breadth, tolerance, or standard deviation). The curve is bell-shaped. The difference between a Gaussian Curve and a Normal Distribution is that the latter is a statistical distribution, and hence the area under the curve is constrained to be one, and the y-axis represents frequency.
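The curve can be evaluated directly from the three parameters. This sketch follows the form y = A exp(-(x-B)^2 / C) used in this entry; other texts parameterize the breadth differently (for example with twice the squared tolerance in the denominator), so treat `gaussian_response` and the parameter values as illustrative assumptions.

```python
import math

def gaussian_response(x, A, B, C):
    """Species response y = A * exp(-(x - B)^2 / C) at gradient position x."""
    return A * math.exp(-((x - B) ** 2) / C)

# Hypothetical species: maximum abundance 50, mode at pH 6, breadth parameter 2.
A, B, C = 50.0, 6.0, 2.0
for ph in [4.0, 5.0, 6.0, 7.0, 8.0]:
    print(ph, round(gaussian_response(ph, A, B, C), 2))
```

The response peaks at x = B with height A and falls off symmetrically on either side.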
Gaussian Ordination - A little-used ordination technique which arranges samples along ordination axes such that the fit of the species response curves to the Gaussian curve is maximized. The fit can be measured by $r^2$. Ter Braak (1987) shows that CCA can be considered (asymptotically) a special case of Gaussian Ordination.
Geostatistics - A body of analytical techniques for the study of spatial pattern. Geostatistics were largely developed for the mining industry, but they are now widely used in ecology. For an introduction to geostatistics in ecology, see Burrough (1987). There are two interrelated components to geostatistics: variography and spatial interpolation (kriging). See Rossi et al. (1992) for a good introduction to geostatistical applications in ecology.
Gradient - see Environmental Gradient
Guttman Effect - A synonym for the horseshoe effect.
Horseshoe Effect - a distortion in ordination diagrams. It is more extreme than the arch effect because the ends of the first gradient are involuted. The horseshoe effect can be observed for very long gradients in PCA.
Indirect Gradient Analysis - A gradient analysis in which the gradients are unknown a priori, and are inferred from species composition data. The species tell us what the gradients are. Usually performed using an ordination technique such as Detrended Correspondence Analysis. In contrast, see direct gradient analysis.
Inertia - a measure of the total amount of variance in a data set. The term is borrowed from the physical concept of moment of inertia (the mass-weighted sum of squared distances of points from their centroid), rather than from the everyday sense of inertia as the tendency for an object in motion to stay in motion and for an object at rest to stay at rest. For weighted averaging methods such as DCA and CCA, the inertia is related to the spread of species modes (or optima) in ordination space, rather than the variance in species abundance.
Iteration - Often, a mathematical operation must be repeated again and again (using the output of the operation as the input into the operation the next time around). Each incidence of this operation is termed an iteration.
Jackknife - A (usually) computer-intensive method to estimate parameters, and/or to gauge uncertainty in these estimates. The name is derived from the method that each observation is removed (i.e. cut with the knife) one at a time (or two at a time for the second-order Jackknife, and so on) in order to get a feeling for the spread of data. See Manly (1993) and Dixon (1993) for reviews of the use of the Jackknife and similar methods in ecology.
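A first-order (leave-one-out) jackknife can be sketched as below. The data and the helper `jackknife_se` are invented for illustration; for the sample mean, the jackknife standard error reduces to the familiar standard deviation divided by the square root of n, which gives a convenient check.

```python
import math
import statistics

def jackknife_se(values, stat):
    """Leave-one-out jackknife estimate of the standard error of a statistic."""
    n = len(values)
    # Recompute the statistic n times, each time with one observation removed.
    leave_one_out = [stat(values[:i] + values[i + 1:]) for i in range(n)]
    mean_loo = sum(leave_one_out) / n
    return math.sqrt((n - 1) / n * sum((t - mean_loo) ** 2 for t in leave_one_out))

# Hypothetical biomass measurements from eight quadrats.
biomass = [3.2, 4.1, 2.8, 5.0, 3.9, 4.4, 3.1, 4.7]
se_jack = jackknife_se(biomass, statistics.mean)
se_classic = statistics.stdev(biomass) / math.sqrt(len(biomass))
print(f"jackknife SE = {se_jack:.4f}, classical SE = {se_classic:.4f}")
```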
Kriging - a method of spatial interpolation based upon geostatistics. By "spatial interpolation", we mean estimating the value of a variable at an unsampled location based upon measured values of the same variable at known locations. The most common application of kriging is mapping. For example, one can use kriging to produce maps of DCA Axis 1 across a landscape even if the landscape is incompletely sampled. CANODRAW is capable of performing some kinds of kriging, mostly for the purpose of drawing isoclines in ordination space. See Burrough (1987) for a brief introduction to kriging.
Latent Root - another name for eigenvalue.
Latent Value - another name for eigenvalue.
Linear Combination - a linear combination of a set of variables is a new variable ($y_i$) which can be expressed as follows: $y_i = \sum_j b_j x_{ij}$, where $b_j$ is the "coefficient" of variable j, and $x_{ij}$ is the value of observation i of variable j. In multiple regression, predicted values of the dependent variable are linear combinations of the independent (or explanatory) variables. In CCA and RDA, sample scores are linear combinations of the environmental variables.
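In code, a linear combination is a sum of coefficient-times-value products, computed once per observation. The coefficients and data below are made up, and `linear_combination` is a hypothetical helper.

```python
def linear_combination(coefficients, observations):
    """Return y_i = sum over j of b_j * x_ij for every observation i."""
    return [
        sum(b * x for b, x in zip(coefficients, row))
        for row in observations
    ]

# Hypothetical coefficients for three environmental variables.
b = [0.5, -1.0, 2.0]
# Each row is one observation (sample); each column is one variable.
x = [
    [2.0, 1.0, 0.0],
    [4.0, 0.0, 1.0],
    [0.0, 3.0, 2.0],
]
y = linear_combination(b, x)
print(y)
```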
Mantel Test - a method for comparing matrices to each other, also called "matrix correlation". See Legendre and Fortin (1989) for an introduction to Mantel tests for spatial pattern. Significance can be evaluated using randomization methods.
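A simple Mantel test can be sketched as follows: correlate the off-diagonal entries of two distance matrices, then judge significance by permuting the rows and columns of one matrix together. The data, the function names, and the one-sided test with Pearson correlation are assumptions of this example; published implementations differ in detail.

```python
import math
import random

def upper_triangle(matrix):
    """Entries above the diagonal, read row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mantel_test(d1, d2, n_perm=999, seed=0):
    """Matrix correlation and a one-sided permutation p-value."""
    rng = random.Random(seed)
    n = len(d1)
    flat_d1 = upper_triangle(d1)
    observed = pearson_r(flat_d1, upper_triangle(d2))
    count = 1  # the observed arrangement counts as one permutation
    order = list(range(n))
    for _ in range(n_perm):
        rng.shuffle(order)
        # Permute rows and columns of d2 together so it remains a distance matrix.
        permuted = [[d2[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson_r(flat_d1, upper_triangle(permuted)) >= observed:
            count += 1
    return observed, count / (n_perm + 1)

# Hypothetical sites along a transect, and a compositional score at each site.
positions = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
composition = [0.1, 0.9, 2.2, 2.9, 4.1, 5.3]
spatial = [[abs(a - b) for b in positions] for a in positions]
ecological = [[abs(a - b) for b in composition] for a in composition]

r_mantel, p_value = mantel_test(spatial, ecological)
print(f"Mantel r = {r_mantel:.3f}, p = {p_value:.3f}")
```

Permuting whole rows and columns, rather than individual entries, respects the fact that the entries of a distance matrix are not independent of one another.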
Matrix - a set of numbers arranged in rows and columns. "An n by m matrix is a rectangular array of elements with n rows and m columns in which not only is the value of an element important, but also its position in the array" (Burden et al. 1981). It is very common to encounter a matrix with the same number of rows as columns; this is called a square matrix. A square symmetric matrix is one which is identical if you "transpose" the matrix (i.e. switch the rows and the columns). The correlation matrix is an example of a square symmetric matrix.
MCPT - an acronym for Monte Carlo Permutation Test
MDS - an acronym for Multidimensional Scaling, but perhaps it is better to avoid this acronym since it has been variously used in the past. See Terminology in Ordination.
Monotonic Distributions - describes species response curves in which species only increase along environmental gradients, or only decrease along environmental gradients. A monotonic distribution can be linear or more complex. Also, species with unimodal distributions may appear to have monotonic distributions if only a small portion of the gradient is sampled (i.e. if the sampled gradient is short). See Explorations in Coenospace. If most species have a monotonic distribution, then it is best to use PCA and RDA, but if most species have unimodal distributions, then it is best to use DCA and CCA.
Monte Carlo Tests - a synonym of randomization tests (at least as commonly used by ecologists). A Monte Carlo permutation test is when the actual data values are maintained, but they are randomly permuted in order to obtain the distribution of the test statistic. Exactly how they are permuted depends on the null hypothesis to be tested. In the simplest use of Monte Carlo permutation tests in CCA, the values for the environmental variables are randomly reassigned to the values for the species data.
MRPP - an acronym for Multiresponse Permutation Procedure.
Multicollinearity - describes the situation in which a number of variables (or perhaps all of them) are highly correlated with each other. This is often considered a problem, and indeed it makes inferential statistics difficult. But it can also be considered a blessing, because redundant data are useful in identifying patterns.
Multiple Regression - See multiple regression. A method (usually based on the least squares principle) which attempts to describe or "fit" a measured dependent variable as a function of multiple measured independent variables.
Multiresponse Permutation Procedure - usually abbreviated MRPP. A randomization test that evaluates differences in species composition, based on some distance measure. See, for example, Biondini et al. (1988).
Multiscale Ordination - an ordination method which analyses species composition at multiple spatial scales simultaneously. See, for example, Ver Hoef and Glenn-Lewin (1989).
Multivariate Analysis - any analysis which attempts to simultaneously examine the behavior of more than one dependent variable. A multiple regression is not considered a multivariate analysis, since only one dependent (response) variable is studied at a time. Ordination, classification, canonical correlation, and factor analysis are considered multivariate methods. Why use a multivariate analysis instead of multiple univariate analyses? For several reasons:
Noise - This term is very difficult to define, but in general it refers to chance variation in nature which interferes with our ability to see pattern and infer processes. In its simplest form, noise is the same thing as statistical error (e.g. the error term in a regression). See Gauch (1982) for a more thorough discussion. Ideally, an ordination method will represent real, important gradients as its first, second, third, etc. axes. Axes which predominantly summarize noise should be among the last axes.
Nominal Variable - A variable which can be represented as a binary: yes/no, on/off, present/absent. A Nominal variable is usually summarized by a dummy variable.
Nonmetric Multidimensional Scaling (NMDS) - The most widely used distance-based ordination method. The user needs to prespecify the number of dimensions, and then the method will minimize the stress (a measure of poorness of fit between the ordination and measured ecological distances). See also distance matrix.
Normal Equations - The equations by which the solutions to regression problems are found. The "normal" comes from the concept of "normal lines" in physics, i.e. vectors which are at right angles, and therefore uncorrelated.
Ordination - The simplest definition is "Putting Things in Order", which explains the titles of a series of papers (Wartenberg et al. 1987, Peet et al. 1988, Jackson and Somers 1991, Palmer 1993). For some opinions on what makes a good ordination method, see The ideal ordination method. The origin of the term "ordination" in ecology is attributed to Goodall (1954).
Orthogonal - At right angles to, or completely uncorrelated with. Usually in ordination, axes are orthogonal to each other. Two orthogonal variables will have a correlation coefficient (and, for that matter, a covariance) exactly equal to zero. If two orthogonal variables are standardized, the sum of the products of the variables will equal zero. In many ordinations, it may appear that two axes are correlated with each other (this often crops up in DCA). However, note that it is the WEIGHTED correlations that will equal zero, so a single sample with a high weight (i.e. high abundance of all species combined) can counteract the effects of a number of samples with low weight.
Partial Analysis - an analysis (e.g. regression, correlation, ANOVA, ordination) in which the effects of covariables are "factored out" or nullified. Examples of partial analysis include partial correlation, partial DCA, partial CCA, ANCOVA, etc. See Partial Ordination
PC-ORD - A computer program developed by Bruce McCune which provides a wide variety of statistical tests and analyses.
PCA - The acronym for "Principal Components Analysis"
PCoA - The acronym for "Principal Coordinates Analysis"
Permutation Test - a special case of randomization test
Phytosociology - following Kent and Coker (1992), "[the process of] recognizing and defining plant communities". According to some (such as Kent and Coker), the discipline requires a Clementsian world view. However, some would argue that phytosociology is possible within a Gleasonian framework, and that it is necessary for mapping vegetation. Worldwide, the Braun-Blanquet method is the most widely practiced kind of phytosociology. In it, communities are given Latin names just like species are in the Botanical Code and the Zoological Code.
Polar Ordination - Also known as Bray-Curtis ordination. See distance-based ordination methods. One of the first ordination methods to be widely used in ecology. Two sites are chosen as endpoints for each axis (or artificial endpoints can be established), and all the other sites are ordinated relative to these endpoints, based upon their similarity to these endpoints.
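One common way to place a site on a polar axis is the geometric projection usually attributed to Bray and Curtis: given the distance D between the two endpoints and the site's distances dA and dB to each endpoint, the score is (D^2 + dA^2 - dB^2) / (2D). The sketch below assumes that formula; the function name and distances are invented, and programs offer other projection options.

```python
def polar_axis_score(d_to_a, d_to_b, d_ab):
    """Position of a site along the axis running from endpoint A to endpoint B."""
    return (d_ab ** 2 + d_to_a ** 2 - d_to_b ** 2) / (2 * d_ab)

# Hypothetical dissimilarity between the two endpoint sites.
d_ab = 0.9

# The endpoints themselves fall at 0 and at d_ab.
print(polar_axis_score(0.0, d_ab, d_ab))
print(polar_axis_score(d_ab, 0.0, d_ab))

# A site equally dissimilar to both endpoints falls at the midpoint.
print(polar_axis_score(0.6, 0.6, d_ab))
```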
Principal Components - The axes of a Principal Components Analysis. The first Principal Component will, ideally, represent the dominant gradient. The second Component will be orthogonal to the first, and will explain some of the residual variation. The third will be orthogonal to the first and second components, and so on.
Principal Components Analysis - see Eigenanalysis-based ordination methods and Principal Components Analysis. Principal Components Analysis (PCA) is an ordination technique which involves an eigenanalysis of the correlation matrix or the covariance matrix. PCA suffers from a serious problem for gradient analysis: the horseshoe effect. This problem is caused by unimodality in the species response curve.
Principal Coordinates Analysis (PCoA) - A distance-based ordination method in which the distances between sites in the ordination diagram are maximally correlated with the ecological distances. Almost any distance matrix can be used (see Similarity, Difference and Distance), but if the distance measure is Euclidean, PCoA = PCA.
Procrustes Analysis - See Procrustes rotation
Procrustes Rotation - Suppose you had the same objects arranged in two different coordinate systems (e.g. based on two different ordination procedures, or based on different years of data). How can you figure out how well the different ordinations correspond to each other?
Pseudoreplication - a term popularized by Hurlbert (1984). It refers to (usually) field data in which samples are not independent. A hypothetical extreme example is a study of the diatoms in a polluted lake and an unpolluted lake. If 1000 samples are taken from each lake, we cannot consider these to be true replicates to test for a pollution effect. This is because we do not know whether the lakes are different for reasons other than pollution (actually, we do know: no two lakes can be identical!). A less extreme example of pseudoreplication is ordinary spatial dependence. Pseudoreplicated data are rampant in ecology, and the problem is to some degree unavoidable.
Q-Mode - Q-mode and R-mode refer to ordinations of sites and species, respectively. These terms are most used in the context of correspondence analysis and related methods. It turns out that R-mode and Q-mode analyses give identical results in correspondence analysis; see Digby and Kempton (1987) and ter Braak (1987).
R-Mode - see Q-Mode
RA - an acronym for "Reciprocal Averaging"
Randomization Test - See also Randomization Tests. The purpose of inferential statistics is to evaluate whether a number which summarizes something of interest, is greater than (or less than) one would expect just due to chance (i.e. if H0 is true). This number can be one of the well-known parametric statistics (t, F, chi-squared, r, etc.), or nonparametric statistics (Mann-Whitney U, Spearman r, etc.), BUT
Often it is possible to get around this problem by the use of randomization tests (also called Monte Carlo tests or permutation tests). These are related to, but not the same thing as, the Bootstrap and the Jackknife methods. Randomization tests are good for statistical inference, but not so good for developing confidence intervals or for model building.
The procedure for a randomization test is:
This method may seem somewhat magical, or even circular - how can you get any information out of randomness? It is because you are answering the question directly: "How likely is it that if the null hypothesis were true, I would observe a value this extreme just due to chance?" It is worth knowing that Fisher used randomization tests to test the value of the t-test, F-tests, etc. Many people are now promoting the use of randomization tests even when parametric and nonparametric tests exist. Statistical educators are beginning to use randomization tests as the introduction to statistics, because in many ways they are easier to grasp. See Manly (1992) for more information about randomization tests in ecology.
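A randomization test of this kind can be sketched for the simplest case, a difference in means between two groups. Everything here (the data, the function name `permutation_test`, the two-sided test statistic, and the number of permutations) is an assumption chosen for illustration.

```python
import random

def permutation_test(group_a, group_b, n_perm=4999, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)

    def mean_diff(a, b):
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = mean_diff(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    count = 1  # the observed arrangement counts as one permutation
    for _ in range(n_perm):
        # Under the null hypothesis the group labels are exchangeable,
        # so shuffle the pooled values and split them again.
        rng.shuffle(pooled)
        if mean_diff(pooled[:n_a], pooled[n_a:]) >= observed:
            count += 1
    return observed, count / (n_perm + 1)

# Hypothetical species richness in treated and control quadrats.
treated = [14, 17, 15, 19, 16, 18]
control = [10, 12, 9, 13, 11, 12]
difference, p_value = permutation_test(treated, control)
print(f"observed difference = {difference:.2f}, p = {p_value:.4f}")
```

The p-value is the proportion of arrangements, including the observed one, whose statistic is at least as extreme as the observed value.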
RDA - an acronym for Redundancy Analysis
Redundancy - The property of data with much repeated information. Ecological data are typically quite redundant. For example, a stream which has a certain fish species which likes fast currents is likely to have other fish species which like fast currents. Also, it is somewhat less likely to have species which like slow currents. A statistical tendency for certain groups of species to be either negatively or positively associated causes this redundancy, and the most common cause of redundancy is that species have particular environmental requirements. If redundancy did not exist, multivariate methods would fail.
Redundancy Analysis (RDA) - a multivariate direct gradient analysis method in which species are presumed to have linear relationships to environmental gradients (i.e. linear species response curves). Like CCA, the results of RDA can be expressed in a triplot, i.e. a plot of sample scores, species scores, and environmental arrows. Unlike CCA, the species scores in RDA are most accurately represented by arrows (that is, the direction in which that species is increasing in abundance).
Regression - A function (alternatively, a method by which this function is found) relating one or more dependent variables to one or more independent variables. CCA and multiple regression are only two of many kinds of regression. The original use of the term "regression" refers to "regression back to the mean". For example, it was observed that the sons of tall fathers tended to be shorter, on average, than their fathers. This regression effect poses some serious problems for ecological monitoring (See Palmer 1993b).
Rescaling - The stretching and compression of coenoclines to a standardized beta diversity. The ability to rescale allows us to convert skewed species response curves to symmetrical ones. It can also allow us to change leptokurtic (tightly peaked) or platykurtic (flat-topped) curves to normal ones. This is why we sometimes don't need to care too much if the Gaussian model doesn't work perfectly. A form of rescaling is performed in Detrended Correspondence Analysis (optionally).
Residual - the observed value minus the expected, predicted, or modeled value. In least-squares regression methods, a line is fit to data such that the sum of the squares of the residuals is minimized.
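As a small illustration of residuals in least-squares regression, the sketch below fits a straight line to hypothetical data using the standard closed-form estimates for a single predictor. The function names and the data are assumptions made for the example.

```python
def fit_line(x, y):
    """Ordinary least-squares estimates for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def residuals(x, y, intercept, slope):
    """Observed value minus the value predicted by the line."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def sum_of_squares(values):
    return sum(v * v for v in values)


# Hypothetical data: a response measured at five values of a predictor.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(x, y)
res = residuals(x, y, intercept, slope)
rss = sum_of_squares(res)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}, RSS = {rss:.4f}")
```

Any other line through the same data, for example one with a slightly different slope, gives a larger sum of squared residuals.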
Sample Score - same as site score and stand score. A coordinate along an ordination axis specifying the location of a sample. It is the goal of an ecologist to determine whether sample scores are related to environmental gradients. Ideally, sample scores represent the position of communities along the coenocline.
Segments - In DCA, axes are divided into segments prior to detrending. It is thought that the choice of number of segments will have a large impact on the results of DCA. Also, there has been a minor bug reported in the detrending algorithm of DECORANA and CANOCO. See the ordination web page for links related to this bug.
Semivariogram - see variogram
Similarity Index - A measure of the similarity of species composition between two samples. Examples include the Sørensen coefficient and the Jaccard coefficient. Most similarity indices have values of zero for samples that share absolutely no species, and 1 or 100% for samples which have identical species composition.
Similarity Matrix - a square and (usually) symmetric matrix in which the entries are similarities between samples. Similarity matrices are easily produced from, or converted into, distance matrices. The diagonal entries are usually 1 or 100%, meaning a sample is usually 100% similar to itself.
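The two coefficients named under Similarity Index, and the matrix built from them, can be sketched for presence/absence data as follows. The species lists are hypothetical, and converting a similarity to a distance as one minus the similarity is one simple choice among several.

```python
def jaccard(sample_a, sample_b):
    """Shared species divided by the total number of species in either sample."""
    a, b = set(sample_a), set(sample_b)
    if not (a or b):
        raise ValueError("both samples are empty")
    return len(a & b) / len(a | b)


def sorensen(sample_a, sample_b):
    """Twice the shared species divided by the sum of the two species counts."""
    a, b = set(sample_a), set(sample_b)
    if not (a or b):
        raise ValueError("both samples are empty")
    return 2 * len(a & b) / (len(a) + len(b))


def similarity_matrix(samples, index):
    """Square matrix of pairwise similarities between samples."""
    return [[index(x, y) for y in samples] for x in samples]


def to_distance(matrix):
    """One simple conversion: distance = 1 - similarity."""
    return [[1.0 - s for s in row] for row in matrix]


# Hypothetical species lists for three samples.
samples = [
    {"oak", "maple", "birch", "pine"},
    {"birch", "pine", "spruce"},
    {"sedge", "willow"},
]

sim = similarity_matrix(samples, jaccard)
dist = to_distance(sim)
for row in sim:
    print([round(s, 2) for s in row])
```

The diagonal entries are 1 because each sample is identical to itself, and the third sample, which shares no species with the others, has a similarity of 0 to both.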
Singular Matrix - a square matrix which cannot be inverted. In multivariate methods, a singular matrix can occur if one variable is precisely a linear combination of the other variables. This may occur if data are expressed on a percentage basis, or if there is a categorical variable expressed as a series of dummy variables. See environmental variables in CCA. Most multivariate methods are not able to cope with singular matrices; this is the matrix equivalent of dividing by zero. CANOCO is able to recognize and make corrections for singular matrices, but many other software packages are not.
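The dummy-variable case can be illustrated with a small sketch. It assumes a hypothetical design matrix with an intercept column and one dummy column for each of two soil types; because every sample belongs to exactly one type, the intercept equals the sum of the dummies, and the cross-product matrix has a determinant of zero. The helper functions are written out here only to keep the example self-contained.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    b_t = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_t] for row in a]


def det(m):
    """Determinant by cofactor expansion (fine for very small matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, value in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * value * det(minor)
    return total


# Hypothetical design matrix: columns are intercept, "sand" dummy, "clay" dummy.
# Every sample is either sand or clay, so intercept = sand + clay.
full = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 1],
    [1, 0, 1],
]
# Dropping one dummy column removes the exact linear dependence.
reduced = [row[:2] for row in full]

det_full = det(matmul(transpose(full), full))
det_reduced = det(matmul(transpose(reduced), reduced))
print(det_full, det_reduced)
```

A determinant of zero means the matrix cannot be inverted. Dropping one of the dummy columns is the usual remedy, since the omitted category is then represented implicitly by the intercept.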
Site Score - same as sample score.
Spatial Autocorrelation - a synonym for spatial dependence.
Spatial Dependence - the value of a variable at a given point depends on the value of that variable at other points. Spatial dependence violates the basic assumption of most statistics that observations are independent. It thus can lead to pseudoreplication. The most common form of spatial dependence is distance decay. Although problematic in some senses, spatial dependence is indispensable for geostatistics and spatial interpolation techniques such as kriging.
Species Response Curve - a graphical portrayal of the abundance or performance of a species as a function of an environmental gradient. Some ordination methods assume that species response curves are linear, others assume they are unimodal. See, for example, ter Braak and Prentice (1988). A species response surface relates the species abundance to two or more gradients simultaneously.
Species Score - A coordinate along an ordination axis specifying the location of a species. In weighted-averaging ordination methods such as CA, CCA, and DCA, the species score represents the centroid of the species, or the mode of the unimodal species response curve. Species scores help one interpret ordination axes in indirect gradient analysis.
Stand Score - a synonym of sample score.
Standardization - a way of scaling variables so that different variables, measured in different units, can be compared. See also the end of Basic Statistical Concepts.
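One common way to standardize, though not the only one, is to convert each variable to z-scores by subtracting its mean and dividing by its standard deviation. The sketch below assumes that approach and uses hypothetical pH and elevation values.

```python
import statistics


def standardize(values):
    """Convert values to z-scores: zero mean and unit (sample) standard deviation."""
    centre = statistics.mean(values)
    spread = statistics.stdev(values)
    return [(v - centre) / spread for v in values]


# Hypothetical environmental variables measured in very different units.
soil_ph = [4.5, 5.0, 6.5, 7.0, 7.5]
elevation_m = [120, 340, 560, 900, 1500]

z_ph = standardize(soil_ph)
z_elevation = standardize(elevation_m)
print([round(z, 2) for z in z_ph])
print([round(z, 2) for z in z_elevation])
```

After standardization both variables are unitless and on the same scale, so a one-unit difference in either means one standard deviation.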
Stepwise Analysis - A multiple regression method (including RDA and CCA, which are special cases of multiple regression) in which explanatory (independent) variables are selected on the basis of whether they explain a "significant" amount of variation in your dependent variable(s). There are several flavors of stepwise analysis:
There are serious problems with the use of stepwise analysis coupled with inferential statistics (see Hypothesis Driven and Exploratory Data Analysis).
STRESS - 1) a measure of the optimality of an ordination solution (i.e. the relationship between the similarity in species composition and the closeness in ordination space), used as part of the algorithm of NMDS. 2) What one often feels while performing multivariate analyses.
t-Value Biplot - An infrequently used biplot in RDA and CCA of species scores and environmental variable scores in low-dimensional (conventionally, 2-dimensional) space, in which it is possible to infer the strength of the relationship between species and the environment.
Tongue Effect - a possible statistical artifact in DCA, in which one end of the first axis is artificially compressed along the second axis. The importance of this effect is disputed, since many believe that most real data sets should exhibit "true" compression along a secondary axis. That is, the most important secondary gradient is different at opposite ends of the first gradient.
Trace - the sum of the diagonal elements of a square matrix (in ordination, usually a symmetric one). In a correlation matrix, since the diagonals must equal one, the trace equals the number of variables. The sum of the eigenvalues of a matrix equals the trace of the matrix. In the context of ordination, inertia is a synonym for trace. In CCA, the trace will be related to the amount of variation explained by all ordination axes. CANOCO performs a randomization test on the trace statistic, to test whether the measured variables significantly explain species composition.
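The relationship between the trace and the eigenvalues can be checked by hand for the smallest interesting case. The sketch below assumes a hypothetical 2 x 2 correlation matrix and uses the closed-form eigenvalues of a symmetric 2 x 2 matrix; larger matrices would need a numerical eigenvalue routine.

```python
import math


def trace(m):
    """Sum of the diagonal elements of a square matrix."""
    return sum(m[i][i] for i in range(len(m)))


def eigenvalues_symmetric_2x2(m):
    """Closed-form eigenvalues of the symmetric matrix [[a, b], [b, d]]."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    centre = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    return centre + radius, centre - radius


# Hypothetical correlation matrix for two variables with r = 0.6.
corr = [
    [1.0, 0.6],
    [0.6, 1.0],
]

eigs = eigenvalues_symmetric_2x2(corr)
print(trace(corr), eigs, sum(eigs))
```

Here the trace is 2, the number of variables, and the two eigenvalues (1 + r and 1 - r) also sum to 2.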
Transformation - A mathematical operation performed on a variable (e.g. species abundances or environmental variables), usually with the goal of making that variable more useful in a subsequent analysis. Transformations are performed for a number of purposes, including:
Triplot - In CCA and RDA, we have three sets of scores (species scores, sample scores, and environmental variable scores). If we decide to plot all three simultaneously, we have a triplot. Sample scores and species scores are usually indicated by symbols or labelled points. Continuous environmental variables are indicated by arrows, while categorical variables are indicated by points on their centroids.
Unimodal Distribution - A distribution with one mode. In the case of species response curves, a unimodal distribution means the species has one optimal environmental condition. If any aspect of the environment is greater or lesser than this optimum, the species will perform more poorly (i.e. it will have a lesser abundance). Some ordination techniques (such as DCA and CCA) perform best when species have unimodal distributions, others (such as PCA and RDA) perform better when species have monotonic distributions along gradients (i.e. the species either increase or decrease, but not both, as a function of environmental factors).
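A Gaussian curve is one frequently used model of a unimodal species response. The sketch below assumes that model; the parameter names (optimum, tolerance, max_abundance) and the values are hypothetical.

```python
import math


def gaussian_response(x, optimum, tolerance, max_abundance):
    """Abundance at gradient position x under a Gaussian response model."""
    return max_abundance * math.exp(-((x - optimum) ** 2) / (2 * tolerance ** 2))


# Hypothetical species with an optimum at pH 6.0, sampled from pH 4.0 to 8.0.
gradient = [4.0 + 0.5 * i for i in range(9)]
abundance = [
    gaussian_response(ph, optimum=6.0, tolerance=0.8, max_abundance=50.0)
    for ph in gradient
]

for ph, value in zip(gradient, abundance):
    print(f"pH {ph:.1f}: {value:5.1f}")
```

Abundance rises toward the optimum and falls away on both sides of it, so a straight line fitted across the whole gradient would describe this species poorly. Over a short stretch of the gradient on one side of the optimum, however, the response is monotonic.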
Variogram - Also known as semivariogram. A plot of variance as a function of distance of separation. The variogram will tell you the expected variance at large scales (i.e. the SILL), the amount of variance at infinitesimally small spatial scales (the NUGGET) and the spatial scale at which samples can be considered independent (the RANGE). The variogram is used in the field of geostatistics.
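An empirical variogram can be sketched for the simplest case: equally spaced samples along a transect. The example assumes the classical estimator, in which the semivariance at a given lag is half the mean squared difference between all pairs of samples separated by that lag. The transect values are hypothetical.

```python
def semivariance(values, lag):
    """Half the mean squared difference between samples `lag` steps apart."""
    if lag < 1 or lag >= len(values):
        raise ValueError("lag must be between 1 and len(values) - 1")
    squared_diffs = [
        (values[i] - values[i + lag]) ** 2 for i in range(len(values) - lag)
    ]
    return sum(squared_diffs) / (2 * len(squared_diffs))


def empirical_variogram(values, max_lag):
    """Semivariance for every lag from 1 to max_lag."""
    return {lag: semivariance(values, lag) for lag in range(1, max_lag + 1)}


# Hypothetical measurements at equally spaced points along a transect.
transect = [2.0, 2.3, 2.1, 3.0, 3.4, 3.1, 4.2, 4.0, 3.6, 2.9]

variogram = empirical_variogram(transect, max_lag=4)
for lag, gamma in variogram.items():
    print(f"lag {lag}: semivariance = {gamma:.3f}")
```

Plotting these values against lag gives the variogram. With real data one would then look for the lag at which the curve levels off (the range) and the height at which it does so (the sill). Estimates at long lags rest on few pairs, so they are usually less reliable.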
Variography - the art of interpreting variograms. Variography, a branch of geostatistics, tells you the spatial scales at which your variable(s) varies, whether you have nested scales of variation, and whether you have unresolved "nugget" variation at fine spatial scales. Variograms were developed as part of geostatistics. See Rossi et al. (1992) for an introduction to the use of geostatistics in ecology.
Vector - a matrix which consists of either one row (i.e. a row vector), or one column (a column vector). "A vector is a set of values, commonly denoting a point in a multidimensional space.." - ter Braak (1987)
WCanoImp - The Windows 95 version of CanoImp, for inputting data from a spreadsheet.
Weighted average - an average, except that different observations are given differing importances or "weights". In ordination, the weights are typically the abundances (perhaps transformed) of species. The weighted average environment of a species (e.g. the weighted average soil pH) can be used as an index of that species' environmental preference. Weighted averages are intrinsic to Correspondence Analysis and related methods.
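The weighted average soil pH of a species can be sketched as follows, using the species' abundance in each sample as the weight. The pH values and abundances are hypothetical.

```python
def weighted_average(values, weights):
    """Average of values in which each value counts in proportion to its weight."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical data: soil pH of four samples and one species' abundance in each.
soil_ph = [4.5, 5.0, 6.0, 7.0]
abundance = [0, 10, 30, 10]

preferred_ph = weighted_average(soil_ph, abundance)
print(preferred_ph)
```

Samples in which the species is absent carry a weight of zero and so have no influence, while the sample in which it is most abundant pulls the average toward its own pH.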
Z - The symbol used to represent a "regionalized variable", which is a variable that varies spatially. Regionalized variables are used in geostatistics.
Burrough, P.A. 1987. Spatial aspects of ecological data. Pp. 213-251 in Jongman, R.H., C.J.F. ter Braak and O.F.R. van Tongeren, editors. Data Analysis in Community Ecology. Pudoc, Wageningen, The Netherlands.
Dixon, P. M. 1993. The bootstrap and the jackknife: describing the precision of ecological indices. Pages 290-318 in S. M. Scheiner and J. Gurevitch, editors. Design and Analysis of Ecological Experiments. Chapman and Hall, New York.
Draper, N.R. and H. Smith. 1981. Applied Regression Analysis. 2nd ed. Wiley, New York.
Manly, B.F.J. 1992. Randomization and Monte Carlo methods in biology. Chapman and Hall, New York. 281 pp.
Økland, R. H. 1990. Vegetation ecology: theory, methods and applications with reference to Fennoscandia. Sommerfeltia Supplement 1:1-233.
ter Braak, C. J. F., and P. Šmilauer. 1998. CANOCO Reference Manual and User's Guide to Canoco for Windows: Software for Canonical Community Ordination (version 4). Microcomputer Power (Ithaca, NY USA) 352 pp.
Ver Hoef, J.M. and D.C. Glenn-Lewin. 1989. Multiscale ordination: a method for detecting pattern at several scales. Vegetatio 82: 59-67.