Similarity, Difference and Distance

We will use the following table in much of what follows:

              Sample 1  Sample 2  Sample 3  Sample 4
cardinals         1         0         0         3
roadrunners       1         0         0         0
bluebirds         3         2         0         0
phoebes           1         0         5         2
titmice           0         9         6         0
red-tails         1         0         0         0
chickadees       20         1         1         0
waxwings         66         0         0         0

Gauch (1982) presents five conceptual spaces (see Species abundances in ordination):

1) Species space: each sample has an abundance for each species, and can be placed as a point on a graph in which the axes are species.

2) Sample space: Each species is present with a given abundance in each sample, and can be placed as a point on a graph in which the axes are samples.

3) Species dissimilarity space: One can measure the dissimilarity of each species to each other species, based upon the samples they occur in. Thus, two species which are never present in the same sample will have a very high dissimilarity, and two species which are always present in the same sample and with similar patterns of abundance will have a low dissimilarity. According to most measures, the dissimilarity between a species and itself is zero. Each species can be placed as a point on a graph in which the axes are dissimilarities to species.

4) Sample dissimilarity space: One can measure the dissimilarity of each sample to each other sample, based upon the species that occur in them (discussed below). Thus, two samples which share no species have a very high dissimilarity, and two samples which share the same species in similar abundances will have a low dissimilarity. According to most measures, the dissimilarity between a sample and itself is zero. Each sample can be placed as a point on a graph in which the axes are dissimilarities to samples.

5) Ecological space: Each sample is characterized by its environment, and can be placed as a point on a graph in which the axes are environmental variables.
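As a small illustration of the first two spaces, the table above can be read in two directions. The sketch below is only one way to hold the data; the names `counts`, `sample_in_species_space`, and `species_in_sample_space` are made up for this example.

```python
# Species order matches the table above.
species = ["cardinals", "roadrunners", "bluebirds", "phoebes",
           "titmice", "red-tails", "chickadees", "waxwings"]

# Rows are species, columns are Samples 1-4 (counts from the table above).
counts = [
    [1, 0, 0, 3],
    [1, 0, 0, 0],
    [3, 2, 0, 0],
    [1, 0, 5, 2],
    [0, 9, 6, 0],
    [1, 0, 0, 0],
    [20, 1, 1, 0],
    [66, 0, 0, 0],
]


def sample_in_species_space(j):
    """Coordinates of sample j (0-based): one abundance per species axis."""
    return [row[j] for row in counts]


def species_in_sample_space(i):
    """Coordinates of species i (0-based): one abundance per sample axis."""
    return list(counts[i])


# Sample 1 is a point in an 8-dimensional species space.
sample_1 = sample_in_species_space(0)
# Chickadees are a point in a 4-dimensional sample space.
chickadees = species_in_sample_space(species.index("chickadees"))
print(sample_1)
print(chickadees)
```

Even this tiny table needs eight axes for species space, which is the visualization problem that ordination addresses.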

Ordination

Unless you are dealing with a very simple system, it is impossible to completely visualize all of these spaces. Imagine a coordinate system with several hundred axes! This is why we need ordination: to find a low-dimensional space which summarizes the most important aspects of the above spaces.

Gauch (1982): "Ordination primarily endeavors to represent sample and species relationships as faithfully as possible in a low-dimensional space"

Dissimilarity

(Also known as distance or difference)
There are many different dissimilarity spaces, because there are many different ways to measure dissimilarity.
Many ordination techniques are based upon sample dissimilarity. See Distance-based ordination techniques.

In the following, I will focus upon dissimilarity among samples. Data are usually relativized before calculating dissimilarity.

Gauch gives three measures:

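Euclidean Distance (based upon the Pythagorean Theorem)

$$ED_{jk} = \left[\sum_{i=1}^{I} (A_{ij} - A_{ik})^2\right]^{0.5}$$

where A_ij is the abundance of species i in sample j, and I is the number of species.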

Percentage Dissimilarity

$$PD_{jk} = IA - PS_{jk}$$

where

$$PS_{jk} = \frac{200 \sum_{i=1}^{I} \min(A_{ij}, A_{ik})}{\sum_{i=1}^{I} (A_{ij} + A_{ik})}$$

and IA is the "Index of Association", or the similarity of replicate samples. Since this is usually not known, either the highest similarity, the similarity of environmentally similar samples, or 100% is substituted.
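A minimal sketch of percentage dissimilarity follows, assuming IA is set to 100 (one of the substitutions mentioned above) and, for simplicity, applying the formula to the raw counts rather than relativized data. The function names are made up for this example.

```python
def percentage_similarity(a, b):
    """PS = 200 * sum(min(A_ij, A_ik)) / sum(A_ij + A_ik)."""
    shared = sum(min(x, y) for x, y in zip(a, b))
    total = sum(x + y for x, y in zip(a, b))
    return 200.0 * shared / total


def percentage_dissimilarity(a, b, ia=100.0):
    """PD = IA - PS, with IA assumed to be 100 unless given."""
    return ia - percentage_similarity(a, b)


# Species order: cardinals, roadrunners, bluebirds, phoebes,
# titmice, red-tails, chickadees, waxwings.
sample_2 = [0, 0, 2, 0, 9, 0, 1, 0]
sample_3 = [0, 0, 0, 5, 6, 0, 1, 0]
sample_4 = [3, 0, 0, 2, 0, 0, 0, 0]

pd_23 = percentage_dissimilarity(sample_2, sample_3)  # share titmice, chickadees
pd_24 = percentage_dissimilarity(sample_2, sample_4)  # share no species
pd_22 = percentage_dissimilarity(sample_2, sample_2)  # a sample with itself
print(round(pd_23, 2), pd_24, pd_22)
```

Samples 2 and 4 share no species, so every minimum is zero and the dissimilarity reaches its maximum (IA); a sample compared with itself gives zero.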

Complemented Coefficient of Community (CD) - Based upon presence and absence.
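The formula for CD is not given here. The sketch below assumes the Sørensen form of the coefficient of community, CC = 200 S_c / (S_j + S_k), where S_c is the number of species shared by the two samples and S_j and S_k are the numbers of species in each, and it assumes CD = IA - CC with IA set to 100. Both are assumptions for illustration; check them against Gauch (1982) before relying on them.

```python
def coefficient_of_community(a, b):
    """Assumed Sorensen form: CC = 200 * S_c / (S_j + S_k)."""
    present_a = {i for i, x in enumerate(a) if x > 0}
    present_b = {i for i, x in enumerate(b) if x > 0}
    shared = len(present_a & present_b)
    return 200.0 * shared / (len(present_a) + len(present_b))


def complemented_coefficient(a, b, ia=100.0):
    """Assumed form: CD = IA - CC, with IA taken as 100 unless given."""
    return ia - coefficient_of_community(a, b)


# Species order: cardinals, roadrunners, bluebirds, phoebes,
# titmice, red-tails, chickadees, waxwings.
sample_1 = [1, 1, 3, 1, 0, 1, 20, 66]
sample_2 = [0, 0, 2, 0, 9, 0, 1, 0]
sample_4 = [3, 0, 0, 2, 0, 0, 0, 0]

cd_12 = complemented_coefficient(sample_1, sample_2)  # 2 shared of 7 and 3 species
cd_24 = complemented_coefficient(sample_2, sample_4)  # no shared species
print(cd_12, cd_24)
```

Because only presence and absence are used, the 66 waxwings in Sample 1 count for no more than the single roadrunner.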

Dividing each count in the table at the top of this page by its sample total (93, 12, 12, and 5 birds for Samples 1 to 4) gives the relative abundances:

              Sample 1 Sample 2 Sample 3 Sample 4
cardinals        0.01    0.00    0.00    0.60
roadrunners      0.01    0.00    0.00    0.00
bluebirds        0.03    0.17    0.00    0.00
phoebes          0.01    0.00    0.42    0.40
titmice          0.00    0.75    0.50    0.00
red-tails        0.01    0.00    0.00    0.00
chickadees       0.22    0.08    0.08    0.00
waxwings         0.71    0.00    0.00    0.00
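The relativization behind this table can be sketched as follows: each count is divided by its sample (column) total, then rounded to two decimals for display. The variable names are made up for this example.

```python
# Counts per species for Samples 1-4, from the table of raw abundances.
counts = {
    "cardinals":   [1, 0, 0, 3],
    "roadrunners": [1, 0, 0, 0],
    "bluebirds":   [3, 2, 0, 0],
    "phoebes":     [1, 0, 5, 2],
    "titmice":     [0, 9, 6, 0],
    "red-tails":   [1, 0, 0, 0],
    "chickadees":  [20, 1, 1, 0],
    "waxwings":    [66, 0, 0, 0],
}
n_samples = 4

# Total number of individuals in each sample.
totals = [sum(row[j] for row in counts.values()) for j in range(n_samples)]

# Relative abundance: count divided by the total for that sample.
relative = {
    name: [row[j] / totals[j] for j in range(n_samples)]
    for name, row in counts.items()
}

for name, row in relative.items():
    print(f"{name:<12}" + "".join(f"{value:8.2f}" for value in row))
```

After relativization every sample sums to 1, so the 93 birds of Sample 1 and the 5 birds of Sample 4 are compared on the same footing.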

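The Euclidean distance between sample 1 and sample 2, using the relative abundances as rounded to two decimals above, will be:

$$ED_{jk} = \left[\sum_{i=1}^{I} (A_{ij} - A_{ik})^2\right]^{0.5}$$

$$ED_{12} = \left[\sum_{i=1}^{8} (A_{i1} - A_{i2})^2\right]^{0.5}$$

$$= \left[(0.01-0.00)^2 + (0.01-0.00)^2 + (0.03-0.17)^2 + (0.01-0.00)^2 + (0.00-0.75)^2 + (0.01-0.00)^2 + (0.22-0.08)^2 + (0.71-0.00)^2\right]^{0.5}$$

$$= \left[0.0001 + 0.0001 + 0.0196 + 0.0001 + 0.5625 + 0.0001 + 0.0196 + 0.5041\right]^{0.5}$$

$$= 1.1062^{0.5}$$

$$= 1.0518$$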

Dissimilarity matrix (distance matrix, difference matrix):
                  Sample
Sample     1      2     3      4
1       0.0000 1.0518 0.9711 1.0265
2       1.0518 0.0000 0.5175 1.0573
3       0.9711 0.5175 0.0000 0.7854
4       1.0265 1.0573 0.7854 0.0000
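The matrix above can be reproduced with a short sketch. It assumes the two-decimal relative abundances are used as input, which is what the worked example for samples 1 and 2 does; unrounded proportions would give slightly different values. The names `samples`, `euclidean_distance`, and `matrix` are made up for this example.

```python
import math

# Relative abundances rounded to two decimals, one list per sample.
# Species order: cardinals, roadrunners, bluebirds, phoebes,
# titmice, red-tails, chickadees, waxwings.
samples = [
    [0.01, 0.01, 0.03, 0.01, 0.00, 0.01, 0.22, 0.71],  # Sample 1
    [0.00, 0.00, 0.17, 0.00, 0.75, 0.00, 0.08, 0.00],  # Sample 2
    [0.00, 0.00, 0.00, 0.42, 0.50, 0.00, 0.08, 0.00],  # Sample 3
    [0.60, 0.00, 0.00, 0.40, 0.00, 0.00, 0.00, 0.00],  # Sample 4
]


def euclidean_distance(a, b):
    """Square root of the summed squared differences across species."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Every sample against every sample gives a square matrix.
matrix = [[euclidean_distance(a, b) for b in samples] for a in samples]

for row in matrix:
    print(" ".join(f"{d:.4f}" for d in row))
```

Only the upper (or lower) triangle actually needs computing, since the distance from j to k equals the distance from k to j.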

A dissimilarity matrix is SQUARE and SYMMETRIC: it has one row and one column per sample, and the entry in row j, column k equals the entry in row k, column j (equivalently, the matrix is the same as its transpose).
Similarity matrices and correlation matrices are also square, symmetric matrices, but differ from dissimilarity matrices in their diagonals:

The diagonal of a dissimilarity matrix is 0, the diagonals of similarity matrices are usually 1 or 100, and the correlation matrix has diagonals of 1.

Example of a correlation matrix (values are values of r):

                pH      Ca      Mg       K  elevation
pH           1.000   0.971   0.873   0.652     -0.322
Ca           0.971   1.000   0.911   0.653     -0.421
Mg           0.873   0.911   1.000   0.389     -0.499
K            0.652   0.653   0.389   1.000      0.121
elevation   -0.322  -0.421  -0.499   0.121      1.000

The Covariance matrix (also known as variance-covariance matrix) is also a square, symmetric matrix.
The covariance between two variables, x and y, is defined as:
$$\mathrm{Cov}(x,y) = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{N-1}$$

Note that the units for covariance are in x units times y units. So if x is Mg and y is elevation, the covariance between the two is in "ppm m".
Note what happens if you take the covariance of a variable with itself:

$$\mathrm{Cov}(x,x) = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})}{N-1} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N-1} = s^2$$

So the diagonal elements of the covariance matrix equal the variances of the variables.
If you standardize your variables (see Basic Statistics), the covariance matrix becomes your correlation matrix!
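Both points can be checked numerically. The sketch below assumes that standardizing means subtracting the mean and dividing by the sample standard deviation (N-1 in the denominator). The Mg and elevation values are invented purely for illustration, and the function names are made up for this example.

```python
import math


def mean(values):
    return sum(values) / len(values)


def covariance(x, y):
    """Sample covariance with N-1 in the denominator."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def standardize(values):
    """Subtract the mean, then divide by the sample standard deviation."""
    m = mean(values)
    s = math.sqrt(covariance(values, values))
    return [(v - m) / s for v in values]


def correlation(x, y):
    """Pearson's r: covariance scaled by both standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))


# Hypothetical measurements (not real data): Mg in ppm, elevation in m.
mg = [12.0, 15.0, 9.0, 20.0, 14.0]
elevation = [310.0, 250.0, 400.0, 180.0, 260.0]

variance_mg = covariance(mg, mg)  # diagonal element: the variance of Mg
r = correlation(mg, elevation)
cov_standardized = covariance(standardize(mg), standardize(elevation))

print(variance_mg)
print(r, cov_standardized)
```

The covariance of the standardized variables matches r, and each standardized variable has a variance of 1, which is why the diagonal of a correlation matrix is all ones.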

This page was created and is maintained by Michael Palmer.