DATA FORMATS FOR INPUT INTO CANOCO, DECORANA, OR TWINSPAN

NOTE added in 2013:  This section is no longer relevant for users of CANOCO 5.0.   I am not eliminating the material, in case users have legacy data sets they need to understand.

 

 

CANOCO uses input data in ASCII form.  In CANOCO for Windows, it is theoretically possible that you would never need to see such ASCII files, since they can be created and read by other facilities.  However, it is good practice to know the general data formats for troubleshooting.  Most of this page applies to the older CANOCO for DOS.  Special considerations for CANOCO for Windows are listed at the end of this page.



Suppose you had a data set in which four large quadrats were sampled for birds, and you obtained the following data:

            Sample 1   Sample 2   Sample 3   Sample 4
 
Cardinals      1         0           0          3
roadrunners    1         0           0          0
blue birds     3         2           0          0
phoebes        1         0           5          2
titmice        0         9           6          0
red-tails      1         0           0          0
chickadees    20         1           1          0
waxwings      66         0           0          0

How would you get these data into shape, so that CANOCO can read them?

CANOCO is a FORTRAN program, and therefore reads its input according to a FORTRAN format statement, which gives the exact width and position of every value on a line.

Conceptually, the most straightforward way to input these data into CANOCO is in "full format". In full format, the samples are the rows and the species are the columns. An example of the above data translated into full format follows. (In the remainder of this page, data sets ready for analysis are surrounded by horizontal lines; the lines are not part of the data files.) Note that in most cases you would be better off having your data in reduced condensed format, to be discussed later.


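----------------------------------------------------------------
BIRD DATA IN FULL FORMAT
(I3,8F3.0)
  8
  1  1  1  3  1  0  1 20 66
  2  0  0  2  0  9  0  1  0
  3  0  0  0  5  6  0  1  0
  4  3  0  0  2  0  0  0  0
  0  0  0  0  0  0  0  0  0
CARDINALROADRUNNBLUEBIRDPHOEBE  TITMOUSEREDTAILSCHICKADEWAXWINGS
SAMPLE 1SAMPLE 2SAMPLE 3SAMPLE 4
----------------------------------------------------------------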

Let us now dissect the above data file.

Make sure to end your data set with a hard return.

Make ABSOLUTELY SURE that the file is stored in ASCII form (i.e. "text" or "data" form with no tabs).

As mentioned above, you cannot have more than 80 characters per line. What if you just have too much data per sample? You can either use reduced condensed format, or use the slash (/) to indicate an additional line; both of these will be discussed later. It is permissible to have data values without spaces in between, as long as the format statement is precise.
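To see why values may run together without spaces, it helps to read a line the way a fixed-width format does: by character position, not by whitespace. The following is only a rough Python sketch of how a statement like (I3,8F3.0) carves up a line; the function name is my own invention, and treating an all-blank field as zero is an assumption of the sketch.

```python
def parse_full_line(line, n_species, id_width=3, field_width=3):
    """Split one fixed-width data line laid out like (I3,8F3.0)."""
    sample = int(line[:id_width])
    values = []
    for k in range(n_species):
        start = id_width + k * field_width
        field = line[start:start + field_width]
        # Assumption of this sketch: an all-blank field counts as zero.
        values.append(float(field) if field.strip() else 0.0)
    return sample, values


# Sample 1 from the full-format bird data.
sample, values = parse_full_line("  1  1  1  3  1  0  1 20 66", 8)

# Fields may touch when a value fills its whole width:
# "100200" is read as 100 followed by 200.
_, touching = parse_full_line("  7100200", 2)
```

Because the fields are found by counting characters, a single misplaced space shifts every later value on the line, which is why the format statement must be precise.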

Notice in the above data file that there are a lot of zeros - and indeed, most data sets are loaded with zeros. It wastes space, computer memory, and effort to include them all. Therefore, it is usually preferable to have data files in "Cornell reduced condensed format" - so called because it was originally developed for Cornell Ecology Programs. The data are given below in this format: 


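----------------------------------------------------------------
BIRD DATA IN REDUCED CONDENSED FORMAT
(I3,5(I3,F3.0))
  5
  1  1  1  2  1  3  3  4  1  6  1
  1  7 20  8 66
  2  3  2  5  9  7  1
  3  4  5  5  6  7  1
  4  1  3  4  2
  0
CARDINALROADRUNNBLUEBIRDPHOEBE  TITMOUSEREDTAILSCHICKADEWAXWINGS
SAMPLE 1SAMPLE 2SAMPLE 3SAMPLE 4
----------------------------------------------------------------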
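If your data already sit in a samples-by-species table, the data lines of the condensed file can be generated rather than typed. Here is a minimal Python sketch, assuming integer abundances below 1000 so that each value fits its three-character field; the function name and the in-memory table are hypothetical, and the title, format statement, terminating zero line, and name lines would still have to be added by hand.

```python
def condensed_lines(sample, abundances, couplets_per_line=5):
    """Format one sample as (I3,5(I3,F3.0)) lines, leaving out the zeros."""
    couplets = [(sp, ab) for sp, ab in enumerate(abundances, start=1) if ab != 0]
    lines = []
    for i in range(0, len(couplets), couplets_per_line):
        chunk = couplets[i:i + couplets_per_line]
        # Each couplet is a 3-character species number plus a 3-character value.
        body = "".join(f"{sp:3d}{ab:3d}" for sp, ab in chunk)
        lines.append(f"{sample:3d}{body}")
    return lines


# The bird data, one row per sample, species in the order of the table.
birds = [
    [1, 1, 3, 1, 0, 1, 20, 66],
    [0, 0, 2, 0, 9, 0, 1, 0],
    [0, 0, 0, 5, 6, 0, 1, 0],
    [3, 0, 0, 2, 0, 0, 0, 0],
]

data_lines = []
for number, row in enumerate(birds, start=1):
    data_lines.extend(condensed_lines(number, row))
```

Note that a sample with more than five nonzero species simply spills onto a second line that repeats the sample number, as sample 1 does here.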

A special case of reduced condensed format is when you have only one couplet per line: 


----------------------------------------------------------------
BIRD DATA IN REDUCED CONDENSED FORMAT - with one couplet
(I3,I3,F3.0)
  1
  1  1  1
  1  2  1
  1  3  3
  1  4  1
  1  6  1
  1  7 20
  1  8 66
  2  3  2
  2  5  9
  2  7  1
  3  4  5
  3  5  6
  3  7  1
  4  1  3
  4  4  2
  0
CARDINALROADRUNNBLUEBIRDPHOEBE  TITMOUSEREDTAILSCHICKADEWAXWINGS
SAMPLE 1SAMPLE 2SAMPLE 3SAMPLE 4
----------------------------------------------------------------

Now why might you want to do this, given that it takes up more space? The reason is that it is easy to enter data in this format in a spreadsheet, and the result is easier to manipulate in programs other than CANOCO. It is also a lot easier to make sure your columns are aligned correctly! I don't recommend this format if you plan on printing hard copies of your data set.
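As an illustration of the spreadsheet route, here is a small Python sketch that turns a comma-separated export into one-couplet data lines. The column headings (sample, species, abundance) and the three example rows are hypothetical; the sketch assumes integer values that fit in three characters, and it appends the zero line that ends the data.

```python
import csv
import io

# Hypothetical spreadsheet export: one row per nonzero record.
exported = """sample,species,abundance
1,1,1
1,7,20
4,4,2
"""


def one_couplet_lines(csv_text):
    """Format each record as (I3,I3,F3.0) and finish with the zero sample."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        sample = int(row["sample"])
        species = int(row["species"])
        abundance = int(row["abundance"])
        lines.append(f"{sample:3d}{species:3d}{abundance:3d}")
    lines.append(f"{0:3d}")  # the terminating "zero" sample
    return lines


lines = one_couplet_lines(exported)
```

Sorting the spreadsheet by sample number before exporting keeps each sample's records together, as they are in the example data set.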

The data sets for environmental variables are best kept in separate files from those for the species data. Environmental data can be in either full format or Cornell reduced condensed format. In general, I recommend full format if you have a preponderance of quantitative (e.g. continuous) variables. Reduced condensed format is better if you have a preponderance of qualitative (e.g. categorical) variables. Categorical variables must be coded as dummy variables; please see Environmental Variables in Constrained Ordination (CCA, RDA).
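For readers unfamiliar with dummy coding, a rough Python sketch follows. It assumes the simplest scheme, one 0/1 column per category, and the habitat labels are made up for illustration; whether one category should be left out of an analysis is a separate question covered on the page referenced above.

```python
def dummy_code(values):
    """Return the category names and one row of 0/1 codes per sample."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


# Hypothetical habitat recorded for each of the four samples.
habitat = ["forest", "desert", "forest", "marsh"]
categories, rows = dummy_code(habitat)
```

Each row contains a single 1, so in reduced condensed format every sample would need only one couplet for this variable, which is why that format suits categorical data.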

For environmental data (including covariable files), the variable names are given in place of species names. The sample names can either be left blank (in which case a number of hard returns should be given at the end of the file), or they should be identical to the sample names in the species data file.

Although it can be frustrating to use FORTRAN formatting statements, they do allow a wide range of flexibility. 


SKIPPED COLUMNS

Suppose you had a data file with some information you did not want to use. You could then use an "X" to indicate skipped columns. A statement like:

(I3,10X,5(I3,F3.0))

would indicate that immediately following the sample number, there are 10 characters of blank spaces, comments, or numbers that you do not want CANOCO to read.
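The effect of the 10X can be imitated in a few lines of Python. This is only a sketch of the idea, not of how CANOCO itself is written: the function name and the "plot B7" comment are hypothetical, and stopping at the first blank species field is an assumption of the sketch.

```python
def parse_condensed_line(line, skip=10, max_couplets=5):
    """Read a line laid out like (I3,10X,5(I3,F3.0)), ignoring the skipped columns."""
    sample = int(line[:3])
    pos = 3 + skip  # jump over the 10 skipped characters
    couplets = []
    for _ in range(max_couplets):
        species_field = line[pos:pos + 3]
        value_field = line[pos + 3:pos + 6]
        if not species_field.strip():
            break  # assumption: a blank species field means no more couplets
        couplets.append((int(species_field), float(value_field)))
        pos += 6
    return sample, couplets


# Sample 2 of the bird data with a 10-character comment after the sample number.
line = "  2" + "plot B7".ljust(10) + "  3  2  5  9  7  1"
sample, couplets = parse_condensed_line(line)
```

The comment must occupy exactly 10 characters, padded with blanks if necessary, or every couplet after it will be misread.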

WARNING: CANODRAW is a program which takes the output of CANOCO and plots it. Since CANODRAW is not a FORTRAN program, it reads FORTRAN format statements imperfectly. CANODRAW may interpret these skipped columns as data.


CONTINUED LINES

For very large environmental data files, with dozens of variables, it may be impossible to fit all the data for a sample on one line. Therefore, you could have a FORTRAN formatting statement such as:

(I5,8F3.0/9F5.0/5F5.0)

This means that each sample begins with a five-character integer, followed on the same line by eight real numbers of three characters each (including spaces), then a second line of nine five-character real numbers, and then a third line of five five-character real numbers. If you have continued lines, then you must make sure your final sample (the notional "zero" sample which ends the data set) has the same number of lines.
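To make the three-line layout concrete, here is a Python sketch that reads one sample written with (I5,8F3.0/9F5.0/5F5.0). The helper names and the made-up values are my own, and blank fields are again assumed to mean zero.

```python
def read_fields(text, start, count, width):
    """Read `count` fixed-width real numbers beginning at column `start`."""
    values = []
    for k in range(count):
        field = text[start + k * width:start + (k + 1) * width]
        # Assumption of this sketch: an all-blank field counts as zero.
        values.append(float(field) if field.strip() else 0.0)
    return values


def parse_record(lines):
    """Read one sample spread over three lines: (I5,8F3.0/9F5.0/5F5.0)."""
    first, second, third = lines
    sample = int(first[:5])
    values = (read_fields(first, 5, 8, 3)
              + read_fields(second, 0, 9, 5)
              + read_fields(third, 0, 5, 5))
    return sample, values


# A made-up sample with 8 + 9 + 5 = 22 environmental variables.
record = [
    f"{1:5d}" + "".join(f"{v:3d}" for v in range(1, 9)),
    "".join(f"{0.5:5.1f}" for _ in range(9)),
    "".join(f"{2.0:5.1f}" for _ in range(5)),
]
sample, values = parse_record(record)
```

Only the first line of each sample carries the sample number; the continuation lines begin directly with data.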


Data file names

Although any file extension is acceptable, it is good form to develop a convention for naming your data files. I give all of my reduced condensed format files the extension "*.rc", my full format files the extension "*.ful", my environmental files the extension "*.env", and my covariable files the extension "*.cov".  However, note that "*.spe" is becoming a standard extension for species data. Choose whatever convention is convenient for you.

Many people don't realize you can use the same file for both the environmental data and for the covariables. All you need to do is to omit the other variables from the analysis.

For heaven's sake, don't give your data files names like "data.spe" or "envdata.env".  You will regret this in the long run, when you accumulate a large number of data files.  Try to be as descriptive as possible.


Common mistakes

Because the input formats for CANOCO are fairly awkward, it is common to have errors in the files. Common errors include:

Fortunately, CANOCO will alert you to a number of errors (e.g. if the number or names of entities do not match), and the results of other errors will be obvious (e.g. nonsensical species names). Errors in coding or ordering species can be detected in CANOCO output if rare species have high weights and common species have low weights. However, some errors remain elusive, and can best be found by repeatedly proofing the files.



HINT: if you would like to check your input format by printing out a hard copy of your file, make sure to use a nonproportional (i.e. fixed letter width) font. This makes it easier to count characters and determine the alignment of columns.
 


CANOCO for Windows

CANOCO for Windows has a new facility, WCanoImp, which makes it easier to create data files from a spreadsheet.  The general procedure is as follows:

Some general recommendations and comments:

This web page is intended as a quick overview. See the CANOCO manual and readme files for further details and options.



This page was created and is maintained by Michael Palmer.