18, 2006 08:35 PDT
Last spring there was a back and forth discussion about how to
best assign a measured tree a unique ID number, while still
allowing people to work independently and allowing the datasets
to be merged. The discussion trailed off at the end, but I think
that the summary at that point deserves posting to the
discussion list in general. The following is a post I made to
various individuals on April 26, 2006. In retrospect I like John
Eicholz's proposal better than I did at the time. This is a
problem that must be resolved if we are to coordinate our
datasets at some point down the road, and I wanted to bring
it back into the general debate.
John Eicholz rolled out his tree ID numbering system in a post
to the ENTS discussion list on Tuesday entitled - Revisiting the
This number consists of ten digits and addresses all of the
concerns raised to me so far, either in individual correspondence
or in discussions and posts to this thread. It is a bit longer
than the 8 digit alpha-numeric system I first proposed, but
resolves the questions raised about sorting and filling that
existed in a mixed alphabetic and number system.
The overall goal as I see it is to assign unique numbers to each
tree (and data measurement) in as simple a manner as possible,
so that measurers can work independently with little top-down
coordination, yet the information from different people can
still be compiled into a single master list in a compatible
format, without overlaps and conflicts.
I think the tree ID system should have a set structure so that
it can be easily incorporated into a master database, and so
that the numbering system used in any particular case can be
understood by others without converting for specific user
variations. The numbers themselves can be most anything so long
as they fit the structure. The ID number and data increment
number should be separate numbers in separate columns.
John Eicholz proposed: " It matters little whether it is a
tree id, a measurement id, or a photo id, as long as each number
is used responsibly and consistently in a way that others can
follow. This is basically a "GUID" in computer terms,
which stands for "Globally Unique ID". You can find
that in most databases. I think 10 digits is sufficient to
provide the uniqueness we desire, without limiting the amount of
data we can gather. Assigning a 3-digit "person ID"
allows 999 people to participate, and if we run out (miracle
that would be) we can add a digit. And, it is a reasonable
compromise to allocate numbers AND identify the data source.
That is why I chose 309 as my ID. Then there are no letters,
just integers, which are very efficient to process."
I think this is the way to go with the tree ID system.
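To illustrate how lists kept by different measurers could be
compiled into one master list, here is a minimal Python sketch.
The function name merge_lists, the second measurer ID 412, and
the tree descriptions are all made up for illustration; only the
ID 309 comes from John's post.

```python
def merge_lists(*lists):
    """Combine per-measurer {id: description} dicts into one master dict.

    Raises ValueError if the same ID shows up in more than one list.
    """
    master = {}
    for records in lists:
        for tree_id, description in records.items():
            if tree_id in master:
                raise ValueError(f"duplicate ID {tree_id}")
            master[tree_id] = description
    # Ten-digit strings of equal length sort correctly as plain text.
    return dict(sorted(master.items()))


# Hypothetical data: 309 is John's ID, 412 is an invented second measurer.
johns_trees = {"3090032053": "tree at site 032", "3090032001": "another tree"}
other_trees = {"4120001001": "tree at site 001"}

master = merge_lists(johns_trees, other_trees)
print(list(master))
```

Because each measurer's numbers begin with a different three
digit prefix, two people working independently should never
produce the same ID, and the duplicate check should only fire if
someone reuses one of their own numbers.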
Jess Riddle wrote: " My main objection (to my original
concept) is the mixing of letters and numbers within the
identifier because that presents problems with sorting. I would
suggest separating the information into two columns to maintain
maximum flexibility with presenting the data, and greatly speed
up data entry. That way identifiers may be entered by copying
down and using auto fill (series?) instead of having to type in
each individual entry, which becomes very time consuming if you
go to some site and measure 40 trees in a day."
By using a 3 digit number there is no need to separate the
measurer ID from the tree identification numbers by using John's
It allows the ID listing to be sorted and allows you to use the
autofill in Excel.
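The sorting and fill points can be shown with a short Python
sketch. The labels in the list named mixed are invented
stand-ins for a letter-and-number scheme, and the numeric IDs
are made up apart from the 309 prefix; fill_series is a
hypothetical helper that mimics a spreadsheet fill series.

```python
# Hypothetical labels mixing letters and numbers.
mixed = ["A2", "A10", "A1"]
# Hypothetical all-numeric IDs in the ten-digit style.
numeric = [3090032010, 3090032002, 3090032001]

mixed_sorted = sorted(mixed)      # text sort puts "A10" ahead of "A2"
numeric_sorted = sorted(numeric)  # integers sort in counting order


def fill_series(first_id, how_many):
    """Return consecutive IDs starting at first_id, like a fill series."""
    return [first_id + step for step in range(how_many)]


series = fill_series(3090032001, 4)
print(mixed_sorted)
print(numeric_sorted)
print(series)
```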
John's ID number is structured in this format (the dashes are
included just for illustrative purposes and will not be included
in the actual number):
In this system the first three numbers identify the measurer of
the tree. I agree the best way to go is to use the number to
represent the person who did the measurement rather than the
database compiler as I first suggested. I will post a list of
Measurer IDs on the website so that anyone who wants to opt
into the measuring game can find an ID that has not been
used and email me their selection.
The fourth number is used in John's system to identify a data
measurement when a 1 is used, or a tree ID when a zero is used.
This number could be used for other purposes by individual
measurers or simply be a number reserved for future use.
Numbers 5, 6, and 7 represent a site location assigned by the
measurer. It would allow up to 999 different sites to be
measured and uniquely identified by the measurer. This is a
reasonable item to include. Basically the more information
included in the ID, the more digits it will require.
Numbers 8, 9, and 10 represent a unique number for a particular
tree, within that particular site, and measured by that
particular person. The three digits would allow up to 999 trees
to be measured at each of 999 sites.
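The layout described above (three digits for the measurer, one
flag digit, three for the site, three for the tree) can be
sketched in Python roughly as follows. The names TreeId,
compose_id, and parse_id are my own for illustration, and
splitting the number 3090032053 this way assumes it follows
this layout.

```python
from typing import NamedTuple


class TreeId(NamedTuple):
    measurer: int  # digits 1-3
    flag: int      # digit 4
    site: int      # digits 5-7
    tree: int      # digits 8-10


def compose_id(measurer: int, flag: int, site: int, tree: int) -> str:
    """Build the ten-digit ID as text, zero-padding each field."""
    in_range = (0 <= measurer <= 999 and 0 <= flag <= 9
                and 0 <= site <= 999 and 0 <= tree <= 999)
    if not in_range:
        raise ValueError("field out of range for the 3-1-3-3 layout")
    return f"{measurer:03d}{flag:d}{site:03d}{tree:03d}"


def parse_id(id_text: str) -> TreeId:
    """Split a ten-digit ID back into its four fields."""
    if len(id_text) != 10 or not (id_text.isascii() and id_text.isdigit()):
        raise ValueError("expected exactly ten digits")
    return TreeId(int(id_text[0:3]), int(id_text[3]),
                  int(id_text[4:7]), int(id_text[7:10]))


print(compose_id(309, 0, 32, 53))
print(parse_id("3090032053"))
```

Keeping the ID as text in this sketch preserves leading zeros
for any measurer whose number is below 100; a spreadsheet column
formatted as a plain number would drop them.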
Dale Luthringer wrote (individual email): " I've tended to
try to mark my more significant trees with a small painted tack.
My sequential numbering system is short on characters (3
digits), fits on the tack, transposes easily to GPS and data
entry. If I have to use two different numbering systems, one for
the field and one for data entry, I'll probably resort to just
using my shorter number system due to time and ease. I see the
problem of numbering pines when it comes to a bunch of us
keeping track of various different lists."
At a particular site, this system would allow up to 999 trees to
be listed by the three digit numbers that fit on the head of a
tack. Data entry into a master list would not require two
different numbering systems; the number used in the field on the
tack would be appended to the end of a uniform 7 digit user/site
code for that particular site.
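A small sketch of that data entry step, in Python. The function
name full_id_from_tack is hypothetical, and the prefix 3090032
is simply the first seven digits of the number 3090032053, used
here as an assumed user/site code.

```python
def full_id_from_tack(prefix: str, tack_number: int) -> str:
    """Append a field tack number (1-999) to a 7 digit user/site code."""
    if len(prefix) != 7 or not (prefix.isascii() and prefix.isdigit()):
        raise ValueError("user/site code must be exactly seven digits")
    if not 1 <= tack_number <= 999:
        raise ValueError("tack number must be between 1 and 999")
    # Zero-pad so that tack 7 becomes 007 and the ID stays ten digits.
    return f"{prefix}{tack_number:03d}"


site_prefix = "3090032"  # assumed user/site code for one site
tack_numbers = [7, 53, 120]  # numbers as painted on the tacks
full_ids = [full_id_from_tack(site_prefix, n) for n in tack_numbers]
print(full_ids)
```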
I think this system will work for uniquely identifying trees and
I recommend that it be adopted.
The question of how to identify particular measurement records
is still up in the air to my mind. I agree completely with John
that we should keep the records of older measurements, and that
these measurements should be identified.
I wrote earlier: Each measurement record, regardless of the type
of measurement being taken, even a photograph might qualify,
would be given a record key number to identify this set of
measurements. The initial measurement, by default, would be the
first in the sequence. Historical information could be logged
into the dataset on the day it was found or compiled.
John Eicholz wrote: "The measurement or photograph or poem
has a unique id, independent of tree id. I think the poem
example makes this point clearly. Also, there could be records
that involve two or more trees, or an association between trees
and geology, or whatever. There could be records with no tree at
I wrote to John E.: " I suggested using the date as the
data increment number. This would be an independent number
understandable by everyone. It would also assure that, unless
you both were measuring the same tree on the same day, that you
would always have a unique data increment number. The date would
be used in the dataset anyway. You could have multiple datasets
collected on the same day on different trees, but they would
still be unique identifiers when used in conjunction with the
tree ID. If both you and Bob measured the same tree on the same
day and obtained different data, they could be combined into a
single batch for that day."
John wrote (individual email): "My own structure would be
to have the 4th digit be a zero for a tree and a 1 for a
measurement, that way I can have two sequences going. Each
person could have their own structure, or use mine, since all
that really matters is to have some kind of unique index to the
trees and measurements.
For example (and for real):
Tree ID     Measurement             height  Circ  Comments
3090032053  3091000041   American   126.9'  5.5'  new state height record
"I often make multiple measurements of a single tree on a
single day. I measure different tops, collect multiple readings
to average, and move around to see if I have the best view. I
think it is called "Eichholzing", although I am not
sure why. I may save more than one measurement in the database,
for various reasons."
John's method uses his ID number, a 1 in the fourth place, and a
measurement ID number in the last three digits (8, 9, and 10).
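As a sketch of how the fourth digit keeps the two sequences
apart, the following Python uses the two numbers from John's
example. The function name record_kind and the label "other"
for any flag besides 0 and 1 are my own assumptions.

```python
def record_kind(id_text: str) -> str:
    """Classify a ten-digit ID by its fourth digit."""
    flag = id_text[3]
    if flag == "0":
        return "tree"
    if flag == "1":
        return "measurement"
    return "other"  # assumed label for digits not described in the post


ids = ["3090032053", "3091000041"]
trees = [i for i in ids if record_kind(i) == "tree"]
measurements = [i for i in ids if record_kind(i) == "measurement"]
print(trees, measurements)
```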
I don't really see any problem with more than one set of
measurements for the same tree being included on an individual
record for a single day. So long as each record contains the
date of the record, I am not sure that a unique ID number even
needs to be assigned.
If a number needs to be assigned to a record, I proposed using the
date as the data increment number. This would be an independent
number understandable by everyone. It would also assure that,
unless you both were measuring the same tree on the same day,
that you would always have a unique data increment number. The
date would be used in the dataset anyway. If both people
measured the same tree on the same day, the measurements
could be combined into a single record. You could have multiple
datasets collected on the same day on different trees, but they
would still be unique identifiers when used in conjunction with
the tree ID. The data increment number could simply be the date
arranged as yyyymmdd.
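Here is a rough Python sketch of the date idea: the tree ID and
a yyyymmdd number together act as the record key, and readings
that share a key are batched into one record. The function
names and the reading labels are invented for illustration; the
second tree ID is hypothetical.

```python
from datetime import date


def date_increment(day: date) -> int:
    """Turn a date into a yyyymmdd number, e.g. 2006-04-26 -> 20060426."""
    return int(day.strftime("%Y%m%d"))


def batch_by_tree_and_day(readings):
    """Group (tree_id, day, reading) tuples by (tree_id, yyyymmdd)."""
    batches = {}
    for tree_id, day, reading in readings:
        key = (tree_id, date_increment(day))
        batches.setdefault(key, []).append(reading)
    return batches


# Hypothetical readings: two people measure the same tree on one day.
readings = [
    ("3090032053", date(2006, 4, 26), "reading by measurer A"),
    ("3090032053", date(2006, 4, 26), "reading by measurer B"),
    ("3090032054", date(2006, 4, 26), "reading on a second tree"),
]
batches = batch_by_tree_and_day(readings)
print(sorted(batches))
```

Written as yyyymmdd, the numbers also sort in calendar order,
which would not be true of a day-first or month-first
arrangement.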
I am hoping for some ideas and opinions on the matter from
everyone else. I am not sure what the best option is, but I am
not enthusiastic about John's method, or even about the date