Tree ID Numbers   Edward Frank
  Oct 18, 2006 08:35 PDT 
ENTS,

Last spring there was a back-and-forth discussion about how best to assign a measured tree a unique ID number while still allowing people to work independently and allowing the datasets to be merged. The discussion trailed off at the end, but I think the summary at that point deserves posting to the discussion list in general. The following is a post I made to various individuals on April 26, 2006. In retrospect I like John Eicholz's proposal better than I did at the time. This is a problem that must be resolved if we are to coordinate our datasets at some point down the road, and I wanted to bring it back into the general debate.

Ed Frank
======================================

Everyone,

John Eicholz rolled out his tree ID numbering system in a post to the ENTS discussion list on Tuesday entitled - Revisiting the Ash Queen.

This number consists of ten digits and addresses all of the concerns raised to me so far, whether in individual correspondence or in discussions and posts to this thread. It is a bit longer than the 8-character alphanumeric system I first proposed, but it resolves the questions raised about sorting and filling that existed in a mixed alphabetic and numeric system.

The overall goal as I see it is to assign a unique number to each tree (and data measurement) in as simple a manner as possible, one that allows the measurers to work independently with little top-down coordination but still lets us compile the information from different people into a single master list, in a compatible format, without overlaps and conflicts.

I think the tree ID system should have a set structure so that it can be easily incorporated into a master database, and so that the numbering used in any particular case can be understood by others without converting for specific user variations. The numbers themselves can be most anything so long as they fit the structure. The ID number and data increment number should be separate numbers in separate columns.

John Eicholz proposed: "It matters little whether it is a tree id, a measurement id, or a photo id, as long as each number is used responsibly and consistently in a way that others can follow. This is basically a "GUID" in computer terms, which stands for "Globally Unique ID." You can find that in most databases. I think 10 digits is sufficient to provide the uniqueness we desire, without limiting the amount of data we can gather. Assigning a 3-digit "person ID" allows 999 people to participate, and if we run out (miracle that would be) we can add a digit. And, it is a reasonable compromise to allocate numbers AND identify the data source. That is why I chose 309 as my ID. Then there are no letters, just integers, which are very efficient to process."

I think this is the way to go with the tree ID system.

Jess Riddle wrote: "My main objection (to my original concept) is the mixing of letters and numbers within the identifier because that presents problems with sorting. I would suggest separating the information into two columns to maintain maximum flexibility with presenting the data, and greatly speed up data entry. That way identifiers may be entered by copying down and using auto fill (series?) instead of having to type in each individual entry, which becomes very time consuming if you go to some site and measure 40 trees in a day."

With John's suggestion of a 3-digit measurer number, there is no need to separate the measurer ID from the tree identification number. The whole ID is numeric, so the ID listing can be sorted and you can use the autofill in Excel.

John's ID number is structured in this format (the dashes are included just for illustrative purposes and will not be included in the actual number):

123-4-567-890

In this system the first three digits identify the measurer of the tree. I agree the best way to go is to use the number to represent the person who did the measurement rather than the database compiler, as I first suggested. I will post a list of Measurer IDs on the website so that when someone wants to opt into the measuring game they can find an ID that has not been used. They can email me their selection.

The fourth number is used in John's system to identify a data measurement when a 1 is used, or a tree ID when a zero is used. This number could be used for other purposes by individual measurers or simply be a number reserved for future use.

Numbers 5, 6, and 7 represent a site location assigned by the measurer. It would allow up to 999 different sites to be measured and uniquely identified by the measurer. This is a reasonable item to include. Basically the more information included in the ID, the more digits it will require.

Numbers 8, 9, and 0 represent a unique number for a particular tree, within that particular site, and measured by that particular person. The three digits would allow up to 999 trees to be measured at each of 999 sites.
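As a rough sketch of the layout just described, the Python below builds a ten-digit ID from its four fields and splits one back apart. The function and field names (make_id, parse_id, measurer, kind, site, seq) are invented here for illustration and are not part of John's proposal. The IDs are handled as text so that any leading zeros survive; that is an assumption on my part, not something anyone has specified. The sample numbers are John's own basswood IDs quoted further down.

```python
def make_id(measurer: int, kind: int, site: int, seq: int) -> str:
    """Build a ten-digit ID: 3-digit measurer, 1-digit kind, 3-digit site, 3-digit sequence."""
    if not 1 <= measurer <= 999:
        raise ValueError("measurer must be 1-999")
    if not 0 <= kind <= 9:
        raise ValueError("kind must be a single digit (John uses 0 = tree, 1 = measurement)")
    if not 0 <= site <= 999:
        raise ValueError("site must be 0-999")
    if not 0 <= seq <= 999:
        raise ValueError("seq must be 0-999")
    # Zero-pad each field so every ID is exactly ten digits long.
    return f"{measurer:03d}{kind:d}{site:03d}{seq:03d}"


def parse_id(text: str) -> dict:
    """Split a ten-digit ID back into its four fields."""
    if len(text) != 10 or not (text.isascii() and text.isdigit()):
        raise ValueError("ID must be exactly ten digits")
    return {
        "measurer": int(text[0:3]),
        "kind": int(text[3]),
        "site": int(text[4:7]),
        "seq": int(text[7:10]),
    }


print(parse_id("3090032053"))
print(make_id(309, 1, 0, 41))
```

Read this way, John's tree ID 3090032053 is measurer 309, a tree (0), site 032, tree 053.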

Dale Luthringer wrote (individual email): " I've tended to try to mark my more significant trees with a small painted tack. My sequential numbering system is short on characters (3 digits), fits on the tack, transposes easily to GPS and data entry. If I have to use two different numbering systems, one for the field and one for data entry, I'll probably resort to just using my shorter number system due to time and ease. I see the problem of numbering pines when it comes to a bunch of us keeping track of various different lists."

At a particular site, this system would allow up to 999 trees to be listed by the three-digit numbers that fit on the head of a tack. Data entry into a master list would not require two different numbering systems: the number used in the field on the tack would simply be appended to the end of a uniform 7-digit user/site code for that particular site.
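To illustrate the idea of appending the tack number to the 7-digit user/site code, here is a small Python sketch. The function name full_ids is made up for this example; the prefix 3090032 is simply the first seven digits of John's basswood tree ID, and the run of 40 tack numbers echoes Jess's example of measuring 40 trees in a day.

```python
def full_ids(prefix: str, tack_numbers) -> list:
    """Append each 3-digit field (tack) number to a 7-digit measurer/site prefix."""
    if len(prefix) != 7 or not (prefix.isascii() and prefix.isdigit()):
        raise ValueError("prefix must be exactly 7 digits")
    ids = []
    for n in tack_numbers:
        if not 1 <= n <= 999:
            raise ValueError("tack numbers must be 1-999")
        # Zero-pad the tack number so 7 becomes 007.
        ids.append(f"{prefix}{n:03d}")
    return ids


# Tack numbers 1 through 40 at one site, much like an autofill series.
ids = full_ids("3090032", range(1, 41))
print(ids[0], ids[-1])
```

Because every ID has the same length and contains only digits, sorting the list as text gives the same order as sorting by tack number.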

I think this system will work for uniquely identifying trees and I recommend that it be adopted.

The question of how to identify particular measurement records is still up in the air to my mind. I agree completely with John that we should keep the records of older measurements, and that these measurements should be identified.

I wrote earlier: Each measurement record, regardless of the type of measurement being taken (even a photograph might qualify), would be given a record key number to identify this set of measurements. The initial measurement, by default, would be the first in the sequence. Historical information could be logged into the dataset on the day it was found or compiled.

John Eicholz wrote: "The measurement or photograph or poem has a unique id, independent of tree id. I think the poem example makes this point clearly. Also, there could be records that involve two or more trees, or an association between trees and geology, or whatever. There could be records with no tree at all."

I wrote to John E.: " I suggested using the date as the data increment number. This would be an independent number understandable by everyone. It would also assure that, unless you both were measuring the same tree on the same day, that you would always have a unique data increment number. The date would be used in the dataset anyway. You could have multiple datasets collected on the same day on different trees, but they would still be unique identifiers when used in conjunction with the tree ID. If both you and Bob measured the same tree on the same day and obtained different data, they could be combined into a single batch for that day."

John wrote (individual email): "My own structure would be to have the 4th digit be a zero for a tree and a 1 for a measurement, that way I can have two sequences going. Each person could have their own structure, or use mine, since all that really matters is to have some kind of unique index to the trees and measurements.

For example (and for real):

Tree ID       Measurement ID   Species             Height   Circ   Comments
3090032053    3091000041       American basswood   126.9'   5.5'   new state height record

"I often make multiple measurements of a single tree on a single day. I measure different tops, collect multiple readings to average, and move around to see if I have the best view. I think it is called "Eichholzing", although I am not sure why. I may save more than one measurement in the database, for various reasons."

John's method uses his measurer ID number, a 1 in the fourth place, and a measurement ID number in the 8, 9, and 0 positions.

I don't really see any problem with more than one set of measurements for the same tree being included on an individual record for a single day. So long as each record contains the date of the record, I am not sure that a unique ID number even needs to be assigned.

If a number needs to be assigned to a record, I proposed using the date as the data increment number. This would be an independent number understandable by everyone. It would also assure that, unless two people were measuring the same tree on the same day, you would always have a unique data increment number. The date would be used in the dataset anyway. If two people did measure the same tree on the same day, the measurements could be combined into a single record. You could have multiple datasets collected on the same day on different trees, but the dates would still be unique identifiers when used in conjunction with the tree ID. The data increment number could simply be the date arranged as yyyymmdd.
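A minimal Python sketch of the date option follows, assuming each record is keyed by the pair (tree ID, yyyymmdd date). The names record_key and merge_records, the measurer labels, the dates, the second tree ID, and the readings are all invented for illustration; only the tree ID 3090032053 comes from John's example.

```python
from collections import defaultdict
from datetime import date


def record_key(tree_id: str, measured_on: date) -> tuple:
    """Pair the tree ID with the date written as yyyymmdd."""
    return (tree_id, measured_on.strftime("%Y%m%d"))


def merge_records(*datasets) -> dict:
    """Combine datasets; readings of the same tree on the same day share one record."""
    merged = defaultdict(list)
    for dataset in datasets:
        for tree_id, measured_on, reading in dataset:
            merged[record_key(tree_id, measured_on)].append(reading)
    return dict(merged)


# Hypothetical data: two measurers visit the same tree on the same day.
measurer_a = [
    ("3090032053", date(2006, 4, 26), {"by": "A", "height_ft": 100.0}),
    ("3090032054", date(2006, 4, 26), {"by": "A", "height_ft": 90.0}),
]
measurer_b = [
    ("3090032053", date(2006, 4, 26), {"by": "B", "height_ft": 101.0}),
]

merged = merge_records(measurer_a, measurer_b)
for key, readings in sorted(merged.items()):
    print(key, readings)
```

In this sketch the two same-day readings of one tree end up in a single batch under one key, while the other tree measured that day keeps its own key because its tree ID differs.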

I am hoping for some ideas and opinions on the matter from everyone else. I am not sure what the best option is, but I am not enthusiastic about John's method, or even about the date option.

Ed Frank