Remote Sensors:
A colleague (who shall remain unnamed... we will refer to him as
Solomon D.) and I are having a lively discussion about training/test data
in remote sensing, and I was hoping to get some additional feedback on this
problem. We created a species map with maximum likelihood classification (using 1 m IKONOS
imagery), and here's how we created training data (and how we are
approaching, in one case, the testing):
We have mostly USFS plot data with a known center location and plot
boundary, and that has cover values for each species we are after in our
classification. We choose pixels from plots with a high percentage of a
single species that are readily identifiable as the species in question
(e.g., if we know a plot contains only red fir trees, we manually choose each
pixel belonging to a tree within the boundary of that plot). This, of
course, is not an optimal way of doing this -- in theory we should have
collected individual species in the field, but this was our curse with the
data we had.
Ok, so now we have a bunch of pixels per class, taken from a limited
number of plots (e.g., we may have 1000 red fir pixels, but we took them from
10 plots). The question is: is it "legitimate" to subdivide the 1000
pixels into randomly chosen training and test groups (say 60% train and
40% test), use the 60% to create the map, and validate it with the
remaining 40%? OR do we have a spatial autocorrelation problem
because, while we have 1000s of pixels, the training and test pixels are all
right next to each other in the same 10 plots?
In my mind the issue is muddled because we are training on color: does
the color (within a class) have a strong enough spatial pattern to warrant
a very different training/test setup (e.g., taking the pixels from 6 of the
10 plots for training and the other 4 for testing)? Thoughts?
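
To make the two options concrete, here is a minimal Python sketch of both
splits (not our actual workflow). The data are made up: 10 plots with 100
red fir pixels each. The names pixel_split, plot_split and shared_plots are
just illustrative helpers. The only point is that a random pixel-level split
leaves training and test pixels inside the same plots, while a plot-level
split never does.

```python
import random


def pixel_split(pixels, train_frac=0.6, seed=0):
    """Randomly split individual pixels, ignoring which plot they came from."""
    rng = random.Random(seed)
    shuffled = pixels[:]
    rng.shuffle(shuffled)
    cut = int(round(train_frac * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def plot_split(pixels, n_train_plots=6, seed=0):
    """Hold out whole plots: all pixels of a plot land on one side only."""
    rng = random.Random(seed)
    plot_ids = sorted({p["plot"] for p in pixels})
    rng.shuffle(plot_ids)
    train_plots = set(plot_ids[:n_train_plots])
    train = [p for p in pixels if p["plot"] in train_plots]
    test = [p for p in pixels if p["plot"] not in train_plots]
    return train, test


def shared_plots(train, test):
    """Plots that contribute pixels to both the training and the test group."""
    return {p["plot"] for p in train} & {p["plot"] for p in test}


# Hypothetical data (an assumption for illustration): 10 plots x 100 pixels.
pixels = [{"plot": plot, "pixel": i} for plot in range(10) for i in range(100)]

train_a, test_a = pixel_split(pixels)   # option 1: 60/40 split of pixels
train_b, test_b = plot_split(pixels)    # option 2: 6 plots train, 4 plots test

print("pixel split:", len(train_a), len(test_a),
      "shared plots:", len(shared_plots(train_a, test_a)))
print("plot split: ", len(train_b), len(test_b),
      "shared plots:", len(shared_plots(train_b, test_b)))
```

With the plot-level split, no plot appears on both sides by construction.
With the pixel-level split, essentially every plot ends up contributing
pixels to both groups, which is exactly the situation where spatial
autocorrelation could inflate the accuracy estimate.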
--j
--
Jonathan Greenberg
Graduate Group in Ecology, U.C. Davis
http://www.cstars.ucdavis.edu/~jongreen
http://www.cstars.ucdavis.edu
AIM: jgrn307 or jgrn3007
MSN: jgrn307@msn.com or jgrn3007@msn.com