FW: Regression Analysis and centralization
Dear colleagues,
Would someone be interested in reviewing the following short article, or in
offering some professional advice, before I respond to T. W. Knight's email of
5/6/98 on Ecol-L?
Thank you very much in advance.
T. Jay Bai
-----Original Message-----
From: Jay Bai [SMTP:bai@gpsr.colostate.edu]
Sent: Saturday, May 09, 1998 8:59 AM
To: MDSM (E-mail)
Cc:
Subject: Regression Analysis 2
=============
Thomas W. Knight and others with an interest in regression analysis,
A pilot study with centralization could help you find the best fit to your
data. A further explanation is attached. If you have any more questions,
comments, or suggestions, please contact bai@gpsr.colostate.edu or join our
discussion mailing list, MDSM@gpsrv1.gpsr.colostate.edu.
Best
T. Jay Bai
Attachment:
It is interesting and important to choose a regression line of best fit to the
data. In my opinion, the purpose of regression is to reveal a longer term trend
from relatively shorter term data (expressed by n points).
The classic method of regression analysis is the Least Squares Solution. It
finds the line, straight or curved, that has the smallest sum of squared
distances from all n points. In general, it is difficult to decide which curve
fits best, or even whether the data follow a curve at all. The choice depends
on the experience and skill of the operator and on the structure of the data.
Here we offer an easier and sound way to find the curve or curves that best fit
time series data. It is an alternative way to carry out the least squares
solution: find N centroids from the n points (N<n) using centralization
(averaging). Each centroid has the smallest sum of squared distances from the
original points in its group.
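The link between averaging and least squares can be written out in one line.
The notation here is assumed for illustration and is not taken from the
referenced paper: G_k is the k-th group of points, m is the number of points in
it, y_i are its values, and c is a candidate center.

```latex
\[
S(c) = \sum_{i \in G_k} (y_i - c)^2, \qquad
\frac{dS}{dc} = -2 \sum_{i \in G_k} (y_i - c) = 0
\;\Longrightarrow\;
c = \frac{1}{m} \sum_{i \in G_k} y_i = \bar{y}_k ,
\qquad
\frac{d^2 S}{dc^2} = 2m > 0 .
\]
```

So within each group, the average is the single value with the smallest sum of
squared distances from the points of that group.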
For example, take 50 artificial values, say 50 years of corn production:
16.45
20.45
26.81
4.29
34.44
53.03
69.55
80.51
27.26
5.02
16.67
74.11
42.69
58.34
29.84
13.18
76.01
80.17
50.57
100.94
84.97
70.33
42.22
115.00
62.83
20.41
52.88
71.35
5.85
83.19
103.56
70.59
95.88
80.74
89.13
90.77
73.10
97.07
84.95
94.02
93.86
87.44
78.69
100.55
75.57
82.39
87.22
87.43
84.71
95.44
It is hard to see whether there is a trend behind these data, or which curve
fits them best, unless you are a very experienced data analyst.
However, after the data are divided into ten equal groups (or any other number
of groups you prefer) and each group is represented by its average, the trend
(or trace) becomes clearer:
20.49
47.07
44.33
64.17
75.07
46.73
87.98
87.98
87.22
87.44
There are two peaks, two valleys, one plateau, and a 50-year upward trend,
which is especially clear once the averages are plotted.
Further analysis might even discover that the peaks and valleys fit the annual
precipitation pattern, while the upward trend over fifty years might fit the
buildup of organic matter in the soil.
The 10 points are the group averages of the 50 points. Geometrically, they are
centroids expressing the trace of movement of the corn production over the
fifty years, but with the variance (noise) filtered out.
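The grouping step described above can be sketched in Python. This is only a
minimal illustration, not the MDSM software: the function name group_means and
the requirement that the data divide evenly into groups are assumptions made
for this example.

```python
# The 50 artificial corn-production values from the message, in order.
values = [
    16.45, 20.45, 26.81, 4.29, 34.44, 53.03, 69.55, 80.51, 27.26, 5.02,
    16.67, 74.11, 42.69, 58.34, 29.84, 13.18, 76.01, 80.17, 50.57, 100.94,
    84.97, 70.33, 42.22, 115.00, 62.83, 20.41, 52.88, 71.35, 5.85, 83.19,
    103.56, 70.59, 95.88, 80.74, 89.13, 90.77, 73.10, 97.07, 84.95, 94.02,
    93.86, 87.44, 78.69, 100.55, 75.57, 82.39, 87.22, 87.43, 84.71, 95.44,
]


def group_means(data, n_groups):
    """Split data into n_groups equal consecutive groups and average each.

    Hypothetical helper: it assumes len(data) is divisible by n_groups.
    """
    if n_groups < 1 or len(data) % n_groups != 0:
        raise ValueError("data length must be divisible by n_groups")
    size = len(data) // n_groups
    return [sum(data[i:i + size]) / size for i in range(0, len(data), size)]


# Ten groups of five consecutive years each, one centroid per group.
centroids = group_means(values, 10)
for k, c in enumerate(centroids, start=1):
    print(f"group {k:2d}: {c:.2f}")
```

Run on the values as printed in this message, the sixth average comes out as
46.736, so it prints as 46.74 rather than the 46.73 listed; the difference is
in the last digit only and may come from rounding of the printed inputs.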
Furthermore, we can take the averages of the 1st-5th, 2nd-6th, 3rd-7th, ...,
46th-50th values. We can then use these 46 new points to represent the original
50 points and plot them. The curve would look much smoother. This moving
average method is widely used in engineering. It may be termed filtering or
smoothing, but it is really based on the Least Squares Solution, or
Centralization in Multidimensional space.
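A 5-point moving average over the same series can be sketched as follows. As
before, this is an illustration under stated assumptions rather than a
reference implementation: the name moving_average and the choice of a simple
unweighted window are made up for this example.

```python
# The 50 artificial corn-production values from the message, in order.
values = [
    16.45, 20.45, 26.81, 4.29, 34.44, 53.03, 69.55, 80.51, 27.26, 5.02,
    16.67, 74.11, 42.69, 58.34, 29.84, 13.18, 76.01, 80.17, 50.57, 100.94,
    84.97, 70.33, 42.22, 115.00, 62.83, 20.41, 52.88, 71.35, 5.85, 83.19,
    103.56, 70.59, 95.88, 80.74, 89.13, 90.77, 73.10, 97.07, 84.95, 94.02,
    93.86, 87.44, 78.69, 100.55, 75.57, 82.39, 87.22, 87.43, 84.71, 95.44,
]


def moving_average(data, window):
    """Return the unweighted average of every run of `window` consecutive values."""
    if not 1 <= window <= len(data):
        raise ValueError("window must be between 1 and len(data)")
    # There are len(data) - window + 1 full windows in the series.
    return [
        sum(data[i:i + window]) / window
        for i in range(len(data) - window + 1)
    ]


smoothed = moving_average(values, 5)
print(len(smoothed))  # number of smoothed points
print(", ".join(f"{s:.2f}" for s in smoothed))
```

With 50 values and a window of 5 there are 50 - 5 + 1 = 46 full windows, the
last one covering the 46th to 50th values. A wider window smooths more but
leaves fewer points.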
This method, based on centralization, does not require much experience or
skill. It can be applied to any situation where the data can be grouped. In
Thomas W. Knight's case, a pilot study with centralization could suggest the
curve that best fits his data.
Please address any comments, questions, and suggestions to:
bai@gpsr.colostate.edu
(For email communication, we omitted the graphics.)
References for data centralization:
Bai, T. J., T. Cottrell, D. Y. Hao, T. Te, 1997: Multi-Dimensional Sphere Model
and vegetation instantaneous trend analysis. Ecological Modelling, Vol. 97,
No. 1-2, pp. 75-86. An author-revised version can be downloaded from:
http://lamar.colostate.edu/~jbai/mdsm55.html
Contributions from S. Canner and T. Cottrell are appreciated.
T. Jay BAI, Ph.D.
Quantitative Ecologist
MDSM Data Analysis Service
PO Box 272628
Fort Collins, CO 80527
970-490-8345
Bai@gpsr.colostate.edu
http://lamar.colostate.edu/~jbai