
FW: Regression Analysis and centralization



Dear colleagues,

Would anyone be interested in the following short article, and willing to help 
me by reviewing it or offering some professional advice before I respond to 
T. W. Knight's email of 5/6/98 on Ecol-L?

Thank you very much in advance.

T. Jay Bai

-----Original Message-----
From:	Jay Bai [SMTP:bai@gpsr.colostate.edu]
Sent:	Saturday, May 09, 1998 8:59 AM
To:	MDSM (E-mail)
Cc:	
Subject:	Regression Analysis 2

=============
Thomas W. Knight and others with an interest in regression analysis,
A pilot study with centralization could help you find a best fit to your
data. Further explanation is attached. If you have any more questions, comments,
or suggestions, please contact bai@gpsr.colostate.edu or join our discussion
mailing list, MDSM@gpsrv1.gpsr.colostate.edu
Best
T. Jay Bai

Attachment:
Choosing a regression line of best fit to the data is both interesting and 
important. In my opinion, the purpose of regression is to reveal a longer-term 
trend from relatively shorter-term data (expressed by n points).
The classic method of regression analysis is the Least Squares Solution: find 
a line, straight or curved, that has the smallest sum of squared distances 
from all n points. Generally, it is difficult to decide which curve fits best, 
or even whether there is a curve at all. The choice depends on the experience 
and skill of the operator and on the structure of the data.
Here we offer an easier and sound way to find the curve or curves that best 
fit time series data. It is an alternative way to carry out the least squares 
solution: find N centroids from the n points (N<n) using centralization 
(averaging). Each centroid has the smallest sum of squared distances from the 
original points in its group.
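The least-squares property of an average can be checked numerically. The sketch 
below is only an illustration, not part of the original analysis: it takes the 
first five values of the example series as one group and compares the group 
average against a grid of other candidate centres. The helper name sum_sq and 
the grid spacing of 0.1 are assumptions made for this example.

```python
# First five values of the example series, treated as one group.
group = [16.45, 20.45, 26.81, 4.29, 34.44]


def sum_sq(points, centre):
    """Sum of squared distances from each point to a candidate centre."""
    return sum((p - centre) ** 2 for p in points)


# The centroid of a one-dimensional group is simply its average.
centroid = sum(group) / len(group)

# Candidate centres from 0.0 to 40.0 in steps of 0.1 (an arbitrary grid).
candidates = [c / 10 for c in range(0, 401)]
best_on_grid = min(candidates, key=lambda c: sum_sq(group, c))

print("centroid:", round(centroid, 2))
print("best grid candidate:", best_on_grid)
print("sum of squares at centroid:", round(sum_sq(group, centroid), 2))
print("sum of squares at best grid candidate:", round(sum_sq(group, best_on_grid), 2))
```

No candidate on the grid gives a smaller sum of squares than the average 
itself, and the best candidate is the grid point nearest to the average.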
For example, take 50 artificial values, say 50 years of corn production:
16.45
20.45
26.81
4.29
34.44
53.03
69.55
80.51
27.26
5.02
16.67
74.11
42.69
58.34
29.84
13.18
76.01
80.17
50.57
100.94
84.97
70.33
42.22
115.00
62.83
20.41
52.88
71.35
5.85
83.19
103.56
70.59
95.88
80.74
89.13
90.77
73.10
97.07
84.95
94.02
93.86
87.44
78.69
100.55
75.57
82.39
87.22
87.43
84.71
95.44

It is hard to see whether there is a trend behind these data, or which curve 
fits them best, unless you are a very experienced data analyst.
However, once the data are divided into ten equal groups (or any other number 
of groups you prefer) and each group is represented by its average, the trend 
(or trace) becomes clearer:

20.49
47.07
44.33
64.17
75.07
46.73
87.98
87.98
87.22
87.44


Especially once they are plotted, the averages show two peaks, two valleys, 
one plateau, and a 50-year upward trend.
Further analysis might even discover that the peaks and valleys fit the annual 
precipitation pattern, while the upward trend over fifty years might fit the 
buildup of organic matter in the soil.
The 10 points are the group averages of the 50 points. Geometrically, they are 
centroids that trace the movement of corn production over the fifty years, 
with the variance (noise) filtered out.
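The grouping step can be reproduced in a few lines. This is a minimal sketch, 
assuming the series length divides evenly by the number of groups; the function 
name group_means is hypothetical and chosen only for this example.

```python
# The 50 artificial yearly values from the example, in their original order.
values = [
    16.45, 20.45, 26.81, 4.29, 34.44, 53.03, 69.55, 80.51, 27.26, 5.02,
    16.67, 74.11, 42.69, 58.34, 29.84, 13.18, 76.01, 80.17, 50.57, 100.94,
    84.97, 70.33, 42.22, 115.00, 62.83, 20.41, 52.88, 71.35, 5.85, 83.19,
    103.56, 70.59, 95.88, 80.74, 89.13, 90.77, 73.10, 97.07, 84.95, 94.02,
    93.86, 87.44, 78.69, 100.55, 75.57, 82.39, 87.22, 87.43, 84.71, 95.44,
]


def group_means(series, n_groups):
    """Split the series into n_groups equal consecutive groups and average each."""
    if len(series) % n_groups != 0:
        raise ValueError("series length must be divisible by n_groups")
    size = len(series) // n_groups
    return [
        sum(series[i * size:(i + 1) * size]) / size
        for i in range(n_groups)
    ]


means = group_means(values, 10)
print([round(m, 2) for m in means])
```

Calling group_means(values, 5) or group_means(values, 25) instead shows how the 
number of groups trades smoothness against detail.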
Furthermore, we can take the averages of the 1st-5th, 2nd-6th, 3rd-7th, ..., 
46th-50th values. We can then use these 46 new points to represent the 
original 50 points and plot them. The curve would look much smoother. This 
moving-average method is widely used in engineering. It may be termed 
filtering or smoothing, but is really based on the Least Squares Solution or 
Centralization in Multidimensional space.
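A moving average over overlapping windows of five can be sketched as follows. 
With 50 values and a window of 5 there are 50 - 5 + 1 = 46 windows, the last 
one covering the 46th-50th values. The function name moving_average is 
hypothetical, and the simple unweighted window is an assumption of this sketch.

```python
# The 50 artificial yearly values from the example, in their original order.
values = [
    16.45, 20.45, 26.81, 4.29, 34.44, 53.03, 69.55, 80.51, 27.26, 5.02,
    16.67, 74.11, 42.69, 58.34, 29.84, 13.18, 76.01, 80.17, 50.57, 100.94,
    84.97, 70.33, 42.22, 115.00, 62.83, 20.41, 52.88, 71.35, 5.85, 83.19,
    103.56, 70.59, 95.88, 80.74, 89.13, 90.77, 73.10, 97.07, 84.95, 94.02,
    93.86, 87.44, 78.69, 100.55, 75.57, 82.39, 87.22, 87.43, 84.71, 95.44,
]


def moving_average(series, window):
    """Unweighted average of every run of `window` consecutive values."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


smoothed = moving_average(values, 5)
print(len(smoothed), "smoothed points")
print([round(s, 2) for s in smoothed])
```

The first and last smoothed points coincide with the first and last group 
averages, because those windows cover exactly the same five values.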
This method, based on centralization, does not require much experience or 
skill. It can be applied to any situation where the data can be grouped. In 
Thomas W. Knight's case, a pilot study with centralization could offer a best 
curve to fit his data.
Please address any comments, questions, and suggestions to:
bai@gpsr.colostate.edu
(For email communication, we omitted the graphics.)
References for data centralization:
Bai, T. J., T. Cottrell, D. Y. Hao, T. Te, 1997: Multi-Dimensional Sphere Model 
and vegetation instantaneous trend analysis. Ecological Modelling, Vol. 97, 
No. 1-2, pp. 75-86. Or one can download an author-revised version from:
http://lamar.colostate.edu/~jbai/mdsm55.html

Contributions from S. Canner and T. Cottrell are appreciated.

T. Jay BAI, Ph.D.
Quantitative Ecologist
MDSM Data Analysis Service
PO Box 272628
Fort Collins, CO 80527
970-490-8345
Bai@gpsr.colostate.edu
http://lamar.colostate.edu/~jbai

