
Re: FW: Regression Analysis and centralization



I may be missing a subtle distinction, but it seems you are suggesting that
Knight check out smoothers. While I heartily agree, I would amend your
response to include the tools statisticians and data analysts have
developed for this purpose over the last 20 years: locally weighted least
squares (loess; Cleveland, JASA, 1979), kernel smoothing, polynomial
smoothers, natural splines, adaptive regression splines, etc. See, for
example, Hastie & Tibshirani's 1990 monograph "Generalized Additive
Models", Chapman & Hall.
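
To make one of the listed tools concrete, here is a minimal sketch of a
kernel smoother (a Nadaraya-Watson estimate with a Gaussian kernel). It is
not loess and is not taken from any of the cited works; the function name,
the bandwidth of 2.0, and the use of the first ten values of the quoted
series are all assumptions made for illustration.

```python
import math


def kernel_smooth(xs, ys, bandwidth):
    """Return a Gaussian-weighted average of ys at each point in xs.

    Points close to the target x get large weights, distant points get
    small ones; `bandwidth` (a hypothetical tuning value) sets how fast
    the weights fall off.
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    smoothed = []
    for x0 in xs:
        weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
        weighted_sum = sum(w * y for w, y in zip(weights, ys))
        smoothed.append(weighted_sum / sum(weights))
    return smoothed


# First ten values of the series quoted below, indexed by year 1..10.
years = list(range(1, 11))
yields = [16.45, 20.45, 26.81, 4.29, 34.44, 53.03, 69.55, 80.51, 27.26, 5.02]

fitted = kernel_smooth(years, yields, bandwidth=2.0)
for year, value in zip(years, fitted):
    print(f"{year:2d}  {value:6.2f}")
```

Unlike a block average, every fitted value here draws on all the data, with
the bandwidth controlling how local the fit is. A larger bandwidth gives a
flatter curve.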


Cheers,

Joel H. Reynolds		reply to: joel@stat.washington.edu
Dept. of Statistics
University of Washington
Seattle, WA 98195		Festina Lente => Make Haste Slowly

On Mon, 11 May 1998, Jay Bai wrote:

> Dear colleagues,
> 
> Would someone be interested in the following short article and help me by 
> reviewing it or offering some professional advice before I respond to T. W. 
> Knight's email of 5/6/98 on Ecol-L?
> 
> Thank you very much in advance.
> 
> T. Jay Bai
> 
> -----Original Message-----
> From:	Jay Bai [SMTP:bai@gpsr.colostate.edu]
> Sent:	Saturday, May 09, 1998 8:59 AM
> To:	MDSM (E-mail)
> Cc:	
> Subject:	Regression Analysis 2
> 
> =============
> Thomas W. Knight and others with an interest in regression analysis,
> A pilot study with centralization could help you find the best fit to your
> data. Further explanation is attached. If you have any more questions,
> comments, or suggestions, please contact bai@gpsr.colostate.edu or join our
> discussion mailing list, MDSM@gpsrv1.gpsr.colostate.edu
> Best
> T. Jay Bai
> 
> Attachment:
> It is interesting and important to choose a regression line that best fits 
> the data. In my opinion, the purpose of regression is to reveal a longer term 
> trend from relatively shorter term data (expressed by n points).
> The classic method of regression analysis is the least squares solution: find 
> a line, straight or curved, that has the smallest sum of squared distances 
> from all n points. Generally, it is difficult to decide which curve fits 
> best, or even whether the data follow a curve at all. The choice depends on 
> the experience and skill of the operator and on the structure of the data.
> Here we offer an easier and sound way to find the curve or curves that best 
> fit time series data. It is an alternative way to apply the least squares 
> solution: find N centroids from the n points (N<n) using centralization 
> (averaging). Each centroid has the smallest sum of squared distances from the 
> original points in its group.
> For example, we have 50 artificial values, say 50 years of corn production:
> 16.45
> 20.45
> 26.81
> 4.29
> 34.44
> 53.03
> 69.55
> 80.51
> 27.26
> 5.02
> 16.67
> 74.11
> 42.69
> 58.34
> 29.84
> 13.18
> 76.01
> 80.17
> 50.57
> 100.94
> 84.97
> 70.33
> 42.22
> 115.00
> 62.83
> 20.41
> 52.88
> 71.35
> 5.85
> 83.19
> 103.56
> 70.59
> 95.88
> 80.74
> 89.13
> 90.77
> 73.10
> 97.07
> 84.95
> 94.02
> 93.86
> 87.44
> 78.69
> 100.55
> 75.57
> 82.39
> 87.22
> 87.43
> 84.71
> 95.44
> 
> It is hard to see whether there is a trend behind these data, or which curve 
> fits them best, unless you are a very experienced data analyst.
> However, after the data are divided into ten equal groups of five consecutive 
> values (or any other number of groups you prefer) and each group is 
> represented by its average, the trend (or trace) becomes clearer:
> 
> 20.49
> 47.07
> 44.33
> 64.17
> 75.07
> 46.73
> 87.98
> 87.98
> 87.22
> 87.44
> 
> 
> There are two peaks, two valleys, one plateau, and a 50-year upward trend, 
> which are especially clear once the averages are plotted.
> Further analysis might even discover that the peaks and valleys fit the annual 
> precipitation pattern, while the upward trend over fifty years might fit the 
> buildup of organic matter in soils.
> The 10 points are the group averages of the 50 points. Geometrically, they are 
> centroids expressing the trace of movement of corn production over the fifty 
> years, with the variance (noise) filtered out.
> Furthermore, we can take averages of the 1st-5th, 2nd-6th, 3rd-7th, ..., 
> 46th-50th values. Then we can use these 46 new points to represent the 
> original 50 points and plot them. The curve would look much smoother. This 
> moving average method is widely used in engineering. It may be termed 
> filtering or smoothing, but it is really based on the least squares solution, 
> or centralization in multidimensional space.
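
The moving average described here can be sketched as follows. The function
name `moving_average` is hypothetical, and only the first ten values of the
series are used to keep the example short. A window of 5 slid over 50
points gives 50 - 5 + 1 = 46 averages.

```python
def moving_average(data, window):
    """Return the mean of every run of `window` consecutive values."""
    if not 1 <= window <= len(data):
        raise ValueError("window must be between 1 and len(data)")
    return [
        sum(data[i:i + window]) / window
        for i in range(len(data) - window + 1)
    ]


# First ten values of the corn-production series quoted above.
first_ten = [16.45, 20.45, 26.81, 4.29, 34.44, 53.03, 69.55, 80.51, 27.26, 5.02]

smoothed = moving_average(first_ten, 5)
for value in smoothed:
    print(f"{value:.2f}")

# A 50-point series with a window of 5 yields 46 averages.
print(len(moving_average(list(range(50)), 5)))
```

The first and last values printed for `first_ten` are the same as the first
two group averages in the grouped table, because those windows coincide with
the first two groups.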
> This method based on centralization does not require much experience or skill.
> It can be applied to any situation where the data can be grouped. In Thomas W. 
> Knight's case, a pilot study with centralization could offer a best curve to 
> fit his data.
> Please address any comments, questions, and suggestions to:
> bai@gpsr.colostate.edu
> (For email communication, we omitted the graphics.)
> References for data centralization:
> Bai, T. J., T. Cottrell, D. Y. Hao, T. Te, 1997: Multi-Dimensional Sphere 
> Model and vegetation instantaneous trend analysis. Ecological Modelling, 
> Vol. 97, No. 1-2, pp. 75-86. Or one can download an author-revised version from:
> http://lamar.colostate.edu/~jbai/mdsm55.html
> 
> Contributions from S. Canner and T. Cottrell are appreciated.
> 
> T. Jay BAI, Ph.D.
> Quantitative Ecologist
> MDSM Data Analysis Service
> PO Box 272628
> Fort Collins, CO 80527
> 970-490-8345
> Bai@gpsr.colostate.edu
> http://lamar.colostate.edu/~jbai
> 
> 
