The question addressed by this symposium was not what evidence is, but how we can measure the strength of evidence when weighing alternatives against data. Scientists seek to understand the mechanistic workings of the natural world, which we can never observe in its entirety. Statistics is the tool by which we take incomplete observations and make inferences about the whole population or about latent quantities that cannot be observed precisely. The major approaches to inference are the Neyman-Pearson/Fisherian school, usually referred to as classical statistics, the likelihood approach, and the Bayesian approach. The goal of the symposium and workshop was to present a clear delineation of these approaches and their implications, both for the methodology suggested by each and for the underlying connections to our goals as scientists. By presenting a clear delineation, we hoped to dispel the faddish preference for one approach over another, so that informed ecologists will make choices that are in harmony with their questions.
In the meeting and symposium we clearly delimited the underlying principles and epistemologies of Neyman/Pearson, Fisherian, likelihood and Bayesian approaches to inference. We did not see the various approaches and their associated questions as mutually exclusive. We expected to find gray areas that are not well addressed by any one paradigm. It is clear that in many cases a combination of approaches is necessary, and that the overlap of different schools can expose both positive relationships and fallacies in our reasoning.
The Classical approach: Dr. Deborah Mayo and Dr. Brian Dennis set out clear reasoning for the desirability of controlling error probabilities and for considering the sample space. Mayo (1996) points out that under the Neyman/Pearson paradigm the important property is the long-run control of errors. This property places each experiment in a larger class of repeated experiments under the proposed model, from which the scientist can judge how often he or she is willing to make errors. Dr. Mayo also points out that the choice of model and experiment dictates the evidence that will be presented. Dr. Dennis, through examples of how researchers can be misled by misspecification of a prior, discusses why inference should be conducted with reference to the sample space. His point is that in conducting science the burden of proof is on the scientist to convince a body of reasoned skeptics (other scientists) that a hypothesis or model should be accepted. A reasoned skeptic, he argues, would not accept results based on the evidence from one experiment.
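The idea of long-run error control can be illustrated with a small simulation. The sketch below is not taken from any of the talks; it assumes a two-sided z-test of a normal mean with known standard deviation, and the function name, sample size and number of repetitions are hypothetical choices made only for illustration. It repeats the same experiment many times with the null hypothesis true and records how often the test rejects.

```python
import math
import random
from statistics import NormalDist


def rejection_rate_under_null(n=20, sigma=1.0, alpha=0.05, reps=20000, seed=1):
    """Fraction of repeated experiments in which a true null (mean 0) is rejected.

    Assumes a two-sided z-test with known sigma; all defaults are illustrative.
    """
    rng = random.Random(seed)
    # Critical value of the standard normal for a two-sided test of size alpha.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(reps):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        z = xbar * math.sqrt(n) / sigma
        if abs(z) > z_crit:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    print(rejection_rate_under_null())
```

Under these assumptions the observed rejection rate should settle near the nominal alpha as the number of repetitions grows, which is the sense in which the error probability is controlled over the class of repeated experiments.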
The Bayesian approach: Dr. Daniel Goodman and Dr. Bruce Lindsay both expressed the view that in the Bayesian framework probability could be associated with a frequentist interpretation. This step away from personal probability is encompassed by the school of empirical Bayes methodology (Efron 1996) and by those Bayesians who think of the posterior as a realization from a family of distributions that converges to the "true" posterior in a hypothetical probability world (Rubin 1984). More generally, probability in the Bayesian framework is thought of as a quantification of personal belief (Oakes 1986).
The Likelihood approach: Besides addressing the Bayesian and frequentist debate, we addressed likelihood-based methods. Dr. Royall showed that the likelihood ratio (the ratio of two likelihood functions) can be understood, through the law of likelihood, as the weight of evidence for one hypothesis over the other, conditional on the data alone. The likelihood ratio allows the calculation of the probability of obtaining misleading evidence, and of the probability of getting strong or weak evidence for a given sample size (Royall 1997, 1998). The likelihood approach centers on the law of likelihood and the likelihood principle. Likelihood approaches are constrained by the parameter space and not the sample space, though one implicitly conditions on the sample space through how the model is specified.
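As a concrete illustration of the likelihood ratio and of the probability of misleading evidence, the sketch below assumes the simplest textbook setting: independent normal observations with known standard deviation and two simple hypotheses about the mean. The function names, the threshold k = 8 and the sample size are hypothetical choices for illustration and are not taken from Dr. Royall's talk. In this setting the probability that the likelihood ratio favors the wrong mean by a factor of at least k has a closed form, which the code checks against a simulation.

```python
import math
import random
from statistics import NormalDist


def log_likelihood_ratio(xs, mu_a, mu_b, sigma):
    """log[L(mu_b) / L(mu_a)] for i.i.d. normal data with known sigma."""
    n = len(xs)
    xbar = sum(xs) / n
    return n * (mu_b - mu_a) / sigma ** 2 * (xbar - (mu_a + mu_b) / 2)


def prob_misleading(n, delta, sigma, k):
    """P(LR >= k in favor of mean mu_a + delta when mu_a is true), delta > 0.

    Closed form for the normal, known-sigma case assumed in this sketch.
    """
    c_root_n = (delta / sigma) * math.sqrt(n)
    return NormalDist().cdf(-(c_root_n / 2 + math.log(k) / c_root_n))


def simulate_misleading(n, delta, sigma, k, reps=20000, seed=1):
    """Monte Carlo estimate of the same probability, with true mean 0."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        if log_likelihood_ratio(xs, 0.0, delta, sigma) >= math.log(k):
            hits += 1
    return hits / reps


if __name__ == "__main__":
    print(prob_misleading(n=10, delta=1.0, sigma=1.0, k=8))
    print(simulate_misleading(n=10, delta=1.0, sigma=1.0, k=8))
```

In this example the probability of misleading evidence stays well below the general bound of 1/k, and it depends on the sample size, which is why it can be examined when planning a study.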
The Evidential Framework: Dr. Subhash Lele addressed the issue of evidence measures. Starting with the likelihood ratio as a measure of evidence for one hypothesis over another, he looked at a large class of evidence functions: functions that compare the disparity between one model and the data with the disparity between another model and the data. Focusing on the asymptotic properties of these measures, he showed that they have properties similar to estimating equations and that the likelihood ratio is optimal in this class. Dr. Lele also pointed out that other disparity measures are more robust to outliers and model misspecification. Complementing the findings of Dr. Lele were the talks by Dr. Mark Taper and Dr. Bruce Lindsay. Dr. Taper looked at the Akaike information criterion (AIC) and other related measures as tools for model selection. The AIC is a disparity measure between the model and the data, and so is directly related to evidence functions. Dr. Lindsay looked at model adequacy and robustness. Since a model never specifies the truth exactly, how can we decide whether a model is close enough to the truth to provide adequate inference? Again, since the truth is never observed, we have to rely on our data to provide an estimate and look at the disparity between the model and the data.
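A minimal sketch of AIC-based model comparison follows. It uses the standard definition AIC = 2p - 2 ln L, where p is the number of estimated parameters and L is the maximized likelihood. The two candidate models (a normal distribution with mean fixed at zero versus one with a freely estimated mean) and the data values are hypothetical, chosen only to show the mechanics, and are not taken from Dr. Taper's talk.

```python
import math


def normal_max_loglik(residuals):
    """Maximized normal log-likelihood given residuals about the fitted mean.

    Uses the maximum likelihood variance estimate (divide by n); assumes the
    residuals are not all zero.
    """
    n = len(residuals)
    var_hat = sum(r * r for r in residuals) / n
    return -0.5 * n * (math.log(2 * math.pi * var_hat) + 1)


def aic(max_loglik, n_params):
    """Akaike information criterion: 2p - 2 ln L."""
    return 2 * n_params - 2 * max_loglik


def compare_zero_mean_vs_free_mean(xs):
    """Return (AIC of zero-mean model, AIC of free-mean model)."""
    xbar = sum(xs) / len(xs)
    aic_zero = aic(normal_max_loglik(xs), 1)                      # variance only
    aic_free = aic(normal_max_loglik([x - xbar for x in xs]), 2)  # mean and variance
    return aic_zero, aic_free


if __name__ == "__main__":
    data = [4.8, 5.1, 5.3, 4.9, 5.2]  # hypothetical observations
    print(compare_zero_mean_vs_free_mean(data))
```

The model with the smaller AIC is preferred. When the sample mean is exactly zero the two fits coincide and the free-mean model is penalized by exactly 2 for its extra parameter, which shows how the criterion trades fit against complexity.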
Though we did not come to a complete consensus as a group on which methods were most appropriate, we did find a common thread in the idea of evidence functions. The exact specification of an evidence measure is still vague and will form the core of our endeavors in a working group we are currently proposing. What the group held in common was that a proper methodological framework should:
1) Quantitatively assess the strength of evidence for one hypothesis relative to other hypotheses.
2) Have known long-run error probabilities that are small.
3) Be able to include prior information, when it exists, in the evidential assessment.
4) Make meaningful statements about the accuracy of estimates.
The clear specification of measures that quantify evidence is one of the foci for the proposed working group.

Efron, B. 1996. Empirical Bayes methods for combining likelihoods. JASA 91(2):538-565.
Mayo, D. G. 1996. Error and the Growth of Experimental Knowledge. University of Chicago Press, IL.
Oakes, M. 1986. Statistical Inference: A Commentary for the Social and Behavioural Sciences. John Wiley and Sons, N. Y.
Rubin, D. B. 1984. Bayesianly justifiable and relevant frequency calculations for the applied statistician. The Annals of Statistics, 12(4):1151-1172.
Royall, R. M. 1992. The elusive concept of statistical evidence. In Bayesian Statistics IV. Ed. Bernardo, J. M., J. O. Berger, A. P. Dawid, and A. F. M. Smith. Oxford University Press, UK.
Royall, R. M. 1997. Statistical Evidence: A Likelihood Paradigm. Chapman and Hall, N.Y.
Updated Sept. 1, 1998