# A Field-test of Basic Empirical Bayes and Bayes Methodologies: In-Season Prediction of Baseball Batting Averages

Batting average is one of the principle performance measures for an individual baseball player. It has a simple numerical structure as the percentage of successful attempts, “Hits”, as a proportion of the total number of qualifying attempts, “At-Bats”. This situation, with Hits as a number of successes within a qualifying number of attempts, makes it natural to statistically model each player’s batting average as a binomial variable outcome, with a true (but unknown) value of that represents the i-th player’s latent ability. This is a common data structure in many statistical applications; and so the methodological study here has implications for such a range of applications.

We will look at batting records for every Major League player over the course of a single season (2005). The primary focus is on using only the batting record from an earlier part of the season (e.g., the first 3 months) in order to predict the batter’s latent ability, , and consequently to predict their batting-average performance for the remainder of the season. Since we are using a season that has already concluded, we can validate our predictive performance by comparing the predicted values to the actual values for the remainder of the season.

The methodological purpose of this study is to gain experience with a variety of predictive methods applicable to a much wider range of situations. Several of the methods to be investigated derive from empirical Bayes and hierarchical Bayes interpretations. Although the general ideas behind these techniques have been understood for many decades*, some of these methods have only been refined relatively recently in a manner that promises to more accurately fit data such as that at hand.

One feature of all of the statistical methodologies here is the preliminary use of a particular form of variance stabilizing transformation in order to transform the binomial data problem into a somewhat more familiar structure involving (approximately) Normal random variables with known variances. This transformation technique is also useful in validating the binomial model assumption that is the conceptual basis for all our analyses. If time permits we will also describe how it can be used to test for the presence of “streaky hitters”, batters whose latent ability appears to significantly change over time.

No prior knowledge of the sport of baseball is required.

• A particularly relevant background reference is Efron, B. and Morris, C. (1977) Stein’s paradox in statistics” Scientific American 236 119-127, and the earlier, more technical version (1975), “Data analysis using Stein’s estimator and its generalizations” Jour. Amer. Stat. Assoc. 70 311-319.

This talk is part of the Kuwait Foundation Lectures series.

## This talk is included in these lists:

Note that ex-directory lists are not shown.

© 2006-2022 Talks.cam, University of Cambridge. Contact Us | Help and Documentation | Privacy and Publicity