March 9, 2005

How to do a sabermetric study, III

Preparing for Data Collection

Once your hypothesis is defined and your model is formulated, you should do one more exercise before you commence collecting data.  Draw a picture.

Put pen to paper and sketch the ideal graphic(s) you would expect to see in your final paper.  It might be a histogram, a boxplot, a scatterplot, a regression surface, whatever.  Just draw it.  Label your axes, give it a title, think about the points.  If you can do this, then you're ready to go.

For example, your hypothesis might be that the ERA of a pitching staff when the primary catcher is behind the plate is equivalent to the ERA when the back-ups are in the game.  You might sketch something like this:


I made that graph with SPSS.  Your hand-drawn picture might just be the X and Y axes with a few sample points and a line that fits them.  But in either case, it shows that this is a testable hypothesis. 
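If you like to think in code, the same sketch can be written down as the data layout you will need.  What follows is only a rough sketch under my own assumptions: the field names (er_primary, ip_primary, er_backup, ip_backup) are hypothetical, the totals are invented placeholders rather than real data, and innings are assumed to be stored as decimals (so 200 1/3 innings would be 200.333, not the box-score "200.1").  Each staff becomes one point, with ERA under the primary catcher on X and ERA under the back-ups on Y.

```python
def era(earned_runs, innings_pitched):
    """Earned run average: earned runs allowed per nine innings."""
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    return 9.0 * earned_runs / innings_pitched


def paired_era_points(staffs):
    """Turn per-staff totals into (x, y) points for the scatterplot.

    x = ERA with the primary catcher, y = ERA with the back-ups.
    """
    return [
        (era(s["er_primary"], s["ip_primary"]),
         era(s["er_backup"], s["ip_backup"]))
        for s in staffs
    ]


def mean_difference(points):
    """Average of (y - x); close to zero if the points hug the line y = x."""
    return sum(y - x for x, y in points) / len(points)


# Invented placeholder totals for two hypothetical staffs, NOT real data.
example_staffs = [
    {"er_primary": 400, "ip_primary": 900.0, "er_backup": 250, "ip_backup": 500.0},
    {"er_primary": 450, "ip_primary": 900.0, "er_backup": 225, "ip_backup": 450.0},
]

points = paired_era_points(example_staffs)
print(points)
print(mean_difference(points))
```

If the two ERAs really are equivalent, the points should scatter around the line Y = X and the mean difference should sit near zero.  Whether a given difference is "near enough" is a question for the analysis itself, not for the sketch.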

If your hypothesis involves a straight comparison of two means, you might sketch a boxplot to represent what your data will (hopefully) look like.  Suppose you were interested in assessing the difference between mean ERA in 1968 and 2000:


(That graph also comes from SPSS 12.0.  Those are actual data, showing the outstanding performances of Gibson and Pedro relative to the years in which they pitched.  We use this plot in our sabermetrics course to show that Petey's 2000 performance is just as impressive as Gibson's microscopic 1968 ERA.)
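A boxplot is really just five numbers per group, so you can also rough out this sketch in a few lines of code.  The block below is a minimal sketch, not the SPSS procedure: it uses Python's statistics module, the "inclusive" quartile method is my choice (different packages draw the box edges slightly differently), and the two lists of ERAs are invented placeholders standing in for the two seasons, not the actual 1968 and 2000 data.

```python
import statistics


def five_number_summary(values):
    """Minimum, quartiles, and maximum: the numbers a boxplot draws."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {"min": data[0], "q1": q1, "median": median, "q3": q3, "max": data[-1]}


def mean_gap(group_a, group_b):
    """Difference between the two group means (group_b minus group_a)."""
    return statistics.mean(group_b) - statistics.mean(group_a)


# Invented placeholder ERAs for two hypothetical seasons, NOT real data.
era_season_a = [2.0, 2.5, 3.0, 3.5, 4.0]
era_season_b = [3.5, 4.0, 4.5, 5.0, 5.5]

print(five_number_summary(era_season_a))
print(five_number_summary(era_season_b))
print(mean_gap(era_season_a, era_season_b))
```

The gap between the means is the quantity your hypothesis is about; the five-number summaries tell you how much the two boxes will overlap when you draw them.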

Finally, suppose your hypothesis was that slugging percentage has increased over the past 15 years.  Your sketch might look like this:


Again, a simple sketch will do.  If you can label your axes, define your X and Y variables, and think about which means/slopes/areas your analysis will compare, then you're ready to start collecting data.  Go for it.
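For the slugging percentage example, the thing your analysis will compare is a slope: slugging percentage on Y, season on X.  Here is a minimal sketch of that idea, assuming an ordinary least-squares line is the model you settled on.  The function names are my own, the seasons are simply numbered 1, 2, 3, and the slugging values are invented placeholders, not real league figures.

```python
def slugging_percentage(singles, doubles, triples, home_runs, at_bats):
    """Total bases divided by at-bats."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the (x, y) points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Invented placeholder values, NOT real data: three numbered seasons.
seasons = [1, 2, 3]
slugging = [0.400, 0.410, 0.420]

print(slugging_percentage(100, 30, 5, 20, 500))
print(least_squares_slope(seasons, slugging))
```

A positive slope is what your sketch predicts.  Whether it is large enough to matter is what the collected data will have to tell you.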

