This first post builds on work that Joe Sheehan did a year ago, looking at the run value of each pitch based on its location. He placed each pitch into one of 25 bins and calculated the average run value in each bin. In the post he suggested that it would be interesting to get rid of the bins and take a continuous approach. A year later, it seems no one has done that, so I thought it would be a good way to start off this blog.
Using the first table in this post, I assigned a run value to each pitch in the PITCHf/x database and then averaged the run value of all the pitches in each location. I split the data up by handedness of the pitcher and batter. The number in parentheses is the average run value for all pitches regardless of location. The images are from the catcher's perspective with an average strike zone added.
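One way to average run values by location without bins is a kernel-weighted mean evaluated on a fine grid. The sketch below is in Python rather than the R used for the images, and it is only an illustration under stated assumptions: the post does not say how the averaging was done, so the Gaussian kernel, the 0.25 ft bandwidth, the function names, and the toy pitches are all hypothetical.

```python
import math


def smoothed_run_value(pitches, x, z, bandwidth=0.25):
    """Gaussian-kernel weighted mean run value at location (x, z).

    pitches: iterable of (px, pz, run_value) tuples, locations in feet.
    bandwidth: kernel width in feet (an assumed value, not from the post).
    """
    num = 0.0
    den = 0.0
    for px, pz, rv in pitches:
        d2 = (px - x) ** 2 + (pz - z) ** 2
        w = math.exp(-d2 / (2.0 * bandwidth ** 2))  # nearby pitches count more
        num += w * rv
        den += w
    return num / den if den > 0 else float("nan")


def run_value_surface(pitches, xs, zs, bandwidth=0.25):
    """Evaluate the smoothed run value on a grid; rows follow zs, columns xs."""
    return [[smoothed_run_value(pitches, x, z, bandwidth) for x in xs] for z in zs]


# Made-up pitches for illustration only: (horizontal ft, height ft, run value).
toy_pitches = [
    (0.0, 2.5, 0.02),
    (0.1, 2.6, 0.03),
    (-1.5, 1.0, 0.05),
    (1.4, 4.0, -0.01),
]
xs = [-2.0 + 0.5 * i for i in range(9)]  # -2 ft to +2 ft
zs = [0.5 + 0.5 * i for i in range(9)]   # 0.5 ft to 4.5 ft
surface = run_value_surface(toy_pitches, xs, zs)
```

A grid like `surface` is the kind of input a filled contour plot takes. To split by handedness, the same function would be run once on each pitcher/batter subset.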
This method reproduces some of Sheehan's results:
- Pitches outside the strike zone have a higher run value than those inside the strike zone.
- Pitches down the middle of the zone have the highest run value of pitches in the strike zone.
- Inside pitches have higher run values than outside pitches.
- Pitches down and in have higher run values than those that are up and in.
This continuous approach gives some additional insights beyond Sheehan's:
- Of outside pitches, those high in the zone have a slightly higher run value than those down in the zone. This is interesting: it seems hitters prefer inside pitches down in the zone, and outside pitches up in the zone.
- The area of negative to zero to just slightly positive run value pitches (the red, yellow and green colored area) extends well beyond the defined strike zone.
- This zone of negative to zero valued pitches extends far above the strike zone, peaking at x=0, over a foot above the top of the strike zone.
Tango and Lichtman made some important comments on the limitations of Sheehan's original work, which did not split the data by swing/taken or by pitch type. These critiques apply equally to my analysis above, if not more so, since I do not split the data by count as Sheehan did.
I hope to address these critiques in future posts. For example, I assume the peak of negative to zero valued pitches a foot above the center of the zone is mostly the result of 'high heat' fastballs in pitcher's counts. By analyzing the run value of pitch locations for just fastballs in specific counts, I will be able to confirm or deny this assumption.
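The follow-up split could look something like the Python sketch below, which keeps only pitches of one type thrown in a chosen set of counts before averaging. Everything specific here is an assumption for illustration: the field names, the "FF" fastball code, the particular counts treated as pitcher's counts, and the toy data are not from the post.

```python
def filter_pitches(pitches, pitch_type=None, counts=None):
    """Keep pitches matching a pitch type and a set of (balls, strikes) counts."""
    kept = []
    for p in pitches:
        if pitch_type is not None and p["pitch_type"] != pitch_type:
            continue
        if counts is not None and (p["balls"], p["strikes"]) not in counts:
            continue
        kept.append(p)
    return kept


def mean_run_value(pitches):
    """Average run value of the given pitches, or None if there are none."""
    if not pitches:
        return None
    return sum(p["run_value"] for p in pitches) / len(pitches)


# Hypothetical definition of "pitcher's counts"; the post does not define them.
PITCHERS_COUNTS = {(0, 1), (0, 2), (1, 2)}

# Made-up pitches for illustration only.
toy = [
    {"pitch_type": "FF", "balls": 0, "strikes": 2, "pz": 4.2, "run_value": -0.04},
    {"pitch_type": "FF", "balls": 3, "strikes": 0, "pz": 2.5, "run_value": 0.06},
    {"pitch_type": "CU", "balls": 0, "strikes": 2, "pz": 1.2, "run_value": -0.02},
    {"pitch_type": "FF", "balls": 1, "strikes": 2, "pz": 4.0, "run_value": -0.02},
]
high_heat = filter_pitches(toy, pitch_type="FF", counts=PITCHERS_COUNTS)
```

The filtered subset would then go through the same location averaging as the full data set, so the high peak can be compared with and without these pitches.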
what software did you use to create the graphs?
never mind, figured it out.. R
Great work!
What R package is that?
I used the R command filled.contour
Where did you get the original data set? Great work, R is amazing!
Go here:
http://gd2.mlb.com/components/game/mlb/year_2008/month_07/day_01/
Pick the game you want, then pick 'bpb', then either 'batters' or 'pitchers', then choose the player code and you will get all the info for that given batter (or pitcher) for that game.
I wrote a script that got all the data into a database. I don't know if there is an easier way to get the data.
Hi Centris,
Please contact me via email. My address is available in the left-hand sidebar at Baseball Analysts.
Thanks,
Rich Lederer