Monday, March 16, 2009

Moving to Baseball Analysts

I am excited to have the opportunity to post my future work at Baseball Analysts. My first two posts will be reposted there, followed by a new post tomorrow and all my future posts.

Tuesday, March 10, 2009

Run Value by Pitch Type and Location

In the last post I noted Tango and Lichtman's comment that analysis of run value by pitch location is limited when averaged across pitch types and counts. In this post I will address the first concern by looking at the run value by pitch location for each pitch type separately (but again averaging across counts).

Again I split the data by handedness of the batter and the pitcher, and then into four pitch types (based on the PITCHf/x classification). I forgot to note in the first post that all the images are from the catcher's perspective, so a right-handed batter stands to the left of the strike zone and a left-handed batter stands to the right. At the top of each image is the proportion of pitches, for the given handedness combination, made up by the given pitch type (out of the four pitch types considered). So, counting just the four pitch types considered, 60.9% of pitches from a right-handed pitcher to a right-handed batter are fastballs.
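
The proportion shown at the top of each image can be computed along these lines. This is only a minimal sketch: the record layout (pitcher hand, batter hand, pitch type) and the pitch type codes in the example are assumptions for illustration, not the actual fields or the data used for the images.

```python
from collections import Counter


def pitch_type_shares(pitches):
    """Share of each pitch type within each handedness combination.

    `pitches` is an iterable of (pitcher_hand, batter_hand, pitch_type)
    tuples. This layout is hypothetical; adapt it to your own data.
    Returns {(pitcher_hand, batter_hand, pitch_type): proportion}.
    """
    totals = Counter()   # pitches per handedness combination
    by_type = Counter()  # pitches per combination and pitch type
    for pitcher_hand, batter_hand, pitch_type in pitches:
        totals[(pitcher_hand, batter_hand)] += 1
        by_type[(pitcher_hand, batter_hand, pitch_type)] += 1
    # Divide each count by the total for its handedness combination.
    return {key: n / totals[key[:2]] for key, n in by_type.items()}


# Made-up example records, not real pitch data.
sample = [("R", "R", "FF")] * 3 + [("R", "R", "SL")] + [("L", "R", "CH")] * 2
shares = pitch_type_shares(sample)
```

Because the shares are taken only over the pitch types passed in, they sum to one within each handedness combination, which matches the "out of the four pitch types considered" convention.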



Of the pitches considered, fastballs made up over 60% of pitches in each handedness combination. Thus the overall run value maps in the first post largely reflect the run values for fastballs. But there are some small differences:
  • In the overall maps there was no region inside the strike zone with the deep blue > .04 run value. But for fastballs a bottom corner in each image has a run value > .04. I wonder if fastballs in this region of the strike zone are less likely to be called strikes than other pitches.
  • The region of negative to neutral run valued pitches directly above the center of the zone is even more pronounced for fastballs. The region of deep red < -.04 run valued pitches above the top of the strike zone is larger than the corresponding region in the overall map.
  • The region of negative to neutral run valued pitches below the zone is much smaller than in the overall map. This region is only on the outside part of the plate for left-handed batters and for right-handed batters facing left-handed pitchers, but strangely on the inside for right-handed batters facing right-handed pitchers. In the overall map this region extended below the entire strike zone, not just one side.
  • Fastballs are thrown in roughly the same proportion in all handedness combinations.

Changeups are overwhelmingly thrown when the pitcher and batter have opposite handedness. Additionally, the few changeups thrown when the pitcher and batter have the same handedness may be a highly non-random sample: pitchers with very good changeups, poor hitters, good pitcher's counts (this is just speculation). Because of this and the small sample size we should not read too much into the same-handedness changeup maps.
  • In opposite-handedness at-bats the changeup has a large region of negative to neutral run valued pitches low and away, extending far outside the strike zone. This is complementary to the fastball's negative region below the zone, and together they probably make up the negative region below the zone in the overall map.


Curves are thrown in relatively constant proportion in all handedness combinations, except for lefty/lefty, where they are thrown a little more often.
  • Compared to the overall map, the negative to neutral region for curves is much larger, extending predominantly down and away.
  • With fewer curves thrown it is hard to get as good a resolution, but it seems that compared to other pitches there is less discernible structure within the strike zone (i.e. there are no clear large regions of very low run value separated by large regions of higher run value).



Sliders are thrown more when the batter and pitcher have the same handedness (the opposite of changeups). Thus the same caveats apply: we should not read too much into the opposite-handedness slider maps.
  • A very large region of negative to neutral pitches extends below and away out of the strike zone.
  • Sliders up and in have a higher run value compared to overall pitches up and in.


With these maps separated by pitch type we can gain some additional insight into the overall maps in the first post. The negative to neutral region above the strike zone is mostly the result of fastballs, while the negative to neutral region below the strike zone is mostly the result of non-fastball pitches. Within the strike zone most pitch types share the same overall structure, with the center of the zone and down and in having the highest run value, although the pattern is not quite as apparent with curveballs.

Sunday, March 8, 2009

Run Value by Pitch Location

A lot of interesting new sabermetric work has become possible over the past two years with the availability of the PITCHf/x data. In this new blog I will continue this analysis and present the results in a simple, yet hopefully effective, visual manner.

This first post builds on work that Joe Sheehan did a year ago looking at the run value of each pitch based on its location. He placed each pitch into one of 25 bins and calculated the average run value in each bin. In the post he suggested that it would be interesting to get rid of the bins and take a continuous approach. A year later it seems no one has done that, so I thought it would be a good way to start off this blog.
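
For reference, a binned average of this kind can be sketched as follows. This is a minimal illustration rather than Sheehan's actual code: the bin edges, the coordinate convention (x horizontal, z vertical, in feet) and the example numbers are all assumptions, and each pitch is assumed to already carry a run value.

```python
from collections import defaultdict


def bin_index(value, edges):
    """Return the index of the bin containing `value`, or None if outside."""
    for i in range(len(edges) - 1):
        if edges[i] <= value < edges[i + 1]:
            return i
    return None


def binned_run_value(pitches, x_edges, z_edges):
    """Average run value per (column, row) bin.

    `pitches` is an iterable of (x, z, run_value) tuples (hypothetical
    layout). Pitches falling outside the edges are ignored.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, z, run_value in pitches:
        col = bin_index(x, x_edges)
        row = bin_index(z, z_edges)
        if col is None or row is None:
            continue
        sums[(col, row)] += run_value
        counts[(col, row)] += 1
    return {key: sums[key] / counts[key] for key in counts}


# Made-up pitches on a 2x2 grid; six edges per axis would give 25 bins.
example = [(-0.5, 2.0, 0.1), (-0.4, 2.1, -0.1), (0.5, 3.0, 0.05), (5.0, 2.0, 1.0)]
averages = binned_run_value(example, x_edges=[-1, 0, 1], z_edges=[1.5, 2.5, 3.5])
```

The drawback of this approach is visible in the sketch: every pitch in a bin gets the same value, so any structure smaller than a bin is lost.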

Using the first table in this post I assigned a run value to each pitch in the PITCHf/x database and then averaged the run value of all the pitches in each location. I split the data up by handedness of the pitcher and batter. The number in parentheses is the average run value for all pitches regardless of location. The images are from the catcher's perspective with an average strike zone added.
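
One way to average run value by location without bins is a kernel-weighted average, where nearby pitches count more than distant ones. The sketch below uses a Gaussian kernel; this is an assumed smoothing technique chosen for illustration, and the bandwidth, coordinate units and example numbers are hypothetical rather than the exact method behind the images.

```python
import math


def smoothed_run_value(pitches, x0, z0, bandwidth=0.25):
    """Gaussian-kernel-weighted average run value at location (x0, z0).

    `pitches` is an iterable of (x, z, run_value) tuples (hypothetical
    layout, coordinates assumed to be in feet). Returns None when no
    pitch is close enough to carry any weight.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, z, run_value in pitches:
        dist_sq = (x - x0) ** 2 + (z - z0) ** 2
        weight = math.exp(-dist_sq / (2 * bandwidth ** 2))
        weighted_sum += weight * run_value
        weight_total += weight
    if weight_total == 0.0:
        return None
    return weighted_sum / weight_total


# Made-up pitches: one on each side of the middle of the plate.
example_pitches = [(-1.0, 2.5, 0.1), (1.0, 2.5, -0.1)]
middle = smoothed_run_value(example_pitches, 0.0, 2.5)
near_left = smoothed_run_value(example_pitches, -1.0, 2.5)
```

Evaluating this function over a fine grid of (x0, z0) points gives a continuous map. The bandwidth controls the trade-off: a small value keeps fine detail but is noisy where pitches are sparse, while a large value blurs the map.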

This method reproduces some of Sheehan's results:
  • Pitches outside the strike zone have a higher run value than those inside the strike zone.
  • Pitches down the middle of the zone have the highest run value of pitches in the strike zone.
  • Inside pitches have higher run values than outside pitches.
  • Pitches down and in have higher run values than those that are up and in.

This continuous approach gives some additional insights beyond Sheehan's:
  • Of outside pitches, those high in the zone have a slightly higher run value than those down in the zone. This is interesting: it seems hitters prefer inside pitches down in the zone and outside pitches up in the zone.
  • The area of negative to zero to just slightly positive run value pitches (the red, yellow and green colored area) extends well beyond the defined strike zone.
  • This zone of negative to zero valued pitches extends far above the strike zone peaking at x=0 over a foot above the top of the strike zone.

Tango and Lichtman made some important comments on the limitations of Sheehan's original work, which did not split the data by swing/taken or by pitch type. These critiques apply equally to my analysis above, if not more so, since I do not split the data by count as Sheehan did.

I hope to address these critiques in future posts. For example, I assume the peak of negative to zero valued pitches a foot above the center of the zone is mostly the result of 'high heat' fastballs in pitcher's counts. By analyzing the run value of pitch locations for just fastballs in specific counts I will be able to confirm or refute this assumption.