Methodology
The Whiffs methodology is my attempt at implementing ideas from Tom Tango on how to best evaluate projection systems.
Here are a few important posts I'm drawing from:
- Forecast Evaluations - Tom Tango responds to Nate Silver (remember when he was just a baseball nerd?)
- Testing the 2007-2010 Forecasting Systems: Official Results - Tom Tango's results from testing projection systems using his recommended methodology.
- Who's evaluating the 2011 forecasts this year?
Based on these posts, Whiffs uses four key principles:
Separate playing time from rate stats
Consider a player projected for 300 PA and 30 HR who actually gets 600 PA with 30 HR. Was that a good HR projection?
If you only look at raw HR totals, the projection appears perfect. But it actually got lucky: it projected half the playing time the player received and a home run rate (.100 HR/PA) double the actual rate (.050 HR/PA), and the two errors canceled out.
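Here is the arithmetic behind that example as a small sketch. The function name `split_errors` is hypothetical and not part of Whiffs; errors are expressed as projected minus actual.

```python
def split_errors(proj_pa, proj_hr, actual_pa, actual_hr):
    """Return (raw HR error, PA error, HR-per-PA error), each as projected minus actual."""
    raw_hr_error = proj_hr - actual_hr                         # what a raw-total check sees
    pa_error = proj_pa - actual_pa                             # playing-time error
    rate_error = proj_hr / proj_pa - actual_hr / actual_pa     # power (rate) error
    return raw_hr_error, pa_error, rate_error


raw, pa, rate = split_errors(proj_pa=300, proj_hr=30, actual_pa=600, actual_hr=30)
print(raw)             # 0
print(pa)              # -300
print(round(rate, 3))  # 0.05
```

The raw total shows no error at all, while the split view exposes a 300 PA miss on playing time and a .050 HR/PA miss on power.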
Whiffs evaluates playing time separately from rate stats to prevent this misleading assessment.
Playing time is measured by plate appearances (PA) for batters and batters faced (BF) for pitchers. All other stats are evaluated as rates, using the most appropriate denominator for each type of statistic.
For example, strikeouts and walks are evaluated per plate appearance or batter faced, while batted ball outcomes are evaluated per ball in play.
This approach aims to evaluate each stat within its most meaningful context.
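A minimal sketch of splitting a stat line into playing time plus rates, assuming a line is a plain dict of counting stats. The stat keys (`K`, `BB`, `BIP`, `H_BIP`), the `RATE_DENOMINATORS` table, and the `split_line` function are hypothetical illustrations; Whiffs' actual stat list and denominators may differ.

```python
# Hypothetical stat keys; Whiffs' actual stat list and field names may differ.
# "PT" stands for the playing-time denominator: PA for batters, BF for pitchers.
RATE_DENOMINATORS = {
    "K": "PT",       # strikeouts per PA or BF
    "BB": "PT",      # walks per PA or BF
    "H_BIP": "BIP",  # hits on balls in play, per ball in play
}


def split_line(line, pt_key):
    """Split a counting-stat line into (playing time, dict of rate stats).

    pt_key is "PA" for batters or "BF" for pitchers.
    """
    playing_time = line[pt_key]
    denominators = {"PT": playing_time, "BIP": line["BIP"]}
    rates = {
        stat: line[stat] / denominators[denom]
        for stat, denom in RATE_DENOMINATORS.items()
    }
    return playing_time, rates


batter = {"PA": 600, "K": 120, "BB": 60, "BIP": 400, "H_BIP": 120}
print(split_line(batter, "PA"))  # (600, {'K': 0.2, 'BB': 0.1, 'H_BIP': 0.3})
```

Keeping the denominator choice in one table makes it easy to see, and change, which context each stat is judged in.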
Adjust for league average
Here's an example based on one from Tom Tango:
One system projects .340 wOBA for a player in a .330 league. Another projects .330 wOBA in a .330 league. The player ends up with .338 wOBA in a .338 league. Which projection was more accurate?
The player turned out to be league average, so the second projection is more useful to us, even though it was further from the raw number.
Before calculating error, Whiffs subtracts the projection's own league average from each projected rate stat, and the actual league average from each actual rate stat.
In this example, the projected league-adjusted wOBAs would be +.010 and .000, respectively. The second projection better predicted the actual league-adjusted wOBA (also .000).
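The wOBA example worked through in code, as a sketch. The `league_adjusted` helper and the labels "A" and "B" for the two systems are hypothetical; absolute error is used only to compare the two projections.

```python
def league_adjusted(rate, league_avg):
    """Express a rate stat relative to the league average it was measured against."""
    return rate - league_avg


ACTUAL_WOBA, ACTUAL_LEAGUE = 0.338, 0.338
# (projected wOBA, projected league wOBA) for each system
projections = {"A": (0.340, 0.330), "B": (0.330, 0.330)}

actual_adj = league_adjusted(ACTUAL_WOBA, ACTUAL_LEAGUE)
for name, (woba, league) in projections.items():
    raw_error = abs(woba - ACTUAL_WOBA)
    adj_error = abs(league_adjusted(woba, league) - actual_adj)
    print(name, round(raw_error, 3), round(adj_error, 3))
# A 0.002 0.01
# B 0.008 0.0
```

On raw numbers system A looks better; after the league adjustment, system B is the one that nailed the player.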
Weight by playing time
A projection error for a 100 PA player matters less than the same error for a 600 PA player. Projection systems shouldn't be penalized equally for mistakes on marginal versus regular players.
Whiffs weights each error by the player's actual PA or BF to reflect real-world impact.
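A minimal sketch of playing-time weighting. The source does not say which error metric Whiffs uses, so the weighted mean absolute error below is an assumption, and `weighted_mae` is a hypothetical name.

```python
def weighted_mae(errors, weights):
    """Mean absolute error with each player's error weighted by actual PA or BF.

    Using absolute error (rather than, say, squared error) is an assumption
    made for this sketch.
    """
    total = sum(weights)
    return sum(w * abs(e) for e, w in zip(errors, weights)) / total


errors = [0.030, 0.010]  # rate-stat errors: a 100-PA bench player, a 600-PA regular
actual_pa = [100, 600]

print(round(sum(abs(e) for e in errors) / len(errors), 4))  # unweighted: 0.02
print(round(weighted_mae(errors, actual_pa), 4))            # weighted:   0.0129
```

The bench player's large miss dominates the unweighted average but counts for only 100 of the 700 PA once errors are weighted.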
Include unprojected players at league average
If you only include players projected by every system, you're really limiting yourself to the easiest players to project. We want to reward systems that attempt harder projections.
If a player is missing from a projection system, Whiffs assigns them 1 PA (or BF) when assessing playing time and the league average for each rate stat.
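A sketch of filling in unprojected players, assuming projections are stored as a dict of per-player dicts and that the evaluation pool is the set of players with actual stats. `fill_missing` and the player IDs are hypothetical, and the league averages passed in are assumed to be the projection system's own.

```python
def fill_missing(projections, player_ids, league_avgs, pt_key="PA"):
    """Return projections covering every player in player_ids.

    Players the system did not project get 1 PA (or BF, via pt_key) and
    the supplied league average for every rate stat.
    """
    filled = {}
    for pid in player_ids:
        if pid in projections:
            filled[pid] = projections[pid]
        else:
            filled[pid] = {pt_key: 1, **league_avgs}
    return filled


system_projections = {"player_a": {"PA": 550, "wOBA": 0.345}}
actual_players = ["player_a", "player_b"]  # player_b was not projected
print(fill_missing(system_projections, actual_players, {"wOBA": 0.330}))
# {'player_a': {'PA': 550, 'wOBA': 0.345}, 'player_b': {'PA': 1, 'wOBA': 0.33}}
```

Because every system is scored on the same pool, skipping a hard-to-project player no longer helps: the filler projection is charged a large playing-time miss and a league-average rate.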