August 23, 2008

# Adjusting averages to account for bowling strengths

A stats piece that calculates effective batting averages by determining the strength of the bowling attack faced

Some of you may recall the quotient of BQI to bowling average discussed in this post. Roughly speaking, the idea is to reward bowlers who take the wickets of better batsmen. In this post, I'll flip the idea round, and reward batsmen who score against better bowling attacks.
***
Firstly, a digression on Ananth's post. The quotient was defined by averaging the batting averages of the batsmen dismissed by a particular bowler, and then dividing by the bowler's regular average. This is, to my mind, a very useful stat, perhaps the best of its kind for its simplicity (you can of course make it better by making it more complicated in appropriate ways). The only problem is that the numbers you get don't correspond to numbers we're used to in following cricket. How good is a 1.2 bowler? A 0.9 bowler?

Happily, there's an interpretation of this stat that puts the numbers on a scale we're familiar with. It's equivalent to the usual average (runs conceded divided by number of wickets taken), with each wicket weighted by the average of the batsman dismissed. You can set a 'benchmark' average (its value is arbitrary), and I'll set it at 31.5. Dismissing a batsman who averages 31.5 is worth 1 wicket. Dismissing a batsman who averages 47.25 is worth 1.5 wickets. A quotient Q is then equivalent to an average of 31.5 / Q. So, a bowler with a quotient of 1.2 has an 'adjusted average' of 31.5 / 1.2 = 26.25. This is the sort of number we're used to thinking about with bowlers' averages.
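The equivalence between the quotient and the weighted-wicket average can be checked numerically. What follows is only a minimal sketch in Python: the function names and the sample bowler are made up for illustration, and it assumes BQI is the mean batting average of the batsmen a bowler dismissed.

```python
BENCHMARK = 31.5  # the arbitrary benchmark average chosen in this post


def quotient(runs_conceded, dismissed_averages):
    """BQI (assumed: mean batting average of dismissed batsmen) over bowling average."""
    wickets = len(dismissed_averages)
    bqi = sum(dismissed_averages) / wickets
    bowling_average = runs_conceded / wickets
    return bqi / bowling_average


def adjusted_bowling_average(runs_conceded, dismissed_averages, benchmark=BENCHMARK):
    """Runs conceded per weighted wicket, each wicket worth (batting avg / benchmark)."""
    weighted_wickets = sum(avg / benchmark for avg in dismissed_averages)
    return runs_conceded / weighted_wickets


# Hypothetical bowler: 100 runs conceded for four wickets.
victims = [47.25, 31.5, 20.0, 27.25]
q = quotient(100, victims)
adj = adjusted_bowling_average(100, victims)

# Both routes should land on the same adjusted average.
print(round(q, 2), round(adj, 2), round(BENCHMARK / q, 2))
```

In this made-up case the bowler's raw average is 25 and the batsmen dismissed average exactly the benchmark between them, so the adjusted average stays at 25 whichever way it is computed.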

I don't know who first came up with the idea of weighting wickets in this way – it was first suggested to me by a friend of mine. Probably various people over the years have thought of it.

Working in the reverse direction (adjusting batsmen's averages) is more difficult, since apart from the last few years, we don't know which bowlers each batsman faced. But we can make a first attempt, by taking the average of the bowlers' averages for each innings, weighting each by the number of overs that they bowled.

To take an example, suppose that in one innings, four bowlers were used:

Bowler A, career average 28, bowls 30 overs.
Bowler B, career average 30, bowls 30 overs.
Bowler C, career average 35, bowls 25 overs.
Bowler D, career average 40, bowls 20 overs.

The "average average" is then (28*30 + 30*30 + 35*25 + 40*20) / (30 + 30 + 25 + 20) = 32.52.

Each batsman's runs for this innings would be multiplied by 31.5 / 32.52, or about 0.97. They'll all be slightly decreased, because the attack is slightly weaker than our benchmark average of 31.5.

(Note: if a bowler never took a wicket, or has an average above 100, then I set that bowler's average at 100. This seems reasonable to me.)
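The worked example, including the cap at 100, can be reproduced with a few lines of Python. This is a rough sketch rather than the code actually used for the tables: the function names are hypothetical, overs are treated as plain decimal numbers, and a bowler with no wickets is represented by `None`.

```python
BENCHMARK = 31.5     # benchmark average from the post
AVERAGE_CAP = 100.0  # wicketless bowlers and averages above 100 are set to this


def attack_average(spells, cap=AVERAGE_CAP):
    """Overs-weighted mean of the bowlers' career averages for one innings.

    spells is a list of (career_average, overs) pairs; career_average is
    None for a bowler who never took a wicket.
    """
    total_overs = sum(overs for _, overs in spells)
    weighted_sum = sum(
        (cap if avg is None else min(avg, cap)) * overs for avg, overs in spells
    )
    return weighted_sum / total_overs


def scale_factor(spells, benchmark=BENCHMARK):
    """Multiplier applied to every batsman's runs in that innings."""
    return benchmark / attack_average(spells)


# Bowlers A to D from the example above.
spells = [(28, 30), (30, 30), (35, 25), (40, 20)]
print(round(attack_average(spells), 2))  # 32.52
print(round(scale_factor(spells), 3))    # 0.969
```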

We do this for all innings, and we get adjusted averages for all batsmen.
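As a sketch of this last step, the career figure might be assembled as below. It assumes the adjusted average is the sum of scaled runs divided by the number of dismissals, mirroring the ordinary batting average; the tuple layout and the function name are made up for illustration.

```python
def adjusted_batting_average(innings, benchmark=31.5):
    """Career adjusted average from per-innings records.

    innings is a list of (runs, dismissed, attack_avg) tuples, where
    attack_avg is the overs-weighted bowling average faced in that innings.
    Assumption: scaled runs are divided by dismissals, as for a raw average.
    """
    adjusted_runs = sum(runs * benchmark / attack for runs, _, attack in innings)
    dismissals = sum(1 for _, dismissed, _ in innings if dismissed)
    if dismissals == 0:
        return None  # average is undefined without a dismissal
    return adjusted_runs / dismissals


# Hypothetical career of three innings: (runs, dismissed?, attack average).
career = [(50, True, 31.5), (100, False, 63.0), (20, True, 21.0)]
print(adjusted_batting_average(career))  # 65.0
```

The raw average for this invented career is 170 / 2 = 85. The unbeaten hundred against a very weak attack is halved, while the 20 against a strong attack is scaled up to 30.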

One useful feature of this method (for both batsmen and bowlers) is that it adjusts across changes in the relative strength of bat and ball (as well as rewarding players who do well against strong opposition). In an era where averages are high (such as today), bowlers are rewarded more for wickets and batsmen less for runs. For players in the low-scoring years before 1900, the reverse is true. Of course, it's possible that in a given era, runs are low because there happen to be a lot of good bowlers and not many good batsmen, and in this case the bowlers are unfairly punished (and batsmen unfairly rewarded). But to my mind the results are better than raw averages.

So onto the results. Qualification: 20 Test innings. Here are the top 21.

```
name            inns  no  runs  avg   adj avg
DG Bradman      80    10  6996  99.9  90.4
GA Headley      40    4   2190  60.8  62.8
MEK Hussey      42    8   2325  68.4  59.4
CL Walcott      74    7   3798  56.7  58.3
ED Weekes       81    5   4455  58.6  55.9
FS Jackson      33    4   1415  48.8  55.4
JB Hobbs        102   7   5410  56.9  55.0
GS Sobers       160   21  8032  57.8  54.6
L Hutton        138   15  6971  56.7  53.8
H Sutcliffe     84    9   4555  60.7  53.6
AD Nourse       62    7   2960  53.8  53.4
KF Barrington   131   15  6806  58.7  52.6
GS Chappell     151   19  7110  53.9  52.3
GE Tyldesley    20    2   990   55.0  52.2
RG Pollock      41    4   2256  61.0  52.0
KS Ranjitsinhji 26    4   989   45.0  50.8
BC Lara         230   6   11912 53.2  50.4
J Ryder         32    5   1394  51.6  50.4
RT Ponting      197   26  9999  58.5  50.3
FMM Worrell     87    9   3860  49.5  49.5
AG Steel        20    3   600   35.3  49.0
```

The modern-day greats are surprisingly low down. That their averages should be heavily reduced is not surprising, since the bat has been very dominant over the ball in the past few years. But they're still further down than I had expected. Perhaps there is some bias in the method, or perhaps we should pay more attention to Neil Harvey when he compares modern players to those of his day.

(There's another possibility worth thinking about, and that is a gradual increase in competitiveness of the sport, so that today there are fewer players on the high and low extremes and more players towards the middle. I don't know how big an effect this would be.)

Here is a list of players from recent years:

```
name            inns  no  runs  avg   adj avg
MEK Hussey      42    8   2325  68.4  59.4
BC Lara         230   6   11912 53.2  50.4
RT Ponting      197   26  9999  58.5  50.3
KP Pietersen    80    3   3890  50.5  48.8
JH Kallis       207   32  9678  55.3  48.3
V Sehwag        100   4   5074  52.9  48.0
Moh'd Yousuf    134   12  6770  55.5  47.8
SR Tendulkar    244   25  11877 54.2  47.4
RS Dravid       214   26  10223 54.4  47.3
A Flower        112   19  4794  51.5  46.5
KC Sangakkara   125   9   6356  54.8  46.5
```

Tendulkar's low position is a bit of a surprise. It's an anomaly that jars with most people's impressions. But remember that averages are not perfect indicators of a batsman's 'true talent' – there's some inherent uncertainty with them.