August 23, 2008

Adjusting averages to account for bowling strengths

A stats piece that calculates effective batting averages by determining the strength of the bowling attack faced

Some of you may recall the quotient of BQI to bowling average discussed in this post. Roughly speaking, the idea is to reward bowlers who take the wickets of better batsmen. In this post, I'll flip the idea round, and reward batsmen who score against better bowling attacks.
Firstly, a digression on Ananth's post. The quotient was defined by averaging the batting averages of the batsmen dismissed by a particular bowler, and then dividing by the bowler's regular average. This is, to my mind, a very useful stat, perhaps the best of its kind for its simplicity (you can of course make it better by making it more complicated in appropriate ways). The only problem is that the numbers you get don't correspond to numbers we're used to in following cricket. How good is a 1.2 bowler? A 0.9 bowler?

Happily, there's an interpretation of this stat that puts the numbers on a scale we're familiar with. It's equivalent to the usual average (runs conceded divided by number of wickets taken), with each wicket weighted by the average of the batsman dismissed. You can set a 'benchmark' average (its value is arbitrary), and I'll set it at 31.5. Dismissing a batsman who averages 31.5 is worth 1 wicket. Dismissing a batsman who averages 47.25 is worth 1.5 wickets. A quotient Q is then equivalent to an average of 31.5 / Q. So, a bowler with a quotient of 1.2 has an 'adjusted average' of 31.5 / 1.2 = 26.25. This is the sort of number we're used to thinking about with bowlers' averages.
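
The equivalence can be checked with a short sketch. The function name and the sample bowler below are illustrative assumptions, not figures from the data; the quotient is taken here as the mean batting average of the batsmen dismissed divided by the bowler's conventional average, which is the reading under which "31.5 / Q" comes out right.

```python
BENCHMARK = 31.5  # benchmark average chosen in the post; its value is arbitrary


def adjusted_bowling_average(runs_conceded, victim_averages, benchmark=BENCHMARK):
    """Runs conceded per weighted wicket.

    Each wicket counts as (batsman's average / benchmark) wickets, so a
    batsman averaging 47.25 is worth 1.5 wickets when the benchmark is 31.5.
    """
    weighted_wickets = sum(avg / benchmark for avg in victim_averages)
    return runs_conceded / weighted_wickets


# Hypothetical bowler: 4 wickets for 100 runs (conventional average 25.0).
victims = [47.25, 31.5, 31.5, 15.75]  # made-up batting averages of the victims
runs = 100

adj = adjusted_bowling_average(runs, victims)

# The same number via the quotient Q = (mean victim average) / (bowling average).
bowling_average = runs / len(victims)
quotient = (sum(victims) / len(victims)) / bowling_average

print(round(adj, 2), round(BENCHMARK / quotient, 2))  # 25.0 25.0
```

In this made-up case the victims average exactly 31.5 between them, so the adjusted average equals the conventional one; dismissing better batsmen for the same runs would pull it below 25.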

I don't know who first came up with the idea of weighting wickets in this way – it was first suggested to me by a friend of mine. Probably various people over the years have thought of it.

Working in the reverse direction (adjusting batsmen's averages) is more difficult, since apart from the last few years, we don't know which bowlers each batsman faced. But we can make a first attempt, by taking the average of the bowlers' averages for each innings, weighting each by the number of overs that they bowled.

To take an example, suppose that in one innings, four bowlers were used:

Bowler A, career average 28, bowls 30 overs.
Bowler B, career average 30, bowls 30 overs.
Bowler C, career average 35, bowls 25 overs.
Bowler D, career average 40, bowls 20 overs.

The "average average" is then (28*30 + 30*30 + 35*25 + 40*20) / (30 + 30 + 25 + 20) = 32.52.

Each batsman's runs for this innings would be multiplied by 31.5 / 32.52 – they'll all be slightly decreased, because the attack is slightly weaker than our benchmark average of 31.5.

(Note: if a bowler never took a wicket, or has an average above 100, then I set that bowler's average at 100. This seems reasonable to me.)
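
As a rough sketch of the per-innings calculation, including the cap at 100: the function names, the use of None for a bowler with no career wickets, and the assumption that overs are given as plain numbers (a part-over would need converting to a fraction first) are all mine, chosen for illustration.

```python
BENCHMARK = 31.5     # benchmark average from the post
AVERAGE_CAP = 100.0  # bowlers with no wickets, or averaging over 100, count as 100


def attack_average(spells):
    """Overs-weighted mean of the bowlers' career averages.

    `spells` is a list of (career_average, overs) pairs. A career average
    of None stands for a bowler who never took a wicket.
    """
    total_overs = sum(overs for _, overs in spells)
    weighted = sum(
        (AVERAGE_CAP if avg is None else min(avg, AVERAGE_CAP)) * overs
        for avg, overs in spells
    )
    return weighted / total_overs


def adjust_runs(runs, spells, benchmark=BENCHMARK):
    """Scale a batsman's runs for one innings by benchmark / attack average."""
    return runs * benchmark / attack_average(spells)


# The four-bowler example from the text: (career average, overs bowled).
innings = [(28, 30), (30, 30), (35, 25), (40, 20)]

print(round(attack_average(innings), 2))    # 32.52
print(round(adjust_runs(100, innings), 1))  # 96.9
```

So a hundred against this attack is worth a little under 97 adjusted runs.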

We do this for all innings, and we get adjusted averages for all batsmen.
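
Putting the innings together might look like the sketch below. It assumes each innings has already been reduced to the batsman's runs, whether he was dismissed, and the overs-weighted attack average for that innings; the tuple layout and the sample career are hypothetical. The average is adjusted runs divided by dismissals, as with a conventional average.

```python
BENCHMARK = 31.5  # benchmark average from the post


def adjusted_batting_average(innings, benchmark=BENCHMARK):
    """Career adjusted average: adjusted runs divided by dismissals.

    `innings` is a list of (runs, dismissed, attack_average) tuples, where
    attack_average is the overs-weighted average of the bowlers faced.
    """
    adjusted_runs = sum(runs * benchmark / attack for runs, _, attack in innings)
    dismissals = sum(1 for _, dismissed, _ in innings if dismissed)
    if dismissals == 0:
        raise ValueError("average is undefined for a batsman never dismissed")
    return adjusted_runs / dismissals


# Hypothetical three-innings career: (runs, dismissed?, attack average).
career = [
    (50, True, 31.5),    # benchmark attack: runs unchanged
    (100, False, 42.0),  # weak attack: worth 75 adjusted runs, not out
    (20, True, 25.2),    # strong attack: worth 25 adjusted runs
]

raw = sum(runs for runs, _, _ in career) / 2  # 170 runs, 2 dismissals
print(round(raw, 1), round(adjusted_batting_average(career), 1))  # 85.0 75.0
```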

One useful feature of this method (for both batsmen and bowlers) is that it adjusts across changes in the relative strength of bat and ball (as well as rewarding players who do well against strong opposition). In an era where averages are high (such as today), bowlers are rewarded more for wickets and batsmen less for runs. For players in the low-scoring years before 1900, the reverse is true. Of course, it's possible that in a given era, runs are low because there happen to be a lot of good bowlers and not many good batsmen, and in this case the bowlers are unfairly punished (and batsmen unfairly rewarded). But to my mind the results are better than raw averages.

So, on to the results. Qualification: 20 Test innings. Here are the top 21.

name            inns  no  runs  avg   adj avg
DG Bradman      80    10  6996  99.9  90.4
GA Headley      40    4   2190  60.8  62.8
MEK Hussey      42    8   2325  68.4  59.4
CL Walcott      74    7   3798  56.7  58.3
ED Weekes       81    5   4455  58.6  55.9
FS Jackson      33    4   1415  48.8  55.4
JB Hobbs        102   7   5410  56.9  55.0
GS Sobers       160   21  8032  57.8  54.6
L Hutton        138   15  6971  56.7  53.8
H Sutcliffe     84    9   4555  60.7  53.6
AD Nourse       62    7   2960  53.8  53.4
KF Barrington   131   15  6806  58.7  52.6
GS Chappell     151   19  7110  53.9  52.3
GE Tyldesley    20    2   990   55.0  52.2
RG Pollock      41    4   2256  61.0  52.0
KS Ranjitsinhji 26    4   989   45.0  50.8
BC Lara         230   6   11912 53.2  50.4
J Ryder         32    5   1394  51.6  50.4
RT Ponting      197   26  9999  58.5  50.3
FMM Worrell     87    9   3860  49.5  49.5
AG Steel        20    3   600   35.3  49.0

The modern-day greats are surprisingly low down. That their averages should be heavily reduced is not surprising, since the bat has been very dominant over the ball in the past few years. But they're still further down than I had expected. Perhaps there is some bias in the method, or perhaps we should pay more attention to Neil Harvey when he compares modern players to those of his day.

(There's another possibility worth thinking about, and that is a gradual increase in competitiveness of the sport, so that today there are fewer players on the high and low extremes and more players towards the middle. I don't know how big an effect this would be.)

Here is a list of players from recent years:

name            inns  no  runs  avg   adj avg
MEK Hussey      42    8   2325  68.4  59.4
BC Lara         230   6   11912 53.2  50.4
RT Ponting      197   26  9999  58.5  50.3
KP Pietersen    80    3   3890  50.5  48.8
JH Kallis       207   32  9678  55.3  48.3
V Sehwag        100   4   5074  52.9  48.0
Moh'd Yousuf    134   12  6770  55.5  47.8
SR Tendulkar    244   25  11877 54.2  47.4
RS Dravid       214   26  10223 54.4  47.3
A Flower        112   19  4794  51.5  46.5
KC Sangakkara   125   9   6356  54.8  46.5

Tendulkar's low position is a bit of a surprise. It's an anomaly that jars with most people's impressions. But remember that averages are not perfect indicators of a batsman's 'true talent' – there's some inherent uncertainty with them.

For a full list of batsmen (with an adjusted average of at least 25), click here.

Some further comments:

- Opening batsmen face the opening bowlers disproportionately often, and this isn't taken into account.

- The conditions or characteristics of the batsmen on a given day can change the effectiveness of the bowlers, and the captain would use his bowlers accordingly. So the simple weighting by career average is not a perfect reflection of the overall skill of the attack. But in the long run the above method should be pretty close.

- There's no allowance for ground or pitch conditions, etc.

- I've ignored not-outs. This is worthy of a post of its own, but not-outs don't affect averages much in Test cricket.

- I've used career averages of the bowlers, mainly because it's easy to do. Career-to-date averages can be unstable. It would be reasonable to add a correction factor for the experience of each bowler. But while I've done a small amount of work in this area, I don't have enough results for it to be usable.

Comments have now been closed for this article

  • testli5504537 on September 8, 2008, 15:38 GMT

    Lara scores some 25 more runs on average than Sachin in all 99+ scores if * is excluded. He scores at a 10 higher strike rate on average in these innings too. All agreed. But I have 1 point which by instinct prompts me to place Lara a touch below Sachin. Lara didn't have a 100 to show vs Akr-Wqr, Don-Pol, Kum-'x' in India. And these 5 bowlers, along with Mc-gra-Warne, Amb-Wal, Murali-Vas, to some extent formed the bowling force in the era when these 2 bats played. Others such as Lee in his early days, Gillespie, Srinath, Whitney, Zoysa, Saqlain, Raju etc. can at most be classified as good bowlers. So to me it is a very vital factor to have as many big knocks as possible against this bowling force. Here Sachin far outweighs Lara. Only when Akram, Waqar & Donald retired did Lara start scoring heavily against their respective nations. Sachin, due to his better technique, had more varied 100s against this bowling force. Lara used to give up when the going was tough at both ends. Sachin found ways to overcome.

  • testli5504537 on September 3, 2008, 6:10 GMT

    @Barry "not-outs don't affect averages much in Test cricket." Hmm, that's a very subjective statement. I don't see any valid reason how you can presume something like that. Can we get an analysis on that? After all, this is statistical analysis :-) You can probably compare the positions of modern stalwarts like Lara, Sachin, Richards and Ponting after considering the not-outs.

  • testli5504537 on September 2, 2008, 19:18 GMT

    Dear "John", Your spelling and grammar both suggest that English is not your first language, so stop pretending. Btw, no comments on why Sehwag is ranked higher than Tendulkar?

  • testli5504537 on September 2, 2008, 15:29 GMT

    Scoring most of your runs at home carries less weight than scoring runs away from home; Tendulkar averages more than all the modern players away from home. Hussey? What class bowlers has he scored against? Half an English attack? Pitch conditions have to be taken into account: in swing-conducive conditions Tendulkar averages more than Lara. Who has a higher average against the world's best team? What does Tendulkar scoring hundreds have to do with what the rest of the team does? If anything it should carry more weight. Lara is great, no doubt, but as an Englishman I've seen Lara do nothing against the top teams until the series is lost, then he pulls out a great pointless innings. Lara was nowhere near as consistent as Tendulkar; barring the last couple of seasons, he had some awesome series and a lot of really poor ones. And a lot of those so-called winning hundreds came when he had Ambrose, Walsh and co; Tendulkar had Srinath and Prasad. And read the stats of Sachin anyone negative

  • testli5504537 on September 1, 2008, 23:24 GMT

    Alim, no no, the title's not misleading. My bullet point near the end of the main post is misleading. I should probably work out how to edit it. In the meantime, there's only us three still here, so the damage is done now.

    If a batsman is left not out, I treat it as a not-out. That is, I calculate the average by runs / dismissals, just as is usually done.

    Now, a lot of people think that averages are inflated by not-outs (ie, a batsman with a lot of not-outs has an average that is too high). In Test cricket, that is generally false. A batsman with a lot of not-outs has an average about what it should be, just like a batsman with few not-outs.

    Not-outs *do* have some effect on averages, since we don't know how many more runs they would have scored if they'd been able to keep batting until dismissed. But this effect is small.

    I have ignored this effect of not-outs on averages.

  • testli5504537 on September 1, 2008, 20:24 GMT

    David, old chap, you might be on to something here. Why don't you try it out to see what individual players' highest scores REALLY would be, against middle-of-the-road bowling? (Sobers didn't have much of a change, but perhaps other batsmen may be humbled.)

  • testli5504537 on September 1, 2008, 13:57 GMT

    Dear David, Regret the umpteenth post. But I think you should perhaps change the title of the blog from "...averages..." to "...rpis (or some such)...". The current title is extremely misleading. Talk about reading the fine print.

    Also things seem to be getting a bit complex for simple cricket fans like me. Just a few points: 1) So "ignoring not outs" apparently means that if a batsman is actually not out you have taken it as out?! 2) Again this is effectively penalizing the batsman. So the good old-fashioned attritional Test match innings wherein staying at the crease is paramount, say in saving a Test match or avoiding a collapse, may not count for as much as a little cameo or blitz. 3) These hard Test match innings may involve scoring, say, just a few runs in spite of spending literally hours at the crease. So the batsman would fare better in the above table if he instead just came in and simply whacked it around for a short while. What?

  • testli5504537 on September 1, 2008, 9:49 GMT

    Riverlime, Hayden's 380 got adjusted down to a 267 (using the 31.5 scale). The highest individual adjusted scores:

    Sobers 365* --> 353*
    Hanif Mohammad 337 --> 339
    Lara 375 --> 337
    Sehwag 319 --> 335
    Foster 287 --> 323
    Jayawardene 374 --> 315
    Bradman 334 --> 306
    Rowe 302 --> 299
    Ponsford 266 --> 297
    Gooch 333 --> 294

  • testli5504537 on September 1, 2008, 6:44 GMT

    Alim (and Anonymous), I apologise for the poor wording of my post. I meant that I ignored any effect of not-outs - I didn't throw those innings away. I just calculated the average in the usual way, runs (or adjusted runs) / dismissals.

    If batsmen were able to complete their not-out innings (ie, keep batting until dismissed), their averages would, on average, go up a bit. But the difference is small, and not worth bothering about.

  • testli5504537 on September 1, 2008, 6:30 GMT

    @david, the anonymous person above has a point. Never mind the "superior batting attacks", if you "ignore" not outs the batsmen with more not outs naturally suffer more, regardless of superior, or inferior, or whatever bowling attacks. Basic math. @riverlime, thank you for the applause, but you seem to forget that a "counterpoint" was being offered to the "point" raised firstly by you folks, such as the incessant harping about rpis. Naturally the batsman with fewer not outs will have a real avg closer to his rpi. A batsman with no not outs will have the same avg as the mentioned earlier... elementary math.
