May 29, 2014

The calculus of the batting average

The batting average is a misleading indicator. How about a scale that is more representative of batting performance?

In the 2003-04 season, Sachin Tendulkar averaged 54.91 in 15 innings. But his score for the period is 28 © Getty Images

The batting average in cricket is the number of runs made by a player per dismissal. Not-outs inflate averages, because a not-out innings adds runs to the numerator without adding a dismissal to the denominator. To get a sense of the extent of this problem, compare the records of Brian Lara and Steve Waugh. The former had six not-outs in 232 Test innings. The latter had 46 in 260. Lara batted most frequently at No. 4, and had one not-out in 148 innings at that position. Waugh batted most frequently at No. 5, and had 22 not-outs in 142 innings there. This makes it close to meaningless to compare the batting averages of Waugh and Lara. Yet such comparisons are routinely made between Test batsmen.
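A minimal sketch of how not-outs move the average, using invented innings rather than any real record. The function names and the sample scores are illustrative assumptions; only the definition (runs divided by dismissals) comes from the text.

```python
from typing import List, Tuple

# Each innings is (runs, not_out). The data below is invented for illustration.
Innings = Tuple[int, bool]


def batting_average(innings: List[Innings]) -> float:
    """Runs per dismissal: a not-out innings adds runs but no dismissal."""
    runs = sum(r for r, _ in innings)
    dismissals = sum(1 for _, not_out in innings if not not_out)
    if dismissals == 0:
        raise ValueError("average is undefined without a dismissal")
    return runs / dismissals


def runs_per_innings(innings: List[Innings]) -> float:
    """Runs divided by innings played, whether or not the batsman was out."""
    return sum(r for r, _ in innings) / len(innings)


# Two hypothetical batsmen with identical scores; only the not-outs differ.
always_out = [(40, False), (10, False), (25, False), (5, False)]
often_not_out = [(40, True), (10, False), (25, True), (5, False)]

print(batting_average(always_out))      # 20.0
print(batting_average(often_not_out))   # 40.0
print(runs_per_innings(always_out))     # 20.0
print(runs_per_innings(often_not_out))  # 20.0
```

The same 80 runs in four innings give an average of 20 or 40 depending only on how often the batsman was left not out, while runs per innings is unchanged.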

© Kartikeya Date

Team success also affects the likelihood of not-outs. In wins, batsmen remain not out about 15% of the time. In defeats, this drops to about 9%. To some extent, this explains why Lara remained not out at a rate well below the average for a No. 4, while Waugh remained not out at a rate well above the average for a No. 5. At the very least, the fact that not-out rates differ by batting position illustrates the limits of comparing an opener's batting average to that of a middle-order batsman. A look at batting in Test cricket by batting position shows that the problem is systematic.

The average also does not indicate how many runs a player is likely to score in a given innings. The median innings in Test cricket produces 13 runs. The median Test innings for Lara is 34, while for Waugh it is 26. As a descriptor of a batsman's record, the average is a limited measure.

I propose a different measure for a batsman's quality and consistency. This measure is in the form of a scale.

Consider the first 100 runs made by a batsman. The chart above shows the probability of a batsman (with at least 2000 career runs) reaching any score from 1 to 100. Two hundred and seventy-five batsmen have scored at least 2000 Test runs, from Dilip Sardesai with 2001 to Sachin Tendulkar with 15,921. These batsmen have played a combined 32,409 innings. They score a century nine times in 100 innings. They reach double figures 71 times out of 100. A batsman's batting score is simply the area under his own version of this curve, which means that each innings counts for at most 100 runs.
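A rough sketch of the calculation, under the assumption that "area under the curve" means adding up the probability of reaching each score from 1 to 100. The function names and the sample innings are hypothetical. Under that reading, the area equals the mean of the innings scores once each is capped at 100, which the second function checks.

```python
from typing import List


def reach_probabilities(scores: List[int], cap: int = 100) -> List[float]:
    """Fraction of innings that reached at least s runs, for s = 1..cap."""
    n = len(scores)
    return [sum(1 for r in scores if r >= s) / n for s in range(1, cap + 1)]


def score_from_curve(scores: List[int], cap: int = 100) -> float:
    """Area under the reach-probability curve, one unit of width per run."""
    return sum(reach_probabilities(scores, cap))


def score_from_capped_mean(scores: List[int], cap: int = 100) -> float:
    """Mean runs per innings with every innings capped at `cap`."""
    return sum(min(r, cap) for r in scores) / len(scores)


# Invented innings, not any real batsman's record.
sample = [0, 4, 13, 13, 27, 52, 88, 120, 203, 7]

print(round(score_from_curve(sample), 2))  # 40.4
print(score_from_capped_mean(sample))      # 40.4
```

Not-outs play no part here: the score depends only on how many runs each innings produced.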

Here is a simple example to illustrate the difference between the score and the batting average. In the 2003-04 season, Tendulkar made 659 runs at 54.91 in nine Tests and 15 innings. The distribution of scores over 15 innings was peculiar, though: 495 of the 659 runs came in three innings, each not out. Tendulkar's median score over those 15 innings was 8. In these 15 innings, Tendulkar reached 100 twice, 60 thrice, 37 six times, 8 eight times, and 1 on 13 occasions. His score over those 15 innings is 28.
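The figures above can be cross-checked with a short sketch. The list of 15 innings is an assumption: it is not printed in the article and should be verified against the scorecards. It does reproduce every number quoted above. The score is computed by capping each innings at 100 and dividing by the number of innings, which is one reading of the area under the curve.

```python
from statistics import median

# (runs, not_out) for the 15 innings. ASSUMPTION: this list is not given in
# the article; check it against the scorecards before relying on it.
innings = [
    (8, False), (7, False), (55, False), (1, False), (0, False),
    (1, False), (37, False), (0, False), (44, False), (241, True),
    (60, True), (194, True), (2, False), (8, False), (1, False),
]

runs = [r for r, _ in innings]
total = sum(runs)
dismissals = sum(1 for _, not_out in innings if not not_out)

average = total / dismissals                      # runs per dismissal
score = sum(min(r, 100) for r in runs) / len(runs)  # capped runs per innings

# How many innings reached each of the milestones quoted in the text.
reached = {s: sum(1 for r in runs if r >= s) for s in (100, 60, 37, 8, 1)}

print(total, dismissals)   # 659 12
print(round(average, 2))   # 54.92 (the article quotes the truncated 54.91)
print(median(runs))        # 8
print(reached)             # {100: 2, 60: 3, 37: 6, 8: 8, 1: 13}
print(round(score, 2))     # 28.27
```

The two centuries contribute 200 runs to the score rather than their full value, which is why the score sits so far below the average.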

The score provides a more representative picture of Tendulkar's performance that season than his batting average does. To get a sense of how low a score of 28 is, consider that the collective batting average of the 275 batsmen who made at least 2000 Test runs is 41, while their collective score is 34.

It is a matter of some surprise that this method of measurement has not been used yet in cricket. Even the basic idea of measuring runs per innings, instead of runs per dismissal, has gained little currency. The method described in this post could be extended by using a different upper limit, say 150 or 200, or even 400 (the current highest Test score). With an upper limit of 400 it would simply provide the runs per innings (total runs divided by total innings).

I think it is a bad idea to use an upper limit greater than 100 if the goal is to measure consistency of contributions. A single innings of 400 not out can only win or save one Test match, while eight innings of 50 might help your team compete in four or more Test matches. The series in which Lara made his 400 not out (series batting average 83.33), and the one in which he made his 375 (series batting average 99.75) illustrate this point. In the former, his score is 29, in the latter, it is 57. I would argue that the score captures Lara's performance in each series better than the batting average does.
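The effect of the upper limit can be sketched with the two cases mentioned above. The seven ducks that accompany the 400 are an assumption, added so that both hypothetical batsmen have 400 runs from eight innings. The function name `capped_score` is illustrative.

```python
from typing import List


def capped_score(scores: List[int], cap: int = 100) -> float:
    """Mean runs per innings with every innings capped at `cap`."""
    return sum(min(r, cap) for r in scores) / len(scores)


# Hypothetical batsmen: one huge innings versus eight steady ones.
one_big_innings = [400, 0, 0, 0, 0, 0, 0, 0]
eight_fifties = [50] * 8

for cap in (100, 150, 200, 400):
    print(cap, capped_score(one_big_innings, cap), capped_score(eight_fifties, cap))
# 100 12.5 50.0
# 150 18.75 50.0
# 200 25.0 50.0
# 400 50.0 50.0
```

With a limit of 100 the steady batsman scores four times as high. At a limit of 400 the measure becomes plain runs per innings and the two are indistinguishable.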

In summary, I prefer the use of the score over the batting average, because the former accounts for events that occur frequently, while the latter is disproportionately affected by events that occur only rarely. As an illustration of the power of this measure, consider all the batsmen with a score of 41. There are 13 such players and they range from Wally Hammond (average 58.45) to Rohan Kanhai (average 47.53). Kanhai and Hammond were basically equally consistent in Test cricket.

Note: In the tables that follow, each score and average is rounded to the nearest integer.

Kartikeya Date writes at A Cricketing View

  • John on May 31, 2014, 10:46 GMT

    After reading the numerous replies, is this an attempt to get recently retired players into the top twelve of Test match batting averages, something they have not attained through their own efforts in the middle? Consider that 5 of the top 12 had their careers interrupted by World Wars and 1 was banned from Test match cricket through no fault of his own. If there is to be a change in the system, will all Test match batting averages from 1877 be deleted, so that we all start again under another system which will place batsmen in some sort of order and which would have to factor in batting position, opposition bowlers, pitch conditions etc.?

  • K. on May 31, 2014, 10:05 GMT

    Another factor is that most very big scores (triples and 250+) are scored by batsmen batting in positions 1-3, or otherwise by later-order batsmen coming in early. This is because these scores require flat pitches. Once a huge score is on the board, the batsmen following do not have the luxury of posting their own huge scores, or they get out trying to score quickly. So though the lower-order batsmen may remain not out more often than the batsmen in positions 1-3, they lose out on the chance of posting huge scores in good batting conditions.

  • K. on May 31, 2014, 9:37 GMT

    @Hammond- Bradman would stand out any which way. One of a kind. We are in general dealing with the "rest".

  • Geoffrey on May 31, 2014, 7:00 GMT

    @CricFan24- it does actually. I think I read that Bradman's career average dropped briefly to 70 after Bodyline, then went back up to over 100 before dropping to 99.94 over a 20-year career. But of course, Sachin's average of consistently just over 50 is still superior.

  • K. on May 31, 2014, 4:45 GMT

    So - Tendulkar avg. 50+ for around 20 years. Kallis for 11 years. Lara for around 12, given 2 or 3 years sub-50 after 1992.

    Effectively Tendulkar was a Great batsman for almost TWICE as long- and much of it in the tougher period before the mid 2000s.

    Kind of puts things in perspective.

  • K. on May 31, 2014, 4:45 GMT

    To borrow from another blog - when you look at the amount of time spent in international cricket with an average of 50+, this is what you get:

    1) Tendulkar first averaged 50 in his 29th match in Jan 1994. Thereafter, except for a short while in the "49 point somethings" in 1996, he averaged in the 50s till retirement.

    2) Kallis first hit an avg. of 50 in his 63rd match in Nov 2002. (At the time Tendulkar was rated 2nd best batsman of all time behind the Don by Wisden.) He too had a short dip into the "49 point somethings" thereafter. But for the most part Kallis maintained an avg. of 50+ till retirement.

    3) Lara hit an avg. of 50 in just his 5th match thanks to his 277. Lara then had a couple of dips below 50 in the 1990s and 2000s.

  • K. on May 31, 2014, 4:42 GMT

    @harshthakor- This would make the batsman in a weaker team automatically appear better, and the result would then depend more on the other batsmen. Also, this thing about a good batsman in a weak team ending up with poorer stats is a complete myth. SRT does better through the 1990s when India were generally a pathetic team. Lara's best consecutive years statistically are from 2003-07 when Windies were at their worst. Chanderpaul does much better after Lara retired etc.

  • K. on May 31, 2014, 4:39 GMT

    contd.. So- the central problem is not really the calculation of the average. It is the use of average to the exclusion of almost all other factors. As someone mentioned- one of the ways to normalize this would be to use a peer adjusted ratio for the duration of a batsman's career. This would normalize to a large extent pitches, batting conditions, bowling attacks etc.

  • K. on May 31, 2014, 4:39 GMT

    Kartikeya- I agree that big scores completely distort the "average". In fact SRT's 2003-07 injury-ridden horror patch was only made to look half decent due to a few big scores and runs against Ban. Many people have pointed this out. However, the way the average treats not-outs is for the most part correct. We cannot "complete" an innings (as RPI and other measures do) when a batsman is actually not out. As mentioned, the moment a top batsman gets past around 25 a big score is the norm. The problem is that the average is used almost exclusively to measure "quality" - as the headline of your article insinuates. This again warps the entire issue. For example, something as minor as a better-timed retirement date would then change the "quality" of a batsman/career. If SRT had retired after the 2011 WC with an average of 57, instead of hanging around, would he have been considered a better "quality" batsman?


  • Ganesh on May 31, 2014, 4:25 GMT

    Cricket is a team sport. Any batting innings that contributes in a major way to a team's victory is the one that matters. For example, even a crucial 30 scored in a second innings (let's say on a crumbling final-day pitch) in a low-scoring encounter can be very valuable to the team. On the other hand, even a hundred, depending on the manner in which it is scored, can lose a game for your team. Yes, I agree that it's meaningless to compare batting averages, as they do not tell the whole story. Neither do 'the score' comparisons matter much in the same regard.