# The new, improved batting average

The batting average is a simple and convenient way of putting a number to a player’s ability with the bat, but often it doesn’t give the entire picture. One major problem with the conventional average – which is calculated by dividing the total number of runs scored by the number of completed innings – is the way it deals with not-outs. Consider the stats for two of the greatest batsmen in the modern era:

Batsman | Tests | Innings | Not-outs | Runs | Average | Runs per Test |
---|---|---|---|---|---|---|
Brian Lara | 131 | 232 | 6 | 11,953 | 52.89 | 91.2 |
Sachin Tendulkar | 141 | 228 | 24 | 11,207 | 54.94 | 79.5 |

Lara has scored nearly 750 more runs in ten fewer Tests than Tendulkar, and his runs per Test is nearly 12 higher. Yet his average is about two runs lower than Tendulkar's, primarily because of the number of not-outs that Tendulkar has had. This may be partly because of the way Lara played, almost always in attacking mode. It may also be because Tendulkar bats lower in the order: his Batting Position Index (the average position at which a batsman has batted) is 4.30 against Lara's 3.78, which probably gives him a slightly higher chance of remaining not out.
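
As a quick check on the figures in the table, the conventional average and the runs-per-Test figure can be recomputed from the raw columns. This is only a minimal sketch; the function names are illustrative and not part of the original analysis.

```python
def batting_average(runs: int, innings: int, not_outs: int) -> float:
    """Conventional average: runs divided by completed (dismissed) innings."""
    return runs / (innings - not_outs)


def runs_per_test(runs: int, tests: int) -> float:
    """Aggregate runs divided by the number of Tests played."""
    return runs / tests


# Figures copied from the table above.
players = {
    "Brian Lara": {"tests": 131, "innings": 232, "not_outs": 6, "runs": 11953},
    "Sachin Tendulkar": {"tests": 141, "innings": 228, "not_outs": 24, "runs": 11207},
}

for name, s in players.items():
    avg = batting_average(s["runs"], s["innings"], s["not_outs"])
    rpt = runs_per_test(s["runs"], s["tests"])
    print(f"{name}: average {avg:.2f}, runs per Test {rpt:.1f}")
```

The 24 not-outs shrink Tendulkar's divisor from 228 to 204, which is what lifts his average above Lara's despite the lower aggregate.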

I’ve developed a new measure, which I’ve named the extended batting average, that offers a solution to the problem created by the not-outs in the batting average. It is determined by allowing a batsman to complete his not-out innings in the fourth dimension, so to speak, and then dividing the new total of runs (current aggregate plus the additional runs deemed to have been scored) by the total number of innings played. This gives a fairer measure of a batsman's average.

The extension of an innings is done in a logical manner taking into account the batsman's form at the time he played the not-out innings. During the first 10 innings of his career, when an insufficient number of innings have been played to have a handle on his form, his not-out innings will be extended by his OBA (Out Bat Average, derived by dividing total number of runs in completed innings by the number of completed innings).

Afterwards, recent form takes over. The not-out innings is extended by a rolling innings average of his last 10 played innings. In this case even the not-outs are included so that a big not-out innings, indicating very good current form, is not ignored. Of course, a batsman might remain not-out on 10 and this will lower his recent form computation. However, that is more acceptable than ignoring an unbeaten 200.

Two examples illustrate this concept. Kumar Sangakkara, in the greatest form currently, has scored 984 runs in his last 10 innings at an innings average of 98.4. If he remains not out with, say, 32 in the next innings, it is fair to assume that he would extend his innings by another 98 runs, to 130, considering his outstanding form. A similar situation exists with Mohammad Yousuf and Kallis.

On the other hand, Sehwag is in the most wretched form of his career, having scored 189 runs in his last 10 innings at an innings average of 18.9. If he remained not out on 32, it is reasonable to expect that his innings would be extended by only 19 runs, to 51.
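
The two examples can be reproduced with a few lines of arithmetic. The sketch below assumes the extension is the last-10-innings average rounded to the nearest run, which is consistent with the 98 and 19 quoted above; the function name is hypothetical.

```python
def extend_not_out(score: int, runs_in_last_ten: int, window: int = 10) -> int:
    """Extend a not-out score by the rolling innings average (recent form)."""
    form = runs_in_last_ten / window   # e.g. 984 / 10 = 98.4
    return score + round(form)         # assumption: round to the nearest run


print(extend_not_out(32, 984))  # Sangakkara: 32 + 98 = 130
print(extend_not_out(32, 189))  # Sehwag: 32 + 19 = 51
```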

This is applied to each and every innings played by all the batsmen. Care is taken to ensure that the adjusted innings total does not exceed the batsman’s highest score. In other words, Lara's 375 will not be allowed to go past 400. However if the highest score by a batsman is a not-out innings, for example Lara's 400 not out and Tendulkar's unbeaten 248, that specific innings will be allowed to be extended. This, I think, is common sense.

Now the new total aggregate of runs is divided, this time with justification, by the total number of innings played.

Since this is a clear "what if", imagination-driven computation, practical factors such as the match getting over, the innings getting over, or a batsman running out of partners etc are ignored.
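
The procedure described above can be sketched in code for a single career. This is an illustration under stated assumptions, not the author's actual program: the `Innings` class and function name are hypothetical, the early-career OBA is taken over the completed innings played so far, the rolling average uses the runs actually scored (not previously extended totals), and the cap is the batsman's career-best score.

```python
from dataclasses import dataclass


@dataclass
class Innings:
    runs: int
    not_out: bool = False


def extended_batting_average(career: list[Innings], window: int = 10) -> float:
    """Sketch of the extended batting average (EBA) for one career."""
    if not career:
        raise ValueError("career must contain at least one innings")

    highest = max(inn.runs for inn in career)  # assumption: career-best score
    total = 0.0
    for i, inn in enumerate(career):
        runs = float(inn.runs)
        if inn.not_out:
            earlier = career[:i]
            if len(earlier) < window:
                # Early career: extend by the Out Bat Average so far.
                outs = [p.runs for p in earlier if not p.not_out]
                extension = sum(outs) / len(outs) if outs else 0.0
            else:
                # Later: extend by the mean of the previous `window` innings,
                # not-outs included, using the runs actually scored.
                extension = sum(p.runs for p in earlier[-window:]) / window
            runs += extension
            # Cap at the highest score, unless this innings is itself the highest.
            if inn.runs < highest:
                runs = min(runs, highest)
        total += runs
    # Divide by all innings played, not just completed ones.
    return total / len(career)


# Made-up career: a 200, nine scores of 40, then 30 not out.
career = [Innings(200)] + [Innings(40) for _ in range(9)] + [Innings(30, not_out=True)]
print(round(extended_batting_average(career), 2))  # 58.73
```

In the made-up career, the 30 not out is extended by the rolling average of 56 to 86, giving 646 runs over 11 innings.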

This is no mean task, and there is no way it can be done manually, since the "current form" computation has to be done for each and every innings played by a batsman.

The table for the top 25 batsmen (criterion: 1500 Test runs), in order of extended batting average, is shown below. The figures are current up to the Delhi Test between India and Pakistan.

Batsman | Tests | Innings | Not-outs | Runs | Average |
---|---|---|---|---|---|
Don Bradman | 52 | 80 | 10 | 6996 | 99.94 |
Michael Hussey | 18 | 29 | 7 | 1896 | 86.16 |
George Headley | 22 | 40 | 4 | 2190 | 60.83 |
Herbert Sutcliffe | 54 | 84 | 9 | 4555 | 60.73 |
Graeme Pollock | 23 | 41 | 4 | 2256 | 60.97 |
Everton Weekes | 48 | 81 | 5 | 4455 | 58.62 |
Ricky Ponting | 112 | 186 | 26 | 9504 | 59.40 |
Wally Hammond | 85 | 140 | 16 | 7249 | 58.46 |
Garry Sobers | 93 | 160 | 21 | 8032 | 57.78 |
Ken Barrington | 82 | 131 | 15 | 6806 | 58.67 |
Eddie Paynter | 20 | 31 | 5 | 1540 | 59.23 |
Jack Hobbs | 61 | 102 | 7 | 5410 | 56.95 |
Jacques Kallis | 111 | 189 | 31 | 9197 | 58.21 |
Len Hutton | 79 | 138 | 15 | 6971 | 56.67 |
Kumar Sangakkara | 68 | 112 | 9 | 5741 | 55.74 |
Clyde Walcott | 44 | 74 | 7 | 3798 | 56.69 |
Rahul Dravid | 113 | 193 | 23 | 9564 | 56.26 |
Mohammad Yousuf | 77 | 130 | 10 | 6686 | 55.72 |
Sachin Tendulkar | 141 | 228 | 24 | 11,207 | 54.94 |
Dudley Nourse | 34 | 62 | 7 | 2960 | 53.82 |
Brian Lara | 131 | 232 | 6 | 11,953 | 52.89 |
Kevin Pietersen | 30 | 57 | 2 | 2898 | 52.69 |
Greg Chappell | 87 | 151 | 19 | 7110 | 53.86 |
Matthew Hayden | 91 | 162 | 13 | 7833 | 52.57 |
Javed Miandad | 124 | 189 | 21 | 8832 | 52.57 |

Now let’s apply the adjustments related to not-out innings, and then have a relook at the averages.

Batsman | ORuns | NRuns | ARuns | TRuns | EBA | EBA as % of average | Runs in last 10 innings |
---|---|---|---|---|---|---|---|
Don Bradman | 5868 | 1128 | 829 | 7825 | 97.81 | 97.87 | 565 |
Michael Hussey | 1519 | 377 | 463 | 2359 | 81.34 | 94.39 | 757 |
George Headley | 1642 | 548 | 263 | 2453 | 61.33 | 100.81 | 389 |
Herbert Sutcliffe | 4098 | 457 | 530 | 5085 | 60.54 | 99.67 | 406 |
Graeme Pollock | 2014 | 242 | 191 | 2447 | 59.68 | 97.88 | 677 |
Everton Weekes | 4171 | 284 | 286 | 4741 | 58.53 | 99.85 | 455 |
Ricky Ponting | 7913 | 1591 | 1381 | 10,885 | 58.52 | 98.52 | 520 |
Wally Hammond | 5728 | 1521 | 931 | 8180 | 58.43 | 99.95 | 256 |
Garry Sobers | 6124 | 1908 | 1273 | 9305 | 58.16 | 100.64 | 406 |
Ken Barrington | 5843 | 963 | 807 | 7613 | 58.11 | 99.05 | 315 |
Eddie Paynter | 1256 | 284 | 249 | 1789 | 57.71 | 97.43 | 511 |
Jack Hobbs | 5067 | 343 | 355 | 5765 | 56.52 | 99.25 | 353 |
Jacques Kallis | 6703 | 2494 | 1468 | 10,665 | 56.43 | 96.94 | 937 |
Len Hutton | 5890 | 1081 | 813 | 7784 | 56.41 | 99.53 | 270 |
Kumar Sangakkara | 4754 | 987 | 560 | 6301 | 56.26 | 100.93 | 984 |
Clyde Walcott | 3419 | 379 | 356 | 4154 | 56.14 | 99.03 | 493 |
Rahul Dravid | 8092 | 1472 | 1156 | 10,720 | 55.54 | 98.73 | 329 |
Mohammad Yousuf | 5861 | 825 | 500 | 7186 | 55.28 | 99.21 | 510 |
Sachin Tendulkar | 9044 | 2163 | 1082 | 12,289 | 53.90 | 98.11 | 438 |
Dudley Nourse | 2612 | 348 | 351 | 3311 | 53.40 | 99.23 | 393 |
Brian Lara | 11,245 | 708 | 337 | 12,290 | 52.97 | 100.16 | 634 |
Kevin Pietersen | 2774 | 124 | 114 | 3012 | 52.84 | 100.29 | 450 |
Greg Chappell | 5883 | 1227 | 862 | 7972 | 52.79 | 98.02 | 478 |
Matthew Hayden | 7329 | 504 | 672 | 8505 | 52.50 | 99.87 | 448 |
Javed Miandad | 7051 | 1781 | 925 | 9757 | 51.62 | 98.20 | 263 |

"ORuns" are the Runs scored in the innings in which the batsman was dismissed. "NRuns" are the runs scored in the not-out innings. "ARuns" are the runs added to the not-out innings by extending these. "TRuns" are the new total runs, obtained by adding the runs in the previous three columns. "EBA" is the extended batting average, computed by dividing TRuns by the total number of innings played.

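The column definitions can be checked against a row of the table. The sketch below uses Lara's figures; the function name is illustrative, and the percentage column is assumed to be the EBA divided by the unrounded conventional average, which is what reproduces the published values for the rows tried here.

```python
def eba_columns(o_runs: int, n_runs: int, a_runs: int, innings: int, not_outs: int):
    """Recompute TRuns, EBA and EBA as a percentage of the conventional average."""
    t_runs = o_runs + n_runs + a_runs                   # ORuns + NRuns + ARuns
    eba = t_runs / innings                              # divide by all innings played
    average = (o_runs + n_runs) / (innings - not_outs)  # conventional average
    pct_of_average = 100 * eba / average
    return t_runs, eba, pct_of_average


# Brian Lara: ORuns 11,245, NRuns 708, ARuns 337, 232 innings, 6 not-outs.
t_runs, eba, pct = eba_columns(11245, 708, 337, innings=232, not_outs=6)
print(t_runs, round(eba, 2), round(pct, 2))  # 12290 52.97 100.16
```
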
**A few observations**

In general the EBA benefits batsmen with fewer not-outs. Only five batsmen in this group (Headley, Sobers, Sangakkara, Lara and Pietersen) have benefited from the extended batting average, and in every case the increase is marginal. Sangakkara has gained the most, just under 1%, because of his recent form. The other batsmen have extended batting averages lower than their conventional averages, by up to about 5.6%. Hussey has lost the most, which is understandable since he has seven not-outs in the 29 innings he has played. Kallis has also lost ground, which is explained by the fact that he has remained not out a whopping 31 times. Note, however, Kallis' recent form.

Anantha Narayanan has written for ESPNcricinfo and CastrolCricket and worked with a number of companies on their cricket performance ratings-related systems

king of kings

Forget the runs per Test and calculate runs per innings... Lara is 51.52 and Tendulkar is 49.15. Given that Lara bats at 3 and Tendulkar at 4, the opportunity of scoring Lara gets is more than Tendulkar's, especially if it is the last innings of a Test. So there is hardly any difference between the two....

Hi,

Congrats on the lovely blog. Geeky and nerdy. However, one complaint. Can you guys do the right thing and enable full feeds instead of stupid partial feeds?

These stats do not tell how many times Sachin was the beneficiary of lax umpiring. 117 (vs WIN), 109 (vs SL), 94 (vs PAK) and 50 (vs NZ) are only some of the innings which should have been restricted to single digits but weren't. Also, he has always played to remain not out. Remember his comments after he scored 241* against Oz.

Interesting read. One suggestion: I think extending the N.O. innings BY the current form average overestimates the potential number of runs scored. A better measure would be to extend the innings TILL the current form average. For example, if the batsman's current form indicates 100 and he has scored only 10 when his innings was terminated, then he should get 90 more runs, NOT 100 more. If the batsman has already scored more than his current form average, his incomplete innings should not get any more runs. I think this system is "fairer". However, I suspect that doing this will probably align the EBA closely with the BA itself.
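
The difference between the two rules in this suggestion is easy to see in code. A minimal sketch with hypothetical function names:

```python
def extend_by(score: float, form: float) -> float:
    """The article's rule: add the current-form average to the not-out score."""
    return score + form


def extend_till(score: float, form: float) -> float:
    """The commenter's rule: top the score up to the form average, never beyond."""
    return max(score, form)


print(extend_by(10, 100), extend_till(10, 100))    # 110 vs 100 (90 runs added)
print(extend_by(120, 100), extend_till(120, 100))  # 220 vs 120 (nothing added)
```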

For proper alignment of averages of batsmen, a better suggestion would be to adjust their averages by excluding all innings V the rank minnows - Zimbabwe and Bangladesh. Now Tendulkar has played these two quite a lot as compared to Ponting, and I believe after excluding these puny bowling-attack oppositions, Ponting's average is still near a staggering 58, while Tendulkar's is around 52 - the gulf in their averages increases further.

Why question a "not out" if the batsman in question has not deliberately played for a "not out" near the end of the innings? For example, an accomplished Number 6 batsman not only returns not out disproportionately more often than any of the top order, but in the process also has to sacrifice runs in order to shield the tail, even playing unorthodox shots at times to farm the strike. Imran Khan's extraordinary batting average of 62 playing at Number 6 for Pakistan happens to be an all-time best for all Test batsmen in history at that position, yet he returned not out quite a few times, although hardly anyone can recall a single "selfish" innings during which he returned "not out" due to a conscious effort. He was left to shepherd the tail and the lower order ever so often, as a matter of fact.

Kudos for all the work put in; if you did it to stimulate discussion, it succeeded. Beyond that it's all for nothing. The present method is just fine. Nobody seems disturbed by it. If someone like a Lara were to feel slighted there is a simple solution: bat lower down in the order and increase one's chances of NOs. And for crying out loud, how in the world can you compare Bradman's averages when he did not have Bangladesh or Zimbabwe to beat up, like our present guys do?

Your system, though, may have some merit in setting a 'new' standard for ODIs, and even more in 20-20s. The latter is a new concept and averages can easily be adjusted to reflect your system, if chosen. It truly is worth a try as it makes a lot of sense. For example a Dhoni or a Hussey have a great advantage in this form of cricket coming in so late in the batting order.

We can add more confusion to the calculations. Why should each not-out be extended by the average of the final 10 innings? Each not-out ought to be extended by the average of its previous 5/10 out innings. Then we could check if the batsman had enough partners remaining and multiply the additional runs by a factor between 0.3 and 0.7 depending on how many batsmen were left. This can go on and on.

Even if Jacques Kallis had an average of 80 and Viv Richards had an average of 40 after doing all this - who would we go to watch?

you guys are crazy, crazy and crazy.

please forget the stats and start watching cricket !

As others have mentioned, all this extra work doesn't really change the basic order that the traditional average gives.

One area that I think is much underrated is strike rate. Kallis, Dravid and Ponting have broadly similar averages and aggregates. However, Ponting scores about 40% faster than the other two.

To me, this is the decisive factor in any comparison where averages are broadly the same.

What does all this add up to? There are no significant changes in averages, or does this prove that a batsman with an average of 55 is greater than one with 54? It just goes to show our excessive obsession with statistics.

Hemant...can you please give a break-up of the other 31 centuries scored by Sachin? And if you had looked at Sachin's average after including the "not outs", his average still remains above Lara's and there is not too much deviation between the traditional method and the method used by the poster of the blog. Obviously you did not understand the concept of the post and gave one of the most inane comments posted on the blog. Ananth - While the method proposed by you is quite novel, the deviation of a batsman's average from his traditional average is not significant. And even if you look at your table, Lara's average is less than Sachin's average after taking into account the number of "not outs". Hence it seems the traditional method is objective enough to overlook the effect of the "not outs".

Irrespective of people liking this different view or not, I would say it gives a good read to number nerds like me. Keep on thinking in such interesting ways, but please don't claim it as a better way than the existing system.

What a totally pointless discussion. Just watch and enjoy the game rather than picking nits about the second decimal places in a worthless pile of statistics.

I liked the different way, although definitely not the better way, to look at the batsmen's batting career. Irrespective of what people say, please carry on generating interesting ideas like this - but please don't claim this is a better indicator than the existing method.

I wonder if we could also factor the bowling attack into the calculation for each innings. Take the bowling averages of the three most-used bowlers in that innings, add them, divide by 100, and multiply the batsman's score by the reciprocal of that number. E.g., if a batsman scores 60 and the three most-used bowlers have bowling averages of 20, 25 and 30, their total is 75; divided by 100 that is 0.75, and 1/0.75 = 1.333 = 4/3, hence that 60 will have a value of 80 runs. In Virender Sehwag's 309, for example, the three most-used bowlers were Saqlain Mushtaq (43 overs), Mohammed Sami (34 overs) and Shoaib Akhtar (32 overs). Oh, I forgot to add: I am using bowling averages at the start of that match, so it would be 23.41 + 43.62 + 28.99 = 96.02; 96.02/100 = 0.9602, and 1/0.9602 is approximately 1.04, thus the innings would be worth around 321, while Matthew Hayden's 380 would be worth less than 290.
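
The arithmetic in this suggestion amounts to scaling the score by 100 divided by the sum of the three bowling averages. A minimal sketch, with a hypothetical function name and only the figures quoted in the comment:

```python
def attack_adjusted_runs(runs: float, bowler_averages: list[float]) -> float:
    """Scale a score by 100 / (sum of the three most-used bowlers' averages)."""
    return runs * 100 / sum(bowler_averages)


print(attack_adjusted_runs(60, [20, 25, 30]))                     # 80.0
print(round(attack_adjusted_runs(309, [23.41, 43.62, 28.99]), 1))  # about 321.8
```

A combined bowling average above 100 shrinks the score and one below 100 inflates it, so the scheme rewards runs made against stronger attacks.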

Hi,

Also - I want to add another factor - class of opposition. It was mentioned earlier. Jacques Kallis has scored something like 750 runs against Zimbabwe with an average (I think) over 200.

I don't believe runs against Bangladesh & Zimbabwe are as hard (obviously) as others - Sorry Dizzy!

Hi,

I saw a good statistical analysis of batsmen revolving around Standard Deviation. It was on the Cricinfo site a few years ago. I can't remember if it was Rajesh that put it forward. It would be good in conjunction with averages.

I think you should have a look at average balls faced and then, using career strike rate, calculate a probable score had the batsman faced his average quota. Not-outs with more balls faced than average would be disregarded.

This system would reward the attacking batsmen over the dour, i.e Ponting v Boycott.

An interesting analysis, though I think it is flawed, statistically. You have assumed that batting averages are normally distributed and that current performance predicts future performance, and I don't think that is true (though both assumptions can be checked if you have the time). My guess is that a batsman's innings average is far from normally distributed and has a far higher correlation to grounds and opposition than to who was last played. Also, my own experience has seen batsmen have highs and lows that are quite unpredictable.

Come on, I'm sorry to say that it's a rubbish idea. The present system shows us the true picture, and this new idea of giving batsmen runs which they did not score is so funny. If we look at it this way, in some matches a few great batsmen were so good that you might have to give them 1000 runs....

Get back to work - what a waste of time this is!

More precise values are: Lara = 52.80 (0.09) or 0.998 x Average; Sachin = 53.82 (-1.116) or 0.980 x Average; Bradman = 97.93 (-2.014) or 0.980 x Average; Ponting = 58.08 (-1.344) or 0.977 x Average; M Hussey = 82.04 (-4.136) or 0.945 x Average.

A very simplified way of adjusting averages is thus: Average x (1 - (Not-out innings / Total innings)^2). E.g., the adjustment would be: Lara 0.999 x Average; Sachin 0.989 x Average; Bradman 0.984 x Average; Ponting 0.98 x Average; M Hussey 0.942 x Average.
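
The simplified adjustment can be checked against the article's table. The sketch below assumes the denominator is the total number of innings played, since that reading reproduces the five factors quoted in the comment; the function name is illustrative.

```python
def not_out_factor(not_outs: int, innings: int) -> float:
    """Simplified adjustment factor: 1 - (not-outs / total innings)^2."""
    return 1 - (not_outs / innings) ** 2


# (not-outs, innings) taken from the article's first table of 25 batsmen.
figures = {
    "Lara": (6, 232),
    "Sachin": (24, 228),
    "Bradman": (10, 80),
    "Ponting": (26, 186),
    "M Hussey": (7, 29),
}

for name, (not_outs, innings) in figures.items():
    print(f"{name}: {not_out_factor(not_outs, innings):.3f} x Average")
```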

A big assumption you make is that the Not out scores are similar to Out scores (ie. both have the same means).

The way to look at it is this- Which is better? 1) 102*, 0, 0 or 2) 0*, 51, 51 or 3) 34, 34, 34* The formula works perfectly for case 3). You would want 2) to be valued higher than 1). The more precise formula would give the following adjusted averages- 1) 34 2) 51 3) 45.3

One key issue is that Lara and Tendulkar are the most charismatic players of this generation, not the best - for that, you have Ponting, Kallis and Dravid. The issue of not-outs is secondary. The other key issues when comparing players of different eras are 1. Helmets, 2. Limits on bouncers today, and 3. Induction of weaker teams into Test cricket. Sacrilegious as this may be to Tendulkar worshippers, Gavaskar's overall average of 50 is far superior to Tendulkar's 54. A helmet easily adds 10-15 runs to a batsman's average. If the current Australian or Indian team were to have played a Windies pace attack of the 70s, they would have withered much the way teams did then.

Nice analysis, Ananth. The whole issue is about averages affected by not-out innings. I have some additional thoughts. As someone suggested, the not-outs when a team is all out should be counted as completed innings for statistical purposes. In other situations a projected score can only be arrived at by the average of all scores equal to and above the current score. That is, if a batsman is not out on 100, and his previous scores of that score or more have been 100, 125 and 150 (average 125), then his current innings will be deemed completed at 125. If the current not-out score is his career best, then his innings will be deemed complete without further addition. A ten-innings average is statistically arbitrary as a way to determine form. The whole career average cannot be questioned for validity. Some others have mentioned the value of high scores against weaker oppositions. Nothing can be done to manipulate raw averages for these. However, the strength of opponents can be taken into consideration for RATINGS and not STATISTICS.
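
The projection rule in this comment can be sketched as follows. The function name is hypothetical, and the separate rule about treating not-outs in all-out innings as completed is not modelled here.

```python
def projected_score(current: int, previous_scores: list[int]) -> float:
    """Project a not-out score as the mean of earlier scores at or above it."""
    at_or_above = [s for s in previous_scores if s >= current]
    if not at_or_above:
        # Career-best not-out: deemed complete with no further addition.
        return float(current)
    return sum(at_or_above) / len(at_or_above)


print(projected_score(100, [7, 40, 100, 125, 150]))  # 125.0
print(projected_score(200, [7, 40, 100, 125, 150]))  # 200.0
```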

Interesting analysis, Ananth, which highlights a systematic bias in the current formula. Many commentators seem quite upset by your methodology (or the Lara/Sachin debate that seems to underlie it). However, I view it as an insightful adjustment that does shed light on how different styles of play, different team results, and different batting positions can affect scores.

Ultimately, the new statistic didn't provide a great deal of new information. OK: we'll try to find other statistics to put in our toolbox to evaluate batsmen. Baseball has its own batting average stand side-by-side with on-base percentage, slugging percentage, and offensive-production statistic. Why not use a suite of tools to measure the productivity of cricket batsmen in different situations (e.g., adjusted for batting position or for test result) and allow us to have a more informed and entertaining debate?

Enjoyed reading the comments on the article (found them to be more informative than the article itself) and especially Stuart's views. I agree with him when he says that the traditional batting average is the amount a batsman scores between dismissals (hence the level of difficulty of getting him out). At the same time we cannot ignore how much the batsman manages to score each time he occupies the crease (whether he gets out or not). Essentially we have the traditional average (TA) supplemented by the innings average (IA). The absolute difference between TA and IA would increase from top to middle order and then decrease towards the tail-enders. For a top-order batsman it is more important to have a high IA (since he needs to score as many as possible when he steps out) whereas for a lower-order batsman it is more important to have a high TA (since he needs to stick around and stay not out as much as possible). To arrive at a better average measure then, we weight the IA and TA by the average batting position, assigning a weight of 1 - IA, 0 - TA to a career opener and a weight of 0 - IA, 1 - TA to a career #11. What this means is that someone who can score runs but can't shepherd the tail has an incentive to bat up the order so that his average improves. Of course, this is contingent upon the fact that the average batting position is not a simple average, but weighted by the number of runs scored in each position.

Sounds complicated, but since cricket lovers love to split hairs over statistics, I guess this one should fly too.

However, if it ever comes to votes for the best averaging system I would go with what good ol' MCC has being doing for the past 120 years.

There is no problem to solve here. Why are Not Outs a "problem"? There is a purity and simplicity to the batting average which the EBA simply takes away. Runs divided by times out. No need for a change.

Additionally, the proposed calculation of an EBA is surely misconceived. You shouldn't assume that a not-out batsman on, say, 30 will go on to score as if he had just come in. When he has scored 30, he should be well set; the question then is, how many runs has he tended to score once he had safely reached 30 not out? The fact that his last 10 completed innings included a handful of first-ball dismissals surely tells us little or nothing about how he would have fared from a starting point of 30 not out.

This illustrates a broader point: that if anything the traditional Batting Average actually penalises (rather than flatters) those who have a higher proportion of Not Outs. This is because they have been exposed relatively more often to the most vulnerable part of their innings - the start of it.

This is bordering on silly. A run never scored never really happened at all. Statistics should be based on reality and not one's imagination.

Comparing averages or extending not-out innings does not help. Some players only play for their averages and don't care one bit if a tailender is batting at the other end. Please try to compare the S/R of these players and the number of times they threw their wickets away while batting with tailenders.

It is not fair to compare averages, as subcontinent players try to remain not out to boost their average. Also, they get to score 248*, 122* and 101 against Bangladesh and 200*, 176* and 74* against Zimbabwe. Any doubt about it should be quashed just by looking at the number of times Sachin remained not out.

You make the point about Lara and Tendulkar, apparently the reason for concocting this system, saying Lara has played fewer tests, scored more runs, etc, but ended up with a lower average. This, I agree, seems somewhat awry. However, in your new and improved table, Lara still has a lower average than Tendulkar. I fail to see what has been achieved, or how you have improved a simple (although, I will accept, flawed) system by making it exponentially more complicated.

The effort to develop a different indicator should be lauded. However, the current method does not take into account the physical (mental) state of the batsman or the situation of the game. I am sure the process was laborious; however, the concept does not attempt to cover situations. It is more important to develop more situation-based stats, like in baseball, in addition to the average if the true worth is to be judged. Maybe the performance of the top 5 batsmen in your team + the team's result + the quality of the bowling attack + conditions would be a good judge of a player. Though it can be complicated, it could specify the true worth of a batsman or a bowler.

If I want to statistically drill down deeper into a batsman's performance, the first thing I look at is their performance against the great teams of their day. Then their overseas performance. Then the strength of their team. Then their strike rate, batting partners.... Somewhere at the bottom, would be their 'not-out' performance. Even then, a 'not-out' from a high-order batsman has different meaning than one from a low-order batsman. (The latter has to shepherd the tail along, a more difficult job generally than the former who could be attaining a milestone, playing for a draw or accumulating towards victory).

I wonder why it all boils down to the Lara V/s Tendulkar debate. Personally I feel Lara was a better Test Batsman (Purely for that 152* V/s Australia as against 148 out for Sachin V/s Pakistan.. if not anything else). Sachin is far better in one-dayers. But it leads nowhere. Sachin had expectations of a billion fans, and Lara had expectations to revive a losing form of game in the Windies. Both were expected to score a century every innings they play. Let's not forget Lara's misfortune - he had team mates who play much worse. Sachin has had Dravid in his team performing better than him in the Tests. Lara had none. Well.. debates can go on. Just enjoy the entertainment which these great players provide rather than debate and spoil the sport.

Some quite good arguments in this post, mercifully free of rancour. It's always a pleasure to join a genuine cricket discussion. Averages have tended to irritate me, not because they are measured incorrectly (as Stuart rightly points out, they are meant to measure and do measure the runs per dismissal), but because they measure the wrong thing. Surely a batting average should be the average number of runs the batsman scores each time he steps to the crease, regardless of dismissals. The classic example is that of the Waugh brothers. Steve Waugh and Mark Waugh scored roughly the same number of runs per innings for Australia, but Steve had so many not-out innings that his average was bloated by about 10 above his brother's. Hence any reading of a history book would suggest that Steve contributed 10 more runs to each innings than he did, when in reality his wicket didn't fall as often. The only way around it is to get rid of not-outs altogether.

I think quality of opposition and playing conditions should also be considered for batting averages. But such external factors are highly subjective, of course.

But the main defect of this new system, in my opinion, is that it relies too much on current form of the batsman. For example, Lara averaged 34 in the last ten innings he played before the 400, and Sachin averaged just 11 before his 248. I think great players can strike form in any match, regardless of how much they scored in previous matches.

Interesting new concept, nevertheless. I am sure you must have worked really hard calculating the new stats. I am hoping for more such out-of-the-box ideas in the future.

Batting and bowling averages are virtually useless for comparison purposes between individual players blessed with great natural talent. Playing conditions vary greatly from era to era and there is a huge gap these days between the top and bottom placed Test nations.

Well, Stuart has a point in his post, but a stat to measure performance that takes batting position into account, among other things, seems a good move. I bat at 8 or 9, I average in the mid thirties most seasons, but score less than 350 runs. Typically, let's say 15 innings, 5 not outs, 350 runs, average 35. Not at all the same as the opener who scores 1000+ runs with no not outs (and really very little chance of getting any in the single-innings games I play; maybe once or twice a season one of the top three is not out in a match, some seasons this doesn't happen at all). I finish not out at least 3 times a season, and in 2003 I managed it 11 out of 17 innings! Our regular opener has played for the club nine years and has only one not out in all that time.

His expectation is different, his role is different, his concentration requirement is different (something which Stuart misses in his post). One last point. In baseball all these stats already exist, and are USED! As a lover of sports text sims (International Cricket Captain, OOTP Baseball, Football Manager etc) I reckon it's about time cricket used the same information as baseball for its decision making. Just my tuppence worth.

Why not leave the averages calculation as it is, other than to add another column for 'weighted average' whereby the runs scored in each innings are multiplied by a derivative of the current Test ranking of the team that the runs are scored against? While this would not necessarily reflect the difficulty of the bowling attack faced, it would still reward runs scored against teams like Australia more than runs scored against teams like Bangladesh. One problem is that Australian batsmen would be penalised more than others, as all of their runs would be scored against teams with lower Test rankings, but not necessarily weaker bowling attacks. Also, the significance of the runs within the context of the match and the series should have some sort of weighting, but every extra factor to consider means more subjectivity and complexity to the "averages", which as a player's career continues will be less affected by not-outs and/or a few large scores and provide a reasonably fair view of a batsman's abilities.

I have to disagree with the statement "One major problem with the conventional average ... is the way it deals with not-outs". It is not a problem at all if you have thought about it. This one keeps coming up, I am not sure why, but I think it is based on a misconception about what a batting average measures. A batting average measures the number of runs between dismissals. If you get 20* and 27, that is equivalent to a single innings of 47 for your batting average. It also means you cobbled together 47 runs before you got out, whether it was over two innings or one. As it stands, interpreted correctly, a batting average is a perfect measure and needs no adjustments or fiddling (aside from the differences in opposition/pitches/how they play/etc which can't be measured statistically and are outside the scope of the batting average). Tendulkar gets 54.94 runs for each time the bowlers get him out, Lara gets 52.89. It doesn't matter if it is spread over different innings, or different games; or if the not outs ("unbeaten" runs) happen because of rain, declarations, successful run chases, or running out of partners. The batting average measures how many runs they get between dismissals (beaten). The time Bill Brown averaged 102 on a tour of England (or something) was because he was getting an average of 100 runs per dismissal. Which is a pretty good achievement, and the average measures that. Yes it is an oddity, but it shows that on this tour bowlers couldn't get him out more than once per hundred runs. Ditto for Mike Hussey and his average in the 80s.

Cheers,

Stuart
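The "runs between dismissals" reading in the comment above can be checked with a few lines of Python. This is only a minimal sketch: the function name `batting_average` and the `(runs, dismissed)` tuple layout are assumptions made for illustration, not anything defined in the article.

```python
def batting_average(innings):
    """Conventional average: total runs divided by number of dismissals.

    `innings` is a list of (runs, dismissed) tuples; this layout is an
    assumption made for the sketch.
    """
    total_runs = sum(runs for runs, _ in innings)
    dismissals = sum(1 for _, dismissed in innings if dismissed)
    if dismissals == 0:
        return None  # the average is undefined until the first dismissal
    return total_runs / dismissals


# 20 not out followed by 27 (out) ...
split_over_two = [(20, False), (27, True)]
# ... counts the same as a single completed innings of 47.
single_innings = [(47, True)]

print(batting_average(split_over_two))  # 47.0
print(batting_average(single_innings))  # 47.0
```

Under this definition a not-out innings adds runs without adding a dismissal, which is exactly why the two cases come out equal.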

The idea of extrapolating one's score if he is not out looks to me a total waste of a thought. How can you reward batsmen with runs when they have not actually scored them? What about the aggregate of runs scored by the batsman at the end of his career? This aggregate would comprise runs scored by him + runs awarded to him just for the sake of useless averages. Will that not create inconsistency? Take the example of Tatenda Taibu and Michael Hussey. Zimbabwe has one of the weakest tails and Australia will have a strong tail, so Hussey would have a better opportunity to score with the tail if he were allowed to play on, unlike Taibu, where it would be safe to assume that the ZIM tail will not wag.

The interesting point here would be: if we are extending innings scores for batsmen just for the sake of averages, why don't we do that for bowlers? Why not reward them with wickets in incomplete innings based on their prior performances?

I very much agree with what Nick has mentioned regarding the comparison of two modern legends. Nonetheless, this analysis mentions a point worth noting. It mentions a place where Sachin has by far allowed Lara to get past him, i.e. scoring big runs in one innings. Just to add to what Anant has said, if we remove the runs scored in the two innings (375 and 400*) the figures seem to be almost the same!!!

Batsman | Tests | Innings | Not-outs | Runs | Average | Runs per Test |
---|---|---|---|---|---|---|
Brian Lara | 131 | 232 | 6 | 11,953 | 52.89 | 91.2 |
Sachin Tendulkar | 141 | 228 | 24 | 11,207 | 54.94 | 79.5 |
Hypo BL (removing 400 and 375) | 131 | 230 | 5 | 11,178 | 49.68 | 85.3 |
Hypo BL (removing 400) | 131 | 230 | 5 | 11,553 | 51.35 | 85.3 |

Also worth mentioning is the fact that the last row would have been the real statistics had the umpire (I guess Billy Bowden) correctly adjudged Brian Lara out for a duck, caught behind off Steve Harmison, off the fourth ball he faced. In the end the whole effort of calculating EBA does not seem worth it, because either way it is just what the cricket fans believe about the players. Whatever the statistics say, Andy Ganteaume or Naveed Nawaz, with batting averages of 112 and 99 respectively, are in no way better players than Sachin or Lara!!!

You really should order the first table by the batsman's average, rather than their EBA. It took me a little while to see that there was any change at all between the first table and second table, because the first table is ordered counter-intuitively.

What's the point? Lara is still behind Tendulkar, and that's where the whole discussion started. Oh, by the way, why not tell it to a certain Mr Tony Greig, who thinks Tendulkar's 15000 ODI runs are simply because he played an enormous 400 ODIs and that a certain Michael Bevan would have had many more had he played a similar number of matches.

Perhaps Mr Greig forgot about the bloated figures due to not outs. Talking of which, why not do this exercise on ODIs and let's find out what Mr Bevan was capable of. I think scoring 30 and remaining unbeaten is better than scoring 100 and getting out. Surely, Bevan must have been better cuz he was born in Aussie land.

Gimme a break!!

Hi, this analysis was worth it, if only to find out that it wasn't worth the trouble!

Give a statistician some numbers (cricket is the mother lode for this) and they will go to the ends of the earth to slice-and-dice them in different ways in the hope that it may lead to some meaningful insight. Mr Ananth, the article was redeemed by the fact that it was posted in the Trivia section, which is why I read it :-)

"The new, improved batting average" ---- dudeeeee no it's not, the current system works well. If a batsman can stay not out till the end then he deserves that extra bit of an average; there is real merit in staying not out and guiding your team to victory. People with high not outs are good finishers. Plus, how can u apply this system to a # 11 batsman who has a 50% chance to get a "not out" every time he goes out to bat? I think if u wanna waste time on redoing these stats then u should watch every Test again and count the balls that beat the bat for every bowler and divide it by the # of overs he bowled and see who the best bowler really is ;)

One cannot help but observe that the "limits" put there of 1500 runs (NOT 1000, NOT 2000, but 1500) and the top 25 batsmen (by average) were chosen very cleverly to exclude a batsman with a high average like Kambli (1000+ at 54.2) and also to exclude three people who crossed 10,000 runs - Gavaskar, Border and Waugh. How can a person like Border, Gavaskar, Sachin or Waugh be compared to someone like Pietersen or Hussey, who have just not played enough Tests?

What a biased observation. The author observed that Tendulkar played 10 more tests than Lara but he failed to observe that Lara actually played 4 more innings than Tendulkar. And Tendulkar can reduce the gap of 750 runs considerably in 4 more innings. Lara benefited from his two big innings of 400 and 375 against weak English attacks.

Interestingly enough, with the two different systems, the order stays the same. I would have thought, though, that we already have an indication as to how many more runs a not-out batsman would have scored: the number of runs in their next innings. The old system, which treats an innings as being continuous until the batsman is out, still seems to make sense - the batsman has scored that many runs before the opposing team has dismissed him - no hypothetical situations. Batting average is just one indicator of how good a batsman is. Not-outs are another indicator (i.e. the batsman who holds the innings together). Rather than creating hypothetical runs for a batsman, is there a way we could combine batting average with not outs/innings?

My basic problem with anyone having a problem with not outs has been this:

Why should a sequence 10*, 10*, 10*, 10* and 10 be inferior to a 50?

If anything, shouldn't it be more difficult to split your runs across innings because you have to play yourself in each time?

So how exactly does anybody "benefit" from not-outs?

Hi Anant. Very nice analysis. I thoroughly enjoyed the observation and how the averages stack up after the EBA is taken into account. Look forward to more such data from your side. Regards, Roland

Even by your calculations, Sachin Tendulkar's average is still better than that of Brian Lara. What do you have to say about that?

There are some things statistics cannot do, and one of them is to try to objectively prove subjective feelings. Lara vs Tendulkar is a subjective argument, and the statistical jiggery pokery (is that a phrase?) above adds little to the discussion. Furthermore, if your average is 52, and you score 32 not out, how can you assume that, if not dismissed, the batsman would have gone on to score 52 more runs? On average, per completed innings, the batsman scores 52 runs per innings. It might just be useful to ignore the not outs completely, and just divide the total runs scored by times at bat - which will go some way to predicting how many runs that player will score next time they bat, which is, presumably, what the statistic is there for...

I am sorry, there are a number of statements made in this post which I fail to agree with.

- "Lara has scored nearly 750 more runs in ten fewer Tests than Tendulkar." Ummm, Lara has batted in more innings in 10 fewer tests than Tendulkar. It would be unfair to take the number of tests as an indicator

- "Kumar Sangakkara, in the greatest form currently, has scored 984 runs in his last 10 innings at an innings average of 98.4. If he remains not out with, say, 32 in the next innings, it is fair to assume that he would extend his innings by another 98 runs, to 130, considering his outstanding form." - "Afterwards, recent form takes over. The not-out innings is extended by a rolling innings average of his last 10 played innings."

I fail to understand why it will be fair to assume that, having already scored 32 runs, Sangakkara's score in this example should be augmented by 98 more. If the assumption is that, given time and circumstance, Sangakkara could complete the innings in question, then measurement of his probable score should stop at 98, as that is the average score for Sangakkara in his past ten completed innings.

- "This is applied to each and every innings played by all the batsmen. Care is taken to ensure that the adjusted innings total does not exceed the batsman’s highest score. In other words, Lara's 375 will not be allowed to go past 400. However if the highest score by a batsman is a not-out innings, for example Lara's 400 not out and Tendulkar's unbeaten 248, that specific innings will be allowed to be extended. This, I think, is common sense."

I fail to see how this would accurately define the form of a batsman. I would like to see if Garry Sobers' 'average' worked out this way -- how his debut century (the unbeaten 365) would set up his numbers for the rest of his career.

A batting average will do no good if measured for a small number of innings, as it will never portray an accurate picture. The batting average is a career measure.

Well, this article's not all a waste; it was amusing to learn that Sir Leonard Hutton had scored almost 70,000 test runs.

An interesting analysis, although it doesn't really achieve much as most individuals appear in approximately the same positions in both lists. A better indicator of a player's ability and influence would be obtained by multiplying his normal average by his %strike rate/100. Using this method, attractive and attacking players like Gilchrist and Lara would register well above defensive players like Dravid and Kallis.

The calculation is fairly logical, but isn't it too complicated compared to the traditional method? Overall, good analysis.
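The strike-rate weighting suggested above is a one-line formula. A minimal sketch follows; the function name is an assumption, and the averages and strike rates in the example are hypothetical round numbers chosen for illustration, not any real player's figures.

```python
def strike_weighted_average(average, strike_rate):
    """Batting average multiplied by (strike rate / 100), as the comment proposes.

    `strike_rate` is runs per 100 balls faced.
    """
    return average * strike_rate / 100.0


# Hypothetical figures: two batsmen with the same average of 50,
# one scoring at 80 runs per 100 balls and the other at 40.
attacking = strike_weighted_average(50, 80)
defensive = strike_weighted_average(50, 40)
print(attacking, defensive)  # 40.0 20.0
```

Note that the result is no longer in units of runs per dismissal, so it ranks batsmen rather than describing how many runs they score.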

It doesn't make sense that you extend the score by adding the average. For example, a not-out score of 25 is extended by 45 to 70 if the average is 45. According to the definition of average, the score should only be 45, not 70. The fact that the batsman has already scored 25 doesn't matter, since according to the law of probability, the batsman should get out at 45, not 70.

In the same way, if the batsman has already scored 70 n.o, but if his average is only 45, only 45 should be considered. It was just an aberration that he managed to score more than the average.
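The disagreement in the two paragraphs above is between adding the form average to the not-out score (the article's rule) and simply counting the average instead (this commenter's reading). A small sketch of both, using the commenter's own numbers; the function names are assumptions for illustration and neither rule is endorsed here.

```python
def extend_by_adding(not_out_score, form_average):
    """The article's rule as described: not-out score plus the form average."""
    return not_out_score + form_average


def replace_with_average(not_out_score, form_average):
    """The commenter's rule: count the form average, whatever was scored."""
    return form_average


# The commenter's examples: 25 n.o. and 70 n.o. with an average of 45.
for score in (25, 70):
    print(score, extend_by_adding(score, 45), replace_with_average(score, 45))
# 25 70 45
# 70 115 45
```

The second row shows the cost of the replacement rule: a real 70 not out would be recorded as 45, so runs actually scored are discarded.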

Absolutely right. I think you should also point out the fact that Brian Lara probably has the most runs in inconsequential Tests (I guess his 400 not out is a great example). Stop comparing Lara and Tendulkar. Both are great and any team of any era would have a place for them. Undermining one over the other is just plain simple stupidity!

I didn't know that Len Hutton had scored 69971 runs in Tests....

A very interesting idea, it makes a lot of sense. However, I think it also illustrates the relative strength of normal batting averages.

After your computation, the rankings shown are very similar to the rankings of the conventional batting averages. So is the computation really necessary?

An interesting approach. However, rather than guess at what might have been, a simpler approach is to take one of 2 alternative views: 1. The worst case scenario - the batsman will be dismissed next ball, had he played on, so the calculation would simply be total runs divided by total innings batted (a runs per innings value). 2. A less pessimistic approach might be to subtract from the total runs scored all of the runs scored by the batsman in 'not out' innings, and to divide the difference by the number of innings in which the batsman was dismissed. So we would get a runs per innings value again, but only including those innings in which the batsman was dismissed. I would be most interested to see what the tables above look like with these figures - perhaps Mr Narayanan, you might be able to oblige? Many thanks.
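Both alternatives in the comment above are simple to compute from an innings list. The sketch below assumes a list of `(runs, dismissed)` tuples and uses hypothetical scores; the function names are also assumptions. The second measure is the same quantity the article calls the OBA (runs in completed innings divided by completed innings).

```python
def runs_per_innings(innings):
    """Option 1: total runs divided by all innings batted."""
    return sum(runs for runs, _ in innings) / len(innings)


def dismissed_only_average(innings):
    """Option 2: runs from completed innings divided by completed innings."""
    completed = [runs for runs, dismissed in innings if dismissed]
    if not completed:
        return None  # no dismissals yet, so nothing to average
    return sum(completed) / len(completed)


# Hypothetical innings: (runs, dismissed)
innings = [(50, True), (30, False), (10, True), (70, False), (40, True)]

print(runs_per_innings(innings))        # 200 runs / 5 innings = 40.0
print(dismissed_only_average(innings))  # 100 runs / 3 dismissals
```

For comparison, the conventional average of the same innings is 200 / 3, so both alternatives land below it whenever the not-out scores are non-trivial.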

The whole system makes no sense. How is it more fair if it actually reduces your career average? The way the current system works is logical because it allows you to extend your not-out innings until you are dismissed before it is taken into account in your average. Why would you calculate an average based on possible additions to your not-out score when the current method already takes into account the actual runs you add to your score before getting out (albeit in the next game)?

This does not seem to be good, in my opinion that is. E.g., say Dravid has an average of 25 in the last 10 Test innings. He scores a 200 not out in the next. Then what will you extend? Also suppose he scores a 30 not out in the next innings. What will you extend that 30 by? You can't predict what a batsman will score, as "FORM IS TEMPORARY, CLASS IS PERMANENT". The form of a proven batsman cannot be judged on the basis of 10 innings. The present calculation is fine. It doesn't show a vast gulf between 2 class batsmen, with one having a higher average because of not outs. A batsman's one innings is completed when he gets out, however many innings he remains not out. That makes sense. I don't think a batsman can remain not out with scores of 50 in, say, 20 innings and then get out so that his average is 1000. The present method is fair.

Although "forecasting" the completion of a not out innings is an interesting concept, the proposed method does not take into account critical factors relating to (a) a batsman who plays himself in is likely to score more runs than his average, until (b) he reaches some limits of his own endurance/patience at which point he is more prone to being dismissed ( this may also happen if there is a break in the proceedings, e.g. close of play, but that is hard to quantify directly ). In other words, if Dravid or Ponting were to stay 20 n.o., they are more likely to "average (traditional)" 60+ more runs in that innings. So, in that sense, the batsman who stays not out after getting set has actually done the hard work, but cannot reap the rewards of his survival.

If there were enough data, & one were willing to deal with mathematical complexity ( yes, there are some strategies required to complete the model ), I would augment the above method ( which makes sense, in general ) to include the effect of the batsman's not out score on the result -- i.e to create a "conditional probability" table which projects the number of runs to be added based on the runs scored.

I think that, in general, the conditional table will show that, typically, a batsman's "forecasted score" will probably rise initially as he "gets set", & will then fall off as his patience & endurance are tested.

Bradman & Lara are probably anomalies -- once they get going, they rarely seem to tire.

Vivek.

Although this is an interesting and new way to look at the numbers, the results immediately bring up the question "All these extra calculations for what?"

The median variation here is -0.8% which is too small for the complexity for most cricket fans.

The complaint about the current batting average is well made (and indeed I have heard it before). Personally I think that the current system does a reasonable job of factoring in the value of a wicket to a batting team. It is essential to remember that a batting average is not merely a benchmark by which to compare a batsman to others; it is also a criterion by which to gauge a batsman's value to the team. In this light I think the current batting average is an appropriate system.

Your proposal is definitely superior to a different one I have seen that proposed to base the batting average on number of runs divided by number of innings, as that system effectively rewarded batsmen who did not set a great value on their wicket.

I do nonetheless think that your proposal falls over in the face of solid analysis. The flaw is in the attempt to reward batsmen with additional runs for their not-outs. For all the complexity around the calculations, the criteria remain fundamentally arbitrary, especially with regard to the handling of exceptional circumstances such as very high scores. There are simply too many other variables that are not taken into account.

Leave it the way it is - I say.

This discussion borders on the irrelevant. There are no significant changes in the extended batting averages. To say how many more runs a batsman would have scored when not out, based on the player's current form, is meaningless, as is this article. Surely, the statistics on players' batting averages should be used to find out which players' averages are a true reflection of their ability.

hmmmm.. good analysis, but then it adds a subjective element to the batsmen's statistics (something not always reliable). Also "current form" cannot always be determined by the number of runs he has scored in the last 10 innings. In such a case, you also need to analyse the type of bowling he faced, the situation he has come in, etc.... It's best to leave it as a pure factual measure, instead of going into partial details.....

I have a couple of concerns with this method. I find the idea of taking recent form into account to be a good one, although I suspect that the last ten innings is probably too small a statistical sample. It also does not take into account a particular batsman's conversion rate. For example, if Mark Waugh was not out on 150, it is probably reasonable to expect him to be out for 153. By contrast, Bradman on 150 would probably be very good odds to make 300. I realise you have taken this into account at the very top of the scale, by imposing a top score upper limit, but that seems fairly arbitrary. Would it not be possible instead to simply apply each batsman's conversion average to his not-out innings? By that I mean, if a particular batsman averages 90 in all innings in which he reaches 20, then assume he will make 90 if not out for 20. On the other hand, if he averages 40 overall, then give him 40 if not out for nought. This would be an interesting exercise, and would give credit to those batsmen who are good at going on and making big scores. It would be interesting to see what this would do to someone like Marvan Atapattu - his low not-out scores would get little credit because of his duck habit, but a not out 100 might get him 200.

The average system works perfectly as is. This is due to the fact that not outs are often a key indicator as to the value of an innings. Lara rarely remained unbeaten and that is more of a comment on the type of innings he would play (dashing, aggressive) and the position in the order he batted. Players like Mike Hussey and Jacques Kallis should not be punished for having the excellent and desirable ability to be there at the end of an innings. Sure getting a stream of small not outs boosts the average, however the total value of a player is perfectly assessed on the PWC ratings system whereby players are accredited values based on the importance and execution of their innings.

Not only that, form can change in cricket so quickly that attributing an added score based on the average of the last ten innings is a dangerous precedent in itself. It is far less accurate and more ill-indicative than the not out system.

Batting average is runs divided by total dismissals for a reason.

I agree that the batting average as currently calculated is flawed. However I also can't help but feel that your approach has a fundamental mistake in its logic. It assumes that within an innings a batsman is as likely to get out at a score of 32 as he is to get out at a score of 0. A quick consideration of the data across a range of Test batsmen shows this is not true. Almost all batsmen are more vulnerable at 0 and 1 than they are at, say, 20. (Mark Taylor is an interesting exception.) Then, for individual batsmen, they have isolated high points (Tendulkar, for example, is highly vulnerable around 90). So simply adding the weighted average of "nearby" innings suffers from the same problem as the traditional average calculation.

In fact a better calculation would be to take the surrounding scores, and simply replace the not out score with the average of the completed innings which are greater than the not out score under consideration. There is still the problem of what to do if there are no higher completed scores than the one under consideration. In this case I would simply add the overall completed average to the not-out, just as you have already suggested.
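The replacement rule described above, together with its fallback, can be sketched as follows. The function name, the choice to pass in whatever window of completed scores one considers "surrounding", and the scores themselves are assumptions for illustration.

```python
def conditional_completion(not_out_score, completed_scores):
    """Estimate a finished score for a not-out innings.

    Replace the not-out score with the mean of the completed innings that
    exceeded it; if none did, fall back to adding the overall completed
    average, as the comment suggests.
    """
    higher = [s for s in completed_scores if s > not_out_score]
    if higher:
        return sum(higher) / len(higher)
    if not completed_scores:
        return float(not_out_score)  # nothing to go on: leave the score as it is
    return not_out_score + sum(completed_scores) / len(completed_scores)


# Hypothetical completed innings for one batsman.
completed = [0, 4, 25, 60, 110]

print(conditional_completion(32, completed))   # mean of 60 and 110 = 85.0
print(conditional_completion(150, completed))  # no higher score: 150 + 39.8
```

Because only innings that got past the not-out score are averaged, early-innings failures such as the 0 and 4 above no longer drag down the estimate for a batsman who is already set.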

We need to do some more comparison before we can say that this is an improved stat, simply because Brian Lara has played at number 3 in all his Test matches whereas Sachin has played at number four and at times number five for most of his career. This leads to fewer batting opportunities and hence the overall runs/Test comes down dramatically, particularly for someone who has played more than 100 Test matches.

It would be good to see the the EBA in the ODI format, how would it affect the average of the finishers - Bevan, Klusener, Yuvraj etc.

Good theory on a way to include the not-outs. However, going by your findings, there is very marginal difference between the original batting average and your extended batting average. In fact, the order of the top 25 batting averages did not change at all! Given that, and given the complexity of computing this and the fact that it can't be computed by direct division by a viewer of the game, I am not sure how useful such a system really is.

Very interesting article. However, there is very close concordance between conventional batting average and EBA in nearly all cases. To me, this indicates that the conventional batting average is a pretty decent judge of a batsman, but care must be taken when applied to batsmen with a high number of not outs.

The fact that the changes are marginal begs the question - why bother?? The standard evaluation of an average, regardless of the number of not-outs, seems accurate enough to me. And there are plenty of instances of batsmen badly out of form coming good with a huge score. Ponting followed 3 ducks with 197 against India a few years back, Greg Chappell had a string of ducks before hitting back with a huge score (200?), Mark Taylor came back from the brink of oblivion with a century - the list is probably endless. So using the past 10 innings to try & predict what a not-out score will end up as is prone to error. A good read though!

You mention that Tendulkar has scored fewer runs in more matches than Lara. However, scoring opportunities don't come by match, they come by innings. And in terms of innings, Tendulkar has played slightly fewer than Lara (228 vs 232). 4 innings does not necessarily account for the run difference of about 700 runs, but it does show that the runs/match difference is rather misleading in how heavily the two scored relative to one another. After all, it is (theoretically) possible for one not to bat at all in a Test, and there is little sense in judging one's batting ability on how often one's team required him to bat in a Test. If one considers runs per innings, strictly speaking, without factoring in not outs, the figures are 49.15 for Tendulkar vs 51.52 for Lara, or, assuming each played 2 innings per match, 98.3 per 2-innings match for Tendulkar vs 103.04 for Lara, a significantly smaller gap than that suggested by the pure runs-per-Test method, which penalizes Tendulkar for the fact that India did not require him to bat twice as often as West Indies required Lara to bat twice. If anything, the runs per Test statistic reflects more on how much of a batting failure the West Indies team has been over the last decade (barring Lara and a few others) than anything else.

This way of computing the average is flawed in that Brian Lara now has 12000 runs in Test cricket instead of 11,800 runs. This distorts the number of runs that the batsman scored, which is not good for statistics.
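The runs-per-innings figures quoted above can be reproduced from the career aggregates in the article's opening table. A short sketch; the dictionary layout and the function name are assumptions, while the numbers are the table's own.

```python
# Career aggregates as given in the article's first table.
careers = {
    "Lara": {"tests": 131, "innings": 232, "not_outs": 6, "runs": 11953},
    "Tendulkar": {"tests": 141, "innings": 228, "not_outs": 24, "runs": 11207},
}


def summarise(career):
    """Return the three competing measures for one batsman."""
    dismissals = career["innings"] - career["not_outs"]
    return {
        "average": round(career["runs"] / dismissals, 2),
        "runs_per_innings": round(career["runs"] / career["innings"], 2),
        "runs_per_test": round(career["runs"] / career["tests"], 1),
    }


for name, career in careers.items():
    print(name, summarise(career))
# Lara: average 52.89, runs per innings 51.52, runs per Test 91.2
# Tendulkar: average 54.94, runs per innings 49.15, runs per Test 79.5
```

The gap of about 11.7 runs per Test shrinks to about 2.4 runs per innings, which is the commenter's point.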

Still does not solve the problems. If you look at the batsman's recent form, you need to look at the recent form of the bowling side! A 32 n.o against Australia is not the same as a 32 n.o against Bangladesh! I also do not agree with extending the current n.o score by the recent average. If Sehwag has a score of 32 n.o and a recent average of 19, I would think he has already done his bit (in that he possibly will not score 19 more). Statistically, I would fit a distribution on the runs scored in his recent matches and find the 'extended score' conditioned on his current score based on this distribution!

Good idea definitely. I have one more suggestion to add to the calculation. Compute the ARuns or extended runs for a not-out batsman only when the team has declared in the innings or the team has drawn / chased down the score in the last innings. A batsman should not be rewarded for failing to protect the tail and batting for his average, in my opinion. A case in point was Laxman in the Delhi Test: most critics thought he didn't bat positively or protect the tail in the first innings, so he should have had a return of 70 odd only instead of 70 odd + 40 odd. Also, a distinction could probably be made between 1st innings rolling average and 2nd innings rolling average for the calculation of the extended average, since first and second innings averages differ a great deal for most batsmen. Finally, they say form is temporary and class is permanent... So instead of, say, taking the average of the last x innings, we could use a weighted average over the career by the formula: [average(n) = x * average(n-1) + (1-x) * current_score where 0
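The recurrence at the end of the comment above is an exponentially weighted average. A minimal sketch follows. The comment is cut off before it states the allowed range of x, so the bound 0 <= x <= 1 used here is an assumption, as are the function name, the choice to seed the average with the first score, and the hypothetical scores.

```python
def weighted_career_average(scores, x):
    """Apply average(n) = x * average(n-1) + (1 - x) * current_score.

    Assumptions of this sketch: 0 <= x <= 1, and the running average is
    seeded with the first score.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x is assumed to lie between 0 and 1")
    if not scores:
        return None
    average = float(scores[0])
    for current_score in scores[1:]:
        average = x * average + (1 - x) * current_score
    return average


# Hypothetical scores, oldest first.
scores = [40, 0, 100]
print(weighted_career_average(scores, 0.5))  # 40 -> 20 -> 60.0
print(weighted_career_average(scores, 1.0))  # ignores everything after the first score
print(weighted_career_average(scores, 0.0))  # only the latest score counts
```

A larger x gives more weight to career history ("class"), a smaller x to the latest innings ("form").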

The fact that the changes are marginal begs the question: why bother? The standard evaluation of an average, regardless of the number of not-outs, seems accurate enough to me. And there are plenty of instances of batsmen badly out of form coming good with a huge score. Ponting followed three ducks with 197 against India a few years back, Greg Chappell had a string of ducks before hitting back with a huge score (200?), Mark Taylor came back from the brink of oblivion with a century. The list is probably endless. So using the past 10 innings to try and predict what a not-out score will end up as is prone to error. A good read though!

Very interesting article. However, there is very close concordance between conventional batting average and EBA in nearly all cases. To me, this indicates that the conventional batting average is a pretty decent judge of a batsman, but care must be taken when applied to batsmen with a high number of not outs.

Good theory on a way to include the not-outs. However, going by your findings, there is only a marginal difference between the original batting average and your extended batting average. In fact, the order of the top 25 batting averages did not change at all! Given that, and given the complexity of computing this and the fact that it can't be computed by direct division by a viewer of the game, I am not sure how useful such a system really is.

It would be good to see the EBA in the ODI format. How would it affect the averages of the finishers: Bevan, Klusener, Yuvraj, etc.?

We need to do some more comparison before we can say that this is an improved stat, simply because Brian Lara has played at number three in all his Test matches whereas Sachin has played at number four, and at times number five, for most of his career. This leads to fewer batting opportunities, and hence the overall runs per Test comes down dramatically, particularly for someone who has played more than 100 Test matches.

I agree that the batting average as currently calculated is flawed. However, I also can't help but feel that your approach has a fundamental mistake in its logic. It assumes that within an innings a batsman is as likely to get out at a score of 32 as he is to get out at a score of 0. A quick consideration of the data across a range of Test batsmen shows this is not true. Almost all batsmen are more vulnerable at 0 and 1 than they are at, say, 20 (Mark Taylor is an interesting exception). Then individual batsmen have isolated high points of vulnerability (Tendulkar, for example, is highly vulnerable around 90). So simply adding the weighted average of "nearby" innings suffers from the same problem as the traditional average calculation.

In fact a better calculation would be to take the surrounding scores, and simply replace the not out score with the average of the completed innings which are greater than the not out score under consideration. There is still the problem of what to do if there are no higher completed scores than the one under consideration. In this case I would simply add the overall completed average to the not-out, just as you have already suggested.
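The replacement rule described above can be sketched as follows. This is only an illustration of the commenter's suggestion, not the article's method. The function name `extend_not_out` and the sample scores are hypothetical, and "greater than" is read as strictly greater.

```python
def extend_not_out(not_out_score, completed_scores):
    """Estimate where a not-out innings might have ended.

    Uses the mean of the surrounding completed innings that exceeded the
    not-out score. If there are none, falls back to the not-out score
    plus the average of all the completed innings supplied.
    """
    if not completed_scores:
        raise ValueError("at least one completed innings is needed")
    higher = [s for s in completed_scores if s > not_out_score]
    if higher:
        return sum(higher) / len(higher)
    return not_out_score + sum(completed_scores) / len(completed_scores)


# Hypothetical "surrounding" completed scores.
nearby = [5, 12, 48, 60, 90]
print(extend_not_out(32, nearby))   # mean of 48, 60 and 90
print(extend_not_out(120, nearby))  # 120 plus the mean of all five scores
```

Because only higher completed scores are averaged, the extended score can never fall below the not-out score itself.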