# The vexed question of 'not outs' in Test cricket


This article addresses the often-debated question of 'not outs' in Test cricket. 'Batting average' is an archaic statistical measure with a glaring weakness. While other statistical measures have seen many changes over the 135 years of Test cricket, this measure, with its fundamental flaw, has survived unaltered. Let's begin by understanding the flaw and then look at methods to address it.

So what exactly is the problem? It lies in the handling of not outs. Lara played an epic, scoring 400 runs over 13 hours, but as far as the denominator of the batting average is concerned, this innings does not exist. On the other hand, his three first-ball ducks against Australia, England and New Zealand count as three innings. While it is true that he was dismissed in the latter three innings, it is also a fact that in the 400 he batted long enough to have played four complete innings. The 'batting average' should not exclude such innings.

As Milind puts it quite effectively, the batting average computation violates a basic mathematical dictum: for a not out innings, runs are added to the numerator and nothing to the denominator. That is a perfect description of the anomaly.

Let us compare the figures of two modern great batsmen.

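Batsman | Team | T | I | No | SNo | No % | Runs | Avge | RpI |
---|---|---|---|---|---|---|---|---|---|
Kallis J.H | Saf | 162 | 274 | 40 | 5 | 14.6 | 13128 | 56.10 | 47.91 |
Lara B.C | Win | 131 | 232 | 6 | 2 | 2.6 | 11953 | 52.89 | 51.52 |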

Kallis has played 31 more Tests and scored an additional 1175 runs, and averages just over three runs more. That is because Kallis has 40 not outs compared with Lara's six. It might be due to the way Lara played, his batting positions, more declarations for Kallis, who is part of a stronger team, and so on. Let us see how we can address the anomaly, which is somewhat unfair to the top-order batsmen.
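To make the numerator/denominator point concrete, here is a minimal Python sketch that recomputes the batting average, RpI and not out % for the two batsmen from the career totals quoted above. The `CareerLine` class and its method names are illustrative only and not part of the original analysis.

```python
from dataclasses import dataclass


@dataclass
class CareerLine:
    """Career totals for one batsman (hypothetical helper for illustration)."""
    name: str
    innings: int
    not_outs: int
    runs: int

    def batting_average(self) -> float:
        # Conventional average: all runs count, but only dismissals
        # appear in the denominator. Assumes at least one dismissal.
        return self.runs / (self.innings - self.not_outs)

    def runs_per_innings(self) -> float:
        # RpI: every innings, out or not out, appears in the denominator.
        return self.runs / self.innings

    def not_out_pct(self) -> float:
        return 100 * self.not_outs / self.innings


# Totals taken from the comparison table above.
kallis = CareerLine("Kallis J.H", innings=274, not_outs=40, runs=13128)
lara = CareerLine("Lara B.C", innings=232, not_outs=6, runs=11953)

for player in (kallis, lara):
    print(
        f"{player.name}: Avge {player.batting_average():.2f}, "
        f"RpI {player.runs_per_innings():.2f}, "
        f"Not out % {player.not_out_pct():.1f}"
    )
```

The gap between the two measures is about eight runs for Kallis and under one and a half for Lara, which is the whole argument in two lines of arithmetic.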

It should be noted that this problem is more pronounced in ODI matches because of the limited number of overs available and absence of declarations. It is also a fact that two batsmen remain not out in most ODI innings. However ODI batting is measured by the batting average *and* strike-rate, thus lowering the singular importance of batting averages.

I have selected 34 batsmen, who have scored over 2000 Test runs and averaged over 50, for this analysis. Virender Sehwag is just hanging on by the skin of his teeth and a failure in Chennai may very well plunge him below 50. And a reasonable Test at Centurion would push de Villiers past the 50 mark. However the data for all batsmen who have crossed 2000 runs is available for downloading and the link is provided later. The data is current up to match 2073, the Cape Town Test which finished just now.

Batsman | Team | Tests | Inns | No | No % | Runs | Avge |
---|---|---|---|---|---|---|---|
Bradman D.G | Aus | 52 | 80 | 10 | 12.5 | 6996 | 99.94 |
Pollock R.G | Saf | 23 | 41 | 4 | 9.8 | 2256 | 60.97 |
Headley G.A | Win | 22 | 40 | 4 | 10.0 | 2190 | 60.83 |
Sutcliffe H | Eng | 54 | 84 | 9 | 10.7 | 4555 | 60.73 |
Barrington | Eng | 82 | 131 | 15 | 11.5 | 6806 | 58.67 |
EdeC Weekes | Win | 48 | 81 | 5 | 6.2 | 4455 | 58.62 |
Hammond W.R | Eng | 85 | 140 | 16 | 11.4 | 7249 | 58.46 |
Sobers | Win | 93 | 160 | 21 | 13.1 | 8032 | 57.78 |
Hobbs J.B | Eng | 61 | 102 | 7 | 6.9 | 5410 | 56.95 |
Walcott C.L | Win | 44 | 74 | 7 | 9.5 | 3798 | 56.69 |
Hutton L | Eng | 79 | 138 | 15 | 10.9 | 6971 | 56.67 |
Kallis J.H | Saf | 162 | 274 | 40 | 14.6 | 13128 | 56.10 |
Sangakkara | Slk | 115 | 196 | 16 | 8.2 | 10045 | 55.81 |
Tendulkar | Ind | 194 | 320 | 32 | 10.0 | 15645 | 54.32 |
Chappell | Aus | 87 | 151 | 19 | 12.6 | 7110 | 53.86 |
Nourse A.D | Saf | 34 | 62 | 7 | 11.3 | 2960 | 53.82 |
Lara B.C | Win | 131 | 232 | 6 | 2.6 | 11953 | 52.89 |
Miandad | Pak | 124 | 189 | 21 | 11.1 | 8832 | 52.57 |
Clarke M.J | Aus | 89 | 148 | 15 | 10.1 | 6989 | 52.55 |
Dravid R | Ind | 164 | 286 | 32 | 11.2 | 13288 | 52.31 |
Mohd Yousuf | Pak | 90 | 156 | 12 | 7.7 | 7530 | 52.29 |
Amla H.M | Saf | 68 | 118 | 10 | 8.5 | 5610 | 51.94 |
Ponting R.T | Aus | 168 | 287 | 29 | 10.1 | 13378 | 51.85 |
Chanderpaul | Win | 146 | 249 | 42 | 16.9 | 10696 | 51.67 |
Flower A | Zim | 63 | 112 | 19 | 17.0 | 4794 | 51.55 |
Hussey | Aus | 79 | 137 | 16 | 11.7 | 6235 | 51.53 |
Gavaskar | Ind | 125 | 214 | 16 | 7.5 | 10122 | 51.12 |
Waugh S.R | Aus | 168 | 260 | 46 | 17.7 | 10927 | 51.06 |
Younis Khan | Pak | 80 | 140 | 11 | 7.9 | 6580 | 51.01 |
Hayden M.L | Aus | 103 | 184 | 14 | 7.6 | 8626 | 50.74 |
Border A.R | Aus | 156 | 265 | 44 | 16.6 | 11174 | 50.56 |
Richards | Win | 121 | 182 | 12 | 6.6 | 8540 | 50.24 |
Compton | Eng | 78 | 131 | 15 | 11.5 | 5807 | 50.06 |
Sehwag V | Ind | 102 | 177 | 6 | 3.4 | 8559 | 50.05 |

Most cricket followers are au fait with the above table. The one data element not shown normally is the "Not out %". This shows the % of not outs out of the total innings played. Among this elite collection of 34 batsmen, who account for 13% of runs scored in Test cricket, the highest % of not outs has been achieved by Steve Waugh, the middle-order giant from Australia. He has been unbeaten one in six innings. Andy Flower, Shivnarine Chanderpaul and Allan Border have similar numbers. In Flower's case, it has been more a question of a top drawer batsman in a weak team remaining unbeaten as his compatriots were dismissed.

The lowest figure has been achieved by Lara with 2.6%: that means roughly once in 39 innings. Sehwag, with his attacking instincts, is the only other batsman who clocks in at under 5%.

Out of interest, let me share with the readers some facts related to not outs across the 135 years of Test cricket. Of the 72865 innings played, there have been 9502 not outs, accounting for about 13%. Out of these 9502, 4253 not outs - nearly half - have been at scores below 10 runs.

A simple alternative is to use the Runs per Innings (RpI) instead of the batting average. Unfortunately it is a drastic step taking the other extreme. It affects the middle-order batsmen considerably. Many of their low-score not outs would be considered as completed innings and players like Kallis would be penalised. The graph below illustrates the two extreme situations - batting averages and RpI.

We need something between Batting average and RpI. I am proposing two alternatives to fill this space.

The first method seeks to redefine the not out innings. A dismissal is a dismissal and nothing needs to be done about those. But let us accept that even an Icelander with scant knowledge of cricket would accept that a 13-hour innings should not suddenly cease to exist just because of a declaration. Let us classify not out innings as "real not out" innings and the "Completed (or fulfilled) not out" innings.

The key is to determine a cut-off point beyond which the innings is considered as completed or fulfilled. I considered various values. A fixed figure, say, 25 or 50, would be unfair to weaker batsmen with low averages, which means the figure has to be dynamically determined. The batting average itself is a good cut-off but a little stiff. Also, we are questioning the very methodology of the batting average. So I have zeroed in on a sensible dynamic value: a cut-off point at 50% of the "Average for dismissed innings". Here are a couple of examples. Don Bradman's average for dismissed innings is 83.83, so any not out innings below 42 will be considered a "real not out". Ken Barrington's average for dismissed innings is 50.37, so any not out innings below 25 will be considered a "real not out". Any other not out innings would be considered a fulfilled innings.
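The classification rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the innings list is made-up data, the function names are hypothetical, and a not out score exactly equal to the cut-off is treated as fulfilled (the article only says scores below the cut-off are "real not outs").

```python
from typing import List, Tuple

# One innings as (runs, dismissed); dismissed=False means not out.
Innings = Tuple[int, bool]


def dismissed_average(career: List[Innings]) -> float:
    """Average score over dismissed innings only (needs at least one dismissal)."""
    outs = [runs for runs, dismissed in career if dismissed]
    return sum(outs) / len(outs)


def runs_per_fulfilled_innings(career: List[Innings], fraction: float = 0.5) -> float:
    """Runs divided by dismissed innings plus 'fulfilled' not outs."""
    cutoff = fraction * dismissed_average(career)
    # A not out at or above the cut-off counts as a completed innings;
    # a "real not out" below the cut-off stays out of the denominator.
    counted = sum(1 for runs, dismissed in career if dismissed or runs >= cutoff)
    total_runs = sum(runs for runs, _ in career)
    return total_runs / counted


# Hypothetical six-innings career: four dismissals and two not outs.
sample = [(120, True), (8, True), (45, False), (3, False), (60, True), (12, True)]
print(round(runs_per_fulfilled_innings(sample), 2))
```

For this made-up career the dismissed-innings average is 50, so the cut-off is 25: the 45* becomes a fulfilled innings and the 3* stays a real not out. The result, 49.60, sits between the RpI of 41.33 and the conventional average of 62.00.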

Let us examine the impact of this method. The table below lists the same 34 batsmen with their RpI and RpFI values, ordered by RpFI.

Batsman | Team | Tests | Inns | No | FulfilNO | Runs | Avge | RpI | RpFI | Chg % |
---|---|---|---|---|---|---|---|---|---|---|
Bradman D.G | Aus | 52 | 80 | 10 | 2 | 6996 | 99.94 | 87.45 | 89.69 | 10.3% |
Headley G.A | Win | 22 | 40 | 4 | 1 | 2190 | 60.83 | 54.75 | 56.15 | 7.7% |
EdeC Weekes | Win | 48 | 81 | 5 | 1 | 4455 | 58.62 | 55.00 | 55.69 | 5.0% |
Sutcliffe H | Eng | 54 | 84 | 9 | 2 | 4555 | 60.73 | 54.23 | 55.55 | 8.5% |
Hobbs J.B | Eng | 61 | 102 | 7 | 4 | 5410 | 56.95 | 53.04 | 55.20 | 3.1% |
Pollock R.G | Saf | 23 | 41 | 4 | 0 | 2256 | 60.97 | 55.02 | 55.02 | 9.8% |
Barrington | Eng | 82 | 131 | 15 | 4 | 6806 | 58.67 | 51.95 | 53.59 | 8.7% |
Walcott C.L | Win | 44 | 74 | 7 | 3 | 3798 | 56.69 | 51.32 | 53.49 | 5.6% |
Hammond W.R | Eng | 85 | 140 | 16 | 3 | 7249 | 58.46 | 51.78 | 52.91 | 9.5% |
Sangakkara | Slk | 115 | 196 | 16 | 3 | 10045 | 55.81 | 51.25 | 52.05 | 6.7% |
Hutton L | Eng | 79 | 138 | 15 | 4 | 6971 | 56.67 | 50.51 | 52.02 | 8.2% |
Lara B.C | Win | 131 | 232 | 6 | 2 | 11953 | 52.89 | 51.52 | 51.97 | 1.7% |
Sobers | Win | 93 | 160 | 21 | 4 | 8032 | 57.78 | 50.20 | 51.49 | 10.9% |
Tendulkar | Ind | 194 | 320 | 32 | 8 | 15645 | 54.32 | 48.89 | 50.14 | 7.7% |
Chappell | Aus | 87 | 151 | 19 | 8 | 7110 | 53.86 | 47.09 | 49.72 | 7.7% |
Nourse A.D | Saf | 34 | 62 | 7 | 2 | 2960 | 53.82 | 47.74 | 49.33 | 8.3% |
Mohd Yousuf | Pak | 90 | 156 | 12 | 3 | 7530 | 52.29 | 48.27 | 49.22 | 5.9% |
Sehwag V | Ind | 102 | 177 | 6 | 3 | 8559 | 50.05 | 48.36 | 49.19 | 1.7% |
Hayden M.L | Aus | 103 | 184 | 14 | 8 | 8626 | 50.74 | 46.88 | 49.01 | 3.4% |
Kallis J.H | Saf | 162 | 274 | 40 | 5 | 13128 | 56.10 | 47.91 | 48.80 | 13.0% |
Gavaskar | Ind | 125 | 214 | 16 | 4 | 10122 | 51.12 | 47.30 | 48.20 | 5.7% |
Younis Khan | Pak | 80 | 140 | 11 | 3 | 6580 | 51.01 | 47.00 | 48.03 | 5.8% |
Clarke M.J | Aus | 89 | 148 | 15 | 2 | 6989 | 52.55 | 47.22 | 47.87 | 8.9% |
Dravid R | Ind | 164 | 286 | 32 | 8 | 13288 | 52.31 | 46.46 | 47.80 | 8.6% |
Miandad | Pak | 124 | 189 | 21 | 4 | 8832 | 52.57 | 46.73 | 47.74 | 9.2% |
Richards | Win | 121 | 182 | 12 | 3 | 8540 | 50.24 | 46.92 | 47.71 | 5.0% |
Ponting R.T | Aus | 168 | 287 | 29 | 6 | 13378 | 51.85 | 46.61 | 47.61 | 8.2% |
Amla H.M | Saf | 68 | 118 | 10 | 0 | 5610 | 51.94 | 47.54 | 47.54 | 8.5% |
Hussey | Aus | 79 | 137 | 16 | 2 | 6235 | 51.53 | 45.51 | 46.19 | 10.4% |
Compton | Eng | 78 | 131 | 15 | 5 | 5807 | 50.06 | 44.33 | 46.09 | 7.9% |
Flower A | Zim | 63 | 112 | 19 | 5 | 4794 | 51.55 | 42.80 | 44.80 | 13.1% |
Chanderpaul | Win | 146 | 249 | 42 | 4 | 10696 | 51.67 | 42.96 | 43.66 | 15.5% |
Waugh S.R | Aus | 168 | 260 | 46 | 9 | 10927 | 51.06 | 42.03 | 43.53 | 14.7% |
Border A.R | Aus | 156 | 265 | 44 | 5 | 11174 | 50.56 | 42.17 | 42.98 | 15.0% |

It is obvious that the RpFI figures for batsmen with a high % of not outs fall further below the batting average than those of batsmen with a low % of not outs. Bradman drops by 10.3% and Kallis by 13.0%. Readers can note that the four middle-order batsmen discussed earlier as possessing a high % of not outs, viz., Andy Flower, Chanderpaul, Steve Waugh and Border, have had the highest drops and occupy the bottom four positions in this table. The lowest drop has been for Lara and Sehwag, with 1.7%. In fact Sehwag, who was 34th in the batting average table, moves up to 18th here. Even the high batting average of Kallis drops to below 50.
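For readers who want to check the table by hand, the published RpFI values appear to equal Runs / (Inns - FulfilNO), and Chg % appears to be the percentage by which RpFI falls below the batting average. Both formulas are inferred from the table's arithmetic rather than stated in the article; if the inference is right, the FulfilNO column counts the not outs that are left out of the denominator. A small Python sketch with hypothetical function names:

```python
def rpfi_from_table(runs: int, innings: int, excluded_not_outs: int) -> float:
    # Inferred relationship: the third column value is subtracted from innings.
    return runs / (innings - excluded_not_outs)


def pct_drop(average: float, adjusted: float) -> float:
    # Percentage by which the adjusted figure falls below the batting average.
    return 100 * (average - adjusted) / average


# (runs, innings, FulfilNO column, batting average) copied from the table above.
rows = {
    "Bradman D.G": (6996, 80, 2, 99.94),
    "Kallis J.H": (13128, 274, 5, 56.10),
    "Lara B.C": (11953, 232, 2, 52.89),
}

for name, (runs, innings, excluded, average) in rows.items():
    rpfi = rpfi_from_table(runs, innings, excluded)
    print(f"{name}: RpFI {rpfi:.2f}, Chg {pct_drop(average, rpfi):.1f}%")
```

The three rows reproduce the tabulated 89.69 / 10.3%, 48.80 / 13.0% and 51.97 / 1.7%.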

This is a simple and easy-to-understand method. Anyone can incorporate these figures by inspecting the not out innings of a batsman. I also have to accept that while this addresses the "not out" problem somewhat, the fundamental weakness remains: some innings are still represented in the numerator in the form of runs and ignored in the denominator, albeit only the small ones. At least the 400s and 365s have been taken care of.

However a more intuitive and stronger method is the one that tackles the "Runs" side of the formula to equate every batsman on a fair basis. In this method I will "extend" the not out innings to its natural conclusion or in other words - get the batsmen "out". Clearly this is a case of an extrapolation combining actual runs scored with virtual ones. Does it matter? Let us venture outside the normal realm of things and scrutinise what is in store.

The key question is "by how many runs" to extend these not out innings. When I started working on this idea a few years back, along with Dr. Ashwin Mahesh, we picked the batting average. In view of our own fundamental objection to this value we moved on to the RpI and subsequently to the "Average for dismissed innings". This is relatively easy to handle. Just multiply the number of not out innings by the "Average for dismissed innings", add the result to the actual runs to get the new total, and divide by the total innings to derive the Extended Batting average (EBA). This can be added on to any existing table in a jiffy.
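The fixed-extension version described in this paragraph is a one-line formula. Below is a minimal Python sketch of it; the function name is hypothetical. The inputs are Bradman's career totals, with his dismissed-innings average of 83.83 as the extension.

```python
def extended_average_fixed(runs: int, innings: int, not_outs: int, extension: float) -> float:
    """EBA with a fixed extension added to every not out innings."""
    # Every not out innings is "completed" by adding the same number of runs,
    # so every innings can then appear in the denominator.
    extended_runs = runs + not_outs * extension
    return extended_runs / innings


# Bradman: 6996 runs, 80 innings, 10 not outs, dismissed-innings average 83.83.
print(round(extended_average_fixed(6996, 80, 10, 83.83), 2))
```

This gives 97.93 for Bradman. The EBA figures tabulated later in the article use a different, recent-form extension, so they will not match this fixed-value result exactly.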

A few years back I noticed a flaw in this approach. Sehwag is batting these days like a village team slogger who has forgotten the basics. If, by any chance, he remains not out, however unlikely that is, should we add nearly 50 runs to his innings? With all due respect to the great Tendulkar, a similar situation exists in his case too. That brings us to Michael Clarke and his purple patch. In the last 10 innings he has averaged over 80. It would be unfair to add only 45 runs or thereabouts.

Hence I decided that, despite the risk of adding complexity, I would add the Runs per innings for the last 10 innings played by the batsman. This is complex since this value has to be determined dynamically for each and every not out innings played by the batsman during his career. It requires tricky computer algorithms. Also note that I have used Runs per innings because we are considering only 10 innings and a couple of not out innings would distort the entire process. Why 10 innings instead of 10 Test matches? Well there have been times when a player played 3-5 Tests a year and it would have taken a few years to play 10 Tests. That is too long a period for a recent form connotation. In general, 10 innings is one long or two short series and would reflect the recent form quite accurately.
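A rough Python sketch of the recent-form extension follows. Several details are assumptions, since the article does not spell them out: the window is taken to be the 10 innings immediately before the not out innings, the window uses actual (unextended) scores, a shorter window is used early in a career, and a not out in the very first innings gets no extension. The function name and sample data are hypothetical.

```python
from typing import List, Tuple

# One innings as (runs, dismissed); dismissed=False means not out.
Innings = Tuple[int, bool]


def extended_average_recent_form(career: List[Innings], window: int = 10) -> float:
    """EBA where each not out is extended by the RpI of the preceding innings."""
    extended_total = 0.0
    for index, (runs, dismissed) in enumerate(career):
        extended_total += runs
        if not dismissed:
            # Assumption: look back at up to `window` innings before this one.
            recent = career[max(0, index - window):index]
            if recent:
                recent_rpi = sum(r for r, _ in recent) / len(recent)
                extended_total += recent_rpi
    # Every innings, extended or not, counts in the denominator.
    return extended_total / len(career)


# Hypothetical career: the 40* is extended by the RpI of the two prior innings (40).
career = [(50, True), (30, True), (40, False), (0, True)]
print(extended_average_recent_form(career))
```

Because the extension has to be recomputed at every not out, this needs innings-by-innings data in career order, which is why the method is harder to apply by hand than the fixed-value version.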

Let us peruse the revised figures. The table below lists the same 34 batsmen ordered by EBAvge.

Batsman | Team | Tests | Inns | Runs | Avge | OutAvge | ExtRuns | EBAvge | Chg % |
---|---|---|---|---|---|---|---|---|---|
Bradman D.G | Aus | 52 | 80 | 6996 | 99.94 | 83.83 | 7759 | 96.99 | 3.0% |
Sutcliffe H | Eng | 54 | 84 | 4555 | 60.73 | 54.64 | 5024 | 59.81 | 1.5% |
Pollock R.G | Saf | 23 | 41 | 2256 | 60.97 | 54.43 | 2394 | 58.39 | 4.2% |
EdeC Weekes | Win | 48 | 81 | 4455 | 58.62 | 54.88 | 4654 | 57.46 | 2.0% |
Hammond W.R | Eng | 85 | 140 | 7249 | 58.46 | 46.19 | 8018 | 57.27 | 2.0% |
Headley G.A | Win | 22 | 40 | 2190 | 60.83 | 45.61 | 2275 | 56.88 | 6.5% |
Barrington | Eng | 82 | 131 | 6806 | 58.67 | 50.37 | 7410 | 56.56 | 3.6% |
Hobbs J.B | Eng | 61 | 102 | 5410 | 56.95 | 53.34 | 5645 | 55.34 | 2.8% |
Hutton L | Eng | 79 | 138 | 6971 | 56.67 | 47.89 | 7629 | 55.28 | 2.5% |
Sangakkara | Slk | 115 | 196 | 10045 | 55.81 | 47.56 | 10792 | 55.06 | 1.3% |
Sobers | Win | 93 | 160 | 8032 | 57.78 | 44.06 | 8768 | 54.80 | 5.2% |
Kallis J.H | Saf | 162 | 274 | 13128 | 56.10 | 42.23 | 14905 | 54.40 | 3.0% |
Walcott C.L | Win | 44 | 74 | 3798 | 56.69 | 51.03 | 4001 | 54.07 | 4.6% |
Tendulkar | Ind | 194 | 320 | 15645 | 54.32 | 44.56 | 16888 | 52.77 | 2.8% |
Lara B.C | Win | 131 | 232 | 11953 | 52.89 | 49.76 | 12220 | 52.67 | 0.4% |
Mohd Yousuf | Pak | 90 | 156 | 7530 | 52.29 | 46.19 | 8009 | 51.34 | 1.8% |
Amla H.M | Saf | 68 | 118 | 5610 | 51.94 | 39.92 | 6042 | 51.20 | 1.4% |
Nourse A.D | Saf | 34 | 62 | 2960 | 53.82 | 47.49 | 3167 | 51.08 | 5.1% |
Clarke M.J | Aus | 89 | 148 | 6989 | 52.55 | 42.23 | 7559 | 51.07 | 2.8% |
Chappell | Aus | 87 | 151 | 7110 | 53.86 | 44.57 | 7706 | 51.03 | 5.3% |
Ponting R.T | Aus | 168 | 287 | 13378 | 51.85 | 45.15 | 14646 | 51.03 | 1.6% |
Dravid R | Ind | 164 | 286 | 13288 | 52.31 | 44.71 | 14505 | 50.72 | 3.1% |
Hayden M.L | Aus | 103 | 184 | 8626 | 50.74 | 47.68 | 9244 | 50.24 | 1.0% |
Sehwag V | Ind | 102 | 177 | 8559 | 50.05 | 47.96 | 8854 | 50.02 | 0.1% |
Younis Khan | Pak | 80 | 140 | 6580 | 51.01 | 44.24 | 6966 | 49.76 | 2.5% |
Miandad | Pak | 124 | 189 | 8832 | 52.57 | 41.97 | 9310 | 49.26 | 6.3% |
Chanderpaul | Win | 146 | 249 | 10696 | 51.67 | 34.49 | 12259 | 49.23 | 4.7% |
Hussey | Aus | 79 | 137 | 6235 | 51.53 | 42.50 | 6742 | 49.21 | 4.5% |
Gavaskar | Ind | 125 | 214 | 10122 | 51.12 | 44.14 | 10523 | 49.17 | 3.8% |
Compton | Eng | 78 | 131 | 5807 | 50.06 | 44.40 | 6302 | 48.11 | 3.9% |
Richards | Win | 121 | 182 | 8540 | 50.24 | 44.49 | 8753 | 48.09 | 4.3% |
Waugh S.R | Aus | 168 | 260 | 10927 | 51.06 | 35.47 | 12480 | 48.00 | 6.0% |
Flower A | Zim | 63 | 112 | 4794 | 51.55 | 35.43 | 5337 | 47.65 | 7.6% |
Border A.R | Aus | 156 | 265 | 11174 | 50.56 | 37.04 | 12397 | 46.78 | 7.5% |

Bradman's EBA is 97% of his Batting average, a drop of 3%. Headley drops 6.5%. Sobers drops by 5%. Most of the middle-order stalwarts have drops of 6% or more: Flower, Border and Steve Waugh all do, with Chanderpaul (4.7%) the exception. Sehwag has the lowest drop: only 0.1%, virtually no change. Similarly Lara drops by only 0.4%. Amongst these top batsmen not even a single batsman has his EBA higher than his Batting average. This happens lower down the table. Mohsin Khan has the highest increase: 1.4%. The much-maligned Graeme Hick's EBA is 1.3% higher than his Batting average. Darren Ganga follows next with a 0.9% increase. A total of 11 batsmen have higher EBA values. Interested readers can study the Excel sheet for details. Saeed Anwar is the only batsman with more than 4000 runs under his belt and an EBA higher than Batting average.

This method is more elegant and intuitive with complexity of calculations being the sole deterrent. However the concept is very good and any cricket follower can implement the fixed value concept easily. The fixed value can be anything from a slew of values. And we can say with certainty that **every** innings is represented in the numerator and denominator. We have addressed that problem effectively.

Let us revisit the figures of Kallis and Lara.

Batsman | Team | T | I | No | SNo | No % | Runs | Avge | RpI | RpFI | ExtRuns | EBA | %Avge |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Kallis J.H | Saf | 162 | 274 | 40 | 5 | 14.6 | 13128 | 56.10 | 47.91 | 48.80 | 14905 | 54.40 | 97.0% |
Lara B.C | Win | 131 | 232 | 6 | 2 | 2.6 | 11953 | 52.89 | 51.52 | 51.97 | 12220 | 52.67 | 99.6% |

Readers can see that Lara's average was just over 3 runs lower than Kallis's. However his RpI and RpFI are more than 3 runs higher. Significantly, Lara's EBA, which is a more accurate and valid measure, is less than 2 runs below Kallis's. EBA probably reflects the central tendency most accurately.
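The last two columns of this comparison can be checked from the others. The sketch below assumes, from the table's arithmetic rather than from any stated formula, that EBA is ExtRuns divided by total innings and that %Avge is EBA expressed as a percentage of the batting average. The function and variable names are hypothetical.

```python
def eba_from_ext_runs(ext_runs: int, innings: int) -> float:
    # Extended runs spread over every innings played.
    return ext_runs / innings


def pct_of_average(eba: float, average: float) -> float:
    return 100 * eba / average


# (ExtRuns, innings, batting average) copied from the comparison above.
rows = {
    "Kallis J.H": (14905, 274, 56.10),
    "Lara B.C": (12220, 232, 52.89),
}

summary = {}
for name, (ext_runs, innings, average) in rows.items():
    eba = eba_from_ext_runs(ext_runs, innings)
    summary[name] = (eba, pct_of_average(eba, average))
    print(f"{name}: EBA {eba:.2f}, {summary[name][1]:.1f}% of Avge")

gap = summary["Kallis J.H"][0] - summary["Lara B.C"][0]
print(f"EBA gap: {gap:.2f} runs")
```

On these figures the batting-average gap of 3.21 runs shrinks to an EBA gap of 1.73 runs.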

Now for a revised graph. The two alternatives are pictorially represented occupying the space in the middle.

This is not a theoretical exercise: two alternatives are presented to address a genuine problem. The Spl Not outs method is simple and easy to implement. The Extended Batting average method is more complex and would require a computer to incorporate recent form. However, using the Out Bat average, Batting average, RpI or RpFI as the extension basis would be easier to implement. What is needed? Well, an influential organization such as ESPNcricinfo should study the suggestions and start implementing the revised averages, of course along with the current measures.

To download/view the comprehensive Excel sheet containing the values for all the 264 batsmen who have crossed 2000 Test runs, please **CLICK HERE**.

Anantha Narayanan has written for ESPNcricinfo and CastrolCricket and worked with a number of companies on their cricket performance ratings-related systems

Comments have now been closed for this article

I have a simple way. Suppose in a series, 25 innings were played and 1,000 runs were scored. Take the average 1,000/25 = 40. Suppose in those 25 innings, 5 not outs were there. Then take the average again as 1,000/20 = 50. Take the ratio 50/40 = 1.25. Subtract 1. You get 0.25 which is equal to 25%. If a batsman is not out all 100% of the innings, add 25% to the runs he scored and calculate runs per match. If a batsman is not out 50% of the innings, add 50% of 25% i.e. 12.5% extra runs to the total runs scored and work out runs per match. Rate batsmen on 'runs per match' but not 'runs per innings'. After all, in a match runs count not runs per innings.

contd...

In this case, of course, the data points are few, however, over a large number of innings, the median will come closer to the mean in case of consistent performers.

For example, taking your first table, you can very easily see that Sehwag and Lara top the charts because they are "more consistent" (since their ave and RPI are closer). Also, the top batsmen, in terms of consistency and quality more or less remain the same barring Brian Lara who jumps into the top 10 and rightfully so. On the other hand, Steve Waugh goes down to the bottom of the list, again, rightly so.

Ananth, I have been playing with similar thoughts in my head for the last 10 years but never put it down on paper. While your post is very interesting, it is also very complex. On the other hand, I have a simple solution.

To understand the solution, let us try asking the very basis for an average. Isn't it to figure out how many runs can be expected from a player? In other words, say Tendulkar has an "average of 53" - it means that every time he walks in to bat, you can be assured (if not certain) of him contributing 53 runs give or take a few. So, the way the average should be measured is very simply by using the Arithmetic Mean of "Runs per inning"!

This would straightaway take care of outliers like someone scoring, say, 162, 10, 7*, 0* & 25* and having an average of about 40, instead of a crazy "average" of 102! (contd..)

The suggested alternatives for stat 'average' are interesting and ideally, I'd like to see one of those, along with the traditional average when looking at a player's stats.

If I had to see just one figure though, I'd go with 'average' as is customary - and feel it is the fairest measure (though not perfect for all the reasons Ananth notes).

not outs count for something. compare Bradman's 299* to Crowe's 299 - there's an aura of unconquered-God-knows-what-he-could-have-gone-on-to about the not in such a case. Note also Lara's 400*.

on a more concrete level, i think taking not outs into account as is done respects WHAT HAPPENED, whereas some of the suggested alternatives that dabble in SPECULATION - on principle, I prefer the former.

Unofficially and subjectively, we can make allowances - e.g. for openers in general or for particular batsmen who tended to attack more while batting with the tail, etc. but officially, I think average remains the most satisfactory measure of a batsman

What is the average of an innings still being played? E.g. Phil Hughes in the 3rd Test at the time of this writing? You can't determine the innings... It's indeterminate. Similarly a "not out" from a statistical point of view is an "incomplete" innings - it overlaps multiple team's innings. It's not mathematically absurd. It just means that the numerator and denominator are not available yet. (In IT parlance, data is still being downloaded..!).

RpI is a different measure and Ave is different. Trying to combine the 2 is an arbitrary exercise - instead of taking 50% of "Average for dismissed innings" into account, I can take 25% or 50% and arrive at different results.

Finally, even though cricket is a game of stats, and you can "group" players of similar caliber using stats, trying to find a single stat to get a perfect "ranking" is an exercise in futility because when you combine different parameters, one can easily take slightly different routes and arrive at different rankings.

If Lara's overall average was the same as his home average - 59 (or 54 for Sehwag) - no one would bother with the so-called not out problem.

Lara and Sehwag fans find it difficult to accept the facts: they were as good as anyone in home conditions. But, with the exception of SL (for both Lara and Sehwag; if anything Sehwag has thrashed Murali even more), Lara was simply not as good as his contemporaries away.

Here's a simple explanation :

If Lara's and Sehwag's flashy shotmaking created great innings on the odd day, the very same led to their getting dismissed more than other batsmen. This simple fact seems to evade attention. So, on the odd occasion when things click we do not attribute it to the way Lara and Sehwag play. But if they get out more often than their contemporaries we must find ways to artificially increase their average.

The correct explanation for Lara's average is that he simply wasn't as good away. Even completely ignoring N.Os - at home his RPI is 56, away it is 47. At home he was N.O 5 times in fewer innings than away. Again, the away games point to the reality.

Away Lara averages less than 50 in all Test playing nations except for Sri Lanka and Zimbabwe. Less than 40 in some. Lara's reputation is built largely on home performances with very few innings or series performances (Sri Lanka) of note compared to his home performances .

Aggregate all the scores in RpFI for the remaining not out innings and divide it by Runs per Completed Innings. That value can be added to Out Innings & Completed Not Outs to account for every innings.

@mikey76

You say that the only reason why Imran has a higher batting average than Botham is because of his higher number of not outs. You then imply that Botham was a better batsman because he scored more 100s and had a better conversion rate of 50s to 100s.

A quick look at Imran's stats shows that he had 18 inns of between 50 & 99 and ended not out in 8 of them as opposed to Botham who was not out in only 2 of 22 inns between 50 & 99.

7 of those not out innings of between 50 & 99 for Imran were above 67.

If you look at all of the other innings where Imran scored at least 67, you will see there are 9 of them and he converted 6 into tons. If he managed to convert 67% of the inns he was able to, don't you think he would have done something similar with the not outs?

I would argue that, if he'd had the chance, Imran would have scored as many tons as Botham (and in fewer inns.)

I think that over their whole careers, Imran WAS a better batsman than Botham.

This is, with respect, a painfully complicated way of addressing a non-existent problem. There are various reasons why a batting average is an imperfect measure of a player's ability, and an even more imperfect measure of his contribution; but the fact that "not outs" somehow distort the average is not one of them.

A batsman with a lot of "not outs" will not be flattered by his average. If anything, the opposite may be true. This is because the batsman with more not outs will have batted in more innings. And in each innings, he starts with nought and has to bat for some time before he gets his eye in and batting becomes much easier. The more "not outs", the more innings, the more times he bats when batting is at its hardest.

I have a simple way. Suppose in a series, 25 innings were played and 1,000 runs were scored. Take the average 1,000/25 = 40. Suppose in those 25 innings, 5 not outs were there. Then take the average again as 1,000/20 = 50. Take the ratio 50/40 = 1.25. Subtract 1. You get 0.25 which is equal to 25%. If a batsman is not out all 100% of the innings, add 25% to the runs he scored and calculate runs per match. If a batsman is not out 50% of the innings, add 50% of 25% i.e. 12.5% extra runs to the total runs scored and work out runs per match. Rate batsmen on 'runs per match' but not 'runs per innings'. After all, in a match runs count not runs per innings.

contd...

In this case, of course, the data points are few, however, over a large number of innings, the median will come closer to the mean in case of consistent performers.

For example, taking your first table, you can very easily see that Sehwag and Lara top the charts because they are "more consistent" (since their ave and RPI are closer). Also, the top batsmen, in terms of consistency and quality more or less remain the same barring Brian Lara who jumps into the top 10 and rightfully so. On the other hand, Steve Waugh goes down to the bottom of the list, again, rightly so.

Ananth, I have been playing with similar thoughts in my head for the last 10 years but never put it down on paper. While your post is very interesting, it is also very complex. On the other hand, I have a simple solution.

To understand the solution, let us try asking the very basis for an average. Isn't it to figure out how many runs can be expected from a player? In other words, say Tendulkar has an "average of 53" - it means that every time he walks in to bat, you can be assured (if not certain) of him contributing 53 runs give or take a few. So, the way the average should be measured is very simply by using the Arithmetic Mean of "Runs per inning"!

This would straightaway take care of outliers like someone scoring say, 162, 10, 7*, 0* & 25* have an average of and having a crazy average of 40, instead of a crazy "average" of 102! (contd..)

The suggested alternatives for stat 'average' are interesting and ideally, I'd like to see one of those, along with the traditinal average when looking at a players stats.

If i had to see just one figure though, i'd go with 'average' as is customary - and feel it is the fairest measure (though not perfect for all the reasons Ananth notes)

not outs count for something. compare Bradman's 299* to Crowe's 299 - there's an aura of unconquered-God-knows-what-he-could-have-gone-on-to about the not in such a case. Note also Lara's 400*.

on a more concrete level, i think taking not outs into account as is done respects WHAT HAPPENED, whereas some of the suggested alternatives that dabble in SPECULATION - on principle, I prefer the former.

Unofficially and subjectively, we can make allowances - e.g. for openers in general or for particular batsmen who tended to attack more while batting with the tail, etc. but officially, I think average remains the most satisfactory measure of a batsman

What is the average of an innings still being played ? Eg: Phil Huges in 3rd test at the time of this writing ? You can't determine the innings... Its indeterminate. Similarly a "not out" from statistical point of view is an "incomplete" innings - it overlaps multiple team's innings. Its not mathematically absurd. It just means that the numerator and denominator are not available yet. (In IT parlance, data is still being downloaded..!).

RpI is a different measure and Ave is different. Trying to combine the 2 is an arbitrary exercise - instead of taking 50% of "Average for dismissed innings" into account, I can take 25% or 50% and arrive at different results.

Finally, even though a cricket it is a game of stats.. and you can "group" players of similar caliber using stats, trying to find a single "stat to get a perfect "ranking" is an exercise in futility because when you combine different parameters, one can easily take slightly different routes and arrive at different rankings.

If Lara's overall average was the same as his home average - 59 ( Or 54 for Sehwag) noone would bother with the so-called not out problem.

Lara and Sehwag fans find it difficult to accept the facts : They were as good as anyone in home conditions. But ,with the exception of SL ( For both Lara and Sehwag. If any thing Sehwag has thrashed Murali even more), Lara was simply not as good as his contemporaries away .

Here's a simple explanation :

If Lara's and Sehwag's flashy shotmaking created great innings on the odd day, the very same led to their getting dismissed more than other batsmen. This simple fact seems to evade attention. So, on the odd occasion when things click we do not attribute it to the way Lara and Sehwag play. But if they get out more often than their contemporaries we must find ways to artificially increase their average.

The correct explanation for Lara's average is that he simply wasn't as good away. Even completely ignoring N.O.s - at home his RpI is 56, away it is 47. At home he was N.O. 5 times in fewer innings than away. Again, the away games point to the reality.

Away, Lara averages less than 50 in all Test-playing nations except for Sri Lanka and Zimbabwe. Less than 40 in some. Lara's reputation is built largely on home performances, with very few away innings or series performances (Sri Lanka) of note compared to his home performances.

Aggregate all the scores in RpFI for the remaining not out innings and divide it by Runs per Completed Innings. That value can be added to Out Innings & Completed Not Outs to account for every innings.

@mikey76

You say that the only reason why Imran has a higher batting average than Botham is because of his higher number of not outs. You then imply that Botham was a better batsman because he scored more 100s and had a better conversion rate of 50s to 100s.

A quick look at Imran's stats shows that he had 18 inns of between 50 & 99 and ended not out in 8 of them as opposed to Botham who was not out in only 2 of 22 inns between 50 & 99.

7 of those not out innings of between 50 & 99 for Imran were above 67.

If you look at all of the other innings where Imran scored at least 67, you will see there are 9 of them and he converted 6 into tons. If he managed to convert 67% of the innings he was able to, don't you think he would have done something similar with the not outs?

I would argue that, if he'd had the chance, Imran would have scored as many tons as Botham (and in fewer innings).

I think that over their whole careers, Imran WAS a better batsman than Botham.

This is, with respect, a painfully complicated way of addressing a non-existent problem. There are various reasons why a batting average is an imperfect measure of a player's ability, and an even more imperfect measure of his contribution; but the fact that "not outs" somehow distort the average is not one of them.

A batsman with a lot of "not outs" will not be flattered by his average. If anything, the opposite may be true. This is because the batsman with more not outs will have batted in more innings. And in each innings, he starts with nought and has to bat for some time before he gets his eye in and batting becomes much easier. The more "not outs", the more innings, the more times he bats when batting is at its hardest.

@agnimile

Your proposed redefinition of batting average (Strike rate * Balls per Innings) is simply the "Runs per Innings" measure that Ananth had included in his article - it's not a new measure at all.

In the same way, Batting Average can be rewritten as (Strike Rate * Balls per Dismissal)
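Both identities can be checked with a few lines of arithmetic. The sketch below is only an illustration: the career totals are made up, the function names are hypothetical, and strike rate is taken as runs per ball rather than per 100 balls.

```python
import math

def strike_rate(runs, balls):
    """Runs per ball (not per 100 balls)."""
    return runs / balls

# Hypothetical career totals, chosen only to illustrate the identities.
runs, balls, innings, not_outs = 5000, 10000, 120, 20
dismissals = innings - not_outs

runs_per_innings = runs / innings
batting_average = runs / dismissals

# Strike rate * balls per innings collapses to runs per innings.
rpi_via_sr = strike_rate(runs, balls) * (balls / innings)
# Strike rate * balls per dismissal collapses to the batting average.
avg_via_sr = strike_rate(runs, balls) * (balls / dismissals)

print(math.isclose(rpi_via_sr, runs_per_innings))
print(math.isclose(avg_via_sr, batting_average))
```

In both products the balls term cancels, so neither rewriting adds information beyond the measure it reproduces.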

And in test cricket where wickets are the most important resource, measuring a batsman on how many runs he scored before he is dismissed, is the single best metric for evaluating a batsman (despite the minor flaws relating to the lost opportunity it can cost a "set" batsman.)

In T20 cricket, where balls are a much more important resource, then Strike Rate is a better measure for evaluating batsmen.

I think the main need for a "new" metric that somehow combines the resources of wickets & balls in a more sophisticated way than either batting average or strike rate do, is in ODI cricket, where the relative importance of wickets & balls is much more evenly balanced.

Just had a brilliant idea - why not take ratio of batting averages for last 10 innings to the runs scored in current innings for all the batsmen who got out in the same innings (or same day, or same session.... or all three with different weight for all three) and apply that ratio to the average for the batsman in question!

If it ain't broke, don't fix it! You have forgotten that age-old axiom! Much heat, very little light. As someone else pointed out here, a n.o., when the batsman scores greater than 10 runs (but less than 20), is an opportunity lost, as most of the names above would score in excess of their batting average when they get to more than 10.

I agree this is a non-problem. Gavaskar (as an example) scored 10122 runs and was dismissed 198 times. Divide the two to get an average of 51.12. Simple and fair. If you are desperate to manipulate averages how about something to compensate the players of old who played with longer boundaries pre 1980?

I understand the fundamental principle that the writer is trying to achieve. The writer's point that the numerator is increased by the runs scored, but the denominator is not reflective of the extra inning played (in the case of not outs), is well made and I tend to agree with him here.

However, my belief is that statistics should be reflective of the game "as-is", and not incorporate elements of extrapolation, which goes against this very principle.

As an alternative, I would like to propose that the batting average be a composite statistic combining the elements of strike rate and balls faced per inning (BPI), that is Batting average = Strike Rate * BPI, where BPI =Total Balls faced / Total innings played

In the era before the number of balls faced was tracked, but time occupied at the crease was tracked, this calculation would change slightly Batting average = Strike Rate * Time per Inning (TPI), where Strike Rate = Runs scored / Time occupied at the crease

This is perhaps the stupidest, most pointless thing I have ever read. It makes no sense. What does the writer mean when he states, "Lara played an epic, scoring 400 runs over 13 hours but this innings, as far as determining the batting average is concerned, does not exist"? Of course the 400 not out exists - it increases his overall average considerably. Ridiculous.

How about this one - corrected averages for bowlers considering incomplete spells.

Consider this : A bowler with strike rate of 60 (i.e. a wicket per 10 overs), and has bowled 28 overs and taken two wickets, now statistically he is due for another wicket in next two overs, but the innings comes to conclusion due to declaration or someone else taking a wicket. So our bowler has missed out on an imaginary wicket, thus inflating the bowling average.

I would love to see you addressing this issue as well. I am not sure what problem it will solve, but neither have I understood what problem this particular article has addressed.

Much ado about nothing; think there's little sense in fingering a time-tested model; as long as the broader picture of consistency or otherwise is conveyed by the numbers, don't think there is need for normalisation procedures to be invented all over again - imagine having corrections based on all the variables, viz. era, position in batting order, dismissals (legal or otherwise), kind of bowling opposition (Australia~Bangladesh), pitch condition and state of innings (first or fourth); think nobody complained

I think it stands out a huge amount, the difference between average and RPI !

I mean just look the lists ! Suddenly sensible things happen to the lists when you prefer RPI.

Weekes joins Pollock right in the clouds - quite rightly.

Steve Waugh goes lower than his brother - quite rightly.

Hutton / Barrington - all the horrible players with good numbers fall, while at the same time most of the great players rise.

Miraculously - Hobbs becomes nearly as good as Sutcliffe ROFL

Nourse goes screaming past Kallis! (about time someone realised who is Sth Africa's best ever right hander)

Everything starts to make a bit of sense.

If average isn't a flawed measure - gee I'm proud of that year I averaged 790 (being our team's 4th best batsman). *shakes head

Ridiculously complex solution to a non-problem.

As a couple of other people have commented, a not out is just as likely to have a negative impact on batting average as a positive one, if not more so.

Take Lara for example. Of his 6 not outs, the lowest one was 13. If you take every innings in which Lara scored 13 or more, you find that his average in those innings was 74.79 - in other words, when he reached 13, on average he would go on to score another 61.79 runs. Lara's overall average was 52.89, so being left not out on 13 "cost" him the opportunity to increase his batting average.

You can do this for all of his not outs (obviously excluding the 400) and see that they all cost him the opportunity of increasing his average. Now, because he had so few not outs, this didn't impact Lara very much, probably only a couple of percentage points, but other players with lots of not outs, particularly those who convert scores into big tons, will be impacted much more.
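The calculation described above can be sketched as follows. This is a minimal illustration, assuming each innings is recorded as a (score, was_out) pair; the innings list is invented and is not Lara's record, and `average_given_reached` is a hypothetical helper name.

```python
def average_given_reached(innings, threshold):
    """Runs per dismissal over the innings in which the batsman
    reached at least `threshold` runs. Returns None if the batsman
    was never dismissed in those innings."""
    selected = [(score, was_out) for score, was_out in innings if score >= threshold]
    runs = sum(score for score, _ in selected)
    dismissals = sum(1 for _, was_out in selected if was_out)
    if dismissals == 0:
        return None
    return runs / dismissals

# Hypothetical career: (score, was_out). The 13 is a not out.
career = [(0, True), (5, True), (13, False), (40, True),
          (100, True), (62, True), (8, True), (150, True)]

overall = average_given_reached(career, 0)       # conventional average
conditional = average_given_reached(career, 13)  # average once 13 is reached
expected_extra = conditional - 13                # runs "lost" by stopping at 13*

print(overall, conditional, expected_extra)
```

With these invented scores the conditional average is well above the overall one, which is the commenter's point: being stranded on a low not out forgoes more runs than the overall average credits.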

Honestly, this is an unnecessary, overly complex solution to a problem that never existed. It provides no new information that gives any additional insight, just needlessly complicates what should be a very simple measure.

The batting average computation certainly does not violate a basic mathematical dictum. "Runs" and "Dismissals" are two fundamentally different measurements. These two measures are linked by a self-evident risk-return relationship, which can be plotted on a simple yield curve. Any analysis of batsmen and bowlers must start with three measures: "Balls", "Runs" and "Dismissals". The match situation is also relevant. In Test cricket, the match situation is less significant, because the batsman is almost always trying to maximise the number of Runs per Dismissal. In T20 the match situation is also less relevant, because the batsman is almost always trying to maximise the number of Runs per Ball. I hope this helps to explain why in Test cricket, a batsman's career batting average is by far the most important qualitative statistical measure and why there is no need to adjust a batsman's career batting average for "not outs".

Here's one for the next article, Ananth. How about an analysis after removing run-outs, as they are not a true way of a bowler getting a wicket - many a time it may not even be the batter's fault.

Solved - the great problem that did not exist. For all the complicated number crunching, this does not make any significant difference- I mean all the options . The biggest difference is 7.6 %. Nothing earth shattering here. The average is defined and understood as runs scored per OUT. If we start fiddling with that definition, why don't we then calculate an Average excluding duck outs, that will change the list a little - another 5%. So we will get the best players once they get started. Then we start excluding innings between 1 and 5 or dynamically contributing between 1% and 5% of a batsman's average - to be fair to the tail-enders!! I will take an accountant's rule to this one rather than the mathematician . This is declared closed based on the golden accounting principle of "materiality".

I always get to see Indians writing about how unfortunate Tendulkar is in not being able to face the weak Indian bowling and how that has resulted in a lower batting average than that of, say, Kallis.

The fact is that the Indian bowling is weak only in overseas conditions and not in home conditions. Just ask the current Australian team! Kumble, Harbhajan and Ashwin in home conditions are just as deadly as Steyn and McGrath in their conditions, and to that extent there is no advantage that Ponting or Kallis has over Tendulkar.

Andrew Ward sums it up perfectly in my opinion - First, claim there is a problem and then claim to have solved it!

Are you telling me, it is Steve Waugh's fault he batted in the middle order and never got a chance to score 400? Or Dhoni's that he actually finished so many ODIs and remained not out?

I think the world can live without needing a computer to understand how a batsman has performed. Peace.

You can't be so scientific about it because intangibles like state of the match, state of a pitch, quality of an attack, personal grouses, a rush of blood on reaching a landmark, 9,10, Jack coming next so I should play some shots, looking for quick runs to declare, I better drop anchor or we could be dismissed in the session, I went drinking last night and I am hung over, I'm playing for records now and well past my best so I'm getting out more now, I couldn't be bothered by the match situation, I'm here to entertain, it's either you or me today....one of us will win.....the point is, so many things go through a batsman's mind that could affect whether a batsman is not out or not that it is an exercise in futility. Just watch and enjoy and leave conjecture to the pub or science for intellectual debate...we have the stats and people will think whatever they want based on what they have seen and enjoyed.

@ Pawan Mathur - I sure can't disagree with that ! (Murray)

Waves to Kimbers ridiculously overcrowded cordon from a sprinkler hole near deep fine leg !

I wonder why no-one who played for Australia before 1970 will willingly talk to Steve Waugh ? I wonder why each and every one of them, thinks batting for not outs, by cynically un farming strike is pathetic ? ( Poor Stan McCabe's average). Could it possibly be they understand cricket, and aren't impressed with batting averages ?

Nice attempt Ananth - not sure there aren't too many variables to find a proper solution ?

There is no fundamental flaw here. Average as is being calculated now makes perfect sense. Unlike other "It figures" nothing has been achieved by this exercise. Considering the effort put into to compile this, only 1 thing comes to my mind - "What an unproductive way to waste time of such an intelligent person"

A flawed argument here which seems to fail to understand what batting averages actually tell us. A batting average tells you the price of one's wicket, and how many runs he makes before being dismissed on average - plain and simple. While you may have bowlers with batting averages in the 30's as a consequence, that is a tribute to them for standing firm at the crease. Trying to statistically determine a point at which an innings becomes an innings is terribly flawed, even when one considers the arguments against the current system. Some things are meant to be left alone. As for the argument about innings not having taken place... This is an argument hardly worth refuting. It took place... The only thing that did not take place was a dismissal, and so a batsman should not be punished statistically just because the team has to come first in the case of a declaration. This is a gross over-analysis over something which is really quite a trivial matter. There is no problem here.

Hey drinks.break, how do you know that only a few can rival Anantha's knowledge of statistics? I have read his previous articles. He tries to overly complicate things for no gain and then lashes out at people who comment against him.

I have a PhD in Statistics and I find this analysis completely incorrect and useless.

Completely agree with Andrew Ward. Nothing to see here - move along.

@Ryan Stephen. Yes, it's simplistic, and we can argue the validity of assuming an exponential distribution for cricket scores (from the little I looked, it seemed alright, tbh). Perhaps one could go a bit further and use a gamma distribution, but with two parameters it becomes far harder to explain to the average Joe. Also bear in mind that an average is a summary statistic. It should be simple. If we wanted a truly rigorous model, we could try to quantify every variable. But consider SRT. About 200 Tests, probably about 100 team mates. Probably over 500 opponents and 50 grounds and 1000 days of weather conditions. Parameters from all of that is insane. Average focuses only on the internal measure. What the batsman can do and control.

This whole notion of devising a single statistic for assessing recent form is just silly. Anyone who cares -- selectors, coaches, spectators, viewers, use far more than batting average to judge the form of a player. As others have said, the quality of a batsman's innings has more to do with the playing conditions and match situation than it does with whether they're averaging a score of 30 or 40 over the last 10 innings.

Take Joe Root, for example. He has a very high ODI average because he's been not out a lot so far in his fledgling career, but whether you believe his batting average reflects the quality of his play, or some other runs per innings calculation, it's clear that the reason why so many people are predicting a great future for the young man is nothing to do with his stats, and everything to do with having seen his poise, and the way he has tailored each innings perfectly to the match situation he found himself in.

Hi Ananth, got your email regarding the problems with this site. The comments box is a pain in the rear as well because it doesn't line up with what you type. So I will keep this brief as I can't keep track of any spelling errors. IMO - the not out issue is a bit more difficult to address than the work you have done. IMO - Not Outs need to be addressed in the match scenario they occurred. If Clarke is Not Out 60 in an innings but has hit the winning runs, it is mission complete. I am not sure whether you are adding the runs per 10-innings on top of a not out score or just replacing?

Ananth, any particular reason for the change in the article's overall layout settings? I sincerely believe the previous layouts were far better, with some marvelous comments. Very sad to read comments such as "pointless exercise" and "the writer does not know basic of statistics" without even bothering to know how much you have written or even the context in which the article has come out. I think this changed layout is one of the reasons why the top commentators like Murray Archer, bol, Gerry the merry, Bcg vikram (and a few more) are staying away from commenting even when the article is in its second day.

The batting average has always been, should always be, and will always be a simple division of runs scored by times out. Charles Davis has argued persuasively that when an individual innings is terminated by a declaration, the average of that batsman is most likely reduced, as he would have been more than likely to bat on for more runs than his average had the declaration not taken place. What a ridiculous argument that the innings is deemed to have not taken place! It is the fall of a batsman's wicket that has not taken place.

So if a batsman scores a 100 not out followed by a 30 and a 20, his average will be 50 under your method. Yet if he scores 90 (out), then a 30, and a 33, his average is a real 51. That's completely silly. And worse, if he scores a 90 (out), and then a 20 not out, and then a zero, his average is 55!!! How can a 90, 20 not out, and a zero be better than a 100 not out, 30 and 20? In each innings the latter batsman has outscored (by runs) the former, and each batsman has been out twice. See my previous comment about incorporating strike rates and the implied strength of the opposition.

There is nothing wrong with the batting average. Now here is another measure that can be useful. Batting avg times strike rate times 2 of the last two years. Divided by 100 so that strike rate is per ball, not per 100 balls. So a man with a batting avg of 50 and strike rate of 60 per 100 balls would have a score of 60. Plus any time the batsman scores more than the team batting's batting average FOR THAT innings, he gets 4 bonus points. When more than 1.5 times, 7 points and when more than twice, 10 points. What that does is reward batsmen with a higher strike rate, and reward batsmen when the going is tough (they are scoring when their team mates are failing, i.e. the bowling is good or the pitch is tough). Now that measure would be useful. And Lara would go up relatively, not down. Check for yourself.

Why don't we take it 5 steps further and count elegant 4s as a 5 and an edge through slips as 3 runs. Each shot could be given a judges score so, just like ski jumping, the style factor comes into play. Also have a loading factor based on the bowling average of the bowler at the time when out, or not out for that matter. The bowling average of course has to be recalculated based on lucky wickets as compared to genuine good bowling multiplied by a "played and missed" coefficient. Now can someone please put this into an algorithm so I can confirm that the current Australian batting line up is no good. Or I could just turn the TV on.

In cricket, statistics is a big anomaly. Pak never has to face Saeed Ajmal, and Kallis and Amla never have to face Steyn and company. Openers play a different set of conditions compared to second or third wicket. Lara had to play lone innings too often compared to Tendulkar, who had Dravid and Ganguly for support. McGrath had Warne, so there was never a let-up in the Aussies' attack. SA bowlers get a different set of fielders compared to Indians, and you really can't compare their strike rates. Balls faced does not really show the time spent at the crease (India can push 18 overs per hour with spin while Aussies struggle to get their quota of 13 overs per hour). Simply, you cannot compare. With just 8 quality teams, comparison excludes one's own team: Tendulkar never got a chance to play the weak Indian attack, or the Indian attack never had to bowl to him, so they don't look as weak as the Bangladeshis.

Hey, Jayan Gopinathan and Andrew Billard, have you read any of Anantha's other articles? You may disagree with his presuppositions for this article (and I do), but there is no question that he almost invariably produces statistical work that cleverly illuminates some otherwise uncharted aspect of cricket, and he does it with a knowledge of sport and statistics that few can rival.

This article may be the exception that proves the rule, but whatever the case, lay off the man and play the ball - stick to the issue, in other words!

I totally agree with Andrew Ward's comments above. This article is useless and is trying to solve a problem that doesn't exist. The writer doesn't understand the basic principles of Mathematics and Statistics.

Hi Ananth, interesting piece as always. Wonder if there is an issue with using the last 10 innings - does that not give undue preference to recent performances when trying to determine an overall, career-length figure?

And also interested to see your response to Andrew Ward's comments!

Can someone please explain to me the initial problem which needs a solution? I see all the solutions but I can't understand the problem. Batting average gives an average of runs per dismissal. There is nothing biased about it. If you end on 400* then why should you have an innings added to the denominator? If a batsman ends on 50* why should he have an innings added? He could have gone on to score a double century, perhaps unbeaten.

This article appears to be completely missing the point in my eyes.

If we assume the use of the batting average introduces a bias, then this should show up with (for instance) the average score for all innings at number 5 being higher than the average score for all innings by an opening batsman. I don't know what the average average is for each position, but could I suggest that any new rating method can prove its accuracy by seeing if the average rating for each of the top 5 batting positions comes out more or less equal. Any lower than 5 and you start to get wicketkeepers and allrounders into the mix.

To put it another way, if the average is a flawed measure the average average for the top 5 batting positions should have a degree of variation matching our assumptions.

This is an incredibly pointless exercise. You have created another measure without pointing out any benefits of your new measure. If you can show your new technique can predict something better than average or serve some purpose better than the conventional average, then it would make sense. Otherwise, it's just pointless. There are infinitely many ways to calculate a "batting average" - without showing some benefit of one over the other, it's just number shuffling.

I think one statistic that could be valuable is 'average in results'. At least you know each innings was meaningful. There are many important innings (and wickets) in draws too, but a lot of mega scores have been tactically worthless, if not deleterious. Why didn't any Victorian defend Brad Hodge's double ton? Because it cost Australia a possible win. Same for many Subcontinental and Windies tallies.

Eccelente! Perfecto! Magnifico! Bellissimo! Awesome! Amazing! Others might have different methods but this is significantly better than not doing anything.

The fundamental problem with batting averages is that they vary from epoch to epoch as different laws, equipment, conditions, and techniques prevail. This methodology does nothing to address this flaw, leaving the interpretation and comparison of extended batting averages as at best a flawed statistical measure. Arguably less flawed, but flawed nonetheless.

It's simple... a batting average is the number of runs scored per dismissal. How is that unfair to top-order batsmen? Kallis has a higher average because he is more consistent than Lara. And it's not a larger problem in ODIs... you are creating a problem which does not exist. Glenn McGrath has an ODI average of 3.83... despite the high % of not outs.

This new batting average format is reducing the averages of all-rounders/lower-order batsmen to compare them to the likes of opening batsmen, and to display the difference in quality of batting between them. A lower-order ODI batsman who averages near 40 is successful at doing his role at the end of the innings, but that does not mean he's equal to or better than an opening batsman who may average the same. The new batting average format is to compare all batsmen the same, as if they were all doing the same batting role within the team. It may help people who don't know the game to determine who is a better quality batsman, but M. Bevan was known as a great finisher in ODIs and his current batting average justifies it for the role he played.

I actually thought you handled this quite well in your last article by using runs per test.

Major problem - length of career and accumulation of runs. Someone who maintains a batting average of 50 over a 200-innings career is significantly better than someone who averaged 50 over 75 innings. The longer a career, the more bowlers you face, the more grounds you bat on and the more variety you must face to prove yourself. Add to this the inevitable change in form one will experience over a longer career, and number of innings MUST be considered. Kallis consistently bats at 56, but has remained at this level over a much longer time than Lara, so there would have to be some statistical provision for this as it is inarguably a real value. Length of career proves a lot about the abilities of a batsman and, to repeat a point, it shows a player who has faced far more challenges than those of shorter careers.

The simple batting average is interesting, but difficult to form a truly fine judgement upon. In these days of instant statistics, far better to look as well at two further measures of success - partnerships and percentage of innings scored. As we all know, 100 in a total of 500 for three dec is worth considerably less than 50 in a score of 175 all out. Add to that data on partnerships, and we get a better idea of a batsman's ability. Batting is not just about the individual's scoring runs - what price Jimmy and Monty at Cardiff? How important is it to be able to give the strike to the in-form player, rather than face endless dot balls? Batting, like the game itself, is too complex an activity to be compressed into one figure, impressive as it may, at first, appear ...

Anantha, what is it you are actually trying to find out here?

Because if it is the number of runs a batsman would average, had he not been not out, then you are going in the wrong direction. A batsman is most vulnerable on zero. A not-out reduces, not increases, the average. Batsmen down the order have other advantages - no new ball, tired bowlers - which more than offset this, but that is a different problem. Neither of the proposed methods has any grounding in how a batsman plays when he is at the crease and well set.

Conversely, if you wanted to know how many runs a batsman brings to a team, adjusting for team-mates and batting position, then you really need to consider strike-rate. Because that is what determines how long a player has to score, then normalise for a standard line-up.

But it isn't clear what this is trying to achieve?

I think this analysis is a huge waste of time, yours and ours. Batting average calculates the runs a batsman scores before being dismissed, no complications, FULL STOP. Maybe try and come up with a solution for the BCCI to accept the DRS; that may be better received by most people here.

Interesting ideas, although I wonder whether a simple point has been missed with these methods of estimating what a not out innings would have been completed at. The current not out calc effectively allows a batsman to finish his innings in the next match (or second innings of the same match), so removing speculation. Eg, a batsman finishing 20 not out comes back in the next match and "continues" his innings until he is out, say for 25, giving a completed total innings of 45 (which counts as 1 completed innings in the average). It's not really much different to a batsman being not out at tea and coming back out after a nice slice of cake to finish his innings, so why should he end up with a different average just because the amount of time with the pads off (20 mins or a week) differs?
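The "continued innings" reading above can be made concrete with a small sketch. It assumes innings are stored chronologically as (score, was_out) pairs; the scores and the helper name `chained_innings` are invented for illustration.

```python
def chained_innings(innings):
    """Carry each not-out score forward into the next innings.

    Returns (completed, carried): the list of chained totals that
    ended in a dismissal, and any runs still unfinished at the end.
    """
    completed = []
    carried = 0
    for score, was_out in innings:
        carried += score
        if was_out:
            completed.append(carried)
            carried = 0
    return completed, carried

# Hypothetical sequence: 20*, 25, 0, 50*, 10*, 30
scores = [(20, False), (25, True), (0, True),
          (50, False), (10, False), (30, True)]

completed, carried = chained_innings(scores)
chained_mean = sum(completed) / len(completed)

# Conventional average: total runs divided by dismissals.
conventional = sum(s for s, _ in scores) / sum(1 for _, out in scores if out)

print(completed, carried, chained_mean, conventional)
```

When the sequence ends on a dismissal, the mean of the chained innings and the conventional average coincide. If the career ends on a not out, the leftover runs in `carried` are still counted by the conventional average, so the two can differ slightly.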

@Nicholas J De Klerk, that model has flaws too, it is simplistic for actuarial purposes, refer to my previous comment as to why it is flawed.

What a joke! Fundamentally flawed. Of course the not outs contribute to the average, as other readers have mentioned.

The only other stat you could have is some kind of variance-meter to show how close a batsman's scores were to his average. The example you pointed out, Lara's 3 ducks and 400+, would contribute to a high variance for that batsman.

The rest is waffle as far as I'm concerned.

The problem with the current system is that it assumes that if you are not out, then you would score your average (say 50) on top of what you have scored already, so a score of 50 not out has a similar effect to your average as scoring 50 + 50 = 100.

For example: Score 100 and a duck, your average is 50. (100 / 2 = 50) Score 50 not out and a duck, your average is also 50. (50 / 1 = 50)
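The two cases in this example can be reproduced with a short sketch. It assumes each innings is a (score, was_out) pair; the helper names are hypothetical, and Runs per Innings (RpI) is included only for comparison with the figure quoted in the article.

```python
def batting_average(innings):
    """Conventional average: total runs divided by dismissals."""
    runs = sum(score for score, _ in innings)
    dismissals = sum(1 for _, was_out in innings if was_out)
    if dismissals == 0:
        raise ValueError("average is undefined without a dismissal")
    return runs / dismissals

def runs_per_innings(innings):
    """RpI: total runs divided by all innings, out or not out."""
    return sum(score for score, _ in innings) / len(innings)

hundred_and_duck = [(100, True), (0, True)]
fifty_not_out_and_duck = [(50, False), (0, True)]

print(batting_average(hundred_and_duck), batting_average(fifty_not_out_and_duck))
print(runs_per_innings(hundred_and_duck), runs_per_innings(fifty_not_out_and_duck))
```

Both batsmen average 50 under the conventional definition, while their RpI figures are 50 and 25, which is the gap this comment is pointing at.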

This is the problem that the author is talking about, and it is caused by a method that assumes that if your average at the moment is 50, then 30 Not out becomes worth an 80 in average terms, as it generally assumes that you would score your average after being left not out.

This is why the best system is one that takes the average of your past scores given the fact that you have already reached 50* and uses this score to calculate your projected (expected) score. Giving a much more accurate picture of what you would have done had you been given the opportunity to keep batting until dismissed.

In this scenario our projected "dismissed" score could make use of data from other similar players. For example, if a player scores 50* on debut (First match), we take the average score of all players on debut who scored >= 50, then use this average as a replacement innings total. We could refine this for conditions and opposition by taking average of all innings dismissed on debut beyond 50 in the sub-continent against Australia (For example), if the 50 not out was scored by an Indian batsman on debut at home against Aus. The right coding could automatically add a filter of "sub-continent pitch" or "Australian bowlers" to the data query which checks all past innings. So an intricate program is written once, but from then onwards the complex process will be automated, so not much continuous maintenance of the program will be necessary.
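A minimal sketch of the projection idea described above, under stated assumptions: the pool of past dismissed scores is supplied by the caller (the batsman's own history or a filtered set from similar players), the numbers are invented, and the fallback for an empty pool is my own choice rather than anything specified in the comment.

```python
def projected_score(not_out_score, past_dismissed_scores):
    """Project a not-out innings to a 'dismissed' score.

    Takes the mean of past dismissed scores that were at least as
    large as the not-out score. If no such score exists, falls back
    to the not-out score itself (an assumption of this sketch).
    """
    comparable = [s for s in past_dismissed_scores if s >= not_out_score]
    if not comparable:
        return float(not_out_score)
    return sum(comparable) / len(comparable)

# Hypothetical pool of completed (dismissed) scores.
past = [12, 55, 0, 80, 130, 47, 65]

print(projected_score(50, past))   # mean of 55, 80, 130, 65
print(projected_score(200, past))  # nothing comparable, so the fallback applies
```

Filters such as venue or opposition would simply narrow the pool passed in as `past_dismissed_scores`. The narrower the filter, the fewer comparable innings remain, which is the data shortage the commenter acknowledges.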

Add me to those who believe the entire premise of this article is fatally flawed. To say that Lara's 400* "as far as determining the batting average is concerned, does not exist" is flat out wrong.

Lara's average for that Test Series was 83.33. If that innings did not exist his average would have been 16.67. So in what way does his 400* not count towards his batting average?
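The two figures quoted here pin down the series totals, assuming the number of dismissals is the same in both cases. A small sketch of that back-calculation follows; the variable names are illustrative and the totals are derived from the quoted averages rather than taken from a scorecard.

```python
# Averages quoted in the comment above.
avg_with_400 = 83.33
avg_without_400 = 16.67
not_out_runs = 400

# Removing the 400* takes 400 runs off the numerator and leaves the
# denominator (dismissals) unchanged, so 400 / dismissals equals the gap.
dismissals = round(not_out_runs / (avg_with_400 - avg_without_400))
series_runs = round(avg_with_400 * dismissals)

print(dismissals, series_runs)
print(round(series_runs / dismissals, 2))
print(round((series_runs - not_out_runs) / dismissals, 2))
```

So the 400 runs sit fully in the numerator; only the denominator is untouched by the innings.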

For first-class matches, the batting average is a perfectly acceptable statistic and has been in no way rendered obsolete by the modern game.

As the article says, statisticians have solved some of the shortcomings of the batting average in one day games (esp. T20s) by publishing a batsman's strike rate, but that does not apply to the longer form of the game. Only very rarely does an anomalous result occur, and they are easily dealt with in a variety of different ways, like excluding players with fewer than 10 innings (whose averages could be unrepresentative no matter what type of average you use).

You are in fact trying to change the definition of the average and what it is supposed to represent.

Everybody knows that statistics such as these have limitations - but at least the average itself is logically consistent and doesn't rely on any silly constructed points. The fact that a difference of one run could change a 'real not out' to a 'not real not out' is absurd - you are in fact adding a runs aspect to the denominator when runs already exist in the numerator.

Typical innings culminate in an embarrassing dismissal; bails flying all over the place, a little tickle off the bat or an awkward strike on the pad. This is what I think of a 'completed innings'.

If the batsman can avoid this and keep his innings alive that should be recognized in this measure. Because what ends his innings may be factors beyond his control. Running out of partners, declarations, achieving the target score.

I like what you've tried but the average is pure - yours is a construct.

We can modify the method I mentioned previously yet further: you could take the last five dismissed innings above the not out score on sub-continent pitches if the innings was played in the sub-continent. Or the last five on green tops if the innings was played in SA/Eng. You could restrict it to the last five against the current opposition team. I guess there are endless modifications that could enhance the accuracy of our projected not out scores, and hence, by adding more and more complexity, we will asymptotically tend towards the perfect projected batting average as we continuously add more prediction factors to the model. The only bottleneck now is the shortage of data for certain players early in their career. Our projections could then make use of data from other similar players. For example, if a player scores 50* on debut, what extended "dismissed" score would we use? Solution in next comment.

There is VERY good statistical reasoning for why Not Outs are treated as they are in Cricket. It works in EXACTLY the same way mortality statistics are calculated, or medical trials are done. Say you give 100 patients a medicine and observe them to see if they die or not. You cannot possibly watch all of them until they eventually die, as that could take years. Instead you observe for, say, 2 years. Those that die lived for however long they lived, be it 2 months, a year or 23 months. For those that survived until two years, all you know is that they lived AT LEAST 2 years. They could die the next day, they could live another 60 years. You do not know. The exact same statistical process is applied to cricket. When the team's innings ceases, whether by scoring the required total, declaration or being all out, all you know is that the not out batsman would have scored AT LEAST as many runs as he has now by the time of his dismissal, which would have come at some point in the future. It is all based on a Proportional Hazards Model.
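The censoring analogy can be made concrete under the simplest survival model. The sketch below assumes a constant dismissal risk per run, which is an assumption of this example and narrower than the proportional hazards model the comment names. Under that assumption, the maximum-likelihood estimate of the mean "lifetime" with right-censored observations is total observed exposure divided by the number of observed events, which is exactly runs divided by dismissals. The function name and sample scores are made up for illustration.

```python
from typing import List, Optional, Tuple


def constant_hazard_mean(innings: List[Tuple[int, bool]]) -> Optional[float]:
    """MLE of mean runs per dismissal, assuming a constant hazard per run.

    Each innings is (runs, dismissed). Not outs are right-censored: they
    add exposure (runs) but no event (dismissal).
    """
    exposure = sum(runs for runs, _ in innings)
    events = sum(1 for _, dismissed in innings if dismissed)
    if events == 0:
        return None  # no dismissal observed, so the mean is not estimable
    return exposure / events


# Hypothetical scores: two of the five innings are not outs (censored).
sample = [(12, True), (45, False), (80, True), (3, True), (30, False)]
print(constant_hazard_mean(sample))
```

Whether a batsman's risk of dismissal really is constant through an innings is a separate question, and several comments here argue that it is not.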

A residual problem with the Expected Batting Average method is that if the last 10 innings by a no. 6 batsman were 10*, 7*, 13*, etc. then we would be adding an average of about 10 runs as the extension to the not out score, which does not reflect his recent form. For this reason we can't just take runs per innings as our extension to the not out score; it leaves us with the same inaccuracy encountered in the original RPI approach, albeit on a lesser scale. Perhaps a better method would be to take the last 5 dismissed innings that were above the current not out score, and use this as our extended score, or "dismissed" score. This would incorporate recent form to an extent, but most importantly it would project how many runs this batsman is expected to score GIVEN that he has reached the current not out score. It simply takes the last 5 dismissed scores above the current not out score, divides the total by 5, and replaces the not out score with this average as a projected "dismissed" score.

Classical example of data manipulation and confusion!
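A minimal sketch of the "last 5 dismissed innings above the not out score" idea follows. The function name, the fallback when no qualifying innings exist, and the sample history are assumptions made for illustration; the comment does not say what to do when fewer than five qualifying innings are available, so this version averages whatever it finds.

```python
from typing import List, Tuple


def projected_score(not_out_score: int,
                    history: List[Tuple[int, bool]],
                    last_n: int = 5) -> float:
    """Project a 'dismissed' score for a not out innings.

    history is a chronological list of earlier (runs, dismissed) innings.
    Only dismissed innings that went past the not out score qualify.
    """
    qualifying = [runs for runs, out in history if out and runs > not_out_score]
    recent = qualifying[-last_n:]  # most recent qualifying innings
    if not recent:
        # Assumption: with no comparable innings, keep the actual score.
        return float(not_out_score)
    return sum(recent) / len(recent)


# Hypothetical career so far, oldest first.
history = [(12, True), (64, True), (8, True), (120, True), (55, True),
           (31, False), (77, True), (90, True), (140, True)]

print(projected_score(50, history))   # averages 120, 55, 77, 90, 140
print(projected_score(200, history))  # nothing above 200, score unchanged
```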

In very simple layman's words, define it this way:

Average Batting Score is the number of runs scored for each wicket lost. It's not the average runs scored in each match/innings.

Let's not create brackets of less significant, average significant, highly significant ... tch

Are the averages of Bevan, Hussey and Dhoni high just because there have been more not outs in their stats? Is this really the reason? No way! Or shall we say bowlers could not get them out even in the death overs, where the AVERAGE wicket rate is higher. These players have to come in and build a new innings all over again, unlike what the stats suggest for successful players: higher conversion rates from 50 to 100 and a high percentage of low scores. Take it this way: what would be the average score of Sehwag, Sachin, Ponting and Lara, who have a high percentage of low scores? I am not comparing the players but drawing out another complex dimension... keep it simple.

The method is flawed because it does not give the batsman credit for what he has *already achieved* in his not out innings. A 0 n.o and a 400 n.o counts the same! We can speculate what more the batsman could have done, but there's no need to guess what he has already done. The attempt to incorporate recent form is neither here nor there. The very notion of a career average (the final output of this exercise) suppresses form fluctuations over time.

A simple solution is to apply something like the Duckworth-Lewis principle. How to count, say, a 36 n.o? Look at all completed innings where the batsman crossed 36, and compute the average of those innings. Apply that (conditional) average score to the 36 n.o, and "complete" the innings this way. Statistically speaking, you are extrapolating a score based on something like the hazard rate. So a 100 n.o by Sehwag (who tends to score big hundreds) will earn more credit than a 100 n.o by practically any other batsman.
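The conditional-average idea in this comment can be sketched for a whole career. Here each not out is "completed" with the mean of the batsman's dismissed innings that reached at least that score, and the result is divided by all innings. The function name, the use of "at least" for "crossed", and the fallback for a not out with no comparable innings are assumptions of this sketch.

```python
from typing import List, Optional, Tuple


def completed_average(innings: List[Tuple[int, bool]]) -> Optional[float]:
    """Average per innings after replacing each not out with a conditional mean."""
    if not innings:
        return None
    dismissed_scores = [runs for runs, out in innings if out]
    total = 0.0
    for runs, out in innings:
        if out:
            total += runs
            continue
        # Completed innings in which the batsman reached this score.
        comparable = [s for s in dismissed_scores if s >= runs]
        # Assumption: with nothing to compare against, keep the actual score.
        total += sum(comparable) / len(comparable) if comparable else runs
    return total / len(innings)


# Hypothetical innings: the 36* is completed as the mean of 50 and 120.
career = [(10, True), (36, False), (50, True), (120, True), (4, True)]
print(completed_average(career))
```

A batsman with many big completed scores earns a larger extension for the same not out, which matches the point made about Sehwag.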

Correct me if I misunderstand what a batting average is; It is the average number of runs scored per dismissal - this makes no assumptions about the number of runs a batsman will make in an innings, or match. Not outs are a part of this, and allow for batsmen to accumulate runs before losing their wickets.

RPI correctly quantifies the number of runs a batsman will make in an innings, and for this method, normalising the not outs would be a valid exercise. Then, taking the form factor (i.e. the last 10 innings) into account, we could justify treating an innings as incomplete or completed, even if the batsman has not been dismissed. This should provide a more accurate prediction of what a batsman is likely to score in an innings / match.

Otherwise, an interesting exposition, Anantha. Thanks

When a batsman finishes not out, he might have been 80% of the way towards the score at which he would have got out, he might have been 30% of the way there... In fact, he might have been anywhere between 1% and 99% of the way there. So why not say that on average, he would have been 50% of the way there, so all not outs can be counted as half an innings? This also has the advantage of being very simple and understandable!

It seems unreasonable that you add 50 to a 50 n.o. innings as well as to a 400 n.o. innings to get the extended average. What if we fit a probability curve to each batsman's recorded scores (n.o. or out) and then find the mean, median etc. of that distribution? Or is the data too noisy to do such a calculation?
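The half-innings suggestion is easy to try on the Kallis and Lara figures from the article's table (runs, innings, not outs). The function name and the `weight` parameter are illustrative assumptions.

```python
def half_innings_average(runs: int, innings: int, not_outs: int,
                         weight: float = 0.5) -> float:
    """Runs divided by dismissals plus a fractional count for each not out."""
    dismissals = innings - not_outs
    return runs / (dismissals + weight * not_outs)


# Career figures as given in the article's comparison table.
kallis = half_innings_average(13128, 274, 40)
lara = half_innings_average(11953, 232, 6)

print(round(kallis, 2), round(lara, 2))
```

Setting `weight=0` reproduces the conventional average and `weight=1` gives runs per innings, so the proposal sits midway between the two.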

The statement "the batting average computation violates a basic mathematical dictum" I think speaks volumes of the perception of those who went about creating these laborious alternatives. Sure, it looks great for a mathematician, but please, this is sport. It is not about mathematical fairness, it is about being fair to the sportsman and his skill. There is nothing wrong at all with averages and the way not outs are handled in them. If a batsman averages say 650 in a certain 3-match Test series, all of the game's followers will be pretty aware of how it got that high. It is a reward of sorts, for the batsman who not only piled on runs, but was "so good that he was rarely or never dismissed by the opposition". In fact, I think it is one of the better and more 'human' features of batting stats: rewarding a batsman who puts a price on his wicket, even when all his team needs is say 30 runs in the fourth innings.

Interesting methods. I would try one more variant for the 'extended innings' method which is to use a random score from the distribution of completed scores (outs) for the individual instead of the rpi from previous 10 innings. This might be more probabilistic than deterministic but this might retain the uncertainty that is inherent in the sport and also reflect the batsman's general propensity. The drawback is that it ignores the form the batsman was in when he was not out. So one more alteration would be to choose a random score from previous 'x' completed scores. This would be better than rpi as one huge score followed by poor outings would skew the extended runs more than it should.
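A minimal sketch of this random-draw variant follows, covering both the whole-career pool and the "previous x completed scores" alteration. The function name, parameter names and sample scores are assumptions; the draw is added to the not out score as the extension, in place of the RpI of the previous 10 innings.

```python
import random
from typing import List, Optional


def sampled_extension(not_out_score: int,
                      completed_scores: List[int],
                      last_x: Optional[int] = None,
                      rng: Optional[random.Random] = None) -> int:
    """Extend a not out score by a randomly drawn completed (dismissed) score.

    completed_scores is chronological; last_x limits the pool to the most
    recent x completed innings to reflect form.
    """
    rng = rng or random.Random()
    pool = completed_scores[-last_x:] if last_x else list(completed_scores)
    if not pool:
        return not_out_score  # nothing to draw from, leave the score alone
    return not_out_score + rng.choice(pool)


completed = [4, 88, 17, 150, 0, 42, 61]  # hypothetical dismissed scores
rng = random.Random(2024)                # seeded only for repeatability

print(sampled_extension(30, completed, rng=rng))
print(sampled_extension(30, completed, last_x=3, rng=rng))
```

Because the result changes from run to run, a published figure would need either a fixed seed or an average over many draws.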

This seems like a fairly futile exercise. It is well understood that average is not a good point of comparison, but not for the reasons addressed. Of far more significance is the ground (Asian grounds tend to support higher aggregates), level of the bowling opposition (eg. Aussie greats never had to face Warne and McGrath) date - there have been far more huge scores in the last decade with better quality pitches, etc.

Ironically, these points are taken into account for the player ratings, which are often derided and ignored. The problem with the ratings is that the algorithms are secret - so people don't trust them.

We'd be better off making the ratings more open and also adjusting them to provide career ratings rather than just current ratings.

There's nothing wrong with the current 'average' calculation for time-limited 2-innings cricket - it's nothing to do with the number of innings, but the number of times out. Simple: runs scored divided by times dismissed. It's right to be sceptical about the value of such an average in limited overs cricket, where opportunities to score differ so much depending on where you are in the batting order.

It is true that "not outs" can greatly skew averages. People always say that Imran Khan was a better batsman than I.T. Botham simply because Imran averages more. Yet when you look at their careers, Botham scored 14 Test hundreds to Imran's 6. Botham also had a much better 100/50 conversion rate. The simple reason why Imran's Test average is higher is his 25 career not outs. Botham, a far more carefree player, had just 6 in 161 innings.

There is no flaw in the average calculation. It's simple. It gives an advantage if you are not dismissed, which is fair. I think one has to understand the game.

Total Runs scored divided by number of times dismissed.

I have issues with just looking at the average of a batsman, instead of looking at other statistical figures like, maybe, standard deviation to determine how consistent a batsman is. For example, I think a batsman who scores eight 50s in eight innings should be rated better than a batsman who scores a 400 and 7 ducks.

I think this is an interesting idea as you are rightly addressing the issue of the anomaly in numerator and denominator whilst giving importance to current form. I think it's a good effort and more work should be done in this regard. However, one concern I have is that your methods, RpFI & EB Average, have a fair amount of prediction (or assumption) in them. On the other hand, the 'extreme' methods like RPI or the current system have no prediction involved. I think there is a case for saying that predicting what the batsman would do is an issue. Lara, for instance, went through a string of low scores before that epic 400. So, say he was left not out at 4 in that innings, your method would add a paltry sum and calculate his EB Average accordingly, as he was 'out of form' coming into that monstrous 400! Predictive methods should not replace the current method or RPI. I guess they can be given a place alongside non-predictive methods. Cricket nerds won't mind additional methods; it just suits our purpose!
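The consistency comparison in this comment can be put into numbers with the standard library. The sketch assumes every innings ended in a dismissal, so both batsmen have the same conventional average; the variable names are illustrative.

```python
from statistics import mean, pstdev

steady = [50] * 8            # eight 50s in eight innings
streaky = [400] + [0] * 7    # one 400 and seven ducks

# Same mean, very different spread around it.
print(mean(steady), pstdev(steady))
print(mean(streaky), round(pstdev(streaky), 2))
```

`pstdev` treats the eight scores as the whole population; `statistics.stdev` would give the sample version instead.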

The issue does not exist, as has been said repeatedly already. If you must tinker with 'not outs', the only fair way would be to say that any uncompleted score below that batsman's highest score (not some absurd 'average') would class as a 'real' not out - which, as will be noted, would leave all figures completely unchanged, which is as nonsensical as the above lucubrations. Almost all the batsmen in the list given are accepted by everyone who knows cricket as being among the greatest - and this is not because of their averages (or only marginally so) but because they were great batsmen (the only great batsmen excluded are those early geniuses who lacked the opportunity to play enough innings to score enough runs to qualify).

Now throw in the nature of the opposition into the calculation and the ease of the pitch please!

The whole premise here makes no sense to me. It's not that Lara's 400* doesn't exist in the statistics. It significantly increases his average. By your methods you're trying to REDUCE its significance, by adding one to the denominator even though he was never dismissed.

It's ludicrous to say that small not out scores are genuine not outs, and large not out scores should be considered as completed innings.

By playing the middle order, Steve Waugh lost the opportunity to score 400 runs in one innings (due to lack of time in the match and partners), and so it's fair that all his incomplete innings improve his average rather than counting as dismissals.

There is no problem here. You're trying to solve an issue that doesn't exist.

For mathematical purists this might seem good, and for stat heads to KNOW this might be advantageous. But as a general use stat, I think it adds little to the game.

This formula & method is insignificant. Cricket is played at all different standards, for one. For example, India, England & Australia play the game at a very high level, with many away Tests. West Indies get to play Zimbabwe and Bangladesh, and that allows Shiv Chanderpaul & others to gain easy runs. It is so difficult to compare players. Tendulkar has played a much larger number of away Tests compared to home Tests, so how can his average be compared to Lara or Ponting, who have played equal or even more home Tests? Dravid played 94 away Tests, just 70 in India; that makes averaging 50+ a lot harder.

Averages are a personal thing / milestone for a batsman. As a batsman improves over his career, he can improve his averages and so on... as shown, NOT OUTS don't really affect an average of an accomplished batsman by much. What I would like to know is 'when a batsman walks out to bat, how many runs is he likely to score?', and it is this that needs a better evaluation system. Alternatively, an analysis of a batsman that tells me that over a given series, based on past performance, a batsman will score between x & y runs. We have the computational power, just need an adaptive set of algorithms... that would be interesting....

You are unnecessarily complicating things. The whole analysis is useless. I have a problem with the Lara-Kallis example itself. If you wanted to say that Kallis has a greater average than Lara due to his number of not outs, you should have given an example of an innings where Kallis remained not out rather than Lara's 400*. I mean, as it is, Lara has fewer not outs, so not counting the 400* as an innings helps him! Stupid analysis.

lol

analysis shows Lara > Tendulkar .......

age old question ....

in my opinion ... Tendulkar had a 1000+ run head start on Lara, and Lara got to 10,000 and 11,000 runs before Tendulkar, and Lara played fewer Tests than Tendulkar, and fewer innings to get to 10,000 runs .... He also faced fewer deliveries ...

Tendulkar only surpasses Lara cuz Lara played less cricket, and also Lara did not get the opportunity to rack up runs against Bangladesh or Zimbabwe! ....

(you know I would have rated Ponting as better than both Lara and Tendulkar at one point - but Ponting's horrible last 3 years effectively messed up his career stats)

Not Outs, White Clothes and Bouncers are a few of Test Cricket's enduring charms. And, if a batsman, by his undeniable skill, has not been dismissed, why should he be penalised?

Good article.

I think a more interesting approach would be along the lines of baseball's "earned runs" applied to a batsman. Runs earned before the batsman gets a letoff from the fielding team.

I know that the data isn't there to do that sort of analysis, but we really should be tracking these things by now. In fact, I bet there are people employed by T20 franchises who really are tracking such metrics, only the data isn't public yet.

I think you're missing the point of averages. It's the average of runs scored between dismissals, not per innings. Lara's 400* does affect his average, but it doesn't count as a dismissal. If somebody is repeatedly not dismissed, why should it reduce his effective average? Why should 30* for Tendulkar (say) reduce his EBA? The bowler hasn't dismissed him and he's not to blame for a declaration, the team having won or the rest of the team being dismissed. The 50% adjustment is far more arbitrary than the average. I think it also suffers from making the EBA vary depending on the order of scores: an average dropping or climbing changes whether a not out counts as a dismissal or not. If somebody's EBA is 40 and he makes 19*, it wouldn't count, and then he fails a few times and his EBA drops to 37. If instead he failed first and his EBA dropped to 37, then the 19* would count and drop his EBA further. So he'd have different stats depending on the order in which he makes his runs. I see that as a worse flaw than anything in the average.

This is silly. Both propositions imagine runs (either more or less than the actual runs scored) in order to solve a statistical, not cricket, problem. Making up numbers is a greater "statistical problem" than leaving the denominator unmolested. Leave the averages alone please.
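The order-dependence described here can be demonstrated with a simplified reading of the rule as this comment states it: a not out counts as a completed innings only if it is at least 50% of the running average at that point. This is an assumption-laden sketch, not the article's exact EBA computation, and the function name and scores are made up.

```python
from typing import List, Optional, Tuple


def threshold_average(innings: List[Tuple[int, bool]],
                      fraction: float = 0.5) -> Optional[float]:
    """Running average where large not outs are counted as completed innings."""
    runs = 0
    completed = 0
    for score, dismissed in innings:
        # Running average before this innings (0.0 until one has completed).
        current = runs / completed if completed else 0.0
        runs += score
        if dismissed or score >= fraction * current:
            completed += 1
    return runs / completed if completed else None


# Same three scores, different order.
not_out_early = [(40, True), (19, False), (30, True)]  # 19* < 20, not counted
not_out_late = [(40, True), (30, True), (19, False)]   # 19* >= 17.5, counted

print(threshold_average(not_out_early))
print(threshold_average(not_out_late))
```

The conventional average gives 44.5 for both orderings, since it only needs total runs and total dismissals.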

The only foolproof means of determining the relative merits of two players is arguing over a few beers in grubby whites after your own day's play. To which end, Lara over Tendulkar every day. Get the beers in.

I appreciate the author's efforts to find a middle ground in balancing the not out statistical issue. But I believe it's a futile endeavour. Just as many arguments can be made that not outs boost averages, as they restrict a batsman's scoring potential per innings. Interesting to examine from another angle but the avg is what it is: runs per dismissal. Adding extra covenants would only muddy the water further and shed little light on any debates.

OK. So you wanted to prove that Lara is better than Kallis and you found a formula that helps you prove your point. I think all this discussion is completely useless. The kind of extrapolations you are using are neither mathematically sound, nor agree with basic cricketing common sense. It is not like a batsman is equally likely to get out on any given score. Usually, a set batsman is less likely to get out than a new batsman. Hence, I actually fully support the current system of not counting the dismissals, but counting the runs scored. The reason is simple, the runs have been scored, and dismissal has not taken place. That is what the average measures, and it has nothing to do with the number of innings, as it should (not).

The EBA method is more of a speculation, i.e. that Kallis with a 10 not out innings can go on to score 50 runs. Speculative stats can be misleading. What if he had faced one more ball and got out for 10, or what if he got a double ton? The current batting average, though flawed, takes into account only the 'real' stats.

Btw, following on from Dark_Harlequin's comments, while I recognise it's not exactly the same thing (because bowlers get more than one opportunity per innings) there is surely an intuitive similarity in certain circumstances between "runs in the numerator but nothing in the denominator" for batsmen and for bowlers.

For example, the opposition is only left with 30 runs for victory in the 4th innings. The opening bowlers both end up with figures of 6-1-15-0. Why should they be "penalised" in terms of average for those 15 runs, when they had no real opportunity to take any wickets? (How many bowlers strike at better than a wicket per 36 balls?)

To me there is something intuitively wrong with carrying over those runs to the next innings, yet we manage happily enough. Could we not do the same with not outs? It just means that the average of a #6 isn't directly comparable to that of a #3 - for the very same reason that the average of a spinner isn't directly comparable to an opening quick.

(Cont'd) Therefore the not outs have a value that ought to be recognised.

This does mean, of course, that average is not a directly comparable metric across the whole batting order. But that is the point: different batsmen have different roles.

In any case, intelligent cricket watchers have always known that the batting average of openers should always be considered a few runs better in comparison with any other position, because they usually face the most testing spells of the match. That is, we recognise the limitations of the average with respect to batting position, and adjust our attitude accordingly.

Thank you for the interesting suggestions. However, I'm not sure that not outs are quite as statistically anomalous as they at first appear. The issue, I believe, comes down to whether we use the average as a measure of individual ability or importance to the team. I prefer the latter.

That is, cricket is a team game, and every player has a particular role to fulfill, and I believe average gives a good indication of how successfully a batsman has done that. For example, Steve Waugh's high rate of not outs betrays a particular strength of his, which other middle order batsmen have found hard to emulate, namely, creating valuable partnerships with the tail.

Maybe I'm viewing Waugh's career over-generously, and maybe the stats would show something different, but my memory is that Waugh had a great ability to bring out the best from the lower order, by carefully farming the strike so as to instill greater confidence in them. This was only possible as long as he remained not out. (Cont'd)

This is crazy. I know cricketers enjoy their statistics and all that but this is taking it ad nauseam. The way it is currently expressed is fine; the average for bowlers is the number of runs they give away per wicket taken and the average for batsmen is the number of runs they score per wicket 'given'.

Besides, we have all heard of the old idiom 'there are 3 types of lies: lies, damned lies and statistics' - they are good as a guide, but rely on them at your peril!
