Statistics March 4, 2013

# The vexed question of 'not outs' in Test cricket

The statistical measurement of a player's batting average is one that has survived unaltered for 130 years of Test cricket - but it suffers from a fundamental flaw in the way 'not outs' are handled


This article addresses the often-debated question of 'not outs' in Test cricket. 'Batting average' is an archaic statistical measure with a glaring weakness. While other statistical measures have seen many changes over 130 years of Test cricket, this measure with a fundamental flaw has survived unaltered. Let's begin by understanding the flaw and then look at the methods to address it.

So what exactly is the problem? Well, it lies in the manner of handling not outs. Lara played an epic, scoring 400 runs over 13 hours, but as far as the denominator of the batting average is concerned, this innings does not exist: only its runs are counted. On the other hand, his three first-ball ducks against Australia, England and New Zealand are counted as three innings. While it is true that he was dismissed in the latter three innings, it is also a fact that during the 400 he batted long enough to have played four complete innings. Basically, 'batting average' should not exclude such innings.

As Milind puts it quite effectively, the batting average computation violates a basic mathematical dictum. Runs are added to the numerator and nothing to the denominator. Absolutely perfect description of the anomaly that exists.
Let us compare the figures of two modern great batsmen.

```
Batsman     Team    T   I  No SNo No %  Runs  Avge   RpI
Kallis J.H   Saf  162 274  40   5 14.6 13128 56.10 47.91
Lara B.C     Win  131 232   6   2  2.6 11953 52.89 51.52
```

Kallis has played 31 more Tests and scored an additional 1175 runs, but averages just over three runs more. That is because Kallis has 40 not outs compared with Lara's six. It might be due to the way Lara played, his batting positions, more declarations for Kallis, who is part of a stronger team, and so on. Let us see how we can address the anomaly, which is somewhat unfair to the top-order batsmen.
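For readers who want to check the arithmetic, here is a minimal Python sketch of how the Avge, RpI and No % columns follow from the Runs, I and No columns. The function name `batting_summary` is my own label for illustration and not part of any standard tool.

```python
def batting_summary(runs, innings, not_outs):
    """Return (batting average, runs per innings, not out %)."""
    dismissals = innings - not_outs
    average = runs / dismissals             # not outs add runs but no innings
    rpi = runs / innings                    # every innings counts
    not_out_pct = 100 * not_outs / innings
    return round(average, 2), round(rpi, 2), round(not_out_pct, 1)

# Runs, I and No taken from the Kallis and Lara rows of the table
kallis = batting_summary(13128, 274, 40)
lara = batting_summary(11953, 232, 6)
print(kallis)  # (56.1, 47.91, 14.6)
print(lara)    # (52.89, 51.52, 2.6)
```

The only difference between the two measures is the denominator, which is why 40 not outs move Kallis's figure so much more than six move Lara's.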

It should be noted that this problem is more pronounced in ODI matches because of the limited number of overs available and absence of declarations. It is also a fact that two batsmen remain not out in most ODI innings. However ODI batting is measured by the batting average and strike-rate, thus lowering the singular importance of batting averages.

I have selected 34 batsmen, who have scored over 2000 Test runs and averaged over 50, for this analysis. Virender Sehwag is just hanging on by the skin of his teeth and a failure in Chennai may very well plunge him below 50. And a reasonable Test at Centurion would push de Villiers past the 50 mark. However the data for all batsmen who have crossed 2000 runs is available for downloading and the link is provided later. The data is current up to match 2073, the Cape Town Test which finished just now.

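| Batsman | Team | Tests | Inns | No | No % | Runs | Avge |
|---|---|---|---|---|---|---|---|
| Pollock R.G | Saf | 23 | 41 | 4 | 9.8 | 2256 | 60.97 |
| Sutcliffe H | Eng | 54 | 84 | 9 | 10.7 | 4555 | 60.73 |
| Barrington | Eng | 82 | 131 | 15 | 11.5 | 6806 | 58.67 |
| EdeC Weekes | Win | 48 | 81 | 5 | 6.2 | 4455 | 58.62 |
| Hammond W.R | Eng | 85 | 140 | 16 | 11.4 | 7249 | 58.46 |
| Sobers | Win | 93 | 160 | 21 | 13.1 | 8032 | 57.78 |
| Hobbs J.B | Eng | 61 | 102 | 7 | 6.9 | 5410 | 56.95 |
| Walcott C.L | Win | 44 | 74 | 7 | 9.5 | 3798 | 56.69 |
| Hutton L | Eng | 79 | 138 | 15 | 10.9 | 6971 | 56.67 |
| Kallis J.H | Saf | 162 | 274 | 40 | 14.6 | 13128 | 56.10 |
| Sangakkara | Slk | 115 | 196 | 16 | 8.2 | 10045 | 55.81 |
| Tendulkar | Ind | 194 | 320 | 32 | 10.0 | 15645 | 54.32 |
| Chappell | Aus | 87 | 151 | 19 | 12.6 | 7110 | 53.86 |
| Nourse A.D | Saf | 34 | 62 | 7 | 11.3 | 2960 | 53.82 |
| Lara B.C | Win | 131 | 232 | 6 | 2.6 | 11953 | 52.89 |
| Clarke M.J | Aus | 89 | 148 | 15 | 10.1 | 6989 | 52.55 |
| Dravid R | Ind | 164 | 286 | 32 | 11.2 | 13288 | 52.31 |
| Mohd Yousuf | Pak | 90 | 156 | 12 | 7.7 | 7530 | 52.29 |
| Amla H.M | Saf | 68 | 118 | 10 | 8.5 | 5610 | 51.94 |
| Ponting R.T | Aus | 168 | 287 | 29 | 10.1 | 13378 | 51.85 |
| Chanderpaul | Win | 146 | 249 | 42 | 16.9 | 10696 | 51.67 |
| Flower A | Zim | 63 | 112 | 19 | 17.0 | 4794 | 51.55 |
| Hussey | Aus | 79 | 137 | 16 | 11.7 | 6235 | 51.53 |
| Waugh S.R | Aus | 168 | 260 | 46 | 17.7 | 10927 | 51.06 |
| Younis Khan | Pak | 80 | 140 | 11 | 7.9 | 6580 | 51.01 |
| Hayden M.L | Aus | 103 | 184 | 14 | 7.6 | 8626 | 50.74 |
| Border A.R | Aus | 156 | 265 | 44 | 16.6 | 11174 | 50.56 |
| Richards | Win | 121 | 182 | 12 | 6.6 | 8540 | 50.24 |
| Compton | Eng | 78 | 131 | 15 | 11.5 | 5807 | 50.06 |
| Sehwag V | Ind | 102 | 177 | 6 | 3.4 | 8559 | 50.05 |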

Most cricket followers are au fait with the above table. The one data element not shown normally is the "Not out %". This shows the % of not outs out of the total innings played. Among this elite collection of 34 batsmen, who account for 13% of runs scored in Test cricket, the highest % of not outs has been achieved by Steve Waugh, the middle-order giant from Australia. He has been unbeaten one in six innings. Andy Flower, Shivnarine Chanderpaul and Allan Border have similar numbers. In Flower's case, it has been more a question of a top drawer batsman in a weak team remaining unbeaten as his compatriots were dismissed.

The lowest figure has been achieved by Lara with 2.6%: that means roughly once in 40 innings. Sehwag, with his attacking instincts, is the only other batsman who clocks in below 5%.

Out of interest, let me share with the readers some facts related to not outs across the 135 years of Test cricket. Of the 72865 innings played, there have been 9502 not outs, accounting for about 13%. Out of these 9502, 4253 not outs - nearly half - have been at scores below 10 runs.

A simple alternative is to use the Runs per Innings (RpI) instead of the batting average. Unfortunately it is a drastic step taking the other extreme. It affects the middle-order batsmen considerably. Many of their low-score not outs would be considered as completed innings and players like Kallis would be penalised. The graph below illustrates the two extreme situations - batting averages and RpI.

We need something between Batting average and RpI. I am proposing two alternatives to fill this space.

The first method seeks to redefine the not out innings. A dismissal is a dismissal and nothing needs to be done about those. But let us accept that even an Icelander with scant knowledge of cricket would accept that a 13-hour innings should not suddenly cease to exist just because of a declaration. Let us classify not out innings as "real not out" innings and the "Completed (or fulfilled) not out" innings.

The key is to determine a cut-off point beyond which the innings is considered as completed or fulfilled. I considered various values. A fixed figure, say, 25 or 50, would be unfair to weaker batsmen with low averages, which means the figure has to be dynamically determined. The batting average itself is a good cut-off but a little stiff. Also, we are questioning the very methodology of batting average. So I have zeroed in on a sensible dynamic value: a cut-off point at 50% of the "Average for dismissed innings". Here are a couple of examples. Don Bradman's average for dismissed innings is 83.83 and any not out innings below 42 will be considered as a "real not out". Ken Barrington's average for dismissed innings is 50.37 and any not out innings below 25 will be considered as a "real not out". Any other not out innings would be considered as a fulfilled innings.
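The cut-off rule can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `rpfi`, the `(runs, was_dismissed)` data layout and the five-innings career are made up, and I am assuming that Runs per Fulfilled Innings divides all runs by the dismissed innings plus the fulfilled not outs.

```python
def rpfi(innings):
    """innings: list of (runs, was_dismissed) tuples for one batsman."""
    dismissed = [r for r, out in innings if out]
    if not dismissed:
        raise ValueError("need at least one dismissed innings")
    out_average = sum(dismissed) / len(dismissed)   # "Average for dismissed innings"
    cutoff = 0.5 * out_average                      # 50% cut-off from the text
    # Not outs at or above the cut-off count as fulfilled (completed) innings;
    # those below it stay "real not outs" and are left out of the denominator.
    fulfilled = [r for r, out in innings if not out and r >= cutoff]
    total_runs = sum(r for r, _ in innings)
    completed = len(dismissed) + len(fulfilled)     # assumed denominator
    return cutoff, len(fulfilled), total_runs / completed

# Hypothetical career: three dismissals and two not outs
career = [(0, True), (40, True), (80, True), (10, False), (120, False)]
cutoff, n_fulfilled, value = rpfi(career)
print(cutoff, n_fulfilled, value)  # 20.0 1 62.5
```

In this made-up career the batting average is 83.33 and the RpI is 50.00, so the value of 62.5 sits between the two extremes, which is the whole point of the method.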

Let us examine the impact of this method. The table below lists the same 34 batsmen with their RpI and RpFI values, ordered by RpFI.

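| Batsman | Team | Tests | Inns | No | FulfilNO | Runs | Avge | RpI | RpFI | Chg % |
|---|---|---|---|---|---|---|---|---|---|---|
| EdeC Weekes | Win | 48 | 81 | 5 | 1 | 4455 | 58.62 | 55.00 | 55.69 | 5.0% |
| Sutcliffe H | Eng | 54 | 84 | 9 | 2 | 4555 | 60.73 | 54.23 | 55.55 | 8.5% |
| Hobbs J.B | Eng | 61 | 102 | 7 | 4 | 5410 | 56.95 | 53.04 | 55.20 | 3.1% |
| Pollock R.G | Saf | 23 | 41 | 4 | 0 | 2256 | 60.97 | 55.02 | 55.02 | 9.8% |
| Barrington | Eng | 82 | 131 | 15 | 4 | 6806 | 58.67 | 51.95 | 53.59 | 8.7% |
| Walcott C.L | Win | 44 | 74 | 7 | 3 | 3798 | 56.69 | 51.32 | 53.49 | 5.6% |
| Hammond W.R | Eng | 85 | 140 | 16 | 3 | 7249 | 58.46 | 51.78 | 52.91 | 9.5% |
| Sangakkara | Slk | 115 | 196 | 16 | 3 | 10045 | 55.81 | 51.25 | 52.05 | 6.7% |
| Hutton L | Eng | 79 | 138 | 15 | 4 | 6971 | 56.67 | 50.51 | 52.02 | 8.2% |
| Lara B.C | Win | 131 | 232 | 6 | 2 | 11953 | 52.89 | 51.52 | 51.97 | 1.7% |
| Sobers | Win | 93 | 160 | 21 | 4 | 8032 | 57.78 | 50.20 | 51.49 | 10.9% |
| Tendulkar | Ind | 194 | 320 | 32 | 8 | 15645 | 54.32 | 48.89 | 50.14 | 7.7% |
| Chappell | Aus | 87 | 151 | 19 | 8 | 7110 | 53.86 | 47.09 | 49.72 | 7.7% |
| Nourse A.D | Saf | 34 | 62 | 7 | 2 | 2960 | 53.82 | 47.74 | 49.33 | 8.3% |
| Mohd Yousuf | Pak | 90 | 156 | 12 | 3 | 7530 | 52.29 | 48.27 | 49.22 | 5.9% |
| Sehwag V | Ind | 102 | 177 | 6 | 3 | 8559 | 50.05 | 48.36 | 49.19 | 1.7% |
| Kallis J.H | Saf | 162 | 274 | 40 | 5 | 13128 | 56.10 | 47.91 | 48.80 | 13.0% |
| Hayden M.L | Aus | 103 | 184 | 14 | 8 | 8626 | 50.74 | 46.88 | 49.01 | 3.4% |
| Younis Khan | Pak | 80 | 140 | 11 | 3 | 6580 | 51.01 | 47.00 | 48.03 | 5.8% |
| Clarke M.J | Aus | 89 | 148 | 15 | 2 | 6989 | 52.55 | 47.22 | 47.87 | 8.9% |
| Dravid R | Ind | 164 | 286 | 32 | 8 | 13288 | 52.31 | 46.46 | 47.80 | 8.6% |
| Richards | Win | 121 | 182 | 12 | 3 | 8540 | 50.24 | 46.92 | 47.71 | 5.0% |
| Ponting R.T | Aus | 168 | 287 | 29 | 6 | 13378 | 51.85 | 46.61 | 47.61 | 8.2% |
| Amla H.M | Saf | 68 | 118 | 10 | 0 | 5610 | 51.94 | 47.54 | 47.54 | 8.5% |
| Hussey | Aus | 79 | 137 | 16 | 2 | 6235 | 51.53 | 45.51 | 46.19 | 10.4% |
| Compton | Eng | 78 | 131 | 15 | 5 | 5807 | 50.06 | 44.33 | 46.09 | 7.9% |
| Flower A | Zim | 63 | 112 | 19 | 5 | 4794 | 51.55 | 42.80 | 44.80 | 13.1% |
| Chanderpaul | Win | 146 | 249 | 42 | 4 | 10696 | 51.67 | 42.96 | 43.66 | 15.5% |
| Waugh S.R | Aus | 168 | 260 | 46 | 9 | 10927 | 51.06 | 42.03 | 43.53 | 14.7% |
| Border A.R | Aus | 156 | 265 | 44 | 5 | 11174 | 50.56 | 42.17 | 42.98 | 15.0% |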

It is obvious that the RpFI figures fall further below the Batting average for batsmen with a high % of not outs than for those with a low % of not outs. Bradman drops 10.3% and Kallis drops by 13.0%. Readers can note that the four middle-order batsmen discussed earlier as possessing a high % of not outs, viz., Andy Flower, Chanderpaul, Steve Waugh and Border, have had the highest drops and occupy the bottom four positions in this table. The lowest drop has been for Lara and Sehwag, with 1.7%. In fact Sehwag, who was 34th in the batting average table, moves up to 18th here. Even Kallis, despite his high batting average, drops to below 50.

This is a simple and easy-to-understand method. Anyone can incorporate these figures by inspecting the not out innings of a batsman. I also have to accept that while this addresses the "not out" problem somewhat, the fundamental weakness remains: an innings can still be represented in the numerator in the form of runs and be ignored in the denominator, albeit only for small innings. At least the 400s and 365s have been taken care of.

However a more intuitive and stronger method is the one that tackles the "Runs" side of the formula to equate every batsman on a fair basis. In this method I will "extend" the not out innings to its natural conclusion or in other words - get the batsmen "out". Clearly this is a case of an extrapolation combining actual runs scored with virtual ones. Does it matter? Let us venture outside the normal realm of things and scrutinise what is in store.

The key question is "by how many runs" to extend these not out innings. When I started working on this idea a few years back, along with Dr. Ashwin Mahesh, we picked the Batting average. In view of our own fundamental objection to this value we moved on to the RpI and subsequently to the "Average for dismissed innings". This is relatively easy to handle. Just multiply the number of not out innings by the "Average for dismissed innings", add the result to the actual runs to get the new total runs and divide by the total innings to derive the Extended Batting average (EBA). This can be added on to any existing table in a jiffy.
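As a rough illustration of this fixed-value version, here is a short Python sketch. The function name `extended_average` is my own; the inputs are the Kallis figures quoted in this article (13128 runs, 274 innings, 40 not outs and an average for dismissed innings of 42.23).

```python
def extended_average(runs, innings, not_outs, extension):
    """Add `extension` virtual runs to every not out, then divide by all innings."""
    extended_runs = runs + not_outs * extension
    return extended_runs / innings

# Fixed extension = Kallis's average for dismissed innings
kallis_eba = extended_average(13128, 274, 40, 42.23)
print(round(kallis_eba, 2))  # 54.08
```

Note that this fixed-value result differs slightly from the 54.40 listed for Kallis in the EBAvge table, since those figures are based on recent form rather than a single fixed extension.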

A few years back I noticed a flaw in this approach. Sehwag is batting these days like a village team slogger who has forgotten the basics. If, by any chance, he remains not out, however unlikely that is, should we add nearly 50 runs to his innings? With all due respect to the great Tendulkar, a similar situation exists in his case too. That brings us to Michael Clarke and his purple patch. In the last 10 innings he has averaged over 80. It would be unfair to add only 45 runs or thereabouts.

Hence I decided that, despite the risk of adding complexity, I would add the Runs per innings for the last 10 innings played by the batsman. This is complex since this value has to be determined dynamically for each and every not out innings played by the batsman during his career. It requires tricky computer algorithms. Also note that I have used Runs per innings because we are considering only 10 innings and a couple of not out innings would distort the entire process. Why 10 innings instead of 10 Test matches? Well, there have been times when a player played 3-5 Tests a year and it would have taken a few years to play 10 Tests. That is too long a period for a measure of recent form. In general, 10 innings is one long or two short series and would reflect the recent form quite accurately.
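The recent-form idea might be sketched as below. This is not my production program, only a minimal Python illustration with several assumptions of its own: the name `recent_form_eba`, the `(runs, was_dismissed)` layout, the use of the innings preceding each not out (up to 10, using actual rather than extended runs), and adding nothing when there is no earlier innings to go by. The career shown is made up.

```python
def recent_form_eba(innings, window=10):
    """innings: chronological list of (runs, was_dismissed) tuples."""
    if not innings:
        raise ValueError("no innings to average")
    extended_runs = 0.0
    for i, (runs, dismissed) in enumerate(innings):
        extended_runs += runs
        if not dismissed:
            # Runs per innings over the preceding `window` innings (assumption:
            # the current not out innings itself is not part of the window).
            recent = [r for r, _ in innings[max(0, i - window):i]]
            # Assumption: with no earlier innings, nothing is added.
            if recent:
                extended_runs += sum(recent) / len(recent)
    return extended_runs / len(innings)

# Hypothetical five-innings career, with a window of 3 to keep it readable
career = [(20, True), (60, True), (100, False), (10, True), (50, False)]
value = recent_form_eba(career, window=3)
print(round(value, 2))  # 67.33
```

Here the 100 not out is extended by 40 (the RpI of the two earlier innings) and the 50 not out by 56.67 (the RpI of the three innings before it), and the extended total is divided by all five innings.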

Let us peruse the revised figures. The table below lists the same 34 batsmen ordered by EBAvge.

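| Batsman | Team | Tests | Inns | Runs | Avge | OutAvge | ExtRuns | EBAvge | Chg % |
|---|---|---|---|---|---|---|---|---|---|
| Sutcliffe H | Eng | 54 | 84 | 4555 | 60.73 | 54.64 | 5024 | 59.81 | 1.5% |
| Pollock R.G | Saf | 23 | 41 | 2256 | 60.97 | 54.43 | 2394 | 58.39 | 4.2% |
| EdeC Weekes | Win | 48 | 81 | 4455 | 58.62 | 54.88 | 4654 | 57.46 | 2.0% |
| Hammond W.R | Eng | 85 | 140 | 7249 | 58.46 | 46.19 | 8018 | 57.27 | 2.0% |
| Barrington | Eng | 82 | 131 | 6806 | 58.67 | 50.37 | 7410 | 56.56 | 3.6% |
| Hobbs J.B | Eng | 61 | 102 | 5410 | 56.95 | 53.34 | 5645 | 55.34 | 2.8% |
| Hutton L | Eng | 79 | 138 | 6971 | 56.67 | 47.89 | 7629 | 55.28 | 2.5% |
| Sangakkara | Slk | 115 | 196 | 10045 | 55.81 | 47.56 | 10792 | 55.06 | 2.3% |
| Sobers | Win | 93 | 160 | 8032 | 57.78 | 44.06 | 8768 | 54.80 | 5.2% |
| Kallis J.H | Saf | 162 | 274 | 13128 | 56.10 | 42.23 | 14905 | 54.40 | 3.0% |
| Walcott C.L | Win | 44 | 74 | 3798 | 56.69 | 51.03 | 4001 | 54.07 | 4.6% |
| Tendulkar | Ind | 194 | 320 | 15645 | 54.32 | 44.56 | 16888 | 52.77 | 2.8% |
| Lara B.C | Win | 131 | 232 | 11953 | 52.89 | 49.76 | 12220 | 52.67 | 0.4% |
| Mohd Yousuf | Pak | 90 | 156 | 7530 | 52.29 | 46.19 | 8009 | 51.34 | 1.8% |
| Amla H.M | Saf | 68 | 118 | 5610 | 51.94 | 39.92 | 6042 | 51.20 | 1.4% |
| Nourse A.D | Saf | 34 | 62 | 2960 | 53.82 | 47.49 | 3167 | 51.08 | 5.1% |
| Clarke M.J | Aus | 89 | 148 | 6989 | 52.55 | 42.23 | 7559 | 51.07 | 2.8% |
| Chappell | Aus | 87 | 151 | 7110 | 53.86 | 44.57 | 7706 | 51.03 | 5.3% |
| Ponting R.T | Aus | 168 | 287 | 13378 | 51.85 | 45.15 | 14646 | 51.03 | 1.6% |
| Dravid R | Ind | 164 | 286 | 13288 | 52.31 | 44.71 | 14505 | 50.72 | 3.1% |
| Hayden M.L | Aus | 103 | 184 | 8626 | 50.74 | 47.68 | 9244 | 50.24 | 1.0% |
| Sehwag V | Ind | 102 | 177 | 8559 | 50.05 | 47.96 | 8854 | 50.02 | 0.1% |
| Younis Khan | Pak | 80 | 140 | 6580 | 51.01 | 44.24 | 6966 | 49.76 | 2.5% |
| Chanderpaul | Win | 146 | 249 | 10696 | 51.67 | 34.49 | 12259 | 49.23 | 4.7% |
| Hussey | Aus | 79 | 137 | 6235 | 51.53 | 42.50 | 6742 | 49.21 | 4.5% |
| Compton | Eng | 78 | 131 | 5807 | 50.06 | 44.40 | 6302 | 48.11 | 3.9% |
| Richards | Win | 121 | 182 | 8540 | 50.24 | 44.49 | 8753 | 48.09 | 4.3% |
| Waugh S.R | Aus | 168 | 260 | 10927 | 51.06 | 35.47 | 12480 | 48.00 | 6.0% |
| Flower A | Zim | 63 | 112 | 4794 | 51.55 | 35.43 | 5337 | 47.65 | 7.6% |
| Border A.R | Aus | 156 | 265 | 11174 | 50.56 | 37.04 | 12397 | 46.78 | 7.5% |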

Bradman's EBA is 97% of his Batting average, a drop of 3%. Headley drops 6.5%. Sobers drops by about 5%. Most of the middle-order stalwarts have drops of 6% or more. Sehwag has the lowest drop: only 0.1%, virtually no change. Similarly Lara drops by only 0.4%. Amongst these top batsmen not a single one has an EBA higher than his Batting average. This happens lower down the table. Mohsin Khan has the highest increase: 1.4%. The much-maligned Graeme Hick's EBA is 1.3% higher than his Batting average. Darren Ganga follows next with a 0.9% increase. A total of 11 batsmen have higher EBA values. Interested readers can study the Excel sheet for details. Saeed Anwar is the only batsman with more than 4000 runs under his belt and an EBA higher than his Batting average.

This method is more elegant and intuitive, with the complexity of calculations being the sole deterrent. However the concept is sound, and any cricket follower can easily implement the fixed-value version, in which the extension can be any one of several values, such as the Batting average, the RpI or the "Average for dismissed innings". And we can say with certainty that every innings is represented in both the numerator and the denominator. We have addressed that problem effectively.
Let us revisit the figures of Kallis and Lara.

```
Batsman     Team    T   I  No SNo No %  Runs  Avge   RpI  RpFI ExtRuns   EBA %Avge
Kallis J.H   Saf  162 274  40   5 14.6 13128 56.10 47.91 48.80   14905 54.40 97.0%
Lara B.C     Win  131 232   6   2  2.6 11953 52.89 51.52 51.97   12220 52.67 99.6%
```
Readers can see that Lara's average is just over 3 runs lower than that of Kallis. However his RpI and RpFI are more than 3 runs higher. Significantly, the EBA, which is the more accurate and valid measure, is less than 2 runs below that of Kallis. EBA probably reflects the central tendency most accurately.

Now for a revised graph. The two alternatives are pictorially represented occupying the space in the middle.

This is not a theoretical exercise: two alternatives are presented to address a genuine problem. The Spl Not outs method is simple and easy to implement. The Extended Batting average method is more complex and would require a computer to incorporate recent form. However, using the Out Bat average (the "Average for dismissed innings"), the Batting average, the RpI or the RpFI as the extension basis would be easier to implement. What is needed? Well, an influential organization such as ESPNcricinfo should study the suggestions and start implementing the revised averages, of course alongside the current measures.