March 4, 2013

Statistics

The vexed question of 'not outs' in Test cricket

Anantha Narayanan
Among batsmen with 4000-plus runs, Saeed Anwar is the only one to have an Extended Batting average greater than his Test average  © ESPNcricinfo Ltd

Due to technical issues, Ananth has not been able to view and respond to the comments. We are working on the issue and hope to have it resolved as soon as possible.

This article addresses the often-debated question of 'not outs' in Test cricket. 'Batting average' is an archaic statistical measure with a glaring weakness. While other statistical measures have seen many changes over 130 years of Test cricket, this measure with a fundamental flaw has survived unaltered. Let's begin by understanding the flaw and then look at the methods to address it.

So what exactly is the problem? Well, it lies in the manner of handling not outs. Lara played an epic, scoring 400 runs over 13 hours, but this innings, as far as determining the batting average is concerned, does not exist. On the other hand, his three first-ball ducks against Australia, England and New Zealand are counted as three innings. While it is true that he was dismissed in the latter three innings, it is also a fact that in the epic he batted long enough to have played four complete innings. Basically, 'batting average' should not exclude such innings.

As Milind puts it quite effectively, the batting average computation violates a basic mathematical dictum: for a not out innings, runs are added to the numerator and nothing to the denominator. That is a perfect description of the anomaly.
Let us compare the figures of two modern great batsmen.

Batsman     Team    T   I  No SNo No %  Runs  Avge  RpI
Kallis J.H   Saf  162 274  40   5 14.6 13128 56.10 47.91
Lara B.C     Win  131 232   6   2  2.6 11953 52.89 51.52

Kallis has played 31 more Tests and scored an additional 1175 runs, yet averages just over three runs more. That is because Kallis has 40 not outs compared with Lara's six. It might be due to the way Lara played, his batting positions, more declarations for Kallis, who is part of a stronger team, and so on. Let us see how we can address the anomaly, which is somewhat unfair to the top-order batsmen.
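The arithmetic behind the Kallis/Lara table can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the career totals are copied from the table above.

```python
def batting_average(runs, innings, not_outs):
    """Traditional average: runs divided by the number of dismissals only."""
    dismissals = innings - not_outs
    if dismissals == 0:
        raise ValueError("average is undefined when there are no dismissals")
    return runs / dismissals


def runs_per_innings(runs, innings):
    """RpI: every innings, dismissed or not, counts in the denominator."""
    return runs / innings


def not_out_pct(innings, not_outs):
    """Share of innings in which the batsman remained unbeaten, in percent."""
    return 100.0 * not_outs / innings


# Career totals taken from the table above.
careers = {
    "Kallis": {"runs": 13128, "innings": 274, "not_outs": 40},
    "Lara": {"runs": 11953, "innings": 232, "not_outs": 6},
}

for name, c in careers.items():
    avg = batting_average(c["runs"], c["innings"], c["not_outs"])
    rpi = runs_per_innings(c["runs"], c["innings"])
    pct = not_out_pct(c["innings"], c["not_outs"])
    print(f"{name}: Avge {avg:.2f}  RpI {rpi:.2f}  No % {pct:.1f}")
```

The 40 not outs remove 40 innings from Kallis's denominator, which is why his average sits more than eight runs above his RpI, while Lara's two figures are barely a run apart.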

It should be noted that this problem is more pronounced in ODI matches because of the limited number of overs available and absence of declarations. It is also a fact that two batsmen remain not out in most ODI innings. However ODI batting is measured by the batting average and strike-rate, thus lowering the singular importance of batting averages.

I have selected 34 batsmen, who have scored over 2000 Test runs and averaged over 50, for this analysis. Virender Sehwag is just hanging on by the skin of his teeth and a failure in Chennai may very well plunge him below 50. And a reasonable Test at Centurion would push de Villiers past the 50 mark. However the data for all batsmen who have crossed 2000 runs is available for downloading and the link is provided later. The data is current up to match 2073, the Cape Town Test which finished just now.

Batsman     Team Tests Inns  No  No %   Runs   Avge
Bradman D.G Aus     52   80  10  12.5   6996  99.94
Pollock R.G Saf     23   41   4   9.8   2256  60.97
Headley G.A Win     22   40   4  10.0   2190  60.83
Sutcliffe H Eng     54   84   9  10.7   4555  60.73
Barrington  Eng     82  131  15  11.5   6806  58.67
EdeC Weekes Win     48   81   5   6.2   4455  58.62
Hammond W.R Eng     85  140  16  11.4   7249  58.46
Sobers      Win     93  160  21  13.1   8032  57.78
Hobbs J.B   Eng     61  102   7   6.9   5410  56.95
Walcott C.L Win     44   74   7   9.5   3798  56.69
Hutton L    Eng     79  138  15  10.9   6971  56.67
Kallis J.H  Saf    162  274  40  14.6  13128  56.10
Sangakkara  Slk    115  196  16   8.2  10045  55.81
Tendulkar   Ind    194  320  32  10.0  15645  54.32
Chappell    Aus     87  151  19  12.6   7110  53.86
Nourse A.D  Saf     34   62   7  11.3   2960  53.82
Lara B.C    Win    131  232   6   2.6  11953  52.89
Miandad     Pak    124  189  21  11.1   8832  52.57
Clarke M.J  Aus     89  148  15  10.1   6989  52.55
Dravid R    Ind    164  286  32  11.2  13288  52.31
Mohd Yousuf Pak     90  156  12   7.7   7530  52.29
Amla H.M    Saf     68  118  10   8.5   5610  51.94
Ponting R.T Aus    168  287  29  10.1  13378  51.85
Chanderpaul Win    146  249  42  16.9  10696  51.67
Flower A    Zim     63  112  19  17.0   4794  51.55
Hussey      Aus     79  137  16  11.7   6235  51.53
Gavaskar    Ind    125  214  16   7.5  10122  51.12
Waugh S.R   Aus    168  260  46  17.7  10927  51.06
Younis Khan Pak     80  140  11   7.9   6580  51.01
Hayden M.L  Aus    103  184  14   7.6   8626  50.74
Border A.R  Aus    156  265  44  16.6  11174  50.56
Richards    Win    121  182  12   6.6   8540  50.24
Compton     Eng     78  131  15  11.5   5807  50.06
Sehwag V    Ind    102  177   6   3.4   8559  50.05

Most cricket followers are au fait with the above table. The one data element not shown normally is the "Not out %". This shows the % of not outs out of the total innings played. Among this elite collection of 34 batsmen, who account for 13% of runs scored in Test cricket, the highest % of not outs has been achieved by Steve Waugh, the middle-order giant from Australia. He has been unbeaten one in six innings. Andy Flower, Shivnarine Chanderpaul and Allan Border have similar numbers. In Flower's case, it has been more a question of a top drawer batsman in a weak team remaining unbeaten as his compatriots were dismissed.

The lowest figure has been achieved by Lara with 2.6%: that means roughly once in 40 innings. Sehwag, with his attacking instincts, is the only other batsman who clocks in below 5%.

Out of interest, let me share with the readers some facts related to not outs across the 135 years of Test cricket. Of the 72865 innings played, there have been 9502 not outs, accounting for about 13%. Out of these 9502, 4253 not outs - nearly half - have been at scores below 10 runs.


A simple alternative is to use the Runs per Innings (RpI) instead of the batting average. Unfortunately it is a drastic step taking the other extreme. It affects the middle-order batsmen considerably. Many of their low-score not outs would be considered as completed innings and players like Kallis would be penalised. The graph below illustrates the two extreme situations - batting averages and RpI.

Average v Runs per Innings
 © Anantha Narayanan

We need something between Batting average and RpI. I am proposing two alternatives to fill this space.

The first method seeks to redefine the not out innings. A dismissal is a dismissal and nothing needs to be done about those. But let us accept that even an Icelander with scant knowledge of cricket would accept that a 13-hour innings should not suddenly cease to exist just because of a declaration. Let us classify not out innings as "real not out" innings and the "Completed (or fulfilled) not out" innings.

The key is to determine a cut-off point beyond which the innings is considered as completed or fulfilled. I considered various values. A fixed figure, say, 25 or 50, would be unfair to weaker batsmen with low averages, which means the figure has to be dynamically determined. The batting average itself is a good cut-off but a little stiff. Also, we are questioning the very methodology of batting average. So I have zeroed in on a sensible dynamic value: a cut-off point at 50% of the "Average for dismissed innings". Here are a couple of examples. Don Bradman's average for dismissed innings is 83.83 and any not out innings below 42 will be considered as a "real not out". Ken Barrington's average for dismissed innings is 50.37 and any not out innings below 25 will be considered as a "real not out". Any other not out innings would be considered as a fulfilled innings.
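The cut-off rule described above can be sketched as follows. This is a rough Python illustration, not the program used for the article: the function name, the (runs, not_out) tuple layout and the six sample innings are all invented for the example. It assumes that "real" not outs (those below the cut-off) are dropped from the innings count while their runs stay in the total.

```python
def runs_per_fulfilled_innings(innings, cutoff_fraction=0.5):
    """Return (cutoff, real_not_outs, rpfi) for one batsman.

    innings: list of (runs, not_out) tuples.
    cutoff_fraction: share of the average for dismissed innings used as the
    cut-off; 0.5 follows the 50% rule described in the text.
    """
    dismissed = [runs for runs, not_out in innings if not not_out]
    if not dismissed:
        raise ValueError("need at least one dismissed innings to set a cut-off")

    dismissed_avg = sum(dismissed) / len(dismissed)
    cutoff = cutoff_fraction * dismissed_avg

    # A not out below the cut-off stays a "real not out" and is not counted
    # as an innings; a not out at or above it is treated as fulfilled.
    real_not_outs = sum(1 for runs, not_out in innings if not_out and runs < cutoff)

    total_runs = sum(runs for runs, _ in innings)
    counted_innings = len(innings) - real_not_outs
    return cutoff, real_not_outs, total_runs / counted_innings


# Hypothetical batsman: four dismissals and two not outs (10* and 80*).
sample = [(120, False), (30, False), (0, False), (50, False), (10, True), (80, True)]
cutoff, real_no, rpfi = runs_per_fulfilled_innings(sample)
print(cutoff, real_no, rpfi)  # 25.0 1 58.0
```

For this invented batsman the traditional average would be 290/4 = 72.5 and the RpI 290/6 = 48.33, so the RpFI of 58.0 falls between the two, which is the stated aim of the method.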

Let us examine the impact of this method. The table below lists the same 34 batsmen with their RpI and RpFI values, ordered by RpFI.

Batsman     Team Tests Inns  No FulfilNO   Runs   Avge    RpI   RpFI  Chg %
Bradman D.G Aus     52   80  10        2   6996  99.94  87.45  89.69  10.3%
Headley G.A Win     22   40   4        1   2190  60.83  54.75  56.15   7.7%
EdeC Weekes Win     48   81   5        1   4455  58.62  55.00  55.69   5.0%
Sutcliffe H Eng     54   84   9        2   4555  60.73  54.23  55.55   8.5%
Hobbs J.B   Eng     61  102   7        4   5410  56.95  53.04  55.20   3.1%
Pollock R.G Saf     23   41   4        0   2256  60.97  55.02  55.02   9.8%
Barrington  Eng     82  131  15        4   6806  58.67  51.95  53.59   8.7%
Walcott C.L Win     44   74   7        3   3798  56.69  51.32  53.49   5.6%
Hammond W.R Eng     85  140  16        3   7249  58.46  51.78  52.91   9.5%
Sangakkara  Slk    115  196  16        3  10045  55.81  51.25  52.05   6.7%
Hutton L    Eng     79  138  15        4   6971  56.67  50.51  52.02   8.2%
Lara B.C    Win    131  232   6        2  11953  52.89  51.52  51.97   1.7%
Sobers      Win     93  160  21        4   8032  57.78  50.20  51.49  10.9%
Tendulkar   Ind    194  320  32        8  15645  54.32  48.89  50.14   7.7%
Chappell    Aus     87  151  19        8   7110  53.86  47.09  49.72   7.7%
Nourse A.D  Saf     34   62   7        2   2960  53.82  47.74  49.33   8.3%
Mohd Yousuf Pak     90  156  12        3   7530  52.29  48.27  49.22   5.9%
Sehwag V    Ind    102  177   6        3   8559  50.05  48.36  49.19   1.7%
Kallis J.H  Saf    162  274  40        5  13128  56.10  47.91  48.80  13.0%
Hayden M.L  Aus    103  184  14        8   8626  50.74  46.88  49.01   3.4%
Gavaskar    Ind    125  214  16        4  10122  51.12  47.30  48.20   5.7%
Younis Khan Pak     80  140  11        3   6580  51.01  47.00  48.03   5.8%
Clarke M.J  Aus     89  148  15        2   6989  52.55  47.22  47.87   8.9%
Dravid R    Ind    164  286  32        8  13288  52.31  46.46  47.80   8.6%
Miandad     Pak    124  189  21        4   8832  52.57  46.73  47.74   9.2%
Richards    Win    121  182  12        3   8540  50.24  46.92  47.71   5.0%
Ponting R.T Aus    168  287  29        6  13378  51.85  46.61  47.61   8.2%
Amla H.M    Saf     68  118  10        0   5610  51.94  47.54  47.54   8.5%
Hussey      Aus     79  137  16        2   6235  51.53  45.51  46.19  10.4%
Compton     Eng     78  131  15        5   5807  50.06  44.33  46.09   7.9%
Flower A    Zim     63  112  19        5   4794  51.55  42.80  44.80  13.1%
Chanderpaul Win    146  249  42        4  10696  51.67  42.96  43.66  15.5%
Waugh S.R   Aus    168  260  46        9  10927  51.06  42.03  43.53  14.7%
Border A.R  Aus    156  265  44        5  11174  50.56  42.17  42.98  15.0%

It is obvious that the RpFI of batsmen with a high % of not outs falls much further below the Batting average than that of batsmen with a low % of not outs. Bradman drops by 10.3% and Kallis by 13.0%. Readers can note that the four middle-order batsmen discussed earlier as having a high % of not outs, viz., Andy Flower, Chanderpaul, Steve Waugh and Border, have had the highest drops and occupy the bottom four positions in this table. The lowest drops have been for Lara and Sehwag, with 1.7% each. In fact Sehwag, who was 34th in the batting average table, moves up to 18th here. Even Kallis, for all his high batting average, drops below 50.

This is a simple and easy-to-understand method. Anyone can derive these figures by inspecting the not out innings of a batsman. I also have to accept that while this addresses the "not out" problem somewhat, the fundamental weakness remains: some innings are still represented in the numerator in the form of runs and ignored in the denominator, albeit only the small ones. At least the 400s and 365s have been taken care of.

However a more intuitive and stronger method is the one that tackles the "Runs" side of the formula to equate every batsman on a fair basis. In this method I will "extend" the not out innings to its natural conclusion or in other words - get the batsmen "out". Clearly this is a case of an extrapolation combining actual runs scored with virtual ones. Does it matter? Let us venture outside the normal realm of things and scrutinise what is in store.

The key question is "by how many runs" to extend these not out innings. When I started working on this idea a few years back, along with Dr.Ashwin Mahesh, we picked out the Batting average. In view of our own fundamental objection to this value we moved on to the RpI and subsequently to the "Average for dismissed innings". This is relatively easy to handle. Just multiply the number of not out innings by the "Average for dismissed innings", get the new total runs and divide by the total innings to derive the Extended Batting average (EBA). This can be added on to any existing table in a jiffy.
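The fixed-value version of this calculation can be sketched in Python as below. The function name and parameters are illustrative only. The extension value is passed in, so the Average for dismissed innings, the Batting average or the RpI could each be tried.

```python
def extended_average_fixed(runs, innings, not_outs, extension):
    """Extended Batting average with one fixed extension per not out innings.

    Each not out innings is "extended" by the same number of runs, and the
    extended total is divided by all innings played.
    """
    if innings <= 0:
        raise ValueError("innings must be positive")
    extended_runs = runs + not_outs * extension
    return extended_runs / innings


# Bradman: 6996 runs, 80 innings, 10 not outs, 83.83 average for dismissed innings.
print(round(extended_average_fixed(6996, 80, 10, 83.83), 2))  # 97.93

# Kallis: 13128 runs, 274 innings, 40 not outs, 42.23 average for dismissed innings.
print(round(extended_average_fixed(13128, 274, 40, 42.23), 2))  # 54.08
```

These fixed-value results are close to, but not the same as, the EBAvge figures in the table further down, which are built on recent form rather than on a single career-wide extension value.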

A few years back I noticed a flaw in this approach. Sehwag is batting these days like a village team slogger who has forgotten the basics. If, by any chance, he remains not out, however unlikely that is, should we add nearly 50 runs to his innings? With all due respect to the great Tendulkar, a similar situation exists in his case too. That brings us to Michael Clarke and his purple patch. In the last 10 innings he has averaged over 80. It would be unfair to add only 45 runs or thereabouts to a not out innings of his.

Hence I decided that, despite the risk of adding complexity, I would add the Runs per innings for the last 10 innings played by the batsman. This is complex since this value has to be determined dynamically for each and every not out innings played by the batsman during his career. It requires tricky computer algorithms. Also note that I have used Runs per innings because we are considering only 10 innings and a couple of not out innings would distort the entire process. Why 10 innings instead of 10 Test matches? Well there have been times when a player played 3-5 Tests a year and it would have taken a few years to play 10 Tests. That is too long a period for a recent form connotation. In general, 10 innings is one long or two short series and would reflect the recent form quite accurately.
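One way the recent-form extension could be coded is sketched below in Python. It is not the author's program, and it rests on several assumptions the text does not settle: "last 10 innings" is read as the 10 innings before the not out innings in question, actual runs (not extended runs) are used for that window, a shorter window is used early in a career, and a not out in a batsman's very first innings gets no extension. The function name and data layout are hypothetical.

```python
def extended_average_recent_form(innings, window=10):
    """Extended Batting average using recent form for each not out innings.

    innings: chronological list of (runs, not_out) tuples for one batsman.
    window: number of preceding innings whose Runs per Innings is added to
    each not out innings (10 in the text).
    """
    if not innings:
        raise ValueError("no innings supplied")

    extended_runs = 0.0
    for i, (runs, not_out) in enumerate(innings):
        extended_runs += runs
        if not_out:
            # Assumption: recent form = plain RpI over the previous `window`
            # innings, using the runs actually scored in them.
            recent = [r for r, _ in innings[max(0, i - window):i]]
            if recent:
                extended_runs += sum(recent) / len(recent)

    # Every innings, extended or not, counts once in the denominator.
    return extended_runs / len(innings)


# Hypothetical career: 50, 10, 30*, 0. The 30* is extended by (50 + 10) / 2 = 30.
career = [(50, False), (10, False), (30, True), (0, False)]
print(extended_average_recent_form(career))  # 30.0
```

Because the extension is recomputed for every not out innings, a batsman in a purple patch gets a larger extension than one in a slump, which is the point of the change described above.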

Let us peruse the revised figures. The table below lists the same 34 batsmen ordered by EBAvge.

Batsman     Team Tests Inns   Runs   Avge OutAvge ExtRuns EBAvge  Chg %
Bradman D.G Aus     52   80   6996  99.94   83.83    7759  96.99   3.0%
Sutcliffe H Eng     54   84   4555  60.73   54.64    5024  59.81   1.5%
Pollock R.G Saf     23   41   2256  60.97   54.43    2394  58.39   4.2%
EdeC Weekes Win     48   81   4455  58.62   54.88    4654  57.46   2.0%
Hammond W.R Eng     85  140   7249  58.46   46.19    8018  57.27   2.0%
Headley G.A Win     22   40   2190  60.83   45.61    2275  56.88   6.5%
Barrington  Eng     82  131   6806  58.67   50.37    7410  56.56   3.6%
Hobbs J.B   Eng     61  102   5410  56.95   53.34    5645  55.34   2.8%
Hutton L    Eng     79  138   6971  56.67   47.89    7629  55.28   2.5%
Sangakkara  Slk    115  196  10045  55.81   47.56   10792  55.06   2.3%
Sobers      Win     93  160   8032  57.78   44.06    8768  54.80   5.2%
Kallis J.H  Saf    162  274  13128  56.10   42.23   14905  54.40   3.0%
Walcott C.L Win     44   74   3798  56.69   51.03    4001  54.07   4.6%
Tendulkar   Ind    194  320  15645  54.32   44.56   16888  52.77   2.8%
Lara B.C    Win    131  232  11953  52.89   49.76   12220  52.67   0.4%
Mohd Yousuf Pak     90  156   7530  52.29   46.19    8009  51.34   1.8%
Amla H.M    Saf     68  118   5610  51.94   39.92    6042  51.20   1.4%
Nourse A.D  Saf     34   62   2960  53.82   47.49    3167  51.08   5.1%
Clarke M.J  Aus     89  148   6989  52.55   42.23    7559  51.07   2.8%
Chappell    Aus     87  151   7110  53.86   44.57    7706  51.03   5.3%
Ponting R.T Aus    168  287  13378  51.85   45.15   14646  51.03   1.6%
Dravid R    Ind    164  286  13288  52.31   44.71   14505  50.72   3.1%
Hayden M.L  Aus    103  184   8626  50.74   47.68    9244  50.24   1.0%
Sehwag V    Ind    102  177   8559  50.05   47.96    8854  50.02   0.1%
Younis Khan Pak     80  140   6580  51.01   44.24    6966  49.76   2.5%
Miandad     Pak    124  189   8832  52.57   41.97    9310  49.26   6.3%
Chanderpaul Win    146  249  10696  51.67   34.49   12259  49.23   4.7%
Hussey      Aus     79  137   6235  51.53   42.50    6742  49.21   4.5%
Gavaskar    Ind    125  214  10122  51.12   44.14   10523  49.17   3.8%
Compton     Eng     78  131   5807  50.06   44.40    6302  48.11   3.9%
Richards    Win    121  182   8540  50.24   44.49    8753  48.09   4.3%
Waugh S.R   Aus    168  260  10927  51.06   35.47   12480  48.00   6.0%
Flower A    Zim     63  112   4794  51.55   35.43    5337  47.65   7.6%
Border A.R  Aus    156  265  11174  50.56   37.04   12397  46.78   7.5%

Bradman's EBA is 97% of his Batting average, a drop of 3%. Headley drops by 6.5% and Sobers by 5.2%. Most of the middle-order stalwarts have drops of 6% or more: Flower 7.6%, Border 7.5% and Steve Waugh 6.0%, with Chanderpaul at 4.7%. Sehwag has the lowest drop: only 0.1%, virtually no change. Similarly Lara drops by only 0.4%. Amongst these top batsmen not a single one has an EBA higher than his Batting average. That happens lower down the table. Mohsin Khan has the highest increase: 1.4%. The much-maligned Graeme Hick's EBA is 1.3% higher than his Batting average. Darren Ganga follows next with a 0.9% increase. A total of 11 batsmen have higher EBA values. Interested readers can study the Excel sheet for details. Saeed Anwar is the only batsman with more than 4000 runs under his belt and an EBA higher than his Batting average.

This method is more elegant and intuitive with complexity of calculations being the sole deterrent. However the concept is very good and any cricket follower can implement the fixed value concept easily. The fixed value can be anything from a slew of values. And we can say with certainty that every innings is represented in the numerator and denominator. We have addressed that problem effectively.
Let us revisit the figures of Kallis and Lara.

Batsman   Team    T   I No SNo No %  Runs  Avge  RpI  RpAI ExtRuns  EBA  %Avge
Kallis J.H Saf  162 274 40   5 14.6 13128 56.10 47.91 48.80 14905  54.40 97.0%
Lara B.C   Win  131 232  6   2  2.6 11953 52.89 51.52 51.97 12220  52.67 99.6%
Readers can see that Lara's average is just over 3 runs lower than that of Kallis. However his RpI and RpAI (the Runs per fulfilled Innings of the earlier table) are more than 3 runs higher. Significantly, the EBA, which is the more accurate and valid measure, is less than 2 runs below that of Kallis. EBA probably reflects the central tendency most accurately.

Now for a revised graph. The two alternatives are pictorially represented occupying the space in the middle.

Average and extended average v Runs per fulfilled Innings and Runs per Innings
 © Anantha Narayanan

This is not a theoretical exercise: two alternatives are presented to address a genuine problem. The special not outs method is simple and easy to implement. The Extended Batting average method is more complex and would require a computer to incorporate recent form. However, using the Average for dismissed innings, the Batting average, the RpI or the RpFI as the extension basis would be easier to implement. What is needed? Well, an influential organization such as ESPNcricinfo should study the suggestions and start implementing the revised averages, of course alongside the current measures.

To download/view the comprehensive Excel sheet containing the values for all the 264 batsmen who have crossed 2000 Test runs, please CLICK HERE.

Anantha Narayanan has written for ESPNcricinfo and CastrolCricket and worked with a number of companies on their cricket performance ratings-related systems


Keywords: Stats

© ESPN Sports Media Ltd.

Posted by vsssarma on (April 6, 2013, 5:59 GMT)

I have a simple way. Suppose in a series, 25 innings were played and 1,000 runs were scored. Take the average 1,000/25 = 40. Suppose in those 25 innings, 5 not outs were there. Then take the average again as 1,000/20 = 50. Take the ratio 50/40 = 1.25. Subtract 1. You get 0.25 which is equal to 25%. If a batsman is not out all 100% of the innings, add 25% to the runs he scored and calculate runs per match. If a batsman is not out 50% of the innings, add 50% of 25% i.e. 12.5% extra runs to the total runs scored and work out runs per match. Rate batsmen on 'runs per match' but not 'runs per innings'. After all, in a match runs count not runs per innings.

Posted by SouthPaw on (April 2, 2013, 10:36 GMT)

contd...

In this case, of course, the data points are few, however, over a large number of innings, the median will come closer to the mean in case of consistent performers.

For example, taking your first table, you can very easily see that Sehwag and Lara top the charts because they are "more consistent" (since their ave and RPI are closer). Also, the top batsmen, in terms of consistency and quality more or less remain the same barring Brian Lara who jumps into the top 10 and rightfully so. On the other hand, Steve Waugh goes down to the bottom of the list, again, rightly so.

Posted by SouthPaw on (April 2, 2013, 10:35 GMT)

Ananth, I have been playing with similar thoughts in my head for the last 10 years but never put it down on paper. While your post is very interesting, it is also very complex. On the other hand, I have a simple solution.

To understand the solution, let us try asking the very basis for an average. Isn't it to figure out how many runs can be expected from a player? In other words, say Tendulkar has an "average of 53" - it means that every time he walks in to bat, you can be assured (if not certain) of him contributing 53 runs give or take a few. So, the way the average should be measured is very simply by using the Arithmetic Mean of "Runs per inning"!

This would straightaway take care of outliers: someone scoring, say, 162, 10, 7*, 0* & 25* would have an average of about 40, instead of a crazy "average" of 102! (contd..)

Posted by waspsting on (March 24, 2013, 11:31 GMT)

The suggested alternatives for the stat 'average' are interesting and ideally, I'd like to see one of those, along with the traditional average, when looking at a player's stats.

If i had to see just one figure though, i'd go with 'average' as is customary - and feel it is the fairest measure (though not perfect for all the reasons Ananth notes)

not outs count for something. compare Bradman's 299* to Crowe's 299 - there's an aura of unconquered-God-knows-what-he-could-have-gone-on-to about the not in such a case. Note also Lara's 400*.

on a more concrete level, i think taking not outs into account as is done respects WHAT HAPPENED, whereas some of the suggested alternatives that dabble in SPECULATION - on principle, I prefer the former.

Unofficially and subjectively, we can make allowances - e.g. for openers in general or for particular batsmen who tended to attack more while batting with the tail, etc. but officially, I think average remains the most satisfactory measure of a batsman

Posted by   on (March 18, 2013, 2:44 GMT)

What is the average of an innings still being played? E.g. Phil Hughes in the 3rd Test at the time of this writing? You can't determine the innings... It's indeterminate. Similarly a "not out" from a statistical point of view is an "incomplete" innings - it overlaps multiple team innings. It's not mathematically absurd. It just means that the numerator and denominator are not available yet. (In IT parlance, data is still being downloaded..!).

RpI is a different measure and Ave is different. Trying to combine the 2 is an arbitrary exercise - instead of taking 50% of "Average for dismissed innings" into account, I can take 25% or 50% and arrive at different results.

Finally, even though cricket is a game of stats, and you can "group" players of similar caliber using stats, trying to find a single stat to get a perfect "ranking" is an exercise in futility, because when you combine different parameters, one can easily take slightly different routes and arrive at different rankings.

Posted by   on (March 15, 2013, 10:40 GMT)

If Lara's overall average was the same as his home average - 59 (or 54 for Sehwag) - no one would bother with the so-called not out problem.

Lara and Sehwag fans find it difficult to accept the facts : They were as good as anyone in home conditions. But ,with the exception of SL ( For both Lara and Sehwag. If any thing Sehwag has thrashed Murali even more), Lara was simply not as good as his contemporaries away .

Posted by   on (March 15, 2013, 10:35 GMT)

Here's a simple explanation :

If Lara' and Sehwag's flashy shotmaking created great innings on the odd day , the very same led to their getting dismissed more than other batsmen. This simple fact seems to evade attention. So, on the odd occasion when things click we do not attribute it to the way Lara and Sehwag play. But if they get out more often than their contemporaries we must find ways to artificially increase their average.

The correct explanation for Lara's average is that he simply wasn't as good away. Even completely ignoring N.Os - at home his RPI is 56, away it is 47. At home he was N.O 5 times in fewer innings than away. Again, the away games point to the reality.

Away Lara averages less than 50 in all Test playing nations except for Sri Lanka and Zimbabwe. Less than 40 in some. Lara's reputation is built largely on home performances with very few innings or series performances (Sri Lanka) of note compared to his home performances .

Comments have now been closed for this article

ABOUT THE AUTHOR

Anantha Narayanan
Anantha spent the first half of his four-decade working career with corporates like IBM, Shaw Wallace, NCR, Sime Darby and the Spinneys group in IT-related positions. In the second half, he has worked on cricket simulation, ratings, data mining, analysis and writing, amongst other things. He was the creator of the Wisden 100 lists, released in 2001. He has written for ESPNcricinfo and CastrolCricket, and worked extensively with Maruti Motors, Idea Cellular and Castrol on their performance ratings-related systems. He is an armchair connoisseur of most sports. His other passion is tennis, and he thinks Roger Federer is the greatest sportsman to have walked on earth.
