# Tackling not-outs, and answering reader queries

First let me explain the reasons for undertaking this whole exercise of extended batting averages:

Let me start by replacing the first para of my article with the following, just to put to bed the Tendulkar v Lara arguments. Consider the following two outstanding batsmen, among the best of their generation.

Batsman | Tests | Innings | Not-outs | Runs | Average |
---|---|---|---|---|---|
Viv Richards | 105 | 182 | 12 | 8540 | 50.24 |
Jacques Kallis | 111 | 189 | 31 | 9197 | 58.21 |

Richards’ average is nearly eight runs behind Kallis', but is he really that far behind? One of the main reasons for the difference is the wide disparity in not-outs between the two: 12 against 31. This may be partly because of the way Richards played, almost always in attacking mode. The two have similar Batting Position Index values (the Index being the average position at which a batsman has batted) of 4.16 for Richards and 3.77 for Kallis, indicating nearly the same batting positions. This analysis seeks a way to normalise such situations.
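The arithmetic behind the comparison can be sketched in a few lines. This is only an illustration: the function names are made up for this example, and runs per innings is shown alongside the average purely as a point of reference, not as part of the EBA method.

```python
def batting_average(runs: int, innings: int, not_outs: int) -> float:
    """Conventional batting average: runs divided by the number of dismissals."""
    dismissals = innings - not_outs
    if dismissals <= 0:
        raise ValueError("average is undefined without at least one dismissal")
    return runs / dismissals


def runs_per_innings(runs: int, innings: int) -> float:
    """Runs divided by all innings, whether or not the batsman was dismissed."""
    if innings <= 0:
        raise ValueError("need at least one innings")
    return runs / innings


# Career totals taken from the table above.
careers = {
    "Richards": {"runs": 8540, "innings": 182, "not_outs": 12},
    "Kallis": {"runs": 9197, "innings": 189, "not_outs": 31},
}

for name, c in careers.items():
    avg = batting_average(c["runs"], c["innings"], c["not_outs"])
    rpi = runs_per_innings(c["runs"], c["innings"])
    print(f"{name}: average {avg:.2f}, runs per innings {rpi:.2f}")
```

On these totals the gap of nearly eight runs in the average shrinks to under two runs when every innings is counted, which shows how much of the difference comes from the not-outs.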

Now to respond to some of the comments that came in:

The 1500-run cut-off wasn’t meant to exclude Vinod Kambli, as someone suggested (Kambli is incidentally one of my favourite players). It was determined that the overall runs per Test for a top-order batsman was around 75, so 1500 runs meant that one would have played about 20 Tests, which is a fair number of games. It also allowed me to include Hussey, which ensured further discussion on this phenomenal cricketer. The top 25 batsmen were selected so as to include Lara and Pietersen, who were two of the five batsmen whose EBA was greater than their batting average.

The choice of the average of the last ten innings could be construed as arbitrary. Come to think of it, five innings would have seemed too few, while 20 might have seemed too many. Ten innings represents about seven Tests, which in turn is a minimum of two Test series.

Chris pointed out that the first table should have been ordered by batting average rather than by EBA. A valid point, and I apologise for overlooking its significance. I had split the EBA-ordered wide table into two smaller ones and should have re-ordered the first of them.

A number of people have commented that this exercise was not needed since the final EBA table is more or less the same as the batting average table. My argument is that the result does not invalidate the analysis process.

**The question of not-outs**

The extension of not-out innings has attracted the most comments, and rightly so. The approach I have taken can be construed as arbitrary. However, it must be remembered that what has been done is neither a statistical extension nor a simulation-based computation. It is a fourth-dimension prediction and should be taken as it is. I can only repeat that the EBA should be taken to complement the current and much better understood batting average. The EBA can never be a substitute for the batting average, since the common man can neither compute it on his own nor understand it easily.

When the concept was first created, the batting average was added to the not-out innings. It was only when I reworked the same concept for this blog that I changed it slightly to include current form.
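For readers who want to experiment, here is one possible reading of the not-out extension, pieced together from remarks in this article: a not-out score is extended by the batsman's recent form (the average of his previous ten innings) and the extended score is not allowed to exceed his highest score. The function name, the use of plain runs per innings as "form", the use of the career-highest score as the cap, and the division by all innings are assumptions made for this sketch; it is not the exact published method.

```python
from typing import List, Tuple

Innings = Tuple[int, bool]  # (runs scored, True if not out)


def extended_batting_average(career: List[Innings], window: int = 10) -> float:
    """Sketch of an extended average: not-out scores are topped up by recent form.

    Assumptions (not confirmed by the article): form is the mean of the
    previous `window` scores, the cap is the career-highest score, and the
    total of the extended scores is divided by all innings.
    """
    if not career:
        raise ValueError("need at least one innings")
    highest = max(runs for runs, _ in career)
    total = 0.0
    for i, (runs, not_out) in enumerate(career):
        score = float(runs)
        if not_out:
            recent = [r for r, _ in career[max(0, i - window):i]]
            if recent:
                form = sum(recent) / len(recent)
                # Extend the unfinished innings, but never beyond the highest score.
                score = min(runs + form, highest)
        total += score
    return total / len(career)


# Hypothetical four-innings career, for illustration only.
sample = [(50, False), (30, True), (100, False), (20, True)]
print(extended_batting_average(sample))  # 77.5
```

In the sample, the 30 not out becomes 80 (form of 50) and the 20 not out also becomes 80 (form of 60), giving 310 extended runs over four innings.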

Some of the responses to the not-out issue have been interesting. Stuart says:

A batting average measures the number of runs between dismissals. If you get 20* and 27, that is equivalent to a single innings of 47 for your batting average. It also means you cobbled together 47 runs before you got out, whether it was over two innings or one. As it stands, interpreted correctly, a batting average is a perfect measure and needs no adjustments or fiddling.
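Stuart's reading can be checked with a short sketch that groups runs into spells between dismissals. The function name and the sample innings are hypothetical, chosen only to illustrate the point.

```python
from typing import List, Tuple


def runs_between_dismissals(innings: List[Tuple[int, bool]]) -> Tuple[List[int], int]:
    """Group scores into spells that each end with a dismissal.

    Each innings is (runs, not_out). Returns the completed spells and any
    runs still "carried" from trailing not-out innings.
    """
    spells: List[int] = []
    running = 0
    for runs, not_out in innings:
        running += runs
        if not not_out:
            spells.append(running)
            running = 0
    return spells, running


# Stuart's example: 20 not out followed by 27.
print(runs_between_dismissals([(20, True), (27, False)]))  # ([47], 0)

# A longer, hypothetical sequence.
sample = [(20, True), (27, False), (5, False), (60, True), (12, False)]
spells, carried = runs_between_dismissals(sample)
average = (sum(spells) + carried) / len(spells)
print(spells, round(average, 2))  # [47, 5, 72] 41.33
```

The mean of the spells is the same figure as the conventional average: total runs divided by dismissals.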

That’s a fine analysis, and we could take this as an additional measure.

One of the best alternatives, and quite simple to implement also, was provided by Arvind Agarwal. It is given below.

EBA = Batting Average x (1 - (Not Out Inngs / Total Inngs) ^ 2). The computed values are:

Lara = 52.80 (0.998 x Average)

Sachin = 53.82 (0.980 x Average)

Bradman = 97.93 (0.980 x Average)

Ponting = 58.08 (0.977 x Average)

M Hussey = 82.04 (0.945 x Average)

My gut feel is that Arvind's computations match mine almost completely, without getting into any of the not-out extension complications, and are very easy to compute. Again, this has to be taken as an additional measure rather than a replacement for the batting average.
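A minimal sketch of Arvind's formula follows, assuming it is read with the squared not-out ratio inside the bracket and total innings as the denominator. The function name is made up for this example, and the inputs are the Richards and Kallis totals from the first table; the sketch does not attempt to reproduce the five values listed above, whose inputs are not given.

```python
def agarwal_eba(runs: int, innings: int, not_outs: int) -> float:
    """Batting average scaled down by the squared share of not-out innings."""
    dismissals = innings - not_outs
    if innings <= 0 or dismissals <= 0:
        raise ValueError("need at least one innings and one dismissal")
    average = runs / dismissals
    return average * (1 - (not_outs / innings) ** 2)


for name, runs, innings, not_outs in (
    ("Richards", 8540, 182, 12),
    ("Kallis", 9197, 189, 31),
):
    print(f"{name}: {agarwal_eba(runs, innings, not_outs):.2f}")
```

Because the ratio is squared, a batsman with few not-outs is barely touched, while one with many not-outs loses a noticeably larger slice of his average.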

There have been suggestions to take into account the match conditions, bowling attack etc., but that would be too complicated an exercise for this simple task. Similarly, the idea of using weighted averages instead of the average of the last ten innings is a good one, but it makes the process more difficult and the results harder to comprehend for readers who are not statistically inclined.

Glossus has suggested considering only those innings in which the batsman was dismissed, and ignoring the not-out innings. The table below has the results for this exercise.

Batsman | Tests | Career average | Out batting average | Extended batting average |
---|---|---|---|---|
Don Bradman | 52 | 99.94 | 83.83 | 97.81 |
Michael Hussey | 18 | 86.18 | 69.05 | 81.34 |
George Headley | 22 | 60.83 | 45.61 | 61.33 |
Herbert Sutcliffe | 54 | 60.73 | 54.64 | 60.54 |
Graeme Pollock | 23 | 60.97 | 54.43 | 59.68 |
Everton Weekes | 48 | 58.62 | 54.88 | 58.53 |
Ricky Ponting | 112 | 59.40 | 49.46 | 58.52 |
Wally Hammond | 85 | 58.46 | 46.19 | 58.43 |
Garry Sobers | 93 | 57.78 | 44.06 | 58.16 |
Ken Barrington | 82 | 58.67 | 50.37 | 58.11 |
Eddie Paynter | 20 | 59.23 | 48.31 | 57.71 |
Jack Hobbs | 61 | 56.95 | 53.34 | 56.52 |
Jacques Kallis | 111 | 58.21 | 42.42 | 56.43 |
Len Hutton | 79 | 56.67 | 47.89 | 56.41 |
Kumar Sangakkara | 68 | 55.74 | 46.16 | 56.26 |
Clyde Walcott | 44 | 56.69 | 51.03 | 56.14 |
Rahul Dravid | 113 | 56.26 | 47.60 | 55.54 |
Mohammad Yousuf | 77 | 55.72 | 48.84 | 55.28 |
Sachin Tendulkar | 141 | 54.94 | 44.33 | 53.90 |
Dudley Nourse | 34 | 53.82 | 47.49 | 53.40 |
Brian Lara | 131 | 52.89 | 49.76 | 52.97 |
Kevin Pietersen | 30 | 52.69 | 50.44 | 52.84 |
Greg Chappell | 87 | 53.86 | 44.57 | 52.79 |
Matthew Hayden | 91 | 52.57 | 49.19 | 52.50 |
Javed Miandad | 124 | 52.57 | 41.97 | 51.62 |
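The "out batting average" column can be illustrated with a short sketch, assuming it is simply the mean of the scores in innings that ended in a dismissal. The function name and the sample innings are hypothetical; the table values themselves come from full career records that are not reproduced here.

```python
from typing import List, Tuple


def out_batting_average(innings: List[Tuple[int, bool]]) -> float:
    """Mean score over dismissed innings only; not-out innings are ignored."""
    dismissed = [runs for runs, not_out in innings if not not_out]
    if not dismissed:
        raise ValueError("need at least one dismissed innings")
    return sum(dismissed) / len(dismissed)


# Hypothetical innings as (runs, not_out).
sample = [(20, True), (27, False), (5, False), (60, True), (12, False)]
conventional = sum(runs for runs, _ in sample) / sum(1 for _, no in sample if not no)
print(round(out_batting_average(sample), 2), round(conventional, 2))  # 14.67 41.33
```

Dropping the not-out innings removes their runs from the numerator while leaving the number of dismissals unchanged, which is why this figure sits below the career average for every batsman in the table.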

Charles Davis, in his blog, has commented on this computation. Some of the answers to Charles can be found elsewhere in this article. Our first basis was the career average, and that would probably have been more apt. However, I must point out to Charles that the "not exceeding the highest score" restriction was applied only to prevent extremely high scores, especially when batsmen (like Sangakkara/Yousuf/Kallis) are going through an outstanding run of form. That restriction may not be needed if the career average is used. I must also point out that the standard deviation differential between the career average and the last 10 innings, according to Charles himself, is less than 10%. Charles, many thanks for your comments.

Anantha Narayanan has written for ESPNcricinfo and CastrolCricket and worked with a number of companies on their cricket performance ratings-related systems

Comments have now been closed for this article

I don't think EBA can be more than the normal batting average, as shown for Sir Garry Sobers and the other following batsmen: Garry Sobers 93 57.78 44.06 58.16; George Headley 22 60.83 45.61 61.33; Brian Lara 131 52.89 49.76 52.97; Kevin Pietersen 30 52.69 50.44 52.84; Kumar Sangakkara 68 55.74 46.16 56.26.

Kindly rectify.

There is nothing in the calculation methodology to conclude that the EBA cannot be greater than the normal Average. In fact it can be seen that I have referred to these 5 batsmen in my article.

Ananth

The problem with not outs is that the batsman's mission is to score as many runs as possible without getting out. So, if a batsman scores 70 runs and isn't out by the end of the innings, it should be worth more than a batsman who scores 70 and is out. Normally the rationale for not deducting the innings for averages purposes is "who knows how many more runs he would have scored". I would argue that a better system would value each run in a not-out score as 1.5 times that of a completed score (or any other multiplier). So, that 70 not out may become 105 for statistical purposes. This would give greater weight to the accomplishment of both batting tasks - scoring while protecting your wicket.
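A rough sketch of this multiplier idea is given below. The comment does not say what the weighted runs should be divided by, so dividing by all innings is an assumption, as are the function name and the sample data.

```python
from typing import List, Tuple


def weighted_runs_per_innings(
    innings: List[Tuple[int, bool]], multiplier: float = 1.5
) -> float:
    """Runs per innings, with runs in not-out innings scaled by `multiplier`."""
    if not innings:
        raise ValueError("need at least one innings")
    total = sum(runs * multiplier if not_out else runs for runs, not_out in innings)
    return total / len(innings)


# 70 not out counts as 105; 70 and out stays at 70.
print(weighted_runs_per_innings([(70, True), (70, False)]))  # 87.5
```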

Hmm, good point Malcolm. I wonder if anyone out there would care to perform a statistical analysis of the top 25 players' MODE, to determine their most likely score on walking out to the middle? It may raise a few eyebrows, not to mention the ire of millions!!

The runs per innings is also deceptive. If you are a number 5 or 6 batsman coming in after four really good batsmen, you could realistically be called on to get small scores or have a large number of your innings cut off by declarations. You would then end up with a low average runs per innings. It would not be a true reflection of your talent, which is what the average is supposed to be. Obviously there are more sophisticated statistical techniques that could be used to analyse the performance of a player, but the average, strike rate and conversion rate that you get are an excellent indication of the quality of the player, remembering, of course, that the accuracy of any statistic increases as the number of observations increases.

Precisely my point, Malcolm. In terms of expectation from an innings when a batsman walks in, you should be expecting a modal value, which for every batsman, whether a Lara or a Harmison, is less than 20.

Batting averages are higher or lower depending on 3 factors - a) The shape of the distribution - while the U shape holds in general, for the better batsmen the % of cases in the 10s, 20s and 30s tends to be higher, b) The really high scores, and I find this is a huge influencer - the really great batsmen tend to run up very high scores, which drives their average, and c) the proportion of not-out innings.

Of the three, only a) the shape of the distribution really influences the modal value. My point being that if you compare someone with an average of 40 with someone with a 55, your start expectations may be significantly different; but between 55 and 57, your start expectations should be no different.

An average is also the sum of observations multiplied by the probability of that observation occurring, i.e. the sum of all (x*P(x)). So while the probability of scoring on or around the average is probably very low, the average does take into consideration that you might, as the fielding team, spend a few days watching a Lara compile a 400-run innings. To determine the plausibility of the statistic, you should ask yourself: if you were fielding captain and Lara walked in, would you be expecting the mode (the value with the highest probability) or the average?
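The difference between the two expectations can be seen in a small sketch. The scores are invented for illustration and not-outs are ignored, so the mean here is simply the expectation of a single score.

```python
from collections import Counter
from typing import List


def expected_score(scores: List[int]) -> float:
    """Mean computed as the sum of x * P(x) over the observed scores."""
    counts = Counter(scores)
    n = len(scores)
    return sum(x * (count / n) for x, count in counts.items())


def modal_score(scores: List[int]) -> int:
    """The most frequently observed score."""
    return Counter(scores).most_common(1)[0][0]


# Hypothetical run of scores: several failures and one very large innings.
scores = [0, 0, 0, 4, 12, 25, 60, 150]
print(expected_score(scores), modal_score(scores))  # 31.375 0
```

The single big innings pulls the mean well above anything the batsman usually makes, while the mode stays at the most common outcome.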

The average should suggest to the spectator how many runs a particular player is likely to make before he leaves the field. After all, having made 25 not out twice is less likely to help his team win than a battling fifty, followed by a 0 not out. Both players would have an average of fifty, but everyone would know that once player A got into the twenties, he would be on shaky, unfamiliar ground. I think therefore that Runs Per Innings is fairer to spectators than the deceitful Average that is currently employed.

In many distributions the average as a measure of central tendency not only provides the 'average value', i.e. the area under the curve divided by the number of instances, but is also a good predictor of the most likely value.

Batting averages, though, mean nothing of the sort. A batsman's scores invariably have an inverted bell (or U-shaped) distribution - the greatest number of outs is in single digits, followed by a large number before the batsman crosses twenty. Then you get a few instances around the actual 'batting average', and again the number of instances goes up towards the higher scores. This holds for every batsman, and Lara's and Dravid's distributions are not as dissimilar as you would expect. So the least probability is actually around the batting average. If Ponting has an 'average' of 59, he is actually least likely to score between 50 and 60. So I am not really sure what the average indicates for a given innings. At best it can be used the way we use it - comparing greats across time...

Too complicated for me; I prefer runs/innings. A nice and simple average, and what really matters in the match.

Not whether he got out or not, but how many he made.

Whether a team gets 300/0 or 300 all out in an ODI does not really matter in the game; when trying to restrict the chase, it's just 300 runs.

I was just wondering what the adjusted stats for Sangakkara the batsman would look like using this method after going through this week's Ask Steven column. He seems to have scored over 2000 runs at a 90+ average. From my calculations, the average drops to around 71 from 96 (including the latest test vs Eng)

An off-topic question: you mentioned players' Batting Position Index. Are there any weighted versions of this stat? It is clear that a player who has played 100 tests at 1 and 100 tests at 11 (batting position index of 6) will not have a similar result as a player who has played 200 times at 6. Do you know of a version that weights this such that the first player would have a position index much closer to 11?

Not sure if this is the correct forum for this 'theory', but why do bowlers' runs conceded also have to include fielders' mistakes (such as dropped catches, misfields and overthrows)? Why can't something like they have in baseball be instituted? (Whenever I bowled I used to loathe others causing runs to be added to my tally, and by the reactions of the internationals I think they are of a similar opinion.)

As in my (unpublished) comments on the previous article, I hope these are only a warm-up for you guys. I had mentioned two interesting studies you guys could take up. Here is a third -

What is an indicator for the WORTH of a batsman? Where worth is defined as his contribution to a victory or a battling draw.
