Trivia - batting December 3, 2007

Tackling not-outs, and answering reader queries

Readers have written in with replies and suggestions on batting averages and the extended version.

First let me explain the reasons for undertaking this whole exercise of extended batting averages:

  • The purpose was not to replace the conventional batting average. It was a suggestion to complement the batting average.
  • It was not a Tendulkar v Lara article. Their figures were just used for comparison.

    Let me start by replacing the first para of my article with the following, just to put to bed the Tendulkar v Lara arguments. Consider the following two outstanding batsmen, among the best of their generation.

    Richards and Kallis in Tests
    Batsman         Tests  Innings  Not-outs  Runs  Average
    Viv Richards      105      182        12  8540    50.24
    Jacques Kallis    111      189        31  9197    58.21

    Richards' average is nearly eight behind Kallis', but is he that far behind? One of the main reasons for the difference in average is the wide disparity in not-outs between the two, 12 against 31. It might be partly because of the way Richards played, almost always in an attacking mode. Richards and Kallis have similar Batting Position Index values (the average position at which a batsman has batted) of 4.16 and 3.77 respectively, indicating that they batted in much the same part of the order. This analysis seeks a way to normalise such situations.
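To see how much of the gap comes from not-outs, the figures in the table can be recomputed in a few lines of Python. This is only a sketch: the function names are illustrative, and it assumes the usual definition of the batting average (runs divided by completed innings) alongside a plain runs-per-innings figure.

```python
def batting_average(runs, innings, not_outs):
    """Conventional batting average: runs per dismissal."""
    dismissals = innings - not_outs
    if dismissals <= 0:
        raise ValueError("average is undefined without at least one dismissal")
    return runs / dismissals


def runs_per_innings(runs, innings):
    """Runs per innings, which ignores not-outs altogether."""
    return runs / innings


# Career figures (runs, innings, not-outs) taken from the table above.
players = {
    "Viv Richards": (8540, 182, 12),
    "Jacques Kallis": (9197, 189, 31),
}

for name, (runs, innings, not_outs) in players.items():
    avg = batting_average(runs, innings, not_outs)
    rpi = runs_per_innings(runs, innings)
    print(f"{name}: average {avg:.2f}, runs per innings {rpi:.2f}")
```

On these figures the gap of nearly eight runs in the average shrinks to under two runs per innings, which is the kind of disparity the analysis tries to normalise.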

    Now to respond to some of the comments that came in:

    The 1500 runs cut-off wasn't meant to exclude Vinod Kambli, as someone suggested (Kambli is incidentally one of my favourite players). The overall runs per Test for a top-order batsman works out to around 75, so 1500 runs means that a batsman would have played about 20 Tests, which is a fair number of games. It also allowed me to include Hussey, which ensured further discussion on this phenomenal cricketer. The cut-off at the top 25 batsmen was likewise chosen so that Lara and Pietersen could be included; they were two of the five batsmen whose EBA was greater than their batting average.

    Using the average of the last ten innings could be construed as an arbitrary decision. Come to think of it, if I had taken five innings, it would have seemed too few, while 20 might have seemed too many. Ten innings represents about seven Tests, which in turn is a minimum of two Test series.

    Chris made a valid point about the first table: it should have been ordered by batting average rather than by EBA. I apologise for overlooking this. I had split the EBA-ordered wide table into two smaller ones and should have re-ordered it.

    A number of people have commented that this exercise was not needed since the final EBA table is more or less the same as the batting average table. My argument is that the result does not invalidate the analysis process.

    The question of not-outs

    The extension of not-out innings has attracted the most comments, and rightly so. The approach I have taken can be construed as arbitrary. However, it must be remembered that what has been done is neither a statistical extension nor a simulation-based computation. It is a fourth-dimension prediction and should be taken as it is. I can only repeat that the EBA should be taken to complement the current, and much better understood, batting average. The EBA can never be a substitute for the batting average, since the common man can neither compute it on his own nor understand it easily.

    When the concept was first created, the batting average was added to the not-out innings. It was only when I reworked the concept for this blog that I changed it slightly to include current form.
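A rough Python sketch of one possible reading of this extension follows. It is not the computation used for the tables. It assumes that each not-out score is extended by the mean of the batsman's previous ten innings (fewer if he has not yet played ten), that the extended score is capped at his career highest score (the restriction mentioned in the reply to Charles Davis further down), and that the total is then divided by all innings. The function name and the sample scores are hypothetical.

```python
def extended_average(scores, window=10):
    """Extended average under the assumptions stated above.

    scores: chronological list of (runs, was_dismissed) tuples.
    """
    if not scores:
        raise ValueError("no innings supplied")
    highest = max(runs for runs, _ in scores)  # cap for extended scores
    total = 0.0
    for i, (runs, was_dismissed) in enumerate(scores):
        if was_dismissed:
            total += runs
            continue
        # Current form: mean of up to `window` innings before this one.
        recent = [r for r, _ in scores[max(0, i - window):i]]
        form = sum(recent) / len(recent) if recent else 0.0
        # Extend the not-out score, but never beyond the highest score.
        total += min(runs + form, highest)
    # Every innings now counts as completed.
    return total / len(scores)


# Hypothetical innings: 30, 50, 20 not out, 100.
sample = [(30, True), (50, True), (20, False), (100, True)]
print(extended_average(sample))  # 20* is extended by (30 + 50) / 2 = 40
```

With these made-up scores the conventional average is 200 / 3, about 66.67, while the extended figure is 240 / 4 = 60.0.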

    Some of the responses to the not-out issue have been interesting. Stuart says:

    A batting average measures the number of runs between dismissals. If you get 20* and 27, that is equivalent to a single innings of 47 for your batting average. It also means you cobbled together 47 runs before you got out, whether it was over two innings or one. As it stands, interpreted correctly, a batting average is a perfect measure and needs no adjustments or fiddling.

    That’s a fine analysis, and we could take this as an additional measure.
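Stuart's equivalence is easy to check. The sketch below (the function name is illustrative, not from any library) computes the conventional average from a list of scores and shows that 20* followed by 27 gives the same figure as a single completed innings of 47.

```python
def average_from_scores(scores):
    """Conventional average from (runs, was_dismissed) tuples."""
    total_runs = sum(runs for runs, _ in scores)
    dismissals = sum(1 for _, was_dismissed in scores if was_dismissed)
    if dismissals == 0:
        raise ValueError("average is undefined without a dismissal")
    return total_runs / dismissals


two_innings = [(20, False), (27, True)]  # 20 not out, then out for 27
one_innings = [(47, True)]               # a single innings of 47

print(average_from_scores(two_innings))  # 47.0
print(average_from_scores(one_innings))  # 47.0
```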

    One of the best alternatives, and quite simple to implement also, was provided by Arvind Agarwal. It is given below.

    EBA = Batting Average x (1 - (Not Out Inngs / Total Inngs) ^ 2)

    The computed values are:

    Lara = 52.80 (0.998 x Average)
    Sachin = 53.82 (0.980 x Average)
    Bradman = 97.93 (0.980 x Average)
    Ponting = 58.08 (0.977 x Average)
    M Hussey = 82.04 (0.945 x Average)
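Arvind's adjustment can be tried on the Richards and Kallis figures quoted at the start of the article. The Python sketch below reads the formula as average x (1 - (not-outs / innings)^2); where the squaring applies is my assumption, chosen because it keeps the factor close to 1, as in the values listed. The function name is illustrative.

```python
def arvind_adjusted_average(runs, innings, not_outs):
    """Return (adjusted average, scaling factor) under the assumed reading."""
    average = runs / (innings - not_outs)
    # Assumed reading: the not-out ratio is squared, then subtracted from 1.
    factor = 1 - (not_outs / innings) ** 2
    return average * factor, factor


for name, runs, innings, not_outs in [
    ("Viv Richards", 8540, 182, 12),
    ("Jacques Kallis", 9197, 189, 31),
]:
    adjusted, factor = arvind_adjusted_average(runs, innings, not_outs)
    print(f"{name}: {adjusted:.2f} ({factor:.3f} x Average)")
```

For these two players the factor comes to roughly 0.996 and 0.973, so the batsman with more not-outs is scaled down further.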

    My gut feel is that Arvind's computations match mine almost completely, without getting into any of the not-out extension complications, and they are very easy to compute. Again, this has to be taken as an additional measure rather than a replacement for the batting average.

    There have been suggestions to take into account the match conditions, bowling attack etc., but that would be too complicated an exercise for this simple task. Similarly, the idea of using weighted averages instead of the average of the last ten innings is a good one, but it makes the process more involved and the results harder to comprehend for readers who are not statistically oriented.

    Glossus has suggested considering only those innings in which the batsman was dismissed, and ignoring the not-out innings. The table below has the results for this exercise.

    Out batting average, and extended batting averages
    Batsman            Tests  Career average  Out batting average  Extended batting average
    Don Bradman           52           99.94                83.83                     97.81
    Michael Hussey        18           86.18                69.05                     81.34
    George Headley        22           60.83                45.61                     61.33
    Herbert Sutcliffe     54           60.73                54.64                     60.54
    Graeme Pollock        23           60.97                54.43                     59.68
    Everton Weekes        48           58.62                54.88                     58.53
    Ricky Ponting        112           59.40                49.46                     58.52
    Wally Hammond         85           58.46                46.19                     58.43
    Garry Sobers          93           57.78                44.06                     58.16
    Ken Barrington        82           58.67                50.37                     58.11
    Eddie Paynter         20           59.23                48.31                     57.71
    Jack Hobbs            61           56.95                53.34                     56.52
    Jacques Kallis       111           58.21                42.42                     56.43
    Len Hutton            79           56.67                47.89                     56.41
    Kumar Sangakkara      68           55.74                46.16                     56.26
    Clyde Walcott         44           56.69                51.03                     56.14
    Rahul Dravid         113           56.26                47.60                     55.54
    Mohammad Yousuf       77           55.72                48.84                     55.28
    Sachin Tendulkar     141           54.94                44.33                     53.90
    Dudley Nourse         34           53.82                47.49                     53.40
    Brian Lara           131           52.89                49.76                     52.97
    Kevin Pietersen       30           52.69                50.44                     52.84
    Greg Chappell         87           53.86                44.57                     52.79
    Matthew Hayden        91           52.57                49.19                     52.50
    Javed Miandad        124           52.57                41.97                     51.62
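As I understand Glossus's suggestion, the "out batting average" simply drops every not-out innings, runs included, and averages what is left. A minimal Python sketch with made-up scores (the function name and the data are hypothetical):

```python
def out_batting_average(scores):
    """Average over dismissed innings only; not-out innings are ignored."""
    dismissed = [runs for runs, was_dismissed in scores if was_dismissed]
    if not dismissed:
        raise ValueError("no dismissed innings to average")
    return sum(dismissed) / len(dismissed)


# Hypothetical innings: 100 not out, 10, 50, 40 not out, 0.
sample = [(100, False), (10, True), (50, True), (40, False), (0, True)]

conventional = sum(runs for runs, _ in sample) / 3  # three dismissals
print(f"conventional {conventional:.2f}, out-only {out_batting_average(sample):.2f}")
```

Because the runs made in not-out innings are discarded rather than carried over, this measure will usually sit below the career average, as it does for every batsman in the table.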

    Charles Davis has commented on this computation in his blog, and some of the answers to Charles can be found elsewhere in this article. Our first basis was the career average, which would probably have been more apt. I must point out to Charles that the "not exceeding the highest score" restriction was introduced only to prevent extremely high extended scores, especially when batsmen (like Sangakkara, Yousuf and Kallis) are going through an outstanding run of form. That restriction may not be needed if the career average is used. I should also note that the standard deviation differential between the career average and the last ten innings, according to Charles himself, is less than 10%. Charles, many thanks for your comments.

    Anantha Narayanan has written for ESPNcricinfo and CastrolCricket and worked with a number of companies on their cricket performance ratings-related systems

  • Comments have now been closed for this article

    • testli5504537 on March 10, 2008, 5:10 GMT

      I don't think EBA can be more than the normal batting average, as shown for Sir Garry Sobers and the following batsmen:

      Garry Sobers 93 57.78 44.06 58.16
      George Headley 22 60.83 45.61 61.33
      Brian Lara 131 52.89 49.76 52.97
      Kevin Pietersen 30 52.69 50.44 52.84
      Kumar Sangakkara 68 55.74 46.16 56.26

      Kindly rectify.

      There is nothing in the calculation methodology to conclude that the EBA cannot be greater than the normal Average. In fact it can be seen that I have referred to these 5 batsmen in my article.


    • testli5504537 on January 14, 2008, 13:06 GMT

      The problem with not-outs is that the batsman's mission is to score as many runs as possible without getting out. So, if a batsman scored 70 runs and isn't out by the end of the innings, it should be worth more than a batsman who scored 70 and is out. Normally the rationale for not counting the innings for averaging purposes is "who knows how many more runs he would have scored". I would argue that a better system would value each run in a not-out score at 1.5 times that of a completed score (or any other multiplier). So that 70 not out may become 105 for statistical purposes. This would give greater weight to the accomplishment of both batting tasks - scoring while protecting your wicket.

    • testli5504537 on December 19, 2007, 20:54 GMT

      Hmm, good point Malcolm. I wonder if anyone out there would care to perform a statistical analysis of the top 25 players' MODE, to determine their most likely score on walking out to the middle? It may raise a few eyebrows, not to mention the ire of millions!!

    • testli5504537 on December 19, 2007, 10:59 GMT

      The runs per innings is also deceptive. If you are a number 5 or 6 batsman coming in after four really good batsmen, you could realistically be called on to get small scores or have a large number of your innings cut off by declarations. You would then end up with a low average runs per innings. It would not be a true reflection of your talent, which is what the average is supposed to be. Obviously there are more sophisticated statistical techniques that could be used to analyse the performance of a player, but the average, strike rate and conversion rate that you get are an excellent indication of the quality of the player, remembering, of course, that the accuracy of any statistic increases as the number of observations increases.

    • testli5504537 on December 19, 2007, 9:19 GMT

      Precisely my point, Malcolm. In terms of expectation from an innings when a batsman walks in, you should be expecting a modal value, which in all batsmen's case whether a Lara or a Harmison is less than 20.

      Batting averages are higher or lower depending on three factors: a) the shape of the distribution - while the U shape holds in general, for the better batsmen the % of cases in the 10s, 20s and 30s tends to be higher; b) the really high scores, and I find this is a huge influencer - the really great batsmen tend to run up very high scores, which drives their average; and c) the proportion of not-out innings.

      Of the three, only a) the shape of the distribution really influences the modal value. My point being that if you compare someone with an average of 40 with someone with a 55, your start expectations may be significantly different; but between 55 and 57, your start expectations should be no different.

    • testli5504537 on December 19, 2007, 7:05 GMT

      An average is also the sum of observations multiplied by the probability of that observation occurring, i.e. the sum of all (x*P(x)). So while the probability of scoring on or around the average is probably very low, the average does take into consideration that you might, as the fielding team, spend a few days watching a Lara compile a 400-run innings. To determine the plausibility of the statistic, you should ask yourself: if you were fielding captain and Lara walked in, would you be expecting the mode (the value with the highest probability) or the average?

    • testli5504537 on December 18, 2007, 21:52 GMT

      The average should suggest to the spectator how many runs a particular player is likely to make before he leaves the field. After all, having made 25 not out twice is less likely to help his team win than a battling fifty followed by a 0 not out. Both players would have an average of fifty, but everyone would know that once player A got into the twenties, he would be on shaky, unfamiliar ground. I think therefore that Runs Per Innings is fairer to spectators than the deceitful Average that is currently employed.

    • testli5504537 on December 18, 2007, 16:58 GMT

      In many distributions the average as a measure of central tendency not only provides the 'average value', i.e. the area under the curve divided by the no. of instances, but is also a good predictor of the most likely value.

      Batting averages though mean nothing of the sort. Batsmen's scores invariably have an inverted bell (or U-shaped) distribution - the greatest # of outs is in single digits, followed by a large number before the batsman crosses twenty. Then you get a few instances around the actual 'batting average' and again the # of instances goes up towards the higher scores. This holds for every batsman, and Lara's and Dravid's distributions are not as dissimilar as you would expect. So the least probability is actually around the batting average. So if Ponting has an 'average' of 59, he is actually least likely to score b/w 50 - 60. So I am not really sure what the average indicates for a given innings. At best it can be used the way we use it - comparing greats across time...

    • testli5504537 on December 18, 2007, 13:02 GMT

      Too complicated for me; I prefer runs/innings. A nice and simple average, and what really matters in the match.

      Not whether he got out, but how many he made.

      Whether a team gets 300/0 or 300 all out in an ODI does not really matter in the game; when trying to restrict the chase it's just 300 runs.
