June 16, 2010

Achieving the right consistency - I

A comprehensive statistical analysis of consistency for Test batsmen

 Mark Richardson wasn't the most attractive batsman, but with him you knew, more than with any other player, what you were going to get © Getty Images

My first few analyses for It Figures are all going to be broadly about the same thing, and that thing could broadly be called consistency. I'll bet that, at some time or other, everyone reading this post has criticised a cricketer for being inconsistent. I've done it myself but, whenever I have, I've had a nagging doubt: is performing brilliantly in one match and terribly in the next really any worse (or better) than being moderately good in two games on the trot? Maybe some stats can help us to unpick this issue.

I'm going to start by looking at batsmen. More specifically, my focus, in this first post, is batsmen's innings-to-innings consistency. If Batsman A has scores of 0, 138, 11, 0, & 101, and Batsman B has scores of 52, 50, 45, 48, & 55, then they both have the same average (50.00). However, there's a very obvious difference between the ways in which they've achieved the mark that we won't appreciate, if we concentrate on the average alone.

There are two big questions here, for me: (i) is it possible and instructive to identify batsmen with more or less consistent careers, and to quantify how much variability their records show? and (ii) does it matter? Is there any way in which a run of scores like Batsman A's is demonstrably better or worse - for himself and/or his team - than that of Batsman B?

Mister Hugely Reliable

S Rajesh comes close to answering the first of my questions in this It Figures avant la lettre column from 2006. He proposed a consistency index that is derived by dividing a batsman's average by the standard deviation (SD) of runs scored in each of his innings. I think he's on exactly the right lines, here, but I think the index can be improved in two ways. Firstly, I'm twitchy about combining one measure - the batting average - that makes an adjustment for not-out innings with another - the SD of the same dataset - that does not. For this reason, I'd rather rely on simple runs-per-innings (RPI), in this context. This way, both halves of the sum are quantifying the same thing and, although both may be affected by not-out innings, they are both affected equally. The second modification I have made is to turn the sum upside-down, so we have SD divided by RPI. Mathematically, this makes no difference to the ranking of results (although it means that low numbers, rather than high ones, indicate greater consistency).

The advantage of doing these two things is that the number you end up with has a solid interpretation: it is the percentage of deviation around the mean that is observed, on average, throughout the dataset. Dividing the SD by the mean is a trick statisticians use quite often; they call the result the coefficient of variation (CoV). As Rajesh pointed out, it's important to perform this scaling, rather than concentrating on SDs on their own, otherwise the batsmen who score most runs will always appear to have more variability in their records. A batsman with scores of 5, 30, and 100 has the same CoV as one with scores of 10, 60, and 200, though they have very different SDs.
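To make the calculation concrete, here is a minimal Python sketch of the SD-divided-by-RPI sum, run on the example score sequences quoted above. The function names are my own invention for illustration, every innings is treated as a completed one, and the SD is the population version (divide by n).

```python
from math import sqrt


def runs_per_innings(scores):
    """Mean runs per innings (RPI): total runs divided by innings batted."""
    return sum(scores) / len(scores)


def population_sd(scores):
    """Population standard deviation (divides by n, not n - 1)."""
    mean = runs_per_innings(scores)
    return sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))


def coefficient_of_variation(scores):
    """CoV = SD / RPI; a lower value indicates a more consistent record."""
    return population_sd(scores) / runs_per_innings(scores)


# Score sequences quoted in the article (assumed to be completed innings).
batsman_a = [0, 138, 11, 0, 101]
batsman_b = [52, 50, 45, 48, 55]
low_scorer = [5, 30, 100]
high_scorer = [10, 60, 200]

for label, scores in [
    ("Batsman A", batsman_a),
    ("Batsman B", batsman_b),
    ("5/30/100", low_scorer),
    ("10/60/200", high_scorer),
]:
    print(
        f"{label}: RPI={runs_per_innings(scores):.2f} "
        f"SD={population_sd(scores):.2f} "
        f"CoV={coefficient_of_variation(scores):.3f}"
    )
```

Batsmen A and B share an RPI of 50 but have very different CoVs, while the last two sequences have different SDs and the same CoV, which is the point of scaling by the mean.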

So much for the theory; what about the results? Table 1 shows the batsmen who have been most and least consistent on an innings-to-innings basis throughout Test history, with a few notable figures picked out from the middle of the table.

Top of the lot is Kiwi opener Mark Richardson. He may not have set the world alight compared to some of his dashing contemporaries, but his solidity as an opening batsman can easily be overlooked: he reached double figures in 80% of his Test innings (a very high proportion, as noted in another Numbers Game a few years ago), and only ever registered one duck. What stopped him from threatening the real top rank of the game was that, though he'd seldom get out cheaply, he was also pretty unlikely to score very heavily, as a total of four centuries from 65 innings and a top score of 145 attests. These characteristics are perfect for a low CoV, because they imply that a large majority of his innings fell in a relatively tight range in the middle of possible scores. Cricket will always find a way of surprising you but, to a greater extent than with any other batsman, you knew what you were going to get from Richardson.

Rank  Name                 M    I       R    Ave    RPI     SD    CoV
1.    MH Richardson       38   65   2,776  44.77  42.71  35.16  0.823
2.    H Sutcliffe         54   84   4,555  60.73  54.23  45.24  0.834
3.    TL Goddard          41   78   2,516  34.47  32.26  27.11  0.840
4.    SM Katich           49   85   3,792  48.00  44.61  38.72  0.868
5.    MS Dhoni            43   66   2,428  42.60  36.79  32.36  0.880
6.    JB Hobbs            60  102   5,410  56.95  53.04  46.68  0.880
7.    IR Redpath          66  120   4,737  43.46  39.48  34.89  0.884
8.    JB Stollmeyer       32   56   2,159  42.33  38.55  34.20  0.887
9.    PE Richardson       34   56   2,061  37.47  36.80  32.86  0.893
10.   A Ranatunga         93  155   5,105  35.70  32.94  29.44  0.894
...
32.   JH Kallis          136  229  10,760  54.62  46.99  44.95  0.957
...
35.   AR Border          156  265  11,174  50.56  42.17  40.49  0.960
...
47.   KP Pietersen        62  111   5,166  49.20  46.54  45.54  0.978
...
56.   DG Bradman          50   80   6,996  99.94  87.45  86.65  0.991
...
85.   RS Dravid          138  238  11,372  54.15  47.78  48.91  1.024
...
97.   RT Ponting         143  241  11,828  55.27  49.08  50.89  1.037
...
107.  Inzamam-ul-Haq     119  198   8,829  50.16  44.59  46.63  1.046
108.  SR Tendulkar       166  271  13,447  55.57  49.62  51.92  1.046
...
113.  IVA Richards       121  182   8,540  50.24  46.92  49.43  1.053
...
115.  SM Gavaskar        124  214  10,122  51.12  47.30  49.96  1.056
116.  SR Waugh           168  260  10,927  51.06  42.03  44.51  1.059
...
131.  GS Sobers           93  160   8,032  57.78  50.20  54.02  1.076
...
226.  BC Lara            130  230  11,912  53.18  51.79  62.43  1.205
...
238.  V Sehwag            75  128   6,608  53.72  51.63  64.71  1.254
...
245.  DW Randall          47   79   2,470  33.38  31.27  40.90  1.308
246.  Zaheer Abbas        78  124   5,062  44.80  40.82  54.00  1.323
247.  SE Gregory          58  100   2,282  24.54  22.82  30.33  1.329
248.  LG Rowe             29   49   2,047  43.55  41.78  55.98  1.340
249.  GJ Whittall         46   82   2,207  29.43  26.91  36.45  1.354
250.  DL Amiss            50   88   3,612  46.31  41.05  55.74  1.358
251.  MS Atapattu         90  156   5,502  39.02  35.27  49.93  1.416
252.  Mohammad Ashraful   55  107   2,306  22.39  21.55  30.70  1.425
253.  Wasim Akram        104  147   2,898  22.64  19.71  28.15  1.428
254.  MH Mankad           44   72   2,109  31.48  29.29  46.06  1.572
qual. 2,000 Test runs; complete list available here

I was slightly surprised to see MS Dhoni riding high in this list. His reputation is for a more free-spirited kind of play than might be expected to generate a low CoV. But it turns out that any such assumptions do him a bit of a disservice: his Test record is that of a reliable runscorer, rather than a hit-or-miss gunslinger. Simon Katich's presence next to him is perhaps more in keeping with his reputation.

It is intriguing to see both Herbert Sutcliffe and Sir Jack Hobbs in the top half-dozen of this list. There could surely be no firmer foundation for a partnership as successful as theirs than the kind of shared dependability this statistic suggests. If they both had more mercurial profiles then, though they each might have scored as many runs, they would have been unlikely to have shared so many significant partnerships.

The fact that Jacques Kallis has fallen down the list somewhat compared to Rajesh's analysis is, to a small extent, a reflection of my slightly different methods, but it's more to do with the fact that his record has become a wee bit more inconsistent in the 4 years since Rajesh wrote his column.

According to this analysis, the least consistent batsman in Test history is Vinoo Mankad. His career has the opposite profile to Mark Richardson's: there is a very high proportion of low scores in his record (he only got into double figures 57% of the time) but, when he got in, he often went on to score big hundreds (including two doubles in one series against New Zealand in 1955/56). In contrast to Richardson's reliable-but-unspectacular record, Mankad's performances were an awful lot less predictable.

Wasim Akram's position at the bottom of the list is very largely ascribable to the effect of one mammoth score of 257* in the midst of a dataset that characteristically reflects a much more modest level of achievement (there's a good argument for calling this the most out-of-character innings in Test history, as discussed in a recent Ask Steven). If that one innings is excluded from his record, his CoV reverts to a much more run-of-the-mill 1.119.

Marvan Atapattu's status is probably not surprising for a man who started his Test career with a famous string of failures, but ended up with 6 double-centuries under his belt.

So...?

The unanswered question I find most intriguing is whether, in the grand scheme of things, any of this matters. As cricket fans, we're quite used to berating inconsistent batsmen ("you never know what you're going to get: one day, he's brilliant; the next, he couldn't buy a run") but, then again, we may have a paradoxical tendency to look down our noses at those with the least variable records ("he's good at getting in, but he never goes on to register a matchwinning score"). Is either of these positions more justifiable than the other?

I've come up with two ways of answering this question. The first is to examine whether consistent batsmen, ultimately, score more runs than their more mercurial counterparts. It's all very well to invent hypothetical 50-averaging batsmen with consistent and inconsistent records, like I did in my introduction, but it may be that, in the real cricketing world, batsmen with one profile or the other are more likely to achieve a decent average.

To explore this, I used a statistical technique called regression (to be more precise: univariate ordinary least squares linear regression), which enables us to assess the relationship between two variables. The results are shown in Figure 1. Each batsman's CoV is plotted against his average, with the typical relationship between the two (the regression line) indicated by the red dotted line. You can see that, although there's an awful lot of scatter around the trend, the datapoints generally appear to line up with a slight downwards slope. This suggests that there is a weak but identifiable association between the two variables, with more consistent batsmen tending to average slightly more (for any statsheads, that means that r² is a pretty dismal 0.065, but p<0.001 for the slope coefficient).

 Fig 1 Association between consistency (coefficient of variation) and success (average) for Test batsmen © Gabriel Rogers
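For anyone who wants to try this sort of regression themselves, the following is a rough sketch of a univariate least-squares fit of CoV on average, written from the textbook formulas. The function name `ols_fit` is hypothetical, and the eight data points are simply a hand-picked handful of rows from Table 1, so the output will not reproduce the r² or p-value quoted above (those come from the full list of 254 batsmen, and no p-value is calculated here).

```python
def ols_fit(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # squared correlation coefficient
    return slope, intercept, r_squared


# (average, CoV) pairs copied from Table 1: Richardson, Sutcliffe, Kallis,
# Bradman, Tendulkar, Lara, Atapattu, Mankad. A small illustrative subset only.
averages = [44.77, 60.73, 54.62, 99.94, 55.57, 53.18, 39.02, 31.48]
covs = [0.823, 0.834, 0.957, 0.991, 1.046, 1.205, 1.416, 1.572]

slope, intercept, r_squared = ols_fit(averages, covs)
print(f"slope={slope:.4f} intercept={intercept:.3f} r^2={r_squared:.3f}")
```

A negative slope corresponds to the downward trend described for Figure 1: lower CoV going with higher average.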

Clearly, there are plenty of examples that do not fit the general trend too well, but it appears that, on average, consistency is associated with higher runscoring. Actually, a more pronounced correlation would have been surprising, because we didn't see a very obvious hierarchy in the consistency list - no one is suggesting that Mark Richardson was, in any meaningful way, a better batsman than Brian Lara. Nevertheless, it does seem to be the case that consistency is, by and large, a positive thing for individual batsmen. This may seem like an obvious finding, but I don't think it's been demonstrated before.

My second way to assess the value of batting consistency was to see whether it has a positive effect for the team. So I looked to see if there's any correlation between each batsman's CoV and his record of winning matches. I did this in exactly the same way, plotting one variable against the other, and drawing a univariate regression line through the results. For Test match cricket, there was a very weak, but still detectable, association between CoV and percentage of matches won (r²=0.015; p=0.005); this vaguely suggests that, the more consistent a batsman is, the more likely he is to be on the winning side. It's a pretty unsatisfactory analysis, though, with an awful lot of noise around the hint of a signal. What I was more interested to find is that the correlation gets quite a bit stronger when, instead of winning record, you look at each batsman's not-losing record. The results of this analysis are shown in Figure 2. You can see a relatively shallow, but pretty obvious, upwards slope to the dataset, showing that, on average, the most consistent batsmen are also those who have lost the lowest proportion of the Test matches in which they have played.

 Fig 2 Association between consistency (coefficient of variation) and losing record for Test batsmen © Gabriel Rogers

The fact that consistency is associated with not-losing more strongly than it is with winning suggests that consistent batsmen really come into their own when it comes to securing draws for their teams. (And, indeed, regressing CoV against draw-rate produces a strongly significant result [p=0.002].) So, if you've got a team packed with consistent batsmen, you might not win too many more games, but you might draw some that less consistent teams would lose. I'm not quite sure how to explain this finding in cricketing terms; if you've got any bright ideas, please feel free to comment!

Once again from the top in pyjamas

The remainder of this post repeats the above analysis for ODI cricket.

Table 2 lists the most and least consistent batsmen in ODIs. The list is topped by Australia's two great "finishers" - Michaels Hussey and Bevan. We're used to seeing them high on lists of ODI stats, but it's worth remembering that - because CoV, as I have calculated it, relies on RPI rather than average - the high number of not-outs in each of their records has no direct influence on their excellent consistency ratings. Plenty of players have higher RPIs than these two; it's only once not-outs are factored in that their averages rise so high (although that doesn't necessarily mean the not-outs inflate their average, as is often assumed; Charles Davis has done good work on this). Accordingly, it is notable that consistency stats for these two players agree with their conventional records: I conclude that it was the dependability - as much as the volume - of their contributions that marked them out as matchwinners for their team.

Rank  Name                 M    I       R    Ave    RPI     SD    CoV
1.    MG Bevan           232  196   6,912  53.58  35.27  25.60  0.726
2.    MEK Hussey         137  113   4,029  53.01  35.65  26.04  0.730
3.    RR Sarwan          156  146   5,098  43.95  34.92  27.98  0.801
4.    AH Jones            87   87   2,784  35.69  32.00  25.72  0.804
5.    NH Fairbrother      75   71   2,092  39.47  29.46  23.81  0.808
6.    IR Bell             78   76   2,483  35.47  32.67  26.45  0.810
7.    GP Thorpe           82   77   2,380  37.19  30.91  25.04  0.810
8.    CG Greenidge       128  127   5,134  45.04  40.43  33.01  0.817
9.    AJ Lamb            122  118   4,010  39.31  33.98  27.87  0.820
10.   RG Twose            87   81   2,717  38.81  33.54  27.54  0.821
11.   Javed Miandad      233  218   7,381  41.70  33.86  27.81  0.821
12.   DM Jones           164  161   6,068  44.62  37.69  31.03  0.823
...
17.   Zaheer Abbas        62   60   2,572  47.63  42.87  35.58  0.830
...
20.   ML Hayden          160  154   6,131  44.11  39.81  33.38  0.838
21.   GC Smith           153  151   5,732  40.37  37.96  31.86  0.839
...
24.   S Chanderpaul      261  245   8,648  41.78  35.30  29.96  0.849
...
26.   Inzamam-ul-Haq     374  348  11,701  39.53  33.62  28.58  0.850
...
28.   MS Dhoni           159  141   5,246  50.44  37.21  31.79  0.855
29.   JH Kallis          298  284  10,809  46.59  38.06  32.68  0.859
...
31.   Mohammad Yousuf    275  261   9,458  42.80  36.24  31.25  0.862
32.   RS Dravid          335  309  10,644  39.42  34.45  29.71  0.863
...
40.   KP Pietersen        96   86   3,202  45.10  37.23  32.83  0.882
...
49.   RT Ponting         341  332  12,623  42.79  38.02  34.11  0.897
...
55.   IVA Richards       187  167   6,721  47.00  40.25  36.45  0.906
...
74.   BC Lara            295  285  10,348  40.90  36.31  34.28  0.944
...
105.  SR Tendulkar       442  431  17,598  45.12  40.83  40.14  0.983
...
147.  BB McCullum        171  145   3,569  29.02  24.61  26.77  1.088
148.  DI Gower           114  111   3,170  30.78  28.56  31.17  1.091
149.  Mohammad Ashraful  157  151   3,298  23.90  21.84  24.09  1.103
150.  Kapil Dev          225  198   3,783  23.79  19.11  21.10  1.105
151.  WW Hinds           119  111   2,880  28.51  25.95  28.83  1.111
152.  RS Kaluwitharana   189  181   3,711  22.22  20.50  22.79  1.112
153.  VVS Laxman          86   83   2,338  30.76  28.17  31.66  1.124
154.  L Vincent          102   99   2,413  27.11  24.37  28.05  1.151
155.  H Masakadza         95   95   2,601  28.58  27.38  32.86  1.200
156.  KO Otieno           89   87   2,016  23.44  23.17  28.16  1.215
qual. 2,000 ODI runs; complete list available here

It's surprising to see Ian Bell so high in the list (actually, it's kinda surprising to learn that Ian Bell has 2,000 ODI runs). His reputation may not suggest limited-overs strength but, if you look at his record, it's clear that he has very seldom failed completely in ODIs. Of course, there's a relative lack of dramatic successes, too, and we've already seen that these two characteristics tend to produce a low CoV.

Another unexpected finding is that Sachin Tendulkar's ODI record is not an especially consistent one. To an extent, this is just because his scores encompass such a wide range. Obviously, Tendulkar has a much higher proportion of big scores in his record than someone like Ian Bell. Less predictably, however, he also has a higher proportion of cheap dismissals, and it is the swing from one extreme to the other that produces a higher CoV. Tendulkar's customary position at the top of the ODI order may provide part of the explanation for both of these features: if he hits his stride in any given innings, he will have full opportunity to score plenty of runs, and this may not be true of those who bat lower in the order; on the other hand, he also has the challenge of facing the new ball with close catchers in place - something that may raise his probability of early dismissal. It is notable that there are relatively few opening batsmen amongst those with the lowest CoVs (although I checked, and this is not a systematic bias in the dataset: batting position is not, in itself, predictive of consistency).

In contrast, it is anything but a shock to see Tendulkar's teammate, VVS Laxman, near the foot of the table: has any batsman more dramatically personified sublime-to-the-ridiculous swings of achievement in the ODI era?

I don't know if you've noticed it, but I think there's quite a difference between the ODI consistency list and the Test match equivalent, in Table 1, above. It seems to me that - the odd diminutive ginger anomaly aside - there is some sort of hierarchy going on in the ODI list. Players in the top part of the table are, by and large, better than those who come lower down. So we might expect to see a more pronounced correlation between ODI CoV and batting average than was the case in the analogous Test analysis. Any such expectation would be right on the money, as Figure 3 shows. There is a fairly strong association between the two variables (r²=0.196; p<0.001), with a clear downward trend, suggesting that lower CoVs - indicating greater consistency - are associated with higher averages. I'm pretty sure that there is no mathematical reason why consistency should appear to be more valuable in ODI cricket than it is in the five-day game. If anyone can think of a cricketing reason, please do put it in the comments.

 Fig 3 Association between consistency (coefficient of variation) and success (average) for ODI batsmen © Gabriel Rogers

And, if it's positive for individual batsmen to be more consistent in their ODI runscoring, you'd expect their teams to see some benefit. As Figure 4 shows, this would appear to be another reasonable inference: batsmen with lower CoVs are, on average, those who win most ODIs (r²=0.163; p<0.001).

 Fig 4 Association between consistency (coefficient of variation) and winning record for ODI batsmen © Gabriel Rogers

Conclusions

So what if, playground style, you're offered first pick between two batsmen with different records? Given a choice between a consistent batsman and an inconsistent one who averages more, you'd be a fool not to go for the one with the higher average. However, if your choice was between two batsmen with similar averages but different CoVs, I'd go for the more consistent one every time, as a result of what I've learned in this analysis. My expectation would be that he'd help me win - or at least not lose - a greater proportion of my matches. What is more, if I was provided with no information at all about the batsmen's averages, but did know their CoVs, I'd favour whoever had the more consistent record, because it would be a reasonable - though far from infallible - guess that he'd also have the better average.

One corollary of this conclusion may be that "matchwinning" performers are a bit of a myth. In the future, I want to do some work on whether there is such a thing as a true matchwinner in cricket (what analysts in other sports sometimes refer to as clutch players) but, with this analysis as my starting point, my provisional view is that the kind of batsman who quietly gets on with contributing on a match-to-match basis may be of at least as much value as one who has an exceptional game once in a while.

I've got some related posts coming up about consistency among bowlers, and swings of form over longer periods.

All stats calculated Jun 10, 2010 (i.e. all Tests up to England v Bangladesh at Manchester, Jun 4-6, 2010 [Test # 1959] and all ODIs up to Zimbabwe v Sri Lanka at Harare, Jun 9, 2010 [ODI # 2990]).

Technical appendix

Anyone who isn't incredibly fascinated by statistical methods doesn't need to read this bit, but I like to give a precise account of what I've done, in case anyone cares.

Technical note #1. My SDs are calculated as population SDs (i.e. if we're going to be really geeky, I have not adopted Bessel's correction). The reason for this is that, in batsmen's complete careers, we're dealing with all the observations that are available to us. This is unusual, for a statistician: normally, we have a limited sample of observations from which we want to draw inferences about a wider population (ask 1,000 people how they intend to vote, and you can predict what the entire electorate is going to do; give 500 people a drug, and you can tell how effective it'll be for everyone... that sort of thing). Here, though, the data we have is all we're going to get, so it's appropriate not to use the tiny correction that's normally strictly necessary. If anyone wants to replicate my analyses in Excel or Access, you need to use the StDevP function, not the normal StDev. If anyone else read the foregoing, didn't understand a word of it, but wonders whether it makes a difference to my outputs, the answer is no: the effect is tiny, but I believe it's more correct, so that's what I've done.
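For the curious, here is a small Python sketch of how little Bessel's correction changes things. Python's standard `statistics` module offers both versions: `pstdev` (population, divide by n) and `stdev` (sample, divide by n - 1). The helper name `bessel_ratio` is my own, and the five scores are just the Batsman A example from the introduction.

```python
from math import sqrt
from statistics import pstdev, stdev


def bessel_ratio(n):
    """Ratio of sample SD to population SD for n observations."""
    return sqrt(n / (n - 1))


# Illustrative five-innings record (Batsman A from the introduction).
scores = [0, 138, 11, 0, 101]

population = pstdev(scores)  # divides by n, as used for the tables above
sample = stdev(scores)       # divides by n - 1 (Bessel's correction)
print(f"population SD={population:.2f} sample SD={sample:.2f}")

# The gap shrinks quickly as the number of innings grows.
for innings in (5, 65, 271):
    print(f"n={innings}: sample SD is {bessel_ratio(innings):.4f} x population SD")
```

The ratio depends only on the number of innings, so for any batsman with 40-odd innings or more the two versions differ by around 1% or less.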

Technical note #2. Any statisticians reading this analysis might have been slightly concerned that the regression I presented in Figure 1 is unduly influenced by the highest-averaging batsmen (a phenomenon statisticians refer to as leverage). I did some sensitivity analyses that established that this isn't the case (p remains <0.001 when the dataset is restricted to batsmen averaging <60 or <50).

Technical note #3. Another concern statisticians might have with my regressions is that I've picked two covariates of consistency and analysed them separately (univariately). A more comprehensive model would be a multivariate one - that is, one that bundles everything up in the same analysis. So I did that. For Test cricket, I regressed CoV against average, losing percentage, and an interaction term. The only significant covariate was average (p<0.001). This suggests that the reason more consistent batsmen lose fewer Test matches is that they average more: there's no independent effect of consistency on not-losing. For ODIs, the multivariate results - regressing CoV against average, winning percentage, and an interaction term - are more interesting: all three covariates come up p<0.05 (and r² rises to 0.415). This suggests that more consistent batsmen are likely to win more games even if they don't average any higher, and the significant interaction term indicates that, the more games you win, the more consistency is of value in raising your average.
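As a rough illustration of what "average, losing percentage, and an interaction term" means in practice, the sketch below builds the design matrix (intercept, average, result percentage, and their product) and solves the least-squares normal equations directly. Everything here is an assumption for illustration: the function names are hypothetical, the eight data points are synthetic and generated without noise from made-up coefficients, and no p-values are computed. It is not the analysis behind the numbers above.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_with_interaction(average, result_pct, cov):
    """Least-squares fit of cov on average, result_pct and their product.

    Returns [intercept, b_average, b_result, b_interaction].
    """
    rows = [[1.0, a, p, a * p] for a, p in zip(average, result_pct)]
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, cov)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic, noise-free data built from made-up coefficients (not real players).
true_coeffs = [1.5, -0.008, 0.002, -0.00002]
averages = [25.0, 30.0, 35.0, 40.0, 45.0, 50.0, 55.0, 60.0]
losing_pct = [45.0, 30.0, 50.0, 25.0, 40.0, 20.0, 35.0, 15.0]
covs = [
    true_coeffs[0] + true_coeffs[1] * a + true_coeffs[2] * p + true_coeffs[3] * a * p
    for a, p in zip(averages, losing_pct)
]

fitted = fit_with_interaction(averages, losing_pct, covs)
print([round(c, 6) for c in fitted])
```

Because the synthetic data contain no noise, the fit should recover the made-up coefficients to within rounding error; with real records you would also need standard errors to judge which covariates are significant.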

Technical note #4. The super eagle-eyed may have noticed that the scatterplots showing CoV -v- Average are rather bushier than those showing CoV -v- Results. This is because winning (or losing) percentages are constrained at both ends, and I found that the good number of players who have won or lost all of their games was skewing results somewhat. Accordingly, all the CoV -v- Results analyses are limited to players with 40 or more innings. As a rule, I don't like doing this because, although tiny samples can produce weird results, their weirdness should balance out on either side of the average, so it's my preference to use all the data that's going. In this instance, though, I found I got much more sensible results by adopting an artificial constraint.

Technical note #5. I take the view that Australia v. ICC World XI, 14–17 October 2005, was not a Test match; similarly, these games are never included in my ODI stats.

• Gabriel Rogers on June 24, 2010, 11:05 GMT

Thanks all for some interesting comments. Here are a few quick responses to some that I think raise particularly important issues. More than one contributor has tumbled to what - I am sure - is the correct explanation for my finding that consistency appears to be more strongly correlated with success in ODI cricket than it is in Tests. The reason is that good players are more likely to have their innings interrupted (to finish not-out; this is what keynotespeaker refers to as "right-censoring"). This may also explain why Sachin Tendulkar's consistency rating was surprisingly low. If, on four successive occasions on which he opens an ODI innings, Tendulkar is playing well enough to score 20, 42, 60, and 158, he is likely to score... well... 20, 42, 60, and 158. Now, if Mike Hussey is also playing well enough to score 20, 42, 60, and 158, but is batting in his customary position lower down the order, he might well end up with scores of 20, 17*, 60, and 43*. Because his scores are artificially compressed into a much narrower range, he would have a lower CoV (even though each player had the capacity to score exactly the same number of runs, and even though the batting averages, for those two sequences, are identical). This is a slightly exaggerated example, but it shows why batting down the innings, in ODIs, might be expected to lead to lower CoVs.
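The arithmetic behind that example can be checked with a few lines of Python. This is only a sketch: the function names are my own, each innings is stored as a (runs, not_out) pair, and CoV is calculated as in the main post (population SD divided by runs per innings).

```python
from statistics import pstdev


def batting_average(innings):
    """Conventional average: total runs divided by number of dismissals."""
    runs = sum(r for r, _ in innings)
    dismissals = sum(1 for _, not_out in innings if not not_out)
    return runs / dismissals


def coefficient_of_variation(innings):
    """Population SD of the scores divided by runs per innings."""
    scores = [r for r, _ in innings]
    return pstdev(scores) / (sum(scores) / len(scores))


# Each innings is (runs, not_out); the sequences are those quoted above.
opener = [(20, False), (42, False), (60, False), (158, False)]
finisher = [(20, False), (17, True), (60, False), (43, True)]

for label, innings in [("Opener", opener), ("Finisher", finisher)]:
    print(
        f"{label}: average={batting_average(innings):.2f} "
        f"CoV={coefficient_of_variation(innings):.3f}"
    )
```

Both sequences give the same batting average, but the interrupted one has the narrower spread of scores and therefore the lower CoV.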

This is exactly what David Barry suggests. But it turns out that the story isn't all that conclusive. I did exactly what David asked, and split ODI records according to batting position. And, it's true, there is a slightly stronger correlation between CoV and average for batsmen at nos. 3-6, but it really isn't that much more pronounced than that seen in opening batsmen - see the graph here. Importantly, there's still a clear trend for more consistent opening batsmen to have higher averages so, although the effect amongst middle-order batsmen may have a bit more influence, it certainly isn't driving the finding exclusively.

Sanchez: The fella with an average over 60 and CoV near zero is Bermudan Chris Douglas, who scored 69 and 53 in his two ODIs.

Rajat: you raise some really interesting points, to which I can't do full justice, here. In brief, my views are that you're completely right about #2 (I do have nonparametric methods for doing this sort of analysis, but it's way beyond the scope of this kind of column to introduce them here). You're arguably right about #4 (though it'd be an interesting argument, and I'd still fight my corner). You're not necessarily right about #3 - whether variance is increased or decreased will depend on which particular innings have been interrupted; on the whole, it will tend towards decreasing. What you suggest in #5 might be an interesting alternative analysis, but it'd be computationally tiresome, and I don't see why it's necessarily any more correct than what I've done. Your really important criticism is #1. If you're right about that, my whole analysis is on very dodgy ground. I don't think you are right about it, though. Dividing SD by RPI, to calculate CoV, simply produces a dimensionless measure of variance. It would be absolutely inappropriate to regress RPI against SD, because those two measures are always going to be strongly correlated, even in the presence of constant levels of performance. But CoV is SD scaled to account for this relationship (effectively, to turn it into a percentage). So it isn't true to say that "higher RPIs will lead to lower CoVs by definition", because higher levels of scoring also produce higher SDs, with the net result that CoVs will only be related to RPIs (or averages) in the event that higher-scoring batsmen tend to be more consistent. As a simple demonstration of this, note that there's plenty of players with lower CoVs than Bradman's; he is so far ahead of the pack when it comes to runscoring expectation that, if high RPIs automatically translated into low CoVs, he'd be certain to top the list. In fact, he's below, say, Shaun Pollock and Dwayne Bravo.

Deepak: forgive me, but all you've demonstrated there is that, if you take a measure that doesn't fit your ideas, and multiply it by batting average, then you end up with a list that looks more like the batting average. Well, yeah, you do.

Singhe: I wasn't especially surprised by Cook's high ranking (not least because his stats are, in many ways, eerily similar to Mark Richardson's). I'm sure England fans will recognise him as a prime example of the hypothetical "Batsman B" in my introduction: his infamous habit of getting out for 60 reflects just this status.

Souvik: undoubtedly, yes, you could come up with better predictors of match outcome (though I'm afraid I don't really understand your suggestion). That wasn't the purpose of this analysis, though.

Chetan Makani: you get the spotter's badge! There was a small mistake in my query which ended up with M equalling matches in which the given batsmen batted, and not matches in which he'd played. Hence the error with number of games for Bradman (and any other player who played in matches in which he didn't bat). The key numbers for the analysis are unaffected, though.

• Engle on June 21, 2010, 2:52 GMT

Consistency is most important for the opening batsmen over all other batting positions.

• Squishy on June 21, 2010, 1:51 GMT

I believe that in a past issue of It Figures, someone ran an analysis looking at what 'not-out' figures were likely to have been if the innings had been played to completion. It'd be interesting to use those figures (though not necessarily statistically correct), to replace the not-out RPI scores in your database, and see if the consistency achieved by middle-order batsmen in ODIs is altered, or if the consistency / average correlation is still seen.

• James on June 20, 2010, 22:11 GMT

A bit late on this, but I'd love to see a similar study but with bowler economy rates in ODIs. I have a theory that the current Aussie attack is very inconsistent. Doug Bollinger and Mitchell Johnson seem to be brilliant one game (or 2 or 3 :), rubbish the next, Bollinger in particular. I'm not sure those are the kind of bowlers you want to build an attack around. Just an idea anyway...

• Souvik on June 20, 2010, 17:09 GMT

Gabriel, nice analysis. But I think r^2 = 0.015 is indicative of a very low S/N. For Test matches, perhaps RPI is not the best of measures to gauge a batsman's contribution to a "non-loss". For Test cricket, we should go back to some old-school measures: MPI (minutes spent per innings) and WLWC (wickets lost while at the crease). Do this for batsmen who have batted at no. 6 or higher in 40 or more completed second innings (declared or all out), so we ignore all rain-affected matches. A cross plot between second-innings MPI and WLWC should be a much better measure of batsmen who "save" Test matches. My guess is there will be a negative correlation (fewer wickets for more minutes) for guys like Ken Barrington, Rahul Dravid, Steve Waugh, Jimmy "Padams", Chanderpaul, possibly Hazare, and some others. Take the names with the highest r^2 and plot the win-loss ratio for only those games for which the correlation was performed. You ought to see a positive correlation.

• Chetan Makani on June 20, 2010, 0:39 GMT

Nice work, although I have spotted an error in your table. It says that Bradman played in 50 matches, when in fact he played in 52. All the best.

• Vipin on June 20, 2010, 0:07 GMT

Well... you could just as well have given a link to any corporate finance book that explains risk and return. The problem is that cricket is not like finance. Anyone who knows his cricket will say that it is OK if you get out before you get your eye in (say, within 30-50 balls in a Test); however, it is sinful if you don't make good use of the times when you get in comfortably.

Another point: it is wrong to say that the ODI list necessarily shows a more realistic pecking order. The issue is that Test matches are won by bowlers while ODIs are won by batsmen, hence consistent batsmen will show a better slope.

• Singhe on June 19, 2010, 23:13 GMT

Great and technically sound analysis. I think consistency has to be measured over a long time which is the "best" simulation for a variety of conditions: I took the top ten with a minimum HUNDRED TEST innings and a straight analysis. They are: Hobbs, Redpath, Ranatunga, Barrington, M.Amarnath, AN Cook, Fredricks, Chanderpaul, SM Pollock, CH Lloyd. Couple of surprises: -Cook and Pollock on the list -Three (Fredricks, Chanderpaul and Lloyd) from little Guyana: could have been four with Butcher having 78 innings.

• krishna darooka on June 19, 2010, 15:07 GMT

A great research and finding,I suggest this should be recognised in selecting a player and yearly award should be initiated for the best player.

How about doing similar analysis for the bowlers?

• deepak on June 19, 2010, 10:09 GMT

Contd. The next 10 from my analysis prove my point... 21 IVA Richards 22 SM Gavaskar 23 SR Waugh 24 L Hutton 25 DCS Compton 26 GS Sobers 27 GS Chappell 28 ML Hayden 29 Javed Miandad 30 TT Samaraweera. And people like Dhoni & Richardson drop off the list of the top players. Purely from a cricketing point of view, would any captain have Richardson (rank 1 in your list) rather than Dravid (rank 85 in your list)? Technically, this shows the dangers of blindly applying statistics without using clustering.

• Gabriel Rogers on June 24, 2010, 11:05 GMT

Thanks all for some interesting comments. Here are a few quick responses to some that I think raise particularly important issues. More than one contributor has tumbled to what - I am sure - is the correct explanation for my finding that consistency appears to be more strongly correlated with success in ODI cricket than it is in Tests. The reason is that good players are more likely to have their innings interrupted (to finish not-out; this is what keynotespeaker refers to as "right-censoring"). This may also explain why Sachin Tendulkar's consistency rating was surprisingly low. If, on four successive occasions on which he opens an ODI innings, Tendulkar is playing well enough to score 20, 42, 60, and 158, he is likely to score... well... 20, 42, 60, and 158. Now, if Mike Hussey is also playing well enough to score 20, 42, 60, and 158, but is batting in his customary position lower down the order, he might well end up with scores of 20, 17*, 60, and 43*. Because his scores are artificially compressed into a much narrower range, he would have a lower CoV (even though each player had the capacity to score exactly the same number of runs, and even though the batting averages, for those two sequences, are identical). This is a slightly exaggerated example, but it shows why batting down the innings, in ODIs, might be expected to lead to lower CoVs.

This is exactly what David Barry suggests. But it turns out that the story isn't all that conclusive. I did exactly what David asked, and split ODI records according to batting position. And, it's true, there is a slightly stronger correlation between CoV and average for batsmen at nos. 3-6, but it really isn't that much more pronounced than that seen in opening batsmen - see the graph here. Importantly, there's still a clear trend for more consistent opening batsmen to have higher averages so, although the effect amongst middle-order batsmen may have a bit more influence, it certainly isn't driving the finding exclusively.
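As a rough check of the 20, 42, 60, 158 versus 20, 17*, 60, 43* illustration above, here is a minimal Python sketch. It assumes CoV is the population SD divided by runs per innings (the comments suggest population statistics were used, but that is an assumption here), and the function names `cov` and `batting_average` are made up for the example.

```python
from statistics import mean, pstdev


def cov(scores):
    """Coefficient of variation: population SD divided by runs per innings."""
    return pstdev(scores) / mean(scores)


def batting_average(scores, not_outs):
    """Runs divided by dismissals; not_outs flags the unbeaten innings."""
    dismissals = len(scores) - sum(not_outs)
    return sum(scores) / dismissals


# Opener: every innings runs to completion.
opener = [20, 42, 60, 158]
opener_not_outs = [False, False, False, False]

# Middle order: the second and fourth innings are cut short (17* and 43*).
middle = [20, 17, 60, 43]
middle_not_outs = [False, True, False, True]

opener_avg = batting_average(opener, opener_not_outs)
middle_avg = batting_average(middle, middle_not_outs)
opener_cov = cov(opener)
middle_cov = cov(middle)

print(f"opener: average {opener_avg:.2f}, CoV {opener_cov:.2f}")
print(f"middle: average {middle_avg:.2f}, CoV {middle_cov:.2f}")
```

Both sequences give a batting average of 70, but the interrupted sequence has a CoV of about 0.50 against about 0.75 for the completed one, which is the compression effect described above.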

Sanchez: The fella with an average over 60 and CoV near zero is Bermudan Chris Douglas, who scored 69 and 53 in his two ODIs.

Rajat: you raise some really interesting points, to which I can't do full justice, here. In brief, my views are that you're completely right about #2 (I do have nonparametric methods for doing this sort of analysis, but it's way beyond the scope of this kind of column to introduce them here). You're arguably right about #4 (though it'd be an interesting argument, and I'd still fight my corner). You're not necessarily right about #3 - whether variance is increased or decreased will depend on which particular innings have been interrupted; on the whole, it will tend towards decreasing. What you suggest in #5 might be an interesting alternative analysis, but it'd be computationally tiresome, and I don't see why it's necessarily any more correct than what I've done. Your really important criticism is #1. If you're right about that, my whole analysis is on very dodgy ground. I don't think you are right about it, though. Dividing SD by RPI, to calculate CoV, simply produces a dimensionless measure of variance. It would be absolutely inappropriate to regress RPI against SD, because those two measures are always going to be strongly correlated, even in the presence of constant levels of performance. But CoV is SD scaled to account for this relationship (effectively, to turn it into a percentage). So it isn't true to say that "higher RPIs will lead to lower CoVs by definition", because higher levels of scoring also produce higher SDs, with the net result that CoVs will only be related to RPIs (or averages) in the event that higher-scoring batsmen tend to be more consistent. As a simple demonstration of this, note that there's plenty of players with lower CoVs than Bradman's; he is so far ahead of the pack when it comes to runscoring expectation that, if high RPIs automatically translated into low CoVs, he'd be certain to top the list. In fact, he's below, say, Shaun Pollock and Dwayne Bravo.
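The point about CoV being a dimensionless measure can be illustrated with a small Python sketch. It takes Batsman A's scores from the introduction, doubles every one of them, and compares SD, RPI and CoV before and after. This only shows what happens under simple proportional scaling; real batsmen are not scaled copies of one another, and the use of population SD is an assumption.

```python
import math
from statistics import mean, pstdev


def cov(scores):
    """Coefficient of variation: population SD divided by runs per innings."""
    return pstdev(scores) / mean(scores)


base = [0, 138, 11, 0, 101]          # Batsman A from the introduction
doubled = [2 * s for s in base]      # a hypothetical batsman scoring twice as much

sd_ratio = pstdev(doubled) / pstdev(base)    # SD grows with the scores
rpi_ratio = mean(doubled) / mean(base)       # RPI grows by the same factor
same_cov = math.isclose(cov(base), cov(doubled))

print(f"SD ratio {sd_ratio:.2f}, RPI ratio {rpi_ratio:.2f}, CoV unchanged: {same_cov}")
```

Doubling every score doubles both the SD and the RPI, so the ratio between them stays put: a higher RPI does not, on its own, push the CoV down.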

Deepak: forgive me, but all you've demonstrated there is that, if you take a measure that doesn't fit your ideas, and multiply it by batting average, then you end up with a list that looks more like the batting average. Well, yeah, you do.

Singhe: I wasn't especially surprised by Cook's high ranking (not least because his stats are, in many ways, eerily similar to Mark Richardson's). I'm sure England fans will recognise him as a prime example of the hypothetical "Batsman B" in my introduction: his infamous habit of getting out for 60 reflects just this status.

Souvik: undoubtedly, yes, you could come up with better predictors of match outcome (though I'm afraid I don't really understand your suggestion). That wasn't the purpose of this analysis, though.

Chetan Makani: you get the spotter's badge! There was a small mistake in my query which ended up with M equalling matches in which the given batsmen batted, and not matches in which he'd played. Hence the error with number of games for Bradman (and any other player who played in matches in which he didn't bat). The key numbers for the analysis are unaffected, though.

• deepak on June 19, 2010, 10:02 GMT

I downloaded your full list and did some work on it. First I clustered batsmen as having achieved averages of >60, >50, >40 etc. Then I re-ranked them with both criteria (cluster rank and CoV). This gave the result below, which is probably nearer to the true consistency of the batsmen. As you can see, the pretenders have gone off the list of top 20. 1 H Sutcliffe 2 DG Bradman 3 RG Pollock 4 GA Headley 5 JB Hobbs 6 KF Barrington 7 MEK Hussey 8 CL Walcott 9 JH Kallis 10 AR Border 11 MJ Clarke 12 ED Weekes 13 AD Nourse 14 G Gambhir 15 RS Dravid 16 A Flower 17 RT Ponting 18 Yousuf Youhana 19 Inzamam-ul-Haq 20 SR Tendulkar
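One way to read the cluster-then-rank procedure described above is as a two-key sort: average band first (highest band on top), CoV second (lowest first). The Python sketch below is only a guess at that procedure. The players and numbers are invented placeholders, not real records, and the exact band boundaries are an assumption.

```python
# (name, batting average, CoV) - hypothetical placeholder values only.
players = [
    ("Player A", 44.8, 0.86),
    ("Player B", 61.2, 1.10),
    ("Player C", 52.3, 1.05),
    ("Player D", 53.9, 0.98),
    ("Player E", 47.1, 0.91),
]


def band(average):
    """Ten-run average band: 6 for the 60s, 5 for the 50s, and so on."""
    return int(average // 10)


# Rank on CoV alone: the lowest CoV comes first.
by_cov = [name for name, _, c in sorted(players, key=lambda p: p[2])]

# Cluster rank first (higher band on top), then CoV within each band.
by_cluster = [
    name
    for name, _, _ in sorted(players, key=lambda p: (-band(p[1]), p[2]))
]

print("CoV only:        ", by_cov)
print("Cluster then CoV:", by_cluster)
```

Because the band is the primary key, the output ordering is dominated by batting average, with CoV only breaking ties inside each band.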

• Sind on June 19, 2010, 8:11 GMT

The conclusions seem to be quite correct. This “matchwinning” business is for the most part a joke considering the fact that there are 21 other players involved in the final outcome of any match. The flashier players may once in a while play a special innings, especially when “finishing” off a match, and then that innings will be held up as an example of some particular “matchwinning” ability. The “True” matchwinning value of a batsman will come about only from an analysis of a large number of matches wherein we may estimate how much a batsman has contributed not just in wins but also in staving off defeats, salvaging draws etc. We cannot, of course, use the actual match results for this analysis, which is perhaps what makes it a difficult one. Up and down batsmen such as Lara, Sehwag etc. may play some sensational innings but then follow up with a long string of innings where, if only they had shown more consistency and responsibility, matches may have been saved – or at least the team would have been afforded a greater chance of them being saved. “Winning” 1 match and then as the premier batsman failing miserably for 5 – and ending up with a notional win-loss ratio of 1:5 – is hardly indicative of a matchwinning batsman.

• sitting-on-a-gate on June 19, 2010, 5:47 GMT

Probably a better way of doing this would be to look at the proportion of each innings scored by a particular batsman and then do an analysis on that proportion. In a low-scoring innings, better batsmen too tend to score less (trying to protect tail-enders etc). With regard to the anomaly in consistency, that is again due to far fewer very high scores in the ODI game (you don't see 200s, 300s in ODIs, do you?)...

• Sridhar on June 18, 2010, 12:51 GMT

One explanation for CoV correlating strongly with % won could be that one day being a, well, one day game, on his day, a batsman can win it for his team. Also, the player in the zone get the 'benefit' of it for his entire innings.

In test cricket, that form could drop down to more ordinary levels the next day.

My 0.02 Rs!

• Abhi on June 18, 2010, 10:35 GMT

I think your view that "Australia v. ICC World XI, 14–17 October 2005, was not a Test match; similarly, these games are never included in my ODI stats" is totally incorrect. I think most of the big international guns failed, which is why these matches are not being accounted for. If one of them had scored big we would be hearing about it till the end of time... much like Sobers' big double... "Oh, and what about that fantastic hundred when Australia were at their peak..." etc. The biggest motivation for ANY international player should be taking part in such a World XI vs the "World champions"... and these matches, in my view, must be taken into account.

• Rajat on June 18, 2010, 9:48 GMT

Cont. from above (these two are a bit geeky)...

3) Using RPI vs average does not affect both halves of the CoV ratio equally. Variance will be increased, and averages decreased by using RPIs, which biases CoV upwards. A better approach would be to exclude not out innings. Not ideal, but less likely to create bias.

4) A really geeky one - I disagree with your use of population vs sample statistics. When we judge the quality of a batsman, we judge the kind of innings he may be expected to play on average. He is not restricted to one of the innings he has played in the past. Even over an entire career, there is no guarantee that his career average would match the average he would tend to, were he to play an infinite number of innings. For instance, couldn't Bradman have played one more innings to take his average above 100? His end of career average is thus only a sample estimate of his "true" average.

5) For results, surely we need to use the average CoV of all players on the team?

• Rajat on June 18, 2010, 9:21 GMT

Nice effort here. As I'm sure you're aware though, it is virtually impossible to do a "technically correct" stats analysis with real world data. In this context, a few issues listed below:

1) It is not correct to regress CoVs against averages, as the average is an input into CoV. I know that you are using RPIs instead of averages, but it is still too close for comfort. Higher RPIs will lead to lower CoVs by definition, which will lead to a significant relationship with averages. This low, scattered, but P<0.001 regression is very characteristic of regressions where the independent variable is a divisor of the dependent variable in some form.

2) Skew is a significant factor here (lower limit at 0, averages 40-60, significant scores above 100). Maybe you want to look at a skew metric? This might be your "match winning" stat, and would also possibly place people like Lara, Tendulkar, etc. higher.

Out of space. Cont. in the next post...

• Raman on June 17, 2010, 13:08 GMT

DG Bradman is outlier in the data :-)

• Xolile on June 17, 2010, 11:24 GMT

(...Continued)

This is of course a simplified example; but it does suggest that when you build a proper statistical model based on a large number of random variables, you should be able to prove that a team with six consistent specialist batsmen will be more successful in the long run.

A team featuring Sutcliffe, Hobbs, Kallis, Barrington, Walcott and Hussey may therefore be stronger, odd as it seems, than one featuring Sehwag, Hutton, Hammond, Lara, Tendulkar and Sobers.

• Xolile on June 17, 2010, 11:21 GMT

Another way of thinking about the value of consistency is to consider two teams with one specialist batsman each. BatsmanOne scores exactly 50 every time he bats. BatsmanTwo scores 100 runs 50% of the time and 0 runs 50% of the time. Both batsmen therefore have a career average of 50. In this example, when these teams face each other, both teams would win an equal number of matches.

However, if you introduce a third batsman, things start to get interesting. Assume BatsmanThree scores 200 runs 25% of the time, and 0 runs 75% of the time. He therefore also ends up with a career average of 50. Well, BatsmanThree would only win 25% of his matches against BatsmanOne. And when BatsmanTwo plays against BatsmanThree, BatsmanTwo would win 37.5% of the matches, BatsmanThree would win 25% of the matches, and 37.5% of the matches would end in a draw.

(Continued)
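The head-to-head percentages in this example can be checked by enumerating every pair of outcomes. The Python sketch below assumes the two batsmen's scores are independent and that the higher score wins the match, with level scores counted as a draw; the function name `head_to_head` is made up for illustration.

```python
from fractions import Fraction
from itertools import product

# Each batsman is a list of (score, probability) pairs; all three average 50.
batsman_one = [(50, Fraction(1))]
batsman_two = [(100, Fraction(1, 2)), (0, Fraction(1, 2))]
batsman_three = [(200, Fraction(1, 4)), (0, Fraction(3, 4))]


def head_to_head(a, b):
    """Return (P(a outscores b), P(b outscores a), P(scores level))."""
    a_wins = b_wins = level = Fraction(0)
    for (score_a, prob_a), (score_b, prob_b) in product(a, b):
        prob = prob_a * prob_b  # independence assumption
        if score_a > score_b:
            a_wins += prob
        elif score_b > score_a:
            b_wins += prob
        else:
            level += prob
    return a_wins, b_wins, level


one_v_two = head_to_head(batsman_one, batsman_two)
three_v_one = head_to_head(batsman_three, batsman_one)
two_v_three = head_to_head(batsman_two, batsman_three)

print("One v Two:  ", one_v_two)
print("Three v One:", three_v_one)
print("Two v Three:", two_v_three)
```

Under those assumptions the enumeration reproduces the figures quoted: an even split between BatsmanOne and BatsmanTwo, 25% wins for BatsmanThree against BatsmanOne, and 37.5% / 25% / 37.5% (win / loss / draw) for BatsmanTwo against BatsmanThree.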

• Vinish on June 17, 2010, 8:41 GMT

I find the two scenarios debatable when a batsman A scores 53, 32, 64, 41, 76, 23 and another B scores 121, 2, 24, 142, 36, 5. In either case, both the batsmen would have brought victories to their teams, or managed a draw on last day. As a fan of old-school cricketers such as Dravid and Kallis, I would favor batsman A, rather than a Sehwag, Gayle or Dilshan. The fact that the TEAM can depend on a batsman to STAY at crease is the most important factor in test cricket.

I cannot stop myself from appreciating your effort, and to give us insight in to such details.

• Paul Coffey on June 17, 2010, 7:35 GMT

Fascinating work Gabriel, well done. Debating the merits of various players is exactly the kind of discussion that comes up amongst cricket-loving friends, and it's always nice to have a handy set of stats to call upon :)

• Bhagyesh on June 17, 2010, 5:33 GMT

You can see lower CoV winning more matches in ODIs because of the format's nature: there are only 50 overs, and if someone can score 40 runs consistently it means that at least 15 PC of the team score is assured. I would prefer an ODI team with players with lower CoVs. Imagine having 5 batsmen who average min 30 RPI and lower CoVs: we would get at least 250 every innings. That in itself increases the winning chances. In Tests, though, I would mix it up, since lower CoVs mean you do not tend to make huge scores. So a good mix of that will do wonders.

• David Barry on June 17, 2010, 5:22 GMT

I think the large downward slope on the CV vs Average scatterplot for ODI's is mostly caused by middle-order batsmen. Players who bat in the middle order typically don't have the time to make big scores. So if they have a high RPI, then they must be getting decent scores often, ie, being more consistent. I'd be interested in seeing the scatterplot reproduced just for the top-order batsmen (and then just for middle-order batsmen).

• Sanchez on June 17, 2010, 2:26 GMT

You have in the ODI table someone who averages over 60 yet their COV is near zero. Who is that?

Great analysis.

• Ramesh on June 17, 2010, 1:08 GMT

It is not a useful batting metric if Sir Don does not top it :)

• Yawn on June 16, 2010, 23:14 GMT

Silly list doesn't mirror reality. Richardson more reliable than Sutcliffe? Really, I think we're doing statistical analyses that border on the Duckworth Lewis level of silliness. The reason the ones at the top appear reliable is because they are, but at their level only. For example, if a guy scores 30, 30, 30, 30, 30 he can be said to be reliable, but reliable at scoring 30s, which won't win you too many games. The more prolific players will have greater variations and will not be statistically looked upon as being very reliable.

• Tariq on June 16, 2010, 20:57 GMT

If I am the captain, I would like to have an "inconsistent" batsman with scores e.g 160,0,20,210,0,0,10,3,140,240,23,4,0,23,176 - in the top 5 batting positions, but for latter positions I would look for consistency like, 40,35,54,43,29,50,12,55,34,54,36. I think at top you should be able to score big, of course more often the better.

• keynotespeaker on June 16, 2010, 20:00 GMT

A cricketing reason for the stronger effect in ODIs is that a cricketer usually accelerates once he has got his eye in say after 20-30 runs. In ODIs this has a big additional benefit: it is a more efficient way of using the balls, thus leaving more room to post a higher score within the constraint of 300 balls. Tests do not have this constraint (plus the effect of declarations cutting off an innings). This means that a player that consistently scores high in ODIs, will have a stronger effect in enabling winning team scores than in Tests. Hope this makes sense?

• keynotespeaker on June 16, 2010, 19:50 GMT

I've often wondered about this consistency issue as well, thanks for the analysis! Regarding consistency being more valuable in ODI, I can think of both a mathematical reason as well as a cricketing reason. The mathematical reason may be right-censoring: high scores, say 150+, are very rare in ODI cricket, but much more common in Test cricket. As a consequence, a player will usually have a lower CoV in ODIs than in Tests. Now if we focus on the players with high averages (since they are the ones likely to win you matches), then for those players, the correlation between average and CoV will be stronger in ODI than in Tests. This means that the comparatively strong effect of CoV in ODIs is because it is partly picking up the effect of a high average, moreso than in Tests. (cricketing reason follows in the next post)

• Saikrishna Chavali on June 16, 2010, 18:14 GMT

Hi Gabriel, I am an Indian economics student (in an undergrad college in the US) and incredibly thrilled to see stats up-and-coming in cricket blogs. That's why I love many of the authors profiled to the right of this page, including your analysis! In any case, I wonder why you guys don't tighten up the writing slightly and publish this stuff!? Your analysis uses more than the simple OLS, has a large enough dataset, thorough and detailed stats and good results. This would be great for the cricketing and economic world to see some sports analysis other than boring, stat-dumped and cleaned baseball or any other American sport. Please take this as a serious comment.

• himanshu khanna on June 16, 2010, 17:29 GMT

For ODI's an important factor is the strike rate. If a consistent batsman makes 30 in 45 balls for 3 innings, vis a vis an inconsistent one making 30 in 15 balls, 15 in 10 balls and 45 in 30 balls the latter batsman is more valuable.

• HusseyFan on June 16, 2010, 16:56 GMT

Glad to see Bevan and Hussey getting deserved credit. Tendulkar's fan will somehow argue against it and try to show that Tendulkar tops them.

• kb on June 16, 2010, 15:27 GMT

Nice analysis! Wonder if there is a way you can overlay "lean patches" over this and analyze how consistent batsmen are with and without lean patches. Of course, the frequency and duration of those lean patches can be a separate analysis on its own. I would regard someone who goes through a lean patch and consistently scores few runs, and then hits a purple patch and scores many runs, as "consistent" for what is expected. And finally this is what I like to think about consistent vs non-consistent batsmen, say Dravid vs Sachin. I will fly over seven seas to see Sachin bat but won't do it for Dravid; but if my life depended on the outcome, I would ask Dravid, and not Sachin, to bat for me.

• Gabriel Rogers on June 16, 2010, 15:03 GMT

Ravindra~

There are links to fuller lists at the bottom of each table (though they're still limited to batsmen with >=2,000 runs, so I'm afraid they don't include tail-enders; if you do include everyone, then the least consistent batsman of all time is - for obvious reasons - Jason Gillespie).

The comments about Sutcliffe and Javed are pertinent; I've got some stuff coming up in my next-but-one blog that I hope you'll find interesting, on that score.

--G

• Abhi on June 16, 2010, 13:54 GMT

I also wonder whether "length of career" in terms of time (not innings) has an adverse effect on consistency... because of the ups and downs of form, injuries etc., which have a greater probability of affecting longer careers. I would bet that Tendulkar would have been higher up the consistency table till Dec 2002, before his physical breakdown...

• Abhi on June 16, 2010, 13:49 GMT

Tremendous.It will take a couple of years study to fully sink in...but the conclusions mirror what i used to think too.

• Ravindra Marathe on June 16, 2010, 13:12 GMT

A very good article Gabriel. Can you publish the full tables? I wonder where V Kambli, J Miandad and tailenders like C Martin and G McGrath feature in the list? Just a side note: Sutcliffe's career average never fell below 50 and no wonder he is in the top 6. Miandad too had a throughout-career average over 50 but I think it is his big early-career centuries that hurt him on your measure.

• KAPIL AGARWAL on June 16, 2010, 12:21 GMT

I just have to say one thing, which is that making runs in the 20s and 30s consistently and getting out does a lot of disservice to the team. Instead of consistency, match-winning efforts by a player need to be considered as a parameter of his greatness. I think consistency against top teams should also be given extra points.
