June 16, 2010

Achieving the right consistency - I

Gabriel Rogers

Mark Richardson wasn't the most attractive batsman, but with him you knew, more than with any other player, what you were going to get © Getty Images

My first few analyses for It Figures are all going to be about the same broad theme: consistency. I'll bet that, at some time or other, everyone reading this post has criticised a cricketer for being inconsistent. I've done it myself but, whenever I have, I've had a nagging doubt: is performing brilliantly in one match and terribly in the next really any worse (or better) than being moderately good in two games on the trot? Maybe some stats can help us to unpick this issue.

I'm going to start by looking at batsmen. More specifically, my focus, in this first post, is batsmen's innings-to-innings consistency. If Batsman A has scores of 0, 138, 11, 0, & 101, and Batsman B has scores of 52, 50, 45, 48, & 55, then they both have the same average (50.00). However, there's a very obvious difference in the ways they've reached that mark, and it's one we won't appreciate if we concentrate on the average alone.

There are two big questions here, for me: (i) is it possible and instructive to identify batsmen with more or less consistent careers, and to quantify how much variability their records show? and (ii) does it matter? Is there any way in which a run of scores like Batsman A's is demonstrably better or worse - for himself and/or his team - than that of Batsman B?

Mister Hugely Reliable

S Rajesh comes close to answering the first of my questions in this It Figures avant la lettre column from 2006. He proposed a consistency index derived by dividing a batsman's average by the standard deviation (SD) of the runs scored in each of his innings. I think he's on exactly the right lines here, but the index can be improved in two ways. Firstly, I'm twitchy about combining one measure (the batting average) that makes an adjustment for not-out innings with another (the SD of the same dataset) that does not. For this reason, I'd rather rely on simple runs-per-innings (RPI) in this context. This way, both halves of the ratio are quantifying the same thing and, although both may be affected by not-out innings, they are both affected equally. The second modification I have made is to turn the ratio upside-down, so we have SD divided by RPI. Mathematically, this makes no difference to the ranking of results other than reversing it: low numbers, rather than high ones, now indicate greater consistency.

The advantage of doing these two things is that the number you end up with has a solid interpretation: it expresses the typical deviation of a batsman's scores from his mean score as a proportion of that mean (so a CoV of 0.80 means his scores typically stray from the mean by about 80% of its value). Dividing the SD by the mean is a trick statisticians use quite often; they call the result the coefficient of variation (CoV). As Rajesh pointed out, it's important to perform this scaling, rather than concentrating on SDs on their own; otherwise the batsmen who score most runs will always appear to have more variability in their records. A batsman with scores of 5, 30, and 100 has the same CoV as one with scores of 10, 60, and 200, though they have very different SDs.
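The two measures and the ratio described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's actual calculation: the function names are hypothetical, the scores are the invented Batsman A and Batsman B records from the introduction, and the SD is taken as a population SD (the choice explained in Technical note #1).

```python
import math
import statistics
from typing import Sequence


def batting_average(scores: Sequence[int], not_out: Sequence[bool]) -> float:
    """Runs per dismissal: a not-out adds runs but no dismissal."""
    dismissals = sum(1 for n in not_out if not n)
    if dismissals == 0:
        raise ValueError("average is undefined when the batsman was never out")
    return sum(scores) / dismissals


def runs_per_innings(scores: Sequence[int]) -> float:
    """RPI: total runs divided by every innings played, out or not."""
    return sum(scores) / len(scores)


def coefficient_of_variation(scores: Sequence[int]) -> float:
    """CoV = population SD of innings scores / RPI (lower = more consistent)."""
    return statistics.pstdev(scores) / runs_per_innings(scores)


batsman_a = [0, 138, 11, 0, 101]
batsman_b = [52, 50, 45, 48, 55]
print(runs_per_innings(batsman_a), runs_per_innings(batsman_b))  # 50.0 50.0
print(round(coefficient_of_variation(batsman_a), 3))  # 1.162
print(round(coefficient_of_variation(batsman_b), 3))  # 0.068

# If Batsman A's 138 had been not out, his average would rise but his RPI would not.
a_not_out = [False, True, False, False, False]
print(batting_average(batsman_a, a_not_out), runs_per_innings(batsman_a))  # 62.5 50.0

# Doubling every score doubles the SD but leaves the CoV unchanged.
low, high = [5, 30, 100], [10, 60, 200]
print(statistics.pstdev(high) / statistics.pstdev(low))
print(math.isclose(coefficient_of_variation(low), coefficient_of_variation(high)))  # True
```

On these invented figures, the identical 50.00 average hides a CoV roughly seventeen times larger for Batsman A than for Batsman B.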

So much for the theory; what about the results? Table 1 shows the batsmen who have been most and least consistent on an innings-to-innings basis throughout Test history, with a few notable figures picked out from the middle of the table.

Top of the lot is Kiwi opener Mark Richardson. He may not have set the world alight compared to some of his dashing contemporaries, but his solidity as an opening batsman can easily be overlooked: he reached double figures in 80% of his Test innings (a very high proportion, as noted in another Numbers Game a few years ago), and only ever registered one duck. What stopped him from threatening the real top rank of the game was that, though he'd seldom get out cheaply, he was also pretty unlikely to score very heavily, as a total of four centuries from 65 innings and a top score of 145 attest. These characteristics are perfect for a low CoV, because they imply that a large majority of his innings fell in a relatively tight band in the middle of the range of possible scores. Cricket will always find a way of surprising you but, to a greater extent than with any other batsman, you knew what you were going to get from Richardson.

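Table 1: Test batsmen sorted according to consistency (coefficient of variation) in scores
| Rank | Batsman | M | Inns | Runs | Ave | RPI | SD | CoV |
|---|---|---|---|---|---|---|---|---|
| 1 | MH Richardson | 38 | 65 | 2,776 | 44.77 | 42.71 | 35.16 | 0.823 |
| 2 | H Sutcliffe | 54 | 84 | 4,555 | 60.73 | 54.23 | 45.24 | 0.834 |
| 3 | TL Goddard | 41 | 78 | 2,516 | 34.47 | 32.26 | 27.11 | 0.840 |
| 4 | SM Katich | 49 | 85 | 3,792 | 48.00 | 44.61 | 38.72 | 0.868 |
| 5 | MS Dhoni | 43 | 66 | 2,428 | 42.60 | 36.79 | 32.36 | 0.880 |
| 6 | JB Hobbs | 60 | 102 | 5,410 | 56.95 | 53.04 | 46.68 | 0.880 |
| 7 | IR Redpath | 66 | 120 | 4,737 | 43.46 | 39.48 | 34.89 | 0.884 |
| 8 | JB Stollmeyer | 32 | 56 | 2,159 | 42.33 | 38.55 | 34.20 | 0.887 |
| 9 | PE Richardson | 34 | 56 | 2,061 | 37.47 | 36.80 | 32.86 | 0.893 |
| 10 | A Ranatunga | 93 | 155 | 5,105 | 35.70 | 32.94 | 29.44 | 0.894 |
| 32 | JH Kallis | 136 | 229 | 10,760 | 54.62 | 46.99 | 44.95 | 0.957 |
| 35 | AR Border | 156 | 265 | 11,174 | 50.56 | 42.17 | 40.49 | 0.960 |
| 47 | KP Pietersen | 62 | 111 | 5,166 | 49.20 | 46.54 | 45.54 | 0.978 |
| 56 | DG Bradman | 50 | 80 | 6,996 | 99.94 | 87.45 | 86.65 | 0.991 |
| 85 | RS Dravid | 138 | 238 | 11,372 | 54.15 | 47.78 | 48.91 | 1.024 |
| 97 | RT Ponting | 143 | 241 | 11,828 | 55.27 | 49.08 | 50.89 | 1.037 |
| 108 | SR Tendulkar | 166 | 271 | 13,447 | 55.57 | 49.62 | 51.92 | 1.046 |
| 113 | IVA Richards | 121 | 182 | 8,540 | 50.24 | 46.92 | 49.43 | 1.053 |
| 115 | SM Gavaskar | 124 | 214 | 10,122 | 51.12 | 47.30 | 49.96 | 1.056 |
| 116 | SR Waugh | 168 | 260 | 10,927 | 51.06 | 42.03 | 44.51 | 1.059 |
| 131 | GS Sobers | 93 | 160 | 8,032 | 57.78 | 50.20 | 54.02 | 1.076 |
| 226 | BC Lara | 130 | 230 | 11,912 | 53.18 | 51.79 | 62.43 | 1.205 |
| 238 | V Sehwag | 75 | 128 | 6,608 | 53.72 | 51.63 | 64.71 | 1.254 |
| 245 | DW Randall | 47 | 79 | 2,470 | 33.38 | 31.27 | 40.90 | 1.308 |
| 246 | Zaheer Abbas | 78 | 124 | 5,062 | 44.80 | 40.82 | 54.00 | 1.323 |
| 247 | SE Gregory | 58 | 100 | 2,282 | 24.54 | 22.82 | 30.33 | 1.329 |
| 248 | LG Rowe | 29 | 49 | 2,047 | 43.55 | 41.78 | 55.98 | 1.340 |
| 249 | GJ Whittall | 46 | 82 | 2,207 | 29.43 | 26.91 | 36.45 | 1.354 |
| 250 | DL Amiss | 50 | 88 | 3,612 | 46.31 | 41.05 | 55.74 | 1.358 |
| 251 | MS Atapattu | 90 | 156 | 5,502 | 39.02 | 35.27 | 49.93 | 1.416 |
| 252 | Mohammad Ashraful | 55 | 107 | 2,306 | 22.39 | 21.55 | 30.70 | 1.425 |
| 253 | Wasim Akram | 104 | 147 | 2,898 | 22.64 | 19.71 | 28.15 | 1.428 |
| 254 | MH Mankad | 44 | 72 | 2,109 | 31.48 | 29.29 | 46.06 | 1.572 |
qual. 2,000 Test runs; M = matches, Inns = innings, Ave = batting average, RPI = runs per innings, SD = standard deviation of innings scores, CoV = SD / RPI; complete list available here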

I was slightly surprised to see MS Dhoni riding high in this list. His reputation is for a more free-spirited kind of play than might be expected to generate a low CoV. But it turns out that any such assumptions do him a bit of a disservice: his Test record is that of a reliable runscorer, rather than a hit-or-miss gunslinger. Simon Katich's presence next to him is perhaps more in keeping with his reputation.

It is intriguing to see both Herbert Sutcliffe and Sir Jack Hobbs in the top half-dozen of this list. There could surely be no firmer foundation for a partnership as successful as theirs than the kind of shared dependability this statistic suggests. Had they both had more mercurial profiles, they might each have scored as many runs, but they would have been unlikely to share so many significant partnerships.

The fact that Jacques Kallis has fallen down the list somewhat compared to Rajesh's analysis is, to a small extent, a reflection of my slightly different methods, but it's more to do with the fact that his record has become a wee bit more inconsistent in the 4 years since Rajesh wrote his column.

According to this analysis, the least consistent batsman in Test history is Vinoo Mankad. His career has the opposite profile to Mark Richardson's: there is a very high proportion of low scores in his record (he only got into double figures 57% of the time) but, when he got in, he often went on to score big hundreds (including two doubles in one series against New Zealand in 1955/56). In contrast to Richardson's reliable-but-unspectacular record, Mankad's performances were an awful lot less predictable.

Wasim Akram's position at the bottom of the list is very largely ascribable to the effect of one mammoth score of 257* in the midst of a dataset that characteristically reflects a much more modest level of achievement (there's a good argument for calling this the most out-of-character innings in Test history, as discussed in a recent Ask Steven). If that one innings is excluded from his record, his CoV reverts to a much more run-of-the-mill 1.119.
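A one-innings sensitivity check of this kind is easy to sketch. The record below is invented for illustration (it is not Wasim Akram's actual list of scores), and the helper names are hypothetical. The point is only that a single huge score can dominate the SD, and hence the CoV, of an otherwise modest record.

```python
import statistics
from typing import Sequence


def coefficient_of_variation(scores: Sequence[int]) -> float:
    """Population SD of innings scores divided by runs per innings."""
    return statistics.pstdev(scores) / statistics.fmean(scores)


def cov_without_top_score(scores: Sequence[int]) -> float:
    """CoV recomputed after dropping the single highest score."""
    return coefficient_of_variation(sorted(scores)[:-1])


# Invented record: nine modest scores and one out-of-character 257.
scores = [12, 5, 30, 0, 18, 257, 9, 22, 3, 15]
print(f"with the 257:    {coefficient_of_variation(scores):.3f}")
print(f"without the 257: {cov_without_top_score(scores):.3f}")
```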

Marvan Atapattu's status is probably not surprising for a man who started his Test career with a famous string of failures, but ended up with 6 double-centuries under his belt.


The unanswered question I find most intriguing is whether, in the grand scheme of things, any of this matters. As cricket fans, we're quite used to berating inconsistent batsmen ("you never know what you're going to get: one day, he's brilliant; the next, he couldn't buy a run") but, then again, we may have a paradoxical tendency to look down our noses at those with the least variable records ("he's good at getting in, but he never goes on to register a matchwinning score"). Is either of these positions more justifiable than the other?

I've come up with two ways of answering this question. The first is to examine whether consistent batsmen, ultimately, score more runs than their more mercurial counterparts. It's all very well to invent hypothetical 50-averaging batsmen with consistent and inconsistent records, as I did in my introduction, but it may be that, in the real cricketing world, batsmen with one profile or the other are more likely to achieve a decent average.

To explore this, I used a statistical technique called regression (to be more precise: univariate ordinary least squares linear regression), which enables us to assess the relationship between two variables. The results are shown in Figure 1. Each batsman's CoV is plotted against his average, with the typical relationship between the two (the regression line) indicated by the red dotted line. You can see that, although there's an awful lot of scatter around the trend, the datapoints generally appear to line up with a slight downwards slope. This suggests that there is a weak but identifiable association between the two variables, with more consistent batsmen tending to average slightly more (for any statsheads: r² is a pretty dismal 0.065, but p<0.001 for the slope coefficient).

Fig 1 Association between consistency (coefficient of variation) and success (average) for Test batsmen © Gabriel Rogers

Clearly, there are plenty of examples that do not fit the general trend too well, but it appears that, on average, consistency is associated with higher runscoring. Actually, a more pronounced correlation would have been surprising, because we didn't see a very obvious hierarchy in the consistency list - no one is suggesting that Mark Richardson was, in any meaningful way, a better batsman than Brian Lara. Nevertheless, it does seem to be the case that consistency is, by and large, a positive thing for individual batsmen. This may seem like an obvious finding, but I don't think it's been demonstrated before.
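For anyone who wants to try the univariate regression described above, here is a minimal pure-Python sketch of ordinary least squares with r². It regresses CoV on average (the direction described in Technical note #3; r² is the same either way round). The input rows are a handful of entries from Table 1, chosen only to exercise the function, so the output will not match the figures quoted in the text. The p-value for the slope would additionally need a t-test with n - 2 degrees of freedom; SciPy's `scipy.stats.linregress` reports one, but that is not shown here.

```python
from typing import Sequence, Tuple


def ols_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Univariate ordinary least squares of y on x: returns (slope, intercept, r_squared)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # share of the variance in y explained by the line
    return slope, intercept, r_squared


# A handful of (average, CoV) rows from Table 1, purely to exercise the function;
# a non-random selection like this cannot reproduce the figures quoted in the text.
averages = [44.77, 60.73, 34.47, 99.94, 54.15, 53.18, 33.38, 22.64, 31.48]
covs = [0.823, 0.834, 0.840, 0.991, 1.024, 1.205, 1.308, 1.428, 1.572]
slope, intercept, r2 = ols_fit(averages, covs)
print(f"CoV = {intercept:.3f} + {slope:.4f} x average, r^2 = {r2:.3f}")
```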

My second way to assess the value of batting consistency was to see whether it has a positive effect for the team. So I looked to see if there's any correlation between each batsman's CoV and his record of winning matches. I did this in exactly the same way, plotting one variable against the other, and drawing a univariate regression line through the results. For Test match cricket, there was a very weak, but still detectable, association between CoV and percentage of matches won (r²=0.015; p=0.005); this vaguely suggests that, the more consistent a batsman is, the more likely he is to be on the winning side. It's a pretty unsatisfactory analysis, though, with an awful lot of noise around the hint of a signal. What I was more interested to find is that the correlation gets quite a bit stronger when, instead of winning record, you look at each batsman's not-losing record. The results of this analysis are shown in Figure 2. You can see a relatively shallow, but pretty obvious, upwards slope to the dataset, showing that, on average, the most consistent batsmen are also those who have lost the lowest proportion of the Test matches in which they have played.

Fig 2 Association between consistency (coefficient of variation) and losing record for Test batsmen © Gabriel Rogers

The fact that consistency is associated with not-losing more strongly than it is with winning suggests that consistent batsmen really come into their own when it comes to securing draws for their teams. (And, indeed, regressing CoV against draw-rate produces a strongly significant result [p=0.002].) So, if you've got a team packed with consistent batsmen, you might not win too many more games, but you might draw some that less consistent teams would lose. I'm not quite sure how to explain this finding in cricketing terms; if you've got any bright ideas, please feel free to comment!
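Winning, not-losing and drawing records are simply proportions of the matches each batsman played in. A minimal sketch, assuming a hypothetical one-letter code per match ('W', 'D', 'L'; anything else, such as a tie, counts as neither a win nor a loss, which is my assumption rather than the author's stated rule):

```python
from collections import Counter
from typing import Dict, Iterable


def result_percentages(results: Iterable[str]) -> Dict[str, float]:
    """Win, draw, loss and not-lost percentages from per-match result codes.

    Assumed codes: 'W' win, 'D' draw, 'L' loss; any other code (e.g. a tie)
    counts as neither a win nor a loss.
    """
    counts = Counter(results)
    played = sum(counts.values())
    if played == 0:
        raise ValueError("no matches")
    lost = 100 * counts["L"] / played
    return {
        "won": 100 * counts["W"] / played,
        "drawn": 100 * counts["D"] / played,
        "lost": lost,
        "not_lost": 100 - lost,
    }


# Hypothetical ten-Test career: four wins, four draws, two defeats.
print(result_percentages("WWDLDDWLDW"))
# {'won': 40.0, 'drawn': 40.0, 'lost': 20.0, 'not_lost': 80.0}
```

Pairing each qualifying batsman's "lost" figure with his CoV gives the kind of data plotted in Figure 2.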

Once again from the top in pyjamas

The remainder of this post repeats the above analysis for ODI cricket.

Table 2 lists the most and least consistent batsmen in ODIs. The list is topped by Australia's two great "finishers", Michaels Hussey and Bevan. We're used to seeing them high on lists of ODI stats, but it's worth remembering that - because CoV, as I have calculated it, relies on RPI rather than average - the high number of not-outs in each of their records has no direct influence on their excellent consistency ratings. Plenty of players have higher RPIs than these two; it's only once not-outs are factored in that their averages rise so high (although that doesn't necessarily mean the not-outs inflate their average, as is often assumed; Charles Davis has done good work on this). Accordingly, it is notable that consistency stats for these two players agree with their conventional records: I conclude that it was the dependability - as much as the volume - of their contributions that marked them out as matchwinners for their team.

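Table 2: ODI batsmen sorted according to consistency (coefficient of variation) in scores
| Rank | Batsman | M | Inns | Runs | Ave | RPI | SD | CoV |
|---|---|---|---|---|---|---|---|---|
| 1 | MG Bevan | 232 | 196 | 6,912 | 53.58 | 35.27 | 25.60 | 0.726 |
| 2 | MEK Hussey | 137 | 113 | 4,029 | 53.01 | 35.65 | 26.04 | 0.730 |
| 3 | RR Sarwan | 156 | 146 | 5,098 | 43.95 | 34.92 | 27.98 | 0.801 |
| 4 | AH Jones | 87 | 87 | 2,784 | 35.69 | 32.00 | 25.72 | 0.804 |
| 5 | NH Fairbrother | 75 | 71 | 2,092 | 39.47 | 29.46 | 23.81 | 0.808 |
| 6 | IR Bell | 78 | 76 | 2,483 | 35.47 | 32.67 | 26.45 | 0.810 |
| 7 | GP Thorpe | 82 | 77 | 2,380 | 37.19 | 30.91 | 25.04 | 0.810 |
| 8 | CG Greenidge | 128 | 127 | 5,134 | 45.04 | 40.43 | 33.01 | 0.817 |
| 9 | AJ Lamb | 122 | 118 | 4,010 | 39.31 | 33.98 | 27.87 | 0.820 |
| 10 | RG Twose | 87 | 81 | 2,717 | 38.81 | 33.54 | 27.54 | 0.821 |
| 11 | Javed Miandad | 233 | 218 | 7,381 | 41.70 | 33.86 | 27.81 | 0.821 |
| 12 | DM Jones | 164 | 161 | 6,068 | 44.62 | 37.69 | 31.03 | 0.823 |
| 17 | Zaheer Abbas | 62 | 60 | 2,572 | 47.63 | 42.87 | 35.58 | 0.830 |
| 20 | ML Hayden | 160 | 154 | 6,131 | 44.11 | 39.81 | 33.38 | 0.838 |
| 21 | GC Smith | 153 | 151 | 5,732 | 40.37 | 37.96 | 31.86 | 0.839 |
| 24 | S Chanderpaul | 261 | 245 | 8,648 | 41.78 | 35.30 | 29.96 | 0.849 |
| 28 | MS Dhoni | 159 | 141 | 5,246 | 50.44 | 37.21 | 31.79 | 0.855 |
| 29 | JH Kallis | 298 | 284 | 10,809 | 46.59 | 38.06 | 32.68 | 0.859 |
| 31 | Mohammad Yousuf | 275 | 261 | 9,458 | 42.80 | 36.24 | 31.25 | 0.862 |
| 32 | RS Dravid | 335 | 309 | 10,644 | 39.42 | 34.45 | 29.71 | 0.863 |
| 40 | KP Pietersen | 96 | 86 | 3,202 | 45.10 | 37.23 | 32.83 | 0.882 |
| 49 | RT Ponting | 341 | 332 | 12,623 | 42.79 | 38.02 | 34.11 | 0.897 |
| 55 | IVA Richards | 187 | 167 | 6,721 | 47.00 | 40.25 | 36.45 | 0.906 |
| 74 | BC Lara | 295 | 285 | 10,348 | 40.90 | 36.31 | 34.28 | 0.944 |
| 105 | SR Tendulkar | 442 | 431 | 17,598 | 45.12 | 40.83 | 40.14 | 0.983 |
| 147 | BB McCullum | 171 | 145 | 3,569 | 29.02 | 24.61 | 26.77 | 1.088 |
| 148 | DI Gower | 114 | 111 | 3,170 | 30.78 | 28.56 | 31.17 | 1.091 |
| 149 | Mohammad Ashraful | 157 | 151 | 3,298 | 23.90 | 21.84 | 24.09 | 1.103 |
| 150 | Kapil Dev | 225 | 198 | 3,783 | 23.79 | 19.11 | 21.10 | 1.105 |
| 151 | WW Hinds | 119 | 111 | 2,880 | 28.51 | 25.95 | 28.83 | 1.111 |
| 152 | RS Kaluwitharana | 189 | 181 | 3,711 | 22.22 | 20.50 | 22.79 | 1.112 |
| 153 | VVS Laxman | 86 | 83 | 2,338 | 30.76 | 28.17 | 31.66 | 1.124 |
| 154 | L Vincent | 102 | 99 | 2,413 | 27.11 | 24.37 | 28.05 | 1.151 |
| 155 | H Masakadza | 95 | 95 | 2,601 | 28.58 | 27.38 | 32.86 | 1.200 |
| 156 | KO Otieno | 89 | 87 | 2,016 | 23.44 | 23.17 | 28.16 | 1.215 |
qual. 2,000 ODI runs; M = matches, Inns = innings, Ave = batting average, RPI = runs per innings, SD = standard deviation of innings scores, CoV = SD / RPI; complete list available here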

It's surprising to see Ian Bell so high in the list (actually, it's kinda surprising to learn that Ian Bell has 2,000 ODI runs). His reputation may not suggest limited-overs strength but, if you look at his record, it's clear that he has very seldom failed completely in ODIs. Of course, there's a relative lack of dramatic successes, too, and we've already seen that these two characteristics tend to produce a low CoV.

Another unexpected finding is that Sachin Tendulkar's ODI record is not an especially consistent one. To an extent, this is just because his scores encompass such a wide range. Obviously, Tendulkar has a much higher proportion of big scores in his record than someone like Ian Bell. Less predictably, however, he also has a higher proportion of cheap dismissals, and it is the swing from one extreme to the other that produces a higher CoV. Tendulkar's customary position at the top of the ODI order may provide part of the explanation for both of these features: if he hits his stride in any given innings, he will have full opportunity to score plenty of runs, and this may not be true of those who bat lower in the order; on the other hand, he also has the challenge of facing the new ball with close catchers in place - something that may raise his probability of early dismissal. It is notable that there are relatively few opening batsmen amongst those with the lowest CoVs (although I checked, and this is not a systematic bias in the dataset: batting position is not, in itself, predictive of consistency).

In contrast, it is anything but a shock to see Tendulkar's teammate, VVS Laxman, near the foot of the table: has any batsman more dramatically personified sublime-to-the-ridiculous swings of achievement in the ODI era?

I don't know if you've noticed it, but I think there's quite a difference between the ODI consistency list and the Test match equivalent, in Table 1, above. It seems to me that - the odd diminutive ginger anomaly aside - there is some sort of hierarchy going on in the ODI list. Players in the top part of the table are, by and large, better than those who come lower down. So we might expect to see a more pronounced correlation between ODI CoV and batting average than was the case in the analogous Test analysis. Any such expectation would be right on the money, as Figure 3 shows. There is a fairly strong association between the two variables (r²=0.196; p<0.001), with a clear downward trend, suggesting that lower CoVs - indicating greater consistency - are associated with higher averages. I'm pretty sure that there is no mathematical reason why consistency should appear to be more valuable in ODI cricket than it is in the five-day game. If anyone can think of a cricketing reason, please do put it in the comments.

Fig 3 Association between consistency (coefficient of variation) and success (average) for ODI batsmen © Gabriel Rogers

And, if it's positive for individual batsmen to be more consistent in their ODI runscoring, you'd expect their teams to see some benefit. As Figure 4 shows, this would appear to be another reasonable inference: batsmen with lower CoVs are, on average, those who win most ODIs (r²=0.163; p<0.001).

Fig 4 Association between consistency (coefficient of variation) and winning record for ODI batsmen © Gabriel Rogers


So what if, playground style, you're offered first pick between two batsmen with different records? Given a choice between a consistent batsman and an inconsistent one who averages more, you'd be a fool not to go for the one with the higher average. However, if your choice was between two batsmen with similar averages but different CoVs, I'd go for the more consistent one every time, as a result of what I've learned in this analysis. My expectation would be that he'd help me win - or at least not lose - a greater proportion of my matches. What is more, if I was provided with no information at all about the batsmen's averages, but did know their CoVs, I'd favour whoever had the more consistent record, because it would be a reasonable - though far from infallible - guess that he'd also have the better average.

One corollary of this conclusion may be that "matchwinning" performers are a bit of a myth. In the future, I want to do some work on whether there is such a thing as a true matchwinner in cricket (what analysts in other sports sometimes refer to as clutch players) but, with this analysis as my starting point, my provisional view is that the kind of batsman who quietly gets on with contributing on a match-to-match basis may be of at least as much value as one who has an exceptional game once in a while.

I've got some related posts coming up about consistency among bowlers, and swings of form over longer periods.

All stats calculated Jun 10, 2010 (i.e. all Tests up to England v Bangladesh at Manchester, Jun 4-6, 2010 [Test # 1959] and all ODIs up to Zimbabwe v Sri Lanka at Harare, Jun 9, 2010 [ODI # 2990]).

Technical appendix

Anyone who isn't incredibly fascinated by statistical methods doesn't need to read this bit, but I like to give a precise account of what I've done, in case anyone cares.

Technical note #1. My SDs are calculated as population SDs (i.e. if we're going to be really geeky, I have not adopted Bessel's correction). The reason for this is that, in batsmen's complete careers, we're dealing with all the observations that are available to us. This is unusual, for a statistician: normally, we have a limited sample of observations from which we want to draw inferences about a wider population (ask 1,000 people how they intend to vote, and you can predict what the entire electorate is going to do; give 500 people a drug, and you can tell how effective it'll be for everyone... that sort of thing). Here, though, the data we have is all we're going to get, so it's appropriate not to use the tiny correction that's normally strictly necessary. If anyone wants to replicate my analyses in Excel or Access, you need to use the StDevP function, not the normal StDev. If anyone else read the foregoing, didn't understand a word of it, but wonders whether it makes a difference to my outputs, the answer is no: the effect is tiny, but I believe it's more correct, so that's what I've done.
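In Python's standard library, `statistics.pstdev` and `statistics.stdev` play the roles of StDevP and StDev. The sketch below (helper name hypothetical) shows that the sample SD exceeds the population SD by a factor of √(n/(n - 1)). That gap is large for a five-innings toy record but shrinks to about 1% or less at the innings counts in Table 1.

```python
import math
import statistics


def bessel_ratio(n: int) -> float:
    """How much larger the sample SD (divisor n - 1) is than the population SD (divisor n)."""
    return math.sqrt(n / (n - 1))


scores = [0, 138, 11, 0, 101]              # Batsman A from the introduction
population_sd = statistics.pstdev(scores)  # divisor n: the StDevP choice
sample_sd = statistics.stdev(scores)       # divisor n - 1: the StDev default
print(round(sample_sd / population_sd, 4), round(bessel_ratio(len(scores)), 4))

# Innings counts from Table 1 (Rowe 49, M. Richardson 65, Tendulkar 271).
for n in (49, 65, 271):
    print(f"{n} innings: sample SD is {100 * (bessel_ratio(n) - 1):.2f}% larger")
```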

Technical note #2. Any statisticians reading this analysis might have been slightly concerned that the regression I presented in Figure 1 is unduly influenced by the highest-averaging batsmen (a phenomenon statisticians refer to as leverage). I did some sensitivity analyses that established that this isn't the case (p remains <0.001 when the dataset is restricted to batsmen averaging <60 or <50).
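For a straight-line fit, the leverage of each point is h = 1/n + (x - x̄)²/Sxx, so batsmen whose averages sit far from the mean pull hardest on the slope. The sketch below is minimal and uses ten (average, CoV) rows from Table 1 purely as illustrative input. The 2p/n cut-off is a common rule of thumb, not something the text specifies. The final lines mimic the restrict-and-refit check described above.

```python
from typing import List, Sequence


def leverages(x: Sequence[float]) -> List[float]:
    """Diagonal of the hat matrix for a straight-line fit with an intercept."""
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return [1 / n + (xi - mean_x) ** 2 / sxx for xi in x]


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# (average, CoV) rows from Table 1, used only as illustrative input.
averages = [44.77, 60.73, 34.47, 48.00, 42.60, 99.94, 54.15, 53.18, 31.48, 22.64]
covs = [0.823, 0.834, 0.840, 0.868, 0.880, 0.991, 1.024, 1.205, 1.572, 1.428]

h = leverages(averages)
cutoff = 2 * 2 / len(averages)  # rule of thumb 2p/n, with p = 2 fitted parameters
for avg, lev in zip(averages, h):
    print(f"average {avg:6.2f}  leverage {lev:.3f}{'  <- high' if lev > cutoff else ''}")

# Sensitivity check in the spirit of the text: refit without batsmen averaging 60+.
kept = [(a, c) for a, c in zip(averages, covs) if a < 60]
print("slope, all rows:", round(ols_slope(averages, covs), 4))
print("slope, average < 60:", round(ols_slope([a for a, _ in kept], [c for _, c in kept]), 4))
```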

Technical note #3. Another concern statisticians might have with my regressions is that I've picked two covariates of consistency and analysed them separately (univariately). A more comprehensive model would be a multivariate one - that is, one that bundles everything up in the same analysis. So I did that. For Test cricket, I regressed CoV against average, losing percentage, and an interaction term. The only significant covariate was average (p<0.001). This suggests that the reason more consistent batsmen lose fewer Test matches is that they average more: there's no independent effect of consistency on not-losing. For ODIs, the multivariate results - regressing CoV against average, winning percentage, and an interaction term - are more interesting: all three covariates come up p<0.05 (and r² rises to 0.415). This suggests that more consistent batsmen are likely to win more games even if they don't average any higher, and the significant interaction term indicates that, the more games you win, the more consistency is of value in raising your average.
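A model with an interaction term adds the product of the two covariates as an extra column. The sketch below fits CoV = b0 + b1*average + b2*pct + b3*(average * pct) by solving the normal equations in plain Python. Here `pct` stands for losing percentage (Tests) or winning percentage (ODIs), and treating the interaction as this simple product is my assumption. The sketch returns coefficients only: the p-values quoted above also need standard errors, which a package such as statsmodels (formula `cov ~ avg * pct`) reports. The self-check uses synthetic, noise-free data, so the fit should recover the coefficients it was built from.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_with_interaction(avg: Sequence[float], pct: Sequence[float],
                         cov: Sequence[float]) -> List[float]:
    """OLS for CoV = b0 + b1*avg + b2*pct + b3*avg*pct; returns [b0, b1, b2, b3]."""
    rows = [[1.0, a, p, a * p] for a, p in zip(avg, pct)]
    k = len(rows[0])
    # Rescale each column to a maximum of 1 so the normal equations stay well conditioned.
    scale = [max(abs(r[j]) for r in rows) for j in range(k)]
    xs = [[r[j] / scale[j] for j in range(k)] for r in rows]
    xtx = [[sum(r[i] * r[j] for r in xs) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(xs, cov)) for i in range(k)]
    return [b / s for b, s in zip(solve(xtx, xty), scale)]


# Self-check on synthetic data built from known coefficients (no noise).
true_coefs = [1.5, -0.01, -0.005, 0.0001]
avg = [20.0, 30.0, 40.0, 50.0, 25.0, 35.0, 45.0, 55.0]
pct = [30.0, 60.0, 40.0, 20.0, 50.0, 25.0, 55.0, 35.0]
cov = [true_coefs[0] + true_coefs[1] * a + true_coefs[2] * p + true_coefs[3] * a * p
       for a, p in zip(avg, pct)]
print([round(b, 6) for b in fit_with_interaction(avg, pct, cov)])
```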

Technical note #4. The super eagle-eyed may have noticed that the scatterplots showing CoV -v- Average are rather bushier than those showing CoV -v- Results. This is because winning (or losing) percentages are constrained at both ends, and I found that the good number of players who have won or lost all of their games were skewing the results somewhat. Accordingly, all the CoV -v- Results analyses are limited to players with 40 or more innings. As a rule, I don't like doing this because, although tiny samples can produce weird results, their weirdness should balance out on either side of the average, so my preference is to use all the data that's going. In this instance, though, I found I got much more sensible results by adopting an artificial constraint.

Technical note #5. I take the view that Australia v. ICC World XI, 14–17 October 2005, was not a Test match; similarly, the ICC World XI's ODIs are never included in my ODI stats.



© ESPN Sports Media Ltd.

Posted by Gabriel Rogers on (June 24, 2010, 11:05 GMT)

Thanks all for some interesting comments. Here are a few quick responses to some that I think raise particularly important issues. More than one contributor has tumbled to what - I am sure - is the correct explanation for my finding that consistency appears to be more strongly correlated with success in ODI cricket than it is in Tests. The reason is that good players are more likely to have their innings interrupted (to finish not-out; this is what keynotespeaker refers to as "right-censoring"). This may also explain why Sachin Tendulkar's consistency rating was surprisingly low. If, on four successive occasions on which he opens an ODI innings, Tendulkar is playing well enough to score 20, 42, 60, and 158, he is likely to score... well... 20, 42, 60, and 158. Now, if Mike Hussey is also playing well enough to score 20, 42, 60, and 158, but is batting in his customary position lower down the order, he might well end up with scores of 20, 17*, 60, and 43*. Because his scores are artificially compressed into a much narrower range, he would have a lower CoV (even though each player had the capacity to score exactly the same number of runs, and even though the batting averages, for those two sequences, are identical). This is a slightly exaggerated example, but it shows why batting down the innings, in ODIs, might be expected to lead to lower CoVs.
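The arithmetic in that example can be checked directly. Here is a minimal sketch (helper names hypothetical) that treats 17* and 43* as not-outs. Both sequences average 70.00, but the truncated one has a CoV of about 0.50 against about 0.75 for the completed innings.

```python
import statistics
from typing import Sequence


def batting_average(scores: Sequence[int], not_out: Sequence[bool]) -> float:
    """Runs per dismissal."""
    return sum(scores) / sum(1 for n in not_out if not n)


def coefficient_of_variation(scores: Sequence[int]) -> float:
    """Population SD of scores divided by runs per innings."""
    return statistics.pstdev(scores) / statistics.fmean(scores)


opener = ([20, 42, 60, 158], [False, False, False, False])  # every innings completed
finisher = ([20, 17, 60, 43], [False, True, False, True])   # 17* and 43* cut short

for label, (scores, not_out) in (("opener", opener), ("finisher", finisher)):
    print(label, batting_average(scores, not_out), round(coefficient_of_variation(scores), 3))
```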

This is exactly what David Barry suggests. But it turns out that the story isn't all that conclusive. I did exactly what David asked, and split ODI records according to batting position. And, it's true, there is a slightly stronger correlation between CoV and average for batsmen at nos. 3-6, but it really isn't that much more pronounced than that seen in opening batsmen - see the graph here. Importantly, there's still a clear trend for more consistent opening batsmen to have higher averages so, although the effect amongst middle-order batsmen may have a bit more influence, it certainly isn't driving the finding exclusively.

Sanchez: The fella with an average over 60 and CoV near zero is Bermudan Chris Douglas, who scored 69 and 53 in his two ODIs.

Rajat: you raise some really interesting points, to which I can't do full justice, here. In brief, my views are that you're completely right about #2 (I do have nonparametric methods for doing this sort of analysis, but it's way beyond the scope of this kind of column to introduce them here). You're arguably right about #4 (though it'd be an interesting argument, and I'd still fight my corner). You're not necessarily right about #3 - whether variance is increased or decreased will depend on which particular innings have been interrupted; on the whole, it will tend towards decreasing. What you suggest in #5 might be an interesting alternative analysis, but it'd be computationally tiresome, and I don't see why it's necessarily any more correct than what I've done. Your really important criticism is #1. If you're right about that, my whole analysis is on very dodgy ground. I don't think you are right about it, though. Dividing SD by RPI, to calculate CoV, simply produces a dimensionless measure of variance. It would be absolutely inappropriate to regress RPI against SD, because those two measures are always going to be strongly correlated, even in the presence of constant levels of performance. But CoV is SD scaled to account for this relationship (effectively, to turn it into a percentage). So it isn't true to say that "higher RPIs will lead to lower CoVs by definition", because higher levels of scoring also produce higher SDs, with the net result that CoVs will only be related to RPIs (or averages) in the event that higher-scoring batsmen tend to be more consistent. As a simple demonstration of this, note that there are plenty of players with lower CoVs than Bradman's; he is so far ahead of the pack when it comes to runscoring expectation that, if high RPIs automatically translated into low CoVs, he'd be certain to top the list. In fact, he's below, say, Shaun Pollock and Dwayne Bravo.

Deepak: forgive me, but all you've demonstrated there is that, if you take a measure that doesn't fit your ideas, and multiply it by batting average, then you end up with a list that looks more like the batting average. Well, yeah, you do.

Singhe: I wasn't especially surprised by Cook's high ranking (not least because his stats are, in many ways, eerily similar to Mark Richardson's). I'm sure England fans will recognise him as a prime example of the hypothetical "Batsman B" in my introduction: his infamous habit of getting out for 60 reflects just this status.

Souvik: undoubtedly, yes, you could come up with better predictors of match outcome (though I'm afraid I don't really understand your suggestion). That wasn't the purpose of this analysis, though.

Chetan Makani: you get the spotter's badge! There was a small mistake in my query which ended up with M equalling matches in which the given batsman batted, and not matches in which he'd played. Hence the error with number of games for Bradman (and any other player who played in matches in which he didn't bat). The key numbers for the analysis are unaffected, though.

Posted by Engle on (June 21, 2010, 2:52 GMT)

Consistency is most important for the opening batsmen over all other batting positions.

Posted by Squishy on (June 21, 2010, 1:51 GMT)

I believe that in a past issue of It Figures, someone ran an analysis looking at what 'not-out' figures were likely to have been if the innings had been played to completion. It'd be interesting to use those figures (though not necessarily statistically correct), to replace the not-out RPI scores in your database, and see if the consistency achieved by middle-order batsmen in ODIs is altered, or if the consistency / average correlation is still seen.

Posted by James on (June 20, 2010, 22:11 GMT)

A bit late on this, but I'd love to see a similar study but with bowler economy rates in ODIs. I have a theory that the current Aussie attack is very inconsistent. Doug Bollinger and Mitchell Johnson seem to be brilliant one game (or 2 or 3 :), rubbish the next, Bollinger in particular. I'm not sure those are the kind of bowlers you want to build an attack around. Just an idea anyway...

Posted by Souvik on (June 20, 2010, 17:09 GMT)

Gabriel, Nice analysis. But I think r^2 =0.015 is indicative of a very low S/N. For test matches, perhaps RPI is not the best of measures to gauge a batsman's contribution to a "non-loss" ... for test cricket, we should go back to some old school measures ..... MPI ... minutes spent per innings and WLWC ... wickets lost while at the crease .... do this for batsmen who have batted no. 6 or higher in 40 or more completed second innings (declared or all out), so we ignore all rain affected matches. A cross plot between second innings MPI and WLWC should be a much better measure of batsmen who "save" test matches. My guess is there will be a negative correlation (less wickets for more minutes) for guys like Ken Barrington, Rahul Dravid, Steve Waugh, Jimmy "Padams", Chanderpaul, possibly Hazare, and some others. Take the names with the highest r^2 and plot the win loss ratio for only those games for which the correlation was performed. You ought to see a positive correlation.

Posted by Chetan Makani on (June 20, 2010, 0:39 GMT)

Nice work, although I have spotted an error in your table. It says that Bradman played in 50 matches, when in fact he played in 52. All the best.

Posted by Vipin on (June 20, 2010, 0:07 GMT)

Well... you could have as well given a link to any corporate finance book with risk return explained. problem is that cricket is not like finance. Anyone who knows his cricket will say that it is ok that if you get out before you get your eye in.. say 30-50 balls in test however it is sinful if you don't make a good use of times when you get in comfortably.

Another point, it is wrong that ODI list necessarily shows a more realistic pecking order. Issue is that test matches are won by bowlers while ODIs are won by batsmen hence consistent batsmen will show better slope

Posted by Singhe on (June 19, 2010, 23:13 GMT)

Great and technically sound analysis. I think consistency has to be measured over a long time which is the "best" simulation for a variety of conditions: I took the top ten with a minimum HUNDRED TEST innings and a straight analysis. They are: Hobbs, Redpath, Ranatunga, Barrington, M.Amarnath, AN Cook, Fredericks, Chanderpaul, SM Pollock, CH Lloyd. Couple of surprises:
- Cook and Pollock on the list
- Three (Fredericks, Chanderpaul and Lloyd) from little Guyana: could have been four with Butcher having 78 innings.

Posted by krishna darooka on (June 19, 2010, 15:07 GMT)

A great research and finding,I suggest this should be recognised in selecting a player and yearly award should be initiated for the best player.

How about doing similar analysis for the bowlers?

Posted by deepak on (June 19, 2010, 10:09 GMT)

contd The next 10 from my analysis proves my point... 21 IVA Richards 22 SM Gavaskar 23 SR Waugh 24 L Hutton 25 DCS Compton 26 GS Sobers 27 GS Chappell 28 ML Hayden 29 Javed Miandad 30 TT Samaraweera And People like Dhoni & Richardson drop off the list of the top players. Purely from a cricketing point of view would any captain have Richardson (rank 1 in your list) rather than Dravid (rank 85 in your list)?? Technically, this shows the dangers of blindly applying statistics without using clustering.

Comments have now been closed for this article