September 14, 2010

# Form is temporary ...

A statistical analysis of batsmen's form across their career

Having written a couple of blogs unpicking the value of innings-to-innings consistency among batsmen and bowlers, I'm now turning my attention to variability of performance over longer periods. In these analyses, I look at how players' careers are made up of spells of relative success and failure. In other words, what I'm interested in is the statistical basis of what we often call form. Once again, I'm going to start with batsmen and, for reasons of space, I've concentrated on Test cricket only.

The key statistical technique I have used to look at this issue is the simple moving average. That is to say, I have cut up each player's career into a series of overlapping blocks of the same length, and calculated his average for each block in turn. In my base case, the length of block I have chosen is 20 innings. This means that we start with the individual's average over his first 20 innings, then we look at innings 2–21, then innings 3–22, and so on. (There are good arguments for using a slightly more sophisticated kind of moving average; if you're interested in why I didn't, please see the Technical Appendix at the foot of this blog.)
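
For anyone who wants to try this at home, here is a minimal sketch of the overlapping-block calculation. It assumes each innings is stored as a `(runs, dismissed)` pair and that a block's average is runs divided by dismissals (the usual batting-average convention); the names `batting_average` and `moving_averages` are illustrative, not taken from the original analysis.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: one (runs, dismissed) pair per innings.
Innings = Tuple[int, bool]

def batting_average(block: List[Innings]) -> Optional[float]:
    """Runs per dismissal; None if the batsman was never out in the block."""
    runs = sum(r for r, _ in block)
    outs = sum(1 for _, dismissed in block if dismissed)
    return runs / outs if outs else None

def moving_averages(career: List[Innings], window: int = 20) -> List[Optional[float]]:
    """Average for innings 1..window, then 2..window+1, and so on."""
    return [batting_average(career[i:i + window])
            for i in range(len(career) - window + 1)]

# Toy five-innings career with a window of 3 (the blog's base case is 20).
toy = [(10, True), (50, False), (30, True), (0, True), (100, True)]
print(moving_averages(toy, window=3))
```

In the toy example, the first block (10, 50 not out, 30) yields 90 runs for two dismissals, so its average is 45.0; a career shorter than the window simply produces no blocks.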

Later, I'm going to do some number-crunching on the results of my analysis but, to begin with, I want to do something a bit simpler. I want to draw pictures of the results. By and large, I think that cricket statisticians tend to be pretty poor at finding helpful ways of visually presenting the scads of data we often turn out, and we could all do with giving more thought to information graphics. There are a couple of visualisations we routinely see on telly (especially in limited-overs cricket, in which the so-called "worm" and "Manhattan" are used with some frequency), but I'm convinced it would be useful to have an awful lot more tricks of this kind up our sleeves. [Note: I drafted this paragraph before Anantha published his most recent It Figures blog, which I was really pleased to see.]

I find it particularly remarkable that there is no common way of depicting individual players' career records over time (what a statistician would call a longitudinal approach). We all know that, to one degree or another, all players go through peaks and troughs of performance, and that the career stats with which they end up iron out the kinks in their record, through the magic of aggregates and averages. I think it would be great to have a way of thinking about – and looking at – the information that gets lost.

So, in this column, I am introducing my stab at plugging this gap. Because I'm a statistician, I call it the Longitudinal Career Graph (LCG for short); if I were a telly producer, I'd probably call it an iceberg plot, or something like that. An example is shown in Figure 1, depicting Sachin Tendulkar's Test batting career. There are two key features:

* Firstly, the player's moving average throughout his career is given in the shaded area. It is shown relative to his long-run career average, which is pegged to the central axis: whenever the black area is above the axis, the player averaged more over the previous 20 innings than he did over his whole career and, whenever the black area is below the axis, his average for the last 20 innings was worse than he achieved in the long run. The advantage of presenting the data in this way is that it allows us immediately to see a given player's hot and cold streaks in relation to his overall level of performance (which is important because, of course, the kind of figures that constitute a purple patch for one player might represent a dry spell for another).

* Secondly, the evolution of the player's career average over time is indicated by the red line (this is a straightforward depiction of what Statsguru calls the cumulative average). Because the final career average is the point of reference for the moving average plot, the red line will always end at the exact point around which the black area pivots.
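
The two series behind an LCG can be sketched roughly as follows. This is an illustration only: it assumes innings are stored as `(runs, dismissed)` pairs, and the function names `cumulative_averages` and `lcg_series` are made up for the example rather than taken from the original analysis.

```python
from itertools import accumulate

def cumulative_averages(career):
    """Career-to-date average after each innings (None before the first dismissal)."""
    runs = accumulate(r for r, _ in career)
    outs = accumulate(int(dismissed) for _, dismissed in career)
    return [r / o if o else None for r, o in zip(runs, outs)]

def lcg_series(career, window=20):
    """Return (shaded, red_line) for a Longitudinal Career Graph.

    shaded   -- moving average minus final career average, one value per block
    red_line -- cumulative average after each innings
    """
    red_line = cumulative_averages(career)
    career_avg = red_line[-1]
    if career_avg is None:
        raise ValueError("career average is undefined without a dismissal")
    shaded = []
    for i in range(len(career) - window + 1):
        block = career[i:i + window]
        outs = sum(1 for _, dismissed in block if dismissed)
        runs = sum(r for r, _ in block)
        # Positive values plot above the central axis, negative values below it.
        shaded.append(runs / outs - career_avg if outs else None)
    return shaded, red_line

# Toy career with a window of 3 (the blog's base case is 20).
toy = [(10, True), (50, False), (30, True), (0, True), (100, True)]
shaded, red_line = lcg_series(toy, window=3)
print(shaded)
print(red_line)
```

Because the shaded values are measured from the final career average, the last point of the cumulative series (47.5 in the toy example) is the level at which the central axis sits.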

(By the way, I'm not going to use them here, because I can't squish them into the 470 pixels Cricinfo give me to play with, but I've also developed a flashier version which gives more context about where and against whom runs were made – here's Tendulkar again, as an example.)

As you get used to reading these graphs, you'll come to recognise that Tendulkar's LCG shows a pretty constant level of achievement, without too much in the way of dramatic swings of form (that is to say, there's not a whole lot of black on his graph). Nevertheless, we can see relatively good and relatively bad streaks, perhaps most obviously over his last 50 or 60 knocks, with an apparent drop-off in form reaching a nadir at the turn of 2007, and then a distinct renaissance over the last two years (over his last 20 innings, he averages 78.22, with 7 hundreds, which isn't far behind his best-ever 20-knock streak of 81.17).

If you prefer a few more thrills on your rollercoaster ride, how about Mohammad Yousuf's Test career, shown in Figure 2? There's a lot more shaded area on his LCG, indicating that his career has been subject to more dramatic ups and downs. Most conspicuous of all is the amazing peak he reached at the end of 2006. In the 20 innings from the tail-end of 2005 to that point, he scored 2011 runs at an average of 105.84, reaching three figures in precisely half of those 20 knocks. There are troughs to go with the peaks, though, including one at the present moment (he averages 31.80, without a single century, in his last 20 Test innings).

So much for pretty pictures; what about some numbers? The question I address, here, is which cricketers' careers appear to have been more (or less) streaky. In order to quantify streakiness, I use a measure that is directly related to the area of black on each batsman's LCG – the greater the area, the streakier the player. [Technically, the measure is the root mean squared deviation of the moving average relative to the long-run career average, which is then scaled by the overall average, to provide CV(RMSD).] Table 1 gives a list of the most and least streaky batsmen in Test history, sorted according to this measure.
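
As a rough sketch of the measure described in the square brackets: take the squared gap between each moving-average value and the career average, average those, take the square root, and divide by the career average. The function name `cv_rmsd` and its inputs are assumptions made for the illustration.

```python
import math

def cv_rmsd(moving_avgs, career_avg):
    """Root mean squared deviation of the moving average around the
    career average, scaled by the career average."""
    squared = [(m - career_avg) ** 2 for m in moving_avgs]
    rmsd = math.sqrt(sum(squared) / len(squared))
    return rmsd / career_avg

# Toy example: a moving average that sits 10 either side of a career average of 50.
print(cv_rmsd([40, 60], 50))  # 0.2
```

Scaling by the career average is what makes the figure comparable across players: a 10-run swing means more to a batsman averaging 30 than to one averaging 60.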

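| # | Name | M | I | R | Ave | 20-Inns Min | 20-Inns Max | 20-Inns Rng | CV(RMSD) | p |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Gatting MW | 79 | 138 | 4,409 | 35.56 | 19.94 | 86.92 | 66.98 | 0.505 | 0.002 |
| 2 | Vengsarkar DB | 116 | 185 | 6,868 | 42.13 | 20.35 | 114.17 | 93.82 | 0.485 | 0.001 |
| 3 | Adams JC | 54 | 90 | 3,012 | 41.26 | 19.11 | 91.79 | 72.68 | 0.482 | 0.038 |
| 4 | Shoaib Mohammad | 45 | 68 | 2,705 | 44.34 | 27.26 | 86.69 | 59.42 | 0.432 | 0.020 |
| 5 | Hussey MEK | 52 | 90 | 3,981 | 51.04 | 22.21 | 91.71 | 69.50 | 0.422 | 0.007 |
| 6 | Flower A | 63 | 112 | 4,794 | 51.55 | 27.26 | 115.79 | 88.52 | 0.421 | 0.028 |
| 7 | de Silva PA | 93 | 159 | 6,361 | 42.98 | 18.20 | 103.40 | 85.20 | 0.406 | 0.008 |
| 8 | Fletcher KWR | 59 | 96 | 3,272 | 39.90 | 15.21 | 75.29 | 60.08 | 0.400 | 0.056 |
| 9 | Tillakaratne HP | 83 | 131 | 4,545 | 42.88 | 21.06 | 101.00 | 79.94 | 0.397 | 0.049 |
| 10 | Macartney CG | 35 | 55 | 2,131 | 41.78 | 15.84 | 73.00 | 57.16 | 0.396 | 0.008 |
| ... | | | | | | | | | | |
| 13 | Gambhir G | 32 | 57 | 2,800 | 52.83 | 32.32 | 91.17 | 58.85 | 0.392 | 0.004 |
| 14 | Chanderpaul S | 126 | 215 | 8,969 | 49.28 | 24.16 | 122.09 | 97.93 | 0.392 | 0.019 |
| 15 | Imran Khan | 88 | 126 | 3,807 | 37.69 | 19.17 | 82.50 | 63.33 | 0.385 | 0.043 |
| ... | | | | | | | | | | |
| 26 | Mohammad Yousuf | 90 | 156 | 7,530 | 52.29 | 26.70 | 105.84 | 79.14 | 0.347 | 0.025 |
| ... | | | | | | | | | | |
| 35 | Waugh SR | 168 | 260 | 10,927 | 51.06 | 21.74 | 104.69 | 82.96 | 0.331 | 0.178 |
| 36 | Sangakkara KC | 91 | 152 | 8,016 | 56.85 | 34.42 | 110.00 | 75.58 | 0.328 | 0.079 |
| ... | | | | | | | | | | |
| 39 | Sobers GS | 93 | 160 | 8,032 | 57.78 | 28.00 | 103.94 | 75.94 | 0.316 | 0.186 |
| ... | | | | | | | | | | |
| 41 | Hayden ML | 102 | 182 | 8,437 | 50.22 | 25.80 | 94.00 | 68.20 | 0.310 | 0.051 |
| ... | | | | | | | | | | |
| 43 | Kallis JH | 139 | 235 | 11,043 | 54.94 | 24.35 | 95.60 | 71.25 | 0.310 | 0.092 |
| ... | | | | | | | | | | |
| 46 | Ponting RT | 145 | 245 | 11,926 | 54.71 | 29.72 | 94.47 | 64.75 | 0.305 | 0.073 |
| ... | | | | | | | | | | |
| 68 | Dravid RS | 141 | 243 | 11,467 | 53.33 | 23.84 | 88.81 | 64.97 | 0.280 | 0.170 |
| ... | | | | | | | | | | |
| 79 | Richards IVA | 121 | 182 | 8,540 | 50.24 | 27.68 | 89.60 | 61.92 | 0.268 | 0.241 |
| ... | | | | | | | | | | |
| 94 | Gavaskar SM | 125 | 214 | 10,122 | 51.12 | 24.26 | 87.84 | 63.58 | 0.256 | 0.394 |
| ... | | | | | | | | | | |
| 129 | Lara BC | 130 | 230 | 11,912 | 53.18 | 28.45 | 83.89 | 55.44 | 0.240 | 0.700 |
| ... | | | | | | | | | | |
| 162 | Tendulkar SR | 169 | 276 | 13,837 | 56.02 | 28.95 | 81.18 | 52.23 | 0.216 | 0.838 |
| ... | | | | | | | | | | |
| 166 | Sehwag V | 78 | 133 | 6,956 | 54.34 | 28.26 | 74.84 | 46.58 | 0.214 | 0.728 |
| ... | | | | | | | | | | |
| 217 | Bradman DG | 52 | 80 | 6,996 | 99.94 | 67.05 | 132.61 | 65.56 | 0.161 | 0.754 |
| ... | | | | | | | | | | |
| 226 | Hobbs JB | 60 | 102 | 5,410 | 56.95 | 39.71 | 73.22 | 33.52 | 0.152 | 0.686 |
| ... | | | | | | | | | | |
| 229 | Pietersen KP | 66 | 117 | 5,306 | 47.80 | 35.37 | 64.37 | 29.00 | 0.148 | 0.880 |
| ... | | | | | | | | | | |
| 246 | Greig AW | 58 | 93 | 3,599 | 40.44 | 31.20 | 56.00 | 24.80 | 0.126 | 0.883 |
| 247 | Imran Farhat | 39 | 75 | 2,327 | 31.88 | 26.55 | 42.28 | 15.73 | 0.125 | 0.826 |
| 248 | Cowper RM | 27 | 46 | 2,061 | 46.84 | 39.25 | 59.37 | 20.12 | 0.123 | 0.868 |
| 249 | Wessels KC | 40 | 71 | 2,788 | 41.00 | 29.89 | 51.25 | 21.36 | 0.123 | 0.925 |
| 250 | Richardson MH | 38 | 65 | 2,776 | 44.77 | 35.40 | 57.11 | 21.71 | 0.117 | 0.714 |
| 251 | Chauhan CPS | 40 | 68 | 2,084 | 31.58 | 23.89 | 38.10 | 14.21 | 0.112 | 0.850 |
| 252 | D'Oliveira BL | 44 | 70 | 2,484 | 40.06 | 31.50 | 49.47 | 17.97 | 0.104 | 0.968 |
| 253 | Cook AN | 60 | 108 | 4,364 | 42.78 | 32.00 | 53.24 | 21.24 | 0.103 | 0.993 |
| 254 | Bravo DJ | 37 | 68 | 2,175 | 32.46 | 26.85 | 39.68 | 12.83 | 0.100 | 0.897 |
| 255 | Rameez Raja | 57 | 94 | 2,833 | 31.83 | 26.37 | 38.95 | 12.58 | 0.099 | 0.972 |

qual. 2,000 runs; stats correct at 30-Aug-2010; full list available here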

Streakiest of the lot is Mike Gatting. His career consisted of three clear phases: to start with, he looked like he was going to fail to live up to the reputation he had gained in county cricket, with a moving average between 20 and 30 for his first fifty or so Test innings; then, he found his feet at Test level and, for the next fifty knocks, his moving average was over 40 (and, at its peak, rose to 86.92); that level of achievement couldn't last, however, and he sank back to 20–30 when he was recalled in the 1990s. The upshot of all this is that Gatting's career average of 35 is a terrible estimator of how he performed at any one time – he was either much better than that or much worse, depending on which phase you caught him in.

The best-ever 20-innings streaks are Bradman's, naturally (in fact, there are only nine batsmen who have achieved over 20 innings what Bradman managed to sustain over a whole career four times that length). Behind the Don, we find Shivnarine Chanderpaul, who, from the second innings of the Old Trafford Test of 2007 until the first innings in Napier the following year, averaged 122.09. That streak produces a dramatic peak in his LCG (Figure 4), one that is exaggerated by the notable dips in performance that are also evident – indeed, no-one's best and worst streaks encompass such a broad range as his.

Another remarkable case is that of Aravinda de Silva. There is a massive gap in average between his worst 20-knock streak (18.20) and his best (103.40), but what makes this gulf doubly notable is that the two streaks were almost directly consecutive (there was just one innings between them).

At the other end of the scale, the least streaky batsman in Test history was one of Gatting's opponents on the most infamous day of his career (and a fella who happens to be on the radio as I draft this), Rameez Raja. His LCG shows that he had almost no form-related deviations in his career. He averaged 33.37 over his first 20 Test innings, and scarcely deviated from that level at any stage in his career, ending with a long-run average of 31.83. In his best 20 innings, he averaged 38.95; in his worst 20, 26.37.

It's not a surprise that the ranks of the least streaky include several batsmen whom I previously identified as having consistent records on an innings-to-innings level. Mark Richardson is there, and it is further evidence of his consistency to see that his 20-innings moving average never dropped any lower than 35.40 (only 11 players have done better than that). Other players who feature in the most consistent 20 of both lists are Richardson's namesake, Peter, Alastair Cook (more about him in a minute), Ranatunga, Bravo, Rameez, Chauhan, Greig, and Stollmeyer. It stands to reason that the batsmen with least variability in their records would also be those whose average stayed pretty constant throughout.

The same isn't true at the other end of the list, however: the streakiest batters are not the same ones who appeared least consistent on an innings-to-innings level. To start with, this surprised me but, after a moment's thought, it makes perfect sense: if your performance in any given innings is unpredictable, then you're less likely to end up with extended phases of good and poor performance (and, if you were consistently poor, then you'd be dropped).

Unlike innings-to-innings consistency – which I showed to be weakly, but identifiably, correlated with both higher runscoring and likelihood of victory – there is absolutely no evidence of an association between streakiness (or the lack of it) and overall batting average or win-rate (r²=0.001, p=0.507 and r²<0.001, p=0.648, respectively). Some good players have up-and-down records; others are much more stable. There's no evidence of an overall advantage for either profile.

The analyses above are all well and good, but do they really help us to understand form? In order to answer that question, it is important to make a distinction between a run of good (or bad) form and a run of good (or bad) scores. Batsmen themselves sometimes make a very similar point, especially when it comes to streaks of low scores (how often did Michael Vaughan tell us he was in great nick; he just kept getting out?) It is central to this argument – and central to the science of statistics – that we should attempt to distinguish any real trend from the influence of chance. If you roll a pair of dice many times, you're bound to observe runs of high scores and runs of low ones, even though the probability of getting any particular result is the same every time you roll the dice and, in the long run, the overall average will be 7.

The way in which we tend to think of form in cricket is not like this at all, though: it is much more like imagining that there are series of rolls when the dice are weighted to make a high score more likely, and series of rolls when low ones are most probable. So how do we distinguish between the two models? The key to the answer is that, if you had a pair of non-constantly weighted dice, you would observe greater variation in your overall series of rolls than you would if there was nothing but plain old luck at play.

To apply this principle to cricket data, I used a statistical technique called bootstrapping. I took each batsman's career and put the innings in a random order, to create a new virtual career, but one in which the sequence of knocks is based purely on chance, with no fundamental underlying trends (i.e. no form). For each batsman, I generated 10,000 form-free careers of this type. Then I compared the amount of variability in the random careers with what we see in the batsman's real record. In particular, I worked out the proportion of simulations showing at least as much streakiness – i.e. at least as high a RMSD based on the 20-innings moving average – as the batsman's actual career. This gives us an estimate of the probability that a career as streaky as (or more streaky than) the batsman's real one would have arisen even if there was no underlying variation in form. A statistician would call this estimate an empirical one-tailed p-value.
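
The shuffling procedure above can be sketched along these lines. It is a minimal illustration under stated assumptions, not the code used for the article: innings are `(runs, dismissed)` pairs, blocks with no dismissal are skipped, and the names `streakiness` and `shuffle_p_value` are invented for the example. Since shuffling never changes the career average, comparing raw RMSD gives the same answer as comparing CV(RMSD).

```python
import math
import random

def batting_average(block):
    """Runs per dismissal; None if there were no dismissals in the block."""
    runs = sum(r for r, _ in block)
    outs = sum(1 for _, dismissed in block if dismissed)
    return runs / outs if outs else None

def streakiness(career, window=20):
    """RMSD of the moving average around the career average."""
    career_avg = batting_average(career)
    squared = []
    for i in range(len(career) - window + 1):
        avg = batting_average(career[i:i + window])
        if avg is not None:  # assumption: ignore blocks without a dismissal
            squared.append((avg - career_avg) ** 2)
    return math.sqrt(sum(squared) / len(squared))

def shuffle_p_value(career, window=20, n_sims=10_000, seed=0):
    """Share of randomly reordered careers at least as streaky as the real one."""
    rng = random.Random(seed)
    observed = streakiness(career, window)
    shuffled = list(career)
    hits = 0
    for _ in range(n_sims):
        rng.shuffle(shuffled)  # same innings, random order: a "form-free" career
        if streakiness(shuffled, window) >= observed:
            hits += 1
    return hits / n_sims

# Toy career: 30 ducks followed by 30 scores of 100 is as streaky as it gets.
toy = [(0, True)] * 30 + [(100, True)] * 30
print(shuffle_p_value(toy, window=5, n_sims=200))
```

A career whose innings are all identical would return 1.0, because every reordering is exactly as flat as the original; the toy career above should return a value at or very near zero.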

The p-value for each player is given in Table 1. It will be clear from the explanation above that small p-values (indicating a low likelihood that the player's career would have turned out at least as streaky as it did through chance variation alone) increase our confidence that there probably is evidence of form-related fluctuations in a player's career.

To give one obvious example: it seems extremely unlikely (p=0.007) that a career with the profile of Mike Hussey's would have developed unless there was some kind of variation in his underlying run-scoring capacity (i.e. form). His LCG (Figure 6) gives a fairly dramatic depiction of the deterioration (and subsequent slight resurgence) in his scoring.

A few other players have careers that show the opposite profile; for instance, chance seems like an unlikely explanation of the clear upward trend to Daniel Vettori's Test batting career (p=0.018). Others have careers that are too up-and-down (Yousuf, Chanderpaul, de Silva), or too dominated by one atypical peak (Gatting, Vengsarkar) to be likely to have occurred without some underlying variability in form.

However, it turns out that cases like these are the exception rather than the rule. In a substantial majority of cases, the careers batsmen end up with are perfectly consistent with the hypothesis that an individual's long-run average provides a reasonable estimator of his run-scoring ability throughout his time in the game. This suggests pretty strongly that a lot of what we think of as form is really just random variation – the streakiness of the evenly weighted dice. Cricket fans are not alone in this: it is very well established that human beings – and perhaps especially sports fans – have a pretty poor appreciation of the play of chance (a phenomenon known as the clustering illusion).

A case in point is Alastair Cook. A couple of weeks ago, gallons of newsprint were spilled describing his supposed slump in form. However, it turns out that his is one of the least form-inflected careers of all, as his LCG (Figure 7) shows. Even before his recent Oval revival, he had averaged 39.16 in his last 20 Test innings – hardly setting the world on fire, but hardly the record of a lost cause, either. In fact, his best-ever 20-innings run in Test cricket is 53.24, and his worst is 32.00 and, in the grand scheme of things, this is not very much variation at all. This much can be inferred from the fact that the streaks overlap: there are 11 innings that appear in both!

When I took Cook's innings and put them in a random order 10,000 times, a huge majority – 9,925 – of those virtual careers showed greater streakiness than we see in his actual career. If you could see the LCGs of the form-free careers, they would almost all have conspicuously more black on them than we see on Cook's real-world graph (in the most extreme, "Cook" averaged 20.55 in one 20-innings streak and 91.19 in another). And just about all of them contained at least one cold streak that looks much worse than his recent slump.

In fact, Cook is just an extreme example of a phenomenon that is very widely observed in this dataset. Brian Lara was in an extraordinary run of good form when he averaged 83.89 in 20 consecutive innings in 2004–05, right? But shuffle his scores around at random and just over three quarters of the careers you produce will contain a streak just as hot. There's a greater weight of evidence to mark out Rahul Dravid's slump of a couple of years ago as "real" but, still, put his innings in any old order and, about 15% of the time, you'll end up with a trough at least as deep. That's a degree of uncertainty that would be very unlikely to convince statisticians in any other field that we were looking at anything other than a blip.

In this respect, I hope that, as the pressure mounted on Cook, he adopted an attitude similar to that advised by Greg Chappell (as quoted by Aakash Chopra in this column): "When not in form you should look back at your career stats. More often than not you'd find that you scored runs in every fourth or fifth innings, and hence every innings of low score is actually taking you closer to the innings in which you'd score runs." This is, doubtless, excellent advice from a psychological perspective and it's almost excellent advice from a statistical perspective, too (although we should be careful of the gambler's fallacy – that is, assuming that streaks are liable to correct themselves by some sort of "law of averages"). What we can say is that many apparent slumps like Cook's recent one are, mathematically speaking, entirely consistent with simple random variation around a constant mean that is well estimated by the batsman's career average. Or, in other words, form is temporary, but class... well, even if it isn't permanent, it seldom fluctuates much.

Technical appendix

1. To start with, an acknowledgement. The approach set out in this blog is heavily influenced by (and, in some places, directly pinched from) Curve Ball, an excellent book on baseball stats by two academic statisticians. (It's aimed at people who are fascinated by baseball and mildly interested in numbers, but I've found it works just as well for those of us who'd put that the other way around.)

2. It may be noted that, although I've presented some p-values, I haven't, at any stage, used the dread words statistically significant. Conventionally, we talk about a finding being significant if its p-value is lower than some threshold. That threshold is very often 0.05 – equivalent to saying we'll accept a 1-in-20 chance of considering our finding significant when, in fact, it's just a fluke. I'm wary of this approach, for a couple of reasons: firstly, the threshold is always arbitrary, and always involves a trade-off between type I and type II errors (in other words, the more cautious you are about interpreting something as significant, the greater the chance that you'll falsely classify something as non-significant). Secondly, there's a problem, here, with multiple testing. There are 255 batsmen in the dataset, so we'd expect to end up with 12 or 13 with p-values less than 0.05 just by chance. You could correct for this, using Bonferroni methods or similar, but I took the view that that would be complicated to explain, probably unnecessarily conservative, and would put too much stress on my approximated p-values (it would require p to be accurate to five or six decimal places, and you'd need a lot more than 10,000 samples to establish that). For these reasons, I present my p-values without correction and without (much) comment.
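
The arithmetic behind the multiple-testing point is easy to check. The sketch below assumes a plain Bonferroni correction (dividing the threshold by the number of tests); the variable names are illustrative only.

```python
n_batsmen = 255
alpha = 0.05
n_sims = 10_000

# Number of batsmen expected to fall below alpha if nobody had any real form.
expected_by_chance = n_batsmen * alpha

# Plain Bonferroni: divide the threshold by the number of tests.
bonferroni_threshold = alpha / n_batsmen

# An empirical p-value from n_sims shuffles can only move in steps of this size.
p_value_step = 1 / n_sims

print(expected_by_chance)              # 12.75
print(round(bonferroni_threshold, 6))  # 0.000196
print(p_value_step)                    # 0.0001
```

With the corrected threshold sitting between one and two steps of the simulated p-values, 10,000 shuffles leave very little resolution for deciding which side of the line a batsman falls on.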

3. Whenever an analysis is dependent on a statistician's arbitrary choices, it is crucial to examine how much of an influence these decisions had on the results of the analysis. This is a process known as sensitivity analysis, because it analyses the extent to which the outputs of the process are sensitive to its underlying assumptions.

I did loads of these analyses. The most obvious place to start is with the size of the window over which the moving average was calculated. I looked at longer and shorter windows; here are the results for 10 innings and 30 innings. You'll see that neither list is terrifically different from the 20-innings analysis. It's interesting to see that there have been a few players who've managed 10-innings streaks with higher averages than Bradman's best; highest of all is Kumar Sangakkara's 2006–07 effort of 1,185 runs with 6 hundreds (5 of them 150+) at 197.50. No one other than Bradman has ever sustained an average of 100-plus for 30 innings, though.

Another obvious sensitivity analysis is to question the use of the simple moving average at all. The measure has some disadvantages, the most notable amongst which is that it can appear to be driven not by what's happening at a particular moment in time, but by what happened 20 innings before (take another look at Tendulkar's LCG: that sudden drop-off towards halfway through 2005 comes about because it's the point at which his 241* at the SCG in 2004 is more than 20 innings ago and, thus, falls out of the calculated moving average). An alternative approach that minimises this problem is the exponentially weighted moving average, in which innings are never completely discarded; they just receive ever-decreasing weight as they recede into the past. I chose not to use this method, in my base case, because it answers a slightly different question – something like: taking into account everything we know about a player's career to date, and placing more importance on his most recent outings, what kind of form was he in at any given instant? This is a valid question that might have its uses (perhaps if you were trying to predict how well you expect the player to do in his next innings – although it doesn't answer that question very well). However, it's not quite what I'm interested in, here, which is capturing how well a batsman did over a given phase (and, in that context, I think it's entirely appropriate that the measure should be influenced by notable scores falling out of the window of interest).

Nevertheless, to investigate how much difference the alternative approach makes, I redid all the analyses detailed above using EWMAs instead of the simple moving average. The weighting coefficient I used was 0.066967, which may sound like a weird number, but it's the one that dictates that the weight applied halves every ten innings (so ten innings ago is worth 50% as much, 20 innings ago 25%, and so on). The results table is here. By and large, there is very little difference between these results and those calculated according to the simple moving average. Maybe this mode of analysis gives very slightly more prominence to players who have a distinct trend to their careers (either worsening – a la Adams and Hussey – or improving – like Vettori and Imran). On the whole, though, I can't tell much difference between them.
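
For the curious, this is where the weird number comes from. The sketch below derives the coefficient from the ten-innings half-life and then, as a labelled assumption, applies it to a batting average by decaying runs and dismissals separately; the article does not say how not-outs were handled in the EWMA, so `ewma_batting_average` is one plausible reading rather than the method actually used.

```python
def ewma_coefficient(half_life):
    """Weight alpha such that (1 - alpha) ** half_life == 0.5."""
    return 1 - 0.5 ** (1 / half_life)

def relative_weight(age, alpha):
    """Weight of an innings played `age` innings ago, relative to the latest one."""
    return (1 - alpha) ** age

def ewma_batting_average(career, alpha):
    """Assumed scheme: exponentially decayed runs divided by decayed dismissals."""
    weighted_runs = weighted_outs = 0.0
    series = []
    for runs, dismissed in career:
        weighted_runs = (1 - alpha) * weighted_runs + runs
        weighted_outs = (1 - alpha) * weighted_outs + (1 if dismissed else 0)
        series.append(weighted_runs / weighted_outs if weighted_outs else None)
    return series

alpha = ewma_coefficient(10)
print(round(alpha, 6))             # 0.066967
print(relative_weight(10, alpha))  # close to 0.5
print(relative_weight(20, alpha))  # close to 0.25
```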

4. If any statsheads read my methods and inferred (correctly) that I used bootstrap sampling without replacement, and thought that I really should have used a with-replacement approach, it's a fair cop. I just thought it'd be much easier to explain the process as shuffling the deck rather than sampling from a theoretical distribution approximated by the empirical dataset. I did some sensitivity analysis to show that it doesn't make a huge amount of difference, in this case, but I accept that with-replacement is theoretically the better approach (plus, of course, it allows you to do amusing things like estimate confidence intervals for the batting average) (another time).
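
As a taste of the with-replacement approach, here is a minimal sketch of a percentile bootstrap interval for a batting average. It is an illustration under assumptions (innings as `(runs, dismissed)` pairs, resamples with no dismissals discarded, a simple percentile interval), and `bootstrap_average_ci` is a made-up name, not something from the original analysis.

```python
import random

def bootstrap_average_ci(career, n_sims=10_000, level=0.95, seed=0):
    """Percentile bootstrap interval for runs per dismissal."""
    rng = random.Random(seed)
    averages = []
    for _ in range(n_sims):
        # Draw a career of the same length, with replacement.
        sample = rng.choices(career, k=len(career))
        outs = sum(1 for _, dismissed in sample if dismissed)
        if outs:  # assumption: drop resamples in which the batsman is never out
            averages.append(sum(r for r, _ in sample) / outs)
    averages.sort()
    lower = averages[int((1 - level) / 2 * len(averages))]
    upper = averages[int((1 + level) / 2 * len(averages)) - 1]
    return lower, upper

# Toy career mixing low scores, a not-out and one big hundred.
toy = [(0, True), (12, True), (45, False), (8, True), (150, True), (33, True)] * 5
print(bootstrap_average_ci(toy, n_sims=2_000))
```

Unlike shuffling, each resampled career can contain some innings several times and others not at all, which is why its average varies from one resample to the next.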

• Chris Howard on August 26, 2011, 13:13 GMT

Nice work, Gabriel. Now here's a challenge for you. Devise a method to rate a batsman's worth that takes into account not only their plain runs versus innings, but their longevity (i.e. weighting in favour of more innings), not outs (i.e. down rating "not outs" which favour lower order batsmen), their ability to turn starts into 50s, their ability to turn 50s into 100s. I had a crack at it a few months ago and got a top 20 of Bradman, GA Headley, Tendulkar, H Sutcliffe, CL Walcott, Kallis, ED Weekes, Sobers, RG Pollock, Ponting, WR Hammond, IJL Trott, VG Kambli, Sangakkara, KF Barrington, Gavaskar, Hayden and GS Chappell and Lara tied 20. But I'm sure you could do better. We need some way of rating batsmen that takes more factors into account to get a truer picture of who really were the great batsmen.

• Abhi on September 24, 2010, 16:25 GMT

Thank you for the graph. Though, as you say, the reasons for variations are many- the dramatic difference between the '90s and 2000s conforms to my theories.

• Gabriel Rogers on September 24, 2010, 9:58 GMT

Thanks, Dave; of course, you (and Russ) are absolutely right. (As an aside, it's really bloody hard trying to write this stuff in a way that satisfies the occasional statistician who surfs past while keeping it relatively accessible for everyone else. I rather feared what I wrote about Hussey's p-value would get someone twitchy.) If we take alpha=0.05, that gives us 23 batsmen with significantly streaky records, compared to an expected count of 12.75. To take a simple global hypothesis test, we can plug those numbers into a Poisson distribution to estimate the probability of observing at least that many events given our expectation, and we get p=0.006. From this, I imagine we'd conclude that the chances of producing that many streaky-looking batsmen under a global null hypothesis of no-streakiness-for-anyone-ever are pretty slim. So I think I'm pretty happy to say that, at some point in the history of Test cricket, some true streakiness has occurred.

Even so, I think the evidence suggests that that streakiness has been limited, and I wholeheartedly sign up to your summary that "there is much less non-chance streakiness than one would antecedently have expected". I certainly anticipated that many more players would have barn-door evidence of streakiness, and that careers which can comfortably be explained by random variation would be the exception rather than the (overwhelming) rule.
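
The global check described in this comment can be reproduced with a few lines. This sketch assumes a Poisson model for the count of batsmen below the 0.05 threshold, with the expectation of 12.75 quoted above; `poisson_upper_tail` is an illustrative name.

```python
import math

def poisson_upper_tail(k, mean):
    """P(X >= k) for X ~ Poisson(mean), via the complement of the lower tail."""
    lower = sum(math.exp(-mean) * mean ** i / math.factorial(i) for i in range(k))
    return 1 - lower

# 23 batsmen observed with p < 0.05, against 255 * 0.05 = 12.75 expected by chance.
print(round(poisson_upper_tail(23, 12.75), 3))  # 0.006
```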

• Dave on September 24, 2010, 3:47 GMT

Excellent article, but the problem about multiple testing needs much more attention. You can't really say "it seems extremely unlikely (p=0.007) that a career with the profile of Mike Hussey's would have developed [by chance]", when in fact we should expect about 1 in 140 players (so about two in your sample) to have such a career just by chance. What's really relevant (as Russ in effect notes) is that four batsmen in your sample rather than two have a career as streaky as that or streakier. That's suggestive of some non-chance level streakiness, but even so it's not obvious whether it is significant (at say the 0.05 level).

It would be interesting to see global significance figures of that sort, to see whether and to what extent that null hypothesis that there is no non-chance streakiness has been decisively refuted. Even if it has been, what I take from the figures overall is that there is much less non-chance streakiness than one would antecedently have expected.

• Abhi on September 18, 2010, 3:40 GMT

Would it be possible to come up with a similar graph depicting the moving averages over time for top order batsmen (say 1-6)?

• Gabriel Rogers on September 17, 2010, 17:40 GMT

Rajesh - Your point #1 is an interesting idea. I was thinking along similar-ish lines. When I do my ODI analysis, my plan is to plot the results against Test form for players over time; although it won't be a perfect relationship, you'd expect that hot and cold streaks that really are about technique and/or mentality would be reflected in both forms of the game simultaneously. On the other hand, apparent hot and cold streaks that really are about random variation should occur... well... randomly, with no relationship between the two. Let me do that, and then I'll think about your point #2.

Abhi - The overall record you're interested in looks something like this. (This is a 5-year moving average.) From it we can infer that runs were a lot harder to come by before WW1. Other than that, I'm always wary of overinterpreting such data; as I see it, there are at least four possible reasons why we might see variation (let's say, for the sake of argument, an era in which runscoring appears higher): (1) the prevailing conditions in some way favoured batters; (2) the batters were better; (3) the bowlers were worse; (4) random variation. All too often, any such trends get uncritically interpreted as (1) only. One could do some work to establish the likelihood of (4); I'm afraid that's never got to the top of my to-do list.

Anonymous - be careful - the batting analysis showed that there's no real hierarchy, here (some good batsmen have streaky careers; others have laminar ones); I'll bet you wouldn't have been able to predict who'd be high and who'd be low on that list. Just because you put things in an ordered list, it doesn't mean that those at the top are "good", and those at the bottom "bad". That said, bowlers may be different. I'm just beginning to run the analyses now.

Mick - please see my response to Deepak, above.

Andrew - Yes, there is plenty of scope for raising an eyebrow at those confident conclusions from tiny subdivided samples (what baseball statisticians refer to as situational effects). I have to say that I'm pretty sceptical about the meaningfulness of "statistical significance", myself, though!

Jonathan - I think it's interesting to see the development of a player's career record as time goes on, and it helps us to see how spells of relative success and failure translate into an overall record (for instance, I think it's interesting that Mike Hussey's record, over the past 20-ish innings, has pretty much enabled him to maintain his overall average, and I think it's interesting to watch Dravid's excellent 2003-07 run resulting in an average touching 60, and the subsequent deterioration thereafter).

For anyone asking for ODI and/or bowling analyses - they're on the way.

• Jonathan on September 17, 2010, 5:26 GMT

Russ, I'm not sure exactly what you're suggesting. If you simply replace Gabriel's 20 innings clumps with 1000 ball clumps, then most of the time a single very large innings should have more of an impact on the RMSD, not less.

Gabriel, it's good to see more thought about how these things are visualised, but why do you include the to-date average? It, and other cumulative rate graphs like the run rate worm have always seemed to me to have fairly limited use.

• Andrew on September 17, 2010, 0:26 GMT

Great job - there must be so many urban legends to dismantle, but form is probably the biggest. I look forward to the term "not statistically significant" entering the commentary box when they spout statistics.

Couple of requests: - the obvious bowling consistency analogy - best and worst travellers (home and away) incl the non-significance of "struggles against Country X" when they play 4 matches against 9 opponents

• saurabh somani on September 16, 2010, 18:17 GMT

Fantastic work Gabriel. Can't believe it has received so few comments. Will need to read the article again in minute detail to get its full benefit.

• Arsalan Khan on September 16, 2010, 13:25 GMT

One word: WOW

• Chris Howard on August 26, 2011, 13:13 GMT

Nice work, Gabriel. Now here's a challenge for you. Devise a method to rate a batsman's worth that takes into account not only their plain runs versus innings, but their longevity (i.e. weighting in favour of more innings), not outs (i.e. down-rating "not outs", which favour lower-order batsmen), their ability to turn starts into 50s, and their ability to turn 50s into 100s. I had a crack at it a few months ago and got a top 20 of Bradman, GA Headley, Tendulkar, H Sutcliffe, CL Walcott, Kallis, ED Weekes, Sobers, RG Pollock, Ponting, WR Hammond, IJL Trott, VG Kambli, Sangakkara, KF Barrington, Gavaskar, Hayden, and GS Chappell and Lara tied 20. But I'm sure you could do better. We need some way of rating batsmen that takes more factors into account to get a truer picture of who really were the great batsmen.

• Abhi on September 24, 2010, 16:25 GMT

Thank you for the graph. Though, as you say, the reasons for variations are many, the dramatic difference between the '90s and 2000s conforms to my theories.

• Gabriel Rogers on September 24, 2010, 9:58 GMT

Thanks, Dave; of course, you (and Russ) are absolutely right. (As an aside, it's really bloody hard trying to write this stuff in a way that satisfies the occasional statistician who surfs past while keeping it relatively accessible for everyone else. I rather feared what I wrote about Hussey's p-value would get someone twitchy.) If we take alpha=0.05, that gives us 23 batsmen with significantly streaky records, compared to an expected count of 12.75. To take a simple global hypothesis test, we can plug those numbers into a Poisson distribution to estimate the probability of observing at least that many events given our expectation, and we get p=0.006. From this, I imagine we'd conclude that the chances of producing that many streaky-looking batsmen under a global null hypothesis of no-streakiness-for-anyone-ever are pretty slim. So I think I'm pretty happy to say that, at some point in the history of Test cricket, some true streakiness has occurred.

Even so, I think the evidence suggests that that streakiness has been limited, and I wholeheartedly sign up to your summary that "there is much less non-chance streakiness than one would antecedently have expected". I certainly anticipated that many more players would have barn-door evidence of streakiness, and that careers which can comfortably be explained by random variation would be the exception rather than the (overwhelming) rule.
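
For anyone who wants to reproduce the global test described in the reply above, here is a minimal Python sketch. It takes the two numbers quoted there (23 batsmen observed, 12.75 expected) and computes the Poisson probability of seeing at least that many. The function name `poisson_upper_tail` is made up for illustration, and this is only one plausible way of doing the sum, not necessarily how the original figure was produced.

```python
import math


def poisson_upper_tail(observed: int, expected: float) -> float:
    """Return P(X >= observed) for X ~ Poisson(expected)."""
    # Add up P(X = 0) ... P(X = observed - 1), then take the complement.
    lower_tail = sum(
        math.exp(-expected) * expected**k / math.factorial(k)
        for k in range(observed)
    )
    return 1.0 - lower_tail


# Numbers quoted in the comment: 23 observed against an expectation of 12.75.
p_value = poisson_upper_tail(23, 12.75)
print(round(p_value, 3))  # prints 0.006
```

The result agrees with the p=0.006 quoted in the reply.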

• Dave on September 24, 2010, 3:47 GMT

Excellent article, but the problem about multiple testing needs much more attention. You can't really say "it seems extremely unlikely (p=0.007) that a career with the profile of Mike Hussey's would have developed [by chance]", when in fact we should expect about 1 in 140 players (so about two in your sample) to have such a career just by chance. What's really relevant (as Russ in effect notes) is that four batsmen in your sample rather than two have a career as streaky as that or streakier. That's suggestive of some non-chance level streakiness, but even so it's not obvious whether it is significant (at say the 0.05 level).

It would be interesting to see global significance figures of that sort, to see whether and to what extent the null hypothesis that there is no non-chance streakiness has been decisively refuted. Even if it has been, what I take from the figures overall is that there is much less non-chance streakiness than one would antecedently have expected.
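
The "four rather than two" comparison in this comment can be put into rough numbers. The sketch below assumes, purely for illustration, that the chance expectation is exactly two batsmen (the comment only says "about two") and that the count of such batsmen is approximately Poisson. The name `prob_at_least` is hypothetical.

```python
import math


def prob_at_least(count: int, expected: float) -> float:
    """Return P(X >= count) for X ~ Poisson(expected)."""
    # Complement of the probability of seeing fewer than `count` events.
    fewer = sum(
        math.exp(-expected) * expected**k / math.factorial(k)
        for k in range(count)
    )
    return 1.0 - fewer


# Assumed expectation of 2 (from "about two in your sample"); 4 observed.
print(round(prob_at_least(4, 2.0), 3))  # prints 0.143
```

Under those assumptions, four such careers would not be significant at the 0.05 level, which fits the caution expressed in the comment.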

• Abhi on September 18, 2010, 3:40 GMT

Would it be possible to come up with a similar graph depicting the moving averages over time for top order batsmen (say 1-6)?

• Gabriel Rogers on September 17, 2010, 17:40 GMT

Rajesh - Your point #1 is an interesting idea. I was thinking along similar-ish lines. When I do my ODI analysis, my plan is to plot the results against Test form for players over time; although it won't be a perfect relationship, you'd expect that hot and cold streaks that really are about technique and/or mentality would be reflected in both forms of the game simultaneously. On the other hand, apparent hot and cold streaks that really are about random variation should occur... well... randomly, with no relationship between the two. Let me do that, and then I'll think about your point #2.

Abhi - The overall record you're interested in looks something like this. (This is a 5-year moving average.) From it we can infer that runs were a lot harder to come by before WW1. Other than that, I'm always wary of overinterpreting such data; as I see it, there are at least four possible reasons why we might see variation (let's say, for the sake of argument, an era in which runscoring appears higher): (1) the prevailing conditions in some way favoured batters; (2) the batters were better; (3) the bowlers were worse; (4) random variation. All too often, any such trends get uncritically interpreted as (1) only. One could do some work to establish the likelihood of (4); I'm afraid that's never got to the top of my to-do list.

Anonymous - be careful - the batting analysis showed that there's no real hierarchy, here (some good batsmen have streaky careers; others have laminar ones); I'll bet you wouldn't have been able to predict who'd be high and who'd be low on that list. Just because you put things in an ordered list, it doesn't mean that those at the top are "good", and those at the bottom "bad". That said, bowlers may be different. I'm just beginning to run the analyses now.

Mick - please see my response to Deepak, above.

Andrew - Yes, there is plenty of scope for raising an eyebrow at those confident conclusions from tiny subdivided samples (what baseball statisticians refer to as situational effects). I have to say that I'm pretty sceptical about the meaningfulness of "statistical significance", myself, though!

Jonathan - I think it's interesting to see the development of a player's career record as time goes on, and it helps us to see how spells of relative success and failure translate into an overall record (for instance, I think it's interesting that Mike Hussey's record, over the past 20-ish innings, has pretty much enabled him to maintain his overall average, and I think it's interesting to watch Dravid's excellent 2003-07 run resulting in an average touching 60, and the subsequent deterioration thereafter).

For anyone asking for ODI and/or bowling analyses - they're on the way.

• Jonathan on September 17, 2010, 5:26 GMT

Russ, I'm not sure exactly what you're suggesting. If you simply replace Gabriel's 20 innings clumps with 1000 ball clumps, then most of the time a single very large innings should have more of an impact on the RMSD, not less.

Gabriel, it's good to see more thought about how these things are visualised, but why do you include the to-date average? It, and other cumulative rate graphs like the run rate worm have always seemed to me to have fairly limited use.

• Andrew on September 17, 2010, 0:26 GMT

Great job - there must be so many urban legends to dismantle, but form is probably the biggest. I look forward to the term "not statistically significant" entering the commentary box when they spout statistics.

Couple of requests:

- the obvious bowling consistency analogy
- best and worst travellers (home and away), incl. the non-significance of "struggles against Country X" when they play 4 matches against 9 opponents

• saurabh somani on September 16, 2010, 18:17 GMT

Fantastic work Gabriel. Can't believe it has received so few comments. Will need to read the article again in minute detail to get its full benefit.

• Arsalan Khan on September 16, 2010, 13:25 GMT

One word: WOW

• Nick on September 16, 2010, 10:40 GMT

Gabriel - fantastic article! I've long been frustrated at so-called 'cricket statistics' just identifying arbitrary outliers or data points instead of helping us parse random variation from a model of what we think is going on. The results don't surprise me, but to see them actually established empirically is very exciting.

• Raghav Kumar on September 16, 2010, 8:53 GMT

Great Article!

• Mick on September 16, 2010, 7:55 GMT

Super article. It would be good if we could see the graphs of a couple of other batsmen currently under pressure, namely, Pietersen, Ponting, Smith, Sarwan etc.

• Anonymous on September 16, 2010, 5:19 GMT

Brilliant analysis, and one request: we should always have the Don's chart in any analysis. Say what you may about his technique, his temperament, his team spirit etc., but he has been and will always be the "platinum standard" of batsmanship ... Similar analysis for bowlers would be welcome too! And I hazard a guess: we would have Messrs Barnes, Holding and Murali on top.

• Sanchez on September 16, 2010, 3:40 GMT

Greg Chappell had some really good advice. Another one was when he was asked what sort of form he was in when he went through a particularly bad patch:

"I don't know, I haven't been out in the middle long enough to find out"

Great analysis. Thanks Gabriel

• Navin Agarwal on September 16, 2010, 1:39 GMT

Dear Rogers, can we have the same table for ODIs also?

• Abhi on September 15, 2010, 15:52 GMT

1) I wonder whether you can obtain the most prolific periods for batsmen from these graphs. For example, we generally know that the mid-2000s was a run fest. Any other such periods particularly conducive to batting?

2) The reason I mentioned injuries/surgeries is that, if these are included, the "predictive" power of these stats is reduced. In Cook's case, since he hasn't had a performance-reducing injury as yet, these stats and their predictive power may apply.

• Ramesh Kumar on September 15, 2010, 13:56 GMT

Excellent analysis. It would probably require more time for us to assimilate the points. Two queries though:

1. If we do the ODI charts, will the charts be similar to signify the "form" factor? After all, these players play Tests and ODIs one after the other, at least in the last 20 years.

2. If we can measure highs and lows and get a model, can we extrapolate for earlier generation players who played fewer matches so that we can compare across eras?

• Ananth on September 15, 2010, 11:36 GMT

GR An excellent article. Last year I did a study of streaks (both high and low). However, I only presented the data in tabular form. Gabriel's graphical presentation is an excellent visual depiction of this key aspect of batting. Only a graph can show the stark variations between Cook/Rameez and Gatting/Hussey so well. Gabriel's day-to-day work as a medical statistics specialist has helped him do this very well. Hats off to him. One reader has not missed the opportunity to take an indirect dig at me by referring to "without introducing a bunch of subjective criteria to muddle the water". First, let me tell ChiSquare, there is no deliberate attempt to muddy the water. Second, what is done is the result of experience, discussions and suggestions. Third, unlike Gabriel, I am not a statistician. I am a cricket analyst. I understand p-value but lack the background to explain it as succinctly as Gabriel has done. I may not understand "clustering illusion" if it was delivered on a plate.

• Gabriel Rogers on September 15, 2010, 9:37 GMT

More than one commenter has noted that external factors - standard of opposition, state of fitness, etc. - play into the kinds of variation we see in batsmen's records. Of course, this is true, so "form", here, is really just a shorthand for variability-of-performance-for-whatever-reason. If there were such a thing as a measure of performance that was robustly reflective of these contextual factors (and, hence, "fairer" than the batting average), it would be fascinating to rerun my analyses using that measure. I have to say, though, that I'm not convinced such a thing exists (though David Barry and Charles Davis have made valuable strides in the right direction - maybe we should look at combining their approaches with mine).

Deepak - If you follow the link to the complete list at the foot of the table, the table it takes you to contains a link to each batsman's LCG.

Amit - I didn't do any graphs for shorter window-lengths, but there's a link to the numbers from a 10-inns analysis in note 3 of the Technical Appendix.

Saurabh - Looking at allrounders would be interesting (though I'm doing bowlers first) but, really, you need a single measure by which to rate them. Just taking a ratio of bowling:batting average might be okay, but I always feel that measure favours batters-who-bowl-a-bit over those whose talents are the other way around. I'll give it some thought, though.

Jayanta - Yes, the averages are over the period {x-19,...,x}.

Russ - you're absolutely right about the distribution of p-values, of course - there are more low-ish ones than you'd expect if no-one's career was subject to any kind of form. What I found notable was that there aren't more, leading me to the conclusion that there can only be a comparatively low number of batsmen (though we can't be positive which ones) whose careers show true supra-periodic variation. Your idea about average calculated over balls faced is a really excellent one, though any results will be frustratingly incomplete due to the substantial gaps in the balls-faced record. I'll see what I can produce, though.

Bala - Sorry, I used a cut-off of 2,000 Test runs; by my calculations, Chris Martin will be due to reach that number in his 1,258th Test match!

• Arjun on September 15, 2010, 6:43 GMT

Hi,

I had done similar work myself, using career runs for 10 consecutive innings instead of 'average'. Zaheer Abbas was the most inconsistent and Ranatunga the most consistent batsman (qualification: 5000 Test runs).

Arjun.

• Jayanta on September 15, 2010, 4:39 GMT

This is the best statistical analysis I have seen on Cricinfo, with figures that truly represent the cricketers' form etc., and sound statistical justifications. Well done, Rogers. I especially like the shaded area concept, which reminds me of integration, and the fact that you have used a moving average to do it. One small point: shouldn't the shaded areas also fade progressively towards the end, as they did at the beginning? Or are you taking the moving average of {x-19,...,x} to find the value at x? Traditionally, the moving average at x is depicted by taking values {x-10,...,x,...,x+10}, if I follow the time series approach.
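
The two conventions contrasted in this comment can be sketched side by side in Python. This is an illustration under stated assumptions rather than the article's actual code: an innings is represented as a hypothetical `(runs, was_out)` pair, the batting average of a block is taken as runs divided by dismissals, and the function names are invented for the example.

```python
def batting_average(block):
    """Runs divided by dismissals for a block of (runs, was_out) pairs."""
    runs = sum(r for r, _ in block)
    dismissals = sum(1 for _, was_out in block if was_out)
    # A block with no dismissals has no defined average.
    return runs / dismissals if dismissals else float("nan")


def trailing_average(innings, window=20):
    """Value at innings x uses innings {x-window+1, ..., x}."""
    return [
        batting_average(innings[i - window + 1 : i + 1])
        for i in range(window - 1, len(innings))
    ]


def centred_average(innings, half_width=10):
    """Value at innings x uses innings {x-half_width, ..., x+half_width}."""
    return [
        batting_average(innings[i - half_width : i + half_width + 1])
        for i in range(half_width, len(innings) - half_width)
    ]


# Tiny made-up career, with short windows so the arithmetic is easy to check.
career = [(10, True), (20, True), (30, False), (40, True)]
print(trailing_average(career, window=2))     # prints [15.0, 50.0, 70.0]
print(centred_average(career, half_width=1))  # prints [30.0, 45.0]
```

The trailing form yields a value for the latest innings, while the centred form has to stop `half_width` innings short of the end of the career, which bears on the question about how the shading behaves at the end of a graph.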

• bornalibran on September 15, 2010, 4:32 GMT

During SRT's lean phase close to the end of the 03 Aussie tour, I felt he was out of form but he hit a couple of double centuries in Australia/BD which shot his average up. That makes me think that there are some people for whom the moving standard deviation in average could also be a good indication of their form. Worth a try with your analysis.

• chirag on September 15, 2010, 4:16 GMT

And one thought cricket was simply a matter of putting bat to ball! Fantastic analysis! So is Rameez the Bevan of Tests?

• Amit on September 15, 2010, 3:12 GMT

Nice analysis. One request I have is to analyse this over a different span than 20 innings. Typically very few batsmen will have the luxury of failing over 20 innings (which is effectively 12 or 13 Tests). I think it might make sense either to reduce the span to 12 innings (about the tolerance level before getting dropped) or to make it a variable depending on how deep into his career a player is. A player can afford to have a bad batting spell now, but was under the gun when he had an awful start to his career and failed over his first 12 or so innings.

• Russ on September 15, 2010, 2:32 GMT

Gabriel, very nice work. I feel you could say a little more about the p-values, however. If streaks were purely a matter of luck, you would expect the distribution of p-values to be fairly flat across your sample. But that isn't the case: there are almost twice as many batsmen with p-values below 0.2 as you'd expect. That might be a sampling bias (players with long negative streaks get dropped before they can revert to the mean), but it is interesting.

As an aside, how does this analysis change if instead of clumping batsmen by innings, you did so by balls faced? That is, by calculating their average for every 1000 balls faced. And thus largely removing the problem of very large innings distorting the short-term average.

• Saurabh on September 14, 2010, 18:50 GMT

Very interesting.

I would like to see a similar approach for the all-rounders. Someone like Kallis would have consistent performance, but for others like Imran the last phase greatly boosts overall performance (he shows up in this list at 15).

Would be interested if you could do that.

• Deepak on September 14, 2010, 16:13 GMT

It'd be very nice if you could put up the graphs for more batsmen, it'd be really interesting to see.

• Chi Square on September 14, 2010, 15:45 GMT

This is what I call proper statistical analysis. Thanks Gabriel, it was very illuminating to see proper valid statistical analysis in these pages without introducing a bunch of subjective criteria to muddle the water.

• sarath on September 14, 2010, 14:24 GMT

Surprised to see no comments yet. But an amazing, amazing article. My respect for statisticians has increased a hundredfold. I can well imagine the effort you must have put into it. Highly insightful as the article is, it is even more amazing from the statistical point of view. Concepts like bootstrapping were super fun. Avatar pales in comparison (in complexity: I read the article 5 times, Avatar twice). Keep up the good work, it is highly appreciated.

• Abhi on September 14, 2010, 14:04 GMT

And I've been going on and on with Ananth about how we require some sort of "chronological" analysis in cricket (and how "overall" stats often (not always) distort reality)... Apparently the graphs for the same are called “LCG”!

Very useful stats, most especially for injury-free players/careers.

Unfortunately, the stats break down when dealing with a player struggling with injuries/recent surgeries etc., especially if that part is included in calculating the p-value.

For example, the Nadal of a little while back was clearly struggling due to injuries and “form” was not the factor. And Tendulkar's mid-2000s low was undoubtedly due to several injuries/surgeries.

• Bala Kritikeshan on September 14, 2010, 13:30 GMT

You forgot to mention Chris Martin's fabulous streak.

• Shyam on September 14, 2010, 12:58 GMT

One of the best articles on cricinfo that I have read...a lot of interesting statistics and wonderful analysis

• Navin Agarwal on September 14, 2010, 12:23 GMT

Great article, Rogers. I think this is the first article of yours that I have read. Talking about Tendulkar's best streaks: statistically you are right, but the first streak of 2000-2002 included 6 innings against Zimbabwe in which he scored 576 runs. In the second streak he scored 264 runs against Bangladesh in 3 innings. This suggests that the second streak, though statistically weaker by 3 runs per innings, was much better considering the opposition and age factor. This point takes nothing away from your article, though. Just superb. Keep it up, mate.
