This post is an extremely belated follow-up to my earlier analysis of streakiness among batsmen. This time, the focus is on bowlers. I've used exactly the same methods as before – analysing and graphing moving averages (calculated over a 20-innings window, in my base case); for details, please see the batting form column.
As before, an example should help to clarify the approach and, because it's always helpful to use as much data as you can get your hands on, let's start with Muttiah Muralitharan. Murali's Longitudinal Career Graph (LCG) is shown in Figure 1. It shows, in the black shaded area, his 20-innings moving average (i.e. his bowling average for every consecutive 20 innings in which he bowled). The moving average is shown relative to the average with which he finished his career: whenever the black area is above the axis, he averaged more over the previous 20 innings than he did over his whole career (which, for a bowler, means he did worse) and, whenever the black area is below the axis, his average for the last 20 innings was lower, and therefore better, than he achieved in the long run. The innings-by-innings progress of his career average (what StatsGuru calls the cumulative average) is shown by the red line.
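To make the moving average concrete, here is a minimal Python sketch. It assumes each innings is stored as a (runs conceded, wickets) pair and that the windowed figure pools runs and wickets across the window (total runs divided by total wickets), which is the conventional definition of a bowling average. The function names and the toy numbers are illustrative only; they are not taken from the dataset behind the graphs.

```python
from typing import List, Optional, Tuple

Innings = Tuple[int, int]  # (runs conceded, wickets taken): an assumed data layout


def bowling_average(innings: List[Innings]) -> Optional[float]:
    """Runs per wicket over a set of innings; None if no wickets were taken."""
    runs = sum(r for r, _ in innings)
    wickets = sum(w for _, w in innings)
    return runs / wickets if wickets else None


def moving_average(innings: List[Innings], window: int = 20) -> List[Optional[float]]:
    """Bowling average for every run of `window` consecutive innings."""
    return [
        bowling_average(innings[end - window:end])
        for end in range(window, len(innings) + 1)
    ]


def relative_to_career(innings: List[Innings], window: int = 20) -> List[Optional[float]]:
    """Moving average minus the final career average (the height of the LCG area)."""
    career = bowling_average(innings)
    return [
        None if (m is None or career is None) else m - career
        for m in moving_average(innings, window)
    ]


# Toy career of four innings with a 2-innings window (made-up numbers).
toy = [(50, 2), (30, 3), (40, 1), (20, 4)]
print(moving_average(toy, window=2))      # [16.0, 17.5, 12.0]
print(relative_to_career(toy, window=2))  # [2.0, 3.5, -2.0]
```

Positive values correspond to the area above the axis (a spell costlier than the career figure) and negative values to the area below it.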
Looking at the red line, we can see that, from the beginning of 1996 until the end of 2008, Murali's career average showed a pretty steady improvement (it fell from 33.89 to a low-point of 21.26). If we were to concentrate on that career average alone, we'd probably be tempted to infer that Murali was getting better and better during this period. However, the LCG helps us to understand that this wasn't exactly the case: what actually happened was that he got quite a lot better fairly suddenly, then got a bit better again, and then maintained the level of achievement for a number of years while his long-run average slowly caught up (in lengthy careers, long-run averages will only ever catch up slowly, and they'll never catch up completely). At his best during this period, Murali achieved a 20-innings streak of 89 wickets at 15.13.
It should be clear that, the greater the black area on a bowler's LCG, the streakier his performance over his career. In the same way, a single streakiness statistic can be calculated that is directly related to the area of black on each bowler's LCG. [Technically, the measure is the root mean squared deviation of the moving average relative to the long-run career average, which is then scaled by the overall average, to provide CV(RMSD).] Table 1 gives a list of the most and least streaky bowlers in Test history, sorted according to this measure.
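For readers who prefer code to prose, the streakiness statistic might be computed along these lines. This is a sketch under stated assumptions rather than the exact routine behind Table 1: it assumes innings are (runs conceded, wickets) pairs, that "scaled by the overall average" means dividing by the career bowling average, and that any window without a wicket (where the average is undefined) is simply skipped.

```python
import math
from typing import List, Tuple

Innings = Tuple[int, int]  # (runs conceded, wickets taken): an assumed data layout


def cv_rmsd(innings: List[Innings], window: int = 20) -> float:
    """Root mean squared deviation of the moving average from the career
    average, divided by the career average."""
    career_runs = sum(r for r, _ in innings)
    career_wkts = sum(w for _, w in innings)
    if career_wkts == 0:
        raise ValueError("career average is undefined without a wicket")
    career_avg = career_runs / career_wkts

    squared_devs = []
    for end in range(window, len(innings) + 1):
        chunk = innings[end - window:end]
        wkts = sum(w for _, w in chunk)
        if wkts == 0:
            continue  # assumption: wicketless windows are skipped
        avg = sum(r for r, _ in chunk) / wkts
        squared_devs.append((avg - career_avg) ** 2)

    if not squared_devs:
        raise ValueError("no window with a defined average")
    rmsd = math.sqrt(sum(squared_devs) / len(squared_devs))
    return rmsd / career_avg


# Made-up four-innings career, 2-innings window.
toy = [(50, 2), (30, 3), (40, 1), (20, 4)]
print(round(cv_rmsd(toy, window=2), 4))  # 0.1856
```

Dividing by the career average is what lets bowlers with very different averages be compared on one scale.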
Wilfred Rhodes's position at the top of the list reflects the very different ways in which his skills were deployed during his career: in his first Test, he batted at no. 10 and opened the bowling; a decade later, he was routinely opening the batting and his bowling had become occasional. Little wonder, then, that his bowling average fluctuated enormously: he achieved a 20-innings average of 15.58 in 1900–05 whereas, in 1909–13, the same measure ballooned to 74.60 (note how many did-not-bowls there are in the latter sample, underlining Rhodes's change of role).
It's little surprise to see Andrew Flintoff high on the list of streaky bowlers. His LCG (Figure 2) gives a clear depiction of the well recognised tripartite nature of his career record. In the worst 20-innings period of his distinctly unimpressive first 50 or so innings, Flintoff managed just 21 wickets at an average of 60.05. Just a couple of years later, he achieved his best 20-inns streak, averaging 22.30 (although note that he only amassed 42 wickets – without a single five-fer – in that period).
Shift that whole profile down by the best part of ten runs, and you have something eerily similar: Imran Khan's career as a Test bowler (Figure 3). Again, you have the (relative) famine followed by the (relative) feast, with an unhappy coda where the body could no longer do justice to the ability. In Imran's case, the highlight was an amazing 79 wickets at 13.15 from 20 consecutive innings in 1982–83. His worst 20 innings were the last 20 in which he bowled but, as worst runs go, 31 wickets at 34.87 is very far from an embarrassment.
With Flintoff and Imran as our clue, we might notice that there are a fair few allrounders at the top of the streakiness league. Sobers (another whose bowling was pretty ordinary through his first 50 innings), Botham (whose "streakiness" was actually a fairly linear deterioration), Shaun Pollock (pretty constantly great for most of his career, but suffered an extended horrible streak at the end of it), and Jacques Kallis (up and down throughout) are all amongst the 30 most identifiably streaky, as are Trevor Bailey, Ravi Shastri, and Monty Noble. Conversely, it seems that those at the bottom of the list have a tendency to be pretty poor with the willow. In fact, there is a noisy but identifiable statistical correlation between a bowler's streakiness (CV[RMSD]) and his batting average (r²=0.102; p<0.001). The most likely explanation for this finding, it seems to me, is that bowlers who bat are more likely to endure prolonged streaks of poor form with the ball without getting dropped (it's fair to assume that Flintoff would have been given up as a lost cause long before his 50th innings if he had no capacity to contribute with the bat). As a result, specialist bowlers with extended poor streaks – be that true underperformance or just a run of bad statistical luck – are under-represented in this dataset. If everybody were allowed to play 100 Test matches regardless of how well they were doing, then I wouldn't expect allrounders' bowling figures to be any different.
When judged according to these methods, the least streaky bowler in Test history is Bill O'Reilly. His LCG (Figure 4) illustrates the serenely excellent progress of his career. In his worst 20 innings, he took 54 at 25.87; in his best 20 innings, he took 54 wickets at 19.87.
According to the maths, the least streaky bowler with at least 100 innings under his belt is Javagal Srinath but, for me, it's Dennis Lillee's numbers that really stand out, with a career record almost as smooth as his bowling action. As his LCG (Fig 5) shows, it was only really his last few Test matches that spoiled what was otherwise an incredibly consistent career. Aside from his last six innings, his 20-innings moving average never left the twenties.
Mitchell Johnson's low position in the streakiness table may be a surprise to some; after all, he doesn't have much of a reputation for steady results. However, it turns out that, over periods of 20 innings, any real or perceived fluctuations in his performance tend to even themselves out, and the range over which his moving average oscillates (23.22 to 37.56) is, in comparative terms, not so great. On average, across this dataset, a bowler's averages in his best and worst 20-innings streaks are 72% and 150% of his career average, respectively. If Johnson were entirely typical in this respect, his best and worst streaks would be 21.40 and 44.62, which confirms that his performance has been a bit less variable than average. You don't get a very different picture when you use shorter windows, either (I also looked at 10-innings averages and 5-innings averages; see Technical Appendix for a brief description and links to the results). I conclude that Johnson has a bit of an unfair reputation for wildly varying performances, though one thing these stats can't confirm or deny is whether he goes through phases of conspicuously looking terrible and brilliant.
Immediately following what may be the most famous 1-wicket match-haul in Test history (at Old Trafford in 1956), Tony Lock achieved something almost as exceptional as his partner's feat: during his next twenty innings, he became the only bowler amongst those analysed here to average less than 10.00 over a period of that length, taking 54 wickets in the process. He even managed to get dropped during this streak although, to be fair to the selectors, the man they preferred – Johnny Wardle – was in the middle of a sub-20 streak of his own. With Laker's 20-innings moving average dropping to 11.86 in the same period, it's fair to assume England's spin-bowling resources of that time are unlikely ever to find a statistical equal.
There are only ten bowlers in the history of Test cricket who never averaged over 30.00 in any 20 consecutive innings. Most of them belong to an era of pervasively lower averages, but there are two modern-day exceptions – Curtly Ambrose, whose highest 20-innings moving average was just 26.81, and Mohammad Asif, who never did worse than the 28.19 he averaged in his last 20 Test innings (it's probably appropriate to speak about Mohammad Asif in the past tense, now, right?).
As with batsmen, there is no association between streakiness (or the lack of it) and success, either in terms of average (r²=0.007; p=0.298) or win-rate (r²<0.001; p=0.993). Some bowlers achieved great figures with up-and-down performance; others were closer to their long-run average throughout; there's no evidence that one profile or another leads to more wins or better stats at the end of your career.
What does it all mean?
As ever, though, this story only really gets interesting when we question the patterns underlying the data. Statisticians like to make a distinction between descriptive statistics (those that simply present observed data) and inferential statistics (those that seek to make sense of it). In this case, what we need to do is account for the play of random variation in bowlers' careers. It is inevitable that chance alone will lead to variation in each player's figures, and we need to distinguish this from real swings of performance.
I investigated this in exactly the same way as with batsmen – shuffling each bowler's career into a purely random order 10,000 times, and seeing how often a career as streaky as the real one emerged (the technical term for this technique is a permutation test, or randomisation test; it is closely related to bootstrapping, which resamples with replacement rather than reordering). This way, we get to quantify how likely it is that their careers would have happened in a world where form didn't exist (this is the figure marked p in the table – technically, it is a one-tailed empirical p-value).
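The shuffling procedure can be sketched in a few lines of Python. Treat this as an illustration, not the code actually used for the table: the (runs conceded, wickets) layout, the function names, the fixed random seed, and the choice to skip wicketless windows are all assumptions, and the two careers at the bottom are invented to show the extremes.

```python
import math
import random
from typing import List, Tuple

Innings = Tuple[int, int]  # (runs conceded, wickets taken): an assumed data layout


def streakiness(innings: List[Innings], window: int) -> float:
    """CV(RMSD): RMS deviation of the moving average from the career
    average, divided by the career average."""
    career_avg = sum(r for r, _ in innings) / sum(w for _, w in innings)
    devs = []
    for end in range(window, len(innings) + 1):
        chunk = innings[end - window:end]
        wkts = sum(w for _, w in chunk)
        if wkts:  # assumption: wicketless windows are skipped
            devs.append((sum(r for r, _ in chunk) / wkts - career_avg) ** 2)
    return math.sqrt(sum(devs) / len(devs)) / career_avg if devs else 0.0


def shuffle_p_value(innings: List[Innings], window: int = 20,
                    n_shuffles: int = 10_000, seed: int = 0) -> float:
    """Share of randomly reordered careers at least as streaky as the real one."""
    rng = random.Random(seed)
    observed = streakiness(innings, window)
    shuffled = list(innings)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # same innings, random order
        if streakiness(shuffled, window) >= observed:
            hits += 1
    return hits / n_shuffles


# Invented extremes: a career with no variation at all, and one made of
# 20 poor innings followed by 20 excellent ones.
flat = [(30, 2)] * 30
two_phase = [(60, 1)] * 20 + [(20, 4)] * 20
print(shuffle_p_value(flat, window=10, n_shuffles=50))  # 1.0
print(shuffle_p_value(two_phase, window=10, n_shuffles=500))
```

Because shuffling leaves the career totals untouched, every virtual career has the same final average as the real one; only the order, and therefore the streakiness, changes.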
The results provide a very similar picture to that which I found when analysing form amongst batsmen. The key finding is that there are surprisingly few bowlers whose careers give a convincing picture of variation in form over and above that which would be expected by chance.
One example is Terry Alderman. There are only two possible explanations for his career showing as much variation as it did: (i) for one reason or another, his essential wicket-taking ability varied over the course of his career (i.e. he really did have runs of good and bad performance), or (ii) a statistical event with probability 0.0007 (1-in-1,429) has occurred. In this circumstance, we can probably conclude with some confidence that there's some non-random variation afoot and, indeed, looking at Alderman's LCG (Figure 6), it's hard to imagine that horrible 1984 and that dazzling 1989–1990 could have happened to the same bowler.
However, bowlers whose careers show such an identifiably streaky pattern are the exception rather than the rule. The relatively small number of very low p-values suggests that random variation around a career-long mean is very often a pretty plausible explanation of the peaks and troughs we tend to think of as form. Turning back to Mitchell Johnson, we can see that shuffling his career into a random order produces at least as much up-and-down as we've seen in his actual career about three-quarters of the time. Similarly, the prevailing wisdom is that a career like Steve Harmison's has been massively influenced by swings of form. However, when I took form out of the equation by putting his career in a random order, something that was – on the whole – every bit as streaky emerged nearly a quarter of the time (although less than a twentieth of the virtual careers featured a single streak as hot as Harmison's 50 wickets at 18.64 in 2003–2004). In any other field, a statistician faced with such numbers would be very unlikely to conclude that there was anything other than random variation at play.
However, one interesting finding is that Muttiah Muralitharan – although his streakiness stat (CV[RMSD]) is nothing out of the ordinary – has a pretty low p-value (much lower than those around him on the list). One reason for this is that, because his career is longer than most, it provides more data and, hence, more opportunity to distinguish signal from noise (a statistician would say that, when we look at Murali's career, we get a more powerful analysis, meaning it is less susceptible to Type II error). This raises the possibility that, if we had more data on other bowlers, we'd be able to detect streakiness in their careers more easily (in the same way that it's a lot easier to tell whether you've got an unevenly weighted coin by tossing it 200 times than it is when you toss it only 20).
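The coin analogy can be put into numbers. The sketch below uses an exact binomial calculation to estimate how often a biased coin would be caught by a two-sided test at the 5% level. The size of the bias (60% heads) and the 5% threshold are assumptions chosen purely for illustration, and the function names are my own.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k heads in n tosses."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_sided_tail(n: int, c: int, p: float) -> float:
    """P(|heads - n/2| >= c) for a coin landing heads with probability p."""
    return sum(binom_pmf(k, n, p) for k in range(n + 1) if abs(k - n / 2) >= c)


def power_to_detect_bias(n: int, p_biased: float = 0.6, alpha: float = 0.05) -> float:
    """Chance of declaring the coin unfair when it really is biased."""
    # Smallest threshold a fair coin would cross no more than alpha of the time.
    c = next(c for c in range(1, n // 2 + 2) if two_sided_tail(n, c, 0.5) <= alpha)
    return two_sided_tail(n, c, p_biased)


for tosses in (20, 200):
    print(tosses, round(power_to_detect_bias(tosses), 2))
```

Under these assumptions the bias is caught roughly one time in eight with 20 tosses and around four times in five with 200, which is the sense in which a longer career gives a more powerful analysis.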
One thing it's important to emphasise is that, although I've used the word form throughout this analysis, that's really just a shorthand term for variation-of-performance-for-whatever-reason. The methods described here can identify up-and-down results, and can account for the play of chance in contributing to apparent hot and cold streaks. What they can't do is explain the causes of any non-random variation in performance. It may be that a bowler really was worse at taking wickets in a given period, but it's equally possible that he was bowling in unfavourable circumstances beyond his control. Above, we saw that Terry Alderman's Test career appears to have more than a hint of up-and-down about it. However, that monstrous hump in his LCG just happens to coincide with a period during which he spent a disproportionate amount of time bowling at a pretty formidable West Indies side. Maybe he would have done just as badly against other opponents at this time, or maybe he would have achieved a level of performance that was more consistent with the rest of his career; nothing in the numbers alone helps us to guess.
One way or another, though, the findings described in this blog – in conjunction with my earlier analysis of Test batting form – lead me to question whether, as cricket fans, we read rather too much into apparent peaks and troughs of performance. I'm quite sure few bowlers would dispute the assertion that their figures are susceptible to dumb luck; they'd certainly acknowledge that, in any individual innings, their best balls may beat the bat while they pick up wickets with deliveries that they wouldn't otherwise have wanted to remember. So it's maybe not so great a leap to conclude that the fact that bowlers end up with figures that can be quite variable across sequences of matches does not necessarily imply that there was fundamental variation in their wicket-taking capacity over those periods. In this way, it's not so surprising to see that, in a substantial majority of cases, you get just as much peak and just as much trough if you rearrange Test bowlers' careers in any old order. One thing's for certain: every bowler who gets dropped after a bad trot feels certain he was on the verge of a performance that would have redressed the balance. Maybe more of them are right than we would've guessed.
Technical Appendix
1. As before, I should start by acknowledging that the approach set out in this blog is heavily influenced by an excellent baseball stats book, Curve Ball by Jim Albert and Jay Bennett.
2. As I did for batsmen, I undertook a series of sensitivity analyses, varying the size of the window over which the moving average is calculated. I looked at longer and shorter windows; here are the results for 5 innings, 10 innings, and 30 innings. Once again, none of these analyses is very different from the 20-innings version. Funnily enough, the six bowlers with the most successful 10-inns streaks are all Englishmen – Lock, Barnes, Laker, Wardle, Statham, and Bedser – five of them achieving the feat in the 1950s! Most of them are also amongst the best 30-inns streaks, where they're joined by the likes of Imran, Hadlee, and Muralitharan. I also saw what difference it makes to use a different type of moving average – the exponentially weighted moving average – in which innings are never completely discarded; they just receive ever-decreasing weight as they recede into the past. The weighting coefficient I used was 0.066967, which dictates that the weight applied halves every ten innings. The results table is here. By and large, there is very little difference between these results and those calculated according to the simple moving average. I notice that a couple of bowlers whose career had a distinct upward or downward trend rise up the list (Richard Hadlee is a good example of someone who got better and better). On the whole, though, I can't tell much difference between them.
3. In the comments of my column about batting streakiness (which used an identical statistical approach to this analysis), there was some interesting discussion about p-values and multiple testing. This is an important issue in statistical analyses which look at the same thing repeatedly – in this case, the streakiness of 149 different bowlers. For example, when we say Terry Alderman appears to be a significantly streaky bowler because he has a very low p-value of 0.0007, we mean that there are only two possible explanations for his career showing as much variation as it did: (i) for one reason or another, his essential wicket-taking capacity varied over the course of his career (i.e. he really did have runs of good and bad performance), or (ii) a statistical event with probability 0.0007 (1-in-1,429) has occurred. At first glance, 1-in-1,429 seems very long odds, so it's tempting to conclude that we have a robust finding of streakiness. After all, you'd be amazed if you rolled four dice and got four sixes, and that's a slightly more likely event. However, we need to remember that there are 149 separate bowlers being analysed, here; if we repeated our dice-rolling experiment that many times, would we be very surprised to see 6-6-6-6 come up at least once along the way? So we need to be careful before assuming that something unlikely couldn't have happened when it had many opportunities to do so. There are several methods for adjusting p-values for multiple comparisons, but I chose not to extend and complicate my analysis by applying them (not least because I'm not much of a fan of obsessive p-value-spotting, in any case).
4. So, if we have to be a bit hesitant about identifying individual bowlers as especially streaky, can we tell whether there's any streakiness going on? One way to get a handle on that question (thanks to Russ and Dave, whose comments on my last column led me in this direction) is to calculate a global p-value – that is, an estimate of the weight of evidence that there's at least some streakiness somewhere amongst all the bowlers analysed. This can be done by counting the number of individual p-values below a certain level, and estimating the probability that that many bowlers (or more) would have streaky-looking records if there were nothing but random variation at play. In this instance, we can say that, with a global dataset of 149 bowlers, we would expect roughly 7 of them to have a p-value of 0.05 or less, just by chance, if there were no such thing as streakiness amongst bowlers (149 × 0.05 = 7.45). In fact, there are 21 such players in the dataset. Comparing this observed frequency to a Poisson distribution, we can calculate that the probability of getting 21 or more streaky players when you expect 7.45 is 0.00004. In other words, the amount of streakiness observed across all bowlers is extremely unlikely to have occurred by chance alone (in technical terms, we can reject the global null hypothesis that there's no such thing as a streaky bowler).
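The arithmetic in notes 2 to 4 is easy to check with a short Python sketch. The function names are my own and the formulas are the textbook ones: a weight that halves every h innings implies a coefficient of 1 − 0.5^(1/h), the chance of at least one hit in n independent tries is 1 − (1 − p)^n, and the Poisson tail is summed directly. Treating the 149 bowlers as independent is an assumption of the sketch.

```python
import math


def ewma_alpha(half_life: float) -> float:
    """Weighting coefficient for which the weight halves every `half_life` innings."""
    return 1 - 0.5 ** (1 / half_life)


def prob_at_least_one(p: float, n: int) -> float:
    """Chance that at least one of n independent tries produces an event of probability p."""
    return 1 - (1 - p) ** n


def poisson_upper_tail(k: int, mean: float) -> float:
    """P(X >= k) for a Poisson variable with the given mean."""
    below = sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k))
    return 1 - below


# Note 2: coefficient for a ten-innings half-life.
print(round(ewma_alpha(10), 6))                         # 0.066967

# Note 3: one p-value as low as 0.0007 somewhere among 149 bowlers,
# and one 6-6-6-6 somewhere among 149 rolls of four dice.
print(f"{prob_at_least_one(0.0007, 149):.3f}")          # 0.099
print(f"{prob_at_least_one((1 / 6) ** 4, 149):.3f}")    # 0.109

# Note 4: 21 or more bowlers at p <= 0.05 when 7.45 are expected.
print(f"{poisson_upper_tail(21, 149 * 0.05):.5f}")      # 0.00004
```

So a result as extreme as Alderman's would turn up somewhere in a field of 149 about one time in ten by chance alone, whereas 21 or more nominally streaky bowlers would be very hard to explain that way.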