August 24, 2011

Test bowlers and their mean streaks

Gabriel Rogers

Mitchell Johnson: surprisingly less variance in performances  © Getty Images

This post is an extremely belated follow-up to my earlier analysis of streakiness among batsmen. This time, the focus is on bowlers. I've used exactly the same methods as before – analysing and graphing moving averages (calculated over a 20-innings window, in my base case); for details, please see the batting form column.

As before, an example should help to clarify the approach and, because it's always helpful to use as much data as you can get your hands on, let's start with Muttiah Muralitharan. Murali's Longitudinal Career Graph (LCG) is shown in Figure 1. The shaded area shows his 20-innings moving average (i.e. his bowling average over each run of 20 consecutive innings in which he bowled), plotted relative to the average with which he finished his career. Whenever the black area is above the axis, he conceded more runs per wicket over the previous 20 innings than he did over his whole career (i.e. he was doing worse). Whenever it is below the axis, his average for the last 20 innings was better than he achieved in the long run. The red line shows the innings-by-innings progress of his career average (what StatsGuru calls the cumulative average).
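
Both curves on an LCG can be computed from an innings-by-innings list of runs conceded and wickets taken. The sketch below is a minimal illustration under some assumptions. Each bowled innings is a hypothetical (runs, wickets) pair. A window's average is total runs divided by total wickets. A window with no wickets has no defined average and is returned as None. The function names are illustrative only.

```python
from typing import List, Optional, Tuple

# One bowled innings: (runs conceded, wickets taken).
Innings = Tuple[int, int]


def window_average(innings: List[Innings]) -> Optional[float]:
    """Bowling average over a run of innings: total runs / total wickets."""
    runs = sum(r for r, _ in innings)
    wickets = sum(w for _, w in innings)
    return runs / wickets if wickets else None  # undefined with no wickets


def moving_average(innings: List[Innings], window: int = 20) -> List[Optional[float]]:
    """Average for every run of `window` consecutive bowled innings."""
    return [window_average(innings[i - window:i])
            for i in range(window, len(innings) + 1)]


def cumulative_average(innings: List[Innings]) -> List[Optional[float]]:
    """Career-to-date average after each innings (the red line)."""
    return [window_average(innings[:i]) for i in range(1, len(innings) + 1)]


def relative_to_career(innings: List[Innings], window: int = 20) -> List[Optional[float]]:
    """Moving average minus the final career average (the shaded area);
    positive values mean a higher, i.e. worse, average than the career figure."""
    career = window_average(innings)
    return [None if m is None or career is None else m - career
            for m in moving_average(innings, window)]


# Toy four-innings career with a 2-innings window, just to show the shapes.
career = [(30, 1), (20, 2), (50, 0), (10, 3)]
print(moving_average(career, window=2))
print(cumulative_average(career))
```

Computing the window average as a ratio of sums gives the standard bowling average for that stretch. Averaging the per-innings averages instead would not, because it would give a one-wicket innings the same weight as a five-wicket haul.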

Longitudinal career graph of Muttiah Muralitharan's career
© Gabriel Rogers

Looking at the red line, we can see that Murali's career average improved pretty steadily from the beginning of 1996 until the end of 2008 (it fell from 33.89 to a low point of 21.26). If we concentrated on that career average alone, we'd probably be tempted to infer that Murali was getting better and better throughout this period. The LCG shows that this wasn't quite what happened. He got quite a lot better fairly suddenly, then got a bit better again, and then maintained that level for a number of years while his long-run average slowly caught up. (In lengthy careers, long-run averages only ever catch up slowly, and they never catch up completely.) At his best during this period, Murali achieved a 20-innings streak of 89 wickets at 15.13.

It should be clear that the greater the black area on a bowler's LCG, the streakier his performance over his career. We can therefore calculate a single streakiness statistic that is directly related to the area of black on each bowler's LCG. [Technically, the measure is the root mean squared deviation of the moving average from the long-run career average, divided by the career average to give a coefficient of variation, CV(RMSD).] Table 1 lists the most and least streaky bowlers in Test history, sorted according to this measure.
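
The bracketed definition translates into a few lines of code. This is a sketch, and some of its conventions are assumptions because the post doesn't spell them out. The mean square is taken over every 20-innings window and divides by the number of windows. Windows without a wicket are skipped. In symbols, with moving averages m_i over n windows and career average A, it computes sqrt((1/n) Σ (m_i − A)²) / A.

```python
import math
from typing import List, Tuple

# One bowled innings: (runs conceded, wickets taken).
Innings = Tuple[int, int]


def cv_rmsd(innings: List[Innings], window: int = 20) -> float:
    """Root mean squared deviation of the moving average from the career
    average, divided by the career average (a coefficient of variation)."""
    career = sum(r for r, _ in innings) / sum(w for _, w in innings)
    squared_devs = []
    for i in range(window, len(innings) + 1):
        runs = sum(r for r, _ in innings[i - window:i])
        wkts = sum(w for _, w in innings[i - window:i])
        if wkts:  # assumption: wicketless windows are skipped
            squared_devs.append((runs / wkts - career) ** 2)
    if not squared_devs:
        raise ValueError("no window with a defined average")
    rmsd = math.sqrt(sum(squared_devs) / len(squared_devs))
    return rmsd / career


print(cv_rmsd([(25, 1)] * 40))                # perfectly even career: 0.0
print(cv_rmsd([(10, 1), (30, 1)], window=1))  # 0.5
```

The statistic is divided by the career average so that bowlers with very different averages can be compared on the same scale. A 10-run swing means much more for a bowler averaging 20 than for one averaging 40.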

Table 1: Streakiest bowlers in Test cricket, according to variation [CV(RMSD)] in 20-innings moving average
     Name              M    I    W    Ave    Min    Max    Rng  CV(RMSD)      p
1.   W Rhodes         57   90  127  26.97  15.58  74.60  59.02     0.498  0.021
2.   TM Alderman      41   73  170  27.15  19.25  58.45  39.20     0.435  0.001
3.   Intikhab Alam    47   78  125  35.95  23.42  81.83  58.41     0.429  0.010
4.   N Boje           43   72  100  42.65  26.00  95.38  69.38     0.398  0.073
5.   AV Bedser        51   92  236  24.90  13.84  43.58  29.73     0.384  0.004
6.   DL Underwood     85  151  297  25.84  14.26  58.12  43.86     0.375  0.017
7.   Mushtaq Ahmed    52   89  185  32.97  22.33  65.60  43.27     0.372  0.006
8.   GAR Lock         49   88  174  25.58   9.40  37.90  28.50     0.368  0.048
9.   GS Sobers        93  159  235  34.04  24.28  75.58  51.30     0.363  0.011
10.  TE Bailey        61   95  132  29.21  15.84  66.14  50.30     0.346  0.206
...
13.  IT Botham       101  168  383  28.40  15.56  54.21  38.64     0.305  0.031
14.  A Flintoff       77  135  219  33.35  22.31  60.05  37.74     0.296  0.014
...
16.  SF Barnes        27   50  189  16.43  11.85  24.29  12.45     0.289  0.004
17.  SM Pollock      108  202  421  23.12  14.85  48.50  33.65     0.288  0.011
...
21.  Imran Khan       86  142  362  22.81  13.15  34.87  21.72     0.277  0.023
...
30.  JH Kallis       144  238  269  31.99  18.27  59.93  41.66     0.260  0.536
...
48.  RJ Hadlee        86  150  431  22.30  15.48  37.34  21.86     0.228  0.068
...
52.  M Muralitharan  130  228  795  22.67  15.13  36.42  21.29     0.219  0.040
53.  Waqar Younis     86  154  373  23.56  15.95  35.97  20.02     0.217  0.111
...
57.  Z Khan           79  144  273  31.78  22.46  54.80  32.34     0.213  0.183
...
67.  SK Warne        144  271  702  25.53  17.96  43.17  25.21     0.195  0.215
...
70.  Kapil Dev       131  227  434  29.65  17.54  42.29  24.75     0.194  0.458
...
75.  DW Steyn         46   85  238  23.22  15.14  31.55  16.41     0.186  0.187
...
92.  AA Donald        72  129  330  22.25  17.38  31.65  14.27     0.160  0.361
...
96.  GD McGrath      123  241  560  21.69  15.20  33.26  18.06     0.156  0.770
...
106. GP Swann         36   66  153  28.82  21.40  37.62  16.22     0.146  0.425
107. MD Marshall      81  151  376  20.95  15.44  32.39  16.96     0.146  0.680
...
115. FS Trueman       67  127  307  21.58  16.56  28.33  11.78     0.127  0.787
...
124. CEL Ambrose      98  179  405  20.99  15.67  26.81  11.14     0.122  0.945
125. DK Lillee        70  132  355  23.92  20.00  33.97  13.97     0.121  0.751
...
127. MG Johnson       42   80  181  29.71  23.22  37.56  14.33     0.120  0.745
...
131. J Srinath        65  121  236  30.49  22.75  40.00  17.25     0.114  0.925
...
140. SL Malinga       30   59  101  33.16  28.90  39.80  10.90     0.101  0.817
141. DE Malcolm       40   72  128  37.09  31.23  45.03  13.81     0.100  0.961
142. S Ramadhin       43   76  158  28.98  24.61  38.67  14.06     0.095  0.962
143. DA Allen         39   65  122  30.98  26.30  36.84  10.54     0.094  0.854
144. GR Dilley        40   65  138  29.76  25.58  36.12  10.54     0.087  0.892
145. PR Adams         45   76  134  32.87  27.08  38.55  11.48     0.084  0.977
146. RC Motz          32   55  100  31.48  27.42  37.57  10.15     0.084  0.913
147. NAT Adcock       26   46  104  21.11  17.86  24.23   6.36     0.083  0.838
148. AN Connolly      29   55  102  29.23  23.40  33.08   9.68     0.077  0.931
149. WJ O'Reilly      27   48  144  22.60  19.87  25.88   6.00     0.070  0.889
Min, Max, Rng = lowest value, highest value and range of the 20-innings moving average
qual. = 100 wkts, 40 inns, 1.5 inns bowled per match; stats correct at 14-Aug-2011;
full list with links to each bowler's LCG available here

Wilfred Rhodes's position at the top of the list reflects the very different ways in which his skills were deployed during his career. In his first Test, he batted at no. 10 and opened the bowling; a decade later, he was routinely opening the batting and his bowling had become occasional. Little wonder, then, that his bowling average fluctuated enormously: he achieved a 20-innings average of 15.58 in 1900–05 whereas, in 1909–13, the same measure ballooned to 74.60 (note how many did-not-bowleds there are in the latter sample, underlining Rhodes's change of role).

It's no surprise to see Andrew Flintoff high on the list of streaky bowlers. His LCG (Figure 2) gives a clear depiction of the well-recognised tripartite nature of his career record. In the worst 20-innings period of his distinctly unimpressive first 50 or so innings, Flintoff managed just 21 wickets at an average of 60.05. Just a couple of years later, he achieved his best 20-innings streak, averaging 22.31 (although note that he amassed only 42 wickets, without a single five-fer, in that period).

Longitudinal career graph of Andrew Flintoff's career
© Gabriel Rogers

Shift that whole profile down by the best part of ten runs, and you have something eerily similar: Imran Khan's career as a Test bowler (Figure 3). Again, you have the (relative) famine followed by the (relative) feast, with an unhappy coda where the body could no longer do justice to the ability. In Imran's case, the highlight was an amazing 79 wickets at 13.15 in 20 consecutive innings in 1982–83. His worst 20 innings were the last 20 in which he bowled but, as worst runs go, 31 wickets at 34.87 is very far from an embarrassment.

Longitudinal career graph of Imran Khan's career
© Gabriel Rogers

With Flintoff and Imran as our clue, we might notice that there are a fair few allrounders at the top of the streakiness league. All of the following are amongst the 30 most identifiably streaky:

- Sobers (another whose bowling was pretty ordinary through his first 50 innings)
- Botham (whose "streakiness" was actually a fairly linear deterioration)
- Shaun Pollock (pretty consistently great for most of his career, but suffered an extended horrible streak at the end of it)
- Jacques Kallis (up and down throughout)

So are Trevor Bailey, Ravi Shastri, and Monty Noble. Conversely, those at the bottom of the list seem to have a tendency to be pretty poor with the willow. In fact, there is a noisy but identifiable statistical correlation between a bowler's streakiness (CV[RMSD]) and his batting average (r² = 0.102; p < 0.001).

The most likely explanation for this finding, it seems to me, is that bowlers who bat are more likely to survive prolonged streaks of poor form with the ball without getting dropped. It's fair to assume that Flintoff would have been given up as a lost cause long before his 50th innings if he had no capacity to contribute with the bat. As a result, non-batting bowlers with extended poor streaks (whether from true underperformance or just a run of bad statistical luck) are under-represented in this dataset. If everybody were allowed to play 100 Test matches regardless of how well they were doing, I wouldn't expect allrounders' bowling figures to be any streakier than anyone else's.
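
For anyone wanting to reproduce this kind of figure, r² here is assumed to be the squared Pearson correlation between two per-bowler columns (streakiness and batting average). The sketch below computes it from first principles. The example inputs are made up, because the batting averages behind r² = 0.102 aren't reproduced in this post.

```python
from typing import Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation coefficient between x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Made-up illustrative data, not the real streakiness/batting columns.
print(r_squared([1, 2, 3, 4], [1, 3, 2, 4]))  # 0.64
```

On Python 3.10 or later, `statistics.correlation(x, y) ** 2` gives the same number. An r² of about 0.1 means streakiness accounts for roughly a tenth of the variation in batting average. That is noisy, but with around 150 bowlers it is enough to be distinguishable from zero.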

Judged by these methods, the least streaky bowler in Test history is Bill O'Reilly. His LCG (Figure 4) illustrates the serenely excellent progress of his career. In his worst 20 innings, he took 54 wickets at 25.87; in his best 20 innings, he took 54 wickets at 19.87.

Longitudinal career graph of Bill O'Reilly's career
© Gabriel Rogers

According to the maths, the least streaky bowler with at least 100 innings under his belt is Javagal Srinath. For me, though, it's Dennis Lillee's numbers that really stand out, with a career record almost as smooth as his bowling action. As his LCG (Figure 5) shows, it was only really his last few Test matches that spoiled what was otherwise an incredibly consistent career. Aside from his last six innings, his 20-innings moving average never left the twenties.

Longitudinal career graph of Dennis Lillee's career
© Gabriel Rogers

Mitchell Johnson's low position in the streakiness table may be a surprise to some; after all, he doesn't have much of a reputation for steady results. However, over periods of 20 innings, any real or perceived fluctuations in his performance tend to even themselves out. The range over which his moving average oscillates (23.22 to 37.56) is, in comparative terms, not so great. Across this dataset, a bowler's best and worst 20-innings averages are, on average, 72% and 150% of his career average, respectively. If Johnson were entirely typical in this respect, his best and worst streaks would be 21.40 and 44.62, which confirms that his performance has been a bit less variable than average. You don't get a very different picture with shorter windows, either (I also looked at 10-innings and 5-innings averages; see the Technical Appendix for a brief description and links to the results). I conclude that Johnson has a somewhat unfair reputation for wildly varying performances. One thing these stats can neither confirm nor deny, though, is that he goes through phases of conspicuously looking terrible and brilliant.
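
The "typical bowler" benchmark above is just the career average scaled by the dataset-wide ratios. This sketch uses the rounded 72% and 150% figures. Because of that rounding, it gives about 21.39 and 44.57 for Johnson rather than the 21.40 and 44.62 quoted. Those quoted figures were presumably computed from unrounded ratios.

```python
def typical_extremes(career_average: float,
                     best_ratio: float = 0.72,
                     worst_ratio: float = 1.50) -> tuple[float, float]:
    """Best and worst 20-innings averages expected for a 'typical' bowler,
    using dataset-wide ratios of best/worst streak to career average."""
    return career_average * best_ratio, career_average * worst_ratio


best, worst = typical_extremes(29.71)     # Mitchell Johnson's career average
actual_best, actual_worst = 23.22, 37.56  # his extremes from Table 1
print(f"typical range: {worst - best:.2f}")
print(f"Johnson's range: {actual_worst - actual_best:.2f}")
```

Johnson's actual range comes out well below the typical one, which is the sense in which his record is less variable than average.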

Immediately following what may be the most famous 1-wicket match haul in Test history (at Old Trafford in 1956, where his spin partner Jim Laker took 19 wickets), Tony Lock achieved something almost as exceptional as Laker's feat. During his next 20 innings, he became the only bowler amongst those analysed here to average less than 10.00 over a period of that length, taking 54 wickets in the process. He even managed to get dropped during this streak. To be fair to the selectors, though, the man they preferred, Johnny Wardle, was in the middle of a sub-20 streak of his own. With Laker's 20-innings moving average dropping to 11.86 in the same period, England's spin-bowling resources of that time are unlikely ever to find a statistical equal.

Among the bowlers analysed here, only ten never averaged over 30.00 in any 20 consecutive innings. Most of them belong to an era of pervasively lower averages, but there are two modern-day exceptions. One is Curtly Ambrose, whose highest 20-innings moving average was just 26.81. The other is Mohammad Asif, who never did worse than the 28.19 he averaged in his last 20 Test innings (it's probably appropriate to speak about Mohammad Asif in the past tense now, right?).

As with batsmen, there is no association between streakiness (or the lack of it) and success, either in terms of average (r² = 0.007; p = 0.298) or win rate (r² < 0.001; p = 0.993). Some bowlers achieved great figures with up-and-down performance; others were closer to their long-run average throughout. There's no evidence that one profile or another leads to more wins or better stats at the end of your career.

What does it all mean?

As ever, though, this story only really gets interesting when we question the patterns underlying the data. Statisticians like to make a distinction between descriptive statistics (those that simply present observed data) and inferential statistics (those that seek to make sense of it). In this case, what we need to do is account for the play of random variation in bowlers' careers. It is inevitable that chance alone will lead to variation in each player's figures, and we need to distinguish this from real swings of performance.

I investigated this in exactly the same way as with batsmen. I shuffled each bowler's career into a purely random order 10,000 times and counted how often a career as streaky as the real one emerged. This is a resampling technique often loosely called bootstrapping; strictly, reshuffling without replacement like this is a permutation (or randomisation) test. It lets us quantify how likely it is that each career would have happened in a world where form didn't exist. That is the figure marked p in the table; technically, it is a one-tailed empirical p-value.
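
Here is a sketch of the shuffling procedure. It assumes the streakiness statistic is CV(RMSD) on a 20-innings window and that p is the share of shuffles at least as streaky as the real career. Shuffling keeps every innings but destroys any ordering in time, which is the "world where form didn't exist". The function names and the fixed seed are illustrative choices, not the author's documented code. A common variant reports (hits + 1) / (shuffles + 1) so that p can never be exactly zero.

```python
import math
import random
from typing import List, Tuple

# One bowled innings: (runs conceded, wickets taken).
Innings = Tuple[int, int]


def cv_rmsd(innings: List[Innings], window: int = 20) -> float:
    """CV(RMSD) of the moving average about the career average
    (assumption: windows without a wicket are skipped)."""
    career = sum(r for r, _ in innings) / sum(w for _, w in innings)
    sq = []
    for i in range(window, len(innings) + 1):
        runs = sum(r for r, _ in innings[i - window:i])
        wkts = sum(w for _, w in innings[i - window:i])
        if wkts:
            sq.append((runs / wkts - career) ** 2)
    return math.sqrt(sum(sq) / len(sq)) / career


def streakiness_p_value(innings: List[Innings], n_shuffles: int = 10_000,
                        window: int = 20, seed: int = 1) -> float:
    """One-tailed empirical p-value: the share of random reorderings of the
    career that are at least as streaky as the real order."""
    rng = random.Random(seed)
    observed = cv_rmsd(innings, window)
    shuffled = list(innings)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # same innings, random order
        if cv_rmsd(shuffled, window) >= observed:
            hits += 1
    return hits / n_shuffles


# Toy career: 20 cheap innings followed by 20 expensive ones (very streaky).
streaky_career = [(10, 1)] * 20 + [(60, 1)] * 20
p = streakiness_p_value(streaky_career, n_shuffles=200)
print(p)
```

Recomputing every window from scratch is slow in pure Python for 10,000 shuffles of a 200-innings career. Running sums over the window would speed it up without changing the result.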

The results provide a very similar picture to that which I found when analysing form amongst batsmen. The key finding is that there are surprisingly few bowlers whose careers give a convincing picture of variation in form over and above that which would be expected by chance.

One example is Terry Alderman. There are only two possible explanations for his career showing as much variation as it did: (i) for one reason or another, his essential wicket-taking ability varied over the course of his career (i.e. he really did have runs of good and bad performance), or (ii) a statistical event with probability 0.0007 (1-in-1,429) has occurred. In this circumstance, we can probably conclude with some confidence that there's some non-random variation afoot and, indeed, looking at Alderman's LCG (Figure 6), it's hard to imagine that horrible 1984 and that dazzling 1989–1990 could have happened to the same bowler.

Longitudinal career graph of Terry Alderman's career
© Gabriel Rogers

However, bowlers whose careers show such an identifiably streaky pattern are the exception rather than the rule. The relatively small number of very low p-values suggests that random variation around a career-long mean is very often a pretty plausible explanation of the peaks and troughs we tend to think of as form. Turning back to Mitchell Johnson, we can see that shuffling his career into a random order produces at least as much up-and-down as we've seen in his actual career about three-quarters of the time. Similarly, the prevailing wisdom is that a career like Steve Harmison's has been massively influenced by swings of form. However, when I took form out of the equation by putting his career in a random order, something that was – on the whole – every bit as streaky emerged nearly a quarter of the time (although less than a twentieth of the virtual careers featured a single streak as hot as Harmison's 50 wickets at 18.64 in 2003–2004). In any other field, a statistician faced with such numbers would be very unlikely to conclude that there was anything other than random variation at play.

One interesting finding, though, concerns Muttiah Muralitharan. His streakiness stat (CV[RMSD]) is nothing out of the ordinary, yet he has a pretty low p-value (much lower than those around him on the list). One reason for this is that his career is longer than most, so it provides more data and, hence, more opportunity to distinguish signal from noise. A statistician would say that Murali's career gives us a more powerful analysis, meaning it is less susceptible to Type II error (failing to detect streakiness that really is there). This raises the possibility that, with more data on other bowlers, we'd be able to detect streakiness in their careers more easily. In the same way, it's a lot easier to tell whether a coin is unevenly weighted by tossing it 200 times than by tossing it only 20.
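
The coin analogy can be made concrete with an exact binomial calculation. The sketch makes two illustrative assumptions: the coin lands heads 60% of the time, and we use a two-sided test at the 5% level that rejects fairness for head counts far enough from n/2. It reports the power, i.e. the chance of detecting the bias (one minus the Type II error rate).

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k heads in n tosses with P(heads) = p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def rejection_region(n: int, alpha: float = 0.05) -> list[int]:
    """Head counts that reject 'the coin is fair': the most extreme outcomes,
    symmetric about n/2, with total probability <= alpha under a fair coin."""
    for c in range(n + 1):
        region = [k for k in range(n + 1) if abs(2 * k - n) >= c]
        if sum(binom_pmf(k, n, 0.5) for k in region) <= alpha:
            return region
    return []


def power_of_fair_coin_test(n: int, bias: float = 0.6, alpha: float = 0.05) -> float:
    """Chance of rejecting fairness when the true P(heads) is `bias`."""
    return sum(binom_pmf(k, n, bias) for k in rejection_region(n, alpha))


p20 = power_of_fair_coin_test(20)
p200 = power_of_fair_coin_test(200)
print(f"20 tosses: {p20:.3f}, 200 tosses: {p200:.3f}")
```

With 20 tosses, the biased coin usually passes as fair. With 200, it is usually caught. Longer careers make streakiness easier to detect in the same way.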

Conclusions

It's important to emphasise that, although I've used the word form throughout this analysis, it's really just shorthand for variation-of-performance-for-whatever-reason. The methods described here can identify up-and-down results, and can account for the role of chance in producing apparent hot and cold streaks. What they can't do is explain the causes of any non-random variation in performance. A bowler may really have been worse at taking wickets in a given period, but it's just as plausible that he was bowling in unfavourable circumstances beyond his control. Above, we saw that Terry Alderman's Test career appears to have more than a hint of up-and-down about it. However, that monstrous hump in his LCG happens to coincide with a period when he spent a disproportionate amount of time bowling at a pretty formidable West Indies side. Maybe he would have done just as badly against other opponents at this time, or maybe he would have performed more in line with the rest of his career; nothing in the numbers alone helps us to guess.

One way or another, though, the findings described in this blog, together with my earlier analysis of Test batting form, lead me to question whether, as cricket fans, we read rather too much into apparent peaks and troughs of performance. I'm quite sure few bowlers would dispute that their figures are susceptible to dumb luck. They'd certainly acknowledge that, in any individual innings, their best balls may beat the bat while they pick up wickets with deliveries they would rather forget. So it's maybe not so great a leap to conclude that variable figures across sequences of matches don't necessarily imply any fundamental variation in a bowler's wicket-taking capacity over those periods. In this light, it's not so surprising that, in a substantial majority of cases, you get just as much peak and just as much trough if you rearrange Test bowlers' careers in any old order. One thing's for sure: every bowler who gets dropped after a bad trot is convinced he was on the verge of a performance that would have redressed the balance. Maybe more of them are right than we would've guessed.



Technical appendix

1. As before, I should start by acknowledging that the approach set out in this blog is heavily influenced by an excellent baseball stats book, Curve Ball by Jim Albert and Jay Bennett.

2. As I did for batsmen, I undertook a series of sensitivity analyses, varying the size of the window over which the moving average is calculated. I looked at longer and shorter windows; here are the results for 5 innings, 10 innings, and 30 innings. Once again, none of these analyses is very different from the 20-innings version. Funnily enough, the six bowlers with the most successful 10-innings streaks are all Englishmen (Lock, Barnes, Laker, Wardle, Statham, and Bedser), five of them achieving the feat in the 1950s! Most of them also feature amongst the best 30-innings streaks, where they're joined by the likes of Imran, Hadlee, and Muralitharan. I also looked at what difference it makes to use a different type of moving average, the exponentially weighted moving average. In this version, innings are never completely discarded; they just receive ever-decreasing weight as they recede into the past. The weighting coefficient I used was 0.066967, which means that the weight applied to an innings halves every ten innings. The results table is here. By and large, there is very little difference between these results and those calculated using the simple moving average. A couple of bowlers whose careers had a distinct upward or downward trend rise up the list (Richard Hadlee is a good example of someone who got better and better). On the whole, though, I can't tell much difference between them.

3. In the comments of my column about batting streakiness (which used an identical statistical approach to this analysis), there was some interesting discussion about p-values and multiple testing. This is an important issue in statistical analyses which look at the same thing repeatedly – in this case, the streakiness of 149 different bowlers. For example, when we say Terry Alderman appears to be a significantly streaky bowler because he has a very low p-value of 0.0007, we mean that there are only two possible explanations for his career showing as much variation as it did: (i) for one reason or another, his essential wicket-taking capacity varied over the course of his career (i.e. he really did have runs of good and bad performance), or (ii) a statistical event with probability 0.0007 (1-in-1,429) has occurred. At first glance, 1-in-1,429 seems very long odds, so it's tempting to conclude that we have a robust finding of streakiness. After all, you'd be amazed if you rolled four dice and got four sixes, and that's a slightly more likely event. However, we need to remember that there are 149 separate bowlers being analysed, here; if we repeated our dice-rolling experiment that many times, would we be very surprised to see 6-6-6-6 come up at least once along the way? So we need to be careful before assuming that something unlikely couldn't have happened when it had many opportunities to do so. There are several methods for adjusting p-values for multiple comparisons, but I chose not to extend and complicate my analysis by applying them (not least because I'm not much of a fan of obsessive p-value-spotting, in any case).

4. So, if we have to be a bit hesitant about identifying individual bowlers as especially streaky, can we tell whether there's any streakiness going on at all? One way to get a handle on that question is to calculate a global p-value: an estimate of the weight of evidence that there's at least some streakiness somewhere amongst all the bowlers analysed. (Thanks to Russ and Dave, whose comments on my last column led me in this direction.) This can be done by counting the number of individual p-values below a certain level, then estimating the probability that that many bowlers (or more) would have streaky-looking records if there were nothing but random variation at play. With a global dataset of 149 bowlers, we would expect roughly 7 of them to have a p-value of 0.05 or less just by chance, if there were no such thing as streakiness amongst bowlers (149 × 0.05 = 7.45). In fact, there are 21 such players in the dataset. Comparing this observed frequency to a Poisson distribution, the probability of getting 21 or more streaky players when you expect 7.45 is 0.00004. In other words, the amount of streakiness observed across all bowlers is extremely unlikely to have occurred by chance alone (in technical terms, we can reject the global null hypothesis that there's no such thing as a streaky bowler).
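
For point 2 of this appendix, the weighting coefficient follows from the half-life. If each step back in time multiplies an innings' weight by (1 − α), then halving every ten innings requires (1 − α)^10 = 0.5, i.e. α = 1 − 0.5^(1/10). The sketch applies that weight to runs and wickets separately and takes their ratio. This is one reasonable way to build an exponentially weighted bowling average; whether the author did it this way is an assumption.

```python
# Weight halves every ten innings: (1 - ALPHA) ** 10 == 0.5.
ALPHA = 1 - 0.5 ** (1 / 10)


def ewma_average(innings: list[tuple[int, int]], alpha: float = ALPHA) -> list[float]:
    """Exponentially weighted bowling average after each (runs, wickets)
    innings. Older innings are never dropped; each one's weight shrinks by
    a factor (1 - alpha) per subsequent innings."""
    runs_s = wkts_s = 0.0
    out = []
    for runs, wkts in innings:
        runs_s = (1 - alpha) * runs_s + alpha * runs
        wkts_s = (1 - alpha) * wkts_s + alpha * wkts
        # Same weights on numerator and denominator, so the ratio is a
        # weighted bowling average; NaN until the first wicket.
        out.append(runs_s / wkts_s if wkts_s else float("nan"))
    return out


print(round(ALPHA, 6))  # 0.066967
```

Starting both running sums at zero doesn't bias the ratio, because numerator and denominator share exactly the same weights.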
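
The dice comparison in point 3 is easy to check. Assume the 149 tests are independent (only an approximation). The chance that an event of probability p turns up at least once somewhere among them is then 1 − (1 − p)^149. The Bonferroni line at the end shows one standard multiple-comparison correction; the author didn't apply it.

```python
N_BOWLERS = 149
P_ALDERMAN = 0.0007
P_FOUR_SIXES = (1 / 6) ** 4  # 1-in-1,296: slightly likelier than 1-in-1,429


def prob_at_least_one(p: float, trials: int) -> float:
    """Chance an event of probability p happens at least once in
    `trials` independent attempts."""
    return 1 - (1 - p) ** trials


print(f"four sixes at least once in 149 rolls: {prob_at_least_one(P_FOUR_SIXES, N_BOWLERS):.3f}")
print(f"some bowler at p <= 0.0007 by chance:  {prob_at_least_one(P_ALDERMAN, N_BOWLERS):.3f}")
# Bonferroni: multiply the p-value by the number of tests (capped at 1).
print(f"Bonferroni-adjusted p for Alderman:    {min(1.0, P_ALDERMAN * N_BOWLERS):.3f}")
```

All three numbers come out at roughly 0.1. So a single 1-in-1,429 result among 149 bowlers is far less remarkable than it looks in isolation.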
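
The global test in point 4 can be sketched as a Poisson upper-tail probability. It asks: with λ = 149 × 0.05 = 7.45 expected "significant" bowlers under the global null, how likely are 21 or more? The terms are built up iteratively so that no large factorial is ever converted to a float. The function name is illustrative.

```python
import math


def poisson_upper_tail(observed: int, expected: float, extra_terms: int = 500) -> float:
    """P(X >= observed) for X ~ Poisson(expected), summed term by term."""
    term = math.exp(-expected)          # P(X = 0)
    for k in range(1, observed + 1):
        term *= expected / k            # advance to P(X = k)
    tail = 0.0
    for k in range(observed, observed + extra_terms):
        tail += term                    # add P(X = k)
        term *= expected / (k + 1)      # advance to P(X = k + 1)
    return tail


expected = 149 * 0.05                   # 7.45 bowlers expected by chance
p_global = poisson_upper_tail(21, expected)
print(f"{p_global:.1e}")                # roughly 3.5e-05, i.e. about 0.00004
```

Strictly, the count of bowlers with p ≤ 0.05 under the null is Binomial(149, 0.05), and the Poisson is the usual approximation to it. Both treat the 149 bowlers as independent, which is itself an approximation.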


© ESPN Sports Media Ltd.

Posted by Vikram on (August 31, 2011, 8:05 GMT)

On a holiday, and read your 3 most recent analyses in one sitting. Amazing stuff. What I found more interesting was the comments section. There seems to be a greater desire to be a "my hero better" attitude when it comes to batting. However, as for bowlers, people seem happier to concede that Holding, Marshall, Hadlee, McGrath, Warne, Murali, Barnes, et al are all great. Why does it not happen in the case of batting? I haven't seen a single stat of bowler A being better than bowler B as his day 1 stat is better, but for batting we get into these meaningless discussions. This comment is off the topic, but as I said, I read all three articles together and hence a combined response. Ananth, some fabulous analysis here - however, this being a team game can we have a few more team-related analyses. It might help India in terms of its future performances.

Posted by Chintan Shah on (August 31, 2011, 4:16 GMT)

Extremely insightful article, great work.

Posted by Ananth on (August 30, 2011, 14:00 GMT)

Gabriel Many thanks for your prompt clarification. If nothing else, at least it has given me a chance to explain briefly how such an analysis is approached. Ananth

Posted by Gabriel Rogers on (August 30, 2011, 9:53 GMT)

Ananth, a full explanation of my reservations about performance measures that seek to capture multiple dimensions will have to wait. But, in the meantime, I should emphasise that you really shouldn't have taken my comment as some sort of veiled criticism of your work. In particular, the note about methods not being fully documented relates to the ICC rankings, not to anything you've ever done. Cheers / G.

Posted by Tom Clarke on (August 30, 2011, 5:28 GMT)

Great work. A joy to read - did A level stats and flirted with sports numbers but didn't have the imagination. Impressive stuff.

Just read the batsmen article and note that you picked out Alastair Cook as having an even form - guess that's changed substantially!

Posted by Ananth on (August 30, 2011, 4:11 GMT)

I'm not much of a fan of so-called "omnibus measures" of good-ness, because they tend to double-count at least some characteristics and apply arbitrary judgements to all of them. I don't know about the arithmetic of it, because I've never been able to find it explained in full.

Gabriel, I noted with surprise and disappointment your comments on "Omnibus" type of analysis. I have responded to this since I tend to do such Omnibus type of analysis one every month and have also done 85% of “It Figures” articles. The response is long and I do not want to occupy your blogspace. The document has been uploaded and the link has been provided below. You could peruse it. Any other readers who are interested in this aspect of analysis could also do a download/perusal.

http://www.thirdslip.com/misc/gabriel.doc

Regards Ananth

Posted by tonyp on (August 29, 2011, 23:59 GMT)

Interesting that Bill O'Reilly came out in front, but what of Clarrie Grimmett? I don't see him on the table but I suspect he would have been quite consistent through his career. Did he not meet the selection criteria?

Tony

Posted by Gabriel Rogers on (August 29, 2011, 6:05 GMT)

Sancho: The way I see it, baseball statisticians have three massive advantages over us: (i) more data (700 at-bats in a season! what we wouldn't give to get our hands on such riches); (ii) simpler data (an awful lot of it comes down to hit the ball / didn't hit the ball, which makes it very much easier to model); (iii) more and bigger brains doing more and better work (let's not forget that at least two Nobel laureates – Ed Purcell and Dudley Herschbach – have had a decent stab at baseball stats, as has Stephen Jay Gould). There are a few of us fumbling around the margins, and I strongly suspect that at least some international teams have people doing worthwhile things with fuller datasets than the rest of us get to play with but, for obvious reasons, that work doesn't get shared for the idle interest of the regular punter!

Jawad: Wasim Akram's career was fairly consistently good (never averaged more than 32.28 over any 20 consecutive innings).

Chris: I'm not much of a fan of so-called "omnibus measures" of good-ness, because they tend to double-count at least some characteristics and apply arbitrary judgements to all of them. I don't know about the arithmetic of it, because I've never been able to find it explained in full, but, from a purely cricketing perspective, I've always thought the ranking system developed by Deloittes and subsequently adopted by the ICC has pretty good face validity (I do know that it uses exponential weighting to deal with "form", which is an approach I looked at in the sensitivity analyses for this blog and the batting one).

Gerry_the_Merry: Nice examples. Not that I have any sort of bias towards Somerset cricketers.

Posted by Gerry_the_Merry on (August 27, 2011, 1:27 GMT)

Gabriel, great work, especially the detailed tables in the appendix, which have color coding of the opposition, and the LCG links for every bowler on the right. I wish you could color Z, BD in light colours and other teams in dark colors (just a personal bias, as these teams provided easy meat for opposition bowlers and batsmen, statistically speaking, and ruling out random fluctuations).

I see that on the matter of deciding between variations due to form and natural causes, those bowlers who do not have multiple areas below the normal or multiple areas above the normal can be considered to be form affected, whereas those who are scattered up and down throughout can have normal variations around the career finishing average.

For instance, Botham's career graph is striking. If the base line were lowered to 18, then it is one big mountain we climb, from left to right, which never ends, and which continuously pulls up the LCG. On the other hand, no one really got on top of Garner...

Posted by arch on (August 27, 2011, 0:45 GMT)

@Ali Hannan Imran was 37 when Waqar made his debut. Considering 35 is the retirement age for most fast bowlers, any statistical comparison of wicket shares and hunting pair breakdowns is quite meaningless.
