December 19, 2008

The most efficient strike bowlers in Tests

A stats analysis to identify the most efficient strike bowlers in Test history

My usual lair is Different Strokes, but that's a place for (semi-)topical opinion rather than discussion of statistics methodology, and Rajesh has been kind enough to allow me to interlope and put this little study before you.

Although I didn't start out that way, what I've ended up with, I think, is a pretty good cross-era ranking of the most efficient strike bowlers Test cricket has known. I don't claim that it's definitive: what I do claim is that the method I've used is quite interesting, and I'd like to see what other stats mavens make of it.

The first decision I made was to eliminate all minnow matches. Leaving out Bangladesh and Zimbabwe is pretty commonplace, but if we're being realistic, every team except England and Australia had a bedding-in period as minnows before becoming a side to be at least reckoned with. It seems essential to eliminate minnow matches because otherwise some bowlers are at a distinct disadvantage: a bowler whose career ran from 1970 to 1980 never got a chance to bowl at a minnow team, whereas Fred Trueman had endless fun with weak Asian teams in the 1950s. Since I use Ric Finlay's Tastats, this sort of exclusion is very easily accomplished.

There being no formal event which declares a team to have "arrived" in Test cricket, I had to make some arbitrary judgements about when to regard a team as having graduated. I took South Africa's entry to senior ranks as having occurred when they unveiled their quartet of googly bowlers and comprehensively thrashed the fairly weak England team which toured in 1905-6, after which they were generally difficult to beat. Though West Indies won a series against England in 1934-35, the touring side was again half-strength; I decided that they did not really graduate until 1945. India's graduation I took to be 1961, Pakistan's 1965, New Zealand's 1969 and Sri Lanka's 1990. One might be able to argue that Zimbabwe were of a reasonable standard from about 1998-2003, but it seemed simpler to leave them and Bangladesh out of all consideration. Since I also have a prejudice against non-Test matches being included, the ICC Superflop game is also left out.

Subtracting those games has a widely-varying effect on a bowler's career total of wickets. Muralitharan drops from 751 wickets to 588 and Trueman from 307 to 192, whereas Jeff Thomson and Michael Holding's figures remain untouched.

Next, I decided to find a way to give greater credit for taking top-order wickets, because they are the ones you really want your strike bowlers to be cleaning up.

I was initially tempted to weight them on the basis of the runs scored at each position, but then realised that the top order contribution is exaggerated by declarations and innings cut short by the match being over. I then moved on to using the batting averages at each position.

Adding up the averages for each position gives a total of 307.27. The share for each wicket is the positional average divided by that total, so the #3 average of 39.662 is 0.129 of the total. Those shares sum to 1, so if we multiply them by 11, they will sum to 11. This gives us the following weightings:

Position 1 2 3 4 5 6 7 8 9 10 11
Weight 1.34 1.27 1.42 1.48 1.34 1.15 0.98 0.75 0.56 0.41 0.331

If the dismissal of a batsman is worth the above number of wickets, then a bowler taking one of each will have a total of 11 wickets, whereas someone with a top-order bias will have more and someone who wipes up tail-enders exclusively will have a lot less. Owing to a limitation in Tastats, whose breakdown of a bowler's victims by position does not differentiate between openers, in practice I used 1.30 for both 1 and 2.
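The weighting arithmetic can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet actually used: the function names and the victims-by-position layout are hypothetical, the two sample bowlers are invented, and the weights are the rounded figures from the table above with the pooled 1.30 for both openers.

```python
# Rounded weights from the table above; openers pooled at 1.30 as described.
WEIGHTS = {1: 1.30, 2: 1.30, 3: 1.42, 4: 1.48, 5: 1.34, 6: 1.15,
           7: 0.98, 8: 0.75, 9: 0.56, 10: 0.41, 11: 0.331}


def position_weight(position_average, total_of_averages, positions=11):
    """Share of the summed averages, scaled so the weights sum to `positions`."""
    return positions * position_average / total_of_averages


def adjusted_wickets(victims_by_position):
    """Sum of weights over a bowler's victims.

    `victims_by_position` maps batting position (1-11) to wickets taken;
    this layout is an assumption made for the example.
    """
    return sum(WEIGHTS[pos] * count for pos, count in victims_by_position.items())


# The #3 weight: 11 * 39.662 / 307.27
no3 = position_weight(39.662, 307.27)

# One victim at every position is worth roughly 11 wickets.
one_of_each = adjusted_wickets({pos: 1 for pos in range(1, 12)})

# Two invented bowlers with ten wickets each: top order versus tail.
top_order = adjusted_wickets({1: 2, 2: 2, 3: 2, 4: 2, 5: 2})
tail = adjusted_wickets({7: 2, 8: 2, 9: 2, 10: 2, 11: 2})
```

Because the published weights are rounded, one victim at each position comes out very slightly above 11 rather than exactly 11.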

To take three examples, Shane Warne's total gets adjusted from 685 to 685.2, Glenn McGrath's from 549 to 605.6, and Stuart MacGill's from 164 to 159.0. Given that in practice a lot more top-order batsmen than tail-enders get dismissed, most bowlers actually show a profit, so MacGill's reduction is evidence that he really was a tail-end cleaner.

If we apply this wicket adjustment to the figures for non-excluded matches and remove everyone who played fewer than 20 relevant games or took under 100 relevant wickets, this is the resultant top ten by average:

Player M Balls Runs Wkts AdjW AdjAve AdjSR
SF Barnes 27 7873 3106 189 203.6 15.26 38.67
R Peel 20 5216 1715 101 108.7 15.78 48.00
MD Marshall 81 17,584 7876 376 410.3 19.20 42.86
CEL Ambrose 96 21,641 8401 397 433.1 19.40 49.97
GD McGrath 120 28,485 11,930 549 605.6 19.70 47.04
AK Davidson 34 8997 3033 142 153.9 19.71 58.48
JC Laker 36 10,312 3611 162 178.3 20.25 57.84
AA Donald 69 14,906 7113 316 350.7 20.28 42.50
H Trumble 32 8099 3072 141 150.8 20.37 53.71
J Garner 58 13,175 5433 259 265.9 20.44 49.56
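As a rough sketch of how each row above is derived, the snippet below applies the qualification filter and divides runs and balls by adjusted wickets. The `BowlerRecord` type and function names are hypothetical, and I have assumed that the 100-wicket cut-off applies to raw rather than adjusted wickets, which the text does not spell out.

```python
from dataclasses import dataclass


@dataclass
class BowlerRecord:
    name: str
    matches: int
    balls: int
    runs: int
    wickets: int        # raw wickets in non-excluded matches
    adj_wickets: float  # position-weighted wickets


def qualifies(rec, min_matches=20, min_wickets=100):
    # Assumption: the wicket threshold is checked against raw wickets.
    return rec.matches >= min_matches and rec.wickets >= min_wickets


def adj_average(rec):
    """Runs conceded per adjusted wicket."""
    return rec.runs / rec.adj_wickets


def adj_strike_rate(rec):
    """Balls bowled per adjusted wicket."""
    return rec.balls / rec.adj_wickets


# Two rows taken from the table above.
barnes = BowlerRecord("SF Barnes", 27, 7873, 3106, 189, 203.6)
davidson = BowlerRecord("AK Davidson", 34, 8997, 3033, 142, 153.9)

# Keep qualifying bowlers and rank them by adjusted average, best first.
ranked = sorted((r for r in (davidson, barnes) if qualifies(r)), key=adj_average)
```

Since the adjusted wickets in the table are themselves rounded to one decimal place, recomputed figures can differ from the published ones in the second decimal.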

The right-hand column shows that there is a wide disparity between bowlers' strike rates. A strike bowler's efficiency does not depend solely on runs conceded; his strike rate is also an important factor because of the runs scored at the other end and the overall time taken. If Dale Steyn bowls six overs and takes a wicket but concedes 30 runs while Makhaya Ntini concedes 18 in his six without taking a wicket, the opposition are 48/1 at the end of these spells. If Shaun Pollock bowls 11 overs and concedes 20 runs while taking a wicket, 33 runs get conceded at the other end at the same three an over, and the opposition reach 53/1 although the game is ten overs older.
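The arithmetic of that comparison can be laid out as a small sketch. The function name is hypothetical, and it assumes what the example implies: the partner bowls the same number of overs from the other end and goes at three an over (18 off six, 33 off eleven).

```python
def spell_outcome(overs, runs_conceded, wickets, partner_run_rate=3.0):
    """Score after a strike bowler's spell, with a partner bowling the same
    number of overs from the other end at `partner_run_rate` runs an over."""
    total_runs = runs_conceded + overs * partner_run_rate
    total_overs = 2 * overs
    return total_runs, wickets, total_overs


steyn = spell_outcome(overs=6, runs_conceded=30, wickets=1)
pollock = spell_outcome(overs=11, runs_conceded=20, wickets=1)

# Extra overs the innings has consumed in the Pollock scenario.
overs_older = pollock[2] - steyn[2]
```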

I have for some time been toying with a measure I call the Power Index, which combines the average and strike rate by multiplying them together and taking the square root. Sqrt((runs/wickets)*(balls/wickets)) simplifies to sqrt(runs*balls)/wickets: the denominator is wickets, so the numerator can be seen as representing the resources used up in taking a wicket.
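A minimal sketch of the Power Index as defined above, using the simplified form. The function name is my own; the inputs are the Barnes and Steyn figures from this article, with adjusted wickets as the wicket count.

```python
import math


def power_index(balls, runs, wickets):
    """Geometric mean of average (runs/wickets) and strike rate (balls/wickets).

    sqrt((runs/wickets) * (balls/wickets)) == sqrt(runs * balls) / wickets
    """
    return math.sqrt(runs * balls) / wickets


barnes_pi = power_index(balls=7873, runs=3106, wickets=203.6)
steyn_pi = power_index(balls=4414, runs=2706, wickets=114.7)
```

Being a geometric mean, the index treats a given percentage improvement in average and the same percentage improvement in strike rate as equally valuable.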

If we apply that algorithm, we get a new top ten, as follows:

Player M Balls Runs Wkts AdjW AdjAve AdjSR AdjPI
SF Barnes 27 7873 3106 189 203.6 15.26 38.67 24.29
R Peel 20 5216 1715 101 108.7 15.78 48.00 27.52
MD Marshall 81 17,584 7876 376 410.3 19.20 42.86 28.68
AA Donald 69 14,906 7113 316 350.7 20.28 42.50 29.36
CEH Croft 27 6165 2913 125 141.7 20.55 43.50 29.90
DW Steyn 23 4414 2706 114 114.7 23.60 38.49 30.14
GD McGrath 120 28,485 11,930 549 605.6 19.70 47.04 30.44
CEL Ambrose 96 21,641 8401 397 433.1 19.40 49.97 31.13
J Garner 58 13,175 5433 259 265.9 20.44 49.56 31.82
Waqar Younis 73 13,517 7374 293 312.3 23.61 43.28 31.97

Ambrose and McGrath drop down, Donald moves up, and Colin Croft, Dale Steyn and Waqar Younis come in at the expense of Davidson, Laker and Trumble.

However, this is deeply unsatisfactory because we know that Barnes and Peel played in a time when scores were lower and wickets fell much more often. Today's fashion is to bat aggressively from the word go, whereas in the middle of the last century caution was the Test batsman's watchword. We need a way of equalising for the changes in general pitch conditions and style of play.

This is a well-known problem, and what follows does not claim to be universally applicable.

But the essential aspects of what we are examining here are the balls bowled, runs conceded and wickets taken. If we can find a way of keeping one or more invariant, then we have a fixed point while scaling the others to fit.

I decided to use the first match innings of Tests as the way to fix par. The first innings of the match is the least likely to be cut short by weather, and the least likely to be affected by tactical considerations. A third innings can be anything from a stonewall grind trying to save a match to a hell-for-leather bash while trying to set a target, but a first innings is always going to be played at whatever pace the side think appropriate given the conditions and they will nearly always get as many runs as the conditions allow. The dimensions of the first match innings may change, but its tactical purpose does not.

Across our population of matches, the mean first match innings notches up 327 runs off 678 balls.

What I did was to find out the dimensions of the average first match innings in a particular bowler's period. I decided not to restrict the sample to matches that the bowler played in, because then his performances are effectively the norm and we don't see how he stood out (or not) from his contemporaries. I think we are more interested in how their performances stack up relative to everything that happened in their period, so I used all the non-excluded matches played in the cricket years (running May-April) which his career spanned. Someone who debuted on 11th November 1982 and finished on 25th August 1994 would thus have his period defined as 1982-1995 (Ric Finlay will recognise his "years from and to" filter option).
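The period rule can be expressed as a short sketch. The function names are hypothetical, and this is my reading of the May-April convention described above rather than anything taken from Tastats itself.

```python
from datetime import date


def cricket_year_start(d):
    """Calendar year in which the May-April cricket year containing `d` begins."""
    return d.year if d.month >= 5 else d.year - 1


def career_period(debut, last_match):
    """(from, to) years covering every cricket year the career touched."""
    return cricket_year_start(debut), cricket_year_start(last_match) + 1


# The example from the text: debut 11 November 1982, last Test 25 August 1994.
period = career_period(date(1982, 11, 11), date(1994, 8, 25))
```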

I then scaled their figures for balls bowled and runs conceded accordingly. So a bowler whose period averaged 340 runs off 650 balls is adjusted to concede his actual runs * 327/340 off his actual balls * 678/650. We now have adjusted figures for each of balls, runs and wickets and can run through our standard calculations for average, strike rate and PI to come up with our final result, the top ten of which looks like this:

Player B/I1 NewB R/I1 NewR NewW NewSR NewAve NewPI
MD Marshall 659 18,091.0 321 8023.2 410.3 44.10 19.56 29.37
SF Barnes 552 9670.1 266 3818.3 203.6 47.50 18.75 29.85
DW Steyn 630 4750.3 358 2471.7 114.7 41.42 21.55 29.88
AA Donald 644 15,693.0 319 7291.4 350.7 44.74 20.79 30.50
GD McGrath 645 29,942.4 335 11,645.1 605.6 49.44 19.23 30.83
KR Miller 800 8199.6 329 3597.0 175.3 46.77 20.52 30.98
RR Lindwall 798 9219.3 325 4337.5 200.3 46.03 21.66 31.57
CEH Croft 641 6520.9 308 3092.7 141.7 46.01 21.82 31.69
FS Trueman 764 9085.6 325 4665.5 203.5 44.64 22.92 31.99
JC Laker 790 8850.0 321 3678.5 178.3 49.64 20.63 32.00

B/I1 and R/I1 are the average first match innings balls and runs for that bowler's period.
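The era scaling behind the table above can be sketched as follows. The names are my own, and the par figures are the 327 runs off 678 balls quoted earlier. The worked case uses Malcolm Marshall's raw balls, runs and adjusted wickets from the earlier tables together with his period's B/I1 and R/I1.

```python
import math

PAR_RUNS = 327   # mean first match innings across all non-excluded matches
PAR_BALLS = 678


def scale_to_par(balls, runs, period_balls, period_runs):
    """Linearly rescale a bowler's balls and runs to the all-time par innings."""
    new_balls = balls * PAR_BALLS / period_balls
    new_runs = runs * PAR_RUNS / period_runs
    return new_balls, new_runs


def era_adjusted(balls, runs, adj_wickets, period_balls, period_runs):
    """Return (NewB, NewR, NewSR, NewAve, NewPI) for one bowler."""
    new_balls, new_runs = scale_to_par(balls, runs, period_balls, period_runs)
    new_sr = new_balls / adj_wickets
    new_ave = new_runs / adj_wickets
    return new_balls, new_runs, new_sr, new_ave, math.sqrt(new_sr * new_ave)


marshall = era_adjusted(17584, 7876, 410.3, period_balls=659, period_runs=321)
```

The recomputed strike rate, average and PI agree with the table to within about 0.01; the small differences presumably come from the adjusted wickets being rounded to one decimal place in print.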

As a dedicated supporter of SF Barnes as the king of bowlers, I am mortified to discover that Malcolm Marshall pips him to the top spot - but if Barnes had to be toppled, I'm glad it was Macko.

On a very contemporary note, Dale Steyn has made an incredible start to his career, since he comes in at number three (with a bullet) on this all-time list. Waqar Younis's figures at the same stage of his career were even more spectacular, with a PI of 27.29, so we can probably assume that Steyn will also descend the list as his career unfolds.

Of the top ten, only McGrath and Laker were ever really used in a containing role on dead pitches, and they did not do that much. In the full table, those who spend time keeping the runs down without taking wickets lose out, with the result that Shane Warne comes a lowly 55th. But then, this is not a merit ranking but an assessment of how nearly they approached the ideal of incessant lethality.

It's not an unbelievable top ten. If the model is wrong, it still manages to produce a sensible result.

But it can certainly be challenged on a number of points.

Are the cut-off dates for minnowhood reasonable?

Are the relative batting averages of the positions in the batting order a sound way to weight the value of wickets?

Is the Power Index a sensible way of combining parsimony and frequency to measure attacking prowess?

Is the first match innings a useful point of reference?

Even if comparing first match innings is reasonable, should one average the dimensions thereof for all matches or just the ones the bowler played in?

Whatever those averages are, is it sufficient to scale them in a linear fashion or should some more complex function be used?

So let the debate on those and no doubt other questions commence.

The full table is available here.