April 4, 2012

# Consistency in Test bowlers: a new look

An improved way of analysing consistency across the career of Test bowlers statistically

This is based on an idea given by Prashanth. After giving the idea and participating in a discussion or two, he disappeared off the radar. However, I thank him for providing the spark. A couple of years back Gabriel Rogers did a similar article. However, that wonderful article was based on complex statistical methodology and would not have been out of place at an Annual Conference of Statisticians. Mine is simpler, more common-sense based and is aimed at everyone who comes into this blogspace, irrespective of their statistical knowledge.

The relevant points are explained below.

1. For this purpose five-Test slices are considered. This is a reasonable number and normally covers 2-3 months of Test cricket. Tests, rather than innings, are used as the basis so that both bowling and batting can be covered in an equitable manner.
2. Five Tests means that a batsman can go through a Test or two with limited or no opportunities to bat, because of emphatic wins etc., and still have enough opportunities within the five-Test slice to catch up. Normally the bowlers do not have this problem since they do a higher share of a team's work and have to capture 20 wickets for a win.
3. There is enough time to get over short duration loss of form.
4. To measure consistency, only runs scored and wickets captured will be used. The fundamental cricket dictum that batsmen should score runs and bowlers should take wickets is followed. Averages are important mainly over a career and for comparisons across players.
5. Why not the average? Let us take an example to understand why not. McGrath and Trueman have career averages around 22.0 and WpT values of around 4.5, which means about 22 wickets are expected from each in five Tests. In a 5-Test period, match context being comparable, McGrath captures 25 wickets at 25.0 and Trueman 15 wickets at 20.0. Who has performed closer to his career figures and, for that matter, better? Certainly McGrath, despite the higher slice average. Similarly for batting.
6. Let us not forget that we remember numbers like 46 (Laker) and 41 (Alderman) rather than the specific averages. Similarly 774 (Gavaskar) and 688 (Lara) without being aware of the averages.
7. The career slices should be non-overlapping and equal, other than the last one. Gooch's 456 in one test should be part of one career slice only. Similarly Laker's 19 wickets. Hence the concept of rolling number of Tests is not valid.
8. Five Tests might seem arbitrary but represents a long enough career slice. It represents a long Test series.
9. The keyword is consistency with reference to the player's own career performance levels. It may happen that a bowler has a rather high WpT value, e.g. Barnes at 7.00 or Muralitharan at 6.02, and what is perfectly acceptable for another bowler might not be for such bowlers. That is acceptable since they have set high benchmarks and we are interested in seeing how often they went off these benchmarks.
10. We are not looking at absolute high and low values but only at values relative to the concerned player's career figures. Over a five-Test stretch Murali is expected to take 30 wickets and Kallis is expected to capture nine wickets. This will be the basis. If Murali captured 20 wickets in a five-Test slice, it is well below average, while the same 20-wicket performance for Kallis is way above average.
11. I know that a bowler like Imran Khan, who did not bowl at all in 3 slices at the end of his career, would be slightly affected by this methodology. However, there is no clear method of handling this. I do not want to exclude Tests where a bowler did not bowl, since the number of slices would then no longer depend on the number of Tests played. Also, I don't want someone later on asking me to exclude a batsman's Tests where there has been an innings win for the loss of few wickets. These are minor quirks and may only reduce the accuracy from 100% to 95%.
12. Adjustment is made for the last career slice if the same is fewer than five Tests.
13. The criterion for selection is 100 or more Test wickets. 160 bowlers qualify. The only bowlers of note who are missing are Shane Bond and Frank Tyson (Adrian, happy !!!).
14. The Standard Deviation (SD) of the slice ratios is used to determine consistency.
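
The slicing described in the points above can be sketched in a few lines of Python. This is only an illustration, not the program used for the article: the function names and the 12-Test career are made up, and the handling of a short last slice (comparing it against career WpT times the number of Tests it actually contains) is an assumption, since the exact adjustment is not spelt out.

```python
from typing import List, Tuple

SLICE_SIZE = 5  # Tests per career slice, as used in the article


def career_slices(wickets_per_test: List[int], size: int = SLICE_SIZE) -> List[Tuple[int, int]]:
    """Split a career into non-overlapping slices.

    Returns (tests_in_slice, wickets_in_slice) pairs. Only the last
    slice may contain fewer than `size` Tests.
    """
    slices = []
    for start in range(0, len(wickets_per_test), size):
        chunk = wickets_per_test[start:start + size]
        slices.append((len(chunk), sum(chunk)))
    return slices


def slice_ratios(wickets_per_test: List[int], size: int = SLICE_SIZE) -> List[float]:
    """Ratio of each slice's wickets to the career expectation for that slice.

    Assumption: a short final slice is compared against career WpT
    multiplied by the number of Tests it actually contains.
    """
    career_wpt = sum(wickets_per_test) / len(wickets_per_test)
    return [
        wickets / (career_wpt * tests)
        for tests, wickets in career_slices(wickets_per_test, size)
    ]


# Made-up 12-Test career (not real data): wickets taken in each Test.
career = [5, 3, 0, 6, 4, 2, 7, 1, 4, 3, 5, 8]
print(career_slices(career))  # [(5, 18), (5, 17), (2, 13)]
print(slice_ratios(career))
```

Because the slices do not overlap, a single big Test (Laker's 19 wickets, say) affects exactly one ratio, which is the point of item 7.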

I had initially thought that I would combine the batsmen and bowlers in a single article. However, the introduction of six tall graphs meant that the article would have become very long, so I have separated this into two articles. The graphs are also special-purpose ones, showing the slice plots of up to 10 players per graph.

For each career slice of 5 Tests, a ratio is formed between that slice's runs/wickets and the career-average runs/wickets for 5 Tests. This ratio is called the SPF (Slice Performance Factor). Suppose a bowler's career WpT works out to 24 wickets per 5 Tests. If he captured 17 wickets in a slice, the SPF value is 0.71. If he captured 30 wickets, the SPF is 1.25. The following 5 groups are formed for purposes of determining consistency.

```
A. SPF  below 0.67:  Well below average - Falls into the inconsistent bracket.
B. SPF 0.67 - 0.90:  Below average
C. SPF 0.90 - 1.10:  Around average
D. SPF 1.10 - 1.33:  Above average
E. SPF  above 1.33:  Well above average - Falls into the inconsistent bracket.
```
Groups B, C and D are considered to be well within the average levels. Standard Deviation is also used to determine the consistency.
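
For readers who like to see the arithmetic, here is a minimal Python sketch of the SPF and the five groups. The function names are mine, for illustration only. Placing exactly 0.67 in B and exactly 1.33 in D follows from the "below" and "above" wording; placing exactly 0.90 in C and exactly 1.10 in D is an assumption, since the group list does not say where boundary values go.

```python
def spf(slice_wickets: float, expected_wickets: float) -> float:
    """Slice Performance Factor: slice wickets over career-expected wickets."""
    return slice_wickets / expected_wickets


def spf_group(value: float) -> str:
    """Map an SPF value to one of the groups A to E.

    Assumption: exactly 0.90 is treated as C and exactly 1.10 as D.
    """
    if value < 0.67:
        return "A"  # well below average
    if value < 0.90:
        return "B"  # below average
    if value < 1.10:
        return "C"  # around average
    if value <= 1.33:
        return "D"  # above average
    return "E"      # well above average


# The examples from the text: 24 wickets expected per five Tests.
print(round(spf(17, 24), 2), spf_group(spf(17, 24)))  # 0.71 B
print(round(spf(30, 24), 2), spf_group(spf(30, 24)))  # 1.25 D

# Murali (30 expected) and Kallis (9 expected), both taking 20 in a slice.
print(spf_group(spf(20, 30)))  # A
print(spf_group(spf(20, 9)))   # E
```

The last two lines show why the same 20 wickets lands in opposite extreme groups for the two bowlers.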

First, some data tables. The first one is the core table of bowlers who have captured over 300 wickets in their Test career. The tables and graphs are presented with minimal comment. Let me allow the erudite readers to come out with their own comments.

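| Bowler | Tests | Wkts | Avge | WpT | Mean | StdDev | Mid3% | C-Slices | Grp A | Grp B | Grp C | Grp D | Grp E |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Muralitharan M | 133 | 800 | 22.73 | 6.0 | 1.00 | 0.274 | 74.1 | 27 | 4 | 5 | 10 | 5 | 3 |
| Warne S.K | 145 | 708 | 25.42 | 4.9 | 1.00 | 0.295 | 69.0 | 29 | 4 | 6 | 9 | 5 | 5 |
| Kumble A | 132 | 619 | 29.65 | 4.7 | 1.00 | 0.275 | 74.1 | 27 | 3 | 7 | 5 | 8 | 4 |
| McGrath G.D | 124 | 563 | 21.64 | 4.5 | 1.00 | 0.266 | 72.0 | 25 | 3 | 5 | 8 | 5 | 4 |
| Walsh C.A | 132 | 519 | 24.44 | 3.9 | 1.01 | 0.328 | 70.4 | 27 | 3 | 8 | 7 | 4 | 5 |
| Kapil Dev N | 131 | 434 | 29.65 | 3.3 | 0.99 | 0.390 | 48.1 | 27 | 7 | 7 | 3 | 3 | 7 |
| Pollock S.M | 108 | 421 | 23.12 | 3.9 | 1.00 | 0.274 | 72.7 | 22 | 2 | 7 | 5 | 4 | 4 |
| Wasim Akram | 104 | 414 | 23.62 | 4.0 | 0.99 | 0.337 | 66.7 | 21 | 4 | 3 | 6 | 5 | 3 |
| Harbhajan Singh | 98 | 406 | 32.22 | 4.1 | 0.99 | 0.261 | 85.0 | 20 | 3 | 3 | 6 | 8 | 0 |
| Ambrose C.E.L | 98 | 405 | 20.99 | 4.1 | 1.00 | 0.287 | 70.0 | 20 | 2 | 8 | 4 | 2 | 4 |
| Ntini M | 101 | 390 | 28.83 | 3.9 | 1.00 | 0.355 | 61.9 | 21 | 2 | 8 | 2 | 3 | 6 |
| Botham I.T | 102 | 383 | 28.40 | 3.8 | 1.01 | 0.429 | 52.4 | 21 | 6 | 4 | 2 | 5 | 4 |
| Marshall M.D | 81 | 376 | 20.95 | 4.6 | 0.99 | 0.292 | 70.6 | 17 | 3 | 3 | 4 | 5 | 2 |
| Waqar Younis | 87 | 373 | 23.56 | 4.3 | 1.00 | 0.430 | 55.6 | 18 | 4 | 7 | 1 | 2 | 4 |
| Imran Khan | 88 | 362 | 22.81 | 4.1 | 0.98 | 0.464 | 55.6 | 18 | 5 | 1 | 4 | 5 | 3 |
| Vettori D.L | 110 | 358 | 33.87 | 3.3 | 1.00 | 0.410 | 68.2 | 22 | 4 | 6 | 3 | 6 | 3 |
| Lillee D.K | 70 | 355 | 23.92 | 5.1 | 1.00 | 0.243 | 71.4 | 14 | 2 | 2 | 6 | 2 | 2 |
| Vaas WPUJC | 111 | 355 | 29.58 | 3.2 | 1.00 | 0.376 | 69.6 | 23 | 4 | 6 | 4 | 6 | 3 |
| Donald A.A | 72 | 330 | 22.25 | 4.6 | 1.00 | 0.226 | 86.7 | 15 | 1 | 5 | 4 | 4 | 1 |
| Willis R.G.D | 90 | 325 | 25.20 | 3.6 | 1.00 | 0.174 | 94.4 | 18 | 1 | 4 | 6 | 7 | 0 |
| Lee B | 76 | 310 | 30.82 | 4.1 | 1.00 | 0.266 | 87.5 | 16 | 0 | 9 | 2 | 3 | 2 |
| Gibbs L.R | 79 | 309 | 29.09 | 3.9 | 0.99 | 0.275 | 68.8 | 16 | 3 | 3 | 4 | 4 | 2 |
| Trueman F.S | 67 | 307 | 21.58 | 4.6 | 0.99 | 0.300 | 71.4 | 14 | 2 | 4 | 5 | 1 | 2 |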

To clarify the table contents: WpT means Wickets per Test. Mean is the mean of the SPF values and is close to 1.0 for all bowlers. StdDev is the Standard Deviation of all the SPF values. Mid3% is the number of slices in Groups B, C and D as a percentage of the total number of career slices, which is the next column, C-Slices. Grp A to Grp E are the numbers of slices falling in each group. The complete file is available for downloading. The link is provided at the end.
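
As a quick check on how Mid3% relates to the group columns, here is a small Python sketch. The function name is mine; the group counts are the ones shown for Muralitharan and Willis in the table above.

```python
def mid3_percent(group_counts):
    """Share of career slices falling in groups B, C and D, as a percentage.

    `group_counts` is (A, B, C, D, E), i.e. the last five table columns.
    """
    a, b, c, d, e = group_counts
    total = a + b + c + d + e  # this is the C-Slices column
    return 100.0 * (b + c + d) / total


print(round(mid3_percent((4, 5, 10, 5, 3)), 1))  # Muralitharan: 74.1
print(round(mid3_percent((1, 4, 6, 7, 0)), 1))   # Willis: 94.4
```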

Amongst the top wicket-takers, only Hadlee and Harbhajan Singh have Mid3% values exceeding 80, indicating a high level of consistency. Then come Donald, with 86.7%, and Willis, with a very high 94.4%.

Consistency is determined in two ways. The first is statistical. The Standard Deviation (SD) is determined for all the SPF ratios of a bowler. Low SD values indicate consistent players and high SD values indicate inconsistent players. The usual method of using the Coefficient of Variation is not required since the means for almost all players are around 1.00. Shown below is the SD table, with the 20 lowest SD values indicating very consistent bowlers.

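| Bowler | Tests | Wkts | Avge | WpT | Mean | StdDev | Mid3% | C-Slices | Grp A | Grp B | Grp C | Grp D | Grp E |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| O'Reilly W.J | 27 | 144 | 22.60 | 5.3 | 1.00 | 0.120 | 100.0 | 6 | 0 | 1 | 4 | 1 | 0 |
| Morkel M | 39 | 139 | 30.04 | 3.6 | 1.00 | 0.152 | 100.0 | 8 | 0 | 2 | 4 | 2 | 0 |
| Dilley G.R | 41 | 138 | 29.76 | 3.4 | 0.98 | 0.169 | 100.0 | 9 | 0 | 3 | 4 | 2 | 0 |
| Kasprowicz M.S | 38 | 113 | 32.88 | 3.0 | 0.99 | 0.171 | 87.5 | 8 | 0 | 2 | 4 | 1 | 1 |
| Willis R.G.D | 90 | 325 | 25.20 | 3.6 | 1.00 | 0.174 | 94.4 | 18 | 1 | 4 | 6 | 7 | 0 |
| Snow J.A | 49 | 202 | 26.67 | 4.1 | 1.00 | 0.194 | 90.0 | 10 | 0 | 3 | 5 | 1 | 1 |
| Collinge R.O | 35 | 116 | 29.25 | 3.3 | 1.00 | 0.196 | 85.7 | 7 | 1 | 1 | 2 | 3 | 0 |
| Lohmann G.A | 18 | 112 | 10.76 | 6.2 | 1.02 | 0.200 | 100.0 | 4 | 0 | 1 | 1 | 2 | 0 |
| Old C.M | 46 | 143 | 28.11 | 3.1 | 1.02 | 0.202 | 100.0 | 10 | 0 | 3 | 3 | 4 | 0 |
| Saeed Ajmal | 20 | 107 | 26.70 | 5.3 | 1.00 | 0.210 | 100.0 | 4 | 0 | 1 | 1 | 2 | 0 |
| Danish Kaneria | 61 | 261 | 34.80 | 4.3 | 0.99 | 0.211 | 84.6 | 13 | 1 | 3 | 6 | 2 | 1 |
| Umar Gul | 43 | 157 | 32.48 | 3.7 | 1.00 | 0.219 | 88.9 | 9 | 0 | 3 | 3 | 2 | 1 |
| Johnson I.W | 45 | 109 | 29.19 | 2.4 | 1.00 | 0.222 | 77.8 | 9 | 1 | 1 | 4 | 2 | 1 |
| Steyn D.W | 54 | 272 | 23.19 | 5.0 | 0.99 | 0.223 | 81.8 | 11 | 1 | 2 | 5 | 2 | 1 |
| Hughes M.G | 53 | 212 | 28.38 | 4.0 | 0.99 | 0.224 | 90.9 | 11 | 1 | 3 | 1 | 6 | 0 |
| Donald A.A | 72 | 330 | 22.25 | 4.6 | 1.00 | 0.226 | 86.7 | 15 | 1 | 5 | 4 | 4 | 1 |
| Statham J.B | 70 | 252 | 24.85 | 3.6 | 1.00 | 0.230 | 78.6 | 14 | 2 | 2 | 6 | 3 | 1 |
| Johnson M.G | 47 | 190 | 31.29 | 4.0 | 1.00 | 0.236 | 70.0 | 10 | 1 | 2 | 5 | 0 | 2 |
| Edmonds P.H | 51 | 125 | 34.18 | 2.5 | 1.00 | 0.239 | 81.8 | 11 | 1 | 3 | 4 | 2 | 1 |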
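
The StdDev column can be sketched as below. This is an illustration under assumptions, not the exact computation behind the table: I use the population SD (`statistics.pstdev`) here, the sample SD is equally plausible, and the SPF values are made up. The sketch also returns the Coefficient of Variation to show that it barely differs from the SD when the mean sits at 1.00.

```python
from statistics import mean, pstdev


def consistency_stats(spf_values):
    """Return (mean, SD, coefficient of variation) of a bowler's SPF values.

    Assumption: population SD is used; the article does not say whether
    the population or the sample formula was applied.
    """
    mu = mean(spf_values)
    sd = pstdev(spf_values)
    return mu, sd, sd / mu


# Made-up SPF values for five career slices (not real data).
example_spf = [0.9, 1.1, 1.0, 0.8, 1.2]
mu, sd, cv = consistency_stats(example_spf)
print(round(mu, 2), round(sd, 3), round(cv, 3))
```

With the mean at 1.00, dividing the SD by the mean changes nothing of substance, which is why the SD alone is used for ranking.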

Now for the table with the high SD values, indicating a very low level of consistency.

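| Bowler | Tests | Wkts | Avge | WpT | Mean | StdDev | Mid3% | C-Slices | Grp A | Grp B | Grp C | Grp D | Grp E |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Rhodes W | 58 | 127 | 26.97 | 2.2 | 1.00 | 0.887 | 33.3 | 12 | 5 | 1 | 2 | 1 | 3 |
| Hooper C.L | 102 | 114 | 49.43 | 1.1 | 1.01 | 0.727 | 23.8 | 21 | 9 | 4 | 0 | 1 | 7 |
| Briggs J | 33 | 118 | 17.75 | 3.6 | 0.97 | 0.679 | 57.1 | 7 | 2 | 1 | 2 | 1 | 1 |
| Hogg R.M | 38 | 123 | 28.45 | 3.2 | 0.99 | 0.569 | 75.0 | 8 | 1 | 5 | 1 | 0 | 1 |
| Bracewell J.G | 41 | 102 | 35.81 | 2.5 | 1.13 | 0.563 | 55.6 | 9 | 2 | 2 | 1 | 2 | 2 |
| Illingworth R | 61 | 122 | 31.20 | 2.0 | 0.99 | 0.555 | 61.5 | 13 | 3 | 5 | 2 | 1 | 2 |
| Kallis J.H | 152 | 276 | 32.45 | 1.8 | 1.00 | 0.547 | 38.7 | 31 | 10 | 6 | 2 | 4 | 9 |
| Sobers G.St.A | 93 | 235 | 34.04 | 2.5 | 1.00 | 0.529 | 52.6 | 19 | 4 | 4 | 2 | 4 | 5 |
| Giffen G | 31 | 103 | 27.10 | 3.3 | 0.98 | 0.523 | 42.9 | 7 | 2 | 1 | 2 | 0 | 2 |
| Shastri R.J | 80 | 151 | 40.96 | 1.9 | 1.00 | 0.512 | 62.5 | 16 | 3 | 6 | 3 | 1 | 3 |
| Verity H | 40 | 144 | 24.38 | 3.6 | 1.00 | 0.511 | 62.5 | 8 | 2 | 1 | 3 | 1 | 1 |
| Noble M.A | 42 | 121 | 25.00 | 2.9 | 1.02 | 0.508 | 55.6 | 9 | 2 | 3 | 1 | 1 | 2 |
| Underwood D.L | 86 | 297 | 25.84 | 3.5 | 1.06 | 0.496 | 33.3 | 18 | 6 | 2 | 2 | 2 | 6 |
| Mushtaq Ahmed | 52 | 185 | 32.97 | 3.6 | 1.00 | 0.485 | 54.5 | 11 | 2 | 4 | 1 | 1 | 3 |
| Intikhab Alam | 47 | 125 | 35.95 | 2.7 | 1.01 | 0.481 | 30.0 | 10 | 3 | 1 | 1 | 1 | 4 |
| Greig A.W | 58 | 141 | 32.21 | 2.4 | 0.99 | 0.479 | 58.3 | 12 | 3 | 3 | 4 | 0 | 2 |
| Giles A.F | 54 | 143 | 40.60 | 2.6 | 0.99 | 0.473 | 45.5 | 11 | 3 | 2 | 3 | 0 | 3 |
| Imran Khan | 88 | 362 | 22.81 | 4.1 | 0.98 | 0.464 | 55.6 | 18 | 5 | 1 | 4 | 5 | 3 |
| Bailey T.E | 61 | 132 | 29.21 | 2.2 | 1.00 | 0.463 | 30.8 | 13 | 6 | 2 | 0 | 2 | 3 |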

The alternative method is common-sense based. The two extreme groups, A and E, are considered significant departures from the career levels. The slice counts of the middle three groups are added and divided by the total number of slices to get the Mid3%. This reflects the consistency of the players. Shown below is the table with the 10 highest Mid3% values.

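| Bowler | Tests | Wkts | Avge | WpT | Mean | StdDev | Mid3% | C-Slices | Grp A | Grp B | Grp C | Grp D | Grp E |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| O'Reilly W.J | 27 | 144 | 22.60 | 5.3 | 1.00 | 0.120 | 100.0 | 6 | 0 | 1 | 4 | 1 | 0 |
| Old C.M | 46 | 143 | 28.11 | 3.1 | 1.02 | 0.202 | 100.0 | 10 | 0 | 3 | 3 | 4 | 0 |
| Morkel M | 39 | 139 | 30.04 | 3.6 | 1.00 | 0.152 | 100.0 | 8 | 0 | 2 | 4 | 2 | 0 |
| Dilley G.R | 41 | 138 | 29.76 | 3.4 | 0.98 | 0.169 | 100.0 | 9 | 0 | 3 | 4 | 2 | 0 |
| Lohmann G.A | 18 | 112 | 10.76 | 6.2 | 1.02 | 0.200 | 100.0 | 4 | 0 | 1 | 1 | 2 | 0 |
| Saeed Ajmal | 20 | 107 | 26.70 | 5.3 | 1.00 | 0.210 | 100.0 | 4 | 0 | 1 | 1 | 2 | 0 |
| Willis R.G.D | 90 | 325 | 25.20 | 3.6 | 1.00 | 0.174 | 94.4 | 18 | 1 | 4 | 6 | 7 | 0 |
| Hughes M.G | 53 | 212 | 28.38 | 4.0 | 0.99 | 0.224 | 90.9 | 11 | 1 | 3 | 1 | 6 | 0 |
| Snow J.A | 49 | 202 | 26.67 | 4.1 | 1.00 | 0.194 | 90.0 | 10 | 0 | 3 | 5 | 1 | 1 |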

Now for the table with the low Mid3% values, indicating a very low level of consistency.

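| Bowler | Tests | Wkts | Avge | WpT | Mean | StdDev | Mid3% | C-Slices | Grp A | Grp B | Grp C | Grp D | Grp E |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Hooper C.L | 102 | 114 | 49.43 | 1.1 | 1.01 | 0.727 | 23.8 | 21 | 9 | 4 | 0 | 1 | 7 |
| Intikhab Alam | 47 | 125 | 35.95 | 2.7 | 1.01 | 0.481 | 30.0 | 10 | 3 | 1 | 1 | 1 | 4 |
| Bailey T.E | 61 | 132 | 29.21 | 2.2 | 1.00 | 0.463 | 30.8 | 13 | 6 | 2 | 0 | 2 | 3 |
| Underwood D.L | 86 | 297 | 25.84 | 3.5 | 1.06 | 0.496 | 33.3 | 18 | 6 | 2 | 2 | 2 | 6 |
| Rhodes W | 58 | 127 | 26.97 | 2.2 | 1.00 | 0.887 | 33.3 | 12 | 5 | 1 | 2 | 1 | 3 |
| Pathan I.K | 29 | 100 | 32.26 | 3.4 | 0.99 | 0.363 | 33.3 | 6 | 2 | 0 | 2 | 0 | 2 |
| Boje N | 43 | 100 | 42.65 | 2.3 | 0.99 | 0.429 | 33.3 | 9 | 3 | 2 | 1 | 0 | 3 |
| Benaud R | 63 | 248 | 27.03 | 3.9 | 0.99 | 0.442 | 38.5 | 13 | 4 | 3 | 0 | 2 | 4 |
| Kallis J.H | 152 | 276 | 32.45 | 1.8 | 1.00 | 0.547 | 38.7 | 31 | 10 | 6 | 2 | 4 | 9 |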

Not surprisingly, there is a strong negative correlation between the two measures. The correlation is negative because low SD values and high Mid3% values both indicate consistency. The correlation coefficient is a fairly high -0.73.
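
A correlation of this kind can be worked out with the usual Pearson formula, sketched below in Python. The function is my own illustration, and the (StdDev, Mid3%) pairs are just a handful of rows picked from the tables in this article, so the result will not reproduce the -0.73 obtained across all 160 bowlers.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# (StdDev, Mid3%) for a few bowlers, copied from the tables in this article.
sample = {
    "O'Reilly": (0.120, 100.0),
    "Willis": (0.174, 94.4),
    "Donald": (0.226, 86.7),
    "Muralitharan": (0.274, 74.1),
    "Imran Khan": (0.464, 55.6),
    "Kallis": (0.547, 38.7),
    "Hooper": (0.727, 23.8),
}
sds = [sd for sd, _ in sample.values()]
mid3s = [mid3 for _, mid3 in sample.values()]
r = pearson(sds, mid3s)
print(round(r, 2))  # negative: higher SD goes with lower Mid3%
```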

Now for some special graphs.

Graph of consistency for top wicket-takers

The top-10 bowlers are featured. It can be clearly seen that most of these bowlers do not exhibit a high level of consistency. The only exception seems to be Hadlee, during the first half of his career.

### Most consistent: Based on low SD values

Most consistent bowlers (based on low SD values)

Look at O'Reilly. An SD as low as 0.12 indicates a very consistent career. This is borne out by his placement in the next graph also. Willis is the only one amongst this lot with over 85 Tests. The others have all played fewer than 50 Tests. Amongst the modern bowlers, Kasprowicz and Steyn have been fairly consistent. Especially look at Steyn's last five slices.

### Most consistent: Based on high Middle-3-group % values

Most consistent bowlers (based on high middle-3 % values)

These are the bowlers with high middle-three-group % values. There are four bowlers, led by O'Reilly, who have all their slices in the middle groups. This is amazing. It means that not once did these bowlers go below 66.7% or above 133.3% of their career values. That is some consistency. It can be seen that three of these bowlers, O'Reilly, Dilley and Adcock, also occupy the top three positions in the SD table, indicating the very high degree of correlation between the two methods. Old is there in the top-10. However, in terms of consistency, Willis takes the plum position. Look at his graph. Out of 18 career slices, only once has he gone into the two extreme groups.

### Least consistent: Based on high SD values

Least consistent bowlers (high SD values)

These graphs look like a dying person's cardiograph. These players have had moves up and down throughout their careers. Most of them are also batting all-rounders. It is also possible that these players had stretches in which they bowled very little. However, that still means that they were very inconsistent as bowlers.

### Least consistent: Based on low Middle-3-group % values

Least consistent bowlers (low middle-3 % values)

Almost the same bowlers. However now Underwood and Benaud come in. Look at Pathan's graph. Most of these bowlers have around a third of the slices in the middle.

### Bowlers with top averages

Graph of bowlers with top averages