April 13, 2012

Consistency in Test batsmen: a new look

A statistical analysis of consistency among Test batsmen

Allan Border has the lowest standard deviation among the top-20 batsmen (Adrian Murrell / © Getty Images)

This is based on an idea given by Prashanth. After giving the idea and taking part in a discussion or two, he disappeared off the radar. However, I thank him for providing the spark.

This follows the article "Consistency in Test bowlers: a new look". The relevant points are explained below.

1. For bowlers I had used 5 Tests as the basis. However, there are many Tests in which a batsman does not get a chance to bat, because of heavy scoring by the top order, innings wins, big wins by wickets and so on. Hence I have taken 10-innings slices as the basis for the batsmen analysis. This is a reasonable number, normally covering 5-6 Tests or 2-3 months of Test cricket.
2. With 10 innings, a batsman can go through a Test or two with limited or no opportunities to bat because of emphatic wins and the like, and there will still be enough opportunities within the slice to catch up.
3. There is enough time to get over short duration loss of form.
4. To measure consistency, only runs scored will be used. The fundamental cricket dictum that batsmen should score runs and bowlers should take wickets is followed. Averages are important mainly over a career and for comparisons across players.
5. Why not average? Let us take an example to understand why not. Sehwag and Younis Khan have career averages just over 50 and RpT (runs per Test) values of around 85. In a 10-innings period, match context being comparable, Younis scores 330 at 55 and Sehwag scores 450 at 45. Who has performed closer to his career figures and, for that matter, better? Certainly Sehwag, despite the lower slice average.
6. Let us not forget that we remember series aggregates like 974 (Bradman), 774 (Gavaskar) and 688 (Lara) rather than the averages.
7. The career slices should be non-overlapping and equal in size, other than the last one. Gooch's 333 should be part of one career slice only. Hence a rolling window of innings is not appropriate.
8. 10 innings might seem arbitrary, but it represents a long enough career slice, equivalent to a long 5- or 6-Test series.
9. The keyword is consistency with reference to the player's own career performance levels.
10. We are not looking at absolute high and low values, only at values relative to the concerned player's career figures. Over a 10-innings stretch, Graeme Smith is expected to score about 462 runs (his career RpI of 46.2 × 10) and Habibul Bashar about 300 runs. This will be the basis. If Smith scored 350 runs, it is a below-average performance; if Bashar scored 350 runs, it is an above-average performance.
11. An adjustment is made for the last career slice if it has fewer than 10 innings.
12. The criterion for selection is 3000 or more Test runs. 162 batsmen qualify. It is unfortunate that a few top batsmen like Graeme Pollock and George Headley do not make the cut.
13. The standard deviation (SD) of the slice ratios is used to determine consistency.
14. There were suggestions that I should use more Tests/innings as the basis. I have resisted that idea mainly because I want to be hard on the players. If English batsmen had a great five-Test stint in summer and a poor five-Test sojourn in winter, I want these to be treated as two out-of-the-normal occurrences; I do not want to combine the 10 Tests into a nice, middle-level performance that papers over the cracks. The same applies to all teams. Let us also agree on this: if a batsman scores 180 runs in 10 innings, it is a major cause for concern and should not be covered up by 600 runs in the 10 innings before or after this barren period.
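Points 7 and 11 can be sketched in Python. This is a minimal sketch, assuming a career is supplied as a chronological list of runs per innings; the function names are hypothetical and this is not the author's spreadsheet logic. Each innings lands in exactly one slice and any leftover innings form a shorter final slice, so the number of slices is the innings count divided by 10, rounded up. That agrees with the Grps column of the tables below (Tendulkar's 311 innings give 32 slices, Bradman's 80 give 8). For contrast, a rolling window counts the same big innings in up to 10 different windows.

```python
from math import ceil


def career_slices(innings_runs, size=10):
    """Split a career into consecutive, non-overlapping slices of `size` innings.

    Leftover innings at the end form a final, shorter slice.
    """
    return [innings_runs[i:i + size] for i in range(0, len(innings_runs), size)]


def rolling_windows(innings_runs, size=10):
    """Overlapping windows, shown only for contrast with non-overlapping slices."""
    return [innings_runs[i:i + size] for i in range(len(innings_runs) - size + 1)]


# Hypothetical 23-innings career; the 333 stands in for one big innings.
career = [12, 45, 0, 88, 23, 5, 61, 19, 7, 40,
          333, 15, 2, 70, 31, 9, 54, 28, 11, 46,
          3, 67, 20]

slices = career_slices(career)
print([len(s) for s in slices])                        # [10, 10, 3]
print(sum(333 in s for s in slices))                   # 1: the 333 sits in one slice
print(sum(333 in w for w in rolling_windows(career)))  # 10: counted in ten windows
print(len(career_slices(list(range(311)))), ceil(311 / 10))  # 32 32
```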

For each 10-innings career slice, a ratio is formed between the runs scored in that slice and the player's career-average runs for 10 innings (career RpI × 10). This ratio is called the SPF (Slice Performance Factor). Suppose a batsman has scored 284 runs in a 10-innings slice and his career RpI is 40: the expected runs are 400, so the SPF value is 0.71. If he scored 501 runs, the SPF would be 1.25. The following five groups are formed for the purpose of determining consistency.

A. SPF below 0.67: Well below average; falls into the inconsistent bracket.
B. SPF 0.67-0.90: Below average
C. SPF 0.90-1.10: Around average
D. SPF 1.10-1.33: Above average
E. SPF above 1.33: Well above average; falls into the inconsistent bracket.

Groups B, C and D are considered to be well within the normal range of the player's career levels. The standard deviation of the SPF values is also used to determine consistency.
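The SPF calculation and the five groups can be sketched in a few lines of Python. This is a minimal sketch, not the author's spreadsheet: the function names are hypothetical, and two details the article leaves open are assumptions here. For a short final slice, the expected runs are scaled to the innings actually played (career RpI × innings in the slice), and a value sitting exactly on a group boundary is placed in the higher group.

```python
from bisect import bisect_right

# Lower bounds of groups B, C, D and E; anything below 0.67 is group A.
# Assumption: a value exactly on a boundary goes into the higher group.
BOUNDARIES = [0.67, 0.90, 1.10, 1.33]
GROUPS = "ABCDE"


def spf(slice_runs, career_rpi, innings_in_slice=10):
    """Slice Performance Factor: slice runs divided by the runs expected from career RpI.

    Assumption: for a final slice shorter than 10 innings, pass its actual
    number of innings so that the expected runs are scaled down accordingly.
    """
    return slice_runs / (career_rpi * innings_in_slice)


def spf_group(spf_value):
    """Return the group letter (A to E) for one SPF value."""
    return GROUPS[bisect_right(BOUNDARIES, spf_value)]


# The worked examples from the text: career RpI of 40, so 400 runs expected.
print(round(spf(284, 40), 2), spf_group(spf(284, 40)))   # 0.71 B
print(round(spf(501, 40), 2), spf_group(spf(501, 40)))   # 1.25 D

# The barren 180 and the rich 600 from the list above: separate vs merged.
print(spf_group(spf(180, 40)), spf_group(spf(600, 40)))      # A E
print(spf(180 + 600, 40, 20), spf_group(spf(780, 40, 20)))   # 0.975 C
```

Kept apart, the two stints land in groups A and E; merged into one 20-innings block, they average out to an unremarkable 0.975 in group C, which is exactly the papering over of cracks the 10-innings slices are meant to avoid.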

First, some data tables. The complete table is available for download. The tables and graphs are presented with minimal comment; I will let the erudite readers come up with their own observations.

Batsman           Team Inns   Runs   Avge   RpI  Mean StdDev  Mid3% Grps     A <67   B 67-90  C 90-110 D 110-133    E >133
Tendulkar S.R     Ind   311  15470  55.45  49.7  0.99  0.325   68.8   32         4        10         7         5         6
Dravid R          Ind   286  13288  52.31  46.5  0.99  0.332   72.4   29         4         9         5         7         4
Ponting R.T       Aus   276  13196  53.43  47.8  1.01  0.412   60.7   28         5         5         8         4         6
Kallis J.H        Saf   257  12379  56.78  48.2  1.00  0.348   69.2   26         4         5         6         7         4
Lara B.C          Win   232  11953  52.89  51.5  0.99  0.278   75.0   24         3         8         3         7         3
Border A.R        Aus   265  11174  50.56  42.2  0.99  0.241   81.5   27         2         6        10         6         3
Waugh S.R         Aus   260  10927  51.06  42.0  1.00  0.333   61.5   26         4         7         5         4         6
Jayawardene D.P   Slk   217  10443  51.19  48.1  1.00  0.352   59.1   22         5         4         4         5         4
Gavaskar S.M      Ind   214  10122  51.12  47.3  1.00  0.350   68.2   22         4         6         4         5         3
Chanderpaul S     Win   234   9709  49.28  41.5  1.01  0.304   66.7   24         4         5         5         6         4
Sangakkara K.C    Slk   183   9382  54.87  51.3  0.99  0.425   57.9   19         4         4         5         2         4
Gooch G.A         Eng   215   8900  42.58  41.4  0.99  0.363   72.7   22         4         5         6         5         2
Javed Miandad     Pak   189   8832  52.57  46.7  1.00  0.384   57.9   19         3         6         4         1         5
Inzamam-ul-Haq    Pak   200   8830  49.61  44.1  1.00  0.376   60.0   20         5         3         3         6         3
Laxman V.V.S      Ind   225   8781  45.97  39.0  0.99  0.307   69.6   23         3         7         6         3         4
Hayden M.L        Aus   184   8626  50.74  46.9  0.99  0.385   47.4   19         6         1         6         2         4
Richards I.V.A    Win   182   8540  50.24  46.9  0.99  0.406   73.7   19         2         7         6         1         3
Stewart A.J       Eng   235   8465  39.56  36.0  1.00  0.368   66.7   24         4         6         6         4         4
Gower D.I         Eng   204   8231  44.25  40.3  0.99  0.297   85.7   21         2         7         6         5         1
Sehwag V          Ind   167   8178  50.80  49.0  0.99  0.428   52.9   17         4         5         3         1         4
Boycott G         Eng   193   8114  47.73  42.0  0.99  0.332   70.0   20         3         4         6         4         3
Smith G.C         Saf   174   8043  49.65  46.2  0.99  0.350   66.7   18         2         8         2         2         4
Sobers G.St.A     Win   160   8032  57.78  50.2  1.00  0.307   68.8   16         3         3         4         4         2
Waugh M.E         Aus   209   8029  41.82  38.4  1.00  0.283   76.2   21         2         6         6         4         3
Fleming S.P       Nzl   189   7172  40.07  37.9  1.00  0.247   84.2   19         1         8         4         4         2
Chappell G.S      Aus   151   7110  53.86  47.1  1.00  0.255   81.2   16         2         4         2         7         1
Bradman D.G       Aus    80   6996  99.94  87.5  1.00  0.272   75.0    8         1         2         1         3         1
Flower A          Zim   112   4794  51.55  42.8  0.98  0.436   66.7   12         2         4         4         0         2

To clarify the table contents: RpI means runs per innings. Mean is the mean of the SPF values and is close to 1.0 for all batsmen. StdDev is the standard deviation of the SPF values. Mid3% is the number of slices in groups B, C and D as a percentage of the total number of career slices, which is given in the next column, Grps. The group columns A to E give the number of slices in each SPF group. The complete file is available for download; the details are given at the end. The first table is the core table of batsmen who have scored over 8000 runs in their Test careers. In addition, Don Bradman (no need to explain), Greg Chappell (a modern great), Stephen Fleming (New Zealand) and Andy Flower (Zimbabwe) are included.
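The Mean column sits close to 1.0 for a simple reason. If every slice has exactly 10 innings, each SPF is the slice's runs divided by 10 × RpI, and the slice runs add up to the career total, so the SPFs must average exactly 1. Only a final slice shorter than 10 innings, whose expected runs are scaled to its own length, can nudge the mean away from 1. The sketch below checks this on hypothetical careers; `spf_values` is a hypothetical helper, and the scaling of the short final slice is an assumption about how the last-slice adjustment works.

```python
from statistics import mean


def spf_values(innings_runs, size=10):
    """SPF for each non-overlapping slice, scaling expected runs to the slice length."""
    rpi = sum(innings_runs) / len(innings_runs)  # career runs per innings
    slices = [innings_runs[i:i + size] for i in range(0, len(innings_runs), size)]
    return [sum(s) / (rpi * len(s)) for s in slices]


# Hypothetical careers: 40 innings (four full slices) and 43 innings (a 3-innings tail).
full = [(7 * i) % 97 for i in range(40)]
with_tail = full + [150, 0, 5]

print(round(mean(spf_values(full)), 10))       # 1.0
print(round(mean(spf_values(with_tail)), 3))   # 1.021, pulled up by the short, high-scoring tail
```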

Contrary to what all of us may have perceived, Lara is remarkably consistent on this 10-innings basis. His SD of 0.278 is second only to Border's amongst the top-20 batsmen. To confirm that this is not a fluke, look at his Mid3%, which is quite high at 75.0. Among the top-20 batsmen, this is again bettered only by Border and Gower.

Consistency is determined in two ways. The first is statistical: the standard deviation (SD) of all the SPF values is computed. Low SD values indicate consistent players and high SD values indicate inconsistent players. The usual approach of using the coefficient of variation (SD divided by the mean) is not required, since the means for almost all players are around 1.00 and the coefficient of variation would therefore be almost identical to the SD. Shown below is the table of the 20 batsmen with the lowest SDs, i.e. the most consistent batsmen.
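The statistical measure can be sketched with Python's standard library. The article does not say whether the population or the sample standard deviation is used, so this sketch assumes the population form (`statistics.pstdev`); the SPF values and the `consistency` helper are hypothetical. Both invented batsmen have a mean SPF of 1.0, which shows why the coefficient of variation adds nothing here.

```python
from statistics import mean, pstdev

# Hypothetical SPF values for two batsmen over eight slices each.
steady = [0.95, 1.05, 0.90, 1.10, 1.00, 0.98, 1.02, 1.00]
streaky = [0.40, 1.60, 0.55, 1.45, 0.70, 1.30, 1.20, 0.80]


def consistency(spfs):
    """Return (mean, SD, coefficient of variation) for a list of SPF values.

    Assumption: the population standard deviation is used.
    """
    m = mean(spfs)
    sd = pstdev(spfs)
    return m, sd, sd / m


for name, spfs in (("steady", steady), ("streaky", streaky)):
    m, sd, cv = consistency(spfs)
    # With the mean at 1.0, the CV and the SD print as the same number.
    print(f"{name}: mean={m:.3f} SD={sd:.3f} CV={cv:.3f}")
```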

Batsman           Team Inns   Runs   Avge   RpI  Mean StdDev  Mid3% Grps     A <67   B 67-90  C 90-110 D 110-133    E >133
Greig A.W         Eng    93   3599  40.44  38.7  0.99  0.171  100.0   10         0         2         5         3         0
Redpath I.R       Aus   120   4737  43.46  39.5  1.00  0.195   91.7   12         0         4         4         3         1
Ranatunga A       Slk   155   5105  35.70  32.9  1.01  0.202   93.8   16         0         7         4         4         1
Hassett A.L       Aus    69   3073  46.56  44.5  0.99  0.204   85.7    7         1         1         2         3         0
Fredericks R.C    Win   109   4334  42.49  39.8  1.00  0.205  100.0   11         0         3         4         4         0
Pietersen K.P     Eng   143   6654  49.29  46.5  1.01  0.210   86.7   15         1         3         7         3         1
Knott A.P.E       Eng   149   4389  32.75  29.5  1.00  0.228   86.7   15         1         4         5         4         1
Saeed Anwar       Pak    91   4052  45.53  44.5  1.02  0.230  100.0   10         0         4         1         5         0
Smith R.A         Eng   112   4236  43.67  37.8  1.00  0.236   83.3   12         1         4         4         2         1
Hutton L          Eng   138   6971  56.67  50.5  0.99  0.237   85.7   14         1         5         2         5         1
Wright J.G        Nzl   148   5334  37.83  36.0  1.00  0.238   80.0   15         1         6         2         4         2
Border A.R        Aus   265  11174  50.56  42.2  0.99  0.241   81.5   27         2         6        10         6         3
Ijaz Ahmed        Pak    92   3315  37.67  36.0  0.98  0.246   90.0   10         0         4         2         3         1
Fleming S.P       Nzl   189   7172  40.07  37.9  1.00  0.247   84.2   19         1         8         4         4         2
Mushtaq Mohammad  Pak   100   3643  39.17  36.4  1.00  0.248   70.0   10         1         3         2         2         2
Hunte C.C         Win    78   3245  45.07  41.6  1.00  0.248   87.5    8         0         3         3         1         1
Collingwood P.D   Eng   115   4260  40.57  37.0  0.98  0.249   91.7   12         1         3         4         4         0
Strauss A.J       Eng   167   6604  41.02  39.5  1.00  0.250   82.4   17         0         9         2         3         3
Sutcliffe H       Eng    84   4555  60.73  54.2  0.98  0.252   77.8    9         1         2         4         1         1
Chappell G.S      Aus   151   7110  53.86  47.1  1.00  0.255   81.2   16         2         4         2         7         1

Tony Greig is the surprise leader in this table, with a low SD value of 0.171. The most notable modern batsman in this table is Pietersen, with an excellent SD value of 0.210. Other than Pietersen and Strauss, there is no current batsman in this list. Like Lara, Pietersen has certainly surprised us. Maybe there is a lot of substance behind that exaggerated swagger. He spoke of the many hours of practice he had put in when talking about his Colombo classic, and maybe that is paying off. It is also possible that, contrary to what one associates with him, he has had neither extended bad patches nor purple patches. I also wish he would stop making silly statements.

The alternative method is based on common sense rather than on a statistical measure. The two extreme groups, A and E, are considered significant departures from the career levels. The counts for the middle three groups are added and divided by the total number of slices to get the Mid3%, which reflects the consistency of the player. Shown below is the table of batsmen with the highest Mid3% values.
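The Mid3% figure is simple counting. Here is a minimal sketch with a hypothetical function name that takes a batsman's five group counts. The check uses group counts from the tables (Tendulkar: 4, 10, 7, 5, 6; Greig: 0, 2, 5, 3, 0) and reproduces their listed Mid3% values.

```python
def mid3_percent(grp_a, grp_b, grp_c, grp_d, grp_e):
    """Percentage of career slices that fall in the middle groups B, C and D."""
    total_slices = grp_a + grp_b + grp_c + grp_d + grp_e
    return 100 * (grp_b + grp_c + grp_d) / total_slices


print(f"{mid3_percent(4, 10, 7, 5, 6):.1f}")  # Tendulkar: 68.8 (22 of 32 slices)
print(f"{mid3_percent(0, 2, 5, 3, 0):.1f}")   # Greig: 100.0 (10 of 10 slices)
```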

Batsman           Team Inns   Runs   Avge   RpI  Mean StdDev  Mid3% Grps     A <67   B 67-90  C 90-110 D 110-133    E >133
Fredericks R.C    Win   109   4334  42.49  39.8  1.00  0.205  100.0   11         0         3         4         4         0
Saeed Anwar       Pak    91   4052  45.53  44.5  1.02  0.230  100.0   10         0         4         1         5         0
Greig A.W         Eng    93   3599  40.44  38.7  0.99  0.171  100.0   10         0         2         5         3         0
Ranatunga A       Slk   155   5105  35.70  32.9  1.01  0.202   93.8   16         0         7         4         4         1
Redpath I.R       Aus   120   4737  43.46  39.5  1.00  0.195   91.7   12         0         4         4         3         1
Collingwood P.D   Eng   115   4260  40.57  37.0  0.98  0.249   91.7   12         1         3         4         4         0
Ijaz Ahmed        Pak    92   3315  37.67  36.0  0.98  0.246   90.0   10         0         4         2         3         1
Hunte C.C         Win    78   3245  45.07  41.6  1.00  0.248   87.5    8         0         3         3         1         1
Pietersen K.P     Eng   143   6654  49.29  46.5  1.01  0.210   86.7   15         1         3         7         3         1
Knott A.P.E       Eng   149   4389  32.75  29.5  1.00  0.228   86.7   15         1         4         5         4         1
Gower D.I         Eng   204   8231  44.25  40.3  0.99  0.297   85.7   21         2         7         6         5         1
Cook A.N          Eng   135   6184  48.69  45.8  1.00  0.291   85.7   14         1         5         5         2         1
Hutton L          Eng   138   6971  56.67  50.5  0.99  0.237   85.7   14         1         5         2         5         1
Slater M.J        Aus   131   5312  42.84  40.5  0.98  0.263   85.7   14         1         4         4         4         1
Hassett A.L       Aus    69   3073  46.56  44.5  0.99  0.204   85.7    7         1         1         2         3         0

These are the batsmen with high middle-three-group percentages, indicating a high degree of consistency. In the bowler tables, six bowlers had 100% of their slices in the middle three groups. Batting seems to be slightly harder in this respect, since there are only three such batsmen, all from the 1970s to the 1990s. Roy Fredericks, the attacking West Indian batsman, leads the threesome, followed by Saeed Anwar and Tony Greig. Collingwood is also there, as are Pietersen and Cook, which may be one reason for England's pre-eminence.

Now for some special graphs.

Top run-scoring batsmen

Top run-getters in Test careers
© Anantha Narayanan

The top nine batsmen, all of whom have crossed 10,000 Test runs, are featured. Most of these batsmen clearly do not exhibit a high level of consistency. The only exceptions seem to be Allan Border and, for the first two-thirds of his career, Jayawardene.

Most consistent: Based on low SD values

batsmen with low standard deviation values
© Anantha Narayanan

As already discussed, this graph is led by Tony Greig. His fairly low SD of 0.171 indicates a very consistent career, which is borne out by his placement in the next graph as well. However, it should be noted that the lowest SD value for bowlers is a much lower 0.124. Pietersen finds a place in both consistency graphs.

Most consistent: Based on high Middle-3-group % values

Batsmen with high middle-3 group % values
© Anantha Narayanan

Unlike the bowlers, six of whom had 100% of their slices in the middle groups, only three batsmen achieve this: Fredericks, Saeed Anwar and Greig.

Least consistent: Based on high SD values

Batsmen with high standard deviation values
© Anantha Narayanan

These graphs look like a dying person's cardiograph. These batsmen have swung up and down throughout their careers. Gambhir exemplifies this: a poor start, a great rise and then an equally steep fall. Vettori has had such a Jekyll-and-Hyde career that it is not surprising to see him here. In his first 70 innings Vettori averaged 18; in the next 100 innings he averaged well over 35.

Least consistent: Based on low Middle-3-group % values

Batsmen with low middle-3 % values
© Anantha Narayanan

These two methods of determining consistency clearly differ: the two graphs feature different sets of batsmen.
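A small hypothetical example shows how the two measures can disagree. The SPF values below are invented for illustration, and the sketch assumes the population SD and group boundaries at 0.67, 0.90, 1.10 and 1.33 with each group including its lower bound. Batsman X always lands just outside the middle band; Batsman Y is nearly always slightly below par but has one huge slice.

```python
from bisect import bisect_right
from statistics import pstdev

BOUNDARIES = [0.67, 0.90, 1.10, 1.33]  # lower bounds of groups B, C, D, E (assumed inclusive)


def mid3_percent(spfs):
    """Percentage of slices whose SPF falls in groups B, C or D."""
    middle = sum(1 for x in spfs if 1 <= bisect_right(BOUNDARIES, x) <= 3)
    return 100 * middle / len(spfs)


# Hypothetical batsmen, both with a mean SPF of 1.0 over eight slices.
batsman_x = [0.66, 1.34] * 4        # always just outside the middle band
batsman_y = [0.85] * 7 + [2.05]     # steady, apart from one huge slice

for name, spfs in (("X", batsman_x), ("Y", batsman_y)):
    print(f"{name}: SD={pstdev(spfs):.3f} Mid3%={mid3_percent(spfs):.1f}")
# X: SD=0.340 Mid3%=0.0
# Y: SD=0.397 Mid3%=87.5
```

By SD alone X looks the more consistent of the two; by Mid3%, Y does. Divergences of this kind are what put different names in the two graphs.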

Batsmen with top RpI figures

Batsmen with highest RpI figures
© Anantha Narayanan

To complete the analysis, here are the charts for the top batsmen by runs per innings, since most of them would not appear in the first chart, which is based on career runs. Again, inconsistency seems to be the trend.

I think special mention must be made of two batsmen, Tony Greig and Kevin Pietersen. Tony Greig never went outside the middle three groups. That is some level of consistency. Pietersen, amongst the modern batsmen, has surprised us with his high degree of consistency.

The Excel sheet containing the complete data for all 162 batsmen is available for download/viewing. I have strengthened it by colour-coding the individual SPF values through dynamic formatting.

Ed Smith's thought-provoking piece on randomness and form, "When is poor form just randomness?", made me realize that the measure I have created here can be applied to Ed Smith's axiom. If I averaged the SPF values of the top six batsmen or top four bowlers for every Test or innings, we would know the lowest SPF averages (very poor form, as a group of six/four players) and the highest SPF averages (very rich form, as a group of six/four players). That is for a later article.
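The proposed group-form measure could be prototyped along these lines. This is a hypothetical sketch only: the article does not define how an SPF would be attached to a single Test, so the per-Test values below are invented, the measure is assumed to be the plain average of the six batsmen's values, and all names are illustrative.

```python
from statistics import mean

# Hypothetical data: for each Test, the SPF values of the six top-order batsmen.
# How a per-Test SPF would be defined is left open in the article.
team_spfs = {
    "Test 1": [0.55, 0.60, 0.48, 0.70, 0.52, 0.65],
    "Test 2": [1.05, 0.95, 1.10, 0.90, 1.00, 0.98],
    "Test 3": [1.40, 1.25, 1.50, 1.35, 1.20, 1.45],
}


def group_form(spfs):
    """Average SPF of a group of players in one Test."""
    return mean(spfs)


# Order the Tests from poorest to richest group form.
ranked = sorted(team_spfs, key=lambda test: group_form(team_spfs[test]))
print("Poorest group form:", ranked[0])   # Test 1
print("Richest group form:", ranked[-1])  # Test 3
```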

Anantha Narayanan has written for ESPNcricinfo and CastrolCricket and worked with a number of companies on their cricket performance ratings-related systems
