How does one create a ranking of the top bowling performances in Test cricket?

The parameters and methodology explained

09-Jun-2018

Richard Hadlee bowls, New Zealand v England, 3rd Test, Auckland, 3rd day, February 12, 1984

It's been 17 years since my Wisden top 100 Test innings of all time was first unveiled to the public and now it's time for an upgrade.

Back in 2001, I received a lot more bouquets than brickbats, but while the bouquets make me happy, the brickbats help shape my future analyses. I am grateful and indebted to all the cricket enthusiasts who took the time to study the lists, appreciated the effort, and commented. This exercise is dedicated to the enthusiasts who have waited for this update, confident that I would do a good job of ironing out the wrinkles and incorporating all the new improvements in the database. Of course, this effort will also reveal the memorable performances in the last decade and a half and show where these rank against the best of the best.

The Wisden 100 background
Anthony Bouchier, the CEO of Wisden Online, with whom I was working as a consultant in 2000, first brought up the idea of ranking the top 100 individual performances. During the next 12 months, this one-sentence concept became a massive project as I interacted with Wisden stalwarts in the UK and India, got the database checked and cleaned up, and wrote scores of C programs. The response to the Wisden 100 lists was heartening. The wide distribution of performances made sure no one period got undue credit. The list brought into the limelight many unknown and forgotten performances, such as the batting masterpieces of Clem Hill, Azhar Mahmood and Kim Hughes; and the bowling performances of Hugh Tayfield, James White, Devon Malcolm and Graham McKenzie among others.

When Wisden Online wound up operations, it bequeathed the rights to the Wisden 100 to me. However, in deference to the farsightedness and pioneering efforts of Wisden Online, I decided I would retire the name "Wisden 100". The new lists are my enhancements and reflect my philosophy.

I have given below the basis for the initial creation of Wisden 100 Bowling Performances tables.

Test ratings for bowling - a brief note on the calculations

1. Bowling Base Points
2. Pitch Index
3. Batting Quality Index
4. Dismissed Batsman Quality Index
5. Bowling Accuracy Index
6. Highest Wickets Index
7. Match Status Index
8. Result Contribution Index

When I think back over this base, after 17 years, I am proud of the fact that the basis is still sound. My current base is not that much different to the initial one. (I will not explain these here but the descriptions of the parameters are good indicators of what they cover.) As time went by, it was clear to me that there was a need for some important tweaks in the parameters and weights.

- The Bowling Base points, based on the wickets taken, got just over 30% weightage back then, which was perhaps a little high and has been rectified now.

- The Bowling Accuracy got more weight than warranted.

- The first two innings got a bit of a raw deal. I am reminded of the comments of a long-standing reader of mine: "We tend to overlook the first- and second-innings efforts in favour of the third- and fourth-innings efforts because of the drama associated with the latter." Very well said, and true.

- The "Highest wickets index" was quite a lightweight one; it didn't really mean a lot.

- The database was in the initial stages and quite raw, without the refinements listed later in the article.

First, let me present the top 20 entries in the original Wisden 100 table for bowling performances.

Top bowling performances (the 2001 list)
RtgPts	Test	Year	Bowler	Analysis	For	Vs	Ground
253.9	437	1957	HJ Tayfield	9 / 113	SAF	Eng	Johannesburg
248.6	1443	1999	A Kumble	10 / 74	IND	Pak	Delhi
241.7	428	1956	JC Laker	10 / 53	ENG	Aus	Old Trafford
238.8	179	1929	JC White	8 / 126	ENG	Aus	Adelaide
237.1	1029	1985	RJ Hadlee	9 / 52	NZL	Aus	Brisbane
234.4	1266	1994	DE Malcolm	9 / 57	ENG	Saf	The Oval
226.1	905	1981	RGD Willis	8 / 43	ENG	Aus	Headingley
225.1	234	1934	H Verity	8 / 43	ENG	Aus	Lord's
224.3	233	1934	WJ O'Reilly	7 / 54	AUS	Eng	Trent Bridge
224.1	543	1968	GD McKenzie	8 / 71	AUS	Win	Melbourne
223.7	1423	1998	M Muralitharan	9 / 65	SLK	Eng	The Oval
223.2	138	1921	AA Mailey	9 / 121	AUS	Eng	Melbourne
222.1	527	1962	LR Gibbs	8 / 38	WIN	Ind	Bridgetown
222.0	1539	2001	Harbhajan Singh	8 / 84	IND	Aus	Chennai
219.7	849	1979	Sarfraz Nawaz	9 / 86	PAK	Aus	Melbourne
218.1	699	1972	RAL Massie	8 / 53	AUS	Eng	Lord's
216.9	676	1971	JA Snow	7 / 40	ENG	Aus	Sydney
216.7	942	1982	Imran Khan	8 / 60	PAK	Ind	Karachi
215.1	1243	1994	PS de Villiers	6 / 43	SAF	Aus	Sydney
215.1	405	1955	IW Johnson	7 / 44	AUS	Win	Georgetown

Database enhancements
I'll say that if I had version 1.3 of the database in 2001, today I have version 9.7. My own knowledge of and insight into the database have increased tremendously. Extremely valuable inputs from readers have gone into refining and fine-tuning the database. However, I am glad to say that many of the innings and bowling performances from the Wisden 100 tables are still at the top positions in the current lists.

These are the database improvements:

1. Availability of Career-Location-to-date (CLtd) batting and bowling averages at the level of each Test. This took me years to develop and is, almost certainly, the most important data improvement. Topsy-turvy careers such as those of Michael Hussey, Daniel Vettori, Ricky Ponting, and Kumar Sangakkara, among batsmen, and Muttiah Muralitharan, Ian Botham and Alan Davidson, among bowlers, are now handled properly.

2. Pitch Quality Index (PQI) is a complex measure that compares the runs the batsmen were expected to score, and the wickets the bowlers were expected to capture, at that particular type of location (home/away/neutral) with the runs actually scored and the wickets actually captured. This is done for both innings and both teams. The PQI for the match is determined using the comparative values for the combinations. Many methods, including visual inspection, were used to validate the PQI data.

3. The Weighted Bowling Quality for the innings is a huge improvement. This is done by taking the CLtd averages of the bowlers who bowled in the innings. Thus, when Imran Khan played in 18 Tests purely as a batsman, his bowling figures were not considered. Recently when Trent Boult and Tim Southee bowled 124 balls between them to dismiss England for 58 in Auckland, the Weighted Bowling Quality was very good since only these two bowlers bowled. Therefore, Craig Overton, who scored an invaluable 33 from No. 9, got additional credit.

4. The Batting quality is determined based on the CLtd Batting averages and is a true reflection of that particular innings.

5. The fact that every nightwatchman innings has been identified and marked as such, using the average batting position and the position the batsman batted in, enables a much more accurate determination of the batting strength.

6. The standardised Wicket value (WkV) determination has enabled fine-tuning of the very important Dismissed Batsmen Quality parameter. Again, CLtd averages are the base for WkV determination.

7. The Runs per Weighted Innings (RpWI) metric is a great improvement over the batting average as well as the Runs per Innings measures, and is a true indicator of the career of the batsmen.

8. There is clear acknowledgement of neutral locations and the development of related figures. Till 2001, there were very few neutral Tests. However, as of today, more than 40 Tests have been played in neutral locations.

9. The non-contextual Contribution analysis undertaken by me and my frequent collaborator Milind Pandit is a top-down analysis. The match points flow down to the innings, then to the teams, then to the functions, and finally to the players. Notwithstanding the overall complexity of the system, the final results are extremely easy to use and understand. They are used to distribute the team result points to the individual players.
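The Career-Location-to-date idea in item 1 can be illustrated with a minimal sketch. It assumes that a CLtd batting average uses only the Tests a player completed before the current one, at the same type of location. The record layout, the function name and the handling of a batsman with no prior dismissals are all hypothetical; the actual database logic is not described in this article.

```python
from dataclasses import dataclass

@dataclass
class BattingRecord:
    test_no: int      # chronological Test number
    location: str     # "home", "away" or "neutral"
    runs: int
    dismissed: bool

def cltd_batting_average(history, test_no, location):
    """Average at `location`, using only Tests played before `test_no`."""
    prior = [r for r in history
             if r.test_no < test_no and r.location == location]
    outs = sum(1 for r in prior if r.dismissed)
    if outs == 0:
        return None  # assumption: a real system would need a fallback value
    return sum(r.runs for r in prior) / outs

# Hypothetical career, for illustration only.
history = [
    BattingRecord(1, "home", 40, True),
    BattingRecord(2, "away", 10, True),
    BattingRecord(3, "home", 80, True),
    BattingRecord(4, "home", 120, False),
]
print(cltd_batting_average(history, 4, "home"))  # 60.0
```

The point of the design is that a batsman who improved late in his career is judged, in any given Test, only on what he had achieved up to that point and in those conditions.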

The current basis for analysis
The first bit of relevant information is that the Bowling Performance Ratings will be on a scale of 1000, as is current practice. The 1000 mark is virtually unreachable: maybe a bowler from Zimbabwe capturing, say, 9 for 25 against Australia at the MCG while successfully defending a sub-100 total in the fourth innings could get to that point (perhaps in 2020, when Steven Smith and David Warner are back to support the few other batsmen averaging 50-plus in the Australian team).

The highest values for each measure highlighted are extracted from the 3461 performances that received in excess of 430 rating points (exactly 50% of the rating points for the best performance). For validation purposes, the average across these 3461 performances is used. An average across all 44,500-plus bowling spells will not make any sense and may distort the overall picture. The bowling analyses have been normalised to six-ball overs.
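The normalisation to six-ball overs mentioned above is simple arithmetic. The sketch below assumes overs are recorded in the usual "overs.balls" notation and that the number of balls per over in the original match is known; the function name is hypothetical.

```python
def to_six_ball_overs(overs: str, balls_per_over: int) -> str:
    """Convert 'overs.balls' from any over length to six-ball notation."""
    whole, _, part = overs.partition(".")
    balls = int(whole) * balls_per_over + int(part or 0)
    return f"{balls // 6}.{balls % 6}"

# 16 eight-ball overs are 128 balls, i.e. 21 six-ball overs and 2 balls.
print(to_six_ball_overs("16.0", 8))  # 21.2
print(to_six_ball_overs("2.3", 8))   # 3.1
```

Working with a string rather than a floating-point number avoids treating "2.3" as two and three-tenths overs.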

1. Base points (Wickets captured)
The Bowling Base points are given for wickets captured. A gradually increasing scale is used. It is a paradox that the tenth wicket, in all probability that of a No. 11 batsman, carries a higher value than any of the first five wickets. However, this reflects the fact that among all the bowling performances to date, there have been just two ten-wicket hauls, 16 instances of nine-fors and 76 of eight-fors. These rare performances deserve the higher weight given to the later wickets.

The Manchester effort of Jim Laker and the Anil Kumble masterclass in Delhi get the highest points in this measure.
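A gradually increasing scale of this kind could look like the sketch below. The per-wicket values are invented purely for illustration; the article does not publish the actual scale.

```python
# Hypothetical per-wicket values: each successive wicket is worth more.
WICKET_VALUES = [10, 11, 12, 13, 14, 16, 18, 20, 23, 26]

def base_points(wickets: int) -> int:
    """Total base points for a haul of `wickets` (0 to 10)."""
    if not 0 <= wickets <= 10:
        raise ValueError("wickets must be between 0 and 10")
    return sum(WICKET_VALUES[:wickets])

print(base_points(5))                    # 60
print(base_points(10))                   # 163
print(base_points(10) - base_points(9))  # 26, the value of the tenth wicket
```

With any such scale a ten-for earns well over twice the base points of a five-for, which is how the rarity of the big hauls gets rewarded.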

2. Dismissed-batsmen quality
The new measure called Wicket value (WkV) is used for this work. This is an important index, which distinguishes between two bowlers who have taken five wickets each; the first one has accounted for batsmen 1-5 and the second one for batsmen 6-10. This is done in two parts. The first is to give credit for the dismissal itself. The batsman's CLtd batting average is used for this. The highest value was given to the bowler who dismissed Don Bradman and so on. The second part rewards the timing of dismissal. A bowler who dismissed Virat Kohli for a low score gets higher credit than a bowler who dismissed Kohli after he reached a hundred.

Kapil Dev's magnificent effort of 30.5-7-85-8 against Pakistan in 1982-83 gets the highest points in this measure. The high home averages of the Pakistani batsmen contributed significantly to this.
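The two-part logic described in this section can be sketched as follows. The formula, the `timing_weight` parameter and the function name are assumptions made for illustration; only the two ideas (credit for the batsman's CLtd average, plus extra credit for an early dismissal) come from the description above.

```python
def wicket_value(cltd_average: float, score_at_dismissal: int,
                 timing_weight: float = 0.5) -> float:
    """Hypothetical wicket value: batsman quality plus a timing bonus."""
    if cltd_average <= 0:
        return 0.0
    # Part 1: credit for the dismissal itself, based on the CLtd average.
    quality = cltd_average
    # Part 2: share of the batsman's expected runs that were prevented.
    saved = max(0.0, 1.0 - score_at_dismissal / cltd_average)
    return quality * (1.0 + timing_weight * saved)

print(wicket_value(50.0, 5))    # 72.5  (good batsman out early)
print(wicket_value(50.0, 120))  # 50.0  (same batsman out after a hundred)
print(wicket_value(20.0, 5))    # 27.5  (lesser batsman out early)
```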

3. Top batsmen dismissed index
This is a new parameter I introduced to take care of situations where most teams have players of average standard playing for them in the first few years after they enter the Test arena. Many of these batsmen have low averages despite the fact that they are batting at the top. However, it is important to recognise bowlers who take top-order wickets (top six plus No. 7). That is how the teams win Tests. Hence, some credit is given to dismissing top batsmen. Care is taken that the nightwatchmen playing in top positions are identified and handled appropriately.

It is not a surprise that Laker and Kumble get the most points in this regard. In addition, Saqlain Mushtaq's 8 for 164 against England in 2000-01, Muttiah Muralitharan's 9 for 51 against Zimbabwe in 2001-02, and Subhash Gupte's 9 for 102 against West Indies in 1958-59 share top billing on this measure.

4. Bowling strike rate
This is very important because in Test matches the objective for all bowlers is to take wickets. The actual strike rate and the relative strike rate are both acknowledged. Bowling figures of 20-2-50-4 are very good when the team has captured ten wickets in 150 overs. If three wickets are the limit, Haris Sohail's recent spell of 1.0-0-1-3 takes the top spot. If this is increased to five, Ernie Toshack's amazing spell of 3.1-1-2-5 against India in 1947-48 gets the top honours.

However, among the qualifying performances, Harbhajan Singh, for his spell of 4.3-0-13-5 against West Indies in 2006, Shane Watson, for his spell of 5.0-2-17-5 against South Africa in 2011-12, and Hugh Trumble, for his spell of 6.5-0-28-7 against England in 1903-04, get the highest points on this measure.
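The 20-2-50-4 example in this section can be worked through in a few lines. The sketch assumes that the relative strike rate is simply the team's balls per wicket divided by the bowler's; how the actual and relative figures are combined into points is not stated in the article.

```python
def strike_rate(balls: int, wickets: int) -> float:
    """Balls bowled per wicket taken."""
    return balls / wickets

def relative_strike_rate(bowler_balls: int, bowler_wickets: int,
                         team_balls: int, team_wickets: int) -> float:
    """How many times faster than the team the bowler took wickets."""
    return (strike_rate(team_balls, team_wickets)
            / strike_rate(bowler_balls, bowler_wickets))

# 20 overs for 4 wickets, in an innings of 150 overs and 10 wickets.
print(strike_rate(20 * 6, 4))                            # 30.0
print(strike_rate(150 * 6, 10))                          # 90.0
print(relative_strike_rate(20 * 6, 4, 150 * 6, 10))      # 3.0
```

A wicket every 30 balls is respectable on its own; against a team figure of 90 it is three times as quick, which is why the relative figure matters.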

5. Bowling accuracy
This is the parameter with the least weight among the nine. However, it is important to recognise bowling accuracy to some extent since accurate bowling has its value in most innings barring those in which the bowler is defending 500 or bowling with a lead of over 200 runs. However, in order to be fair to the bowlers, the individual bowling accuracy is matched against the team's bowling accuracy. Figures of 20-2-50-2 are very good when the team has conceded 400 runs in 100 overs.

Max Walker, for his accurate spell of 21.2-8-15-6 against Pakistan in 1972-73, and Fred Titmus, for his spell of 26.0-17-19-5 against New Zealand in 1965, get the highest points on this measure.

Not that there have not been more accurate spells. How can anyone forget the Bapu Nadkarni marathon of 32-27-5-0 against England in 1963-64? But that spell does not get anywhere in the ratings exercise because of the absence of wickets.
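The comparison of individual accuracy with team accuracy can be illustrated with the 20-2-50-2 example from this section. Treating accuracy as runs per six-ball over, and the relative figure as a plain ratio, are assumptions made for this sketch.

```python
def economy(runs: int, balls: int) -> float:
    """Runs conceded per six-ball over."""
    return runs * 6 / balls

def relative_economy(bowler_runs: int, bowler_balls: int,
                     team_runs: int, team_balls: int) -> float:
    """Values above 1 mean the bowler was tighter than the team."""
    return economy(team_runs, team_balls) / economy(bowler_runs, bowler_balls)

# 50 runs off 20 overs, in an innings of 400 runs off 100 overs.
print(economy(50, 20 * 6))                                  # 2.5
print(economy(400, 100 * 6))                                # 4.0
print(round(relative_economy(50, 120, 400, 600), 2))        # 1.6
```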

6. Pitch Quality
The match PQI is used for this calculation. This is a normalised value dealing with extremes, like India v New Zealand, Delhi, 1955-56: a batsman's paradise supreme - 1093 runs for ten wickets at one end, and Australia v South Africa, Melbourne, 1931-32: an MCG gluepot - 234 runs for 30 wickets, at the other. It is normalised to a 100-based value, with appropriate proportionate calculations either side of the median. On this basis, the extreme values are 82 for India-New Zealand and 14 for Australia-South Africa. A fifty in a match with a PQI of 30 would be more valuable than a 150 in a match with a PQI of 75. In contrast, 6 for 60 in a match with a PQI of 30 would be less valuable than 4 for 90 in a match with PQI of 75.

Subhash Gupte, in taking 7 for 128 against New Zealand in Hyderabad in 1955-56, faced the toughest pitch. (It is true that his bowling colleagues also bowled on this tough pitch.)

7. Overall batting quality
How good the opposing batsmen were is an important factor when evaluating bowlers' performances. Taking wickets is one thing, but sustained bowling against top-quality batting outfits is another. The Batting strength index for the innings is determined using the CLtd values for the top seven batsmen. Care is taken to exclude the nightwatchman, where applicable.

Eric Hollies, during his spell of 5 for 131, faced the toughest batting septet of all time in Bradman's farewell Test in 1948.
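One way to picture the Batting strength index is shown below. The sketch assumes a simple mean of the CLtd averages of the first seven batsmen in the order, with a nightwatchman skipped and the next batsman taking his place; the real index may weight positions differently. The line-up values are hypothetical.

```python
def batting_strength(lineup, top_n=7):
    """lineup: (cltd_average, is_nightwatchman) pairs in batting order."""
    regular = [avg for avg, nightwatchman in lineup if not nightwatchman]
    top = regular[:top_n]
    return sum(top) / len(top)

# Hypothetical innings with a nightwatchman sent in at No. 3.
lineup = [
    (45.0, False), (40.0, False), (12.0, True), (55.0, False),
    (50.0, False), (38.0, False), (35.0, False), (30.0, False),
    (15.0, False),
]
print(round(batting_strength(lineup), 2))  # 41.86
```

Without the nightwatchman flag, the 12.0 average at No. 3 would be counted and the genuine No. 7 would be left out, understating the strength of the batting the bowlers actually faced.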

8. Innings type and status
This is the one factor that enabled me to give due recognition to first- and second-innings performances. All the innings situations have been summarised below.

- The first two innings are considered together to distinguish between situations where the team scores are 120/100, 120/400, 400/120 and 400/400. In the first case, dismissing the other team for a low score gets due credit but is diluted because it is a bowling pitch. The 120/400 and 400/120 are similar. The bowlers who dismissed the other team for 120 have bowled on a middling pitch and they get more credit. The bowlers who dismissed the other team for 400 have bowled on the same pitch and they get less credit. The 400/400 indicates a good batting pitch and the bowlers get a lot more credit. The situations in between are handled appropriately.

- The third innings is a tough one. Is the bowler bowling with the match approximately on equal terms, or backed by a lead (small to substantial), or handicapped by a deficit (small to substantial)? Innings wins are straightforward. A notional credit is given to the winning team's bowlers. For the other matches, the quantum of the lead decides the credit. A performance of 5 for 34 by Ian Johnson against South Africa in Durban in 1949-50 gets very good reward.

- The fourth innings is straightforward. What matters finally is the target and the margin of win. A combination of these two numbers is used to realise interim factors that determine the reward. The lower the target and/or the lower the margin, the more the bowlers are rewarded. Fanie de Villiers gets recognised for his 6 for 43 in Sydney in 1993-94 - a target of 117 and victory by five runs. Where the target is high or the winning margin is high, the rewards are suitably diluted.

The innings-type related value was the highest for Frederick Spofforth against England in 1882, when Australia defended 84 runs and won by seven runs.
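For the fourth innings, the combination of target and margin could be sketched as below. The formula and the two scale constants are invented for illustration; the article says only that the two numbers are combined and that a lower target and/or a lower margin earns more.

```python
def fourth_innings_factor(target: int, margin_runs: int,
                          target_scale: float = 300.0,
                          margin_scale: float = 150.0) -> float:
    """Hypothetical reward factor between 0 and 1 for a successful defence."""
    target_part = max(0.0, 1.0 - target / target_scale)
    margin_part = max(0.0, 1.0 - margin_runs / margin_scale)
    return (target_part + margin_part) / 2.0

# Sydney, 1993-94: a target of 117 defended by five runs.
print(round(fourth_innings_factor(117, 5), 3))    # 0.788
# A comfortable defence of a big target earns far less.
print(round(fourth_innings_factor(350, 200), 3))  # 0.0
```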

9. Contribution to match result
This is a direct link to the result, the location and the relative team strengths. There are three locations (Home, Neutral and Away), three results (Win, Draw and Loss) and three team-strength situations (Strong, Comparable and Weak). The team is allotted x number of points based on these three parameters. The number of points will be highest for an away win by a weak team against a strong team and lowest for a home loss by a strong team against a weak team.

Winning a Test is important. I am not a great proponent of the American axiom of "winning is everything" but I feel strongly that winning is something. And the ratings process must give due credit to the bowling performances that were primarily responsible for a win as opposed to the ones that enabled the teams to draw.

The team points are then allocated to the bowlers based on their contribution to the result. The complex Match contribution values are used for this purpose. Hence, there has to be a combination of a magnificent bowling performance (eight-plus wickets) and an outstanding result (away win against a much stronger team) to get high points on this measure. As has already been seen, the contribution is worked out on a non-contextual basis and this is a good way of allocating the team points. The first eight parameters are contextual.

Sarfraz Nawaz, for his spell of 47.2-7-86-9 against Australia at the MCG in 1978-79, got the most points in this regard. It can be seen that this recognition is for a combination of magnificent performances by both Sarfraz and Pakistan.
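The two steps of this measure, allotting team points and then sharing them out, can be sketched as follows. The point values and the contribution figures are hypothetical; the only properties taken from the description above are the three-by-three-by-three structure, the best and worst cases, and the allocation by contribution.

```python
from itertools import product

LOCATION = {"home": 0, "neutral": 1, "away": 2}
RESULT = {"loss": 0, "draw": 1, "win": 2}
# Strength of the opposition relative to the team being rated.
OPPOSITION = {"weaker": 0, "comparable": 1, "stronger": 2}

def team_points(location: str, result: str, opposition: str) -> float:
    """Hypothetical team points; the result carries the most weight."""
    return 10.0 * (1 + 3 * RESULT[result]
                   + LOCATION[location] + OPPOSITION[opposition])

def allocate(points: float, contributions: dict) -> dict:
    """Share team points among players in proportion to contribution."""
    total = sum(contributions.values())
    return {name: points * c / total for name, c in contributions.items()}

combos = list(product(LOCATION, RESULT, OPPOSITION))
print(max(combos, key=lambda c: team_points(*c)))  # ('away', 'win', 'stronger')
print(min(combos, key=lambda c: team_points(*c)))  # ('home', 'loss', 'weaker')

shares = allocate(team_points("away", "win", "stronger"),
                  {"Bowler A": 40.0, "Bowler B": 10.0, "Others": 50.0})
print(shares["Bowler A"])  # 44.0
```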

A general view on the Bowler Performance Ratings
The ratings work is done by allotting points for the nine measures. The overall weight ranges from around 5% to 20%. The individual performance weights could have a wider range. For a performance to be in the top ten or thereabouts, it cannot score low on even a single measure. For a performance to be in the top 50, it could sustain sub-par scores on one or possibly even two measures. For a performance to be in the top 200, maybe two to three sub-par scores could be sustained. And so on.

As fans, what we perceive to be a top performance could be lacking in something or the other. You could expect Murali's 9 for 51 against Zimbabwe to be in the top ten. After all, this is the nearest any bowler has come to a ten-wicket haul without actually taking one. You may even have a strong emotional connection to the spell. However, the performance just misses out on a top 50 place because: 1) Sri Lanka were playing at home; 2) they were far stronger than Zimbabwe; 3) Zimbabwe's batting was decidedly weak - barring Andy Flower, the other batsmen did not even reach a CLtd average of 30. These are enough reasons to push this performance down, although it's still comfortably in the top 60.

And, for the converse reasons, Murali's 8 for 70 at Trent Bridge and 9 for 65 at The Oval are comfortably in the top 30.

The bottom line is that it is extremely tough to take a good guess at what performances are ranked where. This problem is compounded because we are limited by our knowledge. Even if we have an encyclopaedic knowledge of the game, we are influenced by what we saw rather than by what we might have only read about. Take Trevor Bailey's 7 for 34 in Kingston, 1953-54. When one thinks of great bowling performances, this spell does not immediately spring to mind even though it's one of the few bowling performances with no negative points. Moving across different time periods, the same can be said about Tayfield's 9 for 113 against England or Richard Hadlee's 9 for 52 at the Gabba.

That is where computer-based analysis is superior. It has no heart to contend with and works with no limits or limitations. In a way, the only acquired bias could be mine, the creator's. So I have kept my heart to myself, set aside my own likes and dislikes, and assumed that the parameters I have set are correct. I should also be ready to fine-tune the system to accommodate the changes in the derived data content. My use of the newly available data derivatives would change the results, hopefully not drastically.

In view of the importance of the allocated weights, I am sure readers will ask me: "How do you know you have the correct weights?" A very valid question indeed. The answer is that there is no Eureka moment. It is a combination of common sense, knowing what to look for, looking for out-of-place performances and checking the overall summaries. My gut says what to expect in the overall percentage values. If "Base wicket value" exceeds 20%, I will be concerned. If "Bowling accuracy" exceeds 5%, I won't be pleased. If "Contribution to match result" is below 10% or above 15%, I will start to worry. If one performance is an undeserved total outlier, I won't sleep peacefully. And I have done a lot of consultation to ensure that there is consensus about the rankings.
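The three thresholds quoted above lend themselves to an automated check after each run. The sketch below encodes only those limits; the summary percentages passed in are hypothetical, and the function name is an assumption.

```python
# (lower limit, upper limit) in percent; None means no limit on that side.
LIMITS = {
    "Base wicket value": (None, 20.0),
    "Bowling accuracy": (None, 5.0),
    "Contribution to match result": (10.0, 15.0),
}

def out_of_range(weights: dict) -> list:
    """Return the parameters whose overall weight breaks a stated limit."""
    flagged = []
    for name, (low, high) in LIMITS.items():
        w = weights[name]
        if (low is not None and w < low) or (high is not None and w > high):
            flagged.append(name)
    return flagged

# Hypothetical summary percentages from one run of the ratings program.
run = {"Base wicket value": 21.3, "Bowling accuracy": 4.8,
       "Contribution to match result": 12.0}
print(out_of_range(run))  # ['Base wicket value']
```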

If a bowling performance against a seemingly weak team ends up being placed very high, I make sure it deserves the high place. I have a special trick of selecting five random performances, say 18th, 92nd, 278th, 945th and 1987th, after each run of the program. Then I check those performances in the context of the match and satisfy myself that the placing looks reasonably okay in overall terms.
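The spot check described above amounts to drawing a handful of random ranks after each run. A minimal sketch, assuming a uniform draw over the 3461 qualifying performances (the article does not say how the five are chosen):

```python
import random

def spot_check_ranks(n_performances: int, k: int = 5, seed=None) -> list:
    """Pick k distinct ranks, in ascending order, for manual review."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, n_performances + 1), k))

print(spot_check_ranks(3461))
```

Passing a `seed` makes the selection repeatable, which helps when the same five performances need to be compared before and after a change in weights.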

This compilation is the result of hard work over hundreds of days and is not for the faint-hearted. There is no easy tool to do the checking. That is one reason why I get upset when some readers ask inane, superficial and silly questions after a five-minute read of complex articles. Since I have spent nearly a thousand hours on this project, I request that the readers should at least spend that many seconds.

A preview of part two
I have presented below three charts that provide a preview of the analyses in part two. This will give you an idea of the way the analyses have been done.

First is a summary of the weights secured by the nine parameters.

I am very happy with the weights, which have been summarised across the 3461 qualifying performances. I know that all these are for taking three or more wickets, so I know that across all performances, the "Wickets Captured" weight will come to be 20%. The four parameters that get more than 10% weight are probably the most important and get their deserved places. Of course, it should be understood that the weights for individual performances will be quite different. A fourth-innings defence of a low target will have a high "Innings Type/Status" value. Any bowlers who bowled against the great batting line-up of the 1948 Australians will have high "Overall Batting Quality" values. Bowlers who performed well on pitches with PQI greater than 60 will have high "Pitch Quality" values. And so on.

Next is a grouping of the 3461 bowling performances that qualified for the final summarising, by period.

Since the number of Tests played in different periods varies from 134 during 1876-1914 to 823 during 2000-2018, I had to adopt an alternative measure to compare. I worked out the frequency of top bowling performances per Test across the periods. During the first period, this was the highest at 1.81 performances per Test. This is understandable in view of the uncovered wickets, the infancy of batting skills, and three-day Tests. Then there was a dramatic drop to 1.48 during the period 1920-1939. Again, understandable since this period had some of the best batsmen who ever graced the crease: Bradman, Hammond, Hobbs, Hutton, McCabe, Headley etc.

The third era, of 1946-1959, saw the bowlers holding sway, with 1.55 performances per Test. The next period, 1960-1979, saw a lot of defensive, draw-at-all-costs batting and this is indicated by the all-time-low occurrence of 1.40 top bowling performances per Test. Then, there was a period with several world-class bowlers - Holding, Marshall, Hadlee, Warne, Murali, McGrath, Ambrose and Kumble. This has resulted in the next two periods showing increasing frequency in top bowling performances: 1.45 during 1980-1999 and 1.53 during 2000-2018. The overall average during the 2303 Tests played so far is 1.50.
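The per-Test frequencies are straightforward to reproduce from the figures quoted in this article. The sketch below checks the overall average and, for the two periods whose Test totals are given, derives the approximate number of qualifying performances implied by the stated frequency.

```python
QUALIFYING_PERFORMANCES = 3461
TESTS_PLAYED = 2303

overall = QUALIFYING_PERFORMANCES / TESTS_PLAYED
print(round(overall, 2))  # 1.5

# Implied counts for the periods where the number of Tests is stated.
print(round(134 * 1.81))  # 243 performances in 1876-1914
print(round(823 * 1.53))  # 1259 performances in 2000-2018
```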

Now let us look at a pie chart of the top bowling performances by innings.

In the 2303 Tests played so far, there have been 2303 first innings, 2284 second innings, 2223 third innings and 1543 fourth innings. This explains why only 14.8% of the performances are in the fourth innings. In terms of frequency, the third innings is the best, with 46.7%; followed by the second innings, with 44.4%; the first innings, with 38.6%; and finally, the fourth innings, with 33.2%. The last one is understandable since many fourth innings are inconsequential low-target innings or draws with no chance of a positive result.
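The link between the two sets of percentages can be checked with the figures above. The sketch multiplies each innings count by its frequency to get the implied number of top performances, then works out the fourth innings' share of the total.

```python
INNINGS_PLAYED = {1: 2303, 2: 2284, 3: 2223, 4: 1543}
FREQUENCY_PCT = {1: 38.6, 2: 44.4, 3: 46.7, 4: 33.2}

implied = {i: INNINGS_PLAYED[i] * FREQUENCY_PCT[i] / 100
           for i in INNINGS_PLAYED}
total = sum(implied.values())
fourth_share = 100 * implied[4] / total

print(round(implied[4]))       # 512 fourth-innings performances
print(round(total))            # 3453
print(round(fourth_share, 1))  # 14.8
```

The implied total of 3453 is slightly short of 3461 only because the published frequencies are rounded to one decimal place.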

Readers' thoughts welcome
Other than the featured performances, any reference to the ones outside my list of top 30 will be without any revelation of the actual position or rating points. I will only indicate a broad range. This will avoid any unnecessary comments back and forth.

How should I present the featured data? A slight change in the weights will move the performances around. However, I am confident that even if I change the weights, no more than a fourth of the performances will go out of the top 30 table, and even then only down to positions 30-50. These are all once-in-a-lifetime performances.

So, do I present the top 30 as a single table with the rating points, or do I present them as groups of performances with only a range of rating points as the indicator? Maybe in sequence, as firsts among equals, within the group but avoiding the actual rating points? I think the readers should have a say. May I request you to send in your comments? In anticipation of these comments, let me keep all options alive.

Anantha Narayanan has written for ESPNcricinfo and CastrolCricket and worked with a number of companies on their cricket performance ratings-related systems