Stats Analysis

Which Test batsmen are the most consistent? Let's look at how often they score fifties

Consistency is not only about scoring fifties or hundreds frequently, but also about maintaining even gaps between such scores

Himanish Ganjoo
Virat Kohli in a chat with Joe Root, England v India, 1st Test, Edgbaston, day four, August 4, 2018

The average gap between scores of 50 and above for Virat Kohli is 2.86 innings, while for Joe Root it is 2.63  •  Getty Images

When analysing batsmen, the notion of "consistency" is frequently discussed - the mark of a player who scores well and does so frequently enough. A batsman with a wide variation in scores, with a hundred in one game and a failure in the next, is colloquially deemed erratic, and thus "inconsistent". But how exactly does one quantify this notion of consistency? Is there a rigorous measure that can pin down this criterion, which is abstract but often bandied about?
There have been multiple forays into measuring consistency. Ananth Narayanan has used the concept of the median - which is the value that lies at the midpoint if all the scores of a batsman are arranged in an ascending or descending order - to measure how lopsided a batsman's distribution of scores is. In another analysis, he broke batting careers into ten-Test chunks, and measured how many of these phases were low-scoring, as a proxy for consistency.
For now, though, let's go back to the simple definition of consistency I outlined above: someone who scores "well enough" frequently enough is a consistent player. Let us begin our analysis at this first step.
We must define a score that is "good", and look at how often players cross that barrier. For a start, let's say 50 runs or more makes for a good innings, and look at when batsmen cross that score. For instance, here is Virat Kohli's Test career, with his 50-plus scores in blue.
In 145 innings, Kohli has scored 50 or more runs 49 times. If we look at the grey gaps between his consecutive crossings of 50, the average "wait" between two such good innings is 1.9 innings. Counting the scoring innings itself, this means he scores 50 or more every 2.9th innings on average. We shall use this second number, the mean gap, to compare players.
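The gap calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and the sample scores are invented, not Kohli's actual innings. It assumes the "wait" counts only the innings strictly between two consecutive 50-plus scores.

```python
def waits_between_crossings(scores, barrier=50):
    """Count the innings between each pair of consecutive scores >= barrier."""
    # Positions (0-based) of the innings in which the barrier was reached.
    crossings = [i for i, runs in enumerate(scores) if runs >= barrier]
    # Innings strictly between consecutive crossings.
    return [later - earlier - 1 for earlier, later in zip(crossings, crossings[1:])]


# Hypothetical scores, invented for illustration only.
sample_scores = [12, 75, 3, 40, 103, 51, 0, 22, 8, 64]

waits = waits_between_crossings(sample_scores)
mean_wait = sum(waits) / len(waits)  # assumes at least two crossings

print(waits)          # [2, 0, 3]
print(mean_wait + 1)  # the mean gap: a 50-plus score every (mean_wait + 1)th innings
```

For this made-up sequence the mean wait is about 1.67 innings, so the batsman would reach 50 roughly every 2.67th innings.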
Although the average gap between consecutive crossings of any barrier is correlated with the batting average, it is an enlightening way of looking at things: for how long do good batsmen "fail" on average before making a good score again?
Let's first look at the average gap between starts, which I define as scores of 20 or more. Here are the best batsmen by this measure, considering players with 5000 or more Test runs. To clarify, a "mean gap" of 2.0 means the player reaches 20 every second innings in his career on average.
Gap between 20+ scores (Innings)
Name  RPI (runs per innings)  Mean Gap
 DG Bradman  87.45  1.34
 JB Hobbs  53.04  1.38
 WR Hammond  51.78  1.45
 RB Kanhai  45.45  1.49
 SPD Smith  55.17  1.49
 KF Barrington  51.95  1.50
 MEK Hussey  45.51  1.52
 L Hutton  50.51  1.52
 ML Hayden  46.88  1.52
 GS Sobers  50.20  1.53
 IVA Richards  46.92  1.56
 AB de Villiers  45.89  1.57
 Javed Miandad  46.73  1.57
 CH Lloyd  42.94  1.58
 RB Richardson  40.75  1.58
 KC Sangakkara  53.22  1.59
 JH Kallis  47.46  1.59
 R Dravid  46.46  1.59
 RT Ponting  46.61  1.59
 DI Gower  40.35  1.61
As we can see, the best players get starts about every 1.5 innings on average. Don Bradman, as usual, is in a league of his own at 1.34. Test players with the highest averages dominate the top of this table, as is expected. Michael Hussey, despite having a lower average than the table toppers, scores high on this measure.
What is this gap when we raise the barrier to 50 runs, which can be considered a successful innings across situations and conditions?
Gap between 50+ scores (Innings)
Name RPI Mean Gap
 DG Bradman  87.45  1.86
 KF Barrington  51.95  2.33
 SPD Smith  55.17  2.35
 JB Hobbs  53.04  2.37
 Misbah-ul-Haq  39.56  2.45
 KC Sangakkara  53.22  2.53
 KD Walters  42.86  2.60
 JE Root  44.96  2.63
 IVA Richards  46.92  2.63
 KS Williamson  46.26  2.64
 L Hutton  50.51  2.65
 JH Kallis  47.46  2.67
 GS Sobers  50.20  2.70
 Mohammad Yousuf  48.27  2.70
 Inzamam-ul-Haq  44.15  2.72
 SM Gavaskar  47.30  2.73
 AB de Villiers  45.89  2.76
 SR Tendulkar  48.39  2.77
 GS Chappell  47.09  2.78
 RT Ponting  46.61  2.79
 BC Lara  51.52  2.79
 MEK Hussey  45.51  2.81
 DA Warner  46.74  2.85
 V Kohli  49.93  2.86
 Javed Miandad  46.73  2.86
 R Dravid  46.46  2.89
 RB Kanhai  45.45  2.91
 DCS Compton  44.33  2.91
 S Chanderpaul  42.38  2.92
 AR Border  42.17  2.92
Two notable new entrants in this table are Misbah-ul-Haq and Joe Root, both pillars of their middle orders. Despite having lower runs-per-innings figures than others on the list, they cross 50 every 2.45 and 2.63 innings respectively, which is a testament to their reliability. Doug Walters went past 50 in close to 40% of his knocks, and it shows in his high rating here. Brian Lara and Kohli are placed towards the bottom of this top-30 list, though their averages are high; that indicates their erratic scoring, with high peaks.
However, just looking at the average gap between good innings does not tell the full story. The actual spread of these high scores tells us more: how regularly a batsman has scored well. The average gap tells us the frequency of good scores, but what if those scores occur irregularly in a career? A batsman who crosses 50 with regularity is more consistent than one who alternates between patches of very frequent scoring and lean runs of low scores.
To illustrate this, let's look at two hypothetical batsmen, Pauli and Erwin, with careers of 20 innings each. Each has crossed 50 ten times, but the patterns of their scoring are very different. In the graph below, a 50-plus score is denoted by a blue bar, and all other innings are blank. We can see how Erwin's scores occur more haphazardly than those of Pauli, who has crossed fifty with perfect regularity.
Both careers have the same average gap between crossings, which is every second innings. However, if we look at the sizes of the gaps themselves, we see a difference in the players' spread of good scores.
For Erwin, the gap between the first two fifties is two innings (he did not cross that barrier in innings three and four). His subsequent gaps are 0, 2, 0, 0, 5, 0, 0, 0 innings. On the other hand, the gaps for Pauli are all one innings each. Visually, it is simple to see how Pauli's career is more consistent.
Is there a way to quantify this? To boil it down to one number, we calculate the standard deviation of the gaps. We add up the square of the distance of each gap from the average gap, and take the mean of that. Remember, the average gap is one innings for both, since both cross 50 every second innings.
For Erwin, this would look like:
( (2 - 1)^2 + (0 - 1)^2 + (2 - 1)^2 + (0 - 1)^2 + (0 - 1)^2 + (5 - 1)^2 + (0 - 1)^2 + (0 - 1)^2 + (0 - 1)^2 ) / 9
We then take the square root of that. For Erwin, this gives us 1.6. For Pauli, it gives us 0.
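The arithmetic for both players can be checked with a short Python sketch. It assumes the population form of the standard deviation (dividing by the number of gaps, nine, as in the expression above), which is what `statistics.pstdev` computes. The gap lists are read off the description of the two careers.

```python
from statistics import pstdev

# Innings between consecutive 50-plus scores, as described in the text.
erwin_gaps = [2, 0, 2, 0, 0, 5, 0, 0, 0]
pauli_gaps = [1] * 9

for name, gaps in (("Erwin", erwin_gaps), ("Pauli", pauli_gaps)):
    mean_gap = sum(gaps) / len(gaps)  # 1.0 for both players
    spread = pstdev(gaps)             # population standard deviation of the gaps
    print(name, mean_gap, round(spread, 1))

# Erwin 1.0 1.6
# Pauli 1.0 0.0
```

Using the sample standard deviation (`statistics.stdev`, which divides by eight here) would give a slightly larger value for Erwin, so the choice of divisor is worth stating when comparing numbers.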
Simply put, the "spread" of these gaps is higher for a more inconsistently constructed career, given the same average gap.
This example tells us that the spread of gaps divided by the average gap is a good measure of the regularity of scores in a career.
To summarise the method: we first define a barrier and label the crossing of that barrier as a good innings. We take the gaps between these and see how much they fluctuate. Let's plot these two quantities for all Test batsmen with more than 5000 runs, taking 50 runs as our barrier.
This shows the spread of these gaps is highly correlated with the average gap. To account for how much a batsman's scoring pattern actually deviates from this relationship, we divide the spread by the mean gap. This tells us how evenly scores are spread relative to a player's own frequency of scoring.
Remember that in our Pauli-Erwin example as well, it was important that they both had the same average gap. So we have to divide this spread by the average gap if we want to meaningfully compare batsmen.
Let us call this measure the Spread Index (SI), which is obtained by dividing the standard deviation of the gaps by the mean gap.
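As a rough sketch, the whole method can be put into one Python function. Everything here is an assumption made for illustration: the function and helper names are hypothetical, the standard deviation is the population form, and the mean gap is counted as in the tables (a value of 2.0 means a crossing every second innings). The two careers are the hypothetical Pauli and Erwin, with innings numbers reconstructed from the gaps given earlier.

```python
from statistics import mean, pstdev


def spread_index(scores, barrier=50):
    """Standard deviation of the gaps between crossings, divided by the mean gap."""
    # Innings numbers (1-based) in which the barrier was reached.
    crossings = [n for n, runs in enumerate(scores, start=1) if runs >= barrier]
    # Table convention: innings from one crossing to the next, inclusive of the latter.
    intervals = [later - earlier for earlier, later in zip(crossings, crossings[1:])]
    if not intervals:
        raise ValueError("need at least two scores at or above the barrier")
    return pstdev(intervals) / mean(intervals)


def career(crossing_innings, length=20):
    """Build a score list with a placeholder 50 in the given innings and 0 elsewhere."""
    return [50 if n in crossing_innings else 0 for n in range(1, length + 1)]


pauli = career(set(range(2, 21, 2)))                     # every second innings
erwin = career({2, 5, 6, 9, 10, 11, 17, 18, 19, 20})     # same count, bunched

print(round(spread_index(pauli), 2))  # 0.0
print(round(spread_index(erwin), 2))  # 0.82
```

Both careers have the same mean gap, so the difference in SI comes entirely from how unevenly Erwin's fifties are spaced.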
For the 96 batsmen under consideration, here is a table of those with the best SI, considering the barrier for a good innings to be 50 runs.
Batsmen with the best SI (for 50+ scores)
Name RPI Mean Gap Spread SI
 DG Bradman  87.45  1.86  1.08  0.58
 GS Chappell  47.09  2.78  1.73  0.62
 RB Kanhai  45.45  2.91  1.87  0.64
 JB Hobbs  53.04  2.37  1.54  0.65
 Younis Khan  47.41  3.16  2.08  0.66
 JE Root  44.96  2.63  1.75  0.66
 KP Pietersen  45.20  3.12  2.09  0.67
 KS Williamson  46.26  2.64  1.77  0.67
 SPD Smith  55.17  2.35  1.58  0.67
 MP Vaughan  38.90  3.92  2.63  0.67
 JL Langer  42.29  3.42  2.32  0.68
 Javed Miandad  46.73  2.86  1.95  0.68
 IR Bell  37.69  3.01  2.06  0.68
 TT Samaraweera  41.38  3.00  2.07  0.69
 MJ Slater  40.55  3.74  2.59  0.69
 DI Gower  40.35  3.58  2.50  0.70
 ME Waugh  38.42  3.12  2.20  0.71
 AJ Strauss  39.53  3.71  2.62  0.71
 DC Boon  39.06  3.57  2.53  0.71
 MD Crowe  41.56  3.37  2.39  0.71
 RN Harvey  44.88  3.02  2.14  0.71
 JH Kallis  47.46  2.67  1.90  0.71
 GS Sobers  50.20  2.70  1.93  0.71
 CA Pujara  45.63  2.95  2.11  0.72
 Mohammad Yousuf  48.27  2.70  1.95  0.72
 L Hutton  50.51  2.65  1.93  0.73
 ME Trescothick  40.73  3.33  2.43  0.73
 A Ranatunga  32.94  3.69  2.69  0.73
 GP Thorpe  37.68  3.28  2.40  0.73
 VVS Laxman  39.03  3.07  2.25  0.73
After Bradman, Greg Chappell is the batsman whose 50-plus scores are most evenly spaced. Rohan Kanhai shines with a low mean gap coupled with a low SI; his consistency of scoring brings him up to the third spot. Younis Khan, who has an average of more than three innings between crossings of fifty, makes those scores with relatively high regularity.
A lower SI is generally better. However, since the SI is the ratio of the spread to the mean gap, a player with high values of both can have the same SI as a player with low values of both. The comparison between Younis and Root makes this clear: Root has a much smaller mean gap and spread than Younis, but his SI is the same as Younis'. Hence, for the full picture, we need to show the mean gap as well.
A batsman with low values of both crosses 50 often, while also doing it evenly, and a scatter plot of the two lets us compare the two facets of batting between players.
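The Younis-Root comparison can be checked directly from the table values with a small Python sketch. The figures are the rounded mean gap and spread from the SI table, so the recomputed ratio may differ from the published SI in the last digit.

```python
# (mean gap, spread) for 50-plus scores, taken from the SI table above.
table_rows = {
    "Younis Khan": (3.16, 2.08),
    "JE Root": (2.63, 1.75),
}

spread_indices = {
    name: spread / mean_gap for name, (mean_gap, spread) in table_rows.items()
}

for name, si in spread_indices.items():
    mean_gap, spread = table_rows[name]
    print(f"{name}: mean gap {mean_gap}, spread {spread}, SI {si:.3f}")
```

Both ratios come out close to 0.66 even though Root's mean gap is about half an innings shorter, which is why the SI needs to be read alongside the mean gap.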
Kane Williamson and Root are very similar in the pattern of their 50-scoring, and although Steven Smith's relative regularity of scoring is similar to theirs, he crosses 50 more often. On the other hand, Kohli, despite having a high average, goes through highs and lows, which places him north-east of the other three. Cheteshwar Pujara is less prone to patches of good and bad form than Kohli, but his scoring is also slightly less frequent.
Surprisingly, Sachin Tendulkar and Lara are almost at the same frequency, but Tendulkar scored a little less regularly than Lara, which is expected in such a long career.
Finally, let's look at the same plot with a barrier of a century. When it comes to making hundreds, the most noticeable change is Kohli shifting into the elite, and Ken Barrington scoring hundreds highly irregularly.
In fact, we can use his case to illustrate the utility of the Spread Index: Barrington has scored hundreds as often as other elite batsmen, but they have come in bunches, separated by long century-less streaks. This bunching is displayed by his place on the plot: a low mean gap, but a very high SI.