Stats Analysis

# How long can a batsman score poorly before it's more serious than just a matter of bad form?

Why form is more about technique than scores

Kartikeya Date
19-Jan-2021
The fact that cricket is organised as a sequence of adversarial episodes - deliveries - has some interesting implications for the idea of a player's form. First, consider a dead-ball sport like snooker. The direction, speed and spin with which the cue ball is struck determine, according to the laws of mechanics, whether or not the target ball will be pocketed, and whether or not the cue ball will arrive in the desired position for the next colour. The great snooker player's mastery lies in their being able to control the direction, speed and spin of the cue ball with precision.
Unlike snooker, cricket is probabilistic. A batsman can have a perfect technique that is working with robotic precision, and still receive the unplayable ball, which moves off the seam or swings late in its trajectory such that it is beyond the reflexes of even the fastest human to adjust in time. The great fast bowler might bowl the perfect line and length but they could be bowling on an unresponsive, slow pitch, where the batsman has time to play everything off the surface. In cricket, the bowler and the batsman cannot determine or guarantee success. They can only manage the probability of it. The greatest bowlers are those who give themselves the best chance of taking wickets more consistently than their peers. The greatest batsmen are those who give themselves the best chance of scoring runs relative to their peers.
Just how good a chance do the best players in the game give themselves? Glenn McGrath took more than two wickets in 43% of the innings he bowled in. Sachin Tendulkar got to 50 in 40% of the innings in which he was dismissed. So if you showed up at a random Test match and watched McGrath bowl or Tendulkar bat, more often than not you would see McGrath take fewer than three wickets in that innings, and Tendulkar score less than 50.
As with all such probabilistic situations, the point of method, technique, tactics and strategy for the batsman and the bowler is to maximise the probability of success. If good form is essentially a well-oiled, properly functioning system of methods, techniques, tactics and strategies, then can it be identified in the scores?
There is the additional fact in cricket that only the bowler gets to begin each play. The batsman can only react to what is bowled. For example, a batsman who is "in form" could still get an unplayable ball early in his innings, and, as commentators often like to point out, "be in good enough nick to get bat to ball". In such an instance, goes the conventional wisdom, being out of form is more likely to produce a play-and-miss than being in form.
Given that the quality of actions does not determine outcomes but only shifts the probabilities of outcomes, what does it really mean to be in form? How can we tell if a batsman has been dismissed for a low score because of bad form or despite good form? At what point is a sequence of low scores just part of the expected variation in scores, and at what point does it indicate something further?
One way of approaching this is to examine sequences of scores. But first, keep in mind that failure is the norm in cricket. The batsman who averages 50 typically has a median score of about 30 (Virat Kohli's median Test score is 28, while Brian Lara's is 33). This means that about half of the best players' innings end before their individual score reaches about 30.
Consider a sequence of three consecutive scores. Of these, let "u" be the number of scores below the median for the player, and let "o" be the number of scores of at least the median for the player. The possible u|o combinations for three scores are 0|3, 1|2, 2|1 and 3|0. Two combinations involve the majority of scores being below the median (let's call these "under"), and two involve the majority of scores being at least equal to the median ("over"). What is the player's average for the innings following an under sequence, and what is the player's average following an over sequence?
For example, consider the following sequence of 21 scores (the batsman was dismissed each time): 76, 42, 77, 215, 7, 108, 10, 243, 14, 34, 58, 7, 3, 5, 16, 55, 4, 17, 9, 0, 5. The median score here is 16. The scores listed in the "Under (3)" column below are the scores following a sequence of three scores in which at least two are below the median. The scores listed under "Over (3)" are those that follow a sequence of three scores of which at least two are at least as high as the median.

| | Under (3) | Over (3) |
|---|---|---|
| Scores | 243, 34, 5, 16, 55, 0, 5 | 215, 7, 108, 10, 14, 58, 7, 3, 4, 17, 9 |
| Average | 51.1 over seven innings | 41.1 over 11 innings |

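The worked example above can be reproduced with a short script. This is only a minimal sketch, not the code behind the article's tables: the function name `split_following_scores` is made up for illustration, the median is taken over the whole list of scores, and every innings is assumed to have ended in a dismissal, so that the average is simply runs divided by innings.

```python
from statistics import median

def split_following_scores(scores, window=3):
    """Group each score by the preceding `window` scores (hypothetical helper).

    A score goes into `under` if most of the previous `window` scores were
    below the median, and into `over` otherwise.
    """
    med = median(scores)  # median over the whole record, as in the example
    under, over = [], []
    for i in range(window, len(scores)):
        previous = scores[i - window:i]
        below = sum(1 for s in previous if s < med)
        if below > window / 2:
            under.append(scores[i])
        else:
            over.append(scores[i])
    return med, under, over

# The 21 scores from the example (batsman dismissed each time).
scores = [76, 42, 77, 215, 7, 108, 10, 243, 14, 34, 58,
          7, 3, 5, 16, 55, 4, 17, 9, 0, 5]

med, under, over = split_following_scores(scores)
under_avg = sum(under) / len(under)  # assumes no not-outs
over_avg = sum(over) / len(over)

print(f"median = {med}")
print(f"Under (3): {under}, average {under_avg:.1f}")
print(f"Over (3):  {over}, average {over_avg:.1f}")
```

Run as written, this gives a median of 16, an under average of 51.1 over seven innings and an over average of 41.1 over 11 innings, matching the figures in the example.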
For a start, I calculated this for sequences ranging from three innings to 23 innings for all batsmen who scored at least 5000 Test runs and averaged at least 50. The results are in the table below. The column title "Under - Over 7" indicates that the figure is the difference between the average for the under sequence and the average for the over sequence for sequences of seven scores. A positive figure (with a background in shades of blue) means that the average for the under sequence is higher than the average for the over sequence. A negative figure (shades of red) means that the average for the under sequence is lower.
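To show what one "Under - Over" figure involves, here is a hedged sketch that sweeps odd sequence lengths for a single list of scores. It reuses the 21-score example from earlier rather than any real career record, so the numbers it prints are pure noise. The helper `under_minus_over` is a hypothetical name, only odd lengths are used (as in the lengths named in the text), and all innings are again assumed to be dismissals.

```python
from statistics import median

def under_minus_over(scores, window):
    """Return (difference, n_under, n_over) for one sequence length.

    The difference is the average after "under" sequences minus the average
    after "over" sequences, or None if either group is empty.
    """
    med = median(scores)
    under, over = [], []
    for i in range(window, len(scores)):
        below = sum(1 for s in scores[i - window:i] if s < med)
        # For an odd window, a majority means more than window // 2 scores.
        (under if below > window // 2 else over).append(scores[i])
    if not under or not over:
        return None, len(under), len(over)
    diff = sum(under) / len(under) - sum(over) / len(over)
    return diff, len(under), len(over)

# Illustrative input only: the 21-score example, not a real career.
scores = [76, 42, 77, 215, 7, 108, 10, 243, 14, 34, 58,
          7, 3, 5, 16, 55, 4, 17, 9, 0, 5]

# Odd lengths from 3 to 23, skipping any that leave no following innings.
results = {w: under_minus_over(scores, w)
           for w in range(3, 24, 2) if w < len(scores)}

for w, (diff, n_under, n_over) in results.items():
    label = "n/a" if diff is None else f"{diff:+.1f}"
    print(f"Under - Over {w}: {label} ({n_under} under, {n_over} over)")
```

With a real career record in place of `scores`, each printed line would correspond to one cell in a player's row. The counts in brackets show how few innings sit behind each figure, which is one reason individual rows are noisy.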
For individual players, this type of record tends to be noisy. A player like Don Bradman, who tended to produce large scores and had a relatively short career, tends to have high fluctuations in this sort of analysis. But basically, what this record looks at is the pattern of scoring for these elite batsmen.
The last row gives the average of the under-over difference for all the players for each sequence. The pattern of this average of net under-over averages suggests that a short run of mostly below-median scores is less likely to be followed by a better score than a long one. There appears to be, at least for these elite batsmen, a threshold sequence length beyond which a reversion towards the player's overall average is more or less guaranteed. This appears to lie somewhere between nine and 15 innings.
If we ignore the identity of the individual batsmen and treat this set of elite successful batsmen (5000-plus career runs, 50-plus career average) as one large chronological batting record, and then organise the record according to length of sequence and not by individual player (see the table below), then the threshold, which lies somewhere between the nine- and 15-innings sequence length in the average of averages above, narrows to somewhere between 15 and 17 innings. Here all sequences of innings for all these elite batsmen are considered together and examined against their combined median.
Note that these are very successful batsmen, and so they don't really ever "fail" as such. But essentially, if a successful, elite Test batsman has a sequence of 17 scores in which at least nine are below the median, then in the next innings he's likelier to be "due a score" (as commentators often say) than if he has had a good run (i.e. crossed his overall median score at least nine times in the previous 17 innings). This is less likely to be the case for sequences shorter than 17 innings, and more likely to be the case for longer sequences.
For example, for innings following a seven-innings sequence in which they fail to reach their median score in the majority of innings, they average 52.4, while in the innings following a seven-innings sequence in which they reach at least their median scores in at least four innings, they average 56.3. Any team in any era in Test cricket would gladly accept either of these numbers!
If we expand the category of batsmen considered to include all batsmen who scored at least 5000 Test runs at an average of at least 35, then the under-over margin narrows, but the under - over figure still turns positive after 15 innings.
A glance at the pattern of the numbers of dismissals in the over and under categories suggests that as the sequence length increases from three to five to seven to eventually 23, the number of sequences in which a majority of scores are below the median declines. The longer the under sequence, the more likely it is to be followed by a big innings. And so, long under sequences are rare, since the longer the sequence, the more likely it is to already include a number of large scores. Note that as the length of the sequence approaches the total number of innings, the numbers of innings above and below the median approach equality.
The player's ultimate average over the large number of Test caps they end up winning represents the player's quality. Lean scoring periods and big scoring periods are just deviations from this mean. Do these deviations signal a shift in form? The record of the performance in the (n+1)th innings suggests not. Rather, it is external factors that shape the probabilities of outcomes in the encounter between bat and ball that seem to be dominant.
Even the best batsmen fail to reach 50 in a majority of their innings. This obviously does not mean that the best batsmen are out of form in the majority of their innings. Nor does it mean that the peculiar shortcomings of a particular batsman (like, say, Daryll Cullinan's difficulties against Shane Warne in particular and wristspin more generally) are in play in the majority of their innings. What appears as a good year (let's say, Mohammad Yousuf's world-record year, or Viv Richards' dominant 1976), should be thought of as a year in which the player's shortcomings were not in play compared to other years. What appears as a bad year is one in which the player's peculiar shortcomings were particularly in play. Or it could just be due to an especially good or bad run of fortune. For instance, India were bowled out for 36 after only 32 not-in-control deliveries in the third innings in Adelaide last year, but survived 132 not-in-control deliveries at a cost of only five wickets in the fourth innings in Sydney. If a player falls to one of his first mistakes more often than usual in a given year, he's unlikely to finish high in the averages.
Form can be identified in terms of technique. For example, changes to a batsman's footwork, or the shape of the backlift or trigger movement are observable, and experts like former players and coaches are constantly looking for these things. These can affect run output, but the very fact that they are identifiable by commentators on television suggests that they must be equally evident to the players and their colleagues in elite Test teams. These things can also be ironed out relatively quickly, often in a matter of three or four days, in the nets. At times technical issues might creep into a batsman's game and create problems for them in, say, India, but perhaps not in England.
The record suggests, though, that if a batsman isn't able to reverse a bad run of scores in 15-17 innings (roughly eight to ten Tests), then the problem is likely to be serious and the bad run of scores is likely to be about more than just the natural variation in scoring patterns brought about by a combination of factors like the quality of opposing bowling and the pitch conditions. Then it may be time to worry. A player's form ought to be judged not by the scores but by observing their patterns of movement and technique. This requires an expert eye. The question of form is in the domain of technical experts, not the competitive record.

Kartikeya Date writes at A Cricketing View. @cricketingview