A couple of years back I did a two-part analysis on Test player consistency. You can access the batsmen-specific article here. You have to move to the top of the page to view the article. Overall, it was well received. The analysis was based on a "slice concept". I split the careers of Test batsmen into slices of ten innings and looked at consistency across these slices. As many readers had pointed out there, this went past the unit of innings, which is the most important measurable contribution of a batsman. It also allowed a batsman to be very inconsistent within a slice but come out with acceptable numbers for the slice.
I realised that I have to do the batsmen consistency work with innings as the base, not even a Test. Based on Tests, a batsman could come out roses in the consistency stakes by scoring a 100 and 0. Perfect for the Test but way off as far as innings are concerned.
Let me remind the readers that I will not do any article which is not understood by 90% of the readers. These articles may not come through a statistical validation test but have to be based on common sense and understood by most of the readers. So there will not be any Z-factors or skewness coefficients, or whatever else it is that statisticians look for. Do not look for these in this article and complain about the absence of the same.
First, let me say that the score distribution for almost all batsmen is skewed (note only a verb is used) to the left. An established batsman's lowest score is 0 and the highest score could be anything from, say, 200 to 400. His mean score is around 50. This means that he would have more scores below the mean than above. This is what I meant by being skewed to the left. For the selected population of 200 batsmen, the average percentage of scores above the mean is only 35%. The highest is for Bruce Mitchell with 44.9% and the lowest is for Marvan Atapattu with 29.1%. So this is way away from a normal distribution and we have to adopt special methods to analyse the scores.
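The "percentage of scores above the mean" figure is easy to reproduce for any list of scores. Here is a minimal sketch in Python. The ten scores are invented for illustration and do not belong to any real batsman, and the function name is hypothetical.

```python
def share_above_mean(scores):
    """Return the fraction of scores strictly above the arithmetic mean."""
    mean = sum(scores) / len(scores)
    return sum(1 for s in scores if s > mean) / len(scores)


# Hypothetical scores, invented for illustration only (they total 500).
sample = [0, 4, 11, 18, 25, 33, 47, 62, 100, 200]

print(f"mean score       = {sum(sample) / len(sample):.1f}")
print(f"share above mean = {share_above_mean(sample):.0%}")
```

A couple of big scores pull the mean up to 50, so only three of the ten scores sit above it. That is the lopsided shape described above.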
What is consistency? OED says: The quality of achieving a level of performance that does not vary greatly in quality over time. DicCom says: Agreement or accordance with facts, form, or characteristics previously shown or stated. FreeDic says: Reliability or uniformity of successive results or events. So what we are looking at is uniformity of performance, absence of surprises, reduction in number of outliers and probably clustering of performances towards the central positions.
Taking a pair of scores, it is clear and obvious that a 100 and 0 is woefully inconsistent, an 85 and 15, quite inconsistent, a 70 and 30, reasonably consistent, a 60 and 40 quite consistent and a 50 and 50 the pinnacle of consistency. For this analysis it does not matter if the 100 was scored masterminding a successful 150 for 9 chase or part of a 700 for 3 score in Faisalabad. Let us see how we can move forward on this premise.
Let us assume that this is a three-Test series and the eight batsmen below have played five innings each. All these batsmen have scored 250 runs in the series and are averaging 50. Let us get a handle on their consistency by perusing the scores, rather than through any mathematical methods.
A: 25@ 45@ 50@ 60@ 70@ (5)
B: 10 45@ 55@ 65@ 75@ (4)
C: 25@ 30@ 40@ 55@ 100 (4)
D: 5 30@ 45@ 75@ 95 (3)
E: 0 10 40@ 60@ 140 (2)
F: 0 30@ 40@ 80 100 (2)
G: 5 10 20 50@ 165 (1)
H: 0 0 10 110 130 (0)
(@ marks a score in the middle zone; the number in brackets is the count of such scores.)
A is the epitome of consistency and can be called Mr Consistent (with apologies to Michael Hussey, the original Mr C). No really low or high score.
B and C can be called very consistent. B has got one low score and C, one high score. The other four are in the consistency zone.
D is consistent. There are two outliers: one on each side. Three are in the zone.
E and F can be called somewhat inconsistent. Only two of the five scores are in the consistency zone, i.e. in the middle.
G is quite unpredictable. Four of his scores are outliers. Tough to expect what his next score would be.
H is so inconsistent that we have no clue what he will do. A duck or 100 might come off his bat next.
Even though I used only a visual inspection while determining the consistency levels of these batsmen, we are beginning to get a handle on what analytical method can be used to determine the consistency of a batsman. The key phrase is "consistency zone", which I used a couple of times in these sentences.
Let me make a brace of somewhat sweeping statements and justify these later.
Define a consistency zone for each batsman and check how many of his innings are within this zone. The higher the percentage of innings within the consistency zone, the more consistent the batsman was.
There is nothing intrinsically wrong with this statement. There is no attempt to define a consistency zone across batsmen. This postulate accepts that the basis for consistency determination for Don Bradman would be totally different to the same for Habibul Bashar. It is dynamic and will accommodate significant changes across the career of batsmen. It could be applicable to selected parts of a batsman's career. So we seem to be on a very nice wicket.
The only problem seems to be to define a valid consistency zone, hereafter called Con_Zone. There is no mathematical solution. If one exists, I would not understand it myself and cannot explain the same in simple words to the readers. So I have to use common sense and the cricketing knowledge acquired over the years.
The one point I am certain of is that for this exercise, the batting average cannot be used as the basis. Especially when I am going to say that 400* or 257* are two of the greatest outliers ever, what is the point of adding these runs but not the innings played? I have to use a Runs per innings (RpI), but a slightly modified one, RpxI, after taking care of the next bone of contention, the not-outs. I will come to this later, after explaining the basis for Con_Zone.
After days of trials and evaluating aggregates of various measures, I have defined Con_Zone as the range of scores that falls between 50% and 150% of RpxI. It is dynamic and varies according to the batsman's career performance. It gives me a range of scores exactly one RpxI wide, enough to give a very high confidence level while proclaiming a batsman's consistency or lack of it.
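The definition can be tried out on the eight hypothetical batsmen A to H from the series example above. The sketch below is illustrative only. The function names are made up, every innings is treated as a completed one, and a score exactly on either boundary (25 or 75) is counted as inside the zone, which is an assumption that happens to match the counts given in that example.

```python
def con_zone(rpi):
    """Con_Zone bounds: 50% to 150% of the runs-per-innings figure."""
    return 0.5 * rpi, 1.5 * rpi


def con_index(scores):
    """Return (innings inside the zone, fraction of innings inside the zone)."""
    rpi = sum(scores) / len(scores)
    low, high = con_zone(rpi)
    in_zone = sum(1 for s in scores if low <= s <= high)  # bounds inclusive (assumed)
    return in_zone, in_zone / len(scores)


# The eight hypothetical batsmen from the series example, 250 runs each.
series = {
    "A": [25, 45, 50, 60, 70],
    "B": [10, 45, 55, 65, 75],
    "C": [25, 30, 40, 55, 100],
    "D": [5, 30, 45, 75, 95],
    "E": [0, 10, 40, 60, 140],
    "F": [0, 30, 40, 80, 100],
    "G": [5, 10, 20, 50, 165],
    "H": [0, 0, 10, 110, 130],
}

for name, scores in series.items():
    count, index = con_index(scores)
    print(f"{name}: {count} of {len(scores)} in zone, index {index:.0%}")
```

With an RpI of 50 for every batsman the zone is 25 to 75, and the counts run from five for A down to zero for H, in line with the visual inspection.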
Three examples: Bradman's Con_Zone ranges between 44.4 and 133.3. Ken Barrington's Con_Zone ranges between 26.7 and 80.1. Habibul Bashar's ranges between 15.3 and 45.8. While looking at these examples, do not forget that a 365 or 293 is as much of an outlier as a 0 or 1.
Now for the not-outs. My first article in the Cordon was called "The vexed question of not outs in Test cricket". Unfortunately, I could not view the comments and respond to those because of certain technical issues. But I knew that there were arguments for and against my suggestion of extending the not-out innings by the batsman's recent-form runs. A revolutionary idea it was, but some of the respondents felt that there was really no problem and I was trying to solve a non-existent problem. They were probably correct. Some felt that the RpFI, described below, was an arbitrary number.
It is clear that the not-outs have to be addressed properly. Let us take Garry Sobers with a basic RpI value of around 50. His 178* or 365* are clear outliers and have to be considered as valid innings. His 50* has to be considered, as a perfect innings, along with his 50. His 33* is considered since this is within the Con_Zone. His 5 or 8 are clear outliers and cannot be ignored. But what about the 16*? It is not fair to Sobers if we take this innings as one falling outside the Con_Zone (25.6 to 76.7). He could have scored 34 more runs or 134 more. On the other hand we cannot certify that this falls within the Con_Zone. He could have been out next ball.
In the article I have referred to, I also developed an alternate and simpler concept of considering only fulfilled innings (FI). These are the not-outs above 50% of the RpI and all dismissals. It was an elegant and simple method.
Incidentally, Milind has tackled the question of not-outs in his excellent blog, which takes cricket analysis to a higher level. He has tweaked the RpFI, which I had created for the said article, and created a further adjusted RpI, called µ, by mapping all not-out innings based on their values. It is a lovely idea and the reader could get the complete information on this tweak and other fascinating analyses. Once you are there, his earlier articles on Geometric Mean, Bradman's innings and the like can be viewed.
However, I have decided to stick to my RpFI concept since it is simpler and this is only a Batsman Consistency analysis. Like a perfect Lego block fitting, the beginning of the Con_Zone is pegged at 50% of the RpI value. So I have come to a (hopefully Solomonic and not Tughlaqian) decision for this analysis. I will ignore all not-outs that are below the low end of the Con_Zone (50% of RpI). These will be excluded from the innings count, RpI determination and consistency determination.
I can hear those knives being sharpened. Before you take those out of the scabbard, look at it carefully. No batsman loses out. Sobers' 16* would be outside the consistency calculations, that is all. He will neither benefit nor be hampered. No assumption of any sort has been made regarding his innings. There are no magic numbers. The RpI, if anything, will only be slightly boosted. So any reader who is offended by this, if he takes a minute to think laterally, will see the soundness behind this tweak. And let us not forget, it is uniform but customised and dynamic treatment for all batsmen.
The final justification. For the 200 batsmen considered, there are 26,172 innings and of these the excluded special not-outs are just 642, a mere 2.4%. So there is a negligible impact on the numbers but a considerable improvement in the soundness of calculations.
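The not-out rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the program used for the tables: the function names and the eight-innings career are hypothetical, the cutoff is taken as 50% of the basic RpI before any exclusion, and "below" is read as strictly below. The last few lines recompute Dudley Nourse's adjusted figures from the numbers quoted in this article (2960 runs in 62 innings, with 17* and 19* excluded).

```python
def fulfilled_innings(innings):
    """Drop not-outs below 50% of the basic RpI; keep every other innings.

    `innings` is a list of (runs, not_out) tuples.
    """
    basic_rpi = sum(runs for runs, _ in innings) / len(innings)
    cutoff = 0.5 * basic_rpi  # assumed: low end taken from the unadjusted RpI
    return [(runs, no) for runs, no in innings if not (no and runs < cutoff)]


def adjusted_rpi(innings):
    """Runs per innings over the fulfilled innings only."""
    kept = fulfilled_innings(innings)
    return sum(runs for runs, _ in kept) / len(kept)


# Hypothetical career, invented for illustration: (runs, not_out).
career = [(0, False), (12, True), (35, True), (50, False),
          (8, False), (120, False), (5, True), (70, False)]

kept = fulfilled_innings(career)
print(f"{len(kept)} of {len(career)} innings kept, "
      f"adjusted RpI {adjusted_rpi(career):.2f}")

# Aggregate check with the Nourse figures quoted in the article.
nourse_adj_runs = 2960 - 17 - 19
nourse_adj_inns = 62 - 2
nourse_adj_rpi = nourse_adj_runs / nourse_adj_inns
print(f"Nourse: {nourse_adj_runs} runs in {nourse_adj_inns} innings, "
      f"adjusted RpI {nourse_adj_rpi:.2f}, "
      f"Con_Zone {0.5 * nourse_adj_rpi:.1f} to {1.5 * nourse_adj_rpi:.1f}")
```

In the made-up career the 12* and 5* fall below the cutoff of 18.75 and drop out, while the 35* stays in as a fulfilled innings. The dismissal for 8 also stays, since dismissals are never excluded.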
The cutoff is 2684 runs. What? Such an odd number! Before anyone says that I have done this to exclude or include any specific player, let me say that my initial cutoff was 3000 Test runs. Two thousand, I felt, was too low since only around 30-40 innings would have been played. Three thousand meant that a reasonable number of innings, well over 50, would have been played.
However, when I did a run with 2500, I suddenly found out that a new batsman started dominating the tables. That was Dudley Nourse. His numbers were way out and I felt that his inclusion would set a benchmark for other batsmen and would validate the approach taken very effectively. But he had scored only 2960 runs. Hence I lowered the cutoff to 2950 Test runs. After all it is my analysis. Finally I decided that instead of having runs as cutoff, I would select the top 200 run scorers. So the population size determined the cutoff. Hence the number 2684. Mark Burgess was the last batsman to get in. In the bargain, Glenn Turner, MAK Pataudi, Norman O'Neill, Stan McCabe and Keith Miller got in. Not a bad lot to look at.
Let us move on to the tables. I have also plotted the graphs for five interesting batsmen to get a visual idea of how the Consistency Index works.
No | Batsman | LHB | Ctry | Tests | Inns | NOs | Runs | Avge | AdjInns | AdjRuns | AdjRpi | ConsZone Range | ConsZone Inns | ConsIndex
1 | AD Nourse | | Saf | 34 | 62 | 7 | 2960 | 53.82 | 60 | 2924 | 48.73 | 24.4 to 73.1 | 31 | 51.7%
2 | WW Armstrong | | Aus | 50 | 84 | 10 | 2863 | 38.69 | 81 | 2833 | 34.98 | 17.5 to 52.5 | 38 | 46.9%
3 | BF Butcher | | Win | 44 | 78 | 6 | 3104 | 43.11 | 77 | 3097 | 40.22 | 20.1 to 60.3 | 34 | 44.2%
4 | H Sutcliffe | | Eng | 54 | 84 | 9 | 4555 | 60.73 | 82 | 4541 | 55.38 | 27.7 to 83.1 | 36 | 43.9%
5 | VL Manjrekar | | Ind | 55 | 92 | 10 | 3208 | 39.12 | 90 | 3208 | 35.64 | 17.8 to 53.5 | 39 | 43.3%
6 | JB Hobbs | | Eng | 61 | 102 | 7 | 5410 | 56.95 | 98 | 5348 | 54.57 | 27.3 to 81.9 | 42 | 42.9%
7 | CC Hunte | | Win | 44 | 78 | 6 | 3245 | 45.07 | 75 | 3223 | 42.97 | 21.5 to 64.5 | 32 | 42.7%
8 | WR Hammond | | Eng | 85 | 140 | 16 | 7249 | 58.46 | 137 | 7234 | 52.80 | 26.4 to 79.2 | 58 | 42.3%
9 | Imran Khan | | Pak | 88 | 126 | 25 | 3807 | 37.69 | 119 | 3741 | 31.44 | 15.7 to 47.2 | 50 | 42.0%
10 | ER Dexter | | Eng | 62 | 102 | 8 | 4502 | 47.89 | 100 | 4497 | 44.97 | 22.5 to 67.5 | 42 | 42.0%
11 | CC McDonald | | Aus | 47 | 83 | 4 | 3107 | 39.33 | 81 | 3099 | 38.26 | 19.1 to 57.4 | 34 | 42.0%
12 | IJL Trott | | Eng | 49 | 87 | 6 | 3763 | 46.46 | 86 | 3746 | 43.56 | 21.8 to 65.3 | 36 | 41.9%
13 | RB Richardson | | Win | 86 | 146 | 12 | 5949 | 44.40 | 140 | 5930 | 42.36 | 21.2 to 63.5 | 58 | 41.4%
14 | SR Watson | | Aus | 52 | 97 | 3 | 3408 | 36.26 | 97 | 3408 | 35.13 | 17.6 to 52.7 | 40 | 41.2%
15 | IR Redpath | | Aus | 66 | 120 | 11 | 4737 | 43.46 | 119 | 4725 | 39.71 | 19.9 to 59.6 | 49 | 41.2%
16 | RC Fredericks | L | Win | 59 | 109 | 7 | 4334 | 42.49 | 107 | 4328 | 40.45 | 20.2 to 60.7 | 44 | 41.1%
17 | ND McKenzie | | Saf | 58 | 94 | 7 | 3253 | 37.39 | 90 | 3218 | 35.76 | 17.9 to 53.6 | 37 | 41.1%
18 | AB de Villiers | | Saf | 92 | 154 | 16 | 7168 | 51.94 | 149 | 7114 | 47.74 | 23.9 to 71.6 | 61 | 40.9%
19 | RB Kanhai | | Win | 79 | 137 | 6 | 6227 | 47.53 | 133 | 6188 | 46.53 | 23.3 to 69.8 | 54 | 40.6%
20 | DI Gower | L | Eng | 117 | 204 | 18 | 8231 | 44.25 | 201 | 8184 | 40.72 | 20.4 to 61.1 | 81 | 40.3%
21 | PJL Dujon | | Win | 81 | 115 | 11 | 3322 | 31.94 | 113 | 3310 | 29.29 | 14.6 to 43.9 | 45 | 39.8%
22 | GS Sobers | L | Win | 93 | 160 | 21 | 8032 | 57.78 | 156 | 7981 | 51.16 | 25.6 to 76.7 | 62 | 39.7%
23 | TW Graveney | | Eng | 79 | 123 | 13 | 4882 | 44.38 | 121 | 4872 | 40.26 | 20.1 to 60.4 | 48 | 39.7%
24 | GM Turner | | Nzl | 41 | 73 | 6 | 2991 | 44.64 | 71 | 2968 | 41.80 | 20.9 to 62.7 | 28 | 39.4%
25 | AJ Strauss | L | Eng | 100 | 178 | 6 | 7037 | 40.91 | 175 | 7017 | 40.10 | 20.0 to 60.1 | 69 | 39.4%
26 | KF Barrington | | Eng | 82 | 131 | 15 | 6806 | 58.67 | 127 | 6778 | 53.37 | 26.7 to 80.1 | 50 | 39.4%
27 | L Hutton | | Eng | 79 | 138 | 15 | 6971 | 56.67 | 134 | 6916 | 51.61 | 25.8 to 77.4 | 52 | 38.8%
28 | GC Smith | L | Saf | 117 | 204 | 12 | 9266 | 48.26 | 201 | 9248 | 46.01 | 23.0 to 69.0 | 78 | 38.8%
29 | RJ Hadlee | L | Nzl | 86 | 134 | 19 | 3124 | 27.17 | 129 | 3100 | 24.03 | 12.0 to 36.0 | 50 | 38.8%
30 | AW Greig | | Eng | 58 | 93 | 4 | 3599 | 40.44 | 93 | 3599 | 38.70 | 19.3 to 58.0 | 36 | 38.7%

Most consistent batsmen: When readers peruse the tables they will realise why I was so enthused about Dudley Nourse. Let me present his career numbers. 62 innings. The mean score was 48.7, giving a Con_Zone range of 24.4 to 73.1. This entire range is indicative of acceptable scores. Two scores, 17* and 19*, are ignored. Nourse has 31 scores in the Con_Zone. He is the only batsman to have more scores inside the Con_Zone than outside it. If this is not consistency, that too across 16 years, I am not sure what is. He has two double-hundreds but the next highest score is 149. That explains his excellent Con_Index.
Herbert Sutcliffe and Jack Hobbs are almost inseparable even in this analysis, as they were on the field. For Sutcliffe, two unbeaten innings, viz., 1* and 13*, are excluded. For Hobbs, four innings, viz., 9*, 11*, 19* and 23*, are removed. Otherwise, look at how close their numbers are. Very similar Con_Zone ranges (roughly 27 to 83). Con_Index values well above 42%. These are their individual numbers. How well they would have performed together! Right at the top, as far as opening pairs are concerned.
Wally Hammond, who followed Hobbs and Sutcliffe, has similar figures. His Consistency Index is also above 42%. The top 20 of the table features batsmen who have Consistency Index values above 40%. This includes some unlikely batsmen. Who would have expected the flamboyant Kanhai to have a fairly high value of 40.6%? David Gower is another surprise 40+% batsman featured here. Sobers and Barrington are two top-level batsmen standing at just below 40%.
Contemporary batsmen: For all the problems he has faced recently, Trott is the most consistent of the contemporary batsmen. Thirty-six of his 86 qualifying innings are within the Con_Zone, giving him an index value of 41.9%. Watson might not have scored many hundreds but he is certainly high on the Consistency Index table, with 41.2%. His Con_Zone range is, of course, lower at 18-53. He is expected to deliver at lower levels.
Since Watson and Trott have played fewer matches, AB de Villiers lays claim to be the most consistent current batsman. This is borne out by his recent record-breaking form. His exclusions are 4*, 4*, 8*, 19* and 19*. He has 61 innings within the Con_Zone range of 24-72, out of 149 qualifying innings. This gives him a high Consistency Index of 40.9%. Any number above 35% is very good and anything above 40% is outstanding.
Strauss, with 39.4%, and Langer, with 38.2%, are in the top 40.
Summary of a few top batsmen: Many top batsmen are not even in the top 50 of the table. Hence I have summarised the Consistency Index of a few top batsmen. Bradman is way down the table with a barely acceptable index of 30.8%. This is understandable since 15% of his innings are above 200 and there have to be compensating low scores.
Sachin Tendulkar's index value is a fairly low 31.2%, Brian Lara's is slightly better at 33%, Rahul Dravid is at a relatively high 37.2%, Kumar Sangakkara is similarly placed at 36.8%, Ricky Ponting is at a low index value of 32.9%, Jacques Kallis at a moderate 34.4%, and finally Sunil Gavaskar, at a very low 30.5%. To those who are surprised at the last figure, let me remind readers that Gavaskar was a poor starter and had 55 single-digit dismissals. And these have been balanced by 12 150-plus scores.
No | Batsman | LHB | Ctry | Tests | Inns | NOs | Runs | Avge | AdjInns | AdjRuns | AdjRpi | ConsZone Range | ConsZone Inns | ConsIndex
191 | NS Sidhu | | Ind | 51 | 78 | 2 | 3202 | 42.13 | 78 | 3202 | 41.05 | 20.5 to 61.6 | 21 | 26.9%
192 | MJ Clarke | | Aus | 105 | 180 | 20 | 8240 | 51.50 | 176 | 8182 | 46.49 | 23.2 to 69.7 | 47 | 26.7%
193 | DL Amiss | | Eng | 50 | 88 | 10 | 3612 | 46.31 | 84 | 3569 | 42.49 | 21.2 to 63.7 | 22 | 26.2%
194 | TT Samaraweera | | Slk | 81 | 132 | 20 | 5462 | 48.77 | 127 | 5407 | 42.57 | 21.3 to 63.9 | 33 | 26.0%
195 | HW Taylor | | Saf | 42 | 76 | 4 | 2936 | 40.78 | 74 | 2908 | 39.30 | 19.6 to 58.9 | 19 | 25.7%
196 | C Hill | ~ | Aus | 49 | 89 | 2 | 3412 | 39.22 | 88 | 3402 | 38.66 | 19.3 to 58.0 | 22 | 25.0%
197 | JR Reid | | Nzl | 58 | 108 | 5 | 3428 | 33.28 | 108 | 3428 | 31.74 | 15.9 to 47.6 | 27 | 25.0%
198 | MN Samuels | | Win | 51 | 90 | 6 | 2983 | 35.51 | 88 | 2968 | 33.73 | 16.9 to 50.6 | 22 | 25.0%
199 | Mansur Ali Khan | | Ind | 46 | 83 | 3 | 2793 | 34.91 | 82 | 2779 | 33.89 | 16.9 to 50.8 | 19 | 23.2%
200 | Ijaz Ahmed | | Pak | 60 | 92 | 4 | 3315 | 37.67 | 90 | 3287 | 36.52 | 18.3 to 54.8 | 18 | 20.0%

Now for the other end. The most interesting in this lot is Michael Clarke, with a really low index value of 26.7%. That means that just about one in four fulfilled innings have been within the Con_Zone range of 23 to 70. His exclusions are 6*, 14*, 17* and 21*. The fact that there are 16 other not-outs has also contributed to this. His 47 single-digit dismissals and ten 150-plus scores do not help.
Dennis Amiss, Clem Hill and Mansur Ali Khan are three prominent batsmen in this group. Let us look at the most inconsistent batsman amongst the selected 200: Ijaz Ahmed. Look at the Consistency Index. It is a very low 20%, which means one in five innings is within the Con_Zone of 18-54. He has only 18 innings in this group, out of a total of 90. Not surprising considering the fact that 33 innings, out of 90, a whopping 37%, are single-digit dismissals. No doubt compensated by 12 hundreds.
Now for a few graphs. The graphs are plotted in increasing order of scores. Only the fulfilled innings are plotted. Also the Con_Zone and mean are shown.
Con_Zone graphs for Don Bradman, Dudley Nourse and Ijaz Ahmed © Anantha Narayanan

Let us look at the graphs of three batsmen. Bradman is the king, albeit an inconsistent one, Nourse is the most consistent and Ijaz, the least consistent.
In Bradman's case, the reason for the inconsistency is very clear. Look at those seven zeroes and seven single-digit dismissals. At the other end, we have huge peaks relating to those 18 150-plus scores. All pointing to numerous innings of total domination or dismissals within the first hour. Perfect candidate for a high degree of inconsistency.
Look at Nourse's graph. Look at the way the graph moves up quickly and the width of the Con_Zone. He has had 13 single-digit dismissals but many intermediate scores. There are not many peaks. Confirmation of a very high degree of consistency. All these lead to a Consistency Index of over 50%. Very few innings are below 10.
Now for the other end. Look at the width of the Con_Zone of Ijaz. Especially look at the number of low scores. More than the peaks on the right-hand side of the Con_Zone, it is the number of low scores which leads to a wholly inconsistent career. There are many innings below 10.
Con_Zone graph for Michael Clarke © Anantha Narayanan

This graph depicts the career of Clarke, the most inconsistent current batsman. This is in a way similar to Ijaz's graph. A very high number of innings to the left-hand side of the Con_Zone and a significant number of innings to the right-hand side. The width of the Con_Zone is quite low at only 26%. Look at how many innings are below 10. This is borne out by the fact that Clarke has scored four hundreds, three huge ones, in his last 11 Tests. The other 18 innings are 30 or lower.
Con_Zone graph for Wally Hammond © Anantha Narayanan

Hammond has been a very consistent player with an index value of 42.3%. Let us look at the career graph of Hammond. Look at the width of the Con_Zone. The number of innings within this zone is quite high. On either side there are no great tailoffs. Look at how few innings are below 10.
Con_Zone graph for AB de Villiers © Anantha Narayanan

Finally, the graph for de Villiers. It is far closer to the Hammond graph than to the Clarke graph. A fairly wide Con_Zone, just over 40%. All those recent fifties helped. There are not many innings below 10.
The common view is that the sedate, defensive batsmen are more consistent than the attacking batsmen. This unfounded adage has been given a serious jolt in this analysis. When attacking batsmen like Ted Dexter, Roy Fredericks, Rohan Kanhai, Gower, Sobers et al are in the top 40 and defensive stalwarts like Gavaskar, Shivnarine Chanderpaul, Kallis, Mohammad Yousuf, Hanif Mohammad et al are in the lower half of the table, there seems no justification for this axiom.
A couple of messages to my readers.
If you perceive this to be some sort of batsman-ranking table and come out with comments such as: "xyz is ranked too high or low", it is your problem, not mine. This is not a ranking list at all. It is an indication of how consistent a batsman was, relative to his own mean. That is all. If this table does not conform to your subjective perception of a batsman's consistency, maybe it is time to change that to an objective perception and not find fault.
If anyone tries to hijack this article into an xyz-lauding or pqr-bashing exercise, I will be quite ruthless, cutting off such comments right at the top.
I have uploaded the table containing the data for all 200 qualifying players. To download/view this file, please CLICK HERE.
Halfway through the extensive work done on this article I located another very interesting gem that can be used to measure Test batsman consistency. This was the positioning of the Median score (the middle score). It was clear that the closer the Median was to the Mean (RpI), the more innings would be in the central area and this can also be used as an excellent measure of consistency. In fact, for quite some time, I was working with both measures and had also determined a composite Consistency Index. However, covering all these three measures would have made the article twice as long as my previous longest article and would have thrown ESPNcricinfo's publishing mechanism into disarray. I also did not want to miss any of the graphs for both measures. Hence there will be a second part covering the rest of the topic.
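As a rough illustration of the median idea, the sketch below compares the median with the mean for two of the hypothetical batsmen from the series example earlier in the article. The exact formula is not given here, so the ratio of median to mean is only one assumed way of expressing "closeness", and the function name is made up.

```python
from statistics import mean, median


def median_to_mean(scores):
    """Ratio of the median score to the mean score (assumed closeness measure)."""
    return median(scores) / mean(scores)


# Batsmen A and H from the hypothetical series example, 250 runs each.
batsman_a = [25, 45, 50, 60, 70]
batsman_h = [0, 0, 10, 110, 130]

for name, scores in (("A", batsman_a), ("H", batsman_h)):
    print(f"{name}: median {median(scores)}, mean {mean(scores)}, "
          f"ratio {median_to_mean(scores):.2f}")
```

Batsman A's median sits on his mean, while batsman H's median is a fifth of his mean, which agrees with the visual reading of the two sets of scores.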
As I conclude the article, I read, with sadness, that Gary Gilmour passed away. An outstanding all-rounder, Gilmour did not get enough opportunities to play Test cricket and World Series Cricket was also a magnet that drew him away. However, I have a special fascination and appreciation for Gilmour since my first ODI Top 100, released in 2002, had Gilmour's 6 for 14 in the 1975 World Cup semi-final as the best ever ODI bowling performance. This performance, and Richards' 189*, have topped every one of the ODI performance analysis tables that I have done during the past 12 years. Others might have done more, but this single performance will always make Gilmour stand out. Gilmour: R.I.P. My thoughts are with the bereaved family.
Anantha Narayanan has written for ESPNcricinfo and CastrolCricket and worked with a number of companies on their cricket performance ratings-related systems.
© ESPN Sports Media Ltd.
 

Anantha Narayanan
Anantha spent the first half of his four-decade working career with corporates like IBM, Shaw Wallace, NCR, Sime Darby and the Spinneys group in IT-related positions. In the second half, he has worked on cricket simulation, ratings, data mining, analysis and writing, amongst other things. He was the creator of the Wisden 100 lists, released in 2001. He has written for ESPNcricinfo and CastrolCricket, and worked extensively with Maruti Motors, Idea Cellular and Castrol on their performance ratings-related systems. He is an armchair connoisseur of most sports. His other passion is tennis, and he thinks Roger Federer is the greatest sportsman to have walked on earth.
On consistency, I'd think the Standard Deviation of a sample of averages/adjusted averages (such as against different countries, home and away, won and lost and drawn games etc.), with a minimum number of innings for each such variable (say 10 innings) as a qualifier, would best capture this measure.
I also see great merit in the (HQ-Median)/(Median-LQ) measure, with values closer to 1 indicating higher consistency.
Perhaps some combination of the two measures could be worked out.
Looking forward to the bowlers' consistency measure now :)
[[
When I do the Median work I will look at your (HQ-Median)/(Median-LQ) measure as an additional index. On not-outs, I prefer simpler methods which any reader can adopt.
To me my non-fulfilled innings method is the embodiment of simplicity. The idea is sound. The impact is negligible. I am only saying that an 18* is not considered for this work, that is all. The impact on averages is normally in the second decimal.
Ananth
]]
Hi Anantha, good to see a new article on this topic. I remember the comments on the old article on batsmen consistency, which went down the path of intense statistical discussions around skewness coefficients and a formula of the kind (HQ-Median)/(Median-LQ) to measure consistency.
On addressing the issue of not-outs, I will repeat a suggestion I had offered on your article "ODI Batting giants Part 1":
1. Substitute each not out score with the mean/median of all dismissals at scores above the not out score (I prefer the median).
2. If the highest score is a not out score, take it as is.
3. Add all such scores to the total runs scored in dismissals.
4. Derive the Adj Average using the value in 3. above as the numerator, and all innings played in as the denominator.
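The four steps above can be sketched as follows. This is only an illustration: the function name and the five-innings career are hypothetical, "above" is read as strictly above, the median is used for the substitution, and a not-out with no higher dismissal at all is kept as is, which generalises step 2.

```python
from statistics import median


def adjusted_average(innings):
    """Adjusted average per the four steps; `innings` is a list of (runs, not_out)."""
    dismissed = [runs for runs, not_out in innings if not not_out]
    total = sum(dismissed)
    for runs, not_out in innings:
        if not not_out:
            continue
        higher = [d for d in dismissed if d > runs]  # dismissals above this score
        # Steps 1 and 2: substitute the median, or keep the score if nothing is higher.
        total += median(higher) if higher else runs
    # Steps 3 and 4: substituted total divided by all innings played.
    return total / len(innings)


# Hypothetical career, invented for illustration: (runs, not_out).
career = [(10, False), (40, False), (80, False), (20, True), (100, True)]
print(f"adjusted average = {adjusted_average(career):.1f}")
```

Here the 20* is replaced by 60, the median of the dismissals for 40 and 80, and the 100* is taken as is, so the total of 290 over five innings gives 58.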
contd..
Excellent post Ananth, as usual, back to where our heart lies: white flannels! I filtered for runs per innings > 40. Got 100 entries (exactly half the population actually had an RpI > 40!). Funny part was the top 10 were still more or less the same folks (as listed here). But after the top 10, there are so many jumps (the 100th in the list of 40+ RpI is 194th in the overall list: Samaraweera), which means that it's one thing to be consistently making 15-20 but yet another to make 40+ at a good frequency. And those with a Con_Zone closer to 25-75 had wide variations. Making consistently good scores (as against scoring consistently) is definitely a huge ask. At a 40% consistency cutoff, there are only 13 (Gower at 13, 20th overall). From India, Sehwag over VVS and VVS over SRT shows how unsung these folks are.
[[
This filtered list of 13 is a creme de la creme. 40+ in both takes away any low RpI batsmen. Not many, since out of those 20 batsmen with 40% cons index, only 7 have less than 40 as RpI. This list is Nourse, Butcher, Sutcliffe, Hobbs, Hunte, Hammond, Dexter, Trott, RB Richardson, Fredericks, de Villiers, Kanhai and Gower. A combination of a few really great batsmen, a few not-so-great batsmen and a few attacking batsmen. Trott and de Villiers!!! And look at the number of Englishmen and West Indians: 6 and 5 respectively. And 2 South Africans. No Australian, Indian or Pakistani batsmen. Thanks for a nice interpretation.
Striking a different note, Test cricket lives: and how? England must rue the missed opportunities: batting beyond a lead of 350 and the 14 overs not bowled. Sri Lanka must rue their batting collapses in both innings. Also a great advertisement for DRS. I am just visualizing a situation of India at 200 for 9 with two balls to go, Shami Ahmad nicking a ball on to the pad and being given out. I wish that happens so that DRS is given a leg up.
And Sangakkara is fifth in the all-time average table, with 4000 runs cutoff, above Sobers, Hammond, Hobbs and Hutton.
Ananth
]]
I think we could have this list with some of your old blogs (Bowling Quality Index) against tough attacks / tough conditions. Let me see if I can get something out.
Ananth, while I agree that innings like 365, 375 and 400 are outliers, to get to that final score the batsman did achieve the conszone. Let's say a batsman having a conszone of 25-75 has three scores of 30, 35 and 55 and in the 4th innings he scores a 200. From a consistency point of view I think that 4th innings should qualify in the number of conszone innings and the conszone index since he achieved his conszone in that 4th innings. Just that he went on to score much more. Secondly, to get a serious list of batsmen I think there should be a cutoff for RpI, maybe around 25 (would filter out batsmen like Kirmani, Parore, Vaas, Akram, Warne) or even 30 (would filter out even Kapil, Hadlee, A Campbell). All these are great cricketers in their own right.
[[
Santosh
In essence you are asking me to do a batsman low-score analysis. My ideas are already running in that direction. As I see the numbers more and respond to comments, I realize that I keep on talking about non-fulfilled innings, single-digit dismissals and so on. All these relate to low scores. Maybe a comprehensive study of the sub-CZ_Start innings is what is needed.
Re the second point, there is nothing wrong in Hadlee, with a sub-30 RpFI, occupying the excellent 29th place. After all, in his own customized Con_Zone, he has nearly 40% of his scores.
Ananth
]]
One more comment. The current measure is very simple and I like it. But one thing it misses is the aspect of time. In this analysis you just count the number of innings in the zone and I think this is enough from a retrospective point of view. However, using this, you cannot arrive at a reasonable expectation when an active batsman is coming to the crease. What I would be interested in is how consistent he has been in the recent past, maybe in the last one year. So this would be another angle you might have to look into. Maybe doing the above analysis for a running block of one year's innings would be enough.
[[
Out of these 200 batsmen probably 185 are already retired. So your idea, while it is a nice one, can be applicable only to those current batsmen. As I have already mentioned, this can be applied with no loss of value, to a period of time. Clarke might be 26% but over the past year he has been better at 32.6%. de Villiers, who is already quite high at 41%, has been extraordinary during the last year and is past 50%. Such statements will let us get insights into how players arrive into important series.
Ananth
]]
Very nice article Ananth! I agree with some of the guys that peak of certain players should be taken, rather than the whole playing life. Results become more and more skewed, as a player starts playing more and more cricket. You can't compare consistency of a player who has scored just 3000 runs to that who has scored more than 10000.
[[
Yes, it will be good. The key question is how to define the peak. Probably the idea would be to try what I did in the Streak analysis. Take Bradman as the base. Look at 50 Tests (Bradman batted in only 50 Tests). For each batsman, determine his peak sub-career of 50 Tests dynamically and then do a Consistency analysis for this period. A lot of tricky work, but it can be done.
Ananth
]]
One may also think about treating bigger innings as consistent, especially considering Bradman's record! Moreover, I will be waiting for such an article for ODIs as well! Jay
Anantha, it seems to me that Con_Zone = RpxI ± 50% is quite wide. For a starting point, it is good. But this zone may be subdivided into ±10%, ±20%, ±30%, ±40% and ±50%, like the Archery FITA Target, with RpxI ± 10% being the bull's-eye zone. Can you please add respective columns for these, so that we can see who has the highest % of innings in the RpxI ± 10% band? You may do this in your next article also, if you like this idea. Alternatively, once you have calculated a batsman's RpxI, would using SD be too technical for readers to understand? It would be easy: lower SD = greater consistency and higher SD = lower consistency. Arnab, Kolkata
[[
Arnab, my fear is that SD will not work at all, since even the RpFI is somewhere in the RpI/Average range and the higher values will be way off. I calculated the SD using the Mean (RpI) and it was very high. (0 - 50) squared is fine but (400 - 50) squared is very high. The distribution is skewed too much to the left for SD to have any relevance.
Splitting the Con_Zone into 5 zones might lead to too many numbers. Maybe RpFI ±15%, ±30%, ±50% would be better. I might not do this now because I cannot add anything to the article. I can do that in the follow-up articles. If nothing else, I can produce and upload the tables.
Ananth
]]
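The two ideas in this exchange can be tried out with a short Python sketch. Everything in it is illustrative: the function names, the made-up scores and the centre value of 40 are assumptions, and the centre (RpxI or RpFI) is passed in as a given rather than computed the way the article computes it. The first function counts the share of innings inside nested bands around the centre; the second shows the squared-deviation arithmetic behind the SD concern, where a 0 against a mean of 50 contributes 2500 while a 400 contributes 122500.

```python
def band_shares(scores, center, bands=(0.15, 0.30, 0.50)):
    """Return {band: % of innings with score within center * (1 ± band)}."""
    if not scores:
        raise ValueError("scores must not be empty")
    shares = {}
    for band in bands:
        low, high = center * (1 - band), center * (1 + band)
        inside = sum(1 for s in scores if low <= s <= high)
        shares[band] = 100.0 * inside / len(scores)
    return shares


def squared_deviation(score, mean):
    """Contribution of a single score to the variance sum."""
    return (score - mean) ** 2


# Made-up scores and a hypothetical zone centre, not real data.
scores = [0, 4, 12, 25, 33, 41, 48, 55, 67, 120]
for band, share in band_shares(scores, center=40.0).items():
    print(f"±{band:.0%}: {share:.1f}% of innings")

# One very large score outweighs many low ones in the variance sum.
print(squared_deviation(0, 50), squared_deviation(400, 50))
```

The bands are nested, so each wider band includes the innings counted in the narrower ones; reporting disjoint rings instead would only need the differences between successive shares.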
Perhaps the wording of my comment was a bit strong because of my incredulity that no one had caught on to it on a cricket blog! My apologies.
[[
No need for any apologies. You have not crossed any line.
Ananth
]]
Re. "Kim Hughes scored 89 when the going was good at Headingley in 1981. In the second innings when the target was 130, he scored 0." If Hughes had scored a duck instead of 89 in the first innings, the target would then have been 219. It is mainly on account of the drama surrounding 4th-innings chases that one tends to accord them more value. As far as the team is concerned, all runs during the course of a "match" count equally. Had Lara scored 153 in the first innings and 8 in the 2nd, mathematically the team would still have won.
[[
What you are really asking me to do is a three-part analysis with the Test as the basis as the next one. Once I get past the "100 & 0 in a Test is inconsistent" view, things will become clearer. The strongest point in favour of using the Test as the basis is the fact that Tests are won, lost or drawn, and the performances have more relevance. Referring to my article, it would not matter if the 100 was part of a 150 for 9 chase or 700 for 3.
Thanks for an excellent pair of comments.
Ananth
]]
It is wholly incorrect to use "innings" as the unit of consistency in Test matches. A team can, by any conceivable logic, only win/lose/draw a "match". It cannot possibly win/lose/draw an "innings" under any circumstances. Even to make such a claim is bizarre. That is, the smallest unit of win/loss/draw in Test cricket is a "match".
As such, the only consistency of any importance is how much a batsman contributes to a particular "match". To the team, a Lara scoring 8 and 153 in a single "match" is exactly as consistent as his scoring 80 in each "innings". The unit of win/loss/draw is a "match". Period. If a bowler takes 8 wickets in a match, but with a split of 2 and 6 per "innings", we cannot possibly call him inconsistent.
Only in ODIs may we use a single "innings".
It is mind-boggling that such a basic logical inconsistency has escaped notice.
[[
These are all points of view. If you are so sure about it, why is Bradman's average 99.94 and not 134.5? I am a great supporter of RpT and have used it in many of my calculations. The only thing I can say is that using a Test as the basis is a viable alternative and worth a look. But not by saying "My way or the highway". Let me give you a counter to the Lara Test. Kim Hughes scored 89 when the going was good at Headingley in 1981. In the second innings when the target was 130, he scored 0.
Let me also add that I was working with both sets of numbers, per innings and per Test, but decided on the innings as the most common unit of delivery. I can always revisit the Test basis, accepting that 120 & 0 and 60 & 60 both lead to 120 for the Test.
Ananth
]]
In early 2003 Tendulkar played 3 unbeaten knocks that camouflage a lengthy slump. That period is partly responsible for the search for G and µ. Of course two variables are not enough, hence LS, Q1, Median, Q3 and HS are listed to get a better feel for the distribution of scores. Every batsman is vulnerable at the beginning of an innings. Volatility, the opposite of consistency, is a given. In general, once set, we want a player to dig deep. So the lack of consistency due to high scores is welcome. If we plot resilience instead of consistency, then both the con_zone and the area above it hold the scores where a batsman overcomes the tyranny of low scores. µ, the AM, accounts very well for the skyscrapers. GeoMean is also influenced by the scores on the RHS of the con_zone, but not as much as the AM. Independently, a string of low scores will affect G. I believe the next article will independently look at the difference between G and µ to find careers with plenty of medium scores.
[[
Yes, Milind. The narrow definition of Consistency masks the true value of the innings. My feeling is that this is only the start of a wider analysis. See Harsh's comment. I toyed with doing something purely based on the below-half-mean position. But decided on this broader definition of consistency. Maybe I should do that and separate the poor starters from the well-past-the-first-hour players.
Ananth
]]
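The contrast between µ (the arithmetic mean) and G (the geometric mean) raised in this exchange can be illustrated with a small Python sketch. It rests on assumptions that are not from the discussion: the function name and the two score lists are made up, and because a duck would drive a plain geometric mean to zero, the sketch takes the geometric mean of (score + 1) and subtracts 1 afterwards. That shift is one common workaround, not necessarily how G was computed by the commenter.

```python
from statistics import fmean, geometric_mean


def arithmetic_and_geometric(scores):
    """Return (mu, g): the arithmetic mean and a shifted geometric mean.

    The +1 shift is an assumption made here so that scores of 0 can be
    included; it is not taken from the discussion above.
    """
    mu = fmean(scores)
    g = geometric_mean([s + 1 for s in scores]) - 1
    return mu, g


# Two made-up score lists with the same arithmetic mean of 50.
steady = [40, 45, 50, 55, 60]
spiky = [0, 5, 10, 35, 200]

for name, scores in (("steady", steady), ("spiky", spiky)):
    mu, g = arithmetic_and_geometric(scores)
    print(f"{name}: mu = {mu:.1f}, G = {g:.1f}, gap = {mu - g:.1f}")
```

Both lists share the same µ, but the list built on one skyscraper and several low scores shows a far larger gap between µ and G, which is the kind of difference the comment proposes to look at.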