The Cordon

June 15, 2013

Statistics

All the quirky Test numbers

Anantha Narayanan
Brian Lara's 400 not out in Antigua is the highest individual Test score  © AFP

This is a very different article. For some time now I have wanted to take an irreverent look at some of the numbers relating to Tests. I have done enough serious number-crunching and analytical articles. This article is going to be fun. At the same time, there are enough interesting sparks to make readers think.

I want to add only one caveat, in the form of a message. I understand that the self-styled academics and fault-finders will not find this article interesting, not that they found anything of value in other articles. My message to them is: if you do not have fun with this type of analysis, it is your problem. Do not transfer that problem to others by making inane and worthless comments. My thanks, in advance, for this consideration.

I am going to look at Test batting purely from the numbers point of view. A player or two might be mentioned in this article, but that will be all. I will look at batsman scores, team scores, team fall of wickets, the related frequencies, the limits of these occurrences, the surprises and the patterns, with no reference to the executors of these numbers. I can assure you that you will find enough interesting gems in this potpourri of information. My apologies also for some haphazard presentation. The nature of the analysis is such.

Batsman scores

Brian Lara (this is the only reference to an individual player in this article) scored an unbeaten 400. Nearly 10,000 batsmen scored the grand total of 0, some unbeaten. Of the 401 possible scores from 0 to 400, 298 have been reached by batsmen. That means there are 103 scores yet to be reached. This was news to me also. This amounts to more than 25% of all scores of 400 and lower.

Let this be the starting point. Let us now look at these score distributions closely. To put things in perspective, let me first say that there have been 73,626 innings played during the 136 years of Test cricket. This analysis covers up to Test #2089, the second Test between England and New Zealand. That makes it an average of 35.2 innings per Test, a number that makes sense. The maximum, of course, is 44 per Test, so the average works out to about 80% of the maximum.
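The innings arithmetic above can be checked in a few lines. This is a minimal sketch that uses only the figures quoted in the text. The variable names are mine, and the ceiling of 44 assumes all 11 batsmen bat in each of a Test's four team innings.

```python
# Figures quoted in the article
total_innings = 73_626   # individual batting innings
tests = 2_089            # Tests covered by the analysis

# Assumption: the ceiling is 11 batsmen x 4 team innings per Test
max_per_test = 11 * 4

avg_per_test = total_innings / tests
share_of_max = avg_per_test / max_per_test

print(f"{avg_per_test:.1f} innings per Test")
print(f"{share_of_max:.0%} of the maximum")
```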

The lowest score not yet reached is 229. Why? I do not know. Considering that 231 was reached five times and 232, six times, there is no reason for this specific number, other than the fact that there has to be one number. The next two scores not reached are 238 and 245.
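Finding the scores "not yet reached" is a simple set difference. The sketch below shows the idea on a handful of made-up innings, since the real score list is not reproduced in this article. The function name and the toy data are assumptions for illustration only; the last lines redo the 103-out-of-401 arithmetic from the figures quoted earlier.

```python
def unreached_scores(scores, highest):
    """Return every value in 0..highest that never appears in `scores`, ascending."""
    reached = set(scores)
    return [s for s in range(highest + 1) if s not in reached]

# Toy data, NOT real Test scores: a few made-up innings
toy_innings = [0, 0, 1, 3, 4, 4, 7, 10]
gaps = unreached_scores(toy_innings, highest=10)
print(gaps)       # every unreached score
print(gaps[0])    # the lowest unreached score

# Figures quoted in the article: 401 possible scores (0..400), 298 reached
possible, reached = 401, 298
missing = possible - reached
print(missing, f"{missing / possible:.1%}")
```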

What is the highest score reached by multiple batsmen? Well, 334 has been reached by two batsmen (I have resisted the temptation to name the two batsmen), 333 by two batsmen and 329 by two batsmen. The highest scores reached more than twice are 275 and 274, three times each. These are just stray numbers: there seem to be no specific reasons.

What is the significance of 167? There has been no instance of a 167* score. Nor, for that matter, of 171*, 175* or 180*.

Now, for the most interesting of these numbers. Look at the following frequencies.

 
	Score Freq NO 
	96 93 5 
	97 75 5 
	98 72 13 
	99 86 5 
	100 149 56 
	101 104 27 
	102 106 32 
	103 108 34 
	104 95 20 
	
An innocuous set of numbers? Not at all. Look at the jump in frequency between the scores of 99 and 100: from 86 to 149, an increase of over 70%. This jump is way, way beyond the normal increase or decrease by a small number. What are the reasons? One can only surmise.

- The first reason is that the batsmen take a lot more care as they reach 100. That would explain the significant drop between the score of 96 and 97. Maybe the batsmen take that additional care when they come within a stroke of 100. I can hear someone saying 94 is also a stroke away: agreed, but this is a Test match.
- But why such a high number at 100? The only reason for this must be the number of declarations made when a batsman reaches a hundred. Is this borne out by the number of not-outs? Of course, yes. Look at the increase in not-outs from five at 99 to 56 at 100. And the trend continues beyond 100 also: the not-outs from 101 to 104 stay well over the 99-value of five, indicating later declarations.
- I am amazed at the importance the score of 100 plays. This obsession seems to have been there over the past 136 years.
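The size of the jump at 100 can be read straight off the table. The sketch below copies the nine rows from the table above and prints the step-to-step change in frequency and the not-out share at each score. The helper name `pct_change` is mine.

```python
# score: (frequency, not_outs), copied from the table above
around_100 = {
    96: (93, 5), 97: (75, 5), 98: (72, 13), 99: (86, 5),
    100: (149, 56), 101: (104, 27), 102: (106, 32),
    103: (108, 34), 104: (95, 20),
}

def pct_change(prev, curr):
    """Percentage change from prev to curr."""
    return (curr - prev) / prev * 100

scores = sorted(around_100)
for prev, curr in zip(scores, scores[1:]):
    change = pct_change(around_100[prev][0], around_100[curr][0])
    print(f"{prev} -> {curr}: {change:+.1f}%")

for score, (freq, not_outs) in around_100.items():
    print(f"{score}: {not_outs / freq:.1%} not out")
```

The 99 to 100 step is the only one in this window that rises by more than half, and the not-out share stays well above the 99 level for every score from 100 to 104.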

Does this trend exist when going from 149 to 150 or from 199 to 200? Let us look at the figures.

 
	Score Freq NO 
	147 21 4 
	148 27 8 
	149 21 3 
	150 29 8 
	151 24 6 
	152 27 5 
	... 
	... 
	Score Freq NO 
	197 6 1 
	198 2 1 
	199 9 2 
	200 15 9 
	201 21 8 
	202 6 2 
	203 15 9 
	

What do we have here? Batsmen moving from 149 to 150 show less of this trend than those moving from 199 to 200. It is clear that the score of 200 is deemed to be far more valuable than the score of 150. The frequency going from 149 to 150 changes from 21 to 29. The frequency going from 199 to 200 shows a clearer trend: 9 to 15 and then to 21 at 201. This again indicates many declarations, revealed by an increase in not-outs from 2 to 9.

At what score do we have the highest percentage of not-outs? We should ignore the single scores of 400*, 365* et al. There are two scores on which batsmen remained unbeaten 60% of the time. At 200, there were 15 innings and nine were not-outs, making this 60%. And at 203 also, there were 15 innings and nine were unbeaten. The previous discussion seems to clearly pave the way for this. As soon as the batsman reached 200, indicating a team score exceeding 400, there were declarations. Not always, of course. Incidentally, there were two unbeaten innings of 199. At 100, the percentage of not-outs is a very high 37.6% (56 out of 149).
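The not-out percentages quoted here follow directly from the frequency tables. A small sketch, restricted to the 197 to 203 rows shown earlier (so it says nothing about scores outside that window), with names of my own choosing:

```python
# score: (frequency, not_outs), copied from the 197-203 rows of the table
around_200 = {
    197: (6, 1), 198: (2, 1), 199: (9, 2), 200: (15, 9),
    201: (21, 8), 202: (6, 2), 203: (15, 9),
}

not_out_share = {s: no / freq for s, (freq, no) in around_200.items()}
best = max(not_out_share.values())
top_scores = sorted(s for s, share in not_out_share.items() if share == best)

print(top_scores, f"{best:.0%}")

# The same calculation at 100, using the 149 innings and 56 not-outs quoted earlier
print(f"{56 / 149:.1%}")
```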

Now, for some statistical derivations.

The 50-percentile mark appears between 12 and 13. That means over 50% of innings have scores below or at 13. The median value of scores is 13. The 90-percentile mark is reached between the scores of 68 and 69, meaning that fewer than 10% of the scores are above 69. The weighted mean (by the frequency of such scores) of all scores is 28.3. In terms of total runs scored at a specific score, the score of 30 gets the top place. There are 680 occurrences, leading to a total of 20,400 runs. I am not sure whether these figures mean anything to readers. But it is a nice way to round off the analysis.
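For readers who want to reproduce these derivations from a score-frequency table, here is a minimal sketch. The frequency table is a made-up toy, not the real data, and the percentile rule (the smallest score at or below which the required share of innings falls) is one common convention that I am assuming, not necessarily the one used for the figures above.

```python
def percentile_score(freq, pct):
    """Smallest score at or below which at least `pct` percent of innings fall."""
    total = sum(freq.values())
    running = 0
    for score in sorted(freq):
        running += freq[score]
        if running * 100 >= pct * total:
            return score
    raise ValueError("empty frequency table")

def weighted_mean(freq):
    """Mean score, weighting each score by how often it occurred."""
    return sum(s * n for s, n in freq.items()) / sum(freq.values())

def top_run_contributor(freq):
    """Score whose occurrences add up to the most runs (score x frequency)."""
    return max(freq, key=lambda s: s * freq[s])

# Toy table, NOT real data: score -> number of innings
toy = {0: 10, 1: 6, 5: 4, 12: 3, 30: 4, 50: 2, 100: 1}

print(percentile_score(toy, 50))   # median under this convention
print(percentile_score(toy, 90))
print(round(weighted_mean(toy), 2))
print(top_run_contributor(toy))

# The article's own example: 680 innings of exactly 30
print(680 * 30)
```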

A distribution of individual scores of players in Tests  © Anantha Narayanan

The graphs for this article were quite difficult to do. There were two problems. The range was mind-boggling: 0 to 400, 0 to 952 and 26 to 849. This made drawing the graphs quite difficult in view of ESPNcricinfo's limit of 640 pixels for the graphs. The bigger problem was the spread of the frequencies: over 9,000 for a score of 0 and 1 for 400, and a similar situation for the team scores.

Hence I have adopted different methods. For the batsman scores I have not looked at the entire range of 0 to 400, which would have been impossible to handle from both the X and Y axis points of view. So I zeroed in on a range of 80 to 220. These are important scores, and the exclusion of scores up to 80 meant that the frequencies were more manageable. Anyhow, the high scores had very small numbers.

The graph is self-explanatory. Readers can see visual evidence of all that we have discussed so far: the drops before 100, the huge spurts after reaching 100 and the moderate spurts around the 150 and 200 marks. Beyond these, there is no clear pattern emerging other than a general decrease in values.

Team scores (All)

In the 2,089 Tests 7,561 innings were played. The qualification is that at least a ball should have been bowled. With this macro number in place let us see some of the interesting facts about these numbers. First I will look into the set of all innings, complete and incomplete.

The range of team scores is 0 to 952. Yes, you read it correctly. Zero was a valid score. I wondered whether my program had gone wrong. But then I may go wrong but my program, never. So I went and checked. Yes, it is true. In Test #1293, New Zealand scored 0 for 0 off a single ball bowled by Sri Lanka. Why did they play a single ball? Wisden Almanack reports that "Vaas bowled one ball of New Zealand's second innings before the umpires agreed that the light was too bad."

The highest score is 952 and everyone, especially the Indian bowlers, would have painful memories of those three days. It is anyone's guess how many days might have been needed in this Test to produce a result: maybe 11. The lowest three scores not reached are 18, 56 and 557.

The score distribution for all innings follows a normal distribution between zero and 600. Beyond that the numbers are very low. The most popular score is 296, with 37 occurrences. Then come 252 and 223, with 34 occurrences each. Why? I have no idea. I am as much bemused as you are, with these specific numbers. And may I ask why there should be 10 scores of 521 and 1 of 525 and 2 of 514? Sure beats me.

I will not be doing any statistical derivations for these types of team innings since there are many short innings, especially in case of wicket-wins, and these distort the numbers.

A distribution of team scores in Tests  © Anantha Narayanan

Since the team scores span a range of 950+ values, it is not possible to do a graph with the individual values. Even a pixel-a-score will go past 640, the maximum graph size. Hence I have grouped these scores in lots of ten score values: 0 to 9, 10 to 19, 20 to 29 and so on. Then we can at least get an idea of the distribution.
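The grouping into lots of ten is plain integer division. A minimal sketch, assuming the team totals are available as a simple list. The list below is a toy for illustration, not the real dataset, and `bin_scores` is a name I have made up.

```python
from collections import Counter

def bin_scores(scores, width=10):
    """Count scores per bucket; each key is the bucket's lower bound (0, 10, 20, ...)."""
    return Counter((s // width) * width for s in scores)

# Toy list of team totals, for illustration only
toy_totals = [0, 26, 223, 226, 252, 252, 257, 296, 952]

bins = bin_scores(toy_totals)
for low in sorted(bins):
    print(f"{low}-{low + 9}: {bins[low]}")

# Buckets needed to cover the full 0..952 range: comfortably under 640 pixels
print(952 // 10 + 1)
```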

The graph follows a reasonably well-defined normal distribution between 0 and 600 and then an extended tapering off. Since these are scores in ranges (120-129, 210-219 and so on), this only gives an idea of the score distribution. The highest frequency is at the score range 250-259, with 265. The next highest is some distance away, at 220-229, which has a frequency of 259. Surprisingly the next best is further back, at 190-199, with a frequency value of 240. Let us not forget that many of these innings might be third innings declarations and fourth innings chases.

Team scores (Completed)

Now for the completed innings, which I will define as those in which all ten wickets were captured. There were 5,317 such completed innings. Obviously, I include neither the declared innings nor those in which one or more players did not bat.

The score distribution for completed innings also follows a normal distribution between 0 and 600. Beyond that the numbers are very low. The range is 26 to 849. Twenty-six was New Zealand's total against England in 1955. A deficit of 46 in the first innings was followed by this eminently forgettable innings and a loss by an innings and 20 runs. It is amazing that this total was in danger of being toppled recently from its pedestal by Australia.

"849 all out". That surely reads funny. What was England aiming at? I know these were the days of timeless Tests. But 849 was surely a lot of runs. On top of that they did not enforce the follow-on, when ahead by 563 runs and as justice would have it, West Indies batted for nearly two days and drew the match. The next highest score is 747, made during 2005 by West Indies against South Africa.

There are quite a few scores not reached by dismissed teams below 60. Hence, looking at 60-plus scores, we find that no team has ever been dismissed for 69. The next such occurrence is at 514. Let me ask a question. Why should there be 18 occurrences of 211, 17 occurrences of 212, 27 occurrences of 214, 22 occurrences of 215 but only seven occurrences of 213? Probably nothing more than the way the dice fell.

The highest total reached by two different teams is 617. In the innings played category, 296 and 252 were the most frequent scores. No surprise that the most popular all-out score is also 296, with 30 occurrences, which means that this score was attained seven more times in incomplete innings. Not unexpectedly, the next one is 252, with 29 occurrences, which indicates five other incomplete innings. These two were in the top positions in the innings played category also. The third one is 240, also with 29 occurrences. What is so great about 296 and 252?

The mean score for the 5,317 innings is 273.6. The median score is 257. Let us not forget that these are completed innings. So this number gives us a feeling of comfort. The 90-percentile is reached around the score of 434, meaning that fewer than 10% of scores are above this mark.

A distribution of team "all-out" scores in Tests  © Getty Images

Even here, since the range spans over 800 team scores, it is not possible to do a graph with the individual values. Hence I have grouped these scores in groups of ten score values: 0 to 9, 10 to 19, 20 to 29 and so on. Then we can at least get an idea of the distribution.

The graph follows a better-defined normal distribution between 0 and 600 than the earlier team scores, possibly because this includes only all-out situations and the lowest is 26. Since these are scores in ranges (120-129, 210-219 and so on), this only gives an idea of the score distribution. The highest frequency is at the same score range, 250-259, with 213. The next highest is some distance away, again at 220-229, which has a frequency of 208. The next best is in between, at the range of 240-249, with a frequency value of 188.

Partnerships

Partnerships do not offer as much variety as batsman and team scores. A hundred is very important to a batsman but not necessarily to a pair. However this will round off the article.

There have been a total of 64,198 partnerships. This may not match some other data derivations since this analysis is solely based on the FoW data in the scorecards. The information about a player retiring and another one taking his place is not always available since only the scorecard is considered. The average number of runs added per partnership is 29.6.

There are well over 10,000 partnerships with no run added. The highest, of course, is 624 by the two Sri Lankan greats. There have been over 3,900 partnerships of 100 and over, and 489 pairs of batsmen have added 200 and over. Seventy-seven have crossed 300 and 14 have gone past 400. Let me repeat that these figures might be off by a few since retirements and three-batsmen partnerships might affect the numbers.

There does not seem to be the frenzied change of frequencies around the hundred-mark. This is understandable since a partnership reaching a hundred is probably considered less significant than a batsman reaching a hundred. The other significant feature of the individual and partnership distributions is the continuous declining trend from zero onwards, unlike the team distributions, which follow a normal distribution pattern. The reason could be that individual innings and partnerships end with a single dismissal and, across different types of batsmen, the initial stages are the most difficult to navigate.

The difference between the scores of 0 and 1 is quite revealing. The individual score frequency for 0 is 9,191 and for 1 is 3,682. The partnership frequency for 0 is 10,260 and for 1 is 2,383. This steeper drop for partnerships can be attributed to the fact that the partnership ends when either batsman gets out, increasing the chances of it being broken at 0.
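How much steeper the partnership drop is can be put into numbers. The sketch below uses only the four frequencies quoted in this paragraph, plus the totals of 73,626 innings and 64,198 partnerships quoted earlier. The helper name `drop_ratio` is mine.

```python
# Frequencies quoted in the article
individual = {0: 9_191, 1: 3_682}
partnership = {0: 10_260, 1: 2_383}

def drop_ratio(freq):
    """How many times more often 0 occurs than 1."""
    return freq[0] / freq[1]

print(f"individual:  {drop_ratio(individual):.1f}x")
print(f"partnership: {drop_ratio(partnership):.1f}x")

# Share of all innings / all partnerships that ended on 0
print(f"{individual[0] / 73_626:.1%} of innings")
print(f"{partnership[0] / 64_198:.1%} of partnerships")
```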

Fall of Wickets

I have already gone way past the estimated size of the article. Hence I am only going to give the summary of the fall of wickets numbers.

Seven-hundred-and-seventy wickets fell at 0, which is the highest. Then the numbers oscillate either side of 200 for the next 200 runs, indicating a very balanced distribution of fall of wickets. Then the numbers drop off slowly until we reach the first of the zero occurrence situations. This is as late as 622. Who could imagine that the first occurrence of the score at which no wicket has fallen would be 622? The highest score at which a wicket has fallen is 924. The highest score at which wickets have fallen in different matches is 707, at which 3 wickets have fallen.

One quirky table: Product of score and batting position ???

In line with the mood of the article, I am rounding off with a quirky table. A hundred scored at number nine is normally much more valuable than one scored at no.3. Many a late order fifty might be more valuable than a top order hundred. So I created a new table by multiplying the batsman score by the batting position (openers considered as 2). I created a table where the product was 1,000 and more. There are many variations possible to this idea. All high order positions can be normalized to, say, six. And so on. However the emphasis is more on high low-order innings.

A total of 142 innings qualified. I have shown here the top ten innings. The complete table can be downloaded. I am quite sure a finicky reader might say, "How can I multiply two disparate values?", "4x200 or 3x250 is more valuable than 10x100" and so on. I agree that it is quirky, but it does not matter a tinker's damn. This article is not about proper numerical methods but one which emphasises the beauty of numbers. No serious conclusions should be drawn.

 
	Batsman          Test I BP Runs BP*Runs
	Wasim Akram      1336 2  8  257    2056
	DG Bradman        257 3  7  270    1890
	Imtiaz Ahmed      414 2  8  209    1672
	MJ Clarke        2027 2  5  329    1645
	IDS Smith        1139 1  9  173    1557
	DS Atkinson       406 2  7  219    1533
	SCJ Broad        1971 1  9  169    1521
	DG Bradman        236 2  5  304    1520
	KD Walters        796 1  6  250    1500
	DPMD Jayawardene 1810 2  4  374    1496

I was almost certain of the top position. The very high double hundred by Wasim Akram, batting at number eight, leads the table. Bradman's famous 270 at number seven is in next position. Imtiaz Ahmed comes in next with his double hundred at number eight. Michael Clarke is one of only two batsmen to score a triple-hundred, batting at no.5 and this has helped him to be in the top half. Ian Smith's explosive 173 against India, at number nine, completes the top five. The other triple-hundred is by Don Bradman, also in this table. All of us are familiar with the recent innings of Stuart Broad.
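The quirky product itself is a one-liner, and the normalisation idea mentioned earlier is a small extension of it. In the sketch below the ten rows are copied from the top-ten table. The function name and the `min_position` parameter are my own, and the variant that lifts every position up to six is only one hypothetical reading of "normalized to, say, six", not something computed for this article.

```python
# (batsman, batting position, runs), copied from the top-ten table
top_ten = [
    ("Wasim Akram", 8, 257),
    ("DG Bradman", 7, 270),
    ("Imtiaz Ahmed", 8, 209),
    ("MJ Clarke", 5, 329),
    ("IDS Smith", 9, 173),
    ("DS Atkinson", 7, 219),
    ("SCJ Broad", 9, 169),
    ("DG Bradman", 5, 304),
    ("KD Walters", 6, 250),
    ("DPMD Jayawardene", 4, 374),
]

def quirky_product(position, runs, min_position=2):
    """Batting position times runs; positions below `min_position` are lifted to it.

    The default of 2 follows the article (openers count as 2). Any other
    value is a hypothetical variant.
    """
    return max(position, min_position) * runs

products = [quirky_product(bp, runs) for _, bp, runs in top_ten]
qualifying = [row for row in top_ten if quirky_product(row[1], row[2]) >= 1000]

# Hypothetical variant: treat every position from 1 to 6 as 6
variant = sorted(
    top_ten,
    key=lambda row: quirky_product(row[1], row[2], min_position=6),
    reverse=True,
)

print(products)
print(len(qualifying))
print(variant[0][0])
```

Under that hypothetical variant the order of these ten changes, with the 374 at number four moving to the top, which shows how much the table depends on the weighting chosen.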

To download/view the composite file containing all tables, a veritable treasure-trove of information, please CLICK HERE.

I have had so much fun in doing this article that I will do the same for ODIs also.

Anantha Narayanan has written for ESPNcricinfo and CastrolCricket and worked with a number of companies on their cricket performance ratings-related systems


© ESPN Sports Media Ltd.

Posted by red_forever on (June 27, 2013, 7:02 GMT)

"The minute he moves to 5/6 position he will move on, not holding the playing position for dear life."

Well then, after Wimbledon Federer will be ranked No. 5, below Nadal and Ferrer, as he will lose 1955 points of the 2000 he won for the 2012 title, and he himself has stated that he will continue to play.

Not that i say he should retire, but that to see him go out before the 2week is sad.
[[
I meant a continued positioning at 5/6 rather than "the minute" he moves to no.5/6. Maybe I used the wrong words. I would say that if he spends a year outside the top-4, he would consider quitting, especially if his Grand Slam records are ordinary.
Ananth
: ]]

Posted by red_forever on (June 26, 2013, 20:01 GMT)

One day and everything changes. It was just yesterday we were discussing the Fed vs Nole final and now Andy Murray has the safest passage to a wimbledon final in years.

So the streak has ended at 36. Not sure you would like to hear this, but federer is clearly in decline. 2 Slam finals in last 14 attempts. He is losing to people who In his pomp he would have dispatched like bowling alley balls. Quarters and semis arent what we expect from 17 time champ. It hurts to see him walk off like that.

This season he was the architect of his own decline. When, after the Tour finals, all players went off and rested, he went and played exhibition matches in high altitude places in South America and pocketed a cool $ 14Mn. By the time of the Aus Open, when Nole and Murray were fresh, Fed looked jaded. You should head into the May-June Slam fest in proper mode, but Fed took 2 months off when others were busy fine tuning their skills against the best. Finally i sign off by saying those $ 14Mn did all
[[
But "nobody died". We have been privileged to live in the same era as Federer. There is no point in bringing in money and preparation and rest when you realize that two points won either side of the second set tie-break could have got Federer a straight set win. I rue the missed opportunities but I do not go looking for reasons. If he is on the decline, that is true. The minute he moves to 5/6 position he will move on, not holding the playing position for dear life. It is an individual game, he takes his own decision and he will take it, as night follows day.
Ananth
: ]]

Posted by red_forever on (June 25, 2013, 6:33 GMT)

Ananth: Now the stars seem to be lining up well for Federer especially after THE beast in Federer's head lost yesterday.

But still i feel this is a Djokovic tourney to lose. He is the favourite for me and then comes Murray and Federer despite this being played on Federer rented home. Given that Federer hasnt beaten a top 5 player in the last 6months. So that's the reason he is No 3 for me. But heart says he will reach Finals vs Djoker.
[[
I agree with all that you have said. And if there is a Fed-Djoko final, I cannot put it any less than 55-45 for Fed irrespective of anything else. At the same time if they were playing across the pond a couple of months later, it would be 58-42 for Djoko.
Ananth
: ]]

Posted by Anand_S on (June 21, 2013, 15:41 GMT)

Ananth: On a more statistical note, did you try fitting a distribution to the team scores and individual scores? From your bar graph the individual scores appear exponentially distributed and the team scores appear to be Gaussian (central limit theorem at work here?) What I did not verify is, if you take the averages of batsmen's individual scores at all batting positions, do the sum of these averages add up to the average team score? Should be according to what your analysis says. But importantly I wanted to see if I can get a bar graph for the distribution of batsmen's scores for each batting position. I work a lot on data analytics and your excellent work here is a data analyst's dream. I know my comment may be overwhelming for people who do not have any background in statistics. INfact, I would not blame you if you choose not to publish my comment for the same reason. Would still appreciate if you can respond to me by email.
[[
Anand, I was not a Statistics student. My knowledge of Statistics is fairly limited, good enough to do some minimal work and understand under-graduate level work but not good enough to go beyond. In fact the data table which can be downloaded is very exhaustive but may not contain all required data for your work. If I know that specifically I can create the required table and post it for your use. You have my mailid.
Ananth
: ]]

Posted by fred_carver on (June 21, 2013, 9:23 GMT)

So, is there any statistical significance to the Nelson?
[[
Probably not any significance. The 99 to 100 changes are far more startling. Probably because 99/100 is universal and 111 is only a Western situation.
Ananth
: ]]

Posted by alarky on (June 18, 2013, 13:59 GMT)

What beautiful matches we are watching at the Champions Trophy - the highest quality of cricket, being played in any format of the limited overs tournaments! Could some person of influence in international cricket be asked to carry out a poll, to find out from the fans: "Which should be scrapped, The ICC or The Champions Trophy"; and, make the result from the polls legal?

I am seeking the help of every cricket blogger to please let me share this opinion with all other cricket fans.
[[
I feel that the Champions' Trophy has become a lot more competitive and interesting because of the uncertainties of English weather and pitches. Cardiff changed overnight. Even at Oval yesterday there was great contest between bat & ball. I think the Champions' Trophy should not be scrapped. It is a concise, short and engrossing event. But what I don't want is for the Test Championship to be delayed further because of this.
Ananth
: ]]

Posted by cricket-india on (June 17, 2013, 16:49 GMT)

that's a lot of data and numbers for a professed 'fun article.' anyway thanks anant for the lowest score not yet reached by any test batsman so far (229), this was a question I had for steven a few months ago but he didn't answer.

Comments have now been closed for this article

ABOUT THE AUTHOR

Anantha Narayanan
Anantha spent the first half of his four-decade working career with corporates like IBM, Shaw Wallace, NCR, Sime Darby and the Spinneys group in IT-related positions. In the second half, he has worked on cricket simulation, ratings, data mining, analysis and writing, amongst other things. He was the creator of the Wisden 100 lists, released in 2001. He has written for ESPNcricinfo and CastrolCricket, and worked extensively with Maruti Motors, Idea Cellular and Castrol on their performance ratings-related systems. He is an armchair connoisseur of most sports. His other passion is tennis, and he thinks Roger Federer is the greatest sportsman to have walked on earth.
