# 19 grounds, 19 years - an in-depth study

A few readers commented on my article on "Test Openers" that pitch/ground conditions should be taken into account when determining the value of an opener's innings. I responded with a short message on the difficulty of determining the true nature of any pitch/ground. I have been thinking over these comments and feel that it is essential to explore this point in depth. Thanks to Tushar, and others before, for raising queries in this regard.

The most comical moments in an ODI telecast are the pitch specialists' comments. They are as reliable as a weather forecaster's. When Ravi Shastri pontificates "it is a belter", one can rest assured that one in two innings will have floundered to 201 for 7 in 50 overs. Alternatively, when David Lloyd says in his "Roses" twang that "250 should be a winning score", I always look for the situation 7 hours later, when the batting team has successfully chased a 300+ total. I wish the broadcasters would show a split image of the pitch specialist's comments and the innings scores.

Test matches are different: the specialists normally comment on the first session and make general observations. Of one thing I am sure: no pitch specialist, no analyst and, for that matter, no curator can forecast with confidence how a pitch will behave.

This analysis covers 19 premier Test grounds across 9 countries. MCG, SCG, WACA, Lord's, Oval and Headingley lead the field. These are the major Test playing grounds, with most of these grounds clocking in at over 100 Tests. Then I have taken two grounds from each of the other six major Test playing countries. One ground from Bangladesh completes the selection. This brings up the 19 grounds.

I have taken matches played on these grounds during the last 19 years (from 1.1.1990 onwards). Barring Calcutta and Chennai, where only 9 Tests each have been played during these 19 years (because of the BCCI's rotation policy), each of the other grounds has hosted 10 or more Tests, with 32 Tests at Lord's, London leading the field. A total of 338 Tests are analysed.

Anticipating readers' comments, I looked at excluding the Test matches played against Bangladesh and Zimbabwe. However, that would be fundamentally wrong: this is a statistical analysis and I cannot take casual liberties with my selection methodology. Also, one of the grounds is in Bangladesh. One should also not forget that a strong team like India was dismissed for 76 on the opening day by South Africa in India in 2008, while the same team had scored 705 for 7 declared against a strong Australia at Sydney in 2004. So all the Tests are considered.

In order to have uniform conditions, I have taken only **completed** (all out or declaration) first innings. This avoids including, say, a Test abandoned with the first innings standing at 24 for 3 or 150 for 5. Later innings vary a lot and would distort the figures considerably.
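
The selection rule above (completed first innings only, from 1.1.1990) can be sketched as a simple filter. This is a minimal illustration, not the author's actual tooling: the `FirstInnings` record, its field names and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class FirstInnings:
    """Hypothetical record for one Test's first innings (field names are illustrative)."""
    ground: str
    start_date: date
    runs: int
    all_out: bool
    declared: bool

    @property
    def completed(self) -> bool:
        # Completed = all out or declared, as defined in the text.
        return self.all_out or self.declared


def select_innings(innings, cutoff=date(1990, 1, 1)):
    """Keep completed first innings from Tests starting on or after the cutoff."""
    return [i for i in innings if i.completed and i.start_date >= cutoff]


# Hypothetical example data, not taken from the study.
sample = [
    FirstInnings("Ground A", date(2001, 5, 1), 320, all_out=True, declared=False),
    FirstInnings("Ground A", date(2003, 7, 1), 450, all_out=False, declared=True),
    FirstInnings("Ground A", date(2005, 6, 1), 150, all_out=False, declared=False),  # e.g. 150 for 5, abandoned
    FirstInnings("Ground A", date(1988, 6, 1), 300, all_out=True, declared=False),   # before the cut-off
]
print([i.runs for i in select_innings(sample)])  # [320, 450]
```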

Readers should remember that this is a departure from my usual analysis insofar as it is a purely statistical analysis. I have tried to keep the analysis simple and understandable and have explained the statistical terms. With this background, let us look at the tables.

The first is a simple table, listed in order of the Mean. The Mean is another term for the Average. It is worked out by the following formula.

$$\text{Mean} = \frac{\text{Sum of all values}}{\text{No. of values}}$$

The Mean is a very useful value for analysis: one can make a generalised observation on a likely score at the ground. However, the Mean is strongly affected by very high and very low values, so a pinch of salt should be kept nearby. I have also computed the mean of the 5 most recent Tests on each ground and compared it with the overall Mean; the ratio of the two shows a recent trend.
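
The Mean and the recent-trend ratio described above take only a few lines. This sketch assumes the scores are held in chronological order (oldest first); the helper names are illustrative. The last lines cross-check the arithmetic against the Lord's row of the table below (32 innings, 11665 runs, last-5 mean of 449).

```python
def mean(values):
    """Mean = sum of all values / number of values."""
    return sum(values) / len(values)


def recent_ratio(scores, last=5):
    """Mean of the last `last` scores divided by the overall Mean.

    Assumes `scores` is in chronological order (oldest first).
    """
    return mean(scores[-last:]) / mean(scores)


# Cross-check against the Lord's row of the table:
# 32 completed first innings totalling 11665 runs, last-5 mean of 449.
lords_mean = 11665 / 32
print(round(lords_mean, 1), round(449 / lords_mean, 2))  # 364.5 1.23
```

A ratio above 1 means recent first innings at the ground have been higher than its long-run Mean; below 1, lower.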

**Table of Mean scores (in order of Mean)**

| Ground | Tests | Total runs | Mean | Last 5 Tests mean | Ratio (Last 5 / Mean) |
|---|---|---|---|---|---|
| National Stadium, Dhaka | 10 | 2229 | 222.9 | 238 | 1.07 |
| Asgiriya Stadium, Kandy | 16 | 4098 | 256.1 | 173 | 0.68 |
| Kingsmead, Durban | 16 | 4333 | 270.8 | 247 | 0.91 |
| Basin Reserve, Wellington | 24 | 6752 | 281.3 | 231 | 0.82 |
| National Stadium, Karachi | 12 | 3446 | 287.2 | 299 | 1.04 |
| Sabina Park, Kingston | 15 | 4373 | 291.5 | 320 | 1.10 |
| Eden Park, Auckland | 16 | 4706 | 294.1 | 282 | 0.96 |
| S.S.C Ground, Colombo | 29 | 8966 | 309.2 | 278 | 0.90 |
| M.A.C Stadium, Chennai | 9 | 2871 | 319.0 | 285 | 0.89 |
| Kensington, Bridgetown | 19 | 6115 | 321.8 | 344 | 1.07 |
| Wanderers, Johannesburg | 19 | 6118 | 322.0 | 261 | 0.81 |
| Gaddafi Stadium, Lahore | 16 | 5204 | 325.2 | 363 | 1.12 |
| Melbourne Cricket Ground | 20 | 6707 | 335.4 | 318 | 0.95 |
| Sydney Cricket Ground | 22 | 7900 | 359.1 | 399 | 1.11 |
| Lord's, London | 32 | 11665 | 364.5 | 449 | 1.23 |
| Headingley, Leeds | 16 | 5860 | 366.2 | 407 | 1.11 |
| Eden Gardens, Calcutta | 9 | 3348 | 372.0 | 426 | 1.15 |
| W.A.C.A. Ground, Perth | 19 | 7090 | 373.2 | 431 | 1.16 |
| Kennington Oval, London | 19 | 7380 | 388.4 | 374 | 0.96 |

National Stadium, Dhaka has the lowest Mean. This is understandable, since 7 of its innings were played by Bangladesh, 6 of them below 204. Asgiriya Stadium, Kandy also has a fairly low Mean; here different teams have been dismissed for low scores. Surprisingly, Kingsmead, Durban has also shown a penchant for low scores.

At the other end, Eden Gardens, the WACA and the Oval have fairly high Mean values. It is surprising that there is almost a 75% difference between the lowest and highest Mean values.

Asgiriya Stadium, Kandy has shown an alarming dip in first innings scores recently: the ratio is 0.68. At Basin Reserve, Wellington the recent mean has dipped by nearly 20% (ratio 0.82). At the other end, there has been a marked increase in first innings scores at Lord's (ratio 1.23).

The Mean does not truly reflect the distribution of the data. A simple example: a batsman scoring 100 and 0 in the two innings of a Test has a Mean of 50, the same as another batsman who has scored 50 and 50. However, the first batsman's two scores are far more widely spread. This spread is measured by the Standard Deviation, which is probably the most widely used of all statistical measures of dispersion.
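
The two-batsman example above can be checked with Python's standard `statistics` module. `statistics.stdev` uses the n-1 divisor (the sample standard deviation), which is the convention adopted in this article; `statistics.pstdev` would use n instead.

```python
import statistics

# The two batsmen from the example: same Mean, very different spread.
first_batsman = [100, 0]
second_batsman = [50, 50]

for label, scores in (("100 and 0", first_batsman), ("50 and 50", second_batsman)):
    avg = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample SD, divisor n - 1
    print(f"{label}: mean={avg}, SD={sd:.1f}")
```

Both batsmen have a Mean of 50, but the first has an SD of about 70.7 and the second an SD of 0.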

**Table of Standard Deviation and CoV (in order of CoV)**

| Ground | Mean | Std Dev | CoV |
|---|---|---|---|
| National Stadium, Karachi | 287.2 | 77.2 | 26.9 % |
| Melbourne Cricket Ground | 335.4 | 92.0 | 27.5 % |
| Sabina Park, Kingston | 291.5 | 85.9 | 29.5 % |
| Kingsmead, Durban | 270.8 | 84.0 | 31.1 % |
| Eden Gardens, Calcutta | 372.0 | 126.6 | 34.1 % |
| W.A.C.A. Ground, Perth | 373.2 | 136.7 | 36.7 % |
| Sydney Cricket Ground | 359.1 | 132.2 | 36.9 % |
| Eden Park, Auckland | 294.1 | 116.0 | 39.5 % |
| Kennington Oval, London | 388.4 | 154.1 | 39.7 % |
| Kensington, Bridgetown | 321.8 | 129.1 | 40.2 % |
| National Stadium, Dhaka | 222.9 | 92.0 | 41.3 % |
| S.S.C Ground, Colombo | 309.2 | 130.6 | 42.3 % |
| Wanderers, Johannesburg | 322.0 | 139.1 | 43.3 % |
| Lord's, London | 364.5 | 163.4 | 44.9 % |
| Asgiriya Stadium, Kandy | 256.1 | 115.3 | 45.1 % |
| M.A.C Stadium, Chennai | 319.0 | 148.2 | 46.5 % |
| Headingley, Leeds | 366.2 | 172.3 | 47.1 % |
| Gaddafi Stadium, Lahore | 325.2 | 161.5 | 49.7 % |
| Basin Reserve, Wellington | 281.3 | 147.7 | 52.5 % |

**Standard deviation** is a measure of the spread of data about the Mean and describes the dispersion on either side of it. A low standard deviation indicates that the data are clustered around the Mean, whereas a high standard deviation indicates that the data are widely spread, with values significantly higher or lower than the Mean. Squaring each deviation from the Mean (and taking the square root at the end) stops positive and negative deviations from cancelling each other out.

This calculation is described by the formula in fig 1, where the two 'x' values represent the Mean and an individual value (the sign of their difference is immaterial, since it is squared). Instead of n, n-1 is used as the divisor.
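
For readers without fig 1 to hand, the calculation with the n-1 divisor can be written as follows, where $x_1, \dots, x_n$ are the $n$ first innings scores at a ground and $\bar{x}$ is their Mean. The second line, a standard identity, relates it to the version with divisor $n$ ($\sigma_n$). The two differ only by a constant factor, and that factor matters most for grounds with few Tests.

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}$$

$$\sigma_n = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}, \qquad s = \sigma_n\sqrt{\frac{n}{n-1}}$$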

The three English grounds have a very high value of SD, indicating quite a lot of dispersion. Karachi, Durban and Kingston have low SD values indicating a clustering of values around the Mean value.

Standard Deviation has little interpretable meaning on its own unless the Mean is reported along with it. A given standard deviation indicates a high or low degree of variability only in relation to the Mean. For this reason, it is easier to get an idea of the variability in a distribution by dividing the Standard Deviation by the Mean. When this is expressed as a percentage of the Mean, it is called the Coefficient of Variation (CoV), which is a dimensionless ratio.

In general, a low CoV indicates a low SD relative to the Mean, and a high CoV indicates the reverse. Where the CoV is quite high, as at Basin Reserve and Lahore, it would be next to impossible to predict expected scores. For these and a few other grounds, the SD is around half the Mean and there is a wide dispersion of scores. On the other hand, look at MCG and Karachi. The low CoV indicates heavy clustering of values around the Mean, and one can make a decent attempt at predicting a score, or at least a score range.
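
The CoV calculation is just the SD divided by the Mean, expressed as a percentage. This sketch recomputes two rows of the SD/CoV table from their rounded Mean and SD values. Recomputing from rounded table figures can differ from the published CoV by 0.1 for some grounds (MCG, for example, gives 27.4 rather than 27.5).

```python
def coefficient_of_variation(sd, mean):
    """CoV: Standard Deviation as a percentage of the Mean (dimensionless)."""
    return 100.0 * sd / mean


# Rows from the SD/CoV table (SD with divisor n - 1).
print(round(coefficient_of_variation(77.2, 287.2), 1))   # Karachi: 26.9
print(round(coefficient_of_variation(147.7, 281.3), 1))  # Basin Reserve: 52.5
```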

Now we come to an analysis of the quartile scores and the Median. Three measures are important here. Q1, the first quartile, is the score at the 25% position. Q3, the third quartile, is the score at the 75% position. But the most important is Q2, better known as the Median, which is the score at the mid-point. If there is an odd number of entries, the Median is the middle score; if there is an even number, the Median is the average of the two middle scores.
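
The Median rule described above translates directly into code. The article does not say which quartile convention was used for Q1 and Q3 (several exist, and they can differ slightly on small samples), so the `quartiles` helper below assumes `statistics.quantiles` with `method="inclusive"`. This is one common choice, not necessarily the one behind the table.

```python
import statistics


def median(values):
    """Median as described in the text: middle value, or average of the two middle values."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                    # odd count: the middle score
    return (s[mid - 1] + s[mid]) / 2     # even count: average of the two middle scores


def quartiles(values):
    """Q1, Q2 (Median), Q3 using one common convention (an assumption, see above)."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q2, q3


print(median([3, 1, 2]), median([4, 1, 3, 2]))  # 2 2.5
print(quartiles([1, 2, 3, 4, 5]))               # (2.0, 3.0, 4.0)
```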

**Table of Quartile values and QVC (in order of QVC)**

| Ground | SD (divisor n) | Q1 | Median | Q3 | QVC |
|---|---|---|---|---|---|
| Eden Gardens, Calcutta | 119.4 | 305 | 371.0 | 428 | 0.17 |
| Melbourne Cricket Ground | 89.6 | 270 | 342.5 | 394 | 0.19 |
| Sydney Cricket Ground | 129.2 | 291 | 317.5 | 451 | 0.22 |
| National Stadium, Karachi | 73.9 | 216 | 270.5 | 337 | 0.22 |
| S.S.C Ground, Colombo | 128.3 | 234 | 285.0 | 380 | 0.24 |
| M.A.C Stadium, Chennai | 139.8 | 235 | 257.0 | 391 | 0.25 |
| Sabina Park, Kingston | 83.0 | 225 | 265.0 | 374 | 0.25 |
| Wanderers, Johannesburg | 135.4 | 226 | 302.0 | 411 | 0.29 |
| Eden Park, Auckland | 112.3 | 203 | 283.5 | 380 | 0.30 |
| Kingsmead, Durban | 81.3 | 198 | 261.5 | 366 | 0.30 |
| National Stadium, Dhaka | 87.3 | 160 | 193.5 | 298 | 0.30 |
| Basin Reserve, Wellington | 144.6 | 174 | 245.0 | 342 | 0.33 |
| Kensington, Bridgetown | 125.7 | 224 | 298.0 | 446 | 0.33 |
| W.A.C.A. Ground, Perth | 133.1 | 239 | 373.0 | 485 | 0.34 |
| Asgiriya Stadium, Kandy | 111.7 | 150 | 263.5 | 305 | 0.34 |
| Kennington Oval, London | 150.0 | 236 | 380.0 | 484 | 0.34 |
| Lord's, London | 160.8 | 255 | 350.5 | 528 | 0.35 |
| Gaddafi Stadium, Lahore | 156.4 | 183 | 291.0 | 398 | 0.37 |
| Headingley, Leeds | 166.9 | 198 | 375.5 | 515 | 0.44 |

The Quartile Variation Coefficient (QVC), determined by the formula given below, is a measure of central dispersion. It is also a dimensionless ratio. Even though it takes into account only 50% of the data, the QVC is a very valuable measure, since the 50% it considers, on either side of the Median, is the most important part. It can also be expressed as a % value.

$$\text{QVC} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$

A low value indicates a very strong clustering of values around the Median. For instance, for MCG the Median is 342.5 runs, Q1 is only about 72 runs below it and Q3 only about 52 runs above it. So the Q1-Q3 spread is only 124 runs, while the overall Range, as seen next, is a whopping 392. The situation is similar for Eden Gardens and SCG.

On the other hand, a high QVC indicates a thinning of the central area. Take Headingley: the Median is 375.5, Q1 is about 177 runs below it and Q3 about 140 runs above it. The Q1-Q3 spread is a high 317 runs out of a total Range of 481 runs.
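
The QVC formula and the two worked examples above can be reproduced from the Q1 and Q3 columns of the quartile table. This is a minimal sketch; the function name is illustrative.

```python
def qvc(q1, q3):
    """Quartile Variation Coefficient: (Q3 - Q1) / (Q3 + Q1)."""
    return (q3 - q1) / (q3 + q1)


# (Q1, Q3) taken from the quartile table.
for ground, q1, q3 in (("Melbourne Cricket Ground", 270, 394), ("Headingley, Leeds", 198, 515)):
    print(ground, "spread:", q3 - q1, "QVC:", round(qvc(q1, q3), 2))
```

This gives a spread of 124 and a QVC of 0.19 for MCG, against 317 and 0.44 for Headingley, matching the table.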

**Table of Ranges and SDs (in order of Range-SD ratio)**

| Ground | SD | Low score | High score | Range | Ratio (Range/SD) |
|---|---|---|---|---|---|
| M.A.C Stadium, Chennai | 148.2 | 167 | 560 | 393 | 2.65 |
| Headingley, Leeds | 172.3 | 172 | 653 | 481 | 2.79 |
| Sabina Park, Kingston | 85.9 | 164 | 431 | 267 | 3.11 |
| National Stadium, Dhaka | 92.0 | 107 | 400 | 293 | 3.18 |
| Kennington Oval, London | 154.1 | 173 | 664 | 491 | 3.19 |
| National Stadium, Karachi | 77.2 | 196 | 450 | 254 | 3.29 |
| Gaddafi Stadium, Lahore | 161.5 | 147 | 679 | 532 | 3.29 |
| Kingsmead, Durban | 84.0 | 139 | 420 | 281 | 3.35 |
| Eden Gardens, Calcutta | 126.6 | 185 | 616 | 431 | 3.40 |
| Asgiriya Stadium, Kandy | 115.3 | 71 | 469 | 398 | 3.45 |
| Lord's, London | 163.4 | 77 | 653 | 576 | 3.52 |
| Basin Reserve, Wellington | 147.7 | 110 | 660 | 550 | 3.72 |
| W.A.C.A. Ground, Perth | 136.7 | 82 | 602 | 520 | 3.80 |
| Wanderers, Johannesburg | 139.1 | 119 | 652 | 533 | 3.83 |
| Kensington, Bridgetown | 129.1 | 102 | 605 | 503 | 3.90 |
| S.S.C Ground, Colombo | 130.6 | 89 | 600 | 511 | 3.91 |
| Eden Park, Auckland | 116.0 | 139 | 621 | 482 | 4.16 |
| Sydney Cricket Ground | 132.2 | 150 | 705 | 555 | 4.20 |
| Melbourne Cricket Ground | 92.0 | 159 | 551 | 392 | 4.26 |

There is another important measure, the Range, which is the difference between the lowest and highest scores. By itself the Range is of no great relevance; it has to be seen in relation to the SD. Hence I have worked out a ratio of Range to SD. The above table is sequenced by this ratio.

Normally the ratio is between 2.0 and 6.0. Anything outside these values indicates a way-out distribution of values, either a completely dispersed distribution or a completely centralized distribution.

A low value, say 2.65 for Chennai, indicates a high SD relative to the Range, while a high value, such as 4.26 for MCG, indicates a low SD relative to the Range. A low ratio indicates wide dispersion and a high ratio indicates central clustering.
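
The Range-to-SD ratio can be computed either from raw scores or from the summary columns of the table. The sketch below does both, with a flag for ratios outside the 2.0 to 6.0 band the text treats as normal. The helper names are illustrative, and the raw-score version assumes the same n-1 SD used in the table.

```python
import statistics


def range_to_sd(scores):
    """Ratio of Range (highest - lowest score) to the sample SD (divisor n - 1)."""
    return (max(scores) - min(scores)) / statistics.stdev(scores)


def range_to_sd_from_summary(low, high, sd):
    """Same ratio computed from the Low score, High score and SD columns."""
    return (high - low) / sd


def outside_usual_band(ratio, lower=2.0, upper=6.0):
    """Flag ratios outside the 2.0-6.0 band described in the text."""
    return not (lower <= ratio <= upper)


print(round(range_to_sd_from_summary(167, 560, 148.2), 2))  # Chennai: 2.65
print(round(range_to_sd_from_summary(159, 551, 92.0), 2))   # MCG: 4.26
```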

**Conclusion**:

1. Mean scores are a reasonable indicator of the expected score, and prediction based on Mean and SD is a feasible task. Let us take Kingston. The Mean is 292 and the SD is 83. If one takes an empirical formula of Mean plus or minus 0.5 of SD, one can estimate a first innings score of between 251 and 333. One could even scale this by the last-5-Tests factor of 1.10, leading to an educated estimate of 276 to 366. Let me see what happens, since I am writing this before even the Kingston toss. *(On 6/2/09) Ha! England scored 318, almost smack in the middle of this projection. Not a bad attempt.*

2. Evaluating an innings or an individual score is virtually impossible. Headingley has had scores of 570 for 7 in 2007 against West Indies and 203 all out in 2008 against South Africa. Let us say that Australia or England score 350 in the first innings at Headingley a few months from now. Compared to 2008 it is a great performance, while compared to 2007 it is a poor one. What can one conclude with any degree of confidence? One can use the Mean for such analysis, though without great confidence. However, as a single point of measure in a broad frame of analysis, it is worth considering.

3. How does one evaluate an innings at Dhaka? The low Mean will increase the valuation of most innings there. However, the low Mean has been caused by a string of low Bangladeshi scores. If we exclude the Bangladeshi scores, hardly any data remain (only 3 of the 10 innings). Other grounds do not present this difficulty since there are not many Bangladeshi innings there, especially in India, where the BCCI, with its infinite arrogance, has never invited Bangladesh.

4. The wide variation in scoring patterns between grounds in the same country is amazing. Look at the figures for the two Pakistani grounds and the two Indian grounds.

5. Batting has recently dominated in England, while scores have dropped at Kandy and, to a lesser extent, at the Wanderers and Basin Reserve.
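
The Kingston projection in conclusion 1 can be sketched as a small function. The width of 0.5 SD is the author's own empirical choice, not a textbook formula. The order used here (adjust the Mean by the trend factor first, then take half an SD either side) follows the author's later correction in his reply to a reader. It also assumes the Kingston SD of 85.9 from the SD/CoV table rather than the 83 quoted in the conclusion.

```python
def projected_range(mean, sd, trend=1.0, width=0.5):
    """Empirical band: adjust the Mean by a recent-trend factor, then go
    `width` SDs either side. width=0.5 is the author's own choice."""
    centre = mean * trend
    return centre - width * sd, centre + width * sd


# Kingston: Mean 291.5, SD 85.9 (divisor n - 1), last-5 ratio 1.10.
low, high = projected_range(291.5, 85.9, trend=1.10)
print(round(low), round(high))  # 278 364
```

This is roughly 320 plus or minus 43 runs. England's 318 falls near the centre of that band.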

**Graphs**: I have drawn graphs for only two grounds, Lord's and Headingley, showing the scores in chronological order to bring out their yo-yo nature. A BoxPlot is an excellent means of depicting the quartile variations pictorially, but one would be needed for each ground.

Please click here for a chronological list of Tests, for selected grounds.

Please click here for a list of Tests sequenced by runs scored, for selected grounds.

I have given explanations to the best of my knowledge. However since my knowledge of statistics is of an acquired nature, there might be errors and/or alternate explanations. I call upon my fellow columnists and readers to come in with their own suggestions and comments.

Anantha Narayanan has written for ESPNcricinfo and CastrolCricket and worked with a number of companies on their cricket performance ratings-related systems



Ananth,

Nice article, as you normally do. One question though - why solely look at runs made in the first innings, irrespective of the time taken?

A pitch that is 'a belter' is generally one in which the batting can dominate the bowling, as opposed to one in which batsmen are made to work hard. These pitches are characterised by a combination of high numbers of runs, low numbers of wickets, and a high runs-per-over rate.

My personal opinion here, but if a team is all out for 400 in the first session of the second day, that would be a belter (~4.0rpo). If a team is all out for 400 near the end of the last session of the second day, then the batsmen have been made to work hard for their runs (~2.5rpo).

Perhaps run accumulation speed should be a factor in this analysis? This should actually benefit grounds that Bangladesh play on, since they generally accumulate their runs at a fast rate, and correctly penalise grounds where teams graft an innings.

[[ Luckily, scoring rates for teams are not a problem, as compared to individual batsman scoring rates. Will look at this. The term "belter" is a Ravi Shastri pronouncement, normally only in ODI matches. While on the subject, Botham's pitch report before the ill-fated North Sound test was outstanding. Ananth: ]]

Excellent article Ananth. However I thought I would let you know that India's 705-7 dec v Australia at Sydney was in 2004 and their 76 all out v South Africa was in 2008, and therefore not a few months apart as you have stated.

This refers to the mail received from Aziz on the method used to project the Kingston first innings. First let me say that this is my own derivation and not picked up from any textbook. I have worked out a projection range which is 0.5 of Standard Deviation on either side of a value which could be the Mean or an Adjusted Mean. The adjustment is one's own; in this case, the increase in average for the last 5 Tests. When I look through the working, I probably reversed the steps. The correct method would have been to adjust the Mean by increasing it by 10% AND THEN work out a range which is half SD on either side. The final result would have been 320 +/- 43, which is almost the same as indicated in the article.

Overall a good article. I was quite intrigued by your Kingston projection methodology. What was the basis of 0.5 of Std Dev on either side? Is there a formula for that? Your final projection was quite good, notwithstanding the 51 in the second innings. But this method of projection I have not seen anywhere.

[[ Aziz, Because of the length of the article, I missed explaining that. I will send a separate comment on this point since there is a bit of explaining to do. Ananth: ]]

Now for the negative one. Why could the study not have been done for all grounds, with, say, 5 Test matches as a minimum criterion? That would have enabled many of us to have a look at our favourite or home grounds, in my case a ground where not many Tests are played.

[[ I think that is a valid point and not necessarily a negative comment. My only problem was that the number of grounds would have been too many and the tables would have become long. In fact I started with 10 grounds and then expanded to 19. Probably I would do a one-off exercise and post the same in a convenient place. Ananth: ]]

Two comments, one slightly negative and the other one positive. This is a very good article bringing out the pitch/ground variations. Some of our long-felt myths have been shattered. I always thought that Leeds was the most difficult ground in the world to bat on. That is not true as also some of the English grounds. Also look at the variations amongst Sri Lankan grounds. All in all a good study.

Received directly from Saurabh Aggarwal: A good post despite its academic overtones. You seem to have explained the statistical terms quite well for laymen. Is it possible to do a 19-year list of great Test innings with the bowling quality and the above-discussed ground characteristics as the key points? It would let us enjoy the modern classics from an analytical viewpoint.

[[ That seems to be a good idea. Let me look at it. It so happens that 1989/90 is also the beginning point of the careers of two of the greatest batsmen ever known, two batsmen whose average height is barely 5' 6" !!! Ananth: ]]

Yes, an estimate using n will be off a little bit when you have a small number of observations -- hence n-1 is traditionally used. However, when you have many observations, using n instead of n-1 actually has smaller error in approximating the true variance.

Anyway, like you mentioned, it doesn't affect the result of your calculation much :). Thanks for the great article!

[[ Aneesh You are correct. I also noted that the impact of this change was more pronounced for the two Indian grounds (only 9 matches each) as compared to the other grounds. Ananth: ]]

Hi Ananth, I noticed some commenters complaining that you should have used (n-1) in computing standard deviation.

In fact, both n and (n-1) are perfectly acceptable to use. They will give similar estimates of the standard deviation. Using n-1 will give an unbiased estimator of the variance (variance = std-dev^2 ) that has a larger error, whereas using n will give a slightly biased estimator of variance with a smaller error.

[[ Aneesh I got the same feeling when I re-read some of the web-articles on SD. However I changed from n to n-1 because of the following example. Say the values are 1, 2, 3. Using n-1 gives a SD of 1 which seems to be the correct SD. Using n gives a value of around .81 or so. Ananth: ]]

Technical, but a superb bit of work. Well done. The value of this analysis is not as a standalone. This could be developed further in two aspects. 1. A comparison of grounds will give you the expectation from it (further analysis can be done on the time of the season for the matches, which can have a dramatic effect on the expected result at some grounds). This can give you a far better expectation for first innings totals than just a plain average. 2. More importantly, and returning to the batting analysis, you can cross-reference the ground result with the batting results, thus making the value of the innings (a much debated topic in previous articles) more meaningful. A high individual score on a low-scoring ground with a low standard deviation should be considered a higher-value innings. Note that using only the last 19 years, when team cycles are around 8 years in length, means the values are tilted for (or against) certain teams.

[[ Don It is ironic that I am more confident of making an estimate of the expected score using this analysis than of evaluating an innings which has already been played. My estimate of a team cycle is probably just above 10 years, so this covers two cycles. I feel a factor based on up to 20 years can comfortably be used as a measure to analyze batting/bowling. Ananth: ]]

Thanks for the quick action, Ananth.

A minor quibble. Once you have updated the analysis, please also update the figures attached. Especially Figure 1.

[[ Ambuj Fig 1 is a jpg file extracted from Wikipedia. I have to extract an alternate formula and send to Cricinfo for correction. Will do that soon. Hence I have left it as it is and corrected the text, you will note. Ananth: ]]

As suggested by Muzher and Ambuj, I have changed the SD divisor from n to n-1 and updated the article. Fortunately the changes are minimal and do not change the observations. My thanks to both of them.

I agree with Muzher. You should have used (n-1) for Standard Deviation. The difference between using (n-1) and (n) is simple. If the mean is independently known, then (n) is used. If mean is calculated from the data set, then (n-1) is used. In this case, you have calculated mean from the data set itself, hence (n-1) should have been used. Please confirm from your mathematician friends if you still have doubt.

[[ Ambuj/Muzher I get your points. Unfortunately the technical terms are confusing even for a mathematics (but not statistics) student like me. Both Wolfram and Wikipedia give both alternatives without being specific. Over the next few hours I will correct the base to n-1 and update the article. Luckily the changes are likely to be minimal. Thanks for the nice way in which both of you have pointed this out. Ananth: ]]

Interesting analysis, which led me to go a bit further. It struck me that some of the difference in mean scores on a ground will be because of the difference in relative strengths of the most likely teams to play there. So the Oval, Lords, WACA at the top of the list might be there because better teams play more games there.

So, I carried out an analysis that takes into account which team is batting in the first innings, and then looked at the difference in mean scores for each ground. This makes quite a bit of difference. For instance, National Stadium Dhaka's adjusted mean is 330 compared with the unadjusted 223. This is because a weaker team bats there more often. The other thing I did was to look at the statistical significance of differences. There are hardly any, adjusting for the batting team; the Oval is a bit higher scoring than the rest. So most of the difference in means between grounds is due either to chance or to the teams that play at each most regularly.

[[ You are correct. There is almost no chance of Oval hosting a non-England game although we may very well see a Pakistan-Australia game there. So the single team characteristics are inherently embedded in the numbers. But my feeling is that, barring Bangladesh and to a lesser extent West Indies (except over the past 3 days), the other teams, over the years are comparable. Ananth: ]]

Umm. WACA ahead of the Adelaide Oval as a premier Test Ground? The WACA has only hosted 36 Tests since its first in 1970, while the Adelaide Oval has had 67 matches since its first in 1884. Even the Gabba has hosted more cricket than the WACA, with 51 matches since 1931.

[[ Stu, I have not gone just on the number of tests played. Variety was the criterion. I can always do a second analysis of the missing grounds. Ananth: ]]

Standard deviation should use divisor (n - 1) not n. Otherwise this is an interesting analysis.

[[ Muzher I saw n-1 only in very few resource areas. Wikipedia, Wolfram and my own books all suggested n. Ananth: ]]

Thanks for looking into the characteristics of different cricket grounds around the world, it is indeed interesting. However, I feel for the analysis of a given venue to be meaningful the quality of cricket played by the home side that always plays there has to be accounted for. For example, when I see that the 7 grounds with highest mean scores are in England, Australia, and one in India, I am unsure to what extent the high scoring is due to the grounds themselves and to what extent it is due to the teams that play there regularly. Just how you would achieve a better analysis with so few tests played at each ground I don't know. But what you've done is a good starting point, thanks.

I would have liked Mohali or Bangalore included in the analysis, since Calcutta has not hosted many Tests in recent times. But the results are acceptable. The analysis deserves to be lauded, and the pitch analysis can be done with this data.

[[ It so happens that Mohali and Bangalore have also hosted 9 Tests each during this period, so these two could just as well have been considered. Ananth: ]]

Interesting analysis, which led me to go a bit further. It struck me that some of the difference in mean scores on a ground will be due to the difference in relative strengths of the teams most likely to play there. So the Oval, Lord's and the WACA might be at the top of the list because better teams play more games there.

So, I carried out an analysis that takes into account which team is batting in the first innings, and then looked at the difference in mean scores for each ground. This makes quite a bit of difference. For instance, National Stadium Dhaka's adjusted mean is 330, compared with the unadjusted 223. This is because a weaker team bats there more often. The other thing I did was to look at the statistical significance of the differences. After adjusting for the batting team, there are hardly any significant differences; the Oval is a bit higher scoring than the rest. So most of the difference in means between grounds is due either to chance or to the teams that play at each ground most regularly.

[[ You are correct. There is almost no chance of the Oval hosting a non-England game, although we may very well see a Pakistan-Australia game there. So the single-team characteristics are inherently embedded in the numbers. But my feeling is that, barring Bangladesh and to a lesser extent West Indies (except over the past 3 days), the other teams have been comparable over the years. Ananth: ]]

I agree with Muzher. You should have used (n-1) for the standard deviation. The difference between using (n-1) and n is simple. If the mean is independently known, then n is used. If the mean is calculated from the data set, then (n-1) is used. In this case, you have calculated the mean from the data set itself, hence (n-1) should have been used. Please confirm with your mathematician friends if you still have doubts.
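
The rule described above can be written as a small function. This is a hypothetical helper: `standard_deviation` and `known_mean` are illustrative names, not taken from the article. Supply `known_mean` only when the mean comes from outside the data. Otherwise the mean is estimated from the values themselves and the divisor is n - 1, which is what Python's `statistics.stdev` does.

```python
import math
import statistics

def standard_deviation(values, known_mean=None):
    """Hypothetical helper showing the divisor rule from the comment above.

    If known_mean is supplied (the mean is known independently of the data),
    squared deviations are taken from it and divided by n. Otherwise the mean
    is estimated from the same values and the divisor is n - 1.
    """
    n = len(values)
    if known_mean is not None:
        if n == 0:
            raise ValueError("need at least one value")
        return math.sqrt(sum((x - known_mean) ** 2 for x in values) / n)
    if n < 2:
        raise ValueError("need at least two values when the mean is estimated")
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))

totals = [250, 300, 350, 400]  # hypothetical first-innings totals
print(standard_deviation(totals))                  # mean estimated from the data: divisor n - 1
print(standard_deviation(totals, known_mean=325))  # mean assumed known in advance: divisor n
print(statistics.stdev(totals))                    # library n - 1 version; matches the first line up to rounding
```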

[[ Ambuj/Muzher, I get your points. Unfortunately the technical terms are confusing even for a mathematics (but not statistics) student like me. Wolfram and Wikipedia both give the two alternatives without being specific. Over the next few hours I will change the divisor to n-1 and update the article. Luckily the changes are likely to be minimal. Thanks for the nice way in which both of you have pointed this out. Ananth: ]]

As suggested by Muzher and Ambuj, I have changed the SD divisor from n to n-1 and updated the article. Fortunately the changes are minimal and do not alter the observations. My thanks to both of them.

Thanks for the quick action, Ananth.

A minor quibble: once you have updated the analysis, please also update the attached figures, especially Figure 1.

[[ Ambuj, Fig 1 is a JPG file extracted from Wikipedia. I have to extract an alternate formula and send it to Cricinfo for correction, and will do that soon. Hence I have left the figure as it is and corrected the text, as you will note. Ananth: ]]

Technical, but a superb bit of work. Well done. The value of this analysis is not as a standalone piece. It could be developed further in two respects:

1. A comparison of grounds will give you the expectation from each ground (further analysis can be done on the time of the season in which matches are played, which can have a dramatic effect on the expected result at some grounds). This can give you a far better expectation for first-innings totals than just a plain average.
2. More importantly, and returning to the batting analysis, you can cross-reference the ground results with the batting results, thus making the value of an innings (a much debated topic in previous articles) more meaningful. A high individual score on a low-scoring ground with a low standard deviation should be considered a higher-value innings.

Note that using only the last 19 years, when team cycles are around 8 years long, means the values are tilted for (or against) certain teams.
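
Don does not give a formula. One simple, hypothetical way to put the ground-expectation idea into numbers is a standard score (z-score). It measures how many standard deviations a first-innings total sits above that ground's own mean. The function name and all totals below are made up for illustration. The code uses the sample SD (divisor n - 1) from Python's `statistics` module, and it rates team totals, not individual innings.

```python
from statistics import mean, stdev

def ground_z_score(total, ground_totals):
    """How many sample standard deviations `total` lies above the ground mean."""
    return (total - mean(ground_totals)) / stdev(ground_totals)

# Hypothetical histories of first-innings totals at two grounds.
low_scoring_tight = [250, 270, 260, 280, 240]   # mean 260, SD about 15.8
high_scoring_loose = [350, 450, 400, 300, 500]  # mean 400, SD about 79.1

print(round(ground_z_score(300, low_scoring_tight), 2))   # 40 runs above the mean → 2.53
print(round(ground_z_score(440, high_scoring_loose), 2))  # also 40 runs above the mean → 0.51
```

The same 40-run surplus is worth about five times as much on the low-scoring, low-spread ground. That matches the intuition in Don's second point.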

[[ Don, it is ironic that I am more confident of estimating the expected score using this analysis than of evaluating an innings that has already been played. My estimate of a team cycle is probably just above 10 years, so this period covers roughly two cycles. I feel a factor based on up to 20 years can comfortably be used as a measure to analyze batting/bowling. Ananth: ]]

Hi Ananth, I noticed some commenters complaining that you should have used (n-1) in computing the standard deviation.

In fact, both n and (n-1) are perfectly acceptable to use. They will give similar estimates of the standard deviation. Using n-1 will give an unbiased estimator of the variance (variance = std-dev^2 ) that has a larger error, whereas using n will give a slightly biased estimator of variance with a smaller error.
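
This point can be checked with a quick Monte Carlo sketch. The sketch assumes normally distributed data and reads "error" as mean squared error (MSE). The n - 1 estimator is unbiased for any distribution with finite variance, but which divisor gives the smaller MSE can depend on the distribution. The sample size of 5 and true variance of 4 are hypothetical settings. For these settings, theory gives an average of 4 and an MSE of 8 for the n - 1 divisor. The n divisor gives an average of 3.2 and an MSE of 5.76.

```python
import random

def compare_variance_estimators(n=5, sigma=2.0, trials=50_000, seed=1):
    """Monte Carlo check of the n - 1 and n divisors for the sample variance.

    Draws `trials` samples of size n from a normal distribution with
    standard deviation sigma (an assumption) and returns, for each divisor,
    (average estimate, mean squared error against the true variance).
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    true_var = sigma ** 2
    totals = {"n-1": [0.0, 0.0], "n": [0.0, 0.0]}  # [sum of estimates, sum of squared errors]
    for _ in range(trials):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        ss = sum((x - xbar) ** 2 for x in xs)  # sum of squared deviations from the sample mean
        for name, divisor in (("n-1", n - 1), ("n", n)):
            estimate = ss / divisor
            totals[name][0] += estimate
            totals[name][1] += (estimate - true_var) ** 2
    return {name: (s / trials, e / trials) for name, (s, e) in totals.items()}

for name, (avg, mse) in compare_variance_estimators().items():
    print(f"divisor {name}: average estimate {avg:.2f}, MSE {mse:.2f}")
```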

[[ Aneesh, I got the same feeling when I re-read some of the web articles on SD. However, I changed from n to n-1 because of the following example. Say the values are 1, 2 and 3. Using n-1 gives an SD of 1, which seems to be the correct SD. Using n gives a value of around 0.816. Ananth: ]]
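
The three-value example works out as follows under the two divisors:

```latex
\begin{aligned}
\bar{x} &= \tfrac{1+2+3}{3} = 2 \\
\textstyle\sum_{i=1}^{3} (x_i - \bar{x})^2 &= (-1)^2 + 0^2 + 1^2 = 2 \\
s_{n-1} &= \sqrt{2/(3-1)} = 1 \\
s_n &= \sqrt{2/3} \approx 0.816
\end{aligned}
```

As Aneesh notes, both are legitimate. The n - 1 value is the usual choice when the three numbers are treated as a sample from a wider population.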