May 13, 2010

The anomalous contraction of the Duckworth-Lewis method

Rajeeva Karandikar and Srinivas Bhogle
The D/L method as applied to Twenty20 is a sawed-off version of the system governing ODIs. There's a need to adapt the method for the short form based on actual Twenty20 match data.

On May 3, 2010, there were two curious applications of the Duckworth-Lewis method in the ICC World Twenty20. West Indies defeated England even though they only scored 60 for 2 in six overs in reply to England's 191 for 5, and Sri Lanka came perilously close to being eliminated because Zimbabwe were only required to score 44 in five overs to defeat Sri Lanka's 173 for 7.

There was outrage at the D/L targets and also surprise. The D/L method is now proven in 50-over matches, so why was it giving wonky targets in Twenty20 matches? Clearly it was because the ICC was trying to fit a model designed for 50-over matches to 20-over games. The fit wasn't working. The trousers were too big.

What can a young lad do if he's forced to wear his dad's trousers? Essentially one of two things: cut off the long legs and walk around pretending the trousers now fit, or put the trousers in a washing machine and hope they shrink sufficiently.

The ICC has so far chosen the first method. It's surprising we didn't have situations like the ones on May 3 earlier. Our little calculation, which we will explain as we go along, suggests that the shrunken trousers might have done a better job that day.

The Duckworth-Lewis rationale
Till D/L came along, ODI targets for the team chasing were only based on overs remaining: if the team batting first scored 250 in 50 overs and then rain washed out 25 overs, the team chasing only had to score 125+1 = 126 to win, with all 10 wickets in hand, which was obviously unfair.
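The overs-only rule described above is simple enough to write down. The following is a minimal sketch; the function name and signature are ours, chosen for illustration only.

```python
import math

def run_rate_target(first_innings_score, first_innings_overs, overs_available):
    """Target to win under a pure run-rate rule that ignores wickets in hand.

    The chasing side must beat the first-innings score scaled down
    in proportion to the overs it is allowed to face.
    """
    par = first_innings_score * overs_available / first_innings_overs
    return math.floor(par) + 1  # one more than the scaled-down score

# The example from the text: 250 in 50 overs, chase cut to 25 overs.
print(run_rate_target(250, 50, 25))  # 126
```

Nothing in the calculation knows how many wickets the chasing side has in hand, which is exactly the unfairness the text points to.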

New variants were therefore tried, including the much-maligned "most-productive overs" (MPO) idea. MPO isn't such a bad idea if the interruption occurs between the two innings, but it wasn't capable of handling one or more during-the-innings interruptions. The horror of a South Africa target of 22 runs in 13 balls turning into the impossible 21 runs off one ball was an extreme manifestation of this inability.

The D/L method cleverly combined overs remaining and wickets in hand into a single combined resource. It wasn't easy to model this complex interplay between the two resources, but Duckworth and Lewis did it quite brilliantly. Better still, their method solved the vexing problem of during-the-innings interruptions.

Over the years the D/L method has become even better, especially after the Professional Edition, which required the use of a computer, was introduced. It would be fair to say that the method now resolves interruptions in 50-over matches almost perfectly.

The D/L method is well explained in the first graph to the right. When the team chasing begins its innings, it is like an ant sitting at the top left corner of the graph. There are 50 overs remaining and the "combined resource" available is the full 100%.

There are 10 curves in the graph, corresponding to the number of wickets remaining. At the start of the innings our ant is at the topmost point of the top curve, corresponding to "10 wickets remaining". After every ball is bowled, the ant moves a step to the right along the curve. If a wicket falls, the ant drops vertically to the curve below.

Now, suppose 20 overs have been bowled and two wickets have been lost. The ant will be in line with "30 overs remaining", and on the green curve, corresponding to "8 wickets remaining". A visual guess suggests that about 65% of the combined resource is still available. If, instead, five wickets are down at this stage, the ant would be on the orange curve and the combined resource would only be 45%.

Finally, imagine that 20 overs are lost at this stage due to rain. What would our ant do? Well, it would simply trot down the green (or orange) curve and stop at the point corresponding to 10 overs remaining. The combined resource now available would be about 30% for the green line and about 25% for the orange line.

So at every stage of the match we know exactly how much of the combined resource percentage is still available. Let's call this R2. And let R1 be the combined resource that was available to the team batting first (would be 100% if the 50 overs are completed or all 10 wickets are lost). Let S be the score of the team batting first. The D/L target and the par score are then calculated by playing around with R1, R2 and S.
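The "playing around" can be sketched in a few lines. This follows the published Standard Edition rule as we understand it, not the computer-based Professional Edition, and it takes R2 to be the total resource the chasing side ends up having over its whole innings, which is one reading of the description above. The function names are ours, the G50 value of 235 is an assumption for illustration, and the 65% and 30% figures are the rough visual read-offs quoted earlier for the "8 wickets remaining" curve, not table values.

```python
import math

def resource_after_interruption(r_start, r_at_stop, r_at_restart):
    """Total resource available to the chasing side once overs are lost.

    r_start      : resource at the start of the innings (100 for a full ODI)
    r_at_stop    : resource still in hand when play stopped
    r_at_restart : resource in hand when play resumed with fewer overs left
    """
    return r_start - (r_at_stop - r_at_restart)

def dl_target(s, r1, r2, g50=235.0):
    """Standard Edition style target to win.

    s   : score of the team batting first
    r1  : resource (%) available to the team batting first
    r2  : resource (%) available to the team batting second
    g50 : assumed average 50-over total, used only when r2 > r1
    """
    if r2 < r1:
        par = s * r2 / r1                     # scale the score down
    else:
        par = s + g50 * (r2 - r1) / 100.0     # add runs for the extra resource
    return math.floor(par) + 1

# Hypothetical chase of 250: rain at 30 overs left (about 65% in hand),
# play resumes with 10 overs left (about 30% in hand).
r2 = resource_after_interruption(100.0, 65.0, 30.0)
print(r2, dl_target(250, 100.0, r2))  # 65.0 163
```

The par score at any moment is the same scaling applied to the resource used up so far, which is why the method can cope with a stoppage in the middle of an innings.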

If we stare a little more at these curves, we'll find that D/L has been rather clever. Look at the higher curves (corresponding to 10, 9 or 8 wickets remaining). They aren't coming down too quickly to start with, because wickets in hand tend to be more valuable in the early part of the chase, but towards the end all the curves seem to fuse into one single, fat curve, because as the match nears its end, overs and balls remaining are far more valuable than wickets in hand.

The complete evolution of a limited-overs match can therefore be gleaned by looking at the entire span of the curves. That's how dad's trousers look.

D/L for Twenty20 by cutting the trousers
Let's now see what we are doing when we try to employ D/L for Twenty20. We're cutting off our full curves and only retaining the part to the right of "20 overs remaining", i.e. pretending that a Twenty20 is simply an ODI reduced to 20 overs a side. This is a convenient assumption, but it may not be entirely valid. How many ODIs can we recall in which not a single wicket has fallen after 30 overs? And what do we do about the field restrictions in the first six overs of a Twenty20?

To get an idea of how things change when we look at D/L curves only to the right of "20 overs remaining", compare the second graph with the first.

These curves are a lot flatter, and when only five or six overs remain, the top six or seven curves collapse to become almost a diagonal. It is as if D/L is saying, "I don't care about wickets in hand at this stage, my targets only depend on overs remaining." In other words, we're almost back to the pre-D/L days of run rate-based targets, even if we go through the motions of using D/L.

In the England-West Indies match, England scored at 9.55 runs per over. Using a simple run rate-based target, West Indies only needed to score 58 (9.55*6 = 57.3, rounded up). The D/L target, if the match had been reduced to six overs at the start of the innings, would have been 66, i.e. a run rate of 11 per over. An increase of 66 - 58 = 8 runs just can't offset the very considerable advantage that West Indies would have enjoyed: of having all 10 batsmen, favourable field restrictions for two of the six overs, and a hard ball.

However, the rain interruption didn't occur at the innings break. West Indies came out to chase believing they would play their full 20 overs. After a 14-ball blitzkrieg, in which they scored 30 runs without losing a single wicket, there was a spell of rain and the match was curtailed to six overs. This very severe "during-the-innings" interruption further hurt D/L and gave West Indies a massive advantage: they now had to get six runs fewer, i.e. 60 in six overs!

West Indies further helped their cause by not losing a single wicket before the interruption. If they had been, say, 30 for 2 instead, the D/L target would have been 71. All the good fairies had apparently come together to bless West Indies and damn D/L.

If the Sri Lanka-Zimbabwe match had been reduced to five overs at the start of the innings, the D/L target for Zimbabwe would have been 52, i.e. a run rate of 10.4 to counter Sri Lanka's run rate of 8.65. This again seems excessively generous, especially with two overs of field restrictions.

That match, too, went awry. It rained after Zimbabwe had scored 4 without loss after the first over. They returned to bat imagining they had about 100 more to get in 10 more overs. Sadly, they failed to stay ahead of the par score; if Zimbabwe had scored 40 in the next four overs with the loss of just one wicket, they would have won a thoroughly undeserved victory.

These examples provide further proof that D/L is in deep trouble if the overs come down to just five or six in a Twenty20, though it is noteworthy that even with a minimum of 10 overs, the D/L model regains reasonable control.
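The innings-break figures quoted in this section can be set side by side with a short script. It is only a sketch: the D/L targets of 66 and 52 are taken from the text rather than computed, and the function and variable names are ours.

```python
import math

def run_rate_target(score, overs_faced, overs_available):
    """Pre-D/L style target: scale the score by overs alone, then add one."""
    return math.floor(score * overs_available / overs_faced) + 1

# (match, first-innings score, overs left to the chasing side, D/L target from the text)
matches = [
    ("England v West Indies", 191, 6, 66),
    ("Sri Lanka v Zimbabwe", 173, 5, 52),
]

for name, score, overs, dl in matches:
    rr = run_rate_target(score, 20, overs)
    print(f"{name}: run-rate target {rr}, D/L target {dl}, "
          f"premium {dl - rr} runs, D/L asking rate {dl / overs:.1f}")
```

In both matches the D/L target for a chase known in advance to be five or six overs long sits only a handful of runs above the plain run-rate figure.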

D/L for Twenty20 by shrinking the trousers
Imagine for a moment that we had to describe Twenty20 cricket to someone who woke up after a ten-year slumber. Would we say that Twenty20 is "like the last 20 overs of an ODI", or would we say that it is "like a complete ODI in which everything happens much, much faster"?

We rather fancy the latter. Why not, then, assume that the complete D/L curves, designed for 50-over matches, also adequately depict the evolution of a Twenty20 match?

We fiddled around with the over-by-over D/L standard tables (available in the public domain) to do exactly that. Here's (third graph) how the curves appear now, with the recalibrated combined resources.
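The difference between the two tailoring jobs can be illustrated with a toy curve. Everything here is an assumption made for illustration: the exponential shape and the constant b are not the real D/L table, only the "10 wickets remaining" curve is modelled, and mapping Twenty20 overs onto the 50-over curve by a factor of 50/20 is our reading of the recalibration, which the text does not spell out.

```python
import math

def toy_resource_50(overs_left, b=0.04):
    """Toy 50-over resource curve (%) with 10 wickets in hand. NOT real D/L data."""
    return 100.0 * (1 - math.exp(-b * overs_left)) / (1 - math.exp(-b * 50))

def cut_resource(overs_left, match_overs=20):
    """Cut trousers: keep only the last `match_overs` of the 50-over curve,
    rescaled so that a full Twenty20 innings counts as 100%."""
    return 100.0 * toy_resource_50(overs_left) / toy_resource_50(match_overs)

def shrunk_resource(overs_left, match_overs=20):
    """Shrunken trousers: squeeze the whole 50-over curve into `match_overs`."""
    return toy_resource_50(overs_left * 50.0 / match_overs)

for overs in (20, 10, 6, 5):
    print(overs, round(cut_resource(overs), 1), round(shrunk_resource(overs), 1))
```

On this toy curve the shrunken version credits a side facing five or six overs with a larger share of the full resource than the cut version does, so it sets a higher target. That is the direction of the differences reported for the two matches, though the actual numbers depend on the real tables.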

How would the England-West Indies match have panned out using this "shrunken trousers" model for D/L? Our calculations indicate that West Indies would have needed to score 69 in 6 overs if the match had evolved in exactly the same fashion with that interruption at 30 for no loss after 14 balls.

A target of 69 certainly appears more reasonable, but what if the interruption had occurred between the innings and West Indies knew from the start that they only had six overs to bat? The target then would have been 87, or 14.4 runs per over, with only two overs of field restrictions. This appears steep, but we mustn't forget that 191 too is a lot of runs.

If Zimbabwe knew from the start that they could only bat for five overs in reply to Sri Lanka's 173, their target to win would have been 68, at 13.5 runs per over, with two overs of field restrictions. But given the way the match actually panned out, with an interruption at 4 without loss after the first over, Zimbabwe's target using this shrinking model would have been 60 in five. This must appear much more reassuring than a mere 44.

Send for the tailor
The shrunken-trousers model certainly appears to give more satisfactory results than the cut-trousers one for the two matches on May 3, but we would need to look at many more examples before we can recommend the former with any degree of authority; there is a lurking fear that it may set very high targets, especially if the interruption occurs between innings.

There is also the problem of field restrictions in the first six overs. D/L has never built this batting phase into its mix. The D/L explanation is that the more adventurous batting during the field-restriction phase is compensated by the greater propensity to lose wickets. This explanation seems valid in 50-over matches, where the loss of a wicket significantly reduces the combined resource percentage. But in Twenty20, especially with the cut-trousers model practiced currently, losing a wicket brings down the combined resource percentage by much less, if at all, and it is much less likely to inhibit adventurous strokeplay. Given the nature of Twenty20, and the sort of audience it attracts, it may be worthwhile to retain field restrictions during the first six overs at all times, whatever the state of the match!

All these fixes are only for the short term. There is a compelling and urgent need to redraw D/L-like curves for Twenty20 based on actual Twenty20 match data. The IPL repository itself contains about 175 matches, and there must be at least 100 match-data sets from international games. We feel certain that D/L can come up with these new curves, and if they don't feel so inclined, we would be happy to participate in a parallel initiative.

Rajeeva Karandikar is a professor at Chennai Mathematical Institute. Srinivas Bhogle heads the Bangalore centre of TEOCO Software Pvt Ltd. The authors would like to thank K Vijay for his suggestions via Twitter


  • Tim on May 16, 2010, 19:31 GMT

    It does seem unfair. I would also say that for an ODI match to count there should be at least 8 overs bowled, as a 5 over match is a bit unfair. In 50 over cricket there has to be 20 overs a side to count for a match, which is 40% of the total number of overs.

  • Pelham on May 16, 2010, 15:29 GMT

    Going back to the original point of this article, there seems to be a misapprehension that the existing tables were designed for 50 over cricket only. The original 1997 rules, which I still have, gave separate tables for different lengths of innings, but these were based on scaled versions of the same curves, just changing the definition of 100% resource. On the first revision, these were combined into a single table of 50 overs (the maximum now needed), but this was always intended for any match length up to 50 overs. This is why the values are defined in terms of "overs left". D/L have repeatedly checked the effect of powerplays and found no need to alter the tables for them. Therefore, there is no need to "pretend" that a 20 over innings is like the last part of a 50 over innings: simple logic will tell you that the scoring possibilities are the same. I simply do not understand the logic behind the "shrunken trousers" model, and in any case the data do not fit that model.

  • David on May 16, 2010, 11:45 GMT

    @Pelham_Barton: I like the uniform idea as an analogy. I still disagree that Rule 2 is unfair: I think that in your example Rule 1 is unfair on the fielding side. I think that the target increasing with runs scored sounds counter-intuitive, but that it's the right way to go. Look at it this way: what's actually the only relevant statistic in a run chase? It's the runs required, not the runs scored. You can be 200-2 after 30 overs but if you're chasing 600 you're still not doing very well. So any argument based on the runs scored is going to lead to some unhappy conclusions. In your example, the game has been reduced to 80 to win off 20 overs. Any other numbers are irrelevant: 80 off 20 is now "the game" in its entirety. So this should be reduced after the rain to 40 to win off 10 overs. (Agreed, a team on 320 chasing 400 is more likely to win than a team on 120 chasing 200, but to exactly the same extent that they're more likely to get to 360 than the team on 120 is to get to 160).

  • David on May 16, 2010, 10:34 GMT

    @hattima: Yep, that's the right formula - it's the same as the one in my 20:36 post, because "Required runs after resumption" = New Target - Runs Scored. Under the current D/L method WI at 2-0 go from needing 190 off 106 to 59 off 22; at 60-0 they go from 132 off 106 to 1 off 22. With mine, 190 off 106 becomes 48 off 22 and 132 off 106 becomes 33 off 22. It's worth pointing out that this is what the resources formula says is the "equivalent" run chase, and there's plenty of statistical analysis to back it up. The issue of *exploitability* that you raise is a separate, and valid one. You certainly can't build "whether the batting captain thinks it's going to rain" into the model, just as you can't factor in team strength, pitch conditions etc. It's true that D/L, in assuming teams don't know it's going to rain, becomes more exploitable in twenty20, but this is mostly due to the problems we've already outlined. Preserve probabilities, and it becomes much less of an issue.

  • Pelham on May 16, 2010, 10:05 GMT

    For anyone who still has an open mind, the following may help. Suppose (never mind how) we could redesign cricket so that teams could reasonably be expected to score at a uniform rate through an innings and that Team1 has scored 250 off 50 overs. If Team2's innings is reduced to 40 overs before the start, we would all agree they should be set 200 to tie, 201 to win. If the match was ended by its first interruption after 40 overs of Team2's innings, we would also agree to apply the par score of 200 to tie. If Team2 had reached 170 off 30 overs and then 10 overs are lost, we could have (Rule1) maintain the expected scoring rate across the whole of Team2's innings: still 200 to tie, or (Rule2) maintain the rate now required by Team2 at 4rpo: 210 to tie. Rule1 is unfair because Team2's asking rate has come down from 4rpo to 3rpo: Rule2 is unfair because Team2's target has gone up by 10 runs as a result of their own success. You have to choose between them. D/L is analogous to Rule1.

  • Apratim on May 16, 2010, 8:04 GMT

    @David, I see your point regarding the formula, what I had in mind was to decrease the required runs according to the resources, so it should have been: Required runs after resumption = (Required runs before resumption)*(resources in hand after resumption)/ (resources in hand before resumption). It was a mistake not to divide by the resources as it would not have stayed 100%. But, I still think preserving the "probability" is not feasible: the formula you gave would help teams like ZIM in the SL match, or even worse, teams like Afghanistan in the AUS match. Teams always know the weather forecasts, and play accordingly. With D/L there is still the need to score at a good rate to achieve what WI did, and that involves risk. But with your formula, they'd preserve wickets, score, say, 5-6 runs in 4-5 overs; and would still have a chance to go for broke for the 17-18 runs in 1 over after the break. Work out the target score of WI in 6 assuming they were 2-0 in 2.2 and you'll see my point.

  • David on May 15, 2010, 19:50 GMT

    @hattima: That formula can't work, because the target gets bigger as the break progresses (as "unused resources after break" gets smaller). Also you need it to be true that for a degenerate 0-over break, you get New Target = Old Target. So you could have New Target = Runs Scored + (1-X+Y) * Runs Required, where X is unused resources at break, Y is unused resources after break, but this is equivalent to my formula (I divided by total resources because by convention, total resources = 100% only for a 50-over match), which doesn't work because as Y tends to 0, the runs required tends to (1-X)*(Runs required at break) and it needs to tend to 0. The system you propose is pretty much what we have at the moment, but I'm arguing it's flawed: how's it fair for a team with a 60% chance of winning at the break to have a 99% chance of winning after it? The fact that the match might have been terminated is irrelevant once it resumes. You want to give them "about as hard a chase as they had before".

  • Apratim on May 15, 2010, 13:58 GMT

    @David, what I had in my mind is: New Target=Runs Scored+(unused resources at break - unused resources after break)* (Run required at the time of interruption). The total resources should be 100% and hence no division needed in the formula. So it is closer to your original formula than what you have written above. However, I do not fully like the "prob preserving idea" (assuming it's achievable). Suppose a team would win at the break if no further play is possible. So their winning chance is 1. Now if just 1 ball can be bowled after the resumption, by D/L system in most cases they'd still win. But according to your system the winning prob will have to be reduced to, say, 0.6. So, D/L would reward heavily the team who anticipates the break and you'd penalize them. I'd want a more continuous system, which should change their prob of win to, say, 0.99, and gradually reduce it to 0.6 as more and more balls are left. It seems fairer to me to give some credit to a team playing the situation.

  • Pelham on May 15, 2010, 12:53 GMT

    @Anupam Mathur (and lucyferr): I agree with you that part of this article contains as clear an explanation of how D/L works as any that I have ever seen. However, as has been pointed out by several people (including me) in earlier comments, D/L have already done the analysis that the article's authors are asking for. Far from reaching weird conclusions, their analysis indicates that T20 does behave exactly like the last 20 overs of a 50 over match and so there is no need for a separate set of tables.

  • David on May 15, 2010, 11:34 GMT

    @hattima - As for using the difference instead of the ratio, the only formula that makes sense to do that would have to be New Target = Old Target - {[(unused resources before break - unused resources after break)/(total resources)] * Runs required}, since if you have a degenerate 0-over break then the target has to stay the same. I don't think this preserves probabilities, though: in the case where Zim are 5-1 after 5 overs, it sets a new target of 57. The problem is that if you have x% of your total resources unused before the break, and there's not long left after the break, then you basically just times the runs required by about x%: this suffers from a similar problem to the current method. A mathematician would say that as "unused resources" tends to 0, you want "runs required" to tend to 0, but it doesn't, it tends to "runs required before the break" * x%.
