May 13, 2010

The anomalous contraction of the Duckworth-Lewis method

Rajeeva Karandikar and Srinivas Bhogle
The D/L method as applied to Twenty20 is a sawed-off version of the system governing ODIs. There's a need to adapt the method for the short form based on actual Twenty20 match data.

On May 3, 2010, there were two curious applications of the Duckworth-Lewis method in the ICC World Twenty20. West Indies defeated England even though they only scored 60 for 2 in six overs in reply to England's 191 for 5, and Sri Lanka came perilously close to being eliminated because Zimbabwe were only required to score 44 in five overs to defeat Sri Lanka's 173 for 7.

There was outrage at the D/L targets and also surprise. The D/L method is now proven in 50-over matches, so why was it giving wonky targets in Twenty20 matches? Clearly it was because the ICC was trying to fit a model designed for 50-over matches to 20-over games. The fit wasn't working. The trousers were too big.

What can a young lad do if he's forced to wear his dad's trousers? Essentially one of two things: cut off the long legs and walk around pretending the trousers now fit, or put the trousers in a washing machine and hope they shrink sufficiently.

The ICC has so far chosen the first method. It's surprising we didn't have situations like the ones on May 3 earlier. Our little calculation, which we will explain as we go along, suggests that the shrunken trousers might have done a better job that day.

The Duckworth-Lewis rationale
Till D/L came along, ODI targets for the team chasing were only based on overs remaining: if the team batting first scored 250 in 50 overs and then rain washed out 25 overs, the team chasing only had to score 125+1 = 126 to win, with all 10 wickets in hand, which was obviously unfair.
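The overs-only rule is simple enough to sketch in a few lines of Python. The function name and the choice to count in balls rather than overs are ours, made purely for illustration; the rule itself is the proportional scaling described above.

```python
def run_rate_target(first_innings_score, full_balls, balls_available):
    """Pre-D/L target: scale the score by the share of balls left, then add one run to win."""
    par = first_innings_score * balls_available // full_balls  # integer floor
    return par + 1

# The 250-in-50-overs example, with the chase cut to 25 overs.
print(run_rate_target(250, 50 * 6, 25 * 6))  # 126
```

Note that wickets appear nowhere in the calculation, which is exactly the unfairness in question.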

New variants were therefore tried, including the much-maligned "most-productive overs" (MPO) idea. MPO isn't such a bad idea if the interruption occurs between the two innings, but it wasn't capable of handling one or more during-the-innings interruptions. The horror of a South Africa target of 22 runs in 13 balls turning into the impossible 21 runs off one ball was an extreme manifestation of this inability.

The D/L method cleverly combined overs remaining and wickets in hand into a single combined resource. It wasn't easy to model this complex interplay between the two resources, but Duckworth and Lewis did it quite brilliantly. Better still, their method solved the vexing problem of during-the-innings interruptions.

Over the years the D/L method has become even better, especially after the Professional Edition, which required the use of a computer, was introduced. It would be fair to say that the method now resolves interruptions in 50-over matches almost perfectly.

The D/L method is well explained in the first graph to the right. When the team chasing begins its innings, it is like an ant sitting at the top left corner of the graph. There are 50 overs remaining and the "combined resource" available is the full 100%.

There are 10 curves in the graph, corresponding to the number of wickets remaining. At the start of the innings our ant is at the topmost point of the top curve, corresponding to "10 wickets remaining". After every ball is bowled, the ant moves a step to the right along the curve. If a wicket falls, the ant drops vertically to the curve below.

Now, suppose 20 overs have been bowled and two wickets have been lost. The ant will be in line with "30 overs remaining", and on the green curve, corresponding to "8 wickets remaining". A visual guess suggests that about 65% of the combined resource is still available. If, instead, five wickets are down at this stage, the ant would be on the orange curve and the combined resource would only be 45%.

Finally, imagine that 20 overs are lost at this stage due to rain. What would our ant do? Well, it would simply trot down the green (or orange) curve and stop at the point corresponding to 10 overs remaining. The combined resource now available would be about 30% for the green line and about 25% for the orange line.
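The ant's slide down its curve can be sketched as a lookup. The figures below are only the approximate visual readings quoted in the text, not official D/L table values, and the dictionary and function names are hypothetical.

```python
# Approximate readings quoted in the text, keyed by (overs remaining, wickets remaining).
# These are visual guesses from the graph, not official D/L table values.
RESOURCE_PCT = {
    (30, 8): 65.0,
    (30, 5): 45.0,
    (10, 8): 30.0,
    (10, 5): 25.0,
}

def resource_lost_to_rain(overs_before, overs_after, wickets_remaining):
    """Resource that vanishes when the ant slides down its curve during a stoppage."""
    before = RESOURCE_PCT[(overs_before, wickets_remaining)]
    after = RESOURCE_PCT[(overs_after, wickets_remaining)]
    return before - after

for wickets in (8, 5):
    lost = resource_lost_to_rain(30, 10, wickets)
    print(f"{wickets} wickets remaining: {lost:.0f}% of the combined resource lost")
```

On these readings the same 20 overs cost the side with eight wickets in hand about 35 percentage points, but the side with five in hand only about 20. That asymmetry is what a pure overs-based rule cannot capture.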

So at every stage of the match we know exactly how much of the combined resource percentage is still available. Let's call this R2. And let R1 be the combined resource that was available to the team batting first (would be 100% if the 50 overs are completed or all 10 wickets are lost). Let S be the score of the team batting first. The D/L target and the par score are then calculated by playing around with R1, R2 and S.
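For readers who want the arithmetic, here is a minimal sketch. It assumes the published Standard Edition rule for the case where the chasing side has no more resource than the side batting first (R2 not greater than R1): the par score is S scaled by R2/R1, and the target is one run more than the par score rounded down. The Professional Edition used in internationals adjusts this scaling, and the case of R2 greater than R1 follows a different rule, so this is illustrative only. The function name is hypothetical; the 250, 65% and 30% figures reuse the examples above.

```python
import math

def dl_target_standard(first_innings_score, r1_pct, r2_pct):
    """Par score and target when r2_pct <= r1_pct (assumed Standard Edition scaling)."""
    if r2_pct > r1_pct:
        raise ValueError("this sketch only covers the case R2 <= R1")
    par = first_innings_score * r2_pct / r1_pct
    target = math.floor(par) + 1  # one more than the par score, rounded down
    return par, target

# The chasing side starts with 100% and loses the 65% - 30% = 35% removed by the rain.
r2 = 100.0 - (65.0 - 30.0)
par, target = dl_target_standard(250, 100.0, r2)
print(par, target)  # 162.5 163
```

Here R1 is 100% because the side batting first is taken to have used its full 50 overs.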

If we stare a little more at these curves, we'll find that D/L has been rather clever. Look at the higher curves (corresponding to 10, 9 or 8 wickets remaining). They aren't coming down too quickly to start with, because wickets in hand tend to be more valuable in the early part of the chase, but towards the end all the curves seem to fuse into one single, fat curve, because as the match nears its end, overs and balls remaining are far more valuable than wickets in hand.

The complete evolution of a limited-overs match can therefore be gleaned by looking at the entire span of the curves. That's how dad's trousers look.

D/L for Twenty20 by cutting the trousers
Let's now see what we are doing when we try to employ D/L for Twenty20. We're cutting off our full curves and only retaining the part to the right of "20 overs remaining", i.e. pretending that a Twenty20 is simply an ODI reduced to 20 overs a side. This is a convenient assumption, but it may not be entirely valid. How many ODIs can we recall in which not a single wicket has fallen after 30 overs? And what do we do about the field restrictions in the first six overs of a Twenty20?

To get an idea of how things change when we look at D/L curves only to the right of "20 overs remaining", compare the second graph with the first.

These curves are a lot flatter, and when only five or six overs remain, the top six or seven curves collapse to become almost a diagonal. It is as if D/L is saying, "I don't care about wickets in hand at this stage, my targets only depend on overs remaining." In other words, we're almost back to the pre-D/L days of run rate-based targets, even if we go through the motions of using D/L.
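The flattening can be reproduced with a toy model. The sketch below uses an exponential curve of the kind Duckworth and Lewis have described, but the decay constant and the wicket factors are invented for illustration, so only the qualitative behaviour matters, not the numbers. Rescaling the curves so that 20 overs with 10 wickets in hand equals 100 is equivalent to taking the ratio R2/R1 within the ODI table.

```python
import math

# Hypothetical constants, invented for illustration only.
B = 0.035                                                      # decay rate per over
F = [1.0, 0.9, 0.8, 0.7, 0.6, 0.48, 0.36, 0.24, 0.13, 0.05]   # factor by wickets lost

def odi_resource(overs_left, wickets_lost):
    """Toy 50-over resource percentage: 100 at 50 overs left with no wickets lost."""
    def z(u, w):
        return F[w] * (1.0 - math.exp(-B * u / F[w]))
    return 100.0 * z(overs_left, wickets_lost) / z(50, 0)

def cut_resource(overs_left, wickets_lost):
    """'Cut trousers': keep the ODI curve below 20 overs, rescaled so a full T20 innings is 100."""
    return 100.0 * odi_resource(overs_left, wickets_lost) / odi_resource(20, 0)

for overs_left in (20, 10, 5):
    gap = cut_resource(overs_left, 0) - cut_resource(overs_left, 5)
    print(f"{overs_left:2d} overs left: 10 vs 5 wickets in hand differ by {gap:.1f} points")
```

Even in this toy version, the gap between the 10-wicket and 5-wicket curves shrinks sharply as the overs run out, which is the collapse towards a diagonal described above.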

In the England-West Indies match. England scored at 9.55 runs per over. Using a simple run rate-based target, West Indies only needed to score 9.55*6 = 58. The D/L target, if the match had been reduced to six overs at the start of the innings, would have be 66, i.e. a run rate of 11 per over. An increase of 66 - 58 = 8 runs just can't offset the very considerable advantage that West Indies would have enjoyed: of having all 10 batsmen, favourable field restrictions for two of the six overs, and a hard ball.

However, the rain interruption didn't occur at the innings break. West Indies came out to chase believing they would play their full 20 overs. After a 14-ball blitzkrieg, in which they scored 30 runs without losing a single wicket, there was a spell of rain and the match was curtailed to six overs. This very severe "during-the-innings" interruption further hurt D/L and gave West Indies a massive advantage: they now had to get six runs fewer, i.e. 60 in six overs!

West Indies further helped their cause by not losing a single wicket before the interruption. If they had been, say, 30 for 2 instead, the D/L target would have been 71. All the good fairies had apparently come together to bless West Indies and damn D/L.

If the Sri Lanka-Zimbabwe match had been reduced to five overs at the start of the innings, the D/L target for Zimbabwe would have been 52, i.e. a run rate of 10.4 to counter Sri Lanka's run rate of 8.65. This again seems excessively generous, especially with two overs of field restrictions.

That match, too, went awry. It rained after Zimbabwe had scored 4 without loss after the first over. They returned to bat imagining they had about 100 more to get in 10 more overs. Sadly, they failed to stay ahead of the par score; if Zimbabwe had scored 40 in the next four overs with the loss of just one wicket, they would have won a thoroughly undeserved victory.

These examples provide further proof that D/L is in deep trouble if the overs come down to just five or six in a Twenty20, though it is noteworthy that even with a minimum of 10 overs, the D/L model regains reasonable control.

D/L for Twenty20 by shrinking the trousers
Imagine for a moment that we had to describe Twenty20 cricket to someone who woke up after a ten-year slumber. Would we say that Twenty20 is "like the last 20 overs of an ODI", or would we say that it is "like a complete ODI in which everything happens much, much faster"?

We rather fancy the latter. Why not, then, assume that the complete D/L curves, designed for 50-over matches, also adequately depict the evolution of a Twenty20 match?

We fiddled around with the over-by-over D/L standard tables (available in the public domain) to do exactly that. The third graph shows how the curves appear now, with the recalibrated combined resources.
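One plausible reading of this recalibration is to squeeze the 50-over axis onto 20 overs, so that each Twenty20 over stands for 2.5 ODI overs, interpolating between the whole-over rows of the table. The sketch below is a guess at the idea rather than the exact procedure used for the graph, and the table column in it is generated from an invented exponential curve, not taken from the real D/L tables.

```python
import math

B = 0.035  # hypothetical decay constant, invented for illustration

# Stand-in for one column (no wickets lost) of an over-by-over 50-over table:
# ODI_COLUMN[u] is the resource percentage with u overs remaining.
ODI_COLUMN = [100.0 * (1 - math.exp(-B * u)) / (1 - math.exp(-B * 50)) for u in range(51)]

def interpolate(column, overs_left):
    """Linear interpolation between whole-over table rows."""
    low = math.floor(overs_left)
    high = min(low + 1, len(column) - 1)
    frac = overs_left - low
    return column[low] + frac * (column[high] - column[low])

def cut_share(t20_overs_left):
    """Cut model: last 20 overs of the ODI curve, rescaled to 100 at 20 overs."""
    return 100.0 * interpolate(ODI_COLUMN, t20_overs_left) / ODI_COLUMN[20]

def shrunk_share(t20_overs_left):
    """Shrunk model: the whole 50-over curve squeezed onto 20 overs (1 T20 over = 2.5 ODI overs)."""
    return interpolate(ODI_COLUMN, t20_overs_left * 50 / 20)

for overs in (20, 10, 6, 5):
    print(f"{overs:2d} overs: cut {cut_share(overs):5.1f}%  shrunk {shrunk_share(overs):5.1f}%")
```

With a concave curve of this kind, the shrunk model gives a short innings a larger share of the full resource than the cut model does, and so it sets higher targets.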

How would the England-West Indies match have panned out using this "shrunken trousers" model for D/L? Our calculations indicate that West Indies would have needed to score 69 in 6 overs if the match had evolved in exactly the same fashion with that interruption at 30 for no loss after 14 balls.

A target of 69 certainly appears more reasonable, but what if the interruption had occurred between the innings and West Indies knew from the start that they only had six overs to bat? The target then would have been 87, or 14.5 runs per over, with only two overs of field restrictions. This appears steep, but we mustn't forget that 191 too is a lot of runs.

If Zimbabwe knew from the start that they could only bat for five overs in reply to Sri Lanka's 173, their target to win would have been 68, at 13.6 runs per over, with two overs of field restrictions. But given the way the match actually panned out, with an interruption at 4 without loss after the first over, Zimbabwe's target using this shrinking model would have been 60 in five. This must appear much more reassuring than a mere 44.

Send for the tailor
The shrunken-trousers model certainly appears to give more satisfactory results than the cut-trousers one for the two matches on May 3, but we would need to look at many more examples before we can recommend the former with any degree of authority; there is a lurking fear that it may set very high targets, especially if the interruption occurs between innings.

There is also the problem of field restrictions in the first six overs. D/L has never accommodated this batting phase into its mix. The D/L explanation is that the more adventurous batting during the field-restriction phase is compensated by the greater propensity to lose wickets. This explanation seems valid in 50-over matches, where the loss of a wicket significantly reduces the combined resource percentage. But in Twenty20, especially with the cut-trousers model practised currently, losing a wicket brings down the combined resource percentage by much less, if at all, and it is much less likely to inhibit adventurous strokeplay. Given the nature of Twenty20, and the sort of audience it attracts, it may be worthwhile to retain field restrictions during the first six overs at all times, whatever the state of the match!

All these fixes are only for the short term. There is a compelling and urgent need to redraw D/L-like curves for Twenty20 based on actual Twenty20 match data. The IPL repository itself contains about 175 matches, and there must be at least 100 match-data sets from international games. We feel certain that D/L can come up with these new curves, and if they don't feel so inclined, we would be happy to participate in a parallel initiative.

Rajeeva Karandikar is a professor at Chennai Mathematical Institute. Srinivas Bhogle heads the Bangalore centre of TEOCO Software Pvt Ltd. The authors would like to thank K Vijay for his suggestions via Twitter.
