An Analysts’ Game Accidentally Invented by Sportsmen
Cricket, according to the Indian social philosopher Ashis Nandy, is an Indian game accidentally invented by the British. One could equally call it an analysts’ game accidentally invented by sportsmen. The game is, and has always been, about statistics in a way that no other sport can even come close to (even baseball doesn’t have as many variables).
The statistical approach is reflected in the activities around every game of cricket. Every ball is recorded, noting whether a run is scored or a wicket is taken. This happens, and has always happened, at every level of the game, whether it’s an international, a top league match at first-class or club level, or the Under 11 C’s playing in front of a couple of parents. It’s exactly the same scoring system, and the same template is used to record the information.
If there were any doubt as to how scientific cricket can be, for over 150 years, it has had an almanac dedicated to it - Wisden. Almanacs are generally reserved for scientific activities, whether it’s astronomical or meteorological information or calendars for the year, showing the phases of the moon or sunsets or celestial positions.
So, for a data-rich sport, it feels most appropriate to have an almanac. This has always recorded the facts, the statistics, the analyses of the game, and it represents the Bible for the great, and the not so great of the game. If your performance is recorded in Wisden, then it counts, and is recorded definitively for posterity – if you look hard enough in the 1994 Wisden, you will find my name!
It is telling that the record of a bowler’s performance is literally called the “bowling analysis”, or “bowling figures”. Cricketers will study those four numbers against a name (overs bowled, maidens, runs conceded and wickets taken) and interpret the effectiveness of a performance in an instant. This speed and level of insight from a set of unremarkable numbers is what business organisations can only dream of getting from their analysts and key players.
I remember reading about an amazing performance by a famously accurate Indian spin bowler in an international match between India and England in Chennai in 1964. The bowling analysis read as follows:
Bapu Nadkarni: 32 – 27 – 5 – 0
To most people, that’s just a set of numbers. To a cricketer, that’s instantly identifiable as an astonishing, almost unbelievable, performance. Apparently, the batsmen said afterwards that they just couldn’t hit the ball past the first set of fielders.
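To show how much a cricketer reads from those four numbers, here is a minimal Python sketch. It assumes the conventional order of a bowling analysis (overs, maidens, runs, wickets) and whole overs only; the class and function names are illustrative, not taken from any scoring software.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class BowlingAnalysis:
    """Illustrative container for the four numbers in a bowling analysis."""
    overs: int    # completed overs bowled (assumed whole and greater than zero)
    maidens: int  # overs in which no runs were conceded
    runs: int     # runs conceded
    wickets: int  # wickets taken

    @property
    def economy(self) -> float:
        """Runs conceded per over."""
        return self.runs / self.overs

    @property
    def maiden_share(self) -> float:
        """Fraction of overs that were maidens."""
        return self.maidens / self.overs


def parse_analysis(text: str) -> BowlingAnalysis:
    """Parse figures written as 'overs - maidens - runs - wickets'."""
    overs, maidens, runs, wickets = (int(n) for n in re.findall(r"\d+", text))
    return BowlingAnalysis(overs, maidens, runs, wickets)


nadkarni = parse_analysis("32 – 27 – 5 – 0")
print(f"Economy: {nadkarni.economy:.2f} runs per over")
print(f"Maidens: {nadkarni.maiden_share:.0%} of overs")
```

Five runs from 32 overs is well under a fifth of a run per over, with 27 of the 32 overs conceding nothing at all, which is why the figures read as astonishing even without a wicket.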
So, cricket is the most data-rich sport, and much of it, as shown above, is recorded in a highly, and consistently, structured way. This should make it the sport most suited to predictive analysis. And, indeed, predictive analysis is currently used within the fabric of the game, at least at the highest level.
Rain causes havoc for cricket in a way that it does for no other sport. So working out a fair target score when the weather forces a reduced-overs match has always been a problem.
The Duckworth-Lewis method (D/L for short) sounds satisfyingly like some sort of advanced mathematical technique; indeed, it is named after the two British mathematicians who created it. It is the calculation used to set a target score in a rain-affected match. It has been in use for twenty years now, and it is tweaked on an ongoing basis to reflect different scenarios and the changing nature of the game, such as players with bigger bats scoring more runs. That makes it an actively maintained example of predictive analytics.
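For readers who want a feel for the idea, the sketch below covers only the simplest case of a D/L-style calculation: the side batting second has fewer "resources" (a combination of overs remaining and wickets in hand) than the side batting first, so the first-innings score is scaled down in proportion. This is an assumption-laden simplification, not the official calculation. The real method takes its resource percentages from published tables, which are not reproduced here, so the function below (a hypothetical name) takes them as inputs.

```python
import math


def revised_target(team1_score: int, r1: float, r2: float) -> int:
    """Simplified D/L-style target for the side batting second.

    team1_score: runs scored by the side batting first
    r1: resource percentage available to the side batting first
    r2: resource percentage available to the side batting second
    Only the case r2 <= r1 is handled; the percentages themselves are
    assumed inputs, since the official tables are not included here.
    """
    if not (0 < r2 <= r1 <= 100):
        raise ValueError("this sketch only handles 0 < r2 <= r1 <= 100")
    par_score = team1_score * r2 / r1  # score that would tie the match
    return math.floor(par_score) + 1   # one more run is needed to win


# Hypothetical example: 250 scored with full resources, chasing side has 80%.
print(revised_target(250, r1=100.0, r2=80.0))
```

With these made-up inputs the par score is 200, so the chasing side would need 201 to win. The difficulty in practice lies in the resource percentages, which depend on when the interruption happened and how many wickets had fallen.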
I can’t think of any other sport where a predictive algorithm has been built into how the game is played. And it gives us an insight into how predictive analytics can work within an organisational framework, and one with lots of structured data.
So what learnings can insight managers take from the two decades of the D/L method?
Make sure it works!
Let’s start by saying that it’s generally accepted in cricketing circles that this is the most accurate and fair way of creating a target score. So, as a predictive analysis programme, it has clearly proven its value. And that has to be the primary lesson: make sure it answers the question it was designed for.
Explain it to a non-techie
However, it is a cliché within cricket that no-one, with the exception of Profs Duckworth and Lewis, actually understands the D/L method. The factors that can influence the calculation (at what point in which innings the rain caused the interruption, for instance) mean that it’s a surprisingly technical equation.
In practice, that has meant the Duckworth-Lewis method is seldom used outside international or first-class matches. The use case applies to all levels of the game, and it is particularly needed in the UK given the vagaries of the British summer. Yet the calculation is not sufficiently understood in the wider cricket community for people to want to use it in their own local club matches.
This provides interesting lessons for insight managers. People find it hard to trust an algorithm they don’t understand. That may be a fairly irrational position, especially when the algorithm has demonstrated its value. But, as any CDO or Head of Insight will tell you, trust in the available data is a major factor in driving take-up of insight and analytics systems. If people don’t fully understand the data and the calculated metrics that sit on top of them, they won’t use them as broadly as they could.
Work hard to find the right use case
Cricket is an industry with lots of highly structured data, and has an unusually data-literate set of “stakeholders”. However, despite this, beyond the D/L method, there have been few data-driven, predictive applications developed.
If one of the most data-literate sports is not taking advantage of its potential in this space, does this mean that there is something more problematic with predictive analytics?
One of the reasons is funding. Unless you are at the top level of the game, there is little money to invest in AI. Most cricket clubs scarcely have the money to pay for a sightscreen, let alone software. But, in truth, that’s the same with any organisation, and it is why investment in Machine Learning focuses on a few big-ticket use cases. Don’t expect to solve every problem with ML techniques.
Just because you can use predictive techniques, it doesn't mean you will always get it right!
But also, despite the amount of data available to even your average cricketer, it’s not getting any more predictable.
• In the recent Champions Trophy, the two strong favourites, England and India, were both beaten comprehensively by Pakistan, at the semi-final and final stages respectively.
• Last year, in the final session of the last day of the season, the English County Championship was won by Middlesex, who beat Yorkshire in the “winner-takes-all” last match, after a batting collapse by the latter in the last afternoon. Had the match been drawn, a third county, Somerset, would have won the title.
• And the rise of the short-format game, T20, has led to increased public interest, but also an explosion of creative and powerful hitting by batsmen, and a corresponding evolution in bowling techniques to combat the batsmen.
So, at the very time when there are more statistical, scientific techniques to analyse performance, the game itself reminds us that being able to use machine learning techniques does not necessarily make it easier to predict events with many complex variables, especially when those events involve naturally competitive humans.