The Exploits of a Sports Analyst
The scene is a central London office, an arbitrary weekday. It’s overcast, so our protagonists are illuminated only by the sickly glow of the multiple monitors, flickering, reflected streams of data playing back and forth over the analysts’ steely-eyed visages. There’s insight to be found, goddammit, and they’re not going to let anything get in the way of unearthing those gems of information from the tortured mixed metaphor of data.
“So who do we think’s going to win the golf this week then?”
Sportsfan has arrived. You know him; I think every office has one. He’s the guy who likes sport, and when I say he likes sport, I mean he relates every single aspect of existence back to some overwrought football reference. Like when you were explaining the complexities of integrating a source with no primary key into a multi-channel dataset and he said,
“Oh yeah, it’s just like trying to fit Wilshere into the Arsenal midfield. You’ve got to play him, right, but no matter where you put him, they’re not really gonna be able to link up, right?”
So, Sportsfan announces his arrival and the analysts rub their eyes and turn to address this intrusion.
“You know, the golf’s on, and I was just wondering, who you guys thought might win.”
If Sportsfan has recognised the analysts’ irritation, he makes no sign. Dressed in a green sun visor and tartan plus fours, he takes a stance at the end of a bank of desks and begins taking practice swings. With a biro.
Slowly, their massive brains churning back through the necessary files required to converse in the human tongue, the analysts formulate an initial response. The oldest one speaks, his voice hoarse from lack of use.
“7.5% up year on year. 3 standard deviations from the mean. Error margin 22%,” he states flatly.
Sportsfan is obviously confused and the analyst adds this result to his calculations. Another minute passes before his next response is computed.
“Not interested. Golf. No data. No insight,” he croaks and turns back to his screens.
There’s Always Data In Everything
“Ah but you’re wrong, there’s always data in everything, and golf especially,” Sportsfan begins. “Because it’s mostly just the player against the course and the conditions, so there aren’t even that many variables really, in fact, if you think about it, it should be pretty easy to predict how each player’s going to do.”
He’s rambling now, but the prospect of uninvestigated data has disturbed the analysts’ finely tuned insight-hunting senses. One of the younger ones begins to emit intermittent high-pitched whines. The oldest one’s left hand twitches rapidly, rhythmically tapping out the Fibonacci sequence on the desk.
“Tell. Us of. Data for. The Golf and Players. We must predict the outcome now.”
Oblivious to the tension, Sportsfan elaborates:
“Well, I reckon this course suits Stenson pretty well, but the bookies think Johnson’s form means he’s going to do well. I like Stenson, because when he got asked what differentiated him from the other golfers, he said ‘Well, I’m better looking.’ Pure class. So I want to have a little flutter on him to beat Johnson at the PGA Championship.”
“Enough. Give us. The data. We will give answer.”
The analysts bend to the task, hands blurring and eyes flashing. The European Tour, the PGA, course guides, GPS APIs, weather conditions, past performance, tournament results: all available to be assimilated. The office temperature rises a few degrees as the analysts apply all their processing power. Databases are built and queried, formulas spit out shot coefficients, the impact of wind speed on driving distance is vectored. Sportsfan merely looks on, the biro having transformed from 4-iron to putter, as he idly tries to sink a ball of elastic bands into the waste paper basket from 3 yards.
The analyst’s smug declaration causes Sportsfan to whiff the putt, smacking the biro into the table and snapping it in half. He gives it a rueful look and tosses it over his shoulder; naturally, it lands perfectly in the waste paper basket. The old analyst’s red-rimmed eyes regard him with tangible disdain, but he continues.
“PGA Championship 2015. Henrik Stenson will score. -5. Dustin Johnson will score. +1. Stenson beats Johnson.”
Sportsfan’s face brightens. “Really? Great! I’m off to stick a few quid on that! The bookies have Johnson a dead cert to do well, so I’ll get some good odds. Thanks guys!”
Sportsfan swerves, pauses over the elastic-band ball to do a quick Cruyff turn, and darts out of the office. In silence now, the analysts peacefully resume their endless quest.
Win Some; Lose Some
The scene is a central London office, a week later. Sportsfan arrives, in full Tottenham kit and boots, the studs clicking against the dusty wooden floor. He wastes no time on pleasantries, storming up to the old analyst and spinning his chair away from the desk.
“Your stupid analysis lost me £100! Johnson was brilliant, he thrashed Stenson. Your data was no good! What have you got to say for yourself! All this analytics is just pointless messing around!”
The analyst raises a gnarled finger and points to one of his innumerable screens. Sportsfan follows his gaze and slowly processes the glowing green text:
“Predicted Score Stenson. -5. Actual Score Stenson -5.
Predicted Score Johnson +1. Actual Score Johnson -12.
Insufficient Data. Too many Variables. Sample Size too Small. Error Margin 90-95%”
As he reads through the screen, he hears a strange coughing sound coming from the old analyst. Annoyed, he realises the ancient creature is laughing. With a sound like the creaking spine of an antique book, the analyst raises his shoulders in a shrug.
“You. Win some. You lose some.”
Liberties may have been taken, but Station10’s first adventure in sports analytics is a real and ongoing experiment. Making use of the vast array of freely available sports data, we’ve begun to build out a model in an attempt to predict Henrik Stenson’s score at any given tournament.
You might ask why – but we think it’s a pretty valuable demonstration of how there’s always data available, not just in businesses geared up to collect it but in almost any field. We also think it shows that unexpected data sources can still have effective real-world applications.
For the record, we’ve used the model on two tournaments, and got Stenson’s exact score correct at both. Unfortunately it didn’t work so well applying it to Dustin Johnson… But it wouldn’t be any fun if there weren’t any room for improvement.
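For anyone who wants to keep score at home, here is a minimal Python sketch of the comparison on the old analyst’s screen. It is not the prediction model itself, only a way of measuring how far each prediction landed from reality. The scores are the ones from the story; the dictionary layout and the `absolute_error` helper are illustrative assumptions.

```python
# Scores relative to par, taken from the screen in the story above.
# The data layout and names here are illustrative, not the real model.
predictions = {
    "Stenson": {"predicted": -5, "actual": -5},
    "Johnson": {"predicted": 1, "actual": -12},
}


def absolute_error(predicted, actual):
    """Number of strokes by which a prediction missed, ignoring direction."""
    return abs(predicted - actual)


# Per-player miss, in strokes.
errors = {
    player: absolute_error(scores["predicted"], scores["actual"])
    for player, scores in predictions.items()
}

# Average miss across all players.
mean_absolute_error = sum(errors.values()) / len(errors)

for player, err in errors.items():
    print(f"{player}: missed by {err} strokes")
print(f"Mean absolute error: {mean_absolute_error:.1f} strokes")
```

A perfect call on one player and a thirteen-stroke miss on the other is about what you would expect from a model tuned on a single golfer.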