Maffs 'n' graffs - statistical porn

Thread starter CJJ
Start date 21 Mar 2014

Éperons

#141

Incidentally, by PDO, we're about 10th, which means we're being lucky. This fits with the pythagorean expectation as well:

http://jameswgrayson.wordpress.com/2014/02/14/european-tables-february-14th/

Here's a PDO look at the end of the AVB era:

http://jameswgrayson.wordpress.com/2013/12/16/the-firing-of-steve-clarke-in-graph-form-2/

As for PDO, this explains it:

http://jameswgrayson.wordpress.com/2011/04/04/a-primer-pdo/
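
For anyone who would rather see the two measures than read the primer, here is a minimal sketch. It assumes the usual football definition of PDO (shooting percentage plus save percentage, both on shots on target, scaled to 1000) and the classic Pythagorean form with an exponent of 2; the exponent that fits football best is debated, so treat it as an assumption. The function names and the input numbers are hypothetical, not real Spurs figures.

```python
def pdo(goals_for, sot_for, goals_against, sot_against):
    """PDO = (shooting % + save %) * 1000, using shots on target (sot)."""
    shooting_pct = goals_for / sot_for
    save_pct = 1 - goals_against / sot_against
    return 1000 * (shooting_pct + save_pct)


def pythagorean_share(goals_for, goals_against, exponent=2.0):
    """Expected share of success implied by goals scored and conceded.

    The exponent is an assumption; 2.0 is the original baseball value.
    """
    gf = goals_for ** exponent
    ga = goals_against ** exponent
    return gf / (gf + ga)


# Hypothetical season-to-date numbers, for illustration only.
print(round(pdo(40, 130, 40, 120)))
print(round(pythagorean_share(40, 40), 3))
```

A PDO well above 1000 suggests a side is converting or saving at a rate that tends not to last, and a flat goal difference gives a Pythagorean share of one half whatever exponent is chosen.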

Millbanks

Spurs - Champions Cup of Europe Winners 2011

#142

CJJ said:
For anyone who wants to try and wish their tits off for CL Quals/4th place...

Or just the curious ones...

I've made an interactive/'live' league table so you can see how the results affect the table. Much easier than using your head to work it out.

The XLSX version will be a bit better, but in case you can't open newer Office files, the XLS is there too.

All done with formulae, so no worries about VBA and whatnot.

PLPredictor.xlsx
PLPredictor.xls

Thanks gorgeous x :ledleylick:

Scott_D

#143

Éperons said:
Duly noted. Even so, the only reason +/- is even in discussion in hockey is because you can assume that each player has more to do with a goal, as there are only six of them on the ice. In football, it's probably an even smaller amount of "influence." Especially when so many goals are scored on the break (or because of defensive lapses).

Absolutely. And even in hockey there isn't much comfort in assigning "blame" for shots/goals equally to all five skaters. Defensive defenders have been shown to have very little ability to drive offensive zone play, etc.

Éperons said:
On the flip side, Lloris was considered a poor keeper because he had such a low save percentage (for some reason, we gave up few shots, but the ones we did give up would go in!).

The Cartilage Free Captain site shows the Spurs with the highest % of shots coming from the "Danger Zone" so that seems consistent, and a reasonable explanation for a low total save %.

Éperons said:
By pythagorean expectation, we are absolutely playing out of our minds in terms of points, compared to the number of goals we're scoring (and giving up).

No surprise, seeing the flat goal differential surrounded by teams positive in the double digits.

Thanks for the links.

VirginiaSpur said:
Cartilage Free Captain has been doing this kind of shot analysis all season.

Thanks. Some great stuff there.

By the way, these were the links that I alluded to in my first post but was unable to provide:
http://www.statsbomb.com/2014/03/the-spanish-inquisition-roberto-soldado/
http://www.statsbomb.com/2014/02/tottenham-lucky-tim-sherwood-and-why-results-can-be-misleading/

Éperons

#144

Thanks for those statsbomb pieces.

http://www.statsbomb.com/2014/02/to...nagers-crap-managers-or-something-in-between/

This one argues that AVB and Sherwood have nearly inverted seasons. AVB's side took lots of shots (compared to whom they were playing) but scored few goals. TS's side takes few shots (compared to whom they're playing) but scores many goals. Theoretically neither is sustainable. It's possible that the managers are partly responsible for this kind of performance, but that's not terribly likely.

CJJ

Supporter

#145

CJJ

Supporter

#146

VirginiaSpur

"Vladimir, Stop!"

#147

Your charts are so pretty CJJ. If they could be eaten I think they would redefine modern cuisine.

Éperons

#148

These month-by-month charts are *dangerous*. (Though pretty).

Win% and points% both have binomial probabilities. Either you get the win or you don't.

Even with the large number of matches, I'd still venture that no monthly result is more than two standard deviations from the mean, meaning that it's safe to assume that the fluctuation is random.

Points are a bit trickier, but we'll see. I probably won't be able to run the calculations tonight or tomorrow, though.

CJJ

Supporter

#149

Éperons said:
These month-by-month charts are *dangerous*. (Though pretty).

Win% and points% both have binomial probabilities. Either you get the win or you don't.

Even with the large number of matches, I'd still venture that no monthly result is more than two standard deviations from the mean, meaning that it's safe to assume that the fluctuation is random.

Points are a bit trickier, but we'll see. I probably won't be able to run the calculations tonight or tomorrow, though.

Confirms what we know about #Spursy...

Slow start followed by a great run of Xmas form, picks up another gear in Feb to leave us feeling 'this could be the season...' followed by a slump and trough in April, but then going on a great streak in May to make everyone forget how shit the manager was when we needed him to pull through.

Alternatively, you could make direct correlations I'm sure between Europa Games & League form going sour

Éperons

#150

CJJ said:
Confirms what we know about #Spursy...

It "confirms" no such thing.

Since the 1994/1995 season, Spurs have won, on average, 40.3% of their matches and scored 48.9% of the points available. For ease of argument, I'm going to round the former to 40% and drop the latter, since it's a misleading/useless statistic.*

As I said in the previous post, win percentage is a binomial. Even though a match can have three results, here we care about only two: either you win the match or you don't (losses + draws). It's like flipping a coin: either it comes up heads or it comes up tails.

But while we know that, with a coin, it'll eventually be heads 50% of the time, we don't know ahead of time what Spurs' win percentage will be. But now we do. In order to flip a coin to see if Spurs will win or lose, you need a coin that, somehow, comes up heads only 40% of the time.

However, as anyone who has flipped a coin knows, if you flip a coin only a handful of times (say, ten), then it's entirely reasonable to assume that it won't be an even 50/50 split. Sometimes you'll get four heads. Sometimes six. More rarely three or seven. Even more rarely two or eight. And very, very rarely, you'll get one or nine or none or ten. So even though, over the course of 757 matches, Spurs won 40% of them, it's completely reasonable to assume that, given 20 matches at random, they will have won eight, or seven, or nine, or even six or ten.
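
The 20-match example can be checked directly. This is a minimal sketch using only the standard library, assuming each match is an independent "win or not" trial with a 40% win chance (the rounding used above); the helper name `binom_pmf` is made up for illustration.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k wins in n independent matches."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 20, 0.4  # 20 random matches, 40% long-run win rate

# Chance of each individual win total near the expected eight.
for wins in range(4, 13):
    print(wins, round(binom_pmf(wins, n, p), 3))

# Chance of landing anywhere from six to ten wins.
prob_6_to_10 = sum(binom_pmf(k, n, p) for k in range(6, 11))
print(round(prob_6_to_10, 3))
```

Eight wins is the single most likely total, but the totals on either side of it carry most of the probability between them.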

That's the problem with the chart: it takes entirely reasonable fluctuations and posits that there is something "#spursy" about what can be described more directly as near-random chance. This is the problem of sample size. To wit:

In a binomial distribution, the standard deviation is the amount that you can assume, on average, that the results will differ from the mean. So looking at August, that means that, given 69 matches, we can assume that Spurs would normally win 40% of them, or 28. But if we pick any 69 random matches from the pool of 757, we can assume that Spurs will have won, on average, between 24 and 32 of them, since the standard deviation is 4.
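
The August figures come from the standard binomial formulas, mean = n·p and standard deviation = sqrt(n·p·(1 − p)). A minimal sketch, assuming the rounded 40% win rate; `binom_mean_sd` is a hypothetical helper name.

```python
from math import sqrt


def binom_mean_sd(n, p):
    """Mean and standard deviation of the number of wins in n matches."""
    return n * p, sqrt(n * p * (1 - p))


mean, sd = binom_mean_sd(69, 0.4)  # 69 August matches
print(round(mean, 1), round(sd, 2))
print(round(mean - sd), "to", round(mean + sd), "wins within one standard deviation")
```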

Even results that are over one standard deviation from the mean are considered, in terms of statistical inference, as within the realm of normal variance. Typically results have to drift two standard deviations away for people to start taking notice.

Our "worst" month in terms of variance from the mean, April, looks rather damning. 32% instead of 40%! But looks can be (and are) deceiving. Even being 1.58 standard deviations from the mean means only that, if you picked 91 matches at random from the pile, we'd have won 29 (or fewer) of them over 6% of the time. Still not in the realm of statistical significance. This is especially problematic since picking 91 matches at random ignores dependencies that exist among matches played close together. In other words, if months mattered, and last week's match had an effect on next week's match, then the bar for demonstrating a monthly effect is even higher.
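
The April numbers can be reproduced the same way. This sketch assumes 91 independent matches at a 40% win rate and computes both the distance from the mean in standard deviations and the exact one-sided binomial chance of 29 or fewer wins; `binom_cdf` is a hypothetical helper, and the independence assumption is exactly the simplification flagged above.

```python
from math import comb, sqrt


def binom_cdf(k, n, p):
    """Probability of k or fewer wins in n independent matches."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


n, p, wins = 91, 0.4, 29  # April: 91 matches, 29 won (about 32%)

# Distance from the expected win total, in standard deviations.
z = (wins - n * p) / sqrt(n * p * (1 - p))

# Exact chance of a month at least this bad arising by luck alone.
tail = binom_cdf(wins, n, p)

print(round(z, 2), round(tail, 3))
```

A one-sided tail above the conventional 5% threshold is why April does not count as evidence of a monthly effect.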

In short, the null hypothesis ("what month Spurs are playing in has no correlation to how likely they are to win the match") remains unassailed. When trying to "confirm" an effect, one has to work hard to disprove the null hypothesis. Your charts, in breaking down win percentage by month, have not met that standard.

* Not all point combinations from the "available points" pool are possible, so probabilities strike me as exceptionally tricky to calculate. All the statistic tells us is that draws happen with some frequency.

alijamieson

#151

CJJ is there a way to document who is responsible for the defensive errors leading to goals? (and maybe if there's any common selections that are shipping more errors than others?)

CJJ

Supporter

#152

Éperons said:
It "confirms" no such thing.

Since the 1994/1995 season, Spurs have won, on average, 40.3% of their matches and scored 48.9% of the points available. For ease of argument, I'm going to round the former to 40% and drop the latter, since it's a misleading/useless statistic.*

As I said in the previous post, win percentage is a binomial. Even though a match can have three results, here we care about only two: either you win the match or you don't (losses + draws). It's like flipping a coin: either it comes up heads or it comes up tails.

But while we know that, with a coin, it'll eventually be heads 50% of the time, we don't know ahead of time what Spurs' win percentage will be. But now we do. In order to flip a coin to see if Spurs will win or lose, you need a coin that, somehow, comes up heads only 40% of the time.

However, as anyone who has flipped a coin knows, if you flip a coin only a handful of times (say, ten), then it's entirely reasonable to assume that it won't be an even 50/50 split. Sometimes you'll get four heads. Sometimes six. More rarely three or seven. Even more rarely two or eight. And very, very rarely, you'll get one or nine or none or ten. So even though, over the course of 757 matches, Spurs won 40% of them, it's completely reasonable to assume that, given 20 matches at random, they will have won eight, or seven, or nine, or even six or ten.

That's the problem with the chart: it takes entirely reasonable fluctuations and posits that there is something "#spursy" about what can be described more directly as near-random chance. This is the problem of sample size. To wit:

In a binomial distribution, the standard deviation is the amount that you can assume, on average, that the results will differ from the mean. So looking at August, that means that, given 69 matches, we can assume that Spurs would normally win 40% of them, or 28. But if we pick any 69 random matches from the pool of 757, we can assume that Spurs will have won, on average, between 24 and 32 of them, since the standard deviation is 4.

Even results that are over one standard deviation from the mean are considered, in terms of statistical inference, as within the realm of normal variance. Typically results have to drift two standard deviations away for people to start taking notice.

Our "worst" month in terms of variance from the mean, April, looks rather damning. 32% instead of 40%! But looks can be (and are) deceiving. Even being 1.58 standard deviations from the mean means only that, if you picked 91 matches at random from the pile, we'll have won 29 (or fewer) of them over 6% of the time. Still not in the realm of statistical significance. This is especially problematic since picking 91 matches at random ignores dependencies that exist among matches played close together. In other words, if months mattered, and last week's match had an effect on next week's match, then the correlation standard is even higher.

In short, the null hypothesis ("what month Spurs are playing in has no correlation to how likely they are to win the match") remains unassailed. When trying to "confirm" an effect, one has to work hard to disprove the null hypothesis. Your charts, in breaking down win percentage by month, have not met that standard.

* Not all point combinations from the "available points" pool are possible, so probabilities strike me as exceptionally tricky to calculate. All the statistic tells us is that draws happen with some frequency.

As much as I appreciate the time you take to straighten things out, I do worry that you're missing the humour in most of this?
:adeohshit:

'Spursy' isn't a thing, it's a phenomenon we light-heartedly use to describe bad luck. Luck itself is such a contradiction to statistics, so 'proving' something is #spursy was supposed to be an ironic joke...

:adegrin2:

CJJ

Supporter

#153

alijamieson said:
CJJ is there a way to document who is responsible for the defensive errors leading to goals? (and maybe if there's any common selections that are shipping more errors than others?)

Probably, I'd need to put the data together. At the moment, can't be bothered. lol.

If our matches have shown us anything lately, though, it's that all our defenders are shit except Vlad

Éperons

#154

CJJ said:
'Spursy' isn't a thing, it's a phenomenon we light-heartedly use to describe bad luck. Luck itself is such a contradiction to statistics, so 'proving' something is #spursy was supposed to be an ironic joke...

You're trolling us with this entire thread, then?

I know it's the first of April and all, but, seriously? You write:

CJJ said:
Slow start followed by a great run of Xmas form

And so on. You narrativise the charts that you clearly spent more than two minutes making. And it's all just to poke fun at the very idea that a chart has some kind of interpretive value?

Based on your charts, I explained there is no evidence that Spurs typically have a "slow start". There is no evidence that Spurs typically have a "great run of Xmas form". The fact that they won a slightly larger percentage of games in the "December" bucket than in the "September" bucket can be explained, wholly, by "swings and roundabouts."

You arbitrarily break up the data using a measurement system developed millennia ago, a measurement system that splits up the arbitrary amount of time it takes the earth to go around the sun into twelve unequal chunks. And then try to tell a story about it. Spurs are bad in September, good in December, blah blah. That story is, simply, bullshit. Spurs win some, and they don't win more.

Tracking a single manager's narrative is one thing, as it can show the club's growth over time (and even then has to account for variance). Lumping two decades' worth of players, managers, promoted clubs, etc. together and saying something like "Spurs don't do so hot in September in comparison to December" is a farce.

When you say, then, that you mean your charts as a farce, well, frankly, I don't believe you. Your responses to VirginiaSpur only make me more confident in that impression.

Finally, I've argued on this site for years that there is no such thing as "#spursy". The first post I ever had that got promoted to the front page established that commitment (even before the term caught on here), and probably half of the threads I have ever started on TFC have been efforts to further combat the idea that our club is somehow cursed with bad luck. We have a certain baseline, and momentary fluctuations (winning five in a row, losing five in a row) are simply that: fluctuations that even out over time. Even over the course of a single year, they even out (see Harry's schizophrenic final season).

CJJ

Supporter

#155

Éperons said:
You're trolling us with this entire thread, then?

I know it's the first of April and all, but, seriously? You write:

And so on. You narrativise the charts that you clearly spent more than two minutes making. And it's all just to poke fun at the very idea that a chart has some kind of interpretive value?

Based on your charts, I explained there is no evidence that Spurs typically have a "slow start". There is no evidence that Spurs typically have a "great run of Xmas form". The fact that they won a slightly larger percentage of games in the "December" bucket than in the "September" bucket can be explained, wholly, by "swings and roundabouts."

You arbitrarily break up the data using a measurement system developed millennia ago, a measurement system that splits up the arbitrary amount of time it takes the earth to go around the sun into twelve unequal chunks. And then try to tell a story about it. Spurs are bad in September, good in December, blah blah. That story is, simply, bullshit. Spurs win some, and they don't win more.

Tracking a single manager's narrative is one thing, as it can show the club's growth over time (and even then has to account for variance). Lumping two decades' worth of players, managers, promoted clubs, etc. together and saying something like "Spurs don't do so hot in September in comparison to December" is a farce.

When you say, then, that you mean your charts as a farce, well, frankly, I don't believe you. Your responses to VirginiaSpur only make me more confident in that impression.

Finally, I've argued on this site for years that there is no such thing as "#spursy". The first post I ever had that got promoted to the front page established that commitment (even before the term caught on here), and probably half of the threads I have ever started on TFC have been efforts to further combat the idea that our club is somehow cursed with bad luck. We have a certain baseline, and momentary fluctuations (winning five in a row, losing five in a row) are simply that: fluctuations that even out over time. Even over the course of a single year, they even out (see Harry's schizophrenic final season).

How am I trolling anyone?

We're enjoying looking at some graphs and stats over the last 20 years or so. The very least it does is remind people that there was some justification in managers being booted over the years, but it's not meant to be a lecture.

All of the graphs and data are real and took me a long time to put together. It offers a little reassurance, but no one is arguing that we can win the Premier League through mathematical algorithms.

I never said the charts were meant as a farce at all. I was talking about our form based on what month we're in, you know, the thread you quoted?

You're saying that because we can't prove anything, we're not allowed to talk about "spursy". Don't take yourself and others so seriously, it's meant to be enjoyable, Buzz!

CJJ

Supporter

#156

So, with the talk of new strikers in the air, I thought I'd refresh my own memory of what our other strikers have been like. Some surprising stats. Credit to MyFootbalFacts for some of the figures, which I've double checked and, aside from an extra Soldado appearance making 26 this season, all are up to date. A number of sources were used to cross reference and reinforce the figures.

These are "Premier League only" stats and I've included players who have been played in a forward role. The likes of Bale & VDV are questionable, but as they have both been utilised in an off-the-striker role, thought I'd throw them in. There's a lot of contentiousness in there, like 'that player was in a shit team' and 'that player was in a shit league' etc, but that's what their facts and figures are. Players like Konoplyanka are scouted based on their performances in the teams they come from, so I've included the other stats in a justifiable manner.

I've highlighted our current two in bold and with a blue block next to them, and also Harry Kane is highlighted in grey - the difference being, for me, Kane hasn't had any proper opportunities in the team and shouldn't be taken too seriously as part of the stats.

The last column shows the difference in strike rate for us compared to elsewhere.

Note that Adebayor is more prolific in the league for us than Berba. Who'd have thunk it?

MyFootbalFacts

#157

Some very good work there, CJJ

ILSpur

#158

CJJ said:
Note that Adebayor is more prolific in the league for us than Berba. Who'd have thunk it?

That is surprising, at least I think so. Maybe that Berbatov just had more of a flair to his scoring than Ade so we remember them differently?

CJJ

Supporter

#159

MyFootbalFacts said:
Some very good work there, CJJ

I was going to do a "before they joined us" and "after" to see if we destroy careers lol.

Couldn't be bothered

CJJ

Supporter

#160

ILSpur said:
That is surprising, at least I think so. Maybe that Berbatov just had more of a flair to his scoring than Ade so we remember them differently?

Makes you wonder what SEA could do if he was as committed/enthusiastic at his worst times as he is at his best
