Reconsidering the ruling on Khorne?
Moderator: TFF Mods
-
- Emerging Star
- Posts: 346
- Joined: Mon May 19, 2014 9:33 pm
Re: Reconsidering the ruling on Khorne?
I think you'd be better with a chi-squared than a t-test for that.
-
- Ex-Cyanide/Focus toadie
- Posts: 2565
- Joined: Fri Jul 24, 2009 4:55 pm
- Location: Near Reading, UK
Re: Reconsidering the ruling on Khorne?
Really? My understanding would be that a t test would be better for comparing two samples, i.e. seeing if there is a difference between CRP and CRP+ zons. Chi squared is better used for comparing measured with expected (i.e. what the hypothesis says) values rather than two sets of measured values. Could be wrong, ofc 

-
- Emerging Star
- Posts: 346
- Joined: Mon May 19, 2014 9:33 pm
Re: Reconsidering the ruling on Khorne?
Yeah, I think I'd go chi squared and use CRP as the expected and CRP+ as the observed. Your expected does not need to be theoretical, per se, you just need a base line. Then once you have made a change (such as +'ing the CRP) you can see if the new results match the expectation that nothing has changed (the null hypothesis). That way you can see if there is a statistical difference between the two samples. It's a much easier test as well.
The difficulty with a t-test is that the distribution should be normal, and that limits what you can look at. I've made a massive assumption here, thinking about it, that you'd mainly want to look at the win/draw/loss rate!
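A rough sketch of the chi-squared check described above, using only the standard library. The win/draw/loss counts are made up for illustration, and the function name is hypothetical. Note the assumption baked into this approach: the CRP proportions are treated as exact expected values, so any sampling error in the baseline itself is ignored. With three categories there are 2 degrees of freedom, and for 2 degrees of freedom the chi-squared p-value has the closed form exp(-x/2).

```python
import math

def chi2_vs_baseline(observed, baseline):
    """Chi-squared goodness-of-fit of observed W/D/L counts against
    proportions taken from a baseline W/D/L sample (treated as exact)."""
    n_obs = sum(observed)
    n_base = sum(baseline)
    stat = 0.0
    for obs, base in zip(observed, baseline):
        expected = n_obs * base / n_base  # expected count if nothing changed
        stat += (obs - expected) ** 2 / expected
    # 3 categories -> 2 degrees of freedom; the chi-squared survival
    # function with 2 df is exp(-x / 2).
    p_value = math.exp(-stat / 2)
    return stat, p_value

# Hypothetical counts, NOT real match data.
crp = (400, 200, 400)       # baseline: wins, draws, losses
crp_plus = (50, 20, 30)     # sample after the rules change

stat, p = chi2_vs_baseline(crp_plus, crp)
print(f"chi-squared = {stat:.2f}, p = {p:.4f}")
```

A small p-value would suggest the new results do not match the "nothing has changed" expectation. If the baseline sample is not much larger than the new one, a contingency-table version that treats both as samples would probably be the safer choice.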
- VoodooMike
- Emerging Star
- Posts: 434
- Joined: Thu Oct 07, 2010 8:03 am
Re: Reconsidering the ruling on Khorne?
wulfyn wrote: I think you'd be better with a chi-squared than a t-test for that.
You'd think wrong, then.
wulfyn wrote: The difficulty with a t-test is that the distribution should be normal, and that limits what you can look at.
Next time you want to feign understanding using google, add "central limit theorem" to your list of things to surf-and-skim, please.
-
- Emerging Star
- Posts: 346
- Joined: Mon May 19, 2014 9:33 pm
Re: Reconsidering the ruling on Khorne?
I don't need google to understand the central limit theorem, other convergence theorems, or indeed the basis of Gaussian distributions. If you take win/draw/lose results, even a lot of them, you are not going to get a normal distribution, ever. In the same way if you were to toss a coin or roll a die you would not get a normal distribution, ever.
If you were to take those results, put them into sets of 100, and then map the distribution of the sets then you would get a normal distribution. But then you are not mapping the results, but the sets of the results, which is not the same thing at all. This only works due to the underlying stochastic mechanisms that drive Gaussian distributions, and pretty much any distribution when grouped and aggregated would work this way. One of the (minor) aspects of my job is calculating model deviance by using the central limit theorem to show that the aggregate model error is normally distributed, and testing on that basis.
But please do explain to me how your win/draw/lose result would work in a t-test.
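The "sets of 100" point can be illustrated with a quick simulation. This is only a sketch under stated assumptions: the 40/20/40 win/draw/loss split is hypothetical, and scoring a draw as half a win is a convention chosen here, not something taken from real data. Individual results only ever take three values, but the means of batches of 100 cluster in a roughly bell-shaped way.

```python
import random
import statistics

SCORES = (1.0, 0.5, 0.0)    # win, draw, loss (draw scored as half a win)
WEIGHTS = (0.4, 0.2, 0.4)   # hypothetical outcome probabilities

def batch_means(n_batches, batch_size, seed=0):
    """Simulate games in batches and return the mean score of each batch."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_batches):
        games = rng.choices(SCORES, weights=WEIGHTS, k=batch_size)
        means.append(sum(games) / batch_size)
    return means

means = batch_means(n_batches=2000, batch_size=100)
centre = statistics.mean(means)
spread = statistics.stdev(means)
# For a normal distribution roughly 68% of values lie within one
# standard deviation of the mean.
within_one_sd = sum(abs(m - centre) <= spread for m in means) / len(means)
print(f"mean of batch means: {centre:.3f}")
print(f"sd of batch means:   {spread:.3f}")
print(f"share within 1 sd:   {within_one_sd:.2f}")
```

The raw results stay a three-valued distribution however many games are played; it is only the batch means that look normal, which is the distinction being argued over here.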
-
- Legend
- Posts: 5334
- Joined: Sun May 05, 2002 8:55 am
- Location: Copenhagen
- Contact:
Re: Reconsidering the ruling on Khorne?
Hi Dode and Mike,
Dode,
Quote: I read the text, but much of the table is presented as means. It is therefore descriptive data.
Actually, you said:
Quote: Your FUMBBL data is only means, which cannot be inferential.
So, I present inferential data, alongside some descriptive data - and that rubs off so the inferential data becomes descriptive? Is that how it works? Weird. How about in the table below that, where I present only the inferential data for clarity?
But anyway. Thanks for clearing this up. I stand corrected. In a good way. So I am using inferential data after all when examining the CRP data for anomalies. And I have been since NTBB2014. Actually what confused me was this post by Mike from 5 pages back:
Quote: What folks like plasmoid (and really the BB community in general) do is mix simple descriptives with distilled water, the blood of a virgin (likely themselves), and squirrel dandruff, then wave their hogwart's magic wand over it, and pretend the results are valid inferential statistics.
I figured he had spotted me using descriptive data as inferential data. That threw me. But obviously not then. As I said: Great.
Also, thanks for the suggestions generated over the past few pages. I will work them into the page 2 weeks from now, when my schedule clears. Lots of good stuff.
Now. Last Things:
Quote: I'll believe that when I see it!
I can't wait. As the shift to inferential stats in 2014 was what caused the majority of recent changes, I don't expect any more. Certainly not to CRP+. But I have no reason to change the rosters either.
Anyway, you bring up comparing apples and oranges two times.
I'm not. Because I'm not comparing anything. As I've said repeatedly.
Quote: What misunderstanding? You said it yourself: "I use CRP data (even though there is not enough to do inferential statistics) as my most accurate available way of identifying the problem teams."
I'll clarify yet again:
I rely on the inferential CRP data to best identify problem teams.
I don't want to say that I "justify" it with data, because I'm sure that "justify" implies actual proof. And I haven't done ANOVA, t-test or whatnot (which would be that 'comparison', BTW) - so I haven't got proof. Which I acknowledge.
So - I use (or whichever verb you, Dode, think does not carry a hidden and deceitful implication of proof) the data - in TV-bands and with mirror matches removed - to best identify the problem teams.
Then I apply CRP+ (which has no connection to that data, only to the BBRC and myself).
And I apply the NT Roster Tweaks - to the teams identified, but the specific iterations of the tweaks are not match-data driven.
And then we play

I'm not out to prove anything. And I'm not collecting/comparing match-data.
I'm having fun, and I'm suggesting that others who share my mind set could have fun with these rules too.
For comparison - but not in actuality - you might call this PBBL13, since none of the PBBL editions were match-data driven.
In the same way, none of the changes between NTBB editions were driven by CRP+ match-data.
They were driven by the response/feedback from the players. Not to make the tweaks more precisely within the tiers. But to identify what didn't work/wasn't fun.
In the majority of cases this was not about (as you suggest) 'what I liked'. Quite the opposite in fact. This was the "kill your darlings" process, that is tremendously helped by involving other people.
For example: My first shot at NTBB Halflings had Dryads. It wasn't that the addition of dryads didn't help the team. It was that everybody hated the idea. So they got cut and replaced with something else.
So, CRP data to locate problem teams.
Tweak the teams, apply CRP+, have fun.
I think I've gotten out of this conversation what I can for now. I'll get to the rewrite in a few weeks.
Cheers
Martin
Narrow Tier BB? http://www.plasmoids.dk/bbowl/NTBB.htm
Or just visit http://www.plasmoids.dk instead
-
- Super Star
- Posts: 779
- Joined: Tue Feb 12, 2013 4:05 pm
Re: Reconsidering the ruling on Khorne?
plasmoid wrote: In the end I went with AV8 - and still would over the other suggestions in this thread - because AV8 is the tweak that changes the human team's playing style the least. It doesn't change the catcher's role. He is basically the same.
Sure. Human coaches will continue to have MAX one catcher per team. For the one turn at turn 8/16.
-
- Ex-Cyanide/Focus toadie
- Posts: 2565
- Joined: Fri Jul 24, 2009 4:55 pm
- Location: Near Reading, UK
Re: Reconsidering the ruling on Khorne?
Quote: So, I present inferential data, alongside some descriptive data - and that rubs off so the inferential data becomes descriptive?
Sigh. No. There is data there that is only means, and that cannot be inferential. There is also some inferential data, but there is a lot of data which is only means.
Quote: So I am using inferential data after all when examining the CRP data for anomalies.
You are, but there are other sources of error which might well mean your data is out. The CI given is the smallest possible assuming a random sampling of games from the "population", whereas we know that it's not a random sample but a self-selecting one: people choose the race they play and some people play only that race or a large proportion of games from that race. I'm sure there are other influences I am missing, but there you are.
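For reference, the kind of interval under discussion can be sketched as below. This is an illustration only, with made-up counts and a hypothetical function name, and it assumes a draw is scored as half a win and that a normal approximation is acceptable. It relies on exactly the random-sampling assumption questioned above, so on self-selected data the true uncertainty is probably wider than what it reports.

```python
import math

def win_rate_ci(wins, draws, losses, z=1.96):
    """Approximate 95% CI for the mean game score (win=1, draw=0.5, loss=0),
    assuming games are an independent random sample."""
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    # Sample variance of the per-game scores around the mean.
    sum_sq = (wins * (1.0 - mean) ** 2
              + draws * (0.5 - mean) ** 2
              + losses * (0.0 - mean) ** 2)
    sd = math.sqrt(sum_sq / (n - 1))
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

# Hypothetical record, NOT real match data.
low, high = win_rate_ci(wins=400, draws=200, losses=400)
print(f"95% CI for win rate: {low:.3f} to {high:.3f}")
```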
Also, your samples are single samples over 24 races and 14 "bands" of TV. That's 336 individual samples to 95CI. There is a (very) large chance that at least one of those CIs does not contain the real mean for the population, but you don't mention that at all. The odds are about 2 in 3 that at least one of your 21 highlighted banded CIs does not contain the real mean for the population (0.95^21 = 0.34). We can't tell which, but you don't even mention it.
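The arithmetic behind that figure can be checked in a few lines. This sketch assumes the intervals are independent of one another, which is a simplification, and the function name is hypothetical.

```python
def prob_at_least_one_miss(n_intervals, coverage=0.95):
    """Chance that at least one of n independent CIs misses the true mean."""
    return 1.0 - coverage ** n_intervals

# 21 highlighted bands, 24 races, and all 336 race/TV-band cells.
for n in (21, 24, 336):
    print(f"{n:>3} intervals: {prob_at_least_one_miss(n):.3f}")
```

For 21 intervals this comes out at about 0.66, matching the "2 in 3" above; for all 336 cells a miss somewhere is close to certain.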
Quote: I rely on the inferential CRP data to best identify problem teams.
How, precisely, are you "identifying" these teams? Your page very strongly implies you are comparing their 95CI range to the tier range (as redefined by you). You label them as "problem" teams based on their 95CI range being outside the tier range. That's your justification, and it is a justification.
Quote: So, CRP data to locate problem teams. Tweak the teams, apply CRP+, have fun.
And how do you know the problems have been resolved in any way? You don't. And if you don't know that your tweaks have resolved the problem then you're not actually addressing the problem at all. It therefore makes no difference whatsoever what tweaks you are making or to which teams, since you're not addressing what you claim to be the problem in the first place. What's the point of identifying a problem if you're not addressing it? And if you think you are addressing it then how do you know that you are if you are not measuring again? It's utterly illogical. Nothing wrong with a test/adjust cycle, and nothing wrong with "I prefer it that way", but it's nonsense to test, adjust and then just say "screw it, I like it".
BTW, I took a look at the NAF data from the 2013/2014 stuff posted on the NAF website (match level games). Here you go:

24 samples, so at least one of them probably does not contain the real racial mean

Edit: clearer pic
Edit 2: corrected minor error in chart
- Digger Goreman
- Legend
- Posts: 5000
- Joined: Sun Jun 25, 2006 3:30 am
- Location: Atlanta, GA., USA: Recruiting the Walking Dead for the Blood Bowl Zombie Nation
- Contact:
Re: Reconsidering the ruling on Khorne?
*Sigh*, can I have my golems back at 100k now...? 

LRB6/Icepelt Edition: Ah!, when Blood Bowl made sense....
"1 in 36, my Nuffled arse!"
- Darkson
- Da Spammer
- Posts: 24047
- Joined: Mon Aug 12, 2002 9:04 pm
- Location: The frozen ruins of Felstad
- Contact:
Re: Reconsidering the ruling on Khorne?
Necro better than Skaven....
Currently an ex-Blood Bowl coach, most likely to be found dying to Armoured Skeletons in the frozen ruins of Felstad, or bleeding into the arena sands of Rome or burning rubber for Mars' entertainment.
-
- Ex-Cyanide/Focus toadie
- Posts: 2565
- Joined: Fri Jul 24, 2009 4:55 pm
- Location: Near Reading, UK
Re: Reconsidering the ruling on Khorne?
Darkson wrote: Necro better than Skaven....
No, can't actually say they are any different.
-
- Legend
- Posts: 5334
- Joined: Sun May 05, 2002 8:55 am
- Location: Copenhagen
- Contact:
Re: Reconsidering the ruling on Khorne?
Hi guys,
Babass - well, there's something I can respond to with the testing I'm doing. There are human teams in my current NTBB league. And I've played one myself. None of those had just 1 catcher. Nor did they only use them on short drives.
Digger - I know that you're unhappy with the Golem price. But what in those stats indicates to you that necro ought to be better?
Cheers
Martin
Narrow Tier BB? http://www.plasmoids.dk/bbowl/NTBB.htm
Or just visit http://www.plasmoids.dk instead
- Digger Goreman
- Legend
- Posts: 5000
- Joined: Sun Jun 25, 2006 3:30 am
- Location: Atlanta, GA., USA: Recruiting the Walking Dead for the Blood Bowl Zombie Nation
- Contact:
Re: Reconsidering the ruling on Khorne?
If nothing else is learned here, gentlemen, anything based on Plas's voodoo "statistics" is indefensible....
Citing spit, to prove spit, is just spit....
The only thing missing in this lesson is why no elfin one has ever asked for the ream of assumptions this so-called data is based on....
Now the optimist in me wants to think that Plasmoid had altruistic leanings.... The realist in me remembers a partisan BBRC.... The realist is ascendent here....
LRB6/Icepelt Edition: Ah!, when Blood Bowl made sense....
"1 in 36, my Nuffled arse!"
-
- Emerging Star
- Posts: 355
- Joined: Wed Sep 15, 2010 4:14 pm
Re: Reconsidering the ruling on Khorne?
Digger Goreman wrote: If nothing else is learned here, gentlemen, anything based on Plas's voodoo "statistics" is indefensible....
Citing spit, to prove spit, is just spit....
The only thing missing in this lesson is why no elfin one has ever asked for the ream of assumptions this so-called data is based on....
Now the optimist in me wants to think that Plasmoid had altruistic leanings.... The realist in me remembers a partisan BBRC.... The realist is ascendent here....
You know, sometimes it's better to stay off the keyboard and let people think you are an ass, rather than actually type something and provide them with proof.
-
- Emerging Star
- Posts: 335
- Joined: Fri Mar 25, 2005 5:26 pm
- Location: London, UK
Re: Reconsidering the ruling on Khorne?
dode74 wrote: BTW, I took a look at the NAF data from the 2013/2014 stuff posted on the NAF website (match level games). Here you go: 24 samples, so at least one of them probably does not contain the real racial mean
That looks suspiciously like low TV data. Is that any use to anyone but low TV/leagues & tournaments?
Nerfing woods would probably be a bad idea in a Box/MM type environment. Anything that discourages people from playing elves hurts diversity and is a bad move.
With more diversity people may well take tackle earlier which in turn might reduce the elves' effectiveness.
What do the stats tell us about that?