Reconsidering the ruling on Khorne?

Wulfyn · Post by **Wulfyn** » Mon Jul 13, 2015 7:29 pm

I think you'd be better with a chi-squared than a t-test for that.

dode74 · Post by **dode74** » Mon Jul 13, 2015 8:16 pm

Really? My understanding would be that a t test would be better for comparing two samples, i.e. seeing if there is a difference between CRP and CRP+ zons. Chi squared is better used for comparing measured with expected (i.e. what the hypothesis says) values rather than two sets of measured values. Could be wrong, ofc
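For illustration, a two-sample comparison of this kind can be sketched as below. This is a minimal sketch, not anyone's actual analysis: it assumes each game is scored as 1 / 0.5 / 0 points for a win / draw / loss, uses Welch's form of the t statistic, and the counts and the helper names `to_points` and `welch_t` are hypothetical.

```python
import math
from statistics import mean, variance


def to_points(wins, draws, losses):
    """Expand W/D/L counts into per-game points (assumed scoring: 1 / 0.5 / 0)."""
    return [1.0] * wins + [0.5] * draws + [0.0] * losses


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard errors of each sample mean (sample variance / n).
    se2_a = variance(sample_a) / na
    se2_b = variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df


# Hypothetical counts, for illustration only.
crp = to_points(40, 20, 40)
crp_plus = to_points(45, 20, 35)

t_stat, df = welch_t(crp_plus, crp)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

The statistic is then compared against a t distribution with `df` degrees of freedom; with samples this small, a difference of 0.05 points per game is well inside the noise.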

Wulfyn · Post by **Wulfyn** » Mon Jul 13, 2015 9:36 pm

Yeah, I think I'd go chi squared and use CRP as the expected and CRP+ as the observed. Your expected does not need to be theoretical, per se, you just need a base line. Then once you have made a change (such as +'ing the CRP) you can see if the new results match the expectation that nothing has changed (the null hypothesis). That way you can see if there is a statistical difference between the two samples. It's a much easier test as well.

The difficulty with a t-test is that the distribution should be normal, and that limits what you can look at. I've made a massive assumption here, thinking about it, that you'd mainly want to look at the win/draw/loss rate!
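The approach described above, with CRP as the baseline and CRP+ as the observed counts, can be sketched as follows. This is a minimal sketch with hypothetical counts and hypothetical helper names. It treats the baseline proportions as fixed, which ignores sampling error in the baseline itself. With three categories there are 2 degrees of freedom, and for that case the chi-squared p-value has the closed form exp(-x/2).

```python
import math


def chi_squared_gof(observed, baseline):
    """Pearson chi-squared of observed counts against baseline proportions."""
    n_obs = sum(observed)
    n_base = sum(baseline)
    # Expected counts: baseline proportions scaled to the observed sample size.
    expected = [n_obs * b / n_base for b in baseline]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1


def p_value_df2(stat):
    """Upper-tail p-value for a chi-squared statistic with exactly 2 df."""
    return math.exp(-stat / 2)


# Hypothetical win / draw / loss counts, for illustration only.
crp_baseline = [400, 200, 400]
crp_plus_observed = [45, 20, 35]

stat, df = chi_squared_gof(crp_plus_observed, crp_baseline)
p = p_value_df2(stat)
print(f"chi2 = {stat:.3f}, df = {df}, p = {p:.3f}")
```

A large p-value here only means the observed counts are consistent with the baseline proportions, not that the two rule sets are the same.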

VoodooMike · Post by **VoodooMike** » Tue Jul 14, 2015 2:33 am

wulfyn wrote: I think you'd be better with a chi-squared than a t-test for that.

You'd think wrong, then.

wulfyn wrote: The difficulty with a t-test is that the distribution should be normal, and that limits what you can look at.

Next time you want to feign understanding using google, add "central limit theorem" to your list of things to surf-and-skim, please.

Wulfyn · Post by **Wulfyn** » Tue Jul 14, 2015 6:27 am

I don't need google to understand the central limit theorem, other convergence theorems, or indeed the basis of Gaussian distributions. If you take win/draw/lose results, even a lot of them, you are not going to get a normal distribution, ever. In the same way if you were to toss a coin or roll a dice you would not get a normal distribution, ever.

If you were to take those results, put them into sets of 100, and then map the distribution of the sets then you would get a normal distribution. But then you are not mapping the results, but the sets of the results, which is not the same thing at all. This only works due to the underlying stochastic mechanisms that drive Gaussian distributions, and pretty much any distribution when grouped and aggregated would work this way. One of the (minor) aspects of my job is calculating model deviance by using the central limit theorem to show that the aggregate model error is normally distributed, and testing on that basis.
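The distinction drawn above, between individual results and the means of sets of results, can be illustrated with a small simulation. This is a minimal sketch: the outcome probabilities, the 1 / 0.5 / 0 scoring and the function name are all assumptions made for the example.

```python
import random
from statistics import mean, pstdev

POINTS = [1.0, 0.5, 0.0]    # win, draw, loss (assumed scoring)
WEIGHTS = [0.4, 0.2, 0.4]   # hypothetical outcome probabilities


def simulate_batch_means(n_batches, batch_size, seed=0):
    """Mean points per game for each of n_batches sets of batch_size games."""
    rng = random.Random(seed)
    return [
        mean(rng.choices(POINTS, weights=WEIGHTS, k=batch_size))
        for _ in range(n_batches)
    ]


# Individual results only ever take three values, so they are never normal.
# The means of sets of 100 results, however, cluster in a bell shape.
batch_means = simulate_batch_means(2000, 100)
centre = mean(batch_means)
spread = pstdev(batch_means)

# For a normal distribution about 95% of values lie within 1.96 SD of the mean.
within = sum(abs(m - centre) <= 1.96 * spread for m in batch_means) / len(batch_means)
print(f"centre = {centre:.3f}, spread = {spread:.4f}, within 1.96 SD = {within:.3f}")
```

Under these assumptions the spread of the set means should come out close to the single-game standard deviation divided by 10 (the square root of the set size).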

But please do explain to me how your win/draw/lose result would work in a t-test.

plasmoid · Post by **plasmoid** » Tue Jul 14, 2015 7:11 am

Hi Dode and Mike,

Dode,

I read the text, but much of the table is presented as means. It is therefore descriptive data.

Actually, you said:

Your FUMBBL data is only means, which cannot be inferential.

So, I present inferential data, alongside some descriptive data - and that rubs off so the inferential data becomes descriptive? Is that how it works? Weird. How about in the table below that when I present only the inferential data for clarity?

But anyway. Thanks for clearing this up. I stand corrected. In a good way. So I am using inferential data after all when examining the CRP data for anomalies. And I have been since NTBB2014. Actually what confused me was this post by Mike from 5 pages back:

What folks like plasmoid (and really the BB community in general) do is mix simple descriptives with distilled water, the blood of a virgin (likely themselves), and squirrel dandruff, then wave their hogwart's magic wand over it, and pretend the results are valid inferential statistics.

I figured he had spotted me using descriptive data as inferential data. That threw me. But obviously not then. As I said: Great.

Also, thanks for the suggestions generated over the past few pages. I will work them into the page 2 weeks from now, when my schedule clears. Lots of good stuff.

Now. Last things:

I'll believe that when I see it!

I can't wait. As the shift to inferential stats in 2014 was what caused the majority of recent changes, I don't expect any more. Certainly not to CRP+. But I have no reason to change the rosters either.

Anyway, you bring up comparing apples and oranges two times.
I'm not. Because I'm not comparing anything. As I've said repeatedly.

What misunderstanding? You said it yourself: "I use CRP data (even though there is not enough to do inferential statistics) as my most accurate available way of identifying the problem teams."

I'll clarify yet again:
I rely on the inferential CRP data to best identify problem teams.
I don't want to say that I "justify" it with data, because I'm sure that "justify" implies actual proof. And I haven't done ANOVA, t-test or whatnot (which would be that 'comparison', BTW) - so I haven't got proof. Which I acknowledge.
So - I use (or whichever verb you, Dode, think does not carry a hidden and deceitful implication of proof) the data - in TV-bands and with mirror matches removed - to best identify the problem teams.

Then I apply CRP+ (which has no connection to that data, only to the BBRC and myself).
And I apply the NT Roster Tweaks - to the teams identified, but the specific iterations of the tweaks are not match-data driven.

And then we play

I'm not out to prove anything. And I'm not collecting/comparing match-data.
I'm having fun, and I'm suggesting that others who share my mind set could have fun with these rules too.
For comparison - but not in actuality - you might call this PBBL13, since none of the PBBL editions were match-data driven.
In the same way, none of the changes between NTBB editions were driven by CRP+ match-data.
They were driven by the response/feedback from the players. Not to make the tweaks more precisely within the tiers. But to identify what didn't work/wasn't fun.
In the majority of cases this was not about (as you suggest) 'what I liked'. Quite the opposite in fact. This was the "kill your darlings" process, that is tremendously helped by involving other people.
For example: My first shot at NTBB Halflings had Dryads. It wasn't that the addition of dryads didn't help the team. It was that everybody hated the idea. So they got cut and replaced with something else.

So, CRP data to locate problem teams.
Tweak the teams, apply CRP+, have fun.

I think I've gotten out of this conversation what I can for now. I'll get to the rewrite in a few weeks.
Cheers
Martin

babass · Post by **babass** » Tue Jul 14, 2015 8:17 am

plasmoid wrote: In the end I went with AV8 - and still would over the other suggestions in this thread - because AV8 is the tweak that changes the human team's playing style the least. It doesn't change the catchers role. He is basically the same.

Sure. Human coaches will continue to have MAX one catcher per team. For the one turn at turn 8/16.

dode74 · Post by **dode74** » Tue Jul 14, 2015 1:06 pm

So, I present inferential data, alongside some descriptive data - and that rubs off so the inferential data becomes descriptive?

Sigh. No. There is data there that is only means, and that cannot be inferential. There is also some inferential data, but there is a lot of data which is only means.

So I am using inferential data after all when examining the CRP data for anomalies.

You are, but there are other sources of error which might well mean your data is out. The CI given is the smallest possible assuming a random sampling of games from the "population", whereas we know that it's not a random sample but a self-selecting one: people choose the race they play and some people play only that race or a large proportion of games from that race. I'm sure there are other influences I am missing, but there you are.
Also, your samples are single samples over 24 races and 14 "bands" of TV. That's 336 individual samples to 95CI. There is a (very) large chance that at least one of those CIs does not contain the real mean for the population, but you don't mention that at all. The odds are about 2 in 3 that at least one of your 21 highlighted banded CIs does not contain the real mean for the population (0.95^21 = 0.34). We can't tell which, but you don't even mention it.
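The arithmetic above can be checked with a short sketch. It assumes the intervals are independent, which may not hold for bands cut from the same data set; the function name is hypothetical.

```python
def prob_at_least_one_miss(n_intervals, coverage=0.95):
    """Chance that at least one of n independent CIs misses the true mean."""
    return 1.0 - coverage ** n_intervals


# 21 highlighted bands, 24 races, and the full 24 x 14 grid of samples.
for n in (21, 24, 24 * 14):
    print(n, round(prob_at_least_one_miss(n), 3))
```

For 21 intervals this gives roughly 0.66, i.e. the "about 2 in 3" quoted above. For all 336 intervals the chance of at least one miss is effectively certain.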

I rely on the inferential CRP data to best identify problem teams.

How, precisely, are you "identifying" these teams? Your page very strongly implies you are comparing their 95CI range to the tier range (as redefined by you). You label them as "problem" teams based on their 95CI range being outside the tier range. That's your justification, and it is a justification.

So, CRP data to locate problem teams.
Tweak the teams, apply CRP+, have fun.

And how do you know the problems have been resolved in any way? You don't. And if you don't know that your tweaks have resolved the problem then you're not actually addressing the problem at all. It therefore makes no difference whatsoever what tweaks you are making or to which teams, since you're not addressing what you claim to be the problem in the first place. What's the point of identifying a problem if you're not addressing it? And if you think you are addressing it then how do you know that you are if you are not measuring again? It's utterly illogical. Nothing wrong with a test/adjust cycle, and nothing wrong with "I prefer it that way", but it's nonsense to test, adjust and then just say "screw it, I like it".

BTW, I took a look at the NAF data from the 2013/2014 stuff posted on the NAF website (match level games). Here you go:

24 samples, so at least one of them probably does not contain the real racial mean

Edit: clearer pic
Edit 2: corrected minor error in chart

Digger Goreman · Post by **Digger Goreman** » Tue Jul 14, 2015 1:28 pm

*Sigh*, can I have my golems back at 100k now...?

Darkson · Post by **Darkson** » Tue Jul 14, 2015 3:59 pm

Necro better than Skaven....

dode74 · Post by **dode74** » Tue Jul 14, 2015 4:11 pm

Darkson wrote: Necro better than Skaven....

No, can't actually say they are any different.

plasmoid · Post by **plasmoid** » Tue Jul 14, 2015 10:41 pm

Hi guys,
Babass - well, there's something I can respond to with the testing I'm doing. There are human teams in my current NTBB league. And I've played one myself. None of those had just 1 catcher. Nor did they only use them on short drives.

Digger - I know that you're unhappy with the Golem price. But what in those stats indicates to you that necro ought to be better?

Cheers
Martin

Digger Goreman · Post by **Digger Goreman** » Tue Jul 14, 2015 10:57 pm

If nothing else is learned here, gentlemen, anything based on Plas's voodoo "statistics" is indefensible....

Citing spit, to prove spit, is just spit....

The only thing missing in this lesson is why no elfin one has ever asked for the ream of assumptions this so-called data is based on....

Now the optimist in me wants to think that Plasmoid had altruistic leanings.... The realist in me remembers a partisan BBRC.... The realist is ascendent here....

legowarrior · Post by **legowarrior** » Tue Jul 14, 2015 11:22 pm

Digger Goreman wrote: If nothing else is learned here, gentlemen, anything based on Plas's voodoo "statistics" is indefensible....

Citing spit, to prove spit, is just spit....

The only thing missing in this lesson is why no elfin one has ever asked for the ream of assumptions this so-called data is based on....

Now the optimist in me wants to think that Plasmoid had altruistic leanings.... The realist in me remembers a partisan BBRC.... The realist is ascendent here....

You know, sometimes it's better to stay off the keyboard and let people think you are an ass, rather than actually type something and provide them with proof.

koadah · Post by **koadah** » Tue Jul 14, 2015 11:35 pm

dode74 wrote: BTW, I took a look at the NAF data from the 2013/2014 stuff posted on the NAF website (match level games). Here you go:

24 samples, so at least one of them probably does not contain the real racial mean

That looks suspiciously like low TV data. Is that any use to anyone but low TV/leagues & tournaments?

Nerfing woods would probably be a bad idea in a Box/MM type environment. Anything that discourages people from playing elves hurts diversity and is a bad move.

With more diversity people may well take tackle earlier which in turn might reduce the elves' effectiveness.
What do the stats tell us about that?
