Talk Fantasy Football

Posted: **Mon Jan 21, 2013 2:45 pm**

I don't think that was a part of the scope, but if it is then you're right in that some form of data collection of that aspect is needed. Either way, any anecdotal evidence collection which has taken place so far has been purely self-selecting.

Posted: **Tue Jan 22, 2013 9:03 am**

plasmoid wrote:I know that we don't have sufficient hard data - but statistics isn't the only kind of data. The social sciences consider soft data quite valid, especially when combined with hard data. To wit, if I tell you something it's just an anecdote and subjective. But if lots of people tell you the same thing, then it becomes something more.

This is completely untrue. What you're calling "soft data" is not valid for any of the purposes you're attempting to use it for: anecdote and case study are simply general exploration so that people can get ideas on where to perform serious study... they are never used in final conclusions in any way. If lots of people tell you the same thing then all you have is a suggestion as to where you can start looking at the real numbers - the opinions of people do not represent serious data unless what you're studying is people's opinions.

plasmoid wrote:So I do think discussion and experience has a place in this.

Discussion and experience is what created the thing you're trying to fix.

plasmoid wrote:Box data would certainly be both TV-matched and from a skewed environment.

If your intention is to focus on short-term play then there's no problem with the data, especially if you, say, prune the dataset to only include teams with <x> or fewer games, where <x> is the number of games you expect in a single season (if a single season is your goal). The vast majority of teams in FUMBBL's box, or FOL, play 10 or fewer games. Additionally, the inducements are not meant to even out the success chance of the two teams when they're not TV matched, which means you'd have increasingly confounded data as you introduced larger TV differences, if you were trying to examine the actual team composition differences themselves.
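A minimal sketch of the pruning described above, for anyone who wants to try it on an export. The `Match` record layout, the field names, and the `max_tv_gap` cutoff are assumptions made for illustration; real FUMBBL or FOL data will be shaped differently.

```python
from typing import NamedTuple


class Match(NamedTuple):
    # Hypothetical record layout, not the format of any real export.
    team_a_games: int  # games team A has played so far
    team_b_games: int  # games team B has played so far
    team_a_tv: int
    team_b_tv: int


def prune_for_short_term(matches, max_games, max_tv_gap):
    """Keep matches where both teams are within the season length and the
    TV gap is small enough that inducements should not confound the result."""
    return [
        m for m in matches
        if m.team_a_games <= max_games
        and m.team_b_games <= max_games
        and abs(m.team_a_tv - m.team_b_tv) <= max_tv_gap
    ]


# Made-up rows, purely to show the filter in action.
sample = [
    Match(3, 5, 1100, 1100),   # kept: both teams young, TV matched
    Match(3, 42, 1100, 1120),  # dropped: team B is long-lived
    Match(8, 9, 1200, 1500),   # dropped: large TV gap
]
kept = prune_for_short_term(sample, max_games=10, max_tv_gap=50)
```

Here `max_games` plays the role of <x>, the number of games expected in a single season.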

dode74 wrote:This isn't the social sciences though: it's a game of probabilities using dice. Lots of people telling you the same thing is just lots of people who are able to tell you about it remembering that thing (perceptual bias) - it says nothing about what the general population thinks of that matter (selection bias).

This would actually fall under social sciences too - we're not looking at the strict dice outcomes, we're looking at the match outcomes based on a combination of human ability, game mechanics, and randomness and seeing if there are patterns... then attempting to map those perceived patterns to an actual cause. Easy to control for randomness... not so easy to control for human ability since nobody will agree to a means of measuring it (and thus having a way to control for it) so there's always going to be muddy waters when you try to use the results to look at small changes to the mechanics.

spubbbba wrote:Well it depends if the changes to NTBBL are concerned solely with balance or are also trying to make teams more enjoyable and interesting to play with and against. For that you do need to gather anecdotal evidence.

People's opinions are not very good data, because people's opinions are not some firm position they will stand by in the long run. People's first reaction to change is usually negative, and often based on no real experience with whatever the opinion is about. So, to get usable "opinions" you ask them a bunch of indirect questions, rather than "did you like it better than the old way?" which would be like asking men "If you had one complaint about our brand of condom, what would it be?" and expecting something other than "they're too small for me".

spubbbba wrote:In my eyes that’s why zons need changing more than any other team. Overall their win rate is about right, but they are easily one of the top teams at 100 TV, unless they are facing dwarfs. Whilst they get steadily worse at higher TV. The blandness of the roster really limits them and they don’t even have a differentiation of stats like other teams do.

Amazons are a poorly designed team... well, most BB teams are poorly designed, but they're one of the worst in so much as they have fast and easy access to two of the best randomness reducing skills very early in play.

dode74 wrote:I don't think that was a part of the scope, but if it is then you're right in that some form of data collection of that aspect is needed.

At the end of the day it'll be shooting into the dark, and using gut feelings, regardless. You aren't going to be able to tease out which specific game mechanics are responsible for which trends in the data, so someone is just going to say "well I think it's <x>" and change <x>, and then you try to get people to play with that change until you have enough data to say "nope, wasn't that" or whatever... in this case, you're going to change 30 things, and then either it'll all miraculously work, or it won't, and the game of grafting more toes to a crippled monkey will continue.

It's almost like an extended comedy sketch.

Posted: **Tue Jan 22, 2013 6:18 pm**

VoodooMike has had some bad press of late, but I have to say I'm a fan

Posted: **Wed Jan 23, 2013 2:43 pm**

Shteve0 wrote:VoodooMike has had some bad press of late, but I have to say I'm a fan

Maybe you could get him to do a personal flame for you to use as your signature.

IMO he seems to have gone a little soft.

Posted: **Thu Jan 24, 2013 10:32 pm**

Hi all,
primarily VoodooMike and Dode,

VoodooMike said:

and the game of grafting more toes to a crippled monkey will continue

I suppose I should start by saying that what I'm doing is catering to those that do not think BB is like a crippled monkey. I believe I've seen you writing some pretty harsh words regarding BB as a game on the Cyanide forum recently. But I like it. So do others - (obviously). I completely agree that if someone were to redesign BB, NTBB wouldn't be the way to do it.

This would actually fall under social sciences too - we're not looking at the strict dice outcomes, we're looking at the match outcomes based on a combination of human ability, game mechanics, and randomness and seeing if there are patterns... then attempting to map those perceived patterns to an actual cause. Easy to control for randomness... not so easy to control for human ability since nobody will agree to a means of measuring it (and thus having a way to control for it) so there's always going to be muddy waters when you try to use the results to look at small changes to the mechanics.

Right. I shouldn't have brought up the social sciences.
The few scraps I picked up about it at uni were clearly too foggy.
But VoodooMike brings up a point that I wanted to make:
Playtesting isn't hard science. It isn't just done with massive amounts of data.
The few people I know who have been involved with playtesting projects have been asked for comments/feedback - and I don't think any board game company would do enough playtesting to get results that would be statistically significant.

I also don't think you need to be able to isolate the effect of every single mechanic to gauge an effect as general as what I'm trying to do here.

Say you think Orcs are overpowered. You could take off a powerskill (block on their blitzers?) and stick on any inferior skill (sure hands). As long as you only push in one direction, and don't get caught up in balancing countermeasures, I believe you'll know the direction you pushed the team power in - though not the exact distance. Sure, sure hands isn't a bad skill to have sometimes, but going blockless and having sure hands redundancy would be a weakening of the team. I don't think that is controversial. And I think you have a lot of leeway for nerfing teams like orcs without risking suddenly and miraculously pushing them all the way through the bottom of tier 1.

So, what I'm trying to do is basic playtesting, with the aid of some statistics that can serve as a guideline, without ever being truly statistically significant.
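For a rough sense of scale on "statistically significant", here is a sketch of the textbook normal-approximation sample size for a one-sample proportion test. It is an illustration under loud assumptions: every game is treated as win or not-win (draws are ignored), the 50% baseline and the 55% and 60% "true" win rates are made-up numbers, and the 5% significance level and 80% power are just the conventional defaults.

```python
import math
from statistics import NormalDist


def games_needed(p_null, p_true, alpha=0.05, power=0.80):
    """Approximate number of games needed for a two-sided one-sample
    proportion test to detect a true win rate p_true against p_null."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    spread = (z_alpha * math.sqrt(p_null * (1 - p_null))
              + z_power * math.sqrt(p_true * (1 - p_true)))
    return math.ceil((spread / (p_true - p_null)) ** 2)


# Illustrative values only: a modest edge and a larger one over a 50% baseline.
n_small_edge = games_needed(0.50, 0.55)
n_large_edge = games_needed(0.50, 0.60)
```

Under these assumptions the smaller edge needs several hundred games for a single roster and the larger edge needs far fewer, which is roughly why a handful of playtest games can only serve as a guideline.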

Discussion and experience is what created the thing you're trying to fix.

To some extent yes.
But in the vault process there were a lot of changes being made - way more than NTBB - and discussion had to be done on limited background. I believe we know a heck of a lot more about CRP by now. And NTBB does not start from scratch, but builds on the CRP rules. So I think we have better foundation on which to make decisions.

Moving on, I also asked about data sources.
FOL sounds interesting. I don't know much about B, but I got the impression that developed teams could join B at a later point in their development. Is this right? (Honest question!)

Ideally I'd love data on lots of games, where each team has played no more than, say, 10 games, and they aren't more than 5 games apart. I think that would reasonably mimic a starting league scenario. The reason I'm interested in these stats is that my tier0/überteam nerfs are based on the assumption that even though we have (vague) stats putting the überteams in tier1 over their lifetime, they seem to be starting stronger and finishing weaker - making them "above tier 1" at the start of their existence. That would be interesting to check.

Cheers
Martin

Posted: **Fri Jan 25, 2013 2:59 am**

plasmoid wrote: I don't know much about B, but I got the impression that developed teams could join B at a later point in their development. Is this right? (Honest question)

You can't join Black Box with an already developed team, every Black Box team is created from scratch with 1,000,000 GP as normal, then it's paired by same/the closest TV possible with another Black Box team.
About FOL: since Cyanide's client is still missing Chaos Pact, Slann and many Star Players, it's not a proper league to draw statistical conclusions from; it's flawed.

Posted: **Fri Jan 25, 2013 3:01 am**

plasmoid wrote:I suppose I should start by saying that what I'm doing is catering to those that do not think BB is like a crippled monkey. I believe I've seen you writing some pretty harsh words regarding BB as a game on the Cyanide forum recently. But I like it. So do others - (obviously). I completely agree that if someone were to redesign BB, NTBB wouldn't be the way to do it.

You (obviously) like the game so much that you have a long list of changes you've been proselytizing since Jesus stepped off the Mayflower? Can you explain the difference between that and redesigning in 200 words or less without ever using the terms "spirit" or "essence" while making vague hand gestures?

Blood Bowl is not balanced around anything serious, and we know that from every source of stats we've ever looked at. You end up with two rational options - you either say "eh, who cares?" and you just play it for kicks, or you say "we should aim for better balance" and actually do that. NTBB is neither of those - you've got no support for the idea that it provides better balance other than you kind'v feel like it does, and a vote at the local legion, after a few beers, said "could be!" according to 7/10 drunken veterans.. but you're not content to just play the game as is for kicks or there'd be no need to change anything at all. So... yeah, I don't know what the hell you're doing.

plasmoid wrote:Playtesting isn't hard science. It isn't just done with massive amounts of data.
The few people I know who have been involved with playtesting projects have been asked for comments/feedback - and I don't think any board game company would do enough playtesting to get results that would be statistically significant.

Right, which is why you math it out at design time so you don't need playtesting to determine if things are balanced - playtesting determines if people enjoy the game and can figure out how to play it, and maybe finds any major errors you made during design. Trying to balance things using playtesting data is insane. Consider the following two space programs:

1) Engineers use math and physics to design a rocket, run simulations, and then finally build a prototype and launch it.
2) Jeb and Zeke from the junkyard build rocket after rocket, seeing if this next one is the one that doesn't explode on the launchpad.

While I have no doubt there are plenty of people who take the second approach, they're basically throwing crap at the wall and seeing what sticks. In terms of game design, they're crap designers even if they eventually find something that works. They're an infinite number of toe-grafted crippled monkeys on an infinite number of typewriters hoping to eke out some Shakespeare.

plasmoid wrote:I also don't think you need to be able to isolate the effect of every single mechanic to gauge an effect as general as what I'm trying to do here.

No, but you need to be able to point to the effect of a mechanic to say you're justified in altering it, one would think, especially if you have a stated goal... which you do by the title of your project alone. You're making changes, but saying you don't need to show that those changes individually contribute to a specified goal... just that collectively they do. That boils down to saying, whenever anyone questions your logic "shh, just let daddy drive the bus" and if anyone asks how you came to the conclusion that it would move it in that direction saying "I used the force".

plasmoid wrote:Say you think Orcs are overpowered. You could take off a powerskill (block on their blitzers?) and stick on any inferior skill (sure hands). As long as you only push in one direction, and don't get caught up in balancing countermeasures, I believe you'll know the direction you pushed the team power in - though not the exact distance. Sure, sure hands isn't a bad skill to have sometimes, but going blockless and having sure hands redundancy would be a weakening of the team. I don't think that is controversial. And I think you have a lot of leeway for nerfing teams like orcs without risking suddenly and miraculously pushing them all the way through the bottom of tier 1.

This is how the game got to where it is already, plasmoid - using how you FEEL things are rather than using actual data. At the end of the day you're basically betting on having a stronger connection to The Force than, say, Galak, since all of you are basing things on gut feeling and past experience as being the primary source of "data". What effect does block have on the w/l/d numbers for a roster? You FEEL it has a positive effect so you will compensate by replacing it with something you FEEL will have less positive effect. You don't think that's controversial and within a community that doesn't know any better it might not be... but it's a bad way to go about designing and redesigning things.

As for the tiers.. well, they're mighty arbitrary. You do have a lot of room there... plus you can always subdivide them further to make it seem like you meant for things to be the way they turn out! If orcs end up being Tier 1.725 or Tier 0.934 then that's ok... design decision.

plasmoid wrote:So, what I'm trying to do is basic playtesting, with the aid of some statistics that can serve as a guideline, without ever being truly statistically significant.

Heh, you use the numbers as a vague guide for your specific gut feelings... Your method is the exact opposite of how things are supposed to go. That's why I may seem somewhat critical of the process.

plasmoid wrote:But in the vault process there were a lot of changes being made - way more than NTBB - and discussion had to be done on limited background. I believe we know a heck of a lot more about CRP by now. And NTBB does not start from scratch, but builds on the CRP rules. So I think we have better foundation on which to make decisions.

You may well have a future in politics.

plasmoid wrote:FOL sounds interesting. I don't know much about B, but I got the impression that developed teams could join B at a later point in their development. Is this right? (Honest question!)

I was under the impression it was the other way around - that people can join FOL with a team they've developed outside of it. At the end of the day it doesn't much matter - you're asking for very specific data, while stating, flat out, that you don't care what the data SAYS to begin with. Just use a Twister spinner and run from there - you're going to base your changes on your gut anyway, right?

plasmoid wrote:Ideally I'd love data on lots of games, where each team has played no more than, say, 10 games, and they aren't more than 5 games apart. I think that would reasonably mimic a starting league scenario. The reason I'm interested in these stats is that my tier0/überteam nerfs are based on the assumption that even though we have (vague) stats putting the überteams in tier1 over their lifetime, they seem to be starting stronger and finishing weaker - making them "above tier 1" at the start of their existence. That would be interesting to check.

MM data works just fine to show you that that is the case. Again, I point out that if your intention is to apply any sort of balance between rosters, you want them to always be playing at even TV so that you aren't confounded by the effects of inducements... and that's what you're going to end up with in MM scenarios. It also doesn't matter how many games the team has played - in fact, the more the better, as you're more likely to see what CAN be done with a roster rather than just what someone has managed to do with their current luck. To try to limit your data to scenarios in which less might have happened is to simply hope that randomness hides any effects that might not match what you're hoping to see.

Teams that have a crapton of games under their belt don't represent "skewed data", they represent what a team of their TV could be if they'd gotten the rolls they wanted, which is always a possibility. That's what you want to use as data, moreso than something that got crappy rolls and isn't living up to the roster's potential at that TV. That way you end up with a game in which one team isn't going to roll lucky and destroy the entire league as a result, solely because you refused to look at what could be and instead on what is simply more likely to be.

But I still think your method for deciding what changes to make is extremely sketchy.

Posted: **Fri Jan 25, 2013 3:22 am**

VoodooMike wrote: I was under the impression it was the other way around - that people can join FOL with a team they've developed outside of it. At the end of the day it doesn't much matter - you're asking for very specific data, while stating, flat out, that you don't care what the data SAYS to begin with. Just use a Twister spinner and run from there - you're going to base your changes on your gut anyway, right?

Just to clarify: teams joining FOL must be fresh, you can't join it with already developed teams.
From FOL thread (http://forum.bloodbowl-game.com/viewtopic.php?f=5&t=61):
"How do I apply?
After making your fresh team you should use the league finder to search".

Posted: **Fri Jan 25, 2013 5:02 am**

VM:

You mays as well leave the thread.

1) Previous data means jack all. Sure, I have 10,000 game data set with orcs (example!), but how does that help me know the effect of the change? All you can do is determine what might be a candidate for nerfs/boosts, and while data is king in this....because it's black and white (nerf or no), there's a very good chance that the gut feeling is right.

2) This is a game people play in their spare time, and we are discussing minor variations to the ruleset that maybe 100 people total will ever read. This is as far from NASA and rocket science as I can possibly imagine. There's no way to collect the data without writing complete AIs to play the games so we can attempt to gather information. I mean, it bugs me the way Plasmoid tests the rosters (as, without equal strength players, the 'feedback' will be heavily skewed), but I can't see a better way.

3) I'm with Plasmoid on setting an upper limit to the games played. This is because, teams don't just exist in a happy "I get better every game!" loop. Finger in the air, every 30-40 games, you need a 10+ game rebuild. This is as far from scientific as you get...just my observations and the numbers differ from a player...but a team's TV usually emulates a sine wave. By limiting the amount of games they've played, you reduce the chances of those teams having gotten unlucky and got battered early. I would be willing to bet that over 100 games old, there is no correlation between number of games and TV of the team.
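The bet in point 3 is checkable if someone has per-team data. A minimal sketch, assuming a list of hypothetical `(games_played, tv)` pairs, one per team; the function names and the data shape are made up for illustration. It computes a plain Pearson correlation over teams with more than 100 games.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


def games_vs_tv_correlation(teams, min_games=100):
    """Correlate games played with TV, using only teams older than min_games.

    `teams` is an iterable of (games_played, tv) pairs (hypothetical shape).
    """
    old_teams = [(games, tv) for games, tv in teams if games > min_games]
    return pearson_r([g for g, _ in old_teams], [tv for _, tv in old_teams])
```

A value near zero on real data would support the bet; a single coefficient says nothing about the sine-wave shape itself, only about whether there is a linear trend.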

I know, I know, you want Bloodbowl to be a game designed off data, and I think we'd all like to agree with you...but where do you get the data for the changes, and what would you do to be able to draw conclusions from it?

Posted: **Fri Jan 25, 2013 5:12 am**

I have to say, the whole notion of "tiers" in Blood Bowl is predicated on the understanding that teams operate at specific win percentages; any effort to narrow these tiers must surely be based on these percentages, for which a degree of statistical backing is pretty essential. Otherwise how is NTBB "narrow tier" at all? To me that suggests that the tiers themselves (as opposed to the gaps between the tiers, which is the opposite) are narrow, and I'm sorry if I'm taking your mission statement too literally, but this current chopping and changing, based as much on dismissing the dataset as on using the data itself, seems like a scandalously haphazard way of trying to achieve that.

Posted: **Fri Jan 25, 2013 2:43 pm**

Hitonagashi wrote: 2) This is a game people play in their spare time, and we are discussing minor variations to the ruleset that maybe 100 people total will ever read. This is as far from NASA and rocket science as I can possibly imagine. There's no way to collect the data without writing complete AIs to play the games so we can attempt to gather information. I mean, it bugs me the way Plasmoid tests the rosters (as, without equal strength players, the 'feedback' will be heavily skewed), but I can't see a better way.

Get working on that AI Hito!

Hitonagashi wrote: 3) I'm with Plasmoid on setting an upper limit to the games played. This is because, teams don't just exist in a happy "I get better every game!" loop. Finger in the air, every 30-40 games, you need a 10+ game rebuild. This is as far from scientific as you get...just my observations and the numbers differ from a player...but a team's TV usually emulates a sine wave. By limiting the amount of games they've played, you reduce the chances of those teams having gotten unlucky and got battered early. I would be willing to bet that over 100 games old, there is no correlation between number of games and TV of the team.

But for the most part it's teams over 30 games that people are actually complaining most bitterly about.

So yeah, we need that AI. We don't want to suddenly find that after 60 games halflings are unbeatable.

Posted: **Fri Jan 25, 2013 3:22 pm**

..Completely lost the plot on all the calculator throwing and arguments about where to put each other's decimal point and went and read the quick NTBB pdf.....feels ok.

So...for the Fling-Fest 2014, we'll try the NTBB Flings for a giggle...coz that's what BB is..a giggle.

Right, I'll hand you back to the maths-match...

Carry On!

Posted: **Sat Jan 26, 2013 5:19 pm**

Hitonagashi wrote:You mays as well leave the thread.

You mays [sic] as well bite me, but neither of us is going to follow the other's advice.

Hitonagashi wrote:Previous data means jack all. Sure, I have 10,000 game data set with orcs (example!), but how does that help me know the effect of the change? All you can do is determine what might be a candidate for nerfs/boosts, and while data is king in this....because it's black and white (nerf or no), there's a very good chance that the gut feeling is right.

You use the data to find a way to look at the effect of whatever you're changing, in past games. That lets you know what effect is likely to occur if you make the change in question. You're suggesting that there's no way to know what effect a change will have without making a change and then running stats on a large amount of data collected AFTER the change, and we know from... well... the WORLD... that that's not the case. Gut feelings are a grab bag of right and wrong... like coin tosses.

Hitonagashi wrote:This is a game people play in their spare time, and we are discussing minor variations to the ruleset that maybe 100 people total will ever read. This is as far from NASA and rocket science as I can possibly imagine. There's no way to collect the data without writing complete AIs to play the games so we can attempt to gather information. I mean, it bugs me the way Plasmoid tests the rosters (as, without equal strength players, the 'feedback' will be heavily skewed), but I can't see a better way.

As I said, I'm not sure what he's getting at. He's not leaving it alone, he's not changing it based on a superior mechanical principle... he's just tossing patches on top of something that is a mess of patches. Has each version of the game's rules made things.. better? I guess it's a matter of opinion... to me it seems like each version makes some things better, and some things worse, which is consistent with gut feeling patching. What will NTBB do? Make some things better, some things worse, likely. This is what I find so silly... people just keep moving the problems around.

You don't need an AI to fix anything... that's predicated on the idea that you can't do anything but make changes and then test them. AI also has no imagination - it's human players who will find flaws and combinations and use them.. and learn from other people's successes.

Hitonagashi wrote:I'm with Plasmoid on setting an upper limit to the games played. This is because, teams don't just exist in a happy "I get better every game!" loop. Finger in the air, every 30-40 games, you need a 10+ game rebuild. This is as far from scientific as you get...just my observations and the numbers differ from a player...but a team's TV usually emulates a sine wave. By limiting the amount of games they've played, you reduce the chances of those teams having gotten unlucky and got battered early. I would be willing to bet that over 100 games old, there is no correlation between number of games and TV of the team.

Totally irrelevant. Any team of the same TV could have arrived there in the lowest number of games that any of those teams have, in the form it is. While some are far less likely than others, the fact remains - they could all BE part of that group - the games played is irrelevant. By removing teams based on number of games played, despite them being valid data for that particular use, you're simply increasing the amount of error and reducing the power of the model. What's the point of that?

Hitonagashi wrote:I know, I know, you want Bloodbowl to be a game designed off data, and I think we'd all like to agree with you...but where do you get the data for the changes, and what would you do to be able to draw conclusions from it?

I'd rather something more profound than that, but yes, at the very least I'd like to see changes based on actual data rather than gut feelings and random changes. I don't think everyone agrees with me on the data thing - there are plenty of people who dismiss the concept of data and statistics in favour of anecdotal feelings from players. That's peachy if you're just trying to pander with your changes, but they won't bear out in the long run.

You have a massive source of data, and it could be expanded if the FUMBBL folks and/or Cyanide would increase the amount of easily (well, reasonably) available information. How do you turn that into "what should we change"? Without getting too technical, there are routes of investigation using the data, and there are logical conclusions you can come to based on how the game works (for example, if you're looking at how the win% changes as TV changes, you're almost certainly looking at skill access since not much else affects TV and influences win%, universally) which, in turn, gives you an idea of where to start looking with your statistical models, and so on.
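As one concrete starting point for the "win% as TV changes" route, here is a minimal sketch that buckets results into TV bands per roster. The `(roster, tv, outcome)` row shape, the band width, and counting a draw as half a win are all assumptions made for illustration, and the sample rows are invented, not real FUMBBL or Cyanide data.

```python
from collections import defaultdict

# Points awarded per outcome; treating a draw as half a win is an assumption.
POINTS = {"W": 1.0, "D": 0.5, "L": 0.0}


def win_rate_by_tv_band(results, band_width=100):
    """Return {(roster, band_start): win_rate} for rows of
    (roster, tv, outcome), with outcome one of "W", "D", "L".

    band_width is in whatever unit the TV column uses.
    """
    points = defaultdict(float)
    games = defaultdict(int)
    for roster, tv, outcome in results:
        band_start = (tv // band_width) * band_width
        key = (roster, band_start)
        games[key] += 1
        points[key] += POINTS[outcome]
    return {key: points[key] / games[key] for key in games}


# Invented rows, only to show the shape of the output.
rows = [
    ("Amazon", 1000, "W"),
    ("Amazon", 1040, "D"),
    ("Amazon", 1620, "D"),
    ("Amazon", 1650, "L"),
]
rates = win_rate_by_tv_band(rows)
```

A table like this only shows where a roster's win% moves with TV. Working out which skills drive the movement is the follow-up modelling step described above.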

The more important question sort'v becomes: should you make changes when you don't have anything but a gut feeling to guide your hand? Obviously opinions vary on that... my opinion is "no".

Posted: **Sat Jan 26, 2013 5:42 pm**

VoodooMike wrote:for example, if you're looking at how the win% changes as TV changes, you're almost certainly looking at skill access since not much else affects TV and influences win%, universally) which, in turn, gives you an idea of where to start looking with your statistical models, and so on.

I agree. Bad pricing of the various players (Orcs cheaper than Humans, for example), wrong relative costs of the skills (e.g. TV 1000 Amazons vs TV 1000 Dwarfs) and bad calculation of skill stacks (e.g. clawpomb) are other flaws.

Posted: **Sat Jan 26, 2013 7:58 pm**

Thing is Mike I don't know how many people actually have more faith in all this maths than the 'gut feeling suck it and see'.

Thinking ahead: NTBB 2013
