More Yomi Data

mastrblastr · February 4, 2019, 4:36pm

It does show Troq’s dominance over Zane, but it hides Oni’s dominance over Zane because there isn’t a correlating 20-5 stat for that.

vengefulpickle · February 4, 2019, 5:11pm

Ah, I see. Yes, there’s not much I can do if no one actually played out any particular character at the highest levels.

charnel_mouse · February 4, 2019, 5:39pm

Maybe it would be useful to give the most even matchups between particular player/character pairings, as well as just characters?

Having little data for certain player/character pairings is something difficult to deal with. One pipe dream I have for the Codex model, which might be easier for Yomi, is to get willing players to play matchups the model is highly uncertain about, so as to efficiently get more information. E.g. for any two players you find the characters that the model thinks should give a match as close to 5-5 as possible, and see what happens.
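A minimal sketch of the "closest to 5-5" search, assuming the win chance for a pairing is a logistic function of a matchup term plus the difference in the two players' character skill terms. The function names and every number below are made up for illustration; they are not fitted values. It also only uses point estimates, so it ignores the model's uncertainty, which is the other half of the idea.

```python
import math
from itertools import product


def win_prob(logit):
    """Convert a logit into a win probability for player 1."""
    return 1.0 / (1.0 + math.exp(-logit))


def most_even_pairing(mu, skill1, skill2):
    """Return (char1, char2, p) for the pairing whose p is closest to 0.5.

    mu[(c1, c2)] is the matchup logit for c1 against c2; skill1 and skill2
    map each player's characters to that player's skill term.
    """
    best = None
    for c1, c2 in product(skill1, skill2):
        if c1 == c2:
            continue  # skip mirror matches
        p = win_prob(mu[(c1, c2)] + skill1[c1] - skill2[c2])
        if best is None or abs(p - 0.5) < abs(best[2] - 0.5):
            best = (c1, c2, p)
    return best


# Hypothetical numbers, for illustration only.
mu = {("Troq", "Zane"): 0.4, ("Oni", "Zane"): 0.8, ("Oni", "Troq"): 0.2}
player_a = {"Troq": -0.1, "Oni": -0.9}
player_b = {"Zane": -0.3, "Troq": -0.6}
print(most_even_pairing(mu, player_a, player_b))
```

With posterior samples instead of point estimates, the same loop could rank pairings by the spread of p rather than by its distance from 0.5.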

vengefulpickle · February 4, 2019, 5:45pm

Do you mean for something like a tournament setting? Or are you thinking more back in history?

Oddly, it seems like the model is fairly certain about matchups (like Oni-Zane) where the experts are fairly certain that it’s wrong… One possible thought there would be to figure out if there’s a way to cause it to weight games played by highly skilled players (either highly skilled at the time of the match, or globally highly skilled in the character) higher. I’m not sure how to set that up in a model, though.

charnel_mouse · February 4, 2019, 5:59pm

Yes, but is it? It seems pretty certain about Troq’s general rating, but how certain is it in a specific match, with specific players? Pretty certain if it’s Deluks playing him, but maybe not if he’s played by someone else. I’m talking about exploring matchups with high uncertainty given the two players in the match. For example, from the skill charts I’d expect Hobusu or vengefulpickle (Troq) vs. thehug0naut (Zane) to have a higher degree of uncertainty, so seeing that matchup a few times could help to disentangle Troq’s rating from Deluks’s skill with the character.

ArthurWynne · February 4, 2019, 6:05pm

Is the data set even large enough that we can reasonably expect the model to make useful predictions when we drill down to this ultra-specific level of “player A with character X vs player B with character Y”?

vengefulpickle · February 4, 2019, 6:05pm

The problem there, I think, is that what actually matters in a matchup-chart is character ability at a high level of play. So, having me or Hobusu play wouldn’t actually inform much, since we’re both duffers.

I wonder whether adding a scaling factor to the overall prediction, based on each player’s Elo and/or character skill, so that the model predicts lower-skilled players as closer to 50-50, would in turn force the model to be more uncertain about the data it’s getting from those low-skill matches. Basically, I don’t know what levers I have towards making the model more or less certain about particular types of games.

charnel_mouse · February 4, 2019, 6:11pm

Ah, that’s fair. This is a guess, but I wonder if limiting the player skill levels per character with a maximum would help. So, character matchup information is for theoretical optimal play, and all the player skill levels are negative and predict how far below optimal play they are. Not sure offhand what sort of prior you’d want on that, though. In that case, I’m not sure.

vengefulpickle · February 4, 2019, 8:08pm

Huh… That’s an interesting idea, making all the skills deficits from optimal.

If the MU numbers in the model are the optimal, then we can model players with both overall skill (Elo) and per-character skill gaps (as negative values). We might also be able to weight observations based on the overall player skills, or something, so that we count high-skilled matches more… but I’m not sure if that would help or not.

I think the thing to do is to just write up the models and I’ll try them out, and see what’s most predictive, and what the resulting MU charts look like…

Niijima-san · February 4, 2019, 11:05pm

Keep trying models until you finally find one that shows Ven beats Arg. Then you’ll know you’re getting close.

vengefulpickle · February 4, 2019, 11:49pm

thehug0naut · February 5, 2019, 12:43am

It’s really interesting looking at this data! Thanks @vengefulpickle for having another stab at modelling this stuff. I’ve got some observations I thought I’d document in case they help refine the model…

I was looking through my own data, it correctly suggests that my top characters are with the rest seeming too variable to reliably discuss. However, I would have said that I had quite a high skill level with Rook when I mained him, whereas this data shows me as barely above a mean of 0 skill with him.

I then thought to look for anything similar in other players. The one that caught my eye was @Fluffiness. Given their rep as almost certainly the strongest player in Yomi, I was surprised to see their mean skill for are ranked higher than for perse.

Could these be examples of character matchup weaknesses influencing player skill?

mastrblastr · February 5, 2019, 1:36am

The princess MU chart has always had ven 5.5 vs arg!

Niijima-san · February 5, 2019, 1:40am

I am dead serious. As always.

Fluffiness · February 5, 2019, 4:46am

You figured it out, I’m a secret Gwen main.

Also Perse sucked. Worst character in the game.

Nopethebard · February 5, 2019, 5:41am

Why past tense?

thehug0naut · February 5, 2019, 9:16pm

The model reveals our darkest secrets

In all seriousness, this is kind of my point. Surely the very definition of high skill would be getting good results with a “bad” character? If anyone should be ranked high in skill with her I would think it would be you. @vengefulpickle do you have any thoughts on this? I’ve not looked at the results numbers for Fluff yet but I assume this is to do with how bad the chart suggests perse is vs how well Fluff uses her?

vengefulpickle · February 6, 2019, 11:06am

The interpretation of skill in that chart is more or less “how much does this player bend the underlying matchup number”. So, in the model, the difference in each player’s skill with their respective characters gets added onto the win chance. Because there’s no ground truth on what the matchup numbers are, the model might very well factor some of the reason for a loss into changing the player skill value and some of it into changing the absolute matchup number.
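As a rough sketch of why that split is ambiguous, here is the win chance written out in Python. It assumes the same additive form as the Stan model posted further down the thread (skill difference plus matchup term plus a scaled Elo term, on the logit scale). The function name and the numbers are illustrative, not fitted values.

```python
import math


def win_chance(mu_logit, skill1, skill2, elo_logit=0.0, elo_scale=1.0):
    """Player 1's win probability under an additive logit model."""
    logit = (skill1 - skill2) + mu_logit + elo_scale * elo_logit
    return 1.0 / (1.0 + math.exp(-logit))


# Two different explanations of the same results (made-up values):
# a lopsided matchup between equally skilled players...
lopsided_matchup = win_chance(mu_logit=0.4, skill1=0.0, skill2=0.0)
# ...or an even matchup where player 2 is weaker with their character.
skill_gap = win_chance(mu_logit=0.0, skill1=0.0, skill2=-0.4)

print(lopsided_matchup, skill_gap)
```

Both calls give the same probability, so games between just those two players can't tell the two stories apart. Only the priors and games involving other players or characters can.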

One type of visualization I could do that might be interesting would be to compute the MU chart for a player, assuming they were playing against a character generalist (no character skill advantage) of the same overall player skill (no Elo advantage).

In theory, actually, I could do a “right now, what does your personalized MU chart look like when you play against this other specific player”, too. That could be fun, although perhaps less enlightening than the earlier chart.
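A sketch of how both charts could be computed from fitted values, under some loud assumptions: the "generalist" is taken to mean an opponent whose character skill term is 0 for every character, equal overall skill is taken to mean the Elo term is 0, and `personal_mu_chart` plus all the numbers are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def personal_mu_chart(mu, my_skill, opp_skill=None, elo_term=0.0):
    """Return {(my_char, opp_char): expected wins out of 10}.

    mu[(c1, c2)] is the matchup logit for c1 against c2. With
    opp_skill=None the opponent is the assumed generalist (skill term 0).
    """
    chart = {}
    for (c1, c2), m in mu.items():
        if c1 not in my_skill:
            continue  # I don't play this character
        if opp_skill is not None and c2 not in opp_skill:
            continue  # the opponent doesn't play this character
        opp = 0.0 if opp_skill is None else opp_skill[c2]
        chart[(c1, c2)] = 10.0 * sigmoid(m + my_skill[c1] - opp + elo_term)
    return chart


# Made-up values, for illustration only.
mu = {("Rook", "Zane"): -0.2, ("Rook", "Troq"): 0.1}
me = {"Rook": -0.3}

vs_generalist = personal_mu_chart(mu, me)
vs_specific = personal_mu_chart(mu, me, opp_skill={"Zane": -0.5})
for pairing, wins in vs_generalist.items():
    print(pairing, f"{wins:.1f}-{10 - wins:.1f}")
```

The specific-opponent version only differs in which skill terms get subtracted, so it would also accept a nonzero `elo_term` for the two players' current ratings.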

vengefulpickle · February 6, 2019, 7:08pm

Ok, so the last set of charts I posted were totally bogus. I added a bug when I added Elo into the calculation that ended up miscounting matches and such, so feel free to discard.

This set is using the technique that @charnel_mouse and I discussed to treat the MU chart as being the max-skill chart, and making all player/character skill levels subtract from that. So, all of the player skill level charts droop down, and the closer the blob is (and the more compact it is) toward the top of the line, the more confident the model is that you’re playing at top-level for that character.

Also, it passes the @Niijima-san sniff test by putting Ven-Arg at 5.5-4.5. Mission Accomplished!

MU Chart

Character Skill Levels

[Per-player character skill charts: Bomber678, CKR, cpat, Fluffiness, Hobusu, mysticjuicer, Niijima-San, snoc, thehug0naut, vengefulpickle]

Most Even Matchups

Character	Counterpick

The Model

data {
    int<lower=0> NG; // Number of games
    int<lower=0> NM; // Number of matchups
    int<lower=0> NP; // Number of players
    int<lower=0> NC; // Number of characters

    int<lower=0, upper=1> win[NG]; // Did player 1 win game
    int<lower=1, upper=NM> mup[NG]; // Matchup in game
    vector<lower=0, upper=1>[NG] non_mirror; // 1 if non-mirror matchup, 0 if mirror
    int<lower=1, upper=NC> char1[NG]; // Character 1 in game
    int<lower=1, upper=NC> char2[NG]; // Character 2 in game
    int<lower=1, upper=NP> player1[NG]; // Player 1 in game
    int<lower=1, upper=NP> player2[NG]; // Player 2 in game
    vector[NG] elo_logit; // Player 1 ELO-based logit win chance
}
parameters {
    vector[NM] mu; // Matchup value
    vector<upper=0>[NP] char_skill[NC]; // Player skill at character
    real elo_logit_scale; // elo_logit scale
}
transformed parameters {
    vector[NG] player_char_skill1;
    vector[NG] player_char_skill2;
    vector[NG] win_chance_logit;

    for (n in 1:NG) {
        player_char_skill1[n] = char_skill[char1[n], player1[n]];
        player_char_skill2[n] = char_skill[char2[n], player2[n]];
    }

    win_chance_logit = (player_char_skill1 - player_char_skill2) + non_mirror .* mu[mup] + elo_logit_scale * elo_logit;
}
model {
    for (n in 1:NC) {
        char_skill[n] ~ std_normal();
    }
    mu ~ normal(0, 0.5);
    elo_logit_scale ~ std_normal();

    win ~ bernoulli_logit(win_chance_logit);
}
generated quantities{
    vector[NG] log_lik;
    vector[NG] win_hat;

    for (n in 1:NG) {
        log_lik[n] = bernoulli_logit_lpmf(win[n] | win_chance_logit[n]);
        win_hat[n] = bernoulli_logit_rng(win_chance_logit[n]);
    }
}
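
For anyone wanting to relate the model's logit-scale quantities to familiar numbers, here is a small Python sketch. It assumes `elo_logit` comes from the standard Elo expected-score curve (the post doesn't say how it was actually computed) and that a chart entry is ten times the win probability implied by `mu` alone. Both helper names are hypothetical.

```python
import math


def elo_logit(rating1, rating2):
    """Logit of player 1's expected score under the standard Elo curve.

    E = 1 / (1 + 10 ** ((rating2 - rating1) / 400)), so
    logit(E) = (rating1 - rating2) * ln(10) / 400.
    """
    return (rating1 - rating2) * math.log(10) / 400.0


def chart_entry(mu_logit):
    """Format a matchup logit as an 'X-Y' chart entry out of 10."""
    p = 1.0 / (1.0 + math.exp(-mu_logit))
    first = round(10.0 * p, 1)
    return f"{first:.1f}-{10.0 - first:.1f}"


print(elo_logit(1600, 1500))
print(chart_entry(math.log(0.55 / 0.45)))  # the logit of a 55% win chance
```

Under these assumptions a `mu` of about 0.2 corresponds to a 5.5-4.5 matchup, and the `normal(0, 0.5)` prior puts most matchups within roughly 7-3 before seeing any data.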

Niijima-san · February 6, 2019, 7:40pm

I like this chart. It says many things that I like, like DeGrey is essentially unCPable, Quince/Troq is like da worst, and Valerie is largely pants (except as a Vendetta CP).

I also like the skill droopies, as it says the one truest thing I have ever seen in Yomi: that MysticJuicer’s secret most skilled character is Argagarg. And anyone who has ever seen BlockJuicer play knows that absolutely must be an inescapable fact.