Comments on: Modern MTGO Deep Dive: Win Rate Analysis https://www.quietspeculation.com/2015/04/modern-mtgo-deep-dive-win-rate-analysis/ Play More, Win More, Pay Less Sun, 13 Sep 2015 15:06:57 +0000
By: Sheridan Lardner https://www.quietspeculation.com/2015/04/modern-mtgo-deep-dive-win-rate-analysis/#comment-2120555 Sun, 13 Sep 2015 15:06:57 +0000 http://34.200.137.49/?p=1315#comment-2120555 In reply to Timur Nurmagambetov.

Happy to address these concerns!

1. For those earlier articles, I was using a different dataset time frame and still refining my methods. I eventually realized I couldn’t just use public MTGO data from the mothership because the results were quite different than when using the data from the client. I also realized it wasn’t enough to look at raw MWP values because the confidence intervals around some of those MWPs were so wide. I leave those articles up not because they are right, but because I think it’s important to show how the thought process evolved on these kinds of questions.

2. The null hypothesis is that the deck’s MWP is equal to the average MWP of all decks on MTGO. Decks that are significantly lower might not be optimal choices. Decks that are significantly higher might be the better performers in Modern.

3. I’m certainly looking (and hoping!) for low P values, but I’m going to report on any findings I come across. Although I would prefer the P values to be < .05 or < .01, the < .10 values are still worth reporting on and are small enough to suggest a legitimate outlier. Some of them, notably Amulet Bloom, continued their downward trend and ended up with some really exciting P values around their MWP (Amulet was .03 by May).

4. As far as I can find, no one who conducts these sorts of metagame analyses publishes the dataset (including Frank Karsten and MTG Goldfish). Because it’s not public data, there’s a lot of work that goes into collecting it and I don’t think authors and sites want to totally turn that data over to the public. It would be a bummer if we did all the work collecting the stats and some other site took the data and analyzed it on their own! So I’m not making the data public. I would love to incorporate these numbers and results into a database of some kind, but we are pretty far away from that from a site development perspective.
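
To make point 2 concrete, here is a minimal sketch of testing one deck's MWP against a baseline average. It assumes a two-sided one-sample z-test on a proportion (normal approximation); the exact test used in the articles is not stated in this thread, and the function name, the 50% baseline, and the win/match counts are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mwp_p_value(wins, matches, baseline_mwp):
    """Two-sided p-value for H0: the deck's true MWP equals baseline_mwp.

    Uses the normal approximation to the binomial, so it is only
    reasonable when the number of matches is fairly large.
    """
    if matches <= 0:
        raise ValueError("matches must be positive")
    observed = wins / matches
    # Standard error of the observed proportion under the null hypothesis
    se = sqrt(baseline_mwp * (1 - baseline_mwp) / matches)
    z = (observed - baseline_mwp) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return observed, z, p


# Hypothetical counts, for illustration only (not from the article's dataset)
observed, z, p = mwp_p_value(wins=118, matches=200, baseline_mwp=0.50)
print(f"observed MWP = {observed:.1%}, z = {z:.2f}, p = {p:.3f}")
```

A small p here only says the deck's record would be unusual if its true MWP matched the baseline; it does not say why the deck is over- or underperforming.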

I’m sorry you didn’t enjoy the articles and the methods. I have extensive experience in both stats and stats as applied to various social sciences, and hoped this would be a fun way to bring those methods to a broader public while not making them too technical. Given the overwhelmingly positive reception, I think we did a pretty good job. Let me know if you have other concerns or questions!

By: Timur Nurmagambetov https://www.quietspeculation.com/2015/04/modern-mtgo-deep-dive-win-rate-analysis/#comment-2120554 Sat, 12 Sep 2015 18:34:06 +0000 http://34.200.137.49/?p=1315#comment-2120554 Your statistics findings are quite questionable
You published three articles with three different results
Naming three different decks (Blue Tron, Temur Twin and Infect) to be the best deck
You calculate confidence intervals then throw them away calling them old and present new ones
What is the null hypothesis you calculate p-values for?
Your use of p-values is quite strange, you say that you are looking for low p-values, then post results with huge p-values like they are ok too
Could you publish the raw data of Modern matches?
These articles are written like you are taking a statistics class and just want to exercise with some numbers without a real understanding of how the methods work

By: Joshua Davenport https://www.quietspeculation.com/2015/04/modern-mtgo-deep-dive-win-rate-analysis/#comment-2120553 Wed, 15 Apr 2015 04:33:31 +0000 http://34.200.137.49/?p=1315#comment-2120553 In reply to Sheridan Lardner.

I was thinking more about your open questions – items that affect the MWP, and one I would like to throw in the ring is expense of the deck.

TL;DR “The greater the price of the deck, the more likely there is an above average pilot behind it. The cheaper the deck, the more likely there is a below average pilot. This has the potential to make the more expensive decks have a MW% higher than the true MW%, and potentially the cheaper decks a lower MW% than if all pilots were equally skilled.”

Following generalised reasons:

a) those that play in mtgo modern tournaments are a subset of the competitive magic community. You do not see many kitchen table players just “having a go” in modern dailies. I would expect this subset to have a greater skill level on average than the crowd at an event that is open to the public (an SCG event for example).
b) Of this subset of “better on average” players, those that are more committed to magic are more likely to use their funds to get the more expensive decks in the format. Pricing the decks in the format (mtgo prices from mtggoldfish):

Burn: $174.02
Grixis Delver: $210.04
Merfolk: $260
Bloom Titan: $277
Infect: $320.99
Affinity: $321.57
Abzan Liege: $400
Twin: $492.81
Tarmo Twin: $660
Jund: $763.00
Junk: $907.38

(the pricing is not perfect, but serves to illustrate my point).

If you are playing junk/jund online, you are a serious player. You aren’t playing it because you would like to have a go at modern – you are on it because it is the best deck in the format.

I expect that the more expensive the deck, the better the average player within this group. Therefore, I would expect the super expensive decks to be a few percentage points higher than if the decks were played by equally skilled pilots. To rephrase, I think it is possible that the expensive decks (Jund and Junk) win 1 or 2 matches out of 100 just because they have more skilled pilots than the pilots on other decks. The amount of money you have to spend on MTGO is a limiting factor when choosing a Modern deck. I am sure there are well-off people that choose the cheaper or more expensive decks. But I am also sure there are players that play cheaper decks because that is what they can afford.

So I think Junk and Jund are probably a percentage point or two too high. You will respond “that drags Junk down to 47%! that cannot be right.” I do not think it is possible to prove or disprove this hypothesis at present, so it is just conjecture.
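
To put a rough number on why a 1 or 2 point effect is so hard to prove, here is a minimal sketch of a standard sample-size approximation for detecting a difference in a win rate. It is a simplified one-sample calculation under assumptions of my own (50% baseline, 5% significance, 80% power, independent matches), and the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def matches_needed(baseline_mwp, difference, alpha=0.05, power=0.80):
    """Approximate matches needed to detect `difference` in MWP.

    Two-sided one-sample test on a proportion, normal approximation,
    using the baseline variance for both hypotheses (a simplification).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = baseline_mwp * (1 - baseline_mwp)
    return ceil((z_alpha + z_beta) ** 2 * variance / difference ** 2)


# How many matches with one deck to detect a 2-point or 1-point skill bump?
for bump in (0.02, 0.01):
    print(f"{bump:.0%} bump: about {matches_needed(0.50, bump)} matches")
```

Under these assumptions a 2-point bump needs on the order of five thousand matches for a single deck, and a 1-point bump roughly four times that, which is far more than the per-deck samples being discussed here.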

The opposite is true for the cheapest decks in the format. Look at the sweetheart (and most played) deck in the format – Grixis delver. It has the highest non-tier 1 N value, while costing little more than burn. The cheaper buy-in is sure to appeal to the “average player on a budget” that doesn’t want to play burn. I think these two decks are probably a few percentage points too low.

I think it also helps explain why tarmo twin is so dominant: it is not only a metagame call, but also probably piloted by better players that can afford the additional $180 for pixels.

Abzan liege – The exception – cheaper than junk, but outperforming it on all fronts. In my opinion, this deck is 100% a metagame call. With positive matchups against Affinity, burn, Junk & twin (4 of the 5 tier 1 decks), the question isn’t how high is this MW%, but “what the hell is this deck losing against?” If any other deck had positive matchups against 4 of the 5 tier 1 decks, there wouldn’t be 5 decks in tier 1 – there would be this deck and the anti-this-deck. If Junk’s presence in the metagame slows down (which I think it will), there is a good chance this deck just becomes fringe playable.

Regarding reasons bloom titan is underplayed for its success:
Bloom Titan – Fear. There is a lot of irrational fear over bannings at present. This deck seems too good to be true. The ban announcement (where some feared a Summer Bloom ban) was 23 March – your data collection started on 24 March. This is a possible reason the deck appeared underrepresented for the period.

By: Roland F. Rivera Santiago https://www.quietspeculation.com/2015/04/modern-mtgo-deep-dive-win-rate-analysis/#comment-2120552 Tue, 14 Apr 2015 19:37:08 +0000 http://34.200.137.49/?p=1315#comment-2120552 In reply to Sheridan Lardner.

I’d be excited to see a piece on “notable untiered decks” using your data. I feel that a few decks like 8-Rack, Orzhov Tokens, Sultai Control, Martyr Life/Soul Sisters, and Mono-U Tron are lurking on the margins and could make some noise in the coming months, and you could potentially see it coming in the MTGO data.

By: amalek0 https://www.quietspeculation.com/2015/04/modern-mtgo-deep-dive-win-rate-analysis/#comment-2120551 Tue, 14 Apr 2015 18:32:48 +0000 http://34.200.137.49/?p=1315#comment-2120551 In reply to Sheridan Lardner.

Leave the forum pages to be forum pages, leave the analysis to be analysis. Not that I mind Ktkenshinx bearing the torch of statistics on the mtgsalvation forums the same way I bear the torch of formal definitions (yo, what makes it a TEMPO deck?).

One point I would like to make is the use of MWP and GWP–I’m not super familiar with MTGO, but is it possible to separate out whether a deck is on the play or draw in game 1, and how that affects the MWP of a deck? It might provide some nice hard numbers for deciding what the actual “speed” of the format is at a given point in time.
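
If the data did record who was on the play in game 1 (I don't know whether it does), the split could be sketched like this. The record layout `(deck, on_play_game1, won_match)`, the function name, and the sample rows are all hypothetical, made up purely to show the grouping.

```python
from collections import defaultdict


def mwp_by_play_draw(records):
    """Match win rate per (deck, "play"/"draw"), keyed on the game 1 role.

    `records` is an iterable of (deck, on_play_game1, won_match) tuples.
    """
    tally = defaultdict(lambda: [0, 0])  # (deck, side) -> [wins, matches]
    for deck, on_play_game1, won_match in records:
        side = "play" if on_play_game1 else "draw"
        tally[(deck, side)][1] += 1
        tally[(deck, side)][0] += int(won_match)
    return {key: wins / total for key, (wins, total) in tally.items()}


# Made-up rows, only to show the shape of the input and output
sample = [
    ("Burn", True, True),
    ("Burn", True, False),
    ("Burn", False, False),
    ("Jund", True, True),
    ("Jund", False, True),
    ("Jund", False, False),
]
rates = mwp_by_play_draw(sample)
for (deck, side), rate in sorted(rates.items()):
    print(f"{deck} on the {side}: {rate:.0%}")
```

A consistently large gap between the play and draw rows across decks would be one rough signal of a fast format.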

By: Sheridan Lardner https://www.quietspeculation.com/2015/04/modern-mtgo-deep-dive-win-rate-analysis/#comment-2120550 Tue, 14 Apr 2015 16:02:52 +0000 http://34.200.137.49/?p=1315#comment-2120550 @Josh: STATS ARE BEAUTIFUL! A MAN AFTER MY OWN HEART!

I agree with your assessment of Twin. This becomes a tricky statistical point, however, because the data does not suggest the deck is particularly good or bad. It just looks kind of average. But as we know, there are all sorts of other factors that are bringing down Twin’s overall performance even if the deck is still awesome. This is exactly the kind of caution it is important to keep in mind when analyzing deck performance, and I’m happy there are guys like you out there who think through this so critically.

It has been exciting to see Jund rising up the ranks. This makes a lot of sense given the comparative advantages to running Bolt and Blackcleave Cliffs over Path and a super painful shock/fetch manabase. It is also interesting that as Jund has gone up, Abzan has stayed relatively flat or even declined a bit. This suggests there is some ceiling on BGx style decks in the format. Or it’s just an artifact of player preference and metagame trends, and maybe the collective Jund/Junk share will rise later on. As for Merfolk, this was a deck I identified a while ago as a probable riser, for the very reasons you talked about. In particular, it has a good Burn matchup while still keeping the linear and aggressive elements that make Modern decks successful.

Forums are a tricky one. On the one hand, they are in line with our mission for providing quality Modern content. On the other hand, they are huge undertakings and many content sites have actually gotten out of the forum business (or forum sites getting out of the content business). So it’s an ambitious proposition that we are still considering, but the more feedback we get, the more information we will have to make a decision!

By: Sheridan Lardner https://www.quietspeculation.com/2015/04/modern-mtgo-deep-dive-win-rate-analysis/#comment-2120549 Tue, 14 Apr 2015 15:46:21 +0000 http://34.200.137.49/?p=1315#comment-2120549 In reply to Roland F. Rivera Santiago.

@Roland: Happy the data is interesting! For Mono U Tron, the reason I excluded it is because it’s neither Tier 1 nor Tier 2 right now by our classification system. I didn’t highlight it or any of the other untiered decks in this current piece, but I’d like to revisit them later.

Incidentally, Mono U Tron has super high variance once you look at a dataset like this. You see some players piloting it regularly and doing really well. But then you see a bunch of other players who flop out with it at 2-2 or worse. I think this is because the deck has the most favorable ratio of competitiveness to cost in MTGO, so it’s a good intro deck for new Modern players. These newer players may lack experience and could bring the deck performance down overall, even though other players are quite successful with it.

By: Sheridan Lardner https://www.quietspeculation.com/2015/04/modern-mtgo-deep-dive-win-rate-analysis/#comment-2120548 Tue, 14 Apr 2015 15:44:02 +0000 http://34.200.137.49/?p=1315#comment-2120548 In reply to amalek0.

@amalek0: Interesting. This would definitely explain why the deck’s paper prevalence is falling, especially at larger events. The more rounds you have to win as an Amulet player, the more that high variance can screw you out of top finishes. We see similar issues with other supposedly broken decks like Griselbrand Reanimator, a deck that was big on MTGO in Summer 2013 but then never converted that into paper finishes past a T16 by Todd Anderson.

That said, I don’t think this would explain a similar decline in Amulet’s MTGO share. If anything, MTGO has been a historical stronghold for Amulet players, where you just need to win 3 rounds to get tickets. With so many events and such a low buy-in cost, this should be the perfect place to play higher variance decks and just hope for the steamroll and/or the good matchups. But we also see an MTGO decline in the deck. Perhaps this is related to the paper decline; MTGO players see the deck not putting up paper results and then avoid it themselves, even if the conditions that inhibit paper success might be less present online.
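
A rough way to compare the two settings is a simple binomial model. The sketch below treats every round as an independent match with a fixed win rate, which ignores matchups and hand-quality streaks. The 4-round event length, the 15-round paper event, the 12-win target, and the 55% win rate are all illustrative assumptions of mine, not figures from the dataset.

```python
from math import comb


def prob_at_least(wins_needed, rounds, match_win_rate):
    """P(at least wins_needed wins in `rounds` independent matches)."""
    p = match_win_rate
    return sum(
        comb(rounds, k) * p**k * (1 - p) ** (rounds - k)
        for k in range(wins_needed, rounds + 1)
    )


p = 0.55  # hypothetical match win rate for the pilot and deck
short_event = prob_at_least(3, 4, p)    # e.g. 3 wins in an assumed 4-round event
long_event = prob_at_least(12, 15, p)   # e.g. 12 wins in an assumed 15-round event
print(f"short event: {short_event:.1%}, long event: {long_event:.1%}")
```

Even with the same win rate, hitting the target in the long event is far less likely than in the short one, which fits the idea that a swingy deck is more attractive where only a few wins are needed.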

By: Josh D https://www.quietspeculation.com/2015/04/modern-mtgo-deep-dive-win-rate-analysis/#comment-2120547 Tue, 14 Apr 2015 05:08:22 +0000 http://34.200.137.49/?p=1315#comment-2120547 Hey! As a solicitor, with a science degree (advanced mathematics major) that only plays modern – if there was ever an article written specifically for me, this one feels like it!!!

When you start to see analysis like this come out of the statistics, it becomes apparent why Wizards do not release the statistics on each event. Considering the small sample size of events, imagining the data from the full range of dailies would be amazing. The numbers do not lie if the sample size is significant.

This feels like the start of a beautiful project. A time-consuming, thankless, beautiful project.

Some comments:

a) Twin – I completely agree with your views regarding twin. That it won the last modern pro tour, and is seen as the tier 1 of tier 1 decks, cannot be overstated. It is middling in price, interactive and complicated to play – no wonder the deck has a high representation nearly equalling that of the comparatively cheap burn and affinity. This deck doesn’t give out the free wins that occasionally come to burn, infect and affinity – you have to work for the vast majority of them, with difficult, unforgiving decision trees. While the sample size is small, I would expect twin to continue to be a middle performer, due solely to its exposure as the best modern deck and being played by people that expect the deck to win them matches because it won the pro tour.

Anecdotally, I know a very experienced affinity player that has swapped over to twin – and is atrocious. It feels to me like the deck will not carry you, whereas in affinity picking the 5th best line is probably still good enough to get you through the match.

b) Jund and Merfolk – I am very excited about the data on these two decks becoming more solid. Jund is like the underexposed version of Junk – only played by those that actually really want to be on the deck. You would think that the average jund pilot would be better than the average junk pilot, just because the deck is less well known.

Merfolk is in a good place right now. The deck is fast, has disruption, and the main way to deal with it efficiently (sweepers) is not present in most of the top decks. Monastery Siege adding another kira effect (a better kira effect), could push this up the ranks.

I need to beware, as this seems a lot like confirmation bias. Hence why I am so excited to see further data. Keep up the great work.

Also – when are you putting forums in? Modern is great, the best format, and although there is some good stuff elsewhere, wading through the crap of people that do not take modern seriously, and the “Here is my first deck pls critique” without testing is frustrating. This seems to be the place for serious modern players to read: statistics, actual testing, actual people defining their meta before posting.

By: Roland F. Rivera Santiago https://www.quietspeculation.com/2015/04/modern-mtgo-deep-dive-win-rate-analysis/#comment-2120546 Tue, 14 Apr 2015 03:38:28 +0000 http://34.200.137.49/?p=1315#comment-2120546 A veritable treasure trove of data – thanks for taking the time to put this together. I’m surprised to see RG Tron on this list while mono-U Tron is absent (especially considering the small sample sizes you were allowing for). Did something change since you highlighted it in your previous piece?

By: amalek0 https://www.quietspeculation.com/2015/04/modern-mtgo-deep-dive-win-rate-analysis/#comment-2120545 Mon, 13 Apr 2015 17:09:29 +0000 http://34.200.137.49/?p=1315#comment-2120545 Part of the reason is confidence–There are many players I’ve spoken to who are familiar with the amulet bloom deck and play it locally, but who fear pulling the trigger on it at paper events with any sort of real prize structure–there’s this fear that the deck can pull its “oops, two inconsistent hands in a row, you lose a round” trick once in a tournament and leave you out of top 8 contention.

Unlike most fair and control decks, pulling the trigger on playing combo decks without a strong backup plan (like the twin beatdown/blood moon plan, or the pod-deck fair-beats plan) requires absolute faith that the power level of what you can do will go over the top of your opponents often enough to outweigh the times where you run cold and are left without the ability to try and leverage any sort of playskill to get back into the game. Better players, as a general rule of thumb, like to believe that they can leverage their own playskill to reduce variance, and playing bloom doesn’t quite have that as much as other combo decks, like splinter twin, and certainly has it less than abzan midrange, affinity, or other tier one mainstays.

In other words, it’s mostly appeared to be a psychological barrier, not a logical one.
