In last week's article on MTGO statistics, I took a deep dive into the numbers to see if Mono U Tron was really the "best" deck on Magic: The Gathering Online. There was a pretty strong case to be made for Mono U Tron's power, but we can go even deeper to see if we missed anything. That's the beauty of statistics: whenever you think you have figured something out, you can always add a new layer of complexity. It's also the beauty of the awesome MTGO data Wizards gives us. With so many numbers and so many events over time, you can run some pretty robust analyses to answer all kinds of questions. A big point of this article and the Mono U Tron article (not to mention the many more to come in this style) is that you too can pick apart the data like this. I want to make MTGO stats work for you, and these articles will help you brainstorm ways to parse the numbers and convert that into tournament success.
Today's article has two goals. First, I want to run a few more tests to check and make sure Mono U Tron really is the "best" deck with the highest GWP and the best 4-0 win rate. Second, I want to see if there are any other decks that might contend with Mono U Tron for the title of "best" MTGO deck. Because although those Mindslaver/Academy Ruins players still have the highest GWP online, there are a few other decks neck and neck with Tron. One of those decks, which is incidentally one of the fastest decks in the format, looks like it might have what it takes to race ahead.
Average GWP on MTGO
Whenever we analyze MTGO data, especially game win percentage (GWP), it helps to start with averages. This is where we started last week with Mono U Tron and it's where we are going to start again. But this time, we are going to use a different method to build our confidence interval. For reference, here is last week's interval:
Tier 1/Tier 2 Average GWP (old): 68.65%
Tier 1/Tier 2 95% Confidence Interval (old): 65.54% – 71.76%
One thing that's nice about this interval is it's fairly wide. If you recall from last week, only two decks had GWPs resting outside of that range (UWR Control at the bottom and Mono U Tron at the top). Wide intervals are nice when we don't know the distribution of our population, and that's definitely the case with MTGO GWPs. That said, there's a big thing that's not so nice about this interval: it's almost too wide! The same width that makes it useful for comparing to discrete GWPs (like Mono U Tron's 72.89%) also makes it a lot less useful if we want to compare it to other intervals. Where do these other intervals come from? From the different decks. Mono U Tron might have an average GWP of 72.89%, but it has a huge range of GWPs on both sides of that average. To account for that variance, we will eventually need to create a GWP interval for Mono U Tron itself (and, by extension, for every other deck). But before we get there, we need to rebuild our metagame-wide interval.
This new interval uses a narrower range defined by the "T Distribution" (basically, a bell curve that gets steeper as your sample gets bigger). We use the T Distribution when we are looking at smaller samples, and in this case, we are only looking at the average GWP of 26 decks. Each of those individual decks may have as many as 200 representatives, but there are only 26 decks qualifying for tier 1 or tier 2 on MTGO. We are building our range around the average of those 26 decks' GWPs, not around every individual finish as one pooled sample. So based on that, here's that new range:
Tier 1/Tier 2 Average GWP (NEW): 68.65%
Tier 1/Tier 2 95% Confidence Interval (NEW): 67.94% - 69.28%
Stated another way, we can be 95% sure the average GWP of the tier 1 and tier 2 MTGO decks is between 67.94% and 69.28%.
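If you want to try this at home, here is a minimal sketch of that calculation in Python. It assumes the interval is the usual t-based one (average plus or minus a critical value times the standard error), and it uses the 26 average GWPs from the per-deck table later in this article as input. The constant `T_CRIT_DF25` and the function name `t_interval` are my own labels for illustration, not anything from the original spreadsheet.

```python
import math
import statistics

# Average GWP (%) for each of the 26 tier 1/tier 2 decks, copied from the
# per-deck table in this article.
deck_gwps = [
    69.67, 68.73, 68.43, 70.05, 70.67, 70.15, 67.70, 68.16, 68.10,
    68.54, 65.58, 67.52, 70.24, 72.54, 68.65, 70.04, 65.60, 67.70,
    66.68, 68.44, 68.12, 69.04, 70.28, 69.96, 65.89, 67.42,
]

# Two-sided 95% critical value of the T Distribution with 25 degrees of
# freedom (n - 1), taken from a standard t table.
T_CRIT_DF25 = 2.0595


def t_interval(values, t_crit):
    """Return (mean, low, high) for a t-based confidence interval."""
    n = len(values)
    mean = statistics.mean(values)
    std_err = statistics.stdev(values) / math.sqrt(n)  # sample SD / sqrt(n)
    margin = t_crit * std_err
    return mean, mean - margin, mean + margin


mean, low, high = t_interval(deck_gwps, T_CRIT_DF25)
print(f"mean {mean:.2f}%, 95% interval {low:.2f}% - {high:.2f}%")
```

With these inputs the sketch reproduces the 67.94% - 69.28% range. The mean of the table values comes out nearer 68.61% than the 68.65% quoted above, which may just be a rounding or carry-over difference.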
Remember our GWPs from last week (e.g. Mono U Tron at 72%, Affinity at 70%, Burn at 69.6%, etc.)? You will notice many of those average GWPs are outside of this interval. So does our new interval mean ALL of those decks are significantly better, because their GWPs are all bigger than our new interval?
Not so fast! Those deck GWPs were just average GWPs. But as we know, all of those decks have a range of values. Mono U Tron didn't have 20+ decks that all had a 72% win rate. The average of its variable GWPs was 72%, with some higher (100%) and some lower (53%). That variance means we shouldn't necessarily treat those GWPs as one single value, which is admittedly the approach I took last week. Instead, we need to create confidence intervals for each deck's GWP.
Creating GWP intervals for all MTGO decks
Now let's build some intervals. The table below shows GWP confidence intervals for all the tier 1 and tier 2 decks. Intervals were calculated the same way as above, except for decks with 100+ showings. For those decks, I used the normal distribution (more or less a bell curve) instead of the more dynamic T Distribution. This just leads to small differences in the intervals, but they are important when dealing with numbers that are themselves very small (all <1). The margin of error in the last column is just the range of those high/low values relative to the average. Higher margins mean more variance in GWPs. Lower margins indicate more consistent GWPs.
Deck | # of finishes (1/28 - 3/19) | GWP interval: Low | GWP interval: Middle (avg GWP) | GWP interval: High | Margin of Error (+/-) |
---|---|---|---|---|---|
Burn | 196 | 66.83% | 69.67% | 72.51% | 2.84% |
Abzan | 188 | 66.16% | 68.73% | 71.30% | 2.57% |
UR Twin | 160 | 66.53% | 68.43% | 70.32% | 1.90% |
Affinity | 101 | 66.11% | 70.05% | 73.99% | 3.94% |
Infect | 66 | 67.97% | 70.67% | 73.38% | 2.71% |
Amulet Bloom | 53 | 63.49% | 70.15% | 76.82% | 6.66% |
Abzan Liege | 51 | 65.15% | 67.70% | 70.25% | 2.55% |
Merfolk | 51 | 62.80% | 68.16% | 73.53% | 5.37% |
RG Tron | 49 | 61.32% | 68.10% | 74.88% | 6.78% |
Scapeshift | 41 | 61.78% | 68.54% | 75.31% | 6.76% |
UWR Control | 38 | 61.66% | 65.58% | 69.50% | 3.92% |
4C Gifts | 36 | 64.37% | 67.52% | 70.67% | 3.15% |
Storm | 28 | 59.64% | 70.24% | 80.84% | 10.60% |
Mono U Tron | 26 | 63.31% | 72.54% | 81.77% | 9.23% |
Bogles | 25 | 58.38% | 68.65% | 78.92% | 10.27% |
Ad Nauseam | 23 | 62.68% | 70.04% | 77.39% | 7.36% |
UW Control | 22 | 58.83% | 65.60% | 72.38% | 6.77% |
Grixis Delver | 21 | 59.89% | 67.70% | 75.51% | 7.81% |
BW Tokens | 21 | 59.30% | 66.68% | 74.06% | 7.38% |
Nykthos Green | 21 | 58.69% | 68.44% | 78.20% | 9.76% |
RUG Twin | 18 | 53.85% | 68.12% | 82.38% | 14.26% |
Jund | 17 | 53.05% | 69.04% | 85.04% | 15.99% |
UWR Midrange | 17 | 56.05% | 70.28% | 84.51% | 14.23% |
Living End | 17 | 65.15% | 69.96% | 74.77% | 4.81% |
Gruul Zoo | 16 | 62.16% | 65.89% | 69.63% | 3.73% |
Esper Midrange | 15 | 54.52% | 67.42% | 80.32% | 12.90% |
The first thing I notice when I look at this table is the sheer range of the intervals. Just look at those margins of error! Some of them are absolutely huge. Yes, established decks like UR Twin, Abzan, Abzan Liege, etc. have pretty reasonable margins of error between 1-3%. But lots of decks have a GWP range of +/- 6 or 7%, and some have absolutely insane variance of 10%+! Of course, this is partially to be expected. When you have a smaller N for any given deck, the margin of error is naturally going to be bigger. That's because any single GWP will have a bigger effect on a smaller sample. But even so, decks like Affinity, Merfolk, and Amulet Bloom have a sizable sample size for MTGO (all >40), but still have high variance. And other decks like Abzan Liege and 4C Gifts have similar Ns but much narrower intervals. So we can't just eyeball this data to figure out what's going on with those margins of error.
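One way to go a step past eyeballing is to back out the spread each margin implies. The sketch below assumes each margin was computed as a critical value times the standard deviation divided by the square root of N, with the normal distribution for decks with 100+ finishes and the T Distribution otherwise, as described above. The per-deck standard deviations aren't published in this article, so everything the code prints is an estimate under that assumption. The `T_CRIT` lookup holds only the standard t-table values needed for these four decks, and the function names are just illustrative.

```python
import math
from statistics import NormalDist

# Two-sided 95% critical values of the T Distribution from a standard t
# table, keyed by degrees of freedom (n - 1). Only the rows needed below.
T_CRIT = {35: 2.0301, 50: 2.0086}


def critical_value(n):
    """Normal critical value for 100+ finishes, t value otherwise."""
    if n >= 100:
        return NormalDist().inv_cdf(0.975)  # about 1.96
    return T_CRIT[n - 1]


def implied_sd(n, margin):
    """Back out the standard deviation from margin = crit * sd / sqrt(n)."""
    return margin * math.sqrt(n) / critical_value(n)


# (deck, finishes, margin of error in percentage points) from the table.
decks = [
    ("Affinity", 101, 3.94),
    ("Merfolk", 51, 5.37),
    ("Abzan Liege", 51, 2.55),
    ("4C Gifts", 36, 3.15),
]

for name, n, margin in decks:
    print(f"{name:12s} n={n:3d}  implied SD ~ {implied_sd(n, margin):.1f} points")
```

Under those assumptions, Affinity and Merfolk show roughly twice the finish-to-finish spread of Abzan Liege and 4C Gifts, which would account for their wider intervals at similar sample sizes.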
How is Mono U Tron holding up? Unfortunately, it's not looking great. Although the 72% GWP is still the topmost GWP in the group, the deck also has considerable range in its confidence interval. With a whopping 9.23% margin of error, all we know is that the Mono U Tron "true" GWP probably falls somewhere between 63.31% (solidly below average) and 81.77% (solidly above). Which is to say, we can't really conclude anything. This actually makes sense when we think of those reasons from last week about why Mono U Tron was a good deck. One of my theories was that Tron was successful because it was underplayed. People didn't know how to deal with it. But that might cut both ways and explain the variance. As a Mono U Tron pilot, you need to know what spells to counter, what threats to tutor up off your Treasure Mage, what cards to discard to Thirst for Knowledge, and what win route you should take at any time. If you are experienced with the deck (like shoktroopa of MTGO Mono U Tron fame), then those decisions are easier. If not, the game could be lost off of just one misplay. This might account for the wide range in Mono U Tron GWP.
That leads me into my other big takeaway from the table. Almost all of these deck intervals cross our MTGO-wide GWP range of 67.94% - 69.28%. This suggests every deck in the dataset might actually just be within the expected range! Remember, we don't really know the "true" GWP for our decks or for the metagame, but we do know their ranges. So if those ranges are overlapping, which they are, it is possible all our GWPs might be the same. This would obviously be bad news for our analysis (or, at least, boring news).
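The overlap claim is easy to check mechanically. This sketch copies the low and high ends of each deck's interval from the table above and compares them to the MTGO-wide range. The helper names `overlaps` and `contains_field` are hypothetical, chosen only for this example.

```python
# MTGO-wide 95% interval from earlier in the article.
FIELD_LOW, FIELD_HIGH = 67.94, 69.28

# (deck, interval low, interval high) copied from the table above.
intervals = [
    ("Burn", 66.83, 72.51), ("Abzan", 66.16, 71.30),
    ("UR Twin", 66.53, 70.32), ("Affinity", 66.11, 73.99),
    ("Infect", 67.97, 73.38), ("Amulet Bloom", 63.49, 76.82),
    ("Abzan Liege", 65.15, 70.25), ("Merfolk", 62.80, 73.53),
    ("RG Tron", 61.32, 74.88), ("Scapeshift", 61.78, 75.31),
    ("UWR Control", 61.66, 69.50), ("4C Gifts", 64.37, 70.67),
    ("Storm", 59.64, 80.84), ("Mono U Tron", 63.31, 81.77),
    ("Bogles", 58.38, 78.92), ("Ad Nauseam", 62.68, 77.39),
    ("UW Control", 58.83, 72.38), ("Grixis Delver", 59.89, 75.51),
    ("BW Tokens", 59.30, 74.06), ("Nykthos Green", 58.69, 78.20),
    ("RUG Twin", 53.85, 82.38), ("Jund", 53.05, 85.04),
    ("UWR Midrange", 56.05, 84.51), ("Living End", 65.15, 74.77),
    ("Gruul Zoo", 62.16, 69.63), ("Esper Midrange", 54.52, 80.32),
]


def overlaps(low, high):
    """True if [low, high] shares any point with the field interval."""
    return low <= FIELD_HIGH and high >= FIELD_LOW


def contains_field(low, high):
    """True if [low, high] spans the entire field interval."""
    return low <= FIELD_LOW and high >= FIELD_HIGH


no_overlap = [d for d, lo, hi in intervals if not overlaps(lo, hi)]
partial = [d for d, lo, hi in intervals
           if overlaps(lo, hi) and not contains_field(lo, hi)]

print("No overlap with the field:", no_overlap)
print("Overlap without spanning the whole field range:", partial)
```

Every deck overlaps the field range, and Infect is the only one whose interval doesn't swallow the field range whole (its low end sits just above the field's low end).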
Thankfully, there is a way that we can test these confidence intervals against each other and figure out if a deck is really better or worse than the average, even if their ranges are overlapping.
T-Testing and MTGO decks
Time for more statistics! In this next section, we are going to use something called a two-sample T test to figure out if our deck GWP intervals are significantly different from the MTGO-wide GWP interval. In essence, this test compares two samples to see if their averages are significantly different from one another. We use a T test because our sample size is a bit small (as opposed to a Z test, which would be more appropriate for larger samples).
In each case, I am going to run the test between the GWP of one deck and the GWP average for all MTGO. Every time we run the test, we will get a P value. P ranges between 0 and 1. An easy way to think about P is that it tells you how likely we would be to see a gap at least this big between the two averages if they were actually the same, accounting for things like sample size, variance, etc. The higher the P, the more plausible it is that the averages are the same. In statistics, we want to see a P value of < .05 or < .01. Those thresholds correspond to the 95% and 99% confidence levels respectively. So we are looking for small P's. If P is small, then we can maybe conclude there is a real difference between the deck's GWP and the overall GWP.
(Remember that low P does not mean the GWP is necessarily more or less. It just means there is a significant difference between the GWP of the deck and the average GWP of MTGO).
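For anyone who wants to see the mechanics, here is a rough sketch of a two-sample T test run from summary numbers alone. A few loud assumptions: it uses Welch's unequal-variance version of the test, which is a guess on the variant; the P value is computed by numerically integrating the T Distribution's density rather than calling a stats library; and the standard deviations in the example call are not published figures. The Infect value (about 11.02) is back-calculated from its margin of error, and the field mean and SD (about 68.61 and 1.66) are recomputed from the 26 deck averages in the first table.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t_stat, df, steps=2000):
    """Two-sided P value via Simpson's rule on the t density."""
    upper = abs(t_stat)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3  # area between -|t| and +|t|
    return 1.0 - central_area


def welch_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's two-sample t test from summary statistics."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t_stat = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_stat, df, two_sided_p(t_stat, df)


# Infect vs. the field. Both standard deviations are ASSUMPTIONS derived
# from the article's tables, not published values.
t_stat, df, p_value = welch_t_test(70.67, 11.02, 66, 68.61, 1.66, 26)
print(f"t = {t_stat:.2f}, df = {df:.1f}, P = {p_value:.2f}")
```

With those inputs the P value lands near 0.14, in line with the Infect row of the results table. That suggests the sketch is at least close to how the table's numbers were produced.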
Here are the results of those tests, organized from lowest P to highest P.
Deck | # of finishes (1/28 - 3/19) | GWP interval: Low | GWP interval: Middle (avg GWP) | GWP interval: High | Margin of Error (+/-) | P (significance) |
---|---|---|---|---|---|---|
UWR Control | 38 | 61.66% | 65.58% | 69.50% | 3.92% | 0.13 |
Infect | 66 | 67.97% | 70.67% | 73.38% | 2.71% | 0.14 |
Gruul Zoo | 16 | 62.16% | 65.89% | 69.63% | 3.73% | 0.15 |
Affinity | 101 | 66.11% | 70.05% | 73.99% | 3.94% | 0.31 |
Burn | 196 | 66.83% | 69.67% | 72.51% | 2.84% | 0.31 |
UW Control | 22 | 58.83% | 65.60% | 72.38% | 6.77% | 0.37 |
Mono U Tron | 26 | 63.31% | 72.54% | 81.77% | 9.23% | 0.39 |
UR Twin | 160 | 66.53% | 68.43% | 70.32% | 1.90% | 0.39 |
Abzan | 188 | 66.16% | 68.73% | 71.30% | 2.57% | 0.40 |
Abzan Liege | 51 | 65.15% | 67.70% | 70.25% | 2.55% | 0.49 |
4C Gifts | 36 | 64.37% | 67.52% | 70.67% | 3.15% | 0.50 |
Living End | 17 | 65.15% | 69.96% | 74.77% | 4.81% | 0.56 |
BW Tokens | 21 | 59.30% | 66.68% | 74.06% | 7.38% | 0.59 |
Amulet Bloom | 53 | 63.49% | 70.15% | 76.82% | 6.66% | 0.65 |
Ad Nauseam | 23 | 62.68% | 70.04% | 77.39% | 7.36% | 0.69 |
Storm | 28 | 59.64% | 70.24% | 80.84% | 10.60% | 0.76 |
Grixis Delver | 21 | 59.89% | 67.70% | 75.51% | 7.81% | 0.81 |
UWR Midrange | 17 | 56.05% | 70.28% | 84.51% | 14.23% | 0.81 |
Esper Midrange | 15 | 54.52% | 67.42% | 80.32% | 12.90% | 0.85 |
Merfolk | 51 | 62.80% | 68.16% | 73.53% | 5.37% | 0.87 |
RG Tron | 49 | 61.32% | 68.10% | 74.88% | 6.78% | 0.88 |
RUG Twin | 18 | 53.85% | 68.12% | 82.38% | 14.26% | 0.94 |
Jund | 17 | 53.05% | 69.04% | 85.04% | 15.99% | 0.96 |
Nykthos Green | 21 | 58.69% | 68.44% | 78.20% | 9.76% | 0.97 |
Scapeshift | 41 | 61.78% | 68.54% | 75.31% | 6.76% | 0.98 |
Bogles | 25 | 58.38% | 68.65% | 78.92% | 10.27% | 0.99 |
Before I turn to those top decks (get 'em Infect!!), let's make a general observation and then look at Mono U Tron.
Overall, none of our P values are small enough to conclude statistical significance at the 95% or 99% levels. In fact, most decks aren't even close. Abzan and UR Twin are at .4, with Burn and Affinity just a little better at .31. This just means the overwhelming majority of these decks have a "true" GWP that may or may not fall within the MTGO-wide average. We just can't conclude much on these decks other than that they might be fairly average. Of course, this is complicated by all sorts of factors like who is piloting the deck, what list they are using, their matchups, etc. But overall, almost all of these decks have pretty insignificant P values.
Mono U Tron is right there alongside Twin and Abzan. Even though its 72.54% GWP is the highest of the group, the range of values on both sides of this average is too wide. Because of this variance, the difference between the Tron GWP and the MTGO-wide GWP is not significant at any level. This definitely undercuts some of the arguments I made in my article last week. The deck might have some top finishes pushing up its average, but it also has some very low ones. Mono U Tron might still have the "platinum" standard for average GWP, but the variance of that average is just a little too high.
And now, the moment we have all been waiting for. This next deck's GWP is not just one of the highest in the dataset, but also one (almost) significantly higher than the MTGO average. It also has one of the smallest margins of error for any deck in our table. This deck, ladies and gentlemen, is Infect.
GWP, P Values, and Infect!
Mono U Tron might not have received a lot of love in this analysis, but Infect sure did. The results are in and have been tallied: Infect may very well be MTGO's best deck (sorry, Tron!).
Just to recap, here are the Infect numbers as compared with the MTGO-wide ones.
Tier 1/Tier 2 Average GWP (NEW): 68.65%
Tier 1/Tier 2 95% Confidence Interval (NEW): 67.94% - 69.28%
Infect average GWP: 70.67%
Number of Infect decks in sample: 66
Infect 95% Confidence Interval: 67.97% - 73.38%
P value for T Test: .14
And again, just to recap, here's how you should read these numbers. We know the Tier 1/Tier 2 MTGO metagame has an average GWP between 67.94% and 69.28%. But the Infect GWP is between 67.97% and 73.38%, with an average GWP of 70.67%. Because its GWP variance is so low relative to its sample size, a two-sample T test reveals Infect's GWP is almost significantly different from the Tier 1/Tier 2 GWP. It's not quite significant because P is a little bigger than our targets of .05 or .01. But .14 is pretty darn close, and I am comfortable with highlighting it.
Ever since PT Fate Reforged (and maybe before, for many players) we have known Infect was awesome. A big part of this was obviously the addition of Become Immense to the format, a card that played an important role in Infect's win rate at the PT. The deck won 61% of its games at that event, the third-highest win rate of any deck there, behind only Abzan Liege (a deck even more custom-tailored to target the format's tier decks) and Amulet Bloom. Another big factor was Infect's win rate against Abzan itself (60%).
As a deck, Infect attacks the metagame in a variety of interesting ways. It's fast and punishes decks that miss a disruption draw. It's resilient to decks that use too much one-for-one disruption. It's linear which means it doesn't care too much about what an opponent is playing in a diverse field. It has strong inevitability which means you can't durdle around too much against it. And it has maindeck strengths against some of the best decks in the format (Inkmoth Nexus against Abrupt Decay and Thoughtseize, Spellskite and Vines of Vastwood against Splinter Twin, etc.).
This is a great example of the quantitative statistics confirming our qualitative observations. We can develop numerous theories, like those I just mentioned, about why Infect is a great deck for this current Modern format. But the MTGO numbers really drive that point home. Despite having a very respectable 70.67% GWP, Infect doesn't have the same GWP variance we see in almost every other deck in the format. Its GWP range is super tight. Given that the deck was played by 47 unique players in its 66 appearances, this further suggests it is the deck itself (and its metagame interaction) driving the successes, not just a few players carrying it to victory.
All of this is to say that Infect is a really strong deck for MTGO. Its average GWP is one of the highest online, and it is the only above-average GWP approaching a significant P value. So if you are looking for a "best" deck for your next Modern daily, this analysis suggests Infect is a very strong choice.
Next steps
In the end, my objective in this article isn't really to identify the best deck at all. It's mostly to give readers a new way to look at Modern data (MTGO in particular) and how you too can analyze and draw conclusions from that data. It also gives some sense of the almost endless ways you can process the data and subject it to every test and model you have probably dreaded (or missed) since college/high school.
There are lots of places we can go from here. Just looking at the data in this article and the last, here are some questions and ideas we might want to investigate.
- POOR UWR CONTROL! It's bad enough that this deck has the lowest average GWP on MTGO. It's even worse that the difference between its GWP and the Tier 1/Tier 2 GWP is almost statistically significant. This strongly suggests there is something wrong with this deck, and we might want to dig deeper into its challenges.
- Controlling for deck pilot: In last week's article, I tried to control for the effect of individual players on their deck's performance. But that was only done for Mono U Tron, not for any of the other decks. If we really want to complicate our understanding of GWP, we could break it down by player and see if there are significant differences between any given player's GWP and the overall population's.
- Analyzing rogue decks: Both this analysis and the Mono U Tron analysis only looked at Tier 1/Tier 2 GWPs. But there's nothing stopping us from adding in all the decks with fewer finishes to see how they rank. Smallpox Loam! Norin the Wary! Valakut Breach! 8Rack! Skred Red! Dredgevine! The MTGO world is our oyster. We could even re-run these analyses with the rogue decks in mind, adding them into our samples.
These are just a few of the many ways we can keep digging into the data to learn more about decks and the metagame.
So unearth those stats books from your storage rooms or bookshelves, open up some spreadsheets, and start your own deep dives into the MTGO statistics. I'm sure you will find all sorts of exciting treasures in the numbers.
Great work.
I know you must be putting in a ton of hours creating all of this content, more or less by yourself. I just wanted you to know that it’s definitely appreciated. Keep going!
We are both putting in a ton of hours and contributing about equal content. The first week has been uneven (me writing a lot, then Sheridan writing a lot), because it’s the first week, but we should have a more even spread going forward.
Just want to add to this that Sean is doing a LOT of backend work that isn’t necessarily reflected in the articles. It’s been great to partner up with him and to hear all the feedback from the Modern community on the site; glad you are enjoying the content!
I love that statistics is used to analyze stuff like this. Makes the MTG and math geek inside me so happy 😀