"Evidence of a confounding variable (Trainer ID?) in CD shiny rates"

Preface: This post is not offered as proof that the CD shiny rate depends on Trainer ID. Nor is it a complaint about the lower shiny rates seen on my account; I get more than enough CD shinies to be happy with my hauls. Instead, this post is provided as evidence that there may be some confounding factor preventing certain trainers from seeing the CD shiny rate of 1/24.8 proposed by the Silph Research Group.

TL;DR: The null hypothesis (Ho) is that the CD shiny rate on my account is the same as the overall average CD shiny rate of 1/24.8. The data is sufficient to reject Ho at the 99.95% confidence level. Some sort of bias, such as a confounding factor, is worth considering.

Intro

I’ve been putting off writing this post for a while now for several reasons. One was laziness, but the real reason was that the results of my analysis run contrary to the widely held belief that CD shinies are the result of pure, unbiased RNG and are the same for everyone. As you’ll see, my data fairly strongly favors the presence of a confounding variable that introduces a bias into the CD shiny rates based on some unknown criteria.

Methodology:

For just about every single-species CD of the past year and a half (so excluding the December CDs), I set my Go+ to spin stops only and record the total number of the CD species I’ve seen to that point, as indicated by my Pokedex. During CD hours, I play as usual, shiny checking and catching, and noting the number of shinies that flee (this only happened once, actually). At the end of my playing time, I record the number of the CD species seen (from the Pokedex) and the number of shinies seen (total caught + number that fled). The difference in seen is my N value for the day, and the number of shinies seen is my S value for the day. These values are recorded in an Excel spreadsheet.
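The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the author's spreadsheet: the class and field names are hypothetical, and the starting and ending Pokedex counts in the example are made up so that their difference matches the Chikorita entry (N = 348, S = 12).

```python
import math
from dataclasses import dataclass


@dataclass
class CommunityDayTally:
    """One day's record; class and field names are hypothetical."""
    species: str
    seen_start: int       # Pokedex "seen" count before CD hours
    seen_end: int         # Pokedex "seen" count when play stops
    shinies_caught: int
    shinies_fled: int = 0

    @property
    def n(self) -> int:
        """N: encounters during the event (difference in 'seen')."""
        return self.seen_end - self.seen_start

    @property
    def s(self) -> int:
        """S: shinies seen (caught plus fled)."""
        return self.shinies_caught + self.shinies_fled

    @property
    def one_in(self) -> float:
        """Daily rate expressed as '1 in X'; infinite if no shiny was seen."""
        return self.n / self.s if self.s else math.inf


# Assumption: start/end counts are invented; only their difference (348)
# and S = 12 correspond to the Chikorita figures reported in the post.
day = CommunityDayTally("Chikorita", seen_start=1000, seen_end=1348, shinies_caught=12)
print(day.n, day.s, round(day.one_in, 1))
```

With these inputs the sketch reports N = 348, S = 12, and a daily rate of 1 in 29.0.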
The data can be seen below in Table 1.

Table 1:

| Species | Total seen (N) | Shinies seen (S) | Daily rate (1 in X) |
|---|---|---|---|
| Chikorita | 348 | 12 | 29.0 |
| Beldum | 343 | 9 | 38.1 |
| Cyndaquil | 220 | 8 | 27.5 |
| Totodile | 115 | 2 | 57.5 |
| Swinub | 562 | 12 | 46.8 |
| Treecko | 394 | 7 | 56.3 |
| Bagon | 156 | 5 | 31.2 |
| Torchic | 189 | 6 | 31.5 |
| Slakoth | 256 | 9 | 28.4 |
| Mudkip | 327 | 9 | 36.3 |
| Ralts | 307 | 12 | 25.6 |
| Turtwig | 378 | 11 | 34.4 |
| Trapinch | 62 | 4 | 15.5 |
| Chimchar | 157 | 5 | 31.4 |
| Rhyhorn | 239 | 4 | 59.8 |
| Abra | 497 | 14 | 35.5 |
| Seedot | 583 | 29 | 20.1 |
| Weedle | 548 | 17 | 32.2 |
| TOTAL | 5681 | 175 | 32.5 |

Off the bat, you’ll note that my rate was better than the researched average on only two CDs, Trapinch and Seedot, and that my total rate (1/32.5) was noticeably worse than the researched average (1/24.8). But how much worse, and is it statistically significant? To find out, I turned to a two-sample test of binomial proportions to determine whether the two samples (mine and the Silph study data) were drawn from the same population. For this, we’ll need the following:

* p1: My observed shiny rate (1/32.5)
* N1: My total number seen (5681)
* p2: The Silph research rate (1/24.8)
* N2: The Silph research sample size (433,341)
* Ho: p1 = p2
* Ha: p1 ≠ p2

I originally did not know N2, though it was large enough for Silph Research to report a fairly narrow window (24.4 to 25.1) for the 95% confidence interval, so I first took it as N2 = 100,000. Thanks to u/rzztmass for pointing out that N2 = 433,341. Plugging through the numbers with N2 = 433,341, we come up with a z-score of 3.628, which corresponds to a p-value of .000285 (the original N2 = 100,000 gave a z-score of 3.568 and a p-value of .000359). In other words, the data is sufficient to reject the null hypothesis at the 99.95% level of confidence.

It should be noted that changing N2 to another arbitrarily large number does not have a significant effect on the final results. For example, taking N2 = 50,000 yields a z-score of 3.496 (p = .000473), while taking N2 = 1,000,000 yields a z-score of 3.639 (p = .000274).

Nor does changing p2 to a round number at the high end of the 95% confidence interval (1/25) have a significant effect. With N2 = 100,000, we get a z-score of 3.529 (p = .000417).
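For anyone who wants to check the arithmetic, here is a minimal Python sketch of a pooled two-proportion z-test with a two-sided p-value. The post does not say which formula was used, so the pooled form is an assumption; it is shown because it reproduces the z-scores reported for p2 = 1/24.8 to about two decimal places. The function name is hypothetical, and the Silph shiny count is approximated as p2 × N2 because the raw count is not given in the post.

```python
import math


def two_proportion_z_test(p1, n1, p2, n2):
    """Pooled two-sample z-test for equal proportions (two-sided p-value)."""
    # Pooled proportion under Ho: p1 = p2 (successes approximated as p * n).
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / std_err
    # Two-sided tail area of the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


N1, S1 = 5681, 175      # totals from Table 1
P2 = 1 / 24.8           # Silph Research rate

# Repeat the test for the reported N2 and the alternative sizes tried in the post.
results = {
    n2: two_proportion_z_test(S1 / N1, N1, P2, n2)
    for n2 in (50_000, 100_000, 433_341, 1_000_000)
}
for n2, (z, p) in results.items():
    print(f"N2 = {n2:>9,}: z = {z:.3f}, p = {p:.6f}")
```

Because N2 is so much larger than N1, the standard error is dominated by the 1/N1 term, which is why the z-score barely moves as N2 changes.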
Edit: With p2 = 1/25 and N2 = 433,341, we get a z-score of 3.519 (p = .0004327).

So what does this mean?

Analysis isn’t my strong suit, but the numbers speak for themselves. Rejecting the null hypothesis at the 99.95% confidence level provides a strong indication that this account, under these circumstances, did not experience the researched CD shiny rate. As such, there is likely some sort of confounding variable introducing a bias into the results. Here are some potential sources of bias that I could think of:

* Bias in collecting the data. I did my best to keep the data unbiased, but maybe the peanut gallery will find some error in how I collected the data.
* Sample size. Maybe 5681 is too small a sample, and I’d see a regression to the mean with a larger sample size. Definitely possible, but the equations take sample size into account, and the results (p < .0005) strongly indicate something else is at play.
* Time within the CD window. I mostly played for only the first two hours of each CD. If shiny rates were higher in the third hour, my data would be skewed toward a low rate. However, I find it unlikely that Niantic would change shiny rates within a CD.
* Geographic region. Maybe shinies just aren’t as common in my area. Again, I find this unlikely for several reasons, including that a few accounts in my area average 10-15 shinies per hour of play while checking similar numbers of Pokemon per hour as I do.
* Trainer ID. While I don’t find this likely, it’s the only option I can’t find a strong argument against, especially when I consider the accounts in my community that consistently have CD shiny rates significantly higher than mine.

Conclusion

Again, I’m not offering this post as proof that CD shiny rates are account specific. However, I think it is worth looking into with further analysis.
Smarter people than I will figure out a way to study this, but one way I see it being studied is as follows: Of the volunteers in the Silph study that contributed to the CD shiny results research, split the accounts into three groups: those that experienced higher than average CD shiny rates, those that experienced approximately average CD shiny rates, and those that experienced lower than average CD shiny rates. Observe the shiny rates for the next six months, and see if past performance is an indicator of future results.

Thanks for taking the time to read this, and happy travels!

via /r/TheSilphRoad https://www.reddit.com/r/TheSilphRoad/comments/hsga4v/evidence_of_a_confounding_variable_trainer_id_in/?utm_source=ifttt
