Statistical Analysis of the Influence of the Few on the Many

Authors

  • Žiga Šolar University of Ljubljana
  • Martina Starc University of Ljubljana

Abstract

Introduction

The saying “a few have most of the effect” rings true in many aspects of daily life, from business and wealth management to phone app usage and in-app purchases. The phenomenon, first described by Vilfredo Pareto, is known as the Pareto principle [1]. Pareto distributions are extremely skewed; the best-known example is the 80/20 rule, under which 80% of outcomes are due to 20% of causes.

In mobile gaming, Pareto distributions are prevalent in player spending on in-app purchases, and they tend to be even more extreme than 80/20. With such skewed distributions, standard statistical procedures such as the t-test could yield p values that are too small and therefore an increased risk of alpha (Type I) error. On the other hand, sample sizes in mobile game experiments are usually large, which could mean that the t-test is reliable due to the central limit theorem.

We will evaluate the behaviour of standard statistical tests and measures of central tendency on samples coming from distributions that are zero-inflated (most users have a value of 0) and extremely skewed (most of the value comes from a few users).

Method

We are interested in how the parameters of the population distribution influence our ability to correctly detect differences in average revenue between groups in A/B tests (also known as split tests). To manipulate the distribution parameters, we will create several simulated control datasets that mimic user spending in mobile games, as well as testing datasets with increased or decreased spending.
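One way such datasets could be simulated is sketched below. This is only an illustration under stated assumptions, not the generator used in the study: the function name `simulate_spending`, the payer rate, the Pareto shape and scale, and the 10% uplift are all hypothetical choices made for the example.

```python
import numpy as np

def simulate_spending(n_users, payer_rate, shape, scale, rng):
    """Zero-inflated, heavy-tailed spending (hypothetical model).

    Most users spend 0; each paying user draws an amount from a
    classical Pareto distribution with minimum `scale`.
    """
    is_payer = rng.random(n_users) < payer_rate
    # Generator.pareto samples the Lomax (Pareto II) form; adding 1 and
    # multiplying by `scale` gives the classical Pareto with minimum `scale`.
    amounts = scale * (1.0 + rng.pareto(shape, size=n_users))
    return np.where(is_payer, amounts, 0.0)

rng = np.random.default_rng(42)

# Assumed parameters: 5% of users pay, tail shape 1.5.
control = simulate_spending(100_000, payer_rate=0.05, shape=1.5, scale=1.0, rng=rng)
# Assumed treatment effect: every paying user's spend is scaled up by 10%.
testing = simulate_spending(100_000, payer_rate=0.05, shape=1.5, scale=1.1, rng=rng)

share_zero = np.mean(control == 0)
top_1pct_share = np.sort(control)[-1000:].sum() / control.sum()
print(f"zero spenders: {share_zero:.1%}, revenue share of top 1%: {top_1pct_share:.1%}")
```

Lower values of `shape` give heavier tails, so varying `payer_rate` and `shape` is one possible way to manipulate zero inflation and skewness separately.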

In the second phase of the study, we will randomly sample from both the control and testing datasets to obtain smaller samples for simulated A/B tests. We will use the t-test and the Mann-Whitney U test to compare groups and check whether the observed rate of significant results is consistent with the chosen alpha.
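The repeated-sampling procedure could look roughly like the sketch below. It assumes SciPy's `ttest_ind` (here in its Welch form, which is one possible choice) and `mannwhitneyu`; the helper `rejection_rates`, the population model, the sample size, and the number of simulations are hypothetical values chosen for illustration. Both groups are drawn from the same population, so every significant result is a false positive.

```python
import numpy as np
from scipy import stats

def rejection_rates(pop_a, pop_b, sample_size, n_sims, alpha, rng):
    """Share of simulated A/B tests in which each test rejects at `alpha`."""
    t_rejections = 0
    u_rejections = 0
    for _ in range(n_sims):
        a = rng.choice(pop_a, size=sample_size, replace=False)
        b = rng.choice(pop_b, size=sample_size, replace=False)
        t_p = stats.ttest_ind(a, b, equal_var=False).pvalue
        u_p = stats.mannwhitneyu(a, b, alternative="two-sided").pvalue
        t_rejections += int(t_p < alpha)
        u_rejections += int(u_p < alpha)
    return t_rejections / n_sims, u_rejections / n_sims

rng = np.random.default_rng(0)

# Hypothetical population: 5% payers with classical Pareto spend (shape 1.5).
n_users = 50_000
is_payer = rng.random(n_users) < 0.05
population = np.where(is_payer, 1.0 + rng.pareto(1.5, size=n_users), 0.0)

# Null scenario: both groups come from the same population.
t_rate, u_rate = rejection_rates(
    population, population, sample_size=1_000, n_sims=200, alpha=0.05, rng=rng
)
print(f"t-test rejection rate: {t_rate:.3f}, Mann-Whitney rejection rate: {u_rate:.3f}")
```

Passing a testing dataset as `pop_b` instead would turn the same loop into an estimate of statistical power. Note that the Mann-Whitney U test compares distributions rather than means, so the two rates answer slightly different questions.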

Results and Implications

The dataset simulation and the tests for statistically significant results are still in progress. The first datasets have been generated, but the second phase has not yet started.

The results will offer recommendations about sample size that can improve the methodology of split tests in the gaming industry, of experiments in consumer behaviour, and of studies of any other phenomena [2] where the Pareto distribution is present.

References

[1] B. C. Arnold, “Pareto Distribution,” Wiley StatsRef: Statistics Reference Online, pp. 1–10, 2015.

[2] K. J. Patten, K. Greer, A. D. Likens, E. L. Amazeen, and P. G. Amazeen, “The trajectory of thought: Heavy-tailed distributions in memory foraging promote efficiency,” Mem. Cognit., vol. 48, no. 5, pp. 772–787, 2020.

Published

2022-06-23