Comparing Epsilon Greedy and Thompson Sampling model for Multi-Armed Bandit algorithm on Marketing Dataset

Izzatul Umami, Lailia Rahmawati


A/B checking is a regular measure in many marketing procedures for e-Commerce companies. Through well-designed A/B research, advertisers can gain insight about when and how marketing efforts can be maximized and active promotions driven. Whilst many algorithms for the problem are theoretically well developed, empirical confirmation is typically restricted. In practical terms, standard A/B experimentation makes less money relative to more advanced machine learning methods. This paper presents a thorough empirical study of the most popular multi-strategy algorithms. Three important observations can be made from our results. First, simple heuristics such as Epsilon Greedy and Thompson Sampling outperform theoretically sound algorithms in most settings by a significant margin. In this report, the state of A/B testing is addressed, some typical A/B learning algorithms (Multi-Arms Bandits) used to optimize A/B testing are described and comparable. We found that Epsilon Greedy, be an exceptional winner to optimize payouts in this situation.

Article Metrics

Abstract: 458 Viewers PDF: 364 Viewers


machine learning;Multi-Arms Bandits;marketing

Full Text:



P. K. Andersen and R. D. Gill, “Institute of Mathematical Statistics is collaborating with JSTOR to digitize, preserve, and extend access to The Annals of Statistics. ®,” Statistics (Ber)., vol. 10, no. 3, pp. 1100–1120, 1982, doi: 10.1214/aos/1176348654.

J. Vermorel and M. Mohri, “Multi-armed bandit algorithms and empirical evaluation,” Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), vol. 3720 LNAI, pp. 437–448, 2005, doi: 10.1007/11564096_42.

V. Kuleshov and D. Precup, “Algorithms for multi-armed bandit problems,” vol. 1, pp. 1–32, 2014, [Online]. Available:

T. De Feyter, “Modelling heterogeneity in manpower planning : dividing the personnel system into more homogeneous subgroups,” Appl. Stoch. Model. Bus. Ind., no. March, pp. 321–334, 2006, doi: 10.1002/asmb.

S. Bubeck and N. Cesa-Bianchi, “Regret analysis of stochastic and nonstochastic multi-armed bandit problems,” Found. Trends Mach. Learn., vol. 5, no. 1, pp. 1–122, 2012, doi: 10.1561/2200000024.

M. N. Katehakis and A. F. Veinott, “Multi-Armed Bandit Problem: Decomposition and Computation.,” Math. Oper. Res., vol. 12, no. 2, pp. 262–268, 1987, doi: 10.1287/moor.12.2.262.

S. Agrawal and N. Goyal, “Analysis of thompson sampling for the multi-armed bandit problem,” J. Mach. Learn. Res., vol. 23, pp. 1–26, 2012.

A. Ö. Saritaç and C. Tekin, “Combinatorial multi-armed bandit problem with probabilistically triggered arms: A case with bounded regret,” 2017 IEEE Glob. Conf. Signal Inf. Process. Glob. 2017 - Proc., vol. 2018-January, pp. 111–115, 2018, doi: 10.1109/GlobalSIP.2017.8308614.

T. Lu, D. Pál, and M. Pál, “Showing Relevant Ads via Lipschitz Context Multi-Armed Bandits,” Thirteen. Int. Conf. Artif. Intell. Stat., vol. 9, pp. 485–492, 2010, [Online]. Available:

K. Liu and Q. Zhao, “Distributed learning in multi-armed bandit with multiple players,” IEEE Trans. Signal Process., vol. 58, no. 11, pp. 5667–5681, 2010, doi: 10.1109/TSP.2010.2062509.

N. Cesa-Bianchi, “Multi-armed Bandit Problem,” Encycl. Algorithms, pp. 1–5, 2014, doi: 10.1007/978-3-642-27848-8_768-1.

E. Even-Bar, S. Mannor, and Y. Mansour, “Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems,” J. Mach. Learn. Res., vol. 7, pp. 1079–1105, 2006.

P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire, “Gambling in a rigged casino: the adversarial multi-armed bandit problem,” Annu. Symp. Found. Comput. Sci. - Proc., pp. 322–331, 1995, doi: 10.1109/sfcs.1995.492488.

D. Chakrabarti and R. Kumar, “Mortal Multi-Armed Bandits,” pp. 1–8.

S. Bubeck, R. Munos, and G. Stoltz, “Pure exploration in multi-armed bandits problems,” Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), vol. 5809 LNAI, pp. 23–37, 2009, doi: 10.1007/978-3-642-04414-4_7.

S. Pandey, D. Chakrabarti, and D. Agarwal, “Multi-armed bandit problems with dependent arms,” ACM Int. Conf. Proceeding Ser., vol. 227, pp. 721–728, 2007, doi: 10.1145/1273496.1273587.

E. Kaufmann, O. Cappé, and A. Garivier, “On the complexity of best-arm identification in multi-armed bandit models,” J. Mach. Learn. Res., vol. 17, pp. 1–42, 2016.


  • There are currently no refbacks.


Journal of Applied Data Sciences

ISSN : 2723-6471 (Online)
Organized by : Departement of Information System, Universitas Amikom Purwokerto, Indonesia; Computer Science and Systems Information Technology, King Abdulaziz University, Kingdom of Saudi Arabia.
Website :
Email : (principal contact) (managing editor) (technical issues)

 This work is licensed under a Creative Commons Attribution-ShareAlike 4.0