Python as a Tool for Monte Carlo Simulation

Written by Yeng Chang

One of my (nerdy) pastimes is to go on the Mathematics Stack Exchange website and answer questions about all sorts of mathematics, ranging from basic algebra to actuarial science to some real analysis and abstract algebra.

I recently ran into a very interesting question on the Math Stack Exchange website, which can be found here.

Quick summary of the problem that the question posed: imagine you’re a DJ. You have 200 songs available in your repertoire, and 13 people each give you a list of 5 distinct song requests. What is the probability that at least 16 songs will each appear twice among the song requests?

My response can be found here. I first tried an “analytic” approach to the problem by assuming that one song being represented twice is independent of any other song being represented twice, and that one person’s choice of a song is independent of another person’s choice. This resulted in a probability that was very close to 0 (around 10^-21).

But these assumptions are very implausible. I wasn’t satisfied with that answer.

So my first thought was, okay, let’s try Excel. My plan was to generate groups of 13 people, each making 5 distinct song requests, where each song took on a value from 1 to 200. I tried to do this using “regular” cell formulas I found online, but this proved to be a very difficult task, and I gave up on Excel very quickly. (Maybe if I knew VBA better, the outcome would have been different, but unfortunately, I didn’t.)

Then a light bulb went on in my mind: why don’t I use Python? Python makes it very easy to generate random lists of distinct numbers with the sample function in its random module. To solve this problem, I generated a list of 5 “songs” (that is, numbers from 1 to 200) for each of the 13 people, put all of these songs into a single list, and created another list holding the frequency with which each song occurred. If there were at least 16 songs with a frequency of 2, I added 1 to a counter variable.
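For the curious, here is a minimal sketch of the kind of simulation described above. It is not my original script: the names run_trial and estimate_probability are made up for illustration, and I am assuming collections.Counter for the frequency count rather than a hand-built frequency list. It also reads “a frequency of 2” as “requested exactly twice.”

```python
import random
from collections import Counter

NUM_SONGS = 200      # size of the DJ's repertoire
NUM_PEOPLE = 13      # number of people handing in requests
REQUESTS_EACH = 5    # distinct songs requested per person
THRESHOLD = 16       # "at least 16 songs" with a frequency of 2


def run_trial(rng):
    """Simulate one set of requests; return how many songs were requested exactly twice."""
    requests = []
    for _ in range(NUM_PEOPLE):
        # random.sample draws without replacement, so one person's 5 songs are distinct
        requests.extend(rng.sample(range(1, NUM_SONGS + 1), REQUESTS_EACH))
    frequencies = Counter(requests)  # song number -> number of times requested
    return sum(1 for count in frequencies.values() if count == 2)


def estimate_probability(num_trials, seed=None):
    """Estimate P(at least THRESHOLD songs are requested exactly twice) by simulation."""
    rng = random.Random(seed)  # a seed makes the run reproducible (an illustrative choice)
    hits = 0                   # the "counter variable"
    for _ in range(num_trials):
        if run_trial(rng) >= THRESHOLD:
            hits += 1
    return hits / num_trials
```

Calling something like estimate_probability(100_000) gives one estimate of the probability; the result will vary from run to run unless a seed is fixed.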

I ran this simulation for 100,000 trials and then 1,000,000 trials. The estimated probability tended to come out at around 1% for 100,000 trials and 0.1% for 1,000,000 trials, so for the purposes of the person who asked the question, the probability was very, very close to 0.

Moral of the story? The more programming you know, the better. You never know when you will need it. I am excited to start my work with Coaching Actuaries in about two weeks, and actually, when I visited Coaching Actuaries in March, I found a need to use Maple to perform some very tedious algebra that would have been too time-consuming to do by hand. Programming is a very valuable skill to have when working with complicated situations.