# Lesson 10. Random number generation, testing for independence

### SA421 Fall 2015

## Some useful Python-isms

* Modular arithmetic in Python: `x % y` is $x \mod y$


* Quickly, give it a shot: compute $53 \mod 7$.

* We can access the last item of a list with the `-1` index: for example, `X[-1]` is the last item of list `X`.


* We can also **slice** a list: `X[a:b]` contains the items of `X` from index `a` up to and including index `b - 1`. Indices start at 0, so the item at index `b` is not included.


* Try it out:

In [None]:
# Here's a list
X = [23, 45, 67, 89, 101, 112, 131, 415]

# Get the last item of X
print("The last item of X is {0}.".format( ))

# Make a slice of X containing X[2], X[3], X[4], and X[5]:
print("The slice containing X[2], X[3], X[4], and X[5] is {0}.".format( ))
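
One way to fill in the blanks above, as a minimal sketch. The variable names `remainder`, `last_item`, and `middle` are ours and not part of the lesson.

```python
# Modular arithmetic: the remainder when 53 is divided by 7
remainder = 53 % 7
print("53 mod 7 is {0}.".format(remainder))

# The list from the exercise above
X = [23, 45, 67, 89, 101, 112, 131, 415]

# Negative indices count from the end, so X[-1] is the last item
last_item = X[-1]
print("The last item of X is {0}.".format(last_item))

# X[2:6] starts at index 2 and stops just before index 6
middle = X[2:6]
print("The slice containing X[2], X[3], X[4], and X[5] is {0}.".format(middle))
```

Note that the stop index of the slice is 6, not 5, because the stop index itself is excluded.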

## The linear congruential method

**Example.** Generate 50 pseudo-random numbers using the linear congruential method with a
modulus of $2^{31}$, a multiplier of 1103515245, an increment of 12345, and a seed of 123457.

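*Historical note.* These parameters are the ones used by glibc, the C library used with the GNU Compiler Collection (GCC). For the parameters used by other compilers and libraries, see the [Wikipedia article on linear congruential generators](https://en.wikipedia.org/wiki/Linear_congruential_generator).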

In [None]:
# Multiplier
a = 

# Increment
c = 

# Modulus
m = 

# Initialize sequence of integers with seed
X = 

# Compute sequence of integers


# Print sequence of integers
print("X = {0}".format(X))

# Compute pseudo-random numbers based on sequence of integers in X
R = 

# Print pseudo-random numbers
print("R = {0}".format(R))
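
Here is a sketch of one possible solution, using the recurrence $X_{i+1} = (a X_i + c) \mod m$ and $R_i = X_i / m$. It assumes the seed is $X_0$ and does not count as one of the 50 generated values. If your convention counts the seed, adjust the loop and the slice accordingly. It also assumes Python 3, where `/` is floating-point division.

```python
# Multiplier, increment, and modulus from the example
a = 1103515245
c = 12345
m = 2**31

# Initialize the sequence of integers with the seed (treated as X_0: an assumption)
X = [123457]

# Each new integer depends only on the most recent one, X[-1]
for i in range(50):
    X.append((a * X[-1] + c) % m)

# Scale the 50 generated integers (everything after the seed) into [0, 1)
R = [x / m for x in X[1:]]

print("X = {0}".format(X))
print("R = {0}".format(R))
```

Because every integer in `X` lies between 0 and `m - 1`, every value in `R` lies in $[0, 1)$.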

## Testing for independence

* First, some setup code:

In [None]:
##### Setup #####
# Import plot, bar from Matplotlib
from matplotlib.pyplot import plot, bar

# Run Matplotlib magic to show plots directly in the notebook
%matplotlib inline

# Make Matplotlib plots display as SVG files, which are cleaner
# (set_matplotlib_formats from IPython.display is deprecated in newer IPython versions)
%config InlineBackend.figure_format = 'svg'

# Import pearsonr from scipy.stats
from scipy.stats import pearsonr

* Consider the sequence of random numbers we generated above. Are they independent?


* Let's start by plotting these random numbers:
    * `marker` is a formatting keyword for the `plot` function.
    * Other examples of formatting keywords: `linestyle`, `color`.
    * See the documentation for plot [here](http://matplotlib.org/1.3.1/api/pyplot_api.html?highlight=pyplot.plot#matplotlib.pyplot.plot).

In [None]:
# Plot random numbers

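
A minimal sketch of the plot, with the random numbers regenerated so that the cell runs on its own. As before, treating the seed as $X_0$ is an assumption. The name `lines` is ours: `plot` returns a list of the line objects it draws.

```python
from matplotlib.pyplot import plot

# Regenerate the 50 pseudo-random numbers (seed treated as X_0: an assumption)
a, c, m = 1103515245, 12345, 2**31
X = [123457]
for i in range(50):
    X.append((a * X[-1] + c) % m)
R = [x / m for x in X[1:]]

# Plot each number against its position in the sequence,
# drawing circle markers only, with no connecting line
lines = plot(R, linestyle='None', marker='o')
```

Passing a single list to `plot` uses the positions 0, 1, 2, ... as the horizontal coordinates, so the plot shows the numbers in the order they were generated.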

* Based on this plot, do you think the random numbers are independent?

Visually, they look independent: there is no discernible pattern in the sequence of random numbers.

* Let's compute the lag-$k$ autocorrelations for $k = 0,1,\dots,20$.


* Recall that the lag-$k$ autocorrelation is the observed sample correlation coefficient between $(y_0,\dots, y_{n - k - 1})$ and $(y_{k},\dots,y_{n-1})$.


* We can compute the observed sample correlation coefficient between two samples using `pearsonr` from `scipy.stats`.


* To compute autocorrelation, we need to apply `pearsonr` to the correct portions of `R`.


* `pearsonr(x, y)` computes the observed sample correlation coefficient between lists of values `x` and `y`.
    * Returns a pair of values: (correlation coefficient, p-value).
    * For our purposes, we want only the correlation coefficient, which is the first value: `pearsonr(x, y)[0]`.

In [None]:
# Number of observations/random numbers
n = 

# Lag-k autocorrelation for k = 0,1,...,20
lagAC = 
    
# Print for inspection
# The 0-th item is the lag-0 autocorrelation, 
# the 1st item is the lag-1 autocorrelation, and so on.
print("lag-k autocorrelations = {0}".format(lagAC))

* Let's use `plot` to get a visual of what's going on:

In [None]:
# Plot autocorrelations
plot(lagAC, linestyle='None', marker='o')

* **Rule of thumb:** if the autocorrelations are "small" (absolute value less than 0.3), then <span style="color:#a00000;">do not reject</span> the hypothesis that the random variates are independent.


* **Caution!** When testing a sequence of values for independence, <span style="color:#a00000;">make sure your values are in their original order.</span>
    * In particular, make sure that you haven't sorted the sequence. (Why?)

## If we have time &mdash; with a neighbor...

Perform the Kolmogorov-Smirnov goodness-of-fit test to determine whether the random numbers generated in the previous example are from a uniform distribution on [0,1]. Print the observed test statistic and p-value. What can you conclude?

*Write your discussion here. Double-click to edit.*
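
As a starting point, here is a sketch that computes only the Kolmogorov-Smirnov test statistic by hand against the uniform distribution on $[0,1]$, whose cdf is $F(x) = x$. It does not compute the p-value. For that, `kstest(R, 'uniform')` from `scipy.stats` returns both the statistic and the p-value. The names `D_plus`, `D_minus`, and `D` are ours, and treating the seed as $X_0$ is an assumption.

```python
# Regenerate the 50 pseudo-random numbers (seed treated as X_0: an assumption)
a, c, m = 1103515245, 12345, 2**31
X = [123457]
for i in range(50):
    X.append((a * X[-1] + c) % m)
R = [x / m for x in X[1:]]

n = len(R)

# The test compares the empirical cdf with F(x) = x, so sort the values first
R_sorted = sorted(R)

# Largest amount by which the empirical cdf rises above F:
# just after the item at index i, the empirical cdf equals (i + 1)/n
D_plus = max((i + 1) / n - r for i, r in enumerate(R_sorted))

# Largest amount by which the empirical cdf falls below F:
# just before the item at index i, the empirical cdf equals i/n
D_minus = max(r - i / n for i, r in enumerate(R_sorted))

# Observed test statistic
D = max(D_plus, D_minus)
print("Observed K-S test statistic = {0}".format(D))
```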

Compute and plot the lag-$k$ autocorrelation ($k = 0, 1,\dots, 10$) for the random numbers generated in the first example, **sorted from lowest to highest** as you just did to perform the Kolmogorov-Smirnov test. Does your plot make sense? Why are these new autocorrelation values not useful for testing for independence?

*Write your discussion here. Double-click to edit.*