To calculate the rarity for longest runs, first define the following variables:
n = the number of rows in the column
k = the number of times a specific value occurs in the column
p = k/n = the probability of observing the specific value in the column
m = the length of the run
N = the number of unique runs
Then, the rarity for longest runs is calculated as follows:
$$\text{Rarity} = -\log_2\left(1 - \left(1 - p^{m-1}\right)^{N}\right)$$
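The run formula above can be sketched in Python as follows. This is only an illustrative sketch: the function names `longest_run_rarity` and `rarity_from_trials` and the parameter name `num_runs` (standing in for N) are hypothetical and are not taken from any product API. The use of `math.log1p` and `math.expm1` is an added choice for numerical stability when p^(m − 1) is very small; it is mathematically equivalent to the formula as written.

```python
import math


def rarity_from_trials(p_event, trials):
    """Return -log2(1 - (1 - p_event) ** trials), guarding the edge cases."""
    if trials <= 0 or p_event <= 0.0:
        # The event can never be observed, so the rarity is unbounded.
        return math.inf
    if p_event >= 1.0:
        # The event is certain, so it carries no surprise.
        return 0.0
    # 1 - (1 - p_event) ** trials, computed without losing small values.
    p_any = -math.expm1(trials * math.log1p(-p_event))
    if p_any <= 0.0:
        return math.inf
    return -math.log2(p_any)


def longest_run_rarity(n, k, m, num_runs):
    """Rarity of a longest run of length m (hypothetical helper).

    n        -- number of rows in the column
    k        -- number of times the specific value occurs
    m        -- length of the run
    num_runs -- number of unique runs (N in the formula)
    """
    if n <= 0 or not 0 <= k <= n or m < 1:
        raise ValueError("expected n > 0, 0 <= k <= n, and m >= 1")
    p = k / n                  # probability of observing the value
    p_run = p ** (m - 1)       # p^(m - 1) from the formula
    return rarity_from_trials(p_run, num_runs)


if __name__ == "__main__":
    # With p = 0.5, m = 3, N = 1 the formula reduces to -log2(0.25).
    print(longest_run_rarity(n=8, k=4, m=3, num_runs=1))
```

With a single run (N = 1) the expression collapses to −log2(p^(m − 1)), which is a convenient sanity check for any implementation.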
To calculate the rarity for longest sequences, first define the following variables (note that p and k have different meanings here than in the longest-run calculation):
p = the probability of observing the specific sequence one time in the column
k = the number of times the starting value of the sequence occurs in the column
Then, the rarity for longest sequences is calculated as follows:
$$\text{Rarity} = -\log_2\left(1 - \left(1 - p\right)^{k}\right)$$
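A matching sketch for the sequence formula is shown below. As before, the function name `longest_sequence_rarity` is hypothetical, and the code assumes that p has already been estimated by the caller, since the text does not say how the single-occurrence probability of a sequence is derived.

```python
import math


def longest_sequence_rarity(p, k):
    """Rarity of a longest sequence (hypothetical helper).

    p -- probability of observing the sequence one time in the column
    k -- number of times the starting value of the sequence occurs
    """
    if not 0.0 <= p <= 1.0 or k < 0:
        raise ValueError("expected 0 <= p <= 1 and k >= 0")
    if k == 0 or p == 0.0:
        # The sequence can never start or never complete: unbounded rarity.
        return math.inf
    if p == 1.0:
        # The sequence is certain, so it carries no surprise.
        return 0.0
    # 1 - (1 - p) ** k, computed in a way that stays accurate for tiny p.
    p_any = -math.expm1(k * math.log1p(-p))
    if p_any <= 0.0:
        return math.inf
    return -math.log2(p_any)


if __name__ == "__main__":
    # With k = 1 the formula reduces to -log2(p).
    print(longest_sequence_rarity(p=0.25, k=1))
```

The more often the starting value occurs (larger k), the more chances the sequence has to appear, so the rarity falls toward zero for a fixed p.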