This section contains statistical details for calculating the rarity of longest runs and longest sequences in the Explorer Patterns platform.
To calculate the rarity for longest runs, first define the following variables:
n = the number of rows in the column
k = the number of times a specific value occurs in the column
p = k/n = the probability of observing the specific value in the column
m = the length of the run
N = the number of unique runs
Then, the rarity for longest runs is calculated as follows:
$\text{Rarity} = -\log_2\left(1 - \left(1 - p^{m-1}\right)^{N}\right)$
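As an illustration only, the longest-run formula above can be evaluated with a few lines of Python. The function name `longest_run_rarity` and the parameter name `num_runs` (standing in for N) are hypothetical and are not part of the platform. The sketch assumes the exponent on p is m − 1, and it evaluates 1 − (1 − q)^N through `expm1` and `log1p` so that very small probabilities do not round to zero prematurely.

```python
import math


def longest_run_rarity(n: int, k: int, m: int, num_runs: int) -> float:
    """Evaluate -log2(1 - (1 - p**(m - 1))**N) with p = k / n and N = num_runs.

    Hypothetical helper for illustration; not the platform's own code.
    """
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    if m < 1 or num_runs < 1:
        raise ValueError("need m >= 1 and num_runs >= 1")

    p = k / n            # probability of observing the specific value
    q = p ** (m - 1)     # the p^(m-1) term of the formula
    if q == 1.0:
        # (1 - q)^N is 0, so the argument of the log is 1 and the rarity is 0.
        return 0.0

    # 1 - (1 - q)^N, written as -expm1(N * log1p(-q)) for numerical stability.
    prob = -math.expm1(num_runs * math.log1p(-q))
    if prob == 0.0:
        # The log argument underflowed to 0; the rarity is unbounded.
        return math.inf
    return -math.log2(prob)


# Example: a value filling half of 100 rows, a run of length 4, 10 unique runs.
print(longest_run_rarity(n=100, k=50, m=4, num_runs=10))
```

Because the logarithm is base 2, each additional unit of rarity corresponds to halving the probability inside the logarithm, so larger values indicate less likely runs.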
To calculate the rarity for longest sequences, first define the following variables (note that p and k are redefined here and differ from their meanings for longest runs):
p = the probability of observing the specific sequence one time in the column
k = the number of times the starting value of the sequence occurs in the column
Then, the rarity for longest sequences is calculated as follows:
$\text{Rarity} = -\log_2\left(1 - (1 - p)^{k}\right)$
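The longest-sequence formula above can be sketched in the same way. The function name `longest_sequence_rarity` is hypothetical, and the sketch assumes that k acts as an exponent on (1 − p). How p itself is estimated for a given sequence is not specified here, so the function simply takes it as an input.

```python
import math


def longest_sequence_rarity(p: float, k: int) -> float:
    """Evaluate -log2(1 - (1 - p)**k).

    p: probability of observing the specific sequence one time in the column.
    k: number of times the starting value of the sequence occurs in the column.
    Hypothetical helper for illustration; not the platform's own code.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("need 0 <= p <= 1")
    if k < 1:
        raise ValueError("need k >= 1")
    if p == 1.0:
        # (1 - p)^k is 0, so the argument of the log is 1 and the rarity is 0.
        return 0.0

    # 1 - (1 - p)^k, written as -expm1(k * log1p(-p)) for numerical stability.
    prob = -math.expm1(k * math.log1p(-p))
    if prob == 0.0:
        # The log argument underflowed to 0; the rarity is unbounded.
        return math.inf
    return -math.log2(prob)


# Example: a sequence with probability 0.25 whose starting value occurs twice.
print(longest_sequence_rarity(p=0.25, k=2))
```

For a fixed p, increasing k raises the probability inside the logarithm and therefore lowers the rarity, which matches the intuition that a sequence is less surprising when its starting value occurs more often.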