Benford's law

Benford's law describes the uneven distribution of digits in naturally generated decimal files. Simon Newcomb discovered it in 1881 after noticing that the pages of books of logarithm tables were not worn evenly: the early pages, covering numbers that begin with low digits, showed consistently more wear than the later ones. Frank Benford rediscovered the phenomenon in 1938.

Newcomb stated the law in its most general form: the mantissa of the logarithm of the numbers is uniformly distributed. Benford set out to check the law empirically and also correctly guessed its equation for the first digit: ρ(n) = log10[(n+1)/n], where ρ(n) is the probability that the first digit is n (n = 1, 2, 3, …, 8, 9). ρ(n) decreases monotonically, so digit 9 is found about 6.6 times less often than digit 1. The law is also called “the first-digit law”. Benford showed that it holds for many naturally generated decimal files.
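The first-digit equation above is easy to evaluate directly. The following is a minimal Python sketch; the function name first_digit_probability is chosen here for illustration and does not come from Newcomb or Benford. It tabulates ρ(n) for n = 1 to 9 and computes how much more frequent digit 1 is than digit 9.

```python
import math

def first_digit_probability(n: int) -> float:
    """Benford probability that the leading decimal digit equals n (1..9)."""
    if not 1 <= n <= 9:
        raise ValueError("a leading digit must be between 1 and 9")
    return math.log10((n + 1) / n)

# Tabulate the probability of each possible leading digit.
probabilities = {n: first_digit_probability(n) for n in range(1, 10)}
for n, p in probabilities.items():
    print(f"digit {n}: {p:.4f}")

# How many times more frequent digit 1 is than digit 9.
ratio_1_to_9 = probabilities[1] / probabilities[9]
print(f"digit 1 / digit 9 = {ratio_1_to_9:.2f}")
```

The nine probabilities sum to 1 because the product of the ratios (n+1)/n for n = 1 to 9 telescopes to 10, and log10(10) = 1.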

Misconception: Benford's law applies only to the first digits of numbers.

NOT TRUE. Benford's law holds for the first, second, third, or any other digit position of decimal data, although the distribution becomes flatter with each successive position. The law was originally stated mostly in terms of the first digit, which cannot be 0. The second and higher positions naturally include the digit 0 as a distinct possibility.
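As an illustration of the second-digit case, the sketch below assumes the standard generalization that follows from a uniformly distributed logarithmic mantissa: the probability that the second digit is d equals the sum, over the first digits k = 1 to 9, of log10(1 + 1/(10k + d)). This formula is not written out in the text above, and the function name second_digit_probability is hypothetical.

```python
import math

def second_digit_probability(d: int) -> float:
    """Probability that the second decimal digit equals d (0..9).

    Assumed formula: sum over every possible first digit k of the
    probability that the number starts with the two digits "kd".
    """
    if not 0 <= d <= 9:
        raise ValueError("a second digit must be between 0 and 9")
    return sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))

second_digit = [second_digit_probability(d) for d in range(10)]
for d, p in enumerate(second_digit):
    print(f"second digit {d}: {p:.4f}")
```

Under this assumption the digit 0 is the most likely second digit (roughly 12%) and 9 the least likely (roughly 8.5%), a much flatter spread than for the first digit.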

Benford's law applies to any decimal file that is compressed to the Shannon limit. In a binary file at the Shannon limit, all the bits other than the 0s are 1s. In the 0, 1, 2 counting system (base 3), the ratio between the digits 1 and 2 is 63:37, and in the 0, 1, 2, 3 counting system (base 4), the ratios between the digits 1, 2 and 3 are 50:29:21. In the same way, a compressed decimal file has the Benford's law distribution.
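The ratios quoted above can be checked numerically. The sketch below assumes that the first-digit formula carries over to a counting system with base b simply by changing the base of the logarithm, so that the probability of the nonzero digit n is log_b[(n+1)/n]. The function name first_digit_distribution is hypothetical.

```python
import math

def first_digit_distribution(base: int) -> list:
    """Probabilities of the nonzero digits 1..base-1 in the given base.

    Assumption: the base-10 formula generalizes by taking the
    logarithm in the base of the counting system.
    """
    if base < 2:
        raise ValueError("the base must be at least 2")
    return [math.log((n + 1) / n, base) for n in range(1, base)]

# Express each distribution as rounded percentages.
for base in (2, 3, 4, 10):
    percentages = [round(100 * p) for p in first_digit_distribution(base)]
    print(f"base {base}: {percentages}")
```

With this assumption, base 2 gives a single nonzero digit with probability 1, base 3 gives approximately 63:37, and base 4 gives approximately 50:29:21, matching the figures in the paragraph above.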

Why does calculating the Shannon limit give us no information about the “0”s? Strictly speaking, zero has no entropy and therefore does not count. More formally, entropy is logarithmic, and this is also why the frequencies of the digits change logarithmically (exactly like the distances on a slide rule).

Why is entropy logarithmic? Because that IS the way God plays dice!