Benford’s Law

In the first year of the Information Systems program at the Marriott School of Management (BYU), students in the Enterprise Programming class work on a programming project titled “Benford’s Law.” It’s a really practical application of Benford’s Law and specific programming concepts, all wrapped into one project.

For those of you unfamiliar with Benford’s Law, the basic premise is this: in many real-world datasets, the leading digit of each number (1-9, since 0 adds no value as a leading digit) occurs with a specific probability that depends on the digit. This leading-digit probability follows a logarithmic distribution: digit d leads with probability log10(1 + 1/d). Thus, the digit 1 has about a 30.1% chance of being the leading digit, the digit 2 approximately 17.6%, the digit 3 roughly 12.5%, and so on down to about 4.6% for the digit 9. This doesn’t seem to make sense initially, since you’d think each of the nine digits would have an equal chance (about 11%) of being the leading digit. However, many real-world datasets, especially ones that span several orders of magnitude, have shown otherwise.
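To make the percentages above concrete, here is a minimal Python sketch that computes the expected leading-digit probabilities from the standard Benford formula, log10(1 + 1/d). The function name `benford_probability` is just an illustrative choice, not something taken from the class project or from Picalo.

```python
import math


def benford_probability(digit: int) -> float:
    """Expected probability that `digit` (1-9) is the leading digit."""
    if not 1 <= digit <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)


# Build the full expected distribution for digits 1 through 9.
expected = {d: benford_probability(d) for d in range(1, 10)}

for d, p in expected.items():
    print(f"{d}: {p:.1%}")
```

The nine probabilities sum to 1, which is a quick sanity check that the formula describes a complete distribution.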

Benford’s Law Logarithmic Distribution

I ran across a sweet website in my Internet travels: Testing Benford’s Law. It takes a couple of real-world datasets and checks them against Benford’s Law. Some examples include Stack Overflow user reputation, the most common iPhone passcodes, and file sizes in the Linux source tree. Other articles across the web give more examples: Volcanic eruptions follow Benford’s Law and Fraudsters obey Benford’s Law.
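If you want to try this kind of test yourself, the rough idea is to count the leading digits in a dataset and compare the observed frequencies with the expected ones. The sketch below is only an illustration under a few assumptions: it handles non-zero integers only, the helper names `leading_digit` and `leading_digit_frequencies` are made up for this post, and the sample data is simply the first 100 powers of 2, which stand in for a real dataset.

```python
import math
from collections import Counter


def leading_digit(n: int) -> int:
    """Return the first digit of a non-zero integer (sign is ignored)."""
    if n == 0:
        raise ValueError("zero has no leading digit")
    return int(str(abs(n))[0])


def leading_digit_frequencies(values):
    """Map each digit 1-9 to the share of values that start with it."""
    counts = Counter(leading_digit(v) for v in values)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}


# Stand-in dataset (an assumption for this sketch): 2**1 through 2**100.
sample = [2 ** k for k in range(1, 101)]
observed = leading_digit_frequencies(sample)

for d in range(1, 10):
    expected = math.log10(1 + 1 / d)
    print(f"{d}: observed {observed[d]:.1%}, expected {expected:.1%}")
```

To test your own data, swap `sample` for any list of non-zero integers. A large gap between the observed and expected columns is a hint that the data deserves a closer look, not proof that anything is wrong.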

Another example worth checking out is an application called Picalo (GNU). Picalo is a fraud-detection application written in Python by Dr. Albrecht, the Marriott School of Management professor who assigns the Benford’s Law project. Dr. Albrecht has included a module in Picalo that specifically uses Benford’s Law to analyze data and aid in fraud detection. You can read more about Picalo and check out the picalo.Benfords module.

You can read a more detailed description of Benford’s Law on Wikipedia.


Comments, questions and feedback welcome.
