Detecting corporate fraud using Benford's law
Note: This is a silly application. Don't take anything seriously.
Benford's law describes a phenomenon where the first digits of the numbers in many data series follow a predictable pattern. For instance, if you took a list of the 1,000 longest rivers of Mongolia, or the average daily calorie consumption of mammals, or the wealth distribution of German soccer players, you would on average see that these numbers start with “1” about 30% of the time. I won't attempt to prove this, but essentially it's a result of scale invariance. It doesn't apply to all numerical series (IQ and shoe size are counterexamples), but the pattern turns out to pop up in a lot of places.
Since the theory predicts that the first digit follows a certain distribution, you can use it to find “strange” distributions that seem to disobey what we would expect statistically. The Wikipedia article mentions using Benford's law to detect accounting fraud, and Greece was busted by researchers who noted that the Greek macroeconomic data had an abnormally large deviation from what Benford's law would predict. There are also a couple of papers and an interesting blog post applying Benford's law to industry sectors.
For fun, I downloaded about 5,000 annual reports (10-K) for most publicly traded companies in the US, to see if there are big outliers.
Benford's law predicts that the probability of any first digit, 1-9, is
$$ Q(d) = \left( \log (d+1) - \log d \right) / \log 10 $$ .
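To make the formula concrete, here is a minimal Python sketch that tabulates the predicted probability for each first digit. The function name `benford_probabilities` is just an illustrative choice, not something from the original analysis.

```python
import math


def benford_probabilities():
    """Return {d: Q(d)} for first digits d = 1..9.

    Q(d) = (log(d + 1) - log(d)) / log(10), i.e. log10(d + 1) - log10(d).
    """
    return {d: math.log10(d + 1) - math.log10(d) for d in range(1, 10)}


Q = benford_probabilities()
for digit, prob in Q.items():
    print(f"{digit}: {prob:.3f}")
```

The probabilities telescope to exactly 1 when summed over the nine digits, and the entry for “1” comes out at roughly 30%, which matches the figure quoted earlier.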
For every annual report, I calculate the empirical distribution, $$ P(d) = n_d / \sum_i n_i $$ where $$ n_d $$ is just the number of occurrences of a dollar amount starting with digit d. To correct for reports with few values, I smooth the measured digit distribution a bit and add $$ 100 \cdot Q(d) $$ “fake” counts to each $$ n_d $$ , which pulls sparse reports toward the expected distribution.
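The smoothing step could look something like the sketch below. It assumes the fake counts are spread according to the Benford prediction Q(d), so that a report with very few numbers ends up close to the expected distribution. The names `smoothed_distribution` and `pseudo_total` are hypothetical.

```python
import math
from collections import Counter


def benford(d):
    """Benford's predicted probability Q(d) for first digit d."""
    return math.log10(1 + 1 / d)


def smoothed_distribution(first_digits, pseudo_total=100):
    """Empirical first-digit distribution P(d) with additive smoothing.

    Assumption: pseudo_total * Q(d) fake counts are added to each n_d
    before normalizing, so sparse reports shrink toward Benford's law.
    """
    counts = Counter(first_digits)
    smoothed = {
        d: counts.get(d, 0) + pseudo_total * benford(d) for d in range(1, 10)
    }
    total = sum(smoothed.values())
    return {d: c / total for d, c in smoothed.items()}


# A report with only three amounts barely moves away from Benford's law,
# while a report with thousands of amounts is dominated by its own counts.
sparse = smoothed_distribution([8, 8, 8])
dense = smoothed_distribution([8] * 900)
```

With 100 fake counts in total, the smoothing matters a lot for a report with a dozen figures and hardly at all for one with thousands.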
To measure the difference between expected and actual distributions, I use the KL-divergence which boils down to
$$ D_{P \mid Q} = \sum_i \log \left( P(i) / Q(i) \right) P(i) $$
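As a rough illustration of the formula, the sketch below computes the KL-divergence between a digit distribution P and the Benford prediction Q. The function names are illustrative, and the natural logarithm is an assumption (the choice of base only rescales the scores, so it does not change the ranking).

```python
import math


def benford(d):
    """Benford's predicted probability Q(d) for first digit d."""
    return math.log10(1 + 1 / d)


def kl_divergence(p, q):
    """D(P | Q) = sum_i P(i) * log(P(i) / Q(i)).

    Terms with P(i) == 0 contribute nothing, by the usual convention.
    """
    return sum(p[i] * math.log(p[i] / q[i]) for i in p if p[i] > 0)


Q = {d: benford(d) for d in range(1, 10)}
uniform = {d: 1 / 9 for d in range(1, 10)}

print(kl_divergence(Q, Q))        # identical distributions give zero
print(kl_divergence(uniform, Q))  # a flat digit distribution scores higher
```

The divergence is zero only when the two distributions agree, and it grows as the observed digits drift away from the prediction, which is what makes it usable as a “fishiness” score.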
I downloaded the annual reports from the SEC and extracted all figures from all tables containing dollar amounts. Since some amounts may occur many times and skew the digit distribution, I only looked at the unique amounts that occurred in each report. I then extracted the first non-zero digit of each amount.
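The deduplication and digit extraction can be sketched as follows. This is not the actual parsing code: it assumes the amounts have already been pulled out of the tables as strings, and the helper names (`parse_amount`, `first_nonzero_digit`, `first_digits_of_unique_amounts`) are made up for illustration.

```python
import re
from decimal import Decimal, InvalidOperation


def parse_amount(text):
    """Turn a string such as '$1,234.00' or '(567)' into a Decimal magnitude.

    Everything except digits and the decimal point is dropped, so the sign
    is ignored. Returns None when no number can be recovered.
    """
    cleaned = re.sub(r"[^0-9.]", "", text)
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def first_nonzero_digit(value):
    """Return the first non-zero digit of a Decimal, or None for zero."""
    match = re.search(r"[1-9]", str(value))
    return int(match.group()) if match else None


def first_digits_of_unique_amounts(raw_amounts):
    """Deduplicate amounts by numeric value, then take each first digit."""
    parsed = (parse_amount(text) for text in raw_amounts)
    unique = {value for value in parsed if value is not None}
    digits = (first_nonzero_digit(value) for value in unique)
    return [d for d in digits if d is not None]


sample = ["$1,234", "1,234.00", "(567)", "0.042", "$0", "n/a"]
print(sorted(first_digits_of_unique_amounts(sample)))
```

Deduplicating on the numeric value rather than the raw string means that “$1,234” and “1,234.00” count once, and amounts that are exactly zero are skipped because they have no leading digit.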
The distributions of digits for the top five outlier entries illustrate Benford's law in practice:
On closer inspection, some of these seem legit. For instance, the #1 spot on the list, Mid-America Apartment Communities, Inc. has a long list of units across the country, and the average price per unit happens to cluster around $800.
Below is a list of the 100 companies with the largest KL-divergence (most “fishy”). None of the companies stands out as having an outrageous distribution, and even the top companies on the list are very unlikely to have committed fraud. The prior probability of accounting fraud is extremely low, so we would commit the prosecutor's fallacy by singling out any of these companies as fraudulent. Anyway, I'll follow up with a new blog post in five years to see if any of the companies below were actually caught:
0.1311 |
0.0578 |
0.0497 |
0.0474 |
0.0461 |
0.0414 |
0.0406 |
0.0391 |
0.0390 |
0.0388 |
0.0387 |
0.0382 |
0.0382 |
0.0381 |
0.0370 |
0.0370 |
0.0364 |
0.0359 |
0.0354 |
0.0345 |
0.0342 |
0.0340 |
0.0339 |
0.0326 |
0.0323 |
0.0319 |
0.0319 |
0.0313 |
0.0310 |
0.0310 |
0.0308 |
0.0304 |
0.0300 |
0.0294 |
0.0293 |
0.0293 |
0.0292 |
0.0292 |
0.0291 |
0.0290 |
0.0286 |
0.0285 |
0.0282 |
0.0281 |
0.0279 |
0.0276 |
0.0276 |
0.0276 |
0.0275 |
0.0272 |
0.0271 |
0.0270 |
0.0269 |
0.0269 |
0.0268 |
0.0268 |
0.0267 |
0.0265 |
0.0265 |
0.0264 |
0.0258 |
0.0258 |
0.0258 |
0.0255 |
0.0254 |
0.0253 |
0.0249 |
0.0249 |
0.0249 |
0.0248 |
0.0247 |
0.0247 |
0.0247 |
0.0246 |
0.0245 |
0.0245 |
0.0244 |
0.0244 |
0.0243 |
0.0243 |
0.0240 |
0.0238 |
0.0238 |
0.0237 |
0.0236 |
0.0235 |
0.0235 |
0.0235 |
0.0235 |
0.0234 |
0.0234 |
0.0232 |
0.0232 |
0.0232 |
0.0231 |
0.0231 |
0.0230 |
0.0230 |
0.0229 |
0.0229 |
Again, a bunch of disclaimers: this is just a silly application, don't take it seriously, elevator inspection certificate available in the building manager's office, etc.