Erik Bernhardsson    About

Detecting corporate fraud using Benford’s law

Note: This is a silly application. Don’t take anything seriously.

Benford’s law describes a phenomenon where numbers in any data series will exhibit patterns in their first digit. For instance, if you took a list of the 1,000 longest rivers of Mongolia, or the average daily calorie consumption of mammals, or the wealth distribution of German soccer players, you will on average see that these numbers start with “1” about 30% of the time. I won’t attempt at proving this, but essentially it’s a result of scale invariance. It doesn’t apply to all numerical series, like IQ or shoe size, but this pattern turns out to pop up in a lot of places.

Since the theory predicts that the first digit follows a certain outcome, you can use it to find “strange” distributions that seem to disobey what we can expect statistically. The Wikipedia article mentions using Benford’s law to detect accounting fraud, and Greece was busted by researchers noting that the Greek macroeconomic data had an abnormally large deviation from what Benford’s law would predict. There’s another couple of papers and an interesting blog post applying Benford’s law to industry sectors.

For fun, I downloaded about 5,000 annual reports (10-K) for most publicly traded companies in the US, to see if there are big outliers.

Benford’s law predict that the probability of any first digit, 1-9, is


For every annual report, I calculate the empirical distribution, where is just the number of occurrences of a dollar amount starting with digit d. To correct for reports with few values, I smooth the measured digit distribution a bit and add “fake” counts to each .

To measure the difference between expected and actual distributions, I use the KL-divergence which boils down to


I downloaded the annual reports from SEC and extracted all figures from all tables containing dollar amounts. Since some amounts may occur many times, and skew the digit distribution, I only looked at the unique amounts that occurred in the report. I then extracted first non-zero digit of all amounts.

The distributions of digits for the top five outlier entries illustrate Benford’s law in practice:


On closer inspection, some of these seem legit. For instance, the #1 spot on the list, Mid-America Apartment Communities, Inc. has a long list of units across the country, and the average price per unit happens to cluster around $800.

Below is a list containing the 100 companies with the largest KL-divergence (most “fishy”). None of the companies stand out as having an outrageous distribution, and even the top companies on the list are very unlikely to have commit fraud. The prior belief of accounting fraud is basically extremely low. We would commit the prosecutor’s fallacy for singling out any of these numbers as fraudulent. Anyway, I’ll follow up with a new blog post in five years to see if any of the companies below were actually caught:

0.1311 Mid-America Apartment Communities, Inc.
0.0578 Power Integrations Inc.
0.0497 United Natural Foods,Inc.
0.0474 Lexicon Pharmaceuticals, Inc
0.0461 Pacific Office Properties Trust, Inc.
0.0414 Xilinx
0.0406 Host Hotels & Resorts Inc.
0.0391 World Acceptance Corp
0.0390 Immunomedics, Inc.
0.0388 Marriott International Inc.
0.0387 CVS Caremark Corporation
0.0382 Paychex, Inc.
0.0382 Luna Innovations Incorporated
0.0381 Capstead Mortgage Corporation
0.0370 Verso Paper Corp.
0.0370 Fastenal Co.
0.0364 Insperity, Inc.
0.0359 Diamond Hill Investment Group Inc.
0.0354 National Security Group Inc.
0.0345 GameStop Corp.
0.0342 Compass Minerals International Inc.
0.0340 SIRIUS XM Radio Inc.
0.0339 BP Prudhoe Bay Royalty Trust
0.0326 Investors Bancorp Inc.
0.0323 Kohlberg Capital Corporation
0.0319 Equity One
0.0319 Kona Grill Inc.
0.0313 Alliance Financial Corporation
0.0310 Zale Corporation
0.0310 Anadarko Petroleum Corporation
0.0308 Sigma-Aldrich Corp.
0.0304 Global Cash Access Holdings, Inc.
0.0300 Corcept Therapeutics
0.0294 Enbridge Energy Management LLC
0.0293 BJ’s Restaurants Inc.
0.0293 Air Transport Services Group, Inc.
0.0292 Fairchild Semiconductor International Inc.
0.0292 Universal Electronics Inc.
0.0291 Espey Manufacturing & Electronics Corp.
0.0290 Inland Real Estate Corporation
0.0286 W. R. Berkley Corporation
0.0285 Albemarle Corp.
0.0282 Koss Corp.
0.0281 Leap Wireless International Inc.
0.0279 Encore Wire Corp.
0.0276 UQM Technologies, Inc.
0.0276 DuPont Fabros Technology Inc.
0.0276 Applied Materials Inc.
0.0275 Destination Maternity Corporation
0.0272 Pepsico, Inc.
0.0271 CorVel Corporation
0.0270 Nathan’s Famous Inc.
0.0269 Sport Chalet, Inc.
0.0269 Key Technology Inc.
0.0268 Overhill Farms Inc.
0.0268 Digi International Inc.
0.0267 Materion Corporation
0.0265 DreamWorks Animation SKG Inc.
0.0265 NIC Inc.
0.0264 ANSYS Inc.
0.0258 Volterra Semiconductor Corporation
0.0258 Verenium Corporation
0.0258 KeyCorp
0.0255 Rockwell Collins Inc.
0.0254 Meritage Homes Corporation
0.0253 Perrigo Co.
0.0249 Zhone Technologies Inc
0.0249 McGrath RentCorp
0.0249 A.M. Castle & Co.
0.0248 Delta Natural Gas Co. Inc.
0.0247 Pervasive Software Inc.
0.0247 Senomyx
0.0247 ManTech International Corp.
0.0246 Ross Stores Inc.
0.0245 Bancorp Of New Jersey, Inc.
0.0245 Werner Enterprises
0.0244 Dillards Inc.
0.0244 Sparton Corp.
0.0243 Rudolph Technologies Inc.
0.0243 CyberOptics Corp.
0.0240 Hallador Energy Company
0.0238 DARA BioSciences, Inc
0.0238 Chico’s FAS Inc.
0.0237 Delcath Systems Inc.
0.0236 Pure Cycle Corp.
0.0235 Cytori Therapeutics
0.0235 Vonage Holdings Corporation
0.0235 Spectranetics Corporation
0.0235 Regal-Beloit Corporation
0.0234 ScanSource, Inc.
0.0234 Weyco Group Inc
0.0232 Ambassadors Group Inc.
0.0232 Rent-A-Center Inc.
0.0232 Accenture plc
0.0231 Idenix Pharmaceuticals
0.0231 KAR Auction Services, Inc.
0.0230 Progressive
0.0230 BCSB Bankcorp Inc.
0.0229 PCTEL, Inc.
0.0229 Cincinnati Financial Corp.

Again, a bunch of disclaimers: this is just a silly application, don’t take it seriously, elevator inspection certificate available in the building manager’s office, etc.

Erik Bernhardsson

... is the CTO at Better, which is a startup changing how mortgages are done. I write a lot of code, some of which ends up being open sourced, such as Luigi and Annoy. I also co-organize NYC Machine Learning meetup. You can follow me on Twitter or see some more facts about me.