George Zipf, the late American linguist, would recently have celebrated his 111th birthday. In honor of the occasion, let us all take the time to remember Zipf’s law, one of the most freakishly accurate regularities in linguistics and economics.
Zipf’s law describes the frequency distribution of a shockingly large number of data sets. It states that the frequency of any observation is inversely proportional to its rank in a frequency table. To put it another way, in a data set that follows a Zipfian distribution, a few observations occur very frequently, a moderate number occur with medium frequency, and many occur much less frequently. Here’s a simplified version of what Zipf’s law looks like in a frequency table.
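Written as a formula (our own notation, not Zipf's), with f(r) standing for the frequency of the observation at rank r, the simplest form of the law says that rank times frequency stays roughly constant:

```latex
f(r) \approx \frac{f(1)}{r}
\quad\Longleftrightarrow\quad
r \cdot f(r) \approx f(1), \qquad r = 1, 2, 3, \ldots
```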
This phenomenon is easiest to see through one of its most famous applications. Zipf noticed that in the English language there seemed to be a pattern in how often words appear. For example, “the” is the most common word in English and “of” is the second most common, and “of” appears roughly half as often as “the.” Following the same pattern, the third most common word appears about 1/3 as often as “the,” and the 99th most common word about 1/99 as often. The pattern holds remarkably well across English, and similar patterns have been observed in many other languages.
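To try the word-frequency pattern on a text of your own, here's a minimal Python sketch (our own illustration, not Zipf's method) that ranks words by how often they appear and places each count next to the Zipf prediction: the top word's count divided by rank. The function name `zipf_table` and the simple regex tokenizer are assumptions made for this example. On a single sentence the numbers mean little; the pattern only shows up on a large body of text.

```python
import re
from collections import Counter


def zipf_table(text, top_n=10):
    """Rank words by frequency and pair each actual count with the
    Zipf prediction top_count / rank. Returns (rank, word, count, predicted)."""
    # Hypothetical tokenizer: lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    ranked = Counter(words).most_common(top_n)
    if not ranked:
        return []
    top_count = ranked[0][1]  # frequency of the rank-1 word
    return [
        (rank, word, count, top_count / rank)
        for rank, (word, count) in enumerate(ranked, start=1)
    ]


if __name__ == "__main__":
    sample = "the cat and the dog saw the bird of the town of the king"
    for rank, word, count, predicted in zipf_table(sample, top_n=3):
        print(f"{rank:>4}  {word:<6} actual={count:<3} zipf={predicted:.1f}")
```

Feeding in a long document (a novel, say) in place of `sample` gives a much closer match between the actual and predicted columns.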
Zipf’s law also holds for a variety of other data, including the ranking of U.S. city populations, the revenue distribution of companies, and even library book checkout patterns. Interestingly, Zipf’s law is not a law at all, but rather an empirical statistical model that fits many (but not all) ranked data sets. Read more about Zipf’s law in this New York Times article written in 2010.
Curious to see how Zipf’s law applies to the Big 4 North American sports leagues? We were too.
At first, we were disappointed to find that it does not apply to attendance, market value, ticket price, or historical wins for teams in any of the Big 4 leagues. It wasn’t until we realized that Zipf’s law describes human preferences on a macro scale that we found a match. English speakers, taken together, choose to say some words far more often than others. Population distribution among cities reflects the decisions of an entire nation’s inhabitants about where to live.
Professional sports attendance does not meet Zipf’s criteria. On any given night, people on the West Coast can’t choose to attend a sporting event on the East Coast, and vice versa; their choices are limited by geography.
So, what in sports resembles the distribution of a large population’s preferences?
Social media. Let’s take a look.
The following graphs and tables compare the actual number of ‘Likes’ and ‘Follows’ for NBA teams (in blue) with the ‘Likes’ and ‘Follows’ predicted by Zipf’s law (in red).
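The post doesn't spell out how the red "predicted" values were computed. A natural reading, which we assume here, is that they are anchored to the most-followed team: the team ranked r is predicted to have the top team's count divided by r. Here's a minimal Python sketch of that assumption; the function name `zipf_predictions` and the team names and counts in the usage lines are hypothetical placeholders, not the real NBA data.

```python
def zipf_predictions(actual_counts):
    """Given {team: likes_or_follows}, return (rank, team, actual, predicted)
    tuples, assuming predictions are anchored to the rank-1 team's count."""
    # Rank teams from most to fewest likes/follows (ties keep input order).
    ranked = sorted(actual_counts.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return []
    top_count = ranked[0][1]  # anchor: the most-followed team's actual count
    return [
        (rank, team, count, top_count / rank)
        for rank, (team, count) in enumerate(ranked, start=1)
    ]


if __name__ == "__main__":
    # Hypothetical placeholder counts, for illustration only.
    made_up = {"Team A": 1_000_000, "Team B": 480_000, "Team C": 350_000}
    for rank, team, actual, predicted in zipf_predictions(made_up):
        print(f"{rank:>2}  {team:<8} actual={actual:>9,}  zipf={predicted:>11,.0f}")
```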
For you statistics nerds: the values predicted by Zipf’s law explained 95% and 98% of the variation in actual ‘Likes’ and ‘Follows’, respectively (an R² of 0.95 and 0.98). If you’re interested, take a look at the full regression summary statistics for Facebook and Twitter.
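"Percent of variation explained" is the R² of a regression. We don't know the exact regression setup behind the numbers above, so this sketch assumes an ordinary least-squares fit of actual counts on predicted counts with an intercept. In that case R² equals the squared Pearson correlation between the two lists. The function name `r_squared` is our own.

```python
def r_squared(predicted, actual):
    """R^2 of a simple OLS fit of `actual` on `predicted` (with intercept),
    computed as the squared Pearson correlation of the two sequences."""
    n = len(predicted)
    if n != len(actual) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(predicted) / n
    mean_y = sum(actual) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predicted, actual))
    sxx = sum((x - mean_x) ** 2 for x in predicted)
    syy = sum((y - mean_y) ** 2 for y in actual)
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when either sequence is constant")
    return (sxy * sxy) / (sxx * syy)
```

Passing the red (predicted) and blue (actual) columns as two lists would give a value between 0 and 1; on Python 3.10 or later, `statistics.correlation(predicted, actual) ** 2` should return the same number.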
Anyone can choose to ‘Like’ a team on Facebook or ‘Follow’ it on Twitter regardless of where they live. Social media followings for sports teams are a perfect example of human preferences at work, even though why they follow a Zipfian distribution remains hard to explain. I think the only mystery greater than Zipf’s law is why people continue to ‘Like’ and ‘Follow’ the Los Angeles Lakers after a disappointing 25-29 start to the season.