Mortality statistics and Sweden's "dry tinder" effect
We live in a year of about 350,000 amateur epidemiologists and I have no desire to join that “club”. But I read something about COVID-19 deaths that I thought was interesting and wanted to see if I could replicate it with data. Basically the claim is that Sweden had an exceptionally “good” year in 2019 in terms of influenza deaths, leaving more deaths “overdue” in 2020.
This post is not an attempt to draw any scientific conclusions! I just wanted to see if I could get my hands on any data and visualize it. I'm going to share some plots and leave it to the reader to draw their own conclusions, or run their own experiments, or whatever they want to do!
As it turns out, the Human Mortality Database has some really awesome statistics about “short-term mortality fluctuations” so let's see what we can do with it!
*Rolls up sleeves.*
Let's first look at the most basic time series plot. We'll start with the Nordics:
There's a lot of seasonality! And a lot of noise! Let's make it a bit easier to follow trends by looking at rolling 1 year averages:
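For anyone who wants to follow along, here is a minimal sketch of a trailing rolling average. It assumes the data is a plain list of weekly death rates and that "1 year" means a 52-week window; the function name `rolling_mean` and the numbers are made up for illustration, not taken from the dataset.

```python
from collections import deque


def rolling_mean(values, window=52):
    """Trailing mean over the last `window` points.

    Returns None for positions where the window is not yet full.
    """
    out = []
    buf = deque()
    total = 0.0
    for v in values:
        buf.append(v)
        total += v
        if len(buf) > window:
            # Drop the oldest point so the sum always covers `window` values.
            total -= buf.popleft()
        out.append(total / window if len(buf) == window else None)
    return out


# Hypothetical weekly death rates (deaths/day per 1M), invented for this example.
weekly = [26, 25, 24, 22, 21, 20, 21, 23]
smoothed = rolling_mean(weekly, window=4)  # short window just to keep the demo small
```

With real weekly data you would keep the default `window=52`, which averages out most of the within-year seasonality.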
Phew, that's a bit easier on my poor eyes. As you can see, it's not an unreasonable claim that Sweden had a “good year” in 2019 — overall death rates dropped from 24 to 23 deaths/day per 1M. That's a pretty huge drop! Until looking at this chart, I had never anticipated death rates to be so volatile from year to year. I also would have never anticipated that death rates are so seasonal:
Unfortunately the dataset doesn't break out causes of death, so we don't know what's driving this. Amazingly, from a cursory online search, there seems to be no research consensus on why it's so seasonal. It's easy to picture something about people dying in cold climates, but interestingly the seasonality isn't much different between, say, Sweden and Greece:
What's also interesting is that the beginning of the year contains most of the variation in what counts as a “bad” or a “good” year. You can see that by looking at year-to-year correlations in death rates broken down by quarter. The correlation is much lower for quarter 1 than for other quarters:
(I only used data up until 2018-2019 for this scatterplot since COVID-19 causes a weird cluster of points)
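A rough sketch of the year-to-year correlation for a single quarter is below. It assumes you have already reduced the data to one death rate per year for that quarter; the helper names and the sample numbers are hypothetical, and the real scatterplot presumably pools many countries rather than one series.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def year_over_year_correlation(rate_by_year):
    """Correlate the rate in year T with the rate in year T+1.

    `rate_by_year` maps year -> death rate for one fixed quarter.
    Years without a following year in the data are skipped.
    """
    pairs = [
        (rate_by_year[y], rate_by_year[y + 1])
        for y in sorted(rate_by_year)
        if y + 1 in rate_by_year
    ]
    xs, ys = zip(*pairs)
    return pearson(xs, ys)


# Invented Q1 rates: a steady trend gives a correlation of about +1,
# a strictly alternating series gives about -1.
trending = {2010: 1.0, 2011: 2.0, 2012: 3.0, 2013: 4.0}
alternating = {2010: 1.0, 2011: 3.0, 2012: 1.0, 2013: 3.0, 2014: 1.0}
r_trending = year_over_year_correlation(trending)
r_alternating = year_over_year_correlation(alternating)
```

Running this once per quarter and comparing the four numbers is one way to see which quarter carries the year-to-year variation.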
I'm still super confused. My only two guesses for why there's so much seasonality and year-to-year variation would be:
- Some winters are really mild, some are really bad
- Influenza season hits different in different years
But not a ton of people die of influenza, so it doesn't seem likely. What about cold weather? I guess plausibly it could lead to all kinds of things (people stay inside, so they don't exercise? Etc). But I don't know why it would affect Greece as much as Sweden. No idea what's going on.
Mean reversion, two-year periodicity, or dry tinder?
I was staring at the rolling 1 year death statistics for a really long time and convinced myself that there's some sort of negative correlation year-to-year: a good year is followed by a bad year, which is followed by a good year, etc. This hypothesis sort of makes sense: if influenza or bad weather (or anything else) provides the “final straw” then maybe a “good year” just postpones all those deaths to the next year. So if there truly was this “dry tinder” effect, then we would expect a negative correlation between the changes in death rates of two consecutive years.
Let's look again at the Nordics:
Let's look at Germany/Switzerland/Austria, for which the mortality stats barely budged:
UK, Belgium, and Netherlands, which have much bigger increases in mortality:
I mean, looking at the chart above, it clearly feels like there's some sort of 2 year periodicity with negative correlations year-to-year. Italy, Spain, and France:
So is there evidence for this? I don't know. As it turns out, there is a negative correlation if you look at changes in death rates: the change in death rate from year T to T+1 is negatively correlated with the change between T+1 and T+2. But if you think about it for a bit, this actually doesn't prove anything! A completely random series would behave the same way, since it's just mean reversion! If there's a year with a very high death rate, then by mean reversion the next year should have a lower death rate, and vice versa, but this doesn't mean that a good year pushes deaths into the next one.
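The mean-reversion point is easy to check with a simulation. The sketch below uses pure independent noise around a fixed level (the level 23.0 and spread 1.0 are arbitrary choices of mine, not estimates from the data) and measures the correlation between consecutive changes.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def consecutive_change_correlation(series):
    """Correlation between the change T -> T+1 and the change T+1 -> T+2."""
    changes = [b - a for a, b in zip(series, series[1:])]
    return pearson(changes[:-1], changes[1:])


# Independent draws: no "dry tinder" mechanism is built in at all.
rng = random.Random(0)
noise = [rng.gauss(23.0, 1.0) for _ in range(20000)]
r = consecutive_change_correlation(noise)
# For independent noise the theoretical value is -0.5, because both
# changes share the middle point with opposite signs.
```

So a clearly negative correlation between consecutive changes shows up even when each "year" is completely independent of the others.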
If I look at the change in death rate between year T and T+2 vs the change between year T and T+1, there's actually a positive correlation, which doesn't quite support the dry tinder hypothesis.
I also fit a regression model: $$ x(t) = \alpha x(t-1) + \beta x(t-2) $$. The best fit turns out to be roughly $$ \alpha = \beta = 1/2 $$ which is entirely consistent with looking at random noise around a slow-moving trend: our best guess based on two earlier data points is then simply $$ x(t) = ( x(t-1) + x(t-2) )/2 $$.
If we had found that $$ \alpha < 0 $$, that would have implied that a “good” year tends to be followed by a “bad” year, and vice versa. This would be my most “strict” interpretation of the “dry tinder” hypothesis, and it's not what we're finding.
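For reference, here is one way such a fit could be done: ordinary least squares on the two lagged values with no intercept, solved through the 2x2 normal equations. This is a sketch under my own assumptions (the function name `fit_ar2` and the toy series are hypothetical), not necessarily how the numbers above were produced.

```python
def fit_ar2(series):
    """Least-squares fit of x(t) = alpha * x(t-1) + beta * x(t-2), no intercept."""
    s11 = s12 = s22 = s1y = s2y = 0.0
    for t in range(2, len(series)):
        x1, x2, y = series[t - 1], series[t - 2], series[t]
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        s1y += x1 * y
        s2y += x2 * y
    # Solve [[s11, s12], [s12, s22]] @ [alpha, beta] = [s1y, s2y] by Cramer's rule.
    det = s11 * s22 - s12 * s12
    alpha = (s1y * s22 - s2y * s12) / det
    beta = (s2y * s11 - s1y * s12) / det
    return alpha, beta


# Toy series generated exactly by x(t) = (x(t-1) + x(t-2)) / 2,
# so the fit should recover alpha = beta = 1/2.
series = [20.0, 26.0]
for _ in range(8):
    series.append(0.5 * series[-1] + 0.5 * series[-2])

alpha, beta = fit_ar2(series)
```

On real data the coefficients will only be approximately 1/2 each, and a negative `alpha` would be the signal described above.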
However, the solution we find has a bit of a two-year periodicity. You can turn the recurrence relation $$ x(t) = ( x(t-1) + x(t-2) )/2 $$ into the polynomial equation $$ x^2 = \frac{1}{2} x + \frac{1}{2} $$. If I'm not mistaken, this is called the “characteristic polynomial” and its roots tell us something about the dynamics of the system. The roots are -1/2 and 1, and the negative root implies a damped oscillation with a two-year period. So at least that shows something along the lines of what we're looking for. I think this implies that a two-year average might be a better way to smooth it, and at least qualitatively it looks that way:
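To spell out the characteristic-polynomial step (a sketch, with $$ r $$ as my own label for the trial root and $$ A $$, $$ B $$ as constants fixed by the first two data points):

$$ x(t) = r^t \;\Rightarrow\; r^2 = \frac{1}{2} r + \frac{1}{2} \;\Rightarrow\; 2r^2 - r - 1 = (2r + 1)(r - 1) = 0 \;\Rightarrow\; r \in \{1, -\tfrac{1}{2}\} $$

$$ x(t) = A + B \left(-\tfrac{1}{2}\right)^t, \qquad A = \frac{x(0) + 2x(1)}{3}, \qquad B = \frac{2\,(x(0) - x(1))}{3} $$

The $$ (-1/2)^t $$ term flips sign every step and halves in size, which is the damped two-step oscillation, while $$ A $$ is the level the series settles to.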
A fun thing is that we can actually use this method to forecast the curves forward (I added “last week” as a third term in the regression):
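A stripped-down sketch of this kind of forecast is below. It only iterates the two-term recurrence with $$ \alpha = \beta = 1/2 $$ and leaves out the extra “last week” term, since its coefficients aren't given here; the function name and starting values are hypothetical.

```python
def forecast(history, steps, alpha=0.5, beta=0.5):
    """Roll x(t) = alpha * x(t-1) + beta * x(t-2) forward `steps` times.

    Only the last two observed values in `history` are used as the seed.
    """
    values = list(history[-2:])
    predicted = []
    for _ in range(steps):
        nxt = alpha * values[-1] + beta * values[-2]
        predicted.append(nxt)
        values.append(nxt)  # feed the prediction back in as the newest point
    return predicted


# Invented yearly rates (deaths/day per 1M): a drop from 24 to 23.
history = [24.0, 23.0]
predicted = forecast(history, steps=3)
```

Because each prediction is fed back in, the forecast wobbles above and below its long-run level with shrinking swings, which is the damped two-step pattern from the recurrence.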
My confidence in these predictions is roughly zero.
Appendix
This is not a proof of anything! This is obviously extremely far from the scientific standards required for publication. So why am I posting this? Mostly because
- I thought the Human Mortality Database was a really cool public dataset.
- These mortality statistics were sort of surprising, at least to me.
- I haven't posted much on my blog and felt compelled to write something!
On the last topic, I'll try to get back into a regular habit. Sorry!
Tagged with: statistics, math