A neat little trick with time decay

Something that pops up pretty frequently is to implement time decay, especially where you have recursive chains of jobs. For instance, say you want to keep track of a popularity score. You calculate today's output by reading yesterday's output, discounting it by $$ exp(-lambda Delta T) $$ and then adding some hit count for today. Typically you choose $$ lambda $$ so that $$ exp(-lambda Delta T) = 0.95 $$ for a day or something like that. We do this to generate popularity scores for every track at Spotify.

There is another approach that doesn't require you to do the discounting and gives you a bit more flexibility. If you think about it, essentially what you want to calculate is $$ sum exp(-lambda(T - t_i)) $$ where $$ T $$ is the current time and the sum is over all hits since we started keeping track of the popularity. This is a single score that takes time decay into account. You can add new hits by just discounting the existing sum and adding $$ 1 $$ .

The problem is that you have to keep track of the timestamps together with the scores, or else you can't do proper discounting when you add numbers that were calculated at different points in the time. There is a trick to get around this. Notice that the current time only introduces a constant factor $$ exp(-lambda T) $$ so let's just factor it out. You get $$ exp(-lambda T)sum exp(lambda t_i) $$ which means you don't have to keep track of what time it is. You can just apply the discount factor when you need it! In practice this means that you don't have to keep track of any timestamp or anything.

Now, you need to keep track of the following thing instead: $$ S = sum exp(lambda t_i) $$ . This is a ****ridiculously large number, so in principle you need to store the logarithm $$ s = log S $$ of it instead.

But if you store the logarithm of it, how do you add a new term? You basically want to calculate something like $$ log(exp(s) + exp(u)) $$ where $$ u = log U = lambda t_i $$ represents a new term. This isn't possible by the naive method because the intermediate sum will overflow, but it turns out that there is a simple trick to calculate this that makes the computer happy. This identity can be derived by some simple substitution and some logarithm identities: $$ log(exp(s) + exp(u)) = max(s, u) + log (1 + exp(min(s, u) - max(s, u))) $$

Furthermore, say you want to evaluate the score at some point in $$ T $$ in time later. This is equivalent to $$ Sexp(-lambda T) = exp(s - lambda T) $$ and it can be evaluated without any overflow problems.
Tagged with: math