Erik Bernhardsson

Optimizing over multinomial distributions

Sometimes you have to maximize some function f(w_1, …, w_n) where ∑_i w_i = 1 and w_i ≥ 0. Usually, f is concave and differentiable, so there's one unique global maximum and you can solve it by applying gradient ascent. The presence of the constraint makes it a little tricky, but we can solve it using the method of Lagrange multipliers. In particular, since the surface ∑_i w_i = 1 has the normal (1, 1, …, 1), the following optimization procedure works:

  1. Go one step in the direction of the gradient
  2. Normalize the new point by projecting it orthogonally back onto the surface
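Concretely, ignoring the nonnegativity constraint for a moment, projecting a point x orthogonally onto the hyperplane H = {w : ∑_i w_i = 1} just means subtracting the same amount from every coordinate, since the residual has to be a multiple of the normal (1, …, 1):

```latex
\operatorname{proj}_H(x) = x - \frac{\mathbf{1}^\top x - 1}{n}\,\mathbf{1},
\qquad\text{i.e.}\qquad
w_i = x_i - \frac{1}{n}\Bigl(\sum_j x_j - 1\Bigr).
```

The hard part, as we'll see below, is keeping the result nonnegative at the same time.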

Note that we can’t just normalize by dividing by the sum of the new vector — that rescales the point radially toward the origin, which is not an orthogonal projection onto the surface. We want to project orthogonally, and we need to do it without ending up with negative numbers. This turns out to be surprisingly difficult to implement, but let me spare you the agony and present one implementation in Python:

def project(v):
    # Project v orthogonally onto the simplex {w : sum(w) = 1, w >= 0}.
    # Walk through the components from smallest to largest, tentatively
    # spreading the excess mass evenly over the components that remain.
    # A component too small to absorb its share only gives up what it has
    # (it ends up clipped to zero below).
    excess = sum(v) - 1.0
    for i, elm in enumerate(sorted(v)):
        # Even share of the remaining excess over the len(v) - i
        # components not yet processed
        sub = excess / (len(v) - i)
        excess -= min(elm, sub)

    # The final value of sub is the threshold to subtract everywhere
    return [max(w - sub, 0) for w in v]
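Putting the two steps together, here is a self-contained sketch of the full loop. The toy objective f(w) = ∑ p_i log(w_i), the step size, and the iteration count are illustrative choices of mine, not from the post; this objective is concave and maximized at w = p, so the loop should converge there.

```python
def project(v):
    # Same simplex projection as above, repeated so this snippet runs on its own
    excess = sum(v) - 1.0
    for i, elm in enumerate(sorted(v)):
        sub = excess / (len(v) - i)
        excess -= min(elm, sub)
    return [max(w - sub, 0) for w in v]

def maximize(grad, n, steps=2000, lr=0.01):
    # Projected gradient ascent: step along the gradient, project back
    w = [1.0 / n] * n  # start at the uniform distribution
    for _ in range(steps):
        w = project([wi + lr * gi for wi, gi in zip(w, grad(w))])
    return w

# Toy objective f(w) = sum(p_i * log(w_i)); its gradient is p_i / w_i
p = [0.2, 0.3, 0.5]
w = maximize(lambda w: [pi / max(wi, 1e-9) for pi, wi in zip(p, w)], len(p))
```

Since the gradient components are all equal at w = p, the projection cancels the step there and the iteration settles at the true maximum.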
