Optimizing over multinomial distributions

Sometimes you have to maximize some function $f(w_1, w_2, ldots, w_n)$ where $w_1 + w_2 + ldots + w_n = 1$ and $0 le w_i le 1$ . Usually, $f$ is concave and differentiable, so there’s one unique global maximum and you can solve it by applying gradient ascent. The presence of the constraint makes it a little tricky, but we can solve it using the method of Lagrange multipliers. In particular, since the surface $w_1 + w_2 + ldots + w_n$ has the normal $(1, 1, ldots, 1)$ , the following optimization procedure works:

1. Go one step in the direction of the gradient
2. Normalize the new point by projecting it orthogonally back onto the surface

Note that we can’t just normalize by dividing with the sum of the new vector. What we want to do is to project it orthogonally back onto the surface. However, we need to do this without ending up with negative numbers. This turns out to be surprisingly difficult to implement, but let me spare you the agony and present one implementation in Python:

def project(v):
excess = sum(v) - 1.0
for i, elm in enumerate(sorted(v)):
sub = excess / (len(v) - i)
excess -= min(elm, sub)

return [max(w - sub, 0) for w in v]