Optimizing over multinomial distributions

2013-07-24

Sometimes you have to maximize some function $$ f(w_1, w_2, \ldots, w_n) $$ where $$ w_1 + w_2 + \ldots + w_n = 1 $$ and $$ 0 \le w_i \le 1 $$. Usually, $$ f $$ is concave and differentiable, so any local maximum is also a global maximum (and it is unique if $$ f $$ is strictly concave), and you can find it by applying gradient ascent. The presence of the constraint makes it a little tricky, but we can solve it using the method of Lagrange multipliers. In particular, since the surface $$ w_1 + w_2 + \ldots + w_n = 1 $$ has the normal $$ (1, 1, \ldots, 1) $$, the following optimization procedure works:

1. Go one step in the direction of the gradient.
2. Normalize the new point by projecting it orthogonally back onto the surface.
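
To make the two steps concrete, here is one way to write them down, ignoring for the moment the requirement that every $$ w_i $$ stays non-negative. The step size $$ \eta $$ and the intermediate point $$ w' $$ are notation introduced here for illustration; they are assumptions, not part of the procedure as stated.

$$ w' = w + \eta \nabla f(w), \qquad w \leftarrow w' - \frac{\left(\sum_{i} w'_i\right) - 1}{n} \, (1, 1, \ldots, 1) $$

The second step moves $$ w' $$ along the normal $$ (1, 1, \ldots, 1) $$ until its coordinates sum to one again: subtracting the same amount from each of the $$ n $$ coordinates removes exactly the excess $$ \left(\sum_{i} w'_i\right) - 1 $$.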

Note that we can't just normalize by dividing by the sum of the new vector: that rescales the point along the line through the origin instead of moving it along the normal $$ (1, 1, \ldots, 1) $$. What we want is to project it orthogonally back onto the surface, and to do so without ending up with negative numbers. This turns out to be surprisingly difficult to implement, but let me spare you the agony and present one implementation in Python:

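def project(v):
    # Euclidean projection of v onto {w : sum(w) = 1, w >= 0}.
    # Assumes v is a non-empty sequence of numbers.
    excess = sum(v) - 1.0  # total amount that still has to be subtracted
    # Visit the entries in ascending order, so the ones that would end up
    # negative are handled before the larger ones.
    for i, elm in enumerate(sorted(v)):
        # Spread the remaining excess evenly over the entries not yet visited.
        sub = excess / (len(v) - i)
        excess -= min(elm, sub)

    # After the loop, sub is the amount to subtract from every entry.
    return [max(w - sub, 0) for w in v]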
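
To see how the projection fits into the two-step procedure, here is a minimal sketch of the full loop. The helper `projected_gradient_ascent`, its parameters `step_size` and `n_steps`, and the toy objective are all assumptions made up for illustration; `project` is the function from this post, repeated so the snippet runs on its own.

```python
def project(v):
    # The projection from this post, repeated so the sketch is self-contained.
    excess = sum(v) - 1.0
    for i, elm in enumerate(sorted(v)):
        sub = excess / (len(v) - i)
        excess -= min(elm, sub)
    return [max(w - sub, 0) for w in v]


def projected_gradient_ascent(grad_f, w, step_size=0.1, n_steps=200):
    # Hypothetical helper: alternate a gradient step with a projection.
    for _ in range(n_steps):
        g = grad_f(w)
        # Step 1: go one step in the direction of the gradient.
        stepped = [w_i + step_size * g_i for w_i, g_i in zip(w, g)]
        # Step 2: project back onto the surface, keeping entries non-negative.
        w = project(stepped)
    return w


# Toy concave objective (an assumption for illustration):
# f(w) = -sum((w_i - t_i)^2), whose gradient is -2 * (w - t).
target = [0.9, 0.4, -0.1]


def grad_f(w):
    return [-2.0 * (w_i - t_i) for w_i, t_i in zip(w, target)]


w_opt = projected_gradient_ascent(grad_f, [1.0 / 3] * 3)
print(w_opt)  # close to [0.75, 0.25, 0.0]
```

For this particular objective the maximizer is the point of the constraint set closest to `target`, so the loop should end up at roughly the same place as a single call to `project(target)`. A fixed step size is the simplest choice; other objectives may need a smaller step or a line search.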

Tagged with: math

Erik Bernhardsson
