What Are Lagrange Multipliers?

Lagrange multipliers are a tool for doing constrained optimization. Say we are trying to minimize a function $f(x)$, subject to the constraint $g(x) = c$.

To solve this, you define a new function:

$$\mathcal{L}(x, \lambda) = f(x) - \lambda (g(x) - c)$$

The optimum lies at a stationary point of $\mathcal{L}$ (a point where the gradient in $x$ and $\lambda$ is both zero).
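For concreteness, a quick worked example: minimize $f(x, y) = x^2 + y^2$ subject to $x + y = 1$. The new function is $\mathcal{L}(x, y, \lambda) = x^2 + y^2 - \lambda(x + y - 1)$, and setting its gradient to zero gives $2x = \lambda$, $2y = \lambda$, and $x + y = 1$, so the optimum is at $x = y = 1/2$ with $\lambda = 1$.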

This is all true, but it doesn’t explain why it works.

Why Do Lagrange Multipliers Work?

Let’s consider a variant of the problem. We want to minimize $f(x)$ subject to $g(x) \ge 0$.

Let’s define the following min-max optimization problem:

$$\min_x \max_{\lambda \ge 0} \; f(x) - \lambda g(x)$$

I claim the solution of this optimization problem occurs at the smallest $f(x)$ that satisfies the constraint $g(x) \ge 0$. Why?

As written, we first choose an $x$, then choose a $\lambda$ that maximizes the objective, and we want to choose an $x$ that minimizes how much an adversarial $\lambda$ can hurt us. Suppose we are violating the constraint $g(x) \ge 0$. Then we have $g(x) < 0$. At such an $x$, $-\lambda g(x) > 0$, so we can pick $\lambda \to \infty$ to drive the objective value to $\infty$.

If we are not violating the constraint, then $-\lambda g(x)$ is $0$ or negative, and since $\lambda$ is constrained to $\lambda \ge 0$, the optimal $\lambda$ is $\lambda = 0$, giving objective value $f(x)$.

In other words, the solution of the unconstrained problem $\min_x \max_{\lambda \ge 0} f(x) - \lambda g(x)$ is the same as the solution to the original constrained problem.
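Here’s a tiny numerical sketch of this argument, on a toy instance I picked ($f(x) = x^2$ with constraint $x - 1 \ge 0$, where the answer is clearly $x = 1$):

```python
import numpy as np

# Toy instance: minimize f(x) = x^2 subject to g(x) = x - 1 >= 0.
# The constrained optimum is x = 1, with f(x) = 1.
f = lambda x: x ** 2
g = lambda x: x - 1

def inner_max(x, lam_cap=1e6):
    # max over lambda >= 0 of f(x) - lambda * g(x).
    # If g(x) < 0 the max is unbounded above; cap lambda to keep numbers finite.
    lam = lam_cap if g(x) < 0 else 0.0
    return f(x) - lam * g(x)

xs = np.linspace(-2, 2, 401)
values = np.array([inner_max(x) for x in xs])
print(xs[np.argmin(values)])  # ~1.0: the min-max solution matches the constrained one
```

Any $x$ that violates the constraint gets an enormous objective value from the inner max, so the outer min lands exactly on the constrained optimum.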

This handles the $g(x) \ge 0$ case. What if we have multiple constraints?

We can define a similar min-max problem by adding a Lagrange multiplier $\lambda_i$ for each constraint $g_i(x) \ge 0$. With two constraints, it looks like this:

$$\min_x \max_{\lambda_1 \ge 0, \lambda_2 \ge 0} \; f(x) - \lambda_1 g_1(x) - \lambda_2 g_2(x)$$

By a similar argument, the optimal solution to this min-max problem occurs at $\lambda_1 = \lambda_2 = 0$, and at the solution $x$ of the original constrained optimization problem, assuming it exists. If either constraint was violated, then we could have driven the corresponding $\lambda_i$ to $\infty$, like before.
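As a small made-up example: minimize $(x - 2)^2$ subject to $x \ge 0$ and $5 - x \ge 0$. The optimum is $x = 2$, where $g_1(x) = x$ and $g_2(x) = 5 - x$ are both positive, so the inner max over $-\lambda_1 g_1(x) - \lambda_2 g_2(x)$ is attained at $\lambda_1 = \lambda_2 = 0$, and the objective value is just $f(x) = 0$.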

What if we have a constraint $g(x) \ge c$? We can rearrange it to the constraint $g(x) - c \ge 0$.

This is solved the same way:

$$\min_x \max_{\lambda \ge 0} \; f(x) - \lambda (g(x) - c)$$

If we have a constraint $g(x) \le c$, we can negate both sides to get $-g(x) \ge -c$. If we have a constraint $-g(x) \ge -c$, we can rearrange it to $c - g(x) \ge 0$. This reduces everything to the $g(x) \ge 0$ case we know how to solve.

Bringing it back to the equality case, let’s say we have a constraint $g(x) = c$. How do we reduce this to the previous cases? This equality is the same as the pair of constraints $g(x) - c \ge 0$ and $c - g(x) \ge 0$, which are only both satisfied at $g(x) = c$. This time, I’ll write the Lagrange multipliers as $\lambda_+$ and $\lambda_-$:

$$\min_x \max_{\lambda_+ \ge 0, \lambda_- \ge 0} \; f(x) - \lambda_+ (g(x) - c) - \lambda_- (c - g(x))$$

Like before, if we ever have $g(x) \ne c$, then we can choose $\lambda_+, \lambda_-$ such that the objective value shoots up to $\infty$. But, since $g(x) - c$ and $c - g(x)$ are just negations of each other, we can simplify this further:

$$f(x) - \lambda_+ (g(x) - c) - \lambda_- (c - g(x)) = f(x) - (\lambda_+ - \lambda_-) (g(x) - c)$$

It’s possible to make $\lambda_+ - \lambda_-$ equal any real number while still satisfying $\lambda_+ \ge 0$ and $\lambda_- \ge 0$, so let’s just replace $\lambda_+ - \lambda_-$ with $\lambda$, and say $\lambda$ can be any real number instead of only nonnegative ones:

$$\min_x \max_{\lambda} \; f(x) - \lambda (g(x) - c)$$

Now, we’re back to the equality case we started at. The solution to this must lie at a saddle point where the gradient with respect to $x$ and $\lambda$ is both zero.
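Writing out those gradient conditions recovers the classic textbook recipe: the $\lambda$ gradient being zero enforces the constraint, and the $x$ gradient being zero says the gradients of the objective and constraint are parallel at the optimum:

$$\nabla_x f(x) = \lambda \nabla_x g(x), \qquad g(x) = c$$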

Why Do I Like This Explanation?

When I was re-learning Lagrange multipliers a while back, I was upset that all the most popular results were targeted towards people taking their first multivariate calculus course. These explanations exclusively covered the $g(x) = c$ case, and the constraints I wanted to add were more like $g(x) \ge c$.

It’s a shame that most people’s first introduction to Lagrange multipliers only covers the equality case, because inequality constraints are more general, the concepts needed to understand that case aren’t much harder, and it’s clearer how you’d apply an optimization algorithm to solve the resulting unconstrained problem. As with other min-max optimization (GANs, actor-critic RL, etc.), doing alternating gradient descent updates on $x$ and gradient ascent updates on $\lambda$ is both simple and works out fine.

This procedure doesn’t guarantee that your constraint is met over the entire optimization process, but it does quickly penalize the optimization for violating the constraint, since the $\lambda_i$ for that constraint will quickly rise and keep rising until the constraint is satisfied again, at which point $\lambda_i$ will quickly regress towards $0$ until it is needed once more.
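Here’s a minimal sketch of that loop in NumPy, on a toy problem of my choosing (minimize $f(x) = (x - 2)^2$ subject to $x \le 1$, rewritten as $1 - x \ge 0$); the step sizes are arbitrary:

```python
# Toy problem: minimize f(x) = (x - 2)^2 subject to g(x) = 1 - x >= 0.
# Unconstrained minimum is x = 2; the constrained optimum is x = 1.
# Lagrangian: L(x, lam) = f(x) - lam * g(x), with lam >= 0.
x, lam = 0.0, 0.0
lr_x, lr_lam = 0.05, 0.5  # arbitrary step sizes

for step in range(2000):
    # Gradient descent on x: dL/dx = 2 * (x - 2) + lam
    x -= lr_x * (2 * (x - 2) + lam)
    # Gradient ascent on lam: dL/dlam = -(1 - x) = x - 1
    lam += lr_lam * (x - 1)
    lam = max(lam, 0.0)  # keep lam >= 0 by clamping

print(x, lam)  # x -> ~1.0, lam -> ~2.0
```

Early on, $\lambda$ sits at $0$ while $x$ drifts past the boundary; once $x > 1$, $\lambda$ rises and pushes $x$ back, settling at $x \approx 1$ with $\lambda \approx 2$.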

As one final detail, handling inequalities requires handling optimization over $\lambda \ge 0$. The common trick is to rewrite $\lambda$ as $e^w$ or $\log(1 + e^w)$, then do gradient descent over the unconstrained variable $w$ instead.
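For instance, here is a sketch of the $\log(1 + e^w)$ (softplus) version, reusing the toy problem from above:

```python
import numpy as np

def softplus(w):
    return np.log1p(np.exp(w))  # lam = softplus(w) is always > 0

def softplus_grad(w):
    return 1.0 / (1.0 + np.exp(-w))  # d(softplus)/dw = sigmoid(w)

# Same toy problem: minimize (x - 2)^2 subject to 1 - x >= 0.
x, w = 0.0, 0.0
lr_x, lr_w = 0.05, 0.5

for step in range(2000):
    lam = softplus(w)
    x -= lr_x * (2 * (x - 2) + lam)  # descent on x, as before
    # Ascent on w, by the chain rule: dL/dw = dL/dlam * dlam/dw
    w += lr_w * (x - 1) * softplus_grad(w)

print(x, softplus(w))  # again x -> ~1.0, lam -> ~2.0
```

The update on $w$ never needs clamping, since $\log(1 + e^w)$ is nonnegative for every real $w$.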