Introduction
In control, machine learning, economics, and engineering, we constantly face optimization problems with constraints. We want to minimize a cost, but we can't just go anywhere — there are limits on what's allowed.
For example:
- MPC: Minimize control effort, but actuators have limits
- Portfolio optimization: Maximize returns, but you can't invest more than 100%
- Robot motion: Minimize energy, but joints have angle limits
The Active Set Method is one of the most elegant algorithms for solving these constrained optimization problems. But to understand it deeply, we need to first understand Lagrange multipliers — a beautiful mathematical tool that transforms constrained problems into unconstrained ones.
Let's build this understanding from the ground up.
Starting Point: Unconstrained Optimization
Before adding constraints, let's recall unconstrained optimization. Given a function $f(x)$ where $x \in \mathbb{R}^n$, we want to find:

$$x^* = \arg\min_{x \in \mathbb{R}^n} f(x)$$
The Gradient Condition
At a minimum, the function must be "flat" — moving in any direction doesn't immediately decrease the cost. Mathematically, the gradient must be zero:

$$\nabla f(x^*) = 0$$
The gradient $\nabla f = \begin{bmatrix} \frac{\partial f}{\partial x_1} & \cdots & \frac{\partial f}{\partial x_n} \end{bmatrix}^T$ points in the direction of steepest increase. At a minimum, there's no direction of increase (or decrease) — hence zero gradient.
Consider $f(x) = \frac{1}{2}x^T H x - c^T x$ where $H$ is positive definite.
The gradient is: $\nabla f = Hx - c$
Setting $\nabla f = 0$: $Hx = c \implies x^* = H^{-1}c$
This is a linear system — easy to solve!
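As a quick numerical check, here's that computation in Python (a minimal NumPy sketch; the specific $H$ and $c$ are made-up illustrative values):

```python
import numpy as np

# A positive definite H and an arbitrary c (illustrative values)
H = np.array([[2.0, 0.5],
              [0.5, 1.0]])
c = np.array([1.0, -1.0])

# Unconstrained minimum of f(x) = 0.5 x^T H x - c^T x:
# solve H x = c rather than forming H^{-1} explicitly
x_star = np.linalg.solve(H, c)

# At the minimum the gradient Hx - c should vanish
grad = H @ x_star - c
print(x_star, np.linalg.norm(grad))
```

In practice one solves the linear system directly (or via a Cholesky factorization of $H$) instead of inverting $H$.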
For unconstrained problems, we just solve $\nabla f = 0$. But what happens when we add constraints?
The Problem: Constraints Change Everything
Now consider minimizing $f(x)$ subject to a constraint $g(x) = 0$:

$$\min_x f(x) \quad \text{subject to} \quad g(x) = 0$$
The constraint $g(x) = 0$ defines a surface (or curve, or hyperplane) in the space. We can only move along this surface — the solution must lie on it.
Why $\nabla f = 0$ No Longer Works
The unconstrained minimum (where $\nabla f = 0$) might violate the constraint! We need a new condition.
The blue dot is the unconstrained minimum (where $\nabla f = 0$). The red line is the constraint $g(x) = 0$. The orange dot is the constrained minimum — the best point on the constraint. Notice how they're different!
The Key Geometric Insight
At the constrained minimum, we're on the constraint surface. If we could improve (decrease $f$) by moving along the constraint, we would — so we're not at the minimum yet.
At the true constrained minimum, the gradient $\nabla f$ must point perpendicular to the constraint. Why? Because if $\nabla f$ had any component along the constraint, we could move in that direction (staying on the constraint) and decrease $f$.
Geometric Condition: At a constrained minimum, $\nabla f$ is perpendicular to the constraint surface. Since $\nabla g$ is also perpendicular to the constraint surface (it's the normal), we must have $\nabla f$ parallel to $\nabla g$.
"Parallel" means one is a scalar multiple of the other:
The scalar $\lambda$ is the Lagrange multiplier. It tells us how much the gradients are "aligned."
Deriving Lagrange Multipliers Rigorously
Let's derive this more carefully. We'll see where the Lagrangian comes from and why it works.
The Lagrangian Function
Define the Lagrangian:

$$\mathcal{L}(x, \lambda) = f(x) - \lambda g(x)$$
This combines the objective and constraint into one function with an extra variable $\lambda$.
Stationary Points of the Lagrangian
Now we find where the Lagrangian has zero gradient with respect to both $x$ and $\lambda$:

$$\frac{\partial \mathcal{L}}{\partial x} = \nabla f(x) - \lambda \nabla g(x) = 0, \qquad \frac{\partial \mathcal{L}}{\partial \lambda} = -g(x) = 0$$
Look at what these equations say:
- First equation: $\nabla f = \lambda \nabla g$ — the gradients are parallel (our geometric condition!)
- Second equation: $g(x) = 0$ — the constraint is satisfied
By finding stationary points of $\mathcal{L}$, we automatically get both conditions!
The Lagrangian Trick: Instead of solving the constrained problem directly, we solve the unconstrained problem of finding stationary points of $\mathcal{L}(x, \lambda)$. The constraint "reappears" as the condition $\frac{\partial \mathcal{L}}{\partial \lambda} = 0$.
What Does $\lambda$ Mean?
The Lagrange multiplier $\lambda$ has a beautiful interpretation: it measures the sensitivity of the optimal cost to the constraint.
If we slightly relax the constraint from $g(x) = 0$ to $g(x) = \epsilon$, how much does the optimal cost change? By the envelope theorem, to first order:

$$f^*(\epsilon) \approx f^*(0) + \lambda \epsilon$$
So $\lambda$ tells us the "price" or "shadow cost" of the constraint:
- $\lambda > 0$: The constraint is "expensive" — shifting it to $g(x) = \epsilon$ with $\epsilon > 0$ raises the optimal cost
- $\lambda < 0$: The same shift lowers the optimal cost
- $\lambda = 0$: The constraint doesn't affect the optimal cost, at least to first order
Minimize $f(x,y) = x^2 + y^2$ subject to $g(x,y) = x + y - 1 = 0$.
Step 1: Form the Lagrangian: $\mathcal{L} = x^2 + y^2 - \lambda(x + y - 1)$
Step 2: Take derivatives:
$\frac{\partial \mathcal{L}}{\partial x} = 2x - \lambda = 0 \implies x = \lambda/2$
$\frac{\partial \mathcal{L}}{\partial y} = 2y - \lambda = 0 \implies y = \lambda/2$
$\frac{\partial \mathcal{L}}{\partial \lambda} = -(x + y - 1) = 0 \implies x + y = 1$
Step 3: Solve: From the first two equations, $x = y$. Substituting into the constraint: $2x = 1 \implies x = y = 1/2$, and $\lambda = 1$.
Solution: $(x^*, y^*) = (1/2, 1/2)$ with $\lambda^* = 1$.
The multiplier $\lambda = 1$ tells us that if we changed the constraint to $x + y = 1 + \epsilon$, the optimal cost would increase by approximately $\epsilon$.
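Both the solution and the sensitivity interpretation are easy to verify numerically (a small NumPy sketch; the stationarity system is the one derived above, and the perturbation $\epsilon = 10^{-4}$ is an arbitrary choice):

```python
import numpy as np

def solve(eps):
    # Stationarity of L = x^2 + y^2 - lam*(x + y - 1 - eps):
    #   2x - lam = 0,  2y - lam = 0,  x + y = 1 + eps
    A = np.array([[2.0, 0.0, -1.0],
                  [0.0, 2.0, -1.0],
                  [1.0, 1.0,  0.0]])
    b = np.array([0.0, 0.0, 1.0 + eps])
    x, y, lam = np.linalg.solve(A, b)
    return x, y, lam, x**2 + y**2

x, y, lam, cost = solve(0.0)
print(x, y, lam)        # approximately (0.5, 0.5, 1.0)

# Sensitivity check: relaxing the constraint to x + y = 1 + eps
# changes the optimal cost by roughly lam * eps
eps = 1e-4
d_cost = solve(eps)[3] - cost
print(d_cost / eps)     # close to lam = 1
```

The ratio `d_cost / eps` approaches $\lambda = 1$ as $\epsilon \to 0$, exactly as the shadow-cost interpretation predicts.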
Inequality Constraints and KKT Conditions
In practice, most constraints are inequalities: $g(x) \leq 0$ rather than $g(x) = 0$. This changes things significantly.
The Key Difference: Active vs. Inactive
An inequality constraint can be in two states:
- Active (binding): $g(x^*) = 0$ — we're right on the boundary
- Inactive: $g(x^*) < 0$ — we have "slack," the constraint doesn't affect us
If a constraint is inactive at the solution, it's as if it doesn't exist — we could remove it and get the same answer.
If a constraint is active, it becomes an equality constraint, and Lagrange multipliers apply.
The constraint is $x + y \leq c$ (gray region is feasible). When $c$ is small, the constraint is active — the optimum sits on the boundary. When $c$ is large, the constraint is inactive — the unconstrained optimum is already feasible.
The KKT Conditions
For optimization with inequality constraints, the generalization of Lagrange conditions is called the Karush-Kuhn-Tucker (KKT) conditions.
Consider: $\min_x f(x)$ subject to $g_i(x) \leq 0$ for $i = 1, \ldots, m$.
The KKT conditions are:
- Stationarity: $\nabla f(x^*) + \sum_{i=1}^{m} \lambda_i \nabla g_i(x^*) = 0$ (the gradient of $f$ is balanced by a combination of constraint gradients; with the Lagrangian written as $\mathcal{L} = f + \sum_i \lambda_i g_i$, this sign convention is what makes $\lambda_i \geq 0$ the right requirement for $g_i \leq 0$)
- Primal feasibility: $g_i(x^*) \leq 0$ for all $i$ (the solution satisfies all constraints)
- Dual feasibility: $\lambda_i \geq 0$ for all $i$ (multipliers for inequalities are non-negative)
- Complementary slackness: $\lambda_i g_i(x^*) = 0$ for all $i$ (either the constraint is active, $g_i = 0$, or the multiplier is zero, $\lambda_i = 0$, or both)
Understanding Complementary Slackness
Condition 4 is the crucial one. It says: for each constraint, at least one of $\lambda_i$ or $g_i$ must be zero.
- If $g_i(x^*) < 0$ (constraint inactive, we have slack), then $\lambda_i = 0$ (no "cost" to the constraint)
- If $\lambda_i > 0$ (constraint has positive cost), then $g_i(x^*) = 0$ (constraint is active/binding)
This makes intuitive sense: why would a constraint that isn't binding have any influence on the solution?
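The four conditions can be packaged into a small checker (illustrative NumPy code; `check_kkt`, its tolerances, and the demo problem $\min x^2 + y^2$ s.t. $x + y \geq 1$ are my own, not from the text above):

```python
import numpy as np

def check_kkt(grad_f, g_vals, g_grads, lam, tol=1e-8):
    """Check the four KKT conditions for min f s.t. g_i(x) <= 0,
    with the convention grad f + sum_i lam_i * grad g_i = 0."""
    stationarity = np.linalg.norm(grad_f + g_grads.T @ lam) < tol
    primal = np.all(g_vals <= tol)          # all constraints satisfied
    dual = np.all(lam >= -tol)              # multipliers non-negative
    comp_slack = np.all(np.abs(lam * g_vals) < tol)
    return bool(stationarity and primal and dual and comp_slack)

# Demo: min x^2 + y^2 s.t. 1 - x - y <= 0; optimum (0.5, 0.5), lam = 1
x = np.array([0.5, 0.5])
grad_f = 2 * x
g_vals = np.array([1 - x[0] - x[1]])        # active: equals 0
g_grads = np.array([[-1.0, -1.0]])
lam = np.array([1.0])
print(check_kkt(grad_f, g_vals, g_grads, lam))   # True
```

A wrong multiplier (say $\lambda = 2$) breaks stationarity, and the checker returns `False`.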
The Sign of $\lambda$: For inequality constraints $g(x) \leq 0$, the multiplier $\lambda \geq 0$. This is because the constraint can only "push back" against decreasing $f$, never help it. If you found $\lambda < 0$, the constraint shouldn't be in your active set!
The Active Set Method
Now we have all the ingredients. The Active Set Method is an algorithm that solves inequality-constrained optimization by cleverly managing which constraints are active.
The Core Idea
If we knew exactly which constraints were active at the solution, we could:
- Treat active constraints as equalities
- Ignore inactive constraints entirely
- Solve the resulting equality-constrained problem using Lagrange multipliers
But we don't know which constraints are active! The active set method guesses and refines:
1. Initialize: Start with a feasible point $x^{(0)}$ and an initial "working set" $\mathcal{W}$ of constraints we think are active
2. Solve equality subproblem: Minimize $f(x)$ treating constraints in $\mathcal{W}$ as equalities. This gives candidate solution $x^*$ and multipliers $\lambda_i$ for $i \in \mathcal{W}$
3. Check multiplier signs:
   - If all $\lambda_i \geq 0$ for $i \in \mathcal{W}$: go to step 4
   - If some $\lambda_j < 0$: remove constraint $j$ from $\mathcal{W}$ (it shouldn't be active), go to step 2
4. Check feasibility:
   - If $x^*$ satisfies all constraints: DONE! $x^*$ is optimal
   - If $x^*$ violates some constraint $g_k(x^*) > 0$: move toward $x^*$ only until hitting constraint $k$, add $k$ to $\mathcal{W}$, go to step 2
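The loop above can be sketched in Python for a quadratic objective with linear constraints $Ax \leq b$ (a minimal teaching implementation; the variable names, tolerances, and the small demo QP are my own choices, and a production solver would factorize the KKT matrix and handle degeneracy):

```python
import numpy as np

def active_set_qp(H, c, A, b, x0, max_iter=50):
    """Primal active-set sketch for min 0.5 x^T H x + c^T x  s.t.  A x <= b.
    x0 must be feasible. For teaching, not production use."""
    x = x0.astype(float).copy()
    n = len(x)
    W = []                                   # indices of working constraints
    lam = np.zeros(0)
    for _ in range(max_iter):
        # Equality subproblem: step p with A_W p = 0, via the KKT system
        #   [H   A_W^T] [p  ]   [-(Hx + c)]
        #   [A_W  0   ] [lam] = [    0    ]
        g = H @ x + c
        nW = len(W)
        K = np.zeros((n + nW, n + nW))
        K[:n, :n] = H
        if nW:
            K[:n, n:] = A[W].T
            K[n:, :n] = A[W]
        sol = np.linalg.solve(K, np.concatenate([-g, np.zeros(nW)]))
        p, lam = sol[:n], sol[n:]
        if np.linalg.norm(p) < 1e-10:        # no progress possible in W
            if nW == 0 or np.all(lam >= -1e-10):
                return x, W, lam             # KKT satisfied: done
            W.pop(int(np.argmin(lam)))       # drop most negative multiplier
            continue
        # Ratio test: largest alpha in (0, 1] keeping every constraint feasible
        alpha, block = 1.0, None
        for i in range(len(b)):
            ai_p = A[i] @ p
            if i not in W and ai_p > 1e-12:  # moving toward this boundary
                a = (b[i] - A[i] @ x) / ai_p
                if a < alpha:
                    alpha, block = a, i
        x = x + alpha * p
        if block is not None:
            W.append(block)                  # hit a new constraint: add it
    return x, W, lam

# Demo QP: min 0.5||x||^2 - 2x1 - 2x2  s.t.  x1 + x2 <= 2, x1 >= 0, x2 >= 0
H = np.eye(2)
c = np.array([-2.0, -2.0])
A = np.array([[1.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([2.0, 0.0, 0.0])
x, W, lam = active_set_qp(H, c, A, b, np.zeros(2))
print(x, W, lam)     # optimum near (1, 1), active set [0], lambda near 1
```

Starting from the feasible origin, the method steps toward the unconstrained minimum $(2, 2)$, hits the boundary $x_1 + x_2 = 2$ at half step, and terminates there with a non-negative multiplier.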
Solving the Equality Subproblem
At each iteration, we solve an equality-constrained problem. For quadratic objectives (like in QP/MPC), this is a linear system.
Given objective $f(x) = \frac{1}{2}x^T H x + c^T x$ and active constraints $A_{eq} x = b_{eq}$:
The Lagrangian is:

$$\mathcal{L}(x, \lambda) = \frac{1}{2}x^T H x + c^T x + \lambda^T (A_{eq} x - b_{eq})$$

Setting gradients to zero:

$$\nabla_x \mathcal{L} = Hx + c + A_{eq}^T \lambda = 0, \qquad \nabla_\lambda \mathcal{L} = A_{eq} x - b_{eq} = 0$$

This gives the KKT system:

$$\begin{bmatrix} H & A_{eq}^T \\ A_{eq} & 0 \end{bmatrix} \begin{bmatrix} x \\ \lambda \end{bmatrix} = \begin{bmatrix} -c \\ b_{eq} \end{bmatrix}$$
This is a linear system! Standard linear algebra (LU decomposition, etc.) solves it efficiently.
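For concreteness, here is that KKT solve in NumPy (a sketch; the helper `solve_eq_qp` and the small illustrative problem are my own):

```python
import numpy as np

def solve_eq_qp(H, c, A_eq, b_eq):
    """Solve min 0.5 x^T H x + c^T x  s.t.  A_eq x = b_eq
    via the KKT system [H A^T; A 0][x; lam] = [-c; b]."""
    n, m = H.shape[0], A_eq.shape[0]
    K = np.block([[H, A_eq.T],
                  [A_eq, np.zeros((m, m))]])
    rhs = np.concatenate([-c, b_eq])
    sol = np.linalg.solve(K, rhs)
    return sol[:n], sol[n:]                  # primal x, multipliers lam

# Illustrative problem: min 0.5(x1^2 + x2^2)  s.t.  x1 + x2 = 1
H = np.eye(2)
c = np.zeros(2)
A_eq = np.array([[1.0, 1.0]])
b_eq = np.array([1.0])
x, lam = solve_eq_qp(H, c, A_eq, b_eq)
print(x, lam)       # x = [0.5, 0.5]
```

Note that for an equality constraint the multiplier's sign carries no meaning; the $\lambda \geq 0$ test applies only when the row came from an inequality.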
Why Check $\lambda_i < 0$? If $\lambda_i < 0$ for an "active" constraint, it means the constraint is actually helping us — the unconstrained direction would go further into the feasible region. But inequality constraints can only block us, not help. So a negative multiplier means we've wrongly included this constraint in the active set.
Geometric Interpretation
Picture the cost function as a bowl. You're trying to roll a ball to the bottom, but there are walls (constraint boundaries). The active set method:
- Rolls the ball downhill
- When it hits a wall, slides along the wall (constraint becomes active)
- If it reaches a corner, checks if it can leave any wall (remove from active set)
- Stops when at the lowest point that doesn't go through any wall
Minimize $f(x,y) = (x-1.5)^2 + 2(y-1)^2$ subject to $x \geq 0$, $y \geq 0$, $x + y \leq 1.5$. Green point: current iterate. Red highlighted lines: active constraints. The algorithm walks along boundaries toward the optimum.
Detailed Iterative Examples
Now let's work through the Active Set Method step-by-step with concrete numbers, showing every calculation. First a simple 2D example, then an MPC problem.
Example 1: Simple 2D Quadratic Program
Minimize:

$$f(x) = \frac{1}{2}(x_1^2 + 2x_2^2) - 2x_1 - 3x_2$$
Subject to:
- $g_1: x_1 + x_2 \leq 2$
- $g_2: -x_1 \leq 0$ (i.e., $x_1 \geq 0$)
- $g_3: -x_2 \leq 0$ (i.e., $x_2 \geq 0$)
In standard form $g_i(x) \leq 0$:
- $g_1(x) = x_1 + x_2 - 2 \leq 0$
- $g_2(x) = -x_1 \leq 0$
- $g_3(x) = -x_2 \leq 0$
Setup: Matrices and Gradients
First, let's identify the quadratic structure. Our objective is:

$$f(x) = \frac{1}{2}x^T H x + c^T x$$

where:

$$H = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}, \qquad c = \begin{bmatrix} -2 \\ -3 \end{bmatrix}$$

The gradient of $f$ is:

$$\nabla f = Hx + c = \begin{bmatrix} x_1 - 2 \\ 2x_2 - 3 \end{bmatrix}$$

The constraint gradients are:

$$\nabla g_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad \nabla g_2 = \begin{bmatrix} -1 \\ 0 \end{bmatrix}, \qquad \nabla g_3 = \begin{bmatrix} 0 \\ -1 \end{bmatrix}$$
Find the Unconstrained Minimum
Setting $\nabla f = 0$:

$$x_1 - 2 = 0, \qquad 2x_2 - 3 = 0$$
Unconstrained minimum: $x^{unc} = (2, 1.5)$
Check feasibility:
- $g_1(2, 1.5) = 2 + 1.5 - 2 = 1.5 > 0$ ❌ Violated!
- $g_2(2, 1.5) = -2 < 0$ ✓
- $g_3(2, 1.5) = -1.5 < 0$ ✓
The unconstrained minimum violates $g_1$, so we need the Active Set Method.
Iteration 1: Initialize
Starting point: $x^{(0)} = (0.5, 0.5)$ (feasible)
Working set: $\mathcal{W} = \emptyset$ (empty - no constraints assumed active)
Iteration 2: Solve Unconstrained Subproblem
With $\mathcal{W} = \emptyset$, we solve the unconstrained problem:

$$\nabla f = 0 \implies x^* = x^{unc} = (2, 1.5)$$
Check feasibility from $x^{(0)}$ to $x^*$:
We move along the line from $(0.5, 0.5)$ to $(2, 1.5)$:

$$x(t) = (1-t)\,x^{(0)} + t\,x^* = (0.5 + 1.5t,\; 0.5 + t), \qquad t \in [0, 1]$$

Check when we hit $g_1 = 0$ (the constraint $x_1 + x_2 = 2$):

$$(0.5 + 1.5t) + (0.5 + t) - 2 = 0 \implies 2.5t = 1 \implies t = 0.4$$

At $t = 0.4$:

$$x^{(1)} = (0.5 + 1.5 \cdot 0.4,\; 0.5 + 0.4) = (1.1,\; 0.9)$$
Update working set: $\mathcal{W} = \{g_1\}$
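This step-length computation is just a ratio test over the constraints; here it is numerically (NumPy sketch; the rows of `A` and entries of `b` restate the three constraints of this example, and the loop is my own illustrative code):

```python
import numpy as np

x0 = np.array([0.5, 0.5])
x_target = np.array([2.0, 1.5])        # unconstrained minimum
A = np.array([[1.0, 1.0],              # rows are the gradients of g_i,
              [-1.0, 0.0],             # written as g_i(x) = A_i x - b_i <= 0
              [0.0, -1.0]])
b = np.array([2.0, 0.0, 0.0])

p = x_target - x0
t = 1.0
for Ai, bi in zip(A, b):
    if Ai @ p > 0:                     # moving toward this boundary
        t = min(t, (bi - Ai @ x0) / (Ai @ p))
x_hit = x0 + t * p
print(t, x_hit)                        # t = 0.4, x_hit near (1.1, 0.9)
```

Only $g_1$ lies in the direction of motion, so it determines the step: $t = 0.4$.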
Iteration 3: Solve with $g_1$ Active
Now we minimize $f$ subject to $g_1(x) = 0$, i.e., $x_1 + x_2 = 2$.
Form the KKT system:

Since $g_1$ is an inequality in standard form $g_1(x) \leq 0$, we use the Lagrangian $\mathcal{L} = f(x) + \lambda_1 g_1(x)$, so that a legitimately active constraint gets $\lambda_1 \geq 0$. Stationarity, $Hx + c + \lambda_1 \nabla g_1 = 0$, together with the active constraint gives a linear system. Here $A = \nabla g_1^T = [1, 1]$ and $b = 2$ (from $x_1 + x_2 = 2$):

$$\begin{bmatrix} 1 & 0 & 1 \\ 0 & 2 & 1 \\ 1 & 1 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \lambda_1 \end{bmatrix} = \begin{bmatrix} 2 \\ 3 \\ 2 \end{bmatrix}$$

Solve the system:

From row 1: $x_1 + \lambda_1 = 2 \implies x_1 = 2 - \lambda_1$

From row 2: $2x_2 + \lambda_1 = 3 \implies x_2 = \frac{3 - \lambda_1}{2}$

From row 3: $x_1 + x_2 = 2$

Substituting:

$$(2 - \lambda_1) + \frac{3 - \lambda_1}{2} = 2 \implies 4 - 2\lambda_1 + 3 - \lambda_1 = 4 \implies \lambda_1 = 1$$

Therefore:

$$x^* = (1, 1), \qquad \lambda_1 = 1$$

Candidate solution: $x^* = (1, 1)$ with $\lambda_1 = 1$

Iteration 3: Check Multiplier Sign

We found $\lambda_1 = 1 > 0$ ✓, so $g_1$ legitimately belongs in the active set. Had we found $\lambda_1 < 0$, it would mean the constraint was "helping" us: removing it would raise the cost. But inequality constraints can only restrict where we can go, never help, so a negative multiplier signals that the constraint should be dropped from $\mathcal{W}$. One word of warning on conventions: if you instead write $\mathcal{L} = f - \lambda_1 g_1$, the same calculation yields $\lambda_1 = -1$ and the sign test flips. Pick one convention and use it consistently.
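A quick NumPy check of the hand calculation, solving the same $3 \times 3$ KKT system:

```python
import numpy as np

# KKT system for g1 active, convention Hx + c + A^T lam = 0
H = np.array([[1.0, 0.0], [0.0, 2.0]])
c = np.array([-2.0, -3.0])
A = np.array([[1.0, 1.0]])
b = np.array([2.0])

K = np.block([[H, A.T], [A, np.zeros((1, 1))]])
rhs = np.concatenate([-c, b])
x1, x2, lam1 = np.linalg.solve(K, rhs)
print(x1, x2, lam1)      # 1.0, 1.0, 1.0 up to rounding
```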
Iteration 3: Check Feasibility
Check all constraints at $x^* = (1, 1)$:
- $g_1(1, 1) = 1 + 1 - 2 = 0$ ✓ (active, by construction)
- $g_2(1, 1) = -1 < 0$ ✓ (satisfied with slack)
- $g_3(1, 1) = -1 < 0$ ✓ (satisfied with slack)
All constraints satisfied!
Optimality Check
We have:
- $\lambda_1 = 1 > 0$ ✓ (multiplier non-negative)
- All constraints satisfied ✓
- KKT stationarity satisfied ✓
Final Answer: $x^* = (1, 1)$ with optimal cost $f(1,1) = \frac{1}{2}(1 + 2) - 2 - 3 = -3.5$
The multiplier $\lambda_1 = 1$ tells us: relaxing the constraint $x_1 + x_2 \leq 2$ to $x_1 + x_2 \leq 2.01$ would decrease the optimal cost by approximately $0.01$.
Watch the algorithm: start at $(0.5, 0.5)$, move toward unconstrained minimum $(2, 1.5)$, hit constraint $x_1 + x_2 = 2$, then solve constrained problem to find optimum at $(1, 1)$.
Example 2: MPC for a Simple System
Now let's see how Active Set applies to Model Predictive Control. We'll control a simple first-order system with input constraints.
System dynamics: $x_{k+1} = 0.8 x_k + 0.5 u_k$
Initial state: $x_0 = 3$
Prediction horizon: $N = 2$
Cost function: $J = x_1^2 + x_2^2 + 0.1u_0^2 + 0.1u_1^2$
Constraints: $-1 \leq u_k \leq 1$ for all $k$
Goal: Drive state to zero while respecting input limits
Step 1: Write States in Terms of Inputs
Using the dynamics recursively:

$$x_1 = 0.8 x_0 + 0.5 u_0 = 2.4 + 0.5 u_0$$

$$x_2 = 0.8 x_1 + 0.5 u_1 = 1.92 + 0.4 u_0 + 0.5 u_1$$
Step 2: Write Cost in Terms of Inputs Only
Substituting into $J$:

$$J = (2.4 + 0.5u_0)^2 + (1.92 + 0.4u_0 + 0.5u_1)^2 + 0.1u_0^2 + 0.1u_1^2$$

Expanding term by term:

First term: $(2.4 + 0.5u_0)^2 = 5.76 + 2.4u_0 + 0.25u_0^2$

Second term: $(1.92 + 0.4u_0 + 0.5u_1)^2 = 3.6864 + 1.536u_0 + 1.92u_1 + 0.16u_0^2 + 0.4u_0u_1 + 0.25u_1^2$

Collecting:

$$J = 0.51u_0^2 + 0.35u_1^2 + 0.4u_0u_1 + 3.936u_0 + 1.92u_1 + 9.4464$$

In matrix form, $J = \frac{1}{2}\mathbf{u}^T H \mathbf{u} + f^T \mathbf{u} + \text{const}$, where $\mathbf{u} = [u_0, u_1]^T$ and:

$$H = \begin{bmatrix} 1.02 & 0.4 \\ 0.4 & 0.7 \end{bmatrix}, \qquad f = \begin{bmatrix} 3.936 \\ 1.92 \end{bmatrix}$$
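This condensing step generalizes to any horizon; here is the same construction in NumPy (a sketch: `F` and `G` are my own names for the prediction matrices, and the factor of 2 comes from matching $\frac{1}{2}\mathbf{u}^T H \mathbf{u} + f^T\mathbf{u}$ to the expanded squares):

```python
import numpy as np

# Condensed QP for the horizon-2 problem: x_{k+1} = a x_k + bu u_k
a, bu, x0, N = 0.8, 0.5, 3.0, 2
q, r = 1.0, 0.1                 # weights on x_k^2 and u_k^2

# Stack predictions: [x1; x2] = F x0 + G [u0; u1]
F = np.array([a, a**2])
G = np.array([[bu, 0.0],
              [a * bu, bu]])

# J = q||F x0 + G u||^2 + r||u||^2  =>  0.5 u^T H u + f^T u + const
H = 2 * (q * G.T @ G + r * np.eye(N))
f = 2 * q * G.T @ (F * x0)
print(H)                        # approx [[1.02, 0.4], [0.4, 0.7]]
print(f)                        # approx [3.936, 1.92]

u_unc = np.linalg.solve(H, -f)
print(u_unc)                    # approx [-3.59, -0.69]
```

The printed matrices reproduce the hand expansion, and the unconstrained solve matches the numbers in the next step.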
Step 3: Write Constraints in Standard Form
Input constraints $-1 \leq u_k \leq 1$ become:
- $u_0 \leq 1 \implies g_1: u_0 - 1 \leq 0$
- $-u_0 \leq 1 \implies g_2: -u_0 - 1 \leq 0$
- $u_1 \leq 1 \implies g_3: u_1 - 1 \leq 0$
- $-u_1 \leq 1 \implies g_4: -u_1 - 1 \leq 0$
Step 4: Find Unconstrained Minimum
Solve $H\mathbf{u} = -f$:

$$\begin{bmatrix} 1.02 & 0.4 \\ 0.4 & 0.7 \end{bmatrix} \begin{bmatrix} u_0 \\ u_1 \end{bmatrix} = \begin{bmatrix} -3.936 \\ -1.92 \end{bmatrix}$$

Using Cramer's rule:

$\det(H) = 1.02 \times 0.7 - 0.4 \times 0.4 = 0.714 - 0.16 = 0.554$

$$u_0 = \frac{(-3.936)(0.7) - (0.4)(-1.92)}{0.554} = \frac{-1.9872}{0.554} \approx -3.59, \qquad u_1 = \frac{(1.02)(-1.92) - (-3.936)(0.4)}{0.554} = \frac{-0.384}{0.554} \approx -0.69$$

Unconstrained solution: $(u_0, u_1) = (-3.59, -0.69)$
Check constraints:
- $g_1: -3.59 - 1 = -4.59 < 0$ ✓
- $g_2: -(-3.59) - 1 = 2.59 > 0$ ❌ Violated! (means $u_0 < -1$)
- $g_3: -0.69 - 1 = -1.69 < 0$ ✓
- $g_4: -(-0.69) - 1 = -0.31 < 0$ ✓
The unconstrained $u_0 = -3.59$ violates the lower bound $u_0 \geq -1$.
Iteration 1: Add $g_2$ to Working Set
Set $u_0 = -1$ (constraint $g_2$ active) and minimize over $u_1$:

With $u_0 = -1$ fixed:

$$J(u_1) = 0.35u_1^2 + 1.52u_1 + 6.0204$$

Minimize: $\frac{dJ}{du_1} = 0.7u_1 + 1.52 = 0 \implies u_1 = -2.17$
Check constraints on $u_1$:
- $u_1 = -2.17 < -1$ ❌ Violated!
Iteration 2: Add $g_4$ to Working Set
Now both $u_0 = -1$ and $u_1 = -1$ (both at lower bounds).
Solution: $\mathbf{u}^* = (-1, -1)$
Check Multipliers

At $\mathbf{u}^* = (-1, -1)$, compute $\nabla J = H\mathbf{u} + f$:

$$\nabla J = \begin{bmatrix} 1.02(-1) + 0.4(-1) + 3.936 \\ 0.4(-1) + 0.7(-1) + 1.92 \end{bmatrix} = \begin{bmatrix} 2.516 \\ 0.82 \end{bmatrix}$$

The active constraints are $g_2: -u_0 - 1 \leq 0$ and $g_4: -u_1 - 1 \leq 0$, with gradients $\nabla g_2 = [-1, 0]^T$ and $\nabla g_4 = [0, -1]^T$.

KKT stationarity (for constraints $g_i(x) \leq 0$, using $\nabla J + \sum_i \lambda_i \nabla g_i = 0$):

$$\begin{bmatrix} 2.516 \\ 0.82 \end{bmatrix} + \lambda_2 \begin{bmatrix} -1 \\ 0 \end{bmatrix} + \lambda_4 \begin{bmatrix} 0 \\ -1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

This gives: $\lambda_2 = 2.516 > 0$ ✓ and $\lambda_4 = 0.82 > 0$ ✓
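The same multiplier recovery in NumPy (a small check of the hand calculation):

```python
import numpy as np

H = np.array([[1.02, 0.4], [0.4, 0.7]])
f = np.array([3.936, 1.92])
u = np.array([-1.0, -1.0])        # candidate from the active set method

grad_J = H @ u + f                # gradient at the candidate
# Active constraint gradients: grad g2 = (-1, 0), grad g4 = (0, -1),
# so stationarity grad_J - lam2*(1,0)... i.e. grad_J = (lam2, lam4):
lam2, lam4 = grad_J
print(lam2, lam4)                 # approximately 2.516 and 0.82
```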
MPC Solution: The optimal control sequence is $u_0^* = -1$, $u_1^* = -1$ (both saturated at the lower bound).
This makes physical sense: starting at $x_0 = 3$, we want to drive to zero as fast as possible, so we apply maximum negative control.
Verify: Predicted Trajectory
With $u_0 = u_1 = -1$:

$$x_1 = 0.8(3) + 0.5(-1) = 1.9, \qquad x_2 = 0.8(1.9) + 0.5(-1) = 1.02$$

The state decreases: $3 \to 1.9 \to 1.02$. In real MPC, we'd apply only $u_0^* = -1$, measure the new state, and re-solve.
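Simulating the prediction confirms the trajectory (plain Python sketch):

```python
# Roll the dynamics x_{k+1} = 0.8 x_k + 0.5 u_k forward with the
# saturated inputs found by the active set method
a, b = 0.8, 0.5
x, u_seq = 3.0, [-1.0, -1.0]
traj = [x]
for uk in u_seq:
    x = a * x + b * uk
    traj.append(x)
print(traj)      # states go 3 -> 1.9 -> 1.02
```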
Left: the QP in $(u_0, u_1)$ space. The box shows input constraints. Blue contours are cost levels. Orange dot is the constrained optimum. Right: resulting state trajectory.
Key Takeaways from the Examples
- The algorithm is systematic: Start unconstrained, hit boundaries, solve constrained subproblems, check multiplier signs, repeat.
- Multipliers tell you if constraints belong: $\lambda_i \geq 0$ means constraint is legitimately active; $\lambda_i < 0$ means remove it.
- MPC naturally leads to QPs: Linear dynamics + quadratic cost + linear constraints = Quadratic Program.
- Saturation is common: When you want aggressive control, actuator limits often become active.
Summary
Lagrange Multipliers
Transform constrained optimization into finding stationary points of $\mathcal{L}(x, \lambda) = f(x) - \lambda g(x)$. The multiplier $\lambda$ measures the constraint's "cost."
Geometric Meaning
At the constrained optimum, $\nabla f$ is perpendicular to the constraint surface — parallel to $\nabla g$. Moving along the constraint can't improve $f$.
KKT Conditions
For inequalities: stationarity, primal feasibility, dual feasibility ($\lambda \geq 0$), and complementary slackness ($\lambda g = 0$).
Active Set Idea
Guess which constraints are active, solve the equality problem, then refine the guess based on multiplier signs and feasibility.
Why $\lambda < 0$ Means Remove
A negative multiplier means the constraint is "helping" us — but inequalities can only block, not help. So it shouldn't be in the active set.
The KKT System
For quadratic objectives with linear constraints, each subproblem is just a linear system. Standard linear algebra applies!
The Big Picture: Active Set Methods are elegant because they decompose a hard inequality-constrained problem into a sequence of easier equality-constrained problems. The art is in managing the active set — adding constraints we hit, removing ones with negative multipliers. For convex QPs, this converges finitely to the global optimum.