\documentclass[11pt]{article}
\usepackage{amsmath,amssymb,amsthm}
\usepackage{fullpage}
\usepackage{graphicx}
\usepackage[capitalise,nameinlink]{cleveref}
\crefname{lemma}{Lemma}{Lemmas}
\crefname{fact}{Fact}{Facts}
\crefname{theorem}{Theorem}{Theorems}
\crefname{corollary}{Corollary}{Corollaries}
\crefname{claim}{Claim}{Claims}
\crefname{example}{Example}{Examples}
\crefname{problem}{Problem}{Problems}
\crefname{setting}{Setting}{Settings}
\crefname{definition}{Definition}{Definitions}
\crefname{assumption}{Assumption}{Assumptions}
\crefname{subsection}{Subsection}{Subsections}
\crefname{section}{Section}{Sections}
\DeclareMathOperator*{\E}{\mathbb{E}}
\let\Pr\relax
\DeclareMathOperator*{\Pr}{\mathbb{P}}
\newcommand{\eps}{\varepsilon}
\newcommand{\inprod}[1]{\left\langle #1 \right\rangle}
\newcommand{\R}{\mathbb{R}}
\newcommand{\handout}[5]{
\noindent
\begin{center}
\framebox{
\vbox{
\hbox to 5.78in { {\bf CS 270: Combinatorial Algorithms and Data Structures
} \hfill #2 }
\vspace{4mm}
\hbox to 5.78in { {\Large \hfill #5 \hfill} }
\vspace{2mm}
\hbox to 5.78in { {\em #3 \hfill #4} }
}
}
\end{center}
\vspace*{4mm}
}
\newcommand{\lecture}[4]{\handout{#1}{#2}{#3}{Scribe: #4}{Lecture #1}}
\newtheorem{theorem}{Theorem}[section]
\newtheorem*{theorem*}{Theorem}
\newtheorem{itheorem}{Theorem}
\newtheorem{subclaim}{Claim}[theorem]
\newtheorem{proposition}[theorem]{Proposition}
\newtheorem*{proposition*}{Proposition}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem*{lemma*}{Lemma}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem*{conjecture*}{Conjecture}
\newtheorem{fact}[theorem]{Fact}
\newtheorem*{fact*}{Fact}
\newtheorem{exercise}[theorem]{Exercise}
\newtheorem*{exercise*}{Exercise}
\newtheorem{hypothesis}[theorem]{Hypothesis}
\newtheorem*{hypothesis*}{Hypothesis}
\newtheorem{conjecture}[theorem]{Conjecture}
\theoremstyle{definition}
\newtheorem{definition}[theorem]{Definition}
\newtheorem{setting}[theorem]{Setting}
\newtheorem{construction}[theorem]{Construction}
\newtheorem{example}[theorem]{Example}
\newtheorem{question}[theorem]{Question}
\newtheorem{openquestion}[theorem]{Open Question}
% \newtheorem{algorithm}[theorem]{Algorithm}
\newtheorem{problem}[theorem]{Problem}
\newtheorem{protocol}[theorem]{Protocol}
\newtheorem{assumption}[theorem]{Assumption}
\newtheorem{exercise-easy}[theorem]{Exercise}
\newtheorem{exercise-med}[theorem]{Exercise}
\newtheorem{exercise-hard}[theorem]{Exercise$^\star$}
\newtheorem{claim}[theorem]{Claim}
\newtheorem*{claim*}{Claim}
\newtheorem{remark}[theorem]{Remark}
\newtheorem*{remark*}{Remark}
\newtheorem{observation}[theorem]{Observation}
\newtheorem*{observation*}{Observation}
% 1-inch margins, from fullpage.sty by H.Partl, Version 2, Dec. 15, 1988.
% \topmargin 0pt
% \advance \topmargin by -\headheight
% \advance \topmargin by -\headsep
% \textheight 8.9in
% \oddsidemargin 0pt
% \evensidemargin \oddsidemargin
% \marginparwidth 0.5in
% \textwidth 6.5in
% \parindent 0in
% \parskip 1.5ex
\begin{document}
\lecture{21 --- April 4, 2023}{Spring 2023}{Prof.\ Jelani Nelson}{Hanzhe Wu, Chethan Bhateja}
\section{Overview}
In today's lecture we will start to talk about linear programming. More specifically, we plan to discuss
\begin{itemize}
\item Simplex method
\item Strong duality
\item Complementary slackness
\end{itemize}
\begin{remark*}
We will show in the next lecture how to prove strong duality through the simplex method (which would be a natural corollary). Strong duality can also be proved via Farkas' lemma.
\end{remark*}
\section{Linear Programming}
Recall that linear programs optimize a linear function subject to linear constraints. In general, LPs can be written with inequality constraints in \underline{canonical form}
\begin{eqnarray*}
\min & c^\top x \\
\mathrm{s.t.} & Ax & \leq~b
\end{eqnarray*}
We can also write an LP in \underline{standard form} with equality constraints, which is the form the simplex method typically takes as input.
\begin{eqnarray*}
\min & c^\top x \\
\mathrm{s.t.} & Ax & =~b\\
& x & \geq~0
\end{eqnarray*}
where $A \in \mathbb{R}^{m\times n}$, $n \geq m$. \\
In fact, any LP can be rewritten in standard form:
\begin{itemize}
\item For constraints $\inprod{a_i, x} \leq b_i$, we can introduce \underline{slack variables} $s_i$ and rewrite them as $\inprod{a_i, x} + s_i = b_i$, $s_i \geq 0$.
\item For free variables $x_i$, we can define $x_i^{+}, x_i^{-} \geq 0$, and replace $x_i$ with $x_i^{+}-x_i^{-}$.
\item If $n < m$, add dummy variables to make $n \geq m$.
\end{itemize}
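To make the conversion concrete, here is a small sketch in Python/NumPy (the function name and the toy instance are illustrative, not from the lecture) that rewrites a canonical-form LP with free variables into standard form via the split $x = x^{+} - x^{-}$ and one slack variable per row:

```python
import numpy as np

def canonical_to_standard(A, b, c):
    """Rewrite min c^T x s.t. Ax <= b (x free) as min c'^T z s.t. A'z = b, z >= 0.

    New variables z = (x+, x-, s): each free x_i becomes x_i^+ - x_i^-,
    and each row i gets a slack s_i so that <a_i, x> + s_i = b_i.
    """
    m, n = A.shape
    A_std = np.hstack([A, -A, np.eye(m)])          # A' = [A, -A, I]
    c_std = np.concatenate([c, -c, np.zeros(m)])   # slacks carry zero cost
    return A_std, b, c_std

# Any feasible x for the canonical LP maps to a feasible z with equal cost.
A = np.array([[1.0, 1.0]]); b = np.array([2.0]); c = np.array([1.0, 0.0])
A_std, b_std, c_std = canonical_to_standard(A, b, c)
x = np.array([1.0, -0.5])                          # feasible: Ax = 0.5 <= 2
z = np.concatenate([np.maximum(x, 0), np.maximum(-x, 0), b - A @ x])
assert np.allclose(A_std @ z, b_std) and np.isclose(c_std @ z, c @ x)
```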
We will also assume at some points that the rows of $A$ are linearly independent. This is without loss of generality, since redundant constraints can be removed (and contradictory ones detected) efficiently via row reduction.\\
The figure below illustrates a simple LP in canonical form.
\begin{center}
\includegraphics[scale=0.17]{fig1.jpg}
\end{center}
\section{Simplex Method}
\subsection{General Description}
Key to the simplex method is that the optimum $\mathsf{OPT}$ is always achieved at a vertex, which we will define shortly. The algorithm works roughly as follows:
\begin{enumerate}
\item Find starting vertex $\vec{x}_0$.
\item While $\vec{x}_i$ is sub-optimal, greedily move to better neighboring vertex $\vec{x}_{i+1}$.
\item HALT, return $\vec{x}_{T}$.
\end{enumerate}
\begin{remark}
Actually, the first step of finding a vertex is as hard as solving the LP! There is an efficient reduction from optimizing an LP to finding a feasible $x$ for an LP, via binary search on $\mathsf{OPT}$. At each iteration, we guess that $\mathsf{OPT} \leq \alpha$, add the constraint $c^{\top}x \leq \alpha$ to form the new LP below, and find a feasible $x$ for this LP.
\begin{eqnarray*}
\min & \mathbf{0}^\top x \\
\mathrm{s.t.} & Ax & =~b\\
& c^{\top}x & \leq~\alpha\\
& x & \geq~0
\end{eqnarray*}
For the time complexity, notice that if the result has $\ell$-bit precision, then we will do $O(\ell)$ rounds of binary search. In fact, our inputs $A, b, c$ have only finite precision. If all of them have $\leq \ell$ bits of precision, then the optimal solution has only $\mathrm{poly}(nm\ell)$ bits of precision. Thus, once $\alpha$ has enough precision (which does not take long), we can claim that we have found the optimal value $\alpha$.
\end{remark}
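As a sketch of this reduction (hypothetical code: `feasible` stands in for any feasibility oracle for the LP augmented with $c^{\top}x \leq \alpha$, and the toy oracle below is made up), the binary search itself is just:

```python
def binary_search_opt(feasible, lo, hi, rounds):
    """Locate OPT = min c^T x by binary search on alpha, given an oracle
    feasible(alpha) that reports whether {Ax = b, c^T x <= alpha, x >= 0}
    is nonempty. Assuming OPT lies in [lo, hi], each round halves hi - lo
    while maintaining lo <= OPT <= hi."""
    for _ in range(rounds):
        mid = (lo + hi) / 2
        if feasible(mid):
            hi = mid   # a point with cost <= mid exists, so OPT <= mid
        else:
            lo = mid   # no such point, so OPT > mid
    return lo, hi

# Toy oracle with OPT = 1.25; in the real reduction each call is itself
# an LP feasibility problem.
lo, hi = binary_search_opt(lambda a: a >= 1.25, lo=0.0, hi=4.0, rounds=40)
assert lo <= 1.25 <= hi and hi - lo <= 4.0 / 2**40
```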
To further analyze LPs and the simplex algorithm, we need the following definitions.
\begin{definition}[Feasible Set]
The \underline{feasible set} $P$ is the set of all $x$ satisfying all constraints, i.e., $P = \{x: Ax = b, x\geq 0\}$.
\end{definition}
\begin{definition}[Feasible]
A point $x$ is \underline{feasible} if $x \in P$.
\end{definition}
\begin{definition}[LP Feasibility]
An \underline{LP} is \underline{feasible} if $P \neq \emptyset$.
\end{definition}
\begin{definition}[Bounded]
An LP is \underline{bounded} if $\mathsf{OPT} > -\infty$.
\end{definition}
\begin{definition}[Vertex]
$x \in P$ is a \underline{vertex} if $\begin{cases}
x + y \in P \\ x - y \in P
\end{cases} \implies y = 0$
\end{definition}
\begin{remark*}
The definition of vertex matches our intuition -- at a vertex, we cannot move in two opposite directions while staying feasible!
\end{remark*}
\subsection{Finding Starting Vertex}
Next, we will try to find a starting vertex. We will do this by formulating another LP as follows:
\begin{eqnarray*}
\min & t \\
\mathrm{s.t.} & Ax & =~(1-t)b\\
& x,t & \geq~0\\
& t & \leq~1
\end{eqnarray*}
This LP is not in the standard form. It is easy to transform the first group of constraints to standard form, while for the last constraint, we can add a slack variable $s_t$ and rewrite it as $\begin{cases}
t + s_t &= 1 \\ s_t &\geq 0
\end{cases}$. Note that the optimum is $t = 0 \iff$ the original LP is feasible.\\
Now we want a \underline{starting vertex} to run the simplex algorithm for the new LP. Consider $x = \vec{0}$, $t = 1$, and $s_t = 0$. We can check that these values are feasible for the LP. Next, we show that the $(n+2)$-dimensional vector $(\underbrace{\vec{x}}_{n\;\text{vars}}, t, s_t) = (\vec{0},1,0)$ is a vertex. Intuitively, this is because we are at the edge of the feasible set in every coordinate.\\
Let $y = (y_1,\cdots, y_n, y_{n+1}, y_{n+2})$ be the vector we add and subtract as in the vertex definition.
\begin{itemize}
\item $y$ cannot have support (nonzero values) in the first $n$ coordinates. Otherwise, if $y_i \neq 0$, either $x_i+y_i$ or $x_i-y_i$ would be $<0$, making the $i$th coordinate $<0$ and violating feasibility.
\item $y$ cannot have support in the $(n+1)$th coordinate. Otherwise, either $t+y_{n+1}$ or $t-y_{n+1}$ would be $>1$, making the $(n+1)$th coordinate $>1$ and violating feasibility.
\item $y$ cannot have support in the $(n+2)$th coordinate. Otherwise, either $s_t+y_{n+2}$ or $s_t-y_{n+2}$ would be $<0$, making the $(n+2)$th coordinate $<0$ and violating feasibility.
\end{itemize}
Thus, $y = 0$, and $(\vec{x},t,s_t) = (\vec{0}, 1, 0)$ is a vertex.
\subsection{Theorems for Simplex Algorithm}
Now we turn to the justification of the simplex algorithm. Why does it work? The following theorems will convince us step by step.\\
Firstly, an important feature of the simplex algorithm is that it always ``jumps'' among vertices in search of the optimum. Thus, it is important that the LP attain its optimal value at a vertex, as stated in the next claim. Notice that we are dealing with a minimization problem.
\begin{claim}
If an LP is bounded and feasible, then $\forall x \in P$, $\exists$ a vertex $x^{\prime} \in P$, s.t. $c^{\top}x^{\prime} \leq c^{\top}x$.
\end{claim}
\begin{proof}
If $x$ itself is a vertex, we are done, so suppose $x$ is not a vertex. Then $\exists\,y \neq 0$, s.t. $x+y, x-y \in P$, i.e., $\underbrace{A(x+y) = b, A(x-y) = b}_{Ay=0}$, and $x+y, x-y \geq 0$. Without loss of generality, $c^{\top}y \leq 0$ (otherwise rename $y \leftarrow -y$). Moreover, if $c^{\top}y = 0$, then since some $y_j \neq 0$, we may also assume WLOG that some $y_j < 0$ (again renaming $y \leftarrow -y$ if necessary).\\
Now consider two cases of $y$:\\
\underline{Case 1}: $\exists\,j$, s.t. $y_j < 0$. Note that as $x+y, x-y \geq 0$, $\mathrm{supp}(y) \subseteq \mathrm{supp}(x)$, i.e., $x_i = 0 \Rightarrow y_i = 0$. Consider $x+ty$, $t \geq 0$. If $t$ is small enough, adding $ty$ to $x$ does not violate any constraints, since $y_j \neq 0$ implies $x_j > 0$. Thus, we can gradually increase $t$ from $0$ and stop when some coordinate hits $0$: we pick $t^{\ast} = \min_{i:y_{i}<0} \lvert\frac{x_i}{y_i}\rvert$, and change $x$ to $x + t^{\ast}y$.\\
\underline{Case 2}: $\forall j$, $y_j \geq 0$ (i.e., $y \geq 0$). In this case $c^{\top}y < 0$: if we had $c^{\top}y = 0$, our normalization above would give some $y_j < 0$. Since $x+ty \in P$ for all $t\geq 0$ and the cost decreases along this ray, $\mathsf{OPT} = -\infty$. Thus, case 2 contradicts our premise that the LP is bounded, and is impossible.\\
Notice that whenever we are in case 1, at least one more coordinate is set to $0$, while the cost does not increase (since $c^{\top}y \leq 0$). We can therefore repeat case 1, setting more and more coordinates of $x$ to $0$ and making $x$ more vertex-like. The support strictly shrinks each time, so after at most $n$ iterations no nonzero $y$ remains to perturb $x$, and $x$ has become a vertex $x^{\prime}$ with $c^{\top}x^{\prime} \leq c^{\top}x$.
\end{proof}
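A tiny numerical instance (made up for illustration) of the case-1 step: with $Ay = 0$, $c^{\top}y \leq 0$, and some $y_j < 0$, moving to $x + t^{\ast}y$ stays feasible, does not increase the cost, and zeroes out one more coordinate.

```python
import numpy as np

A = np.array([[1.0, 1.0, 1.0]])      # feasible set: x1 + x2 + x3 = 3, x >= 0
b = np.array([3.0])
c = np.array([0.0, 0.0, 1.0])
x = np.array([1.0, 1.0, 1.0])        # feasible, but not a vertex
y = np.array([1.0, 0.0, -1.0])       # Ay = 0, c^T y = -1 <= 0, supp(y) in supp(x)

t_star = min(abs(x[i] / y[i]) for i in range(3) if y[i] < 0)   # ratio test
x_new = x + t_star * y               # here t* = 1, so x_new = (2, 1, 0)

assert np.allclose(A @ x_new, b) and (x_new >= 0).all()        # still feasible
assert c @ x_new <= c @ x                                      # cost did not go up
assert (x_new == 0).sum() > (x == 0).sum()                     # one more zero
```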
Next, we define the concept of a basis, and show that a point is a vertex if and only if the columns of $A$ indexed by its basis are linearly independent.
\begin{definition}[Basis]
Given a vertex $x \in P$, the \underline{basis} of $x$ is $B_x = \{j \in [n]: x_j > 0\} = \mathrm{supp}(x)$.
\end{definition}
\begin{claim}
\fbox{$x \in P$ is a vertex} $\iff$ \fbox{columns $A_{B_x}$ are linearly independent}, where $A_S$ denotes $A$ restricted to the columns $S$.
\end{claim}
By the claim above, since the rank of $A$ is at most $m \leq n$, a vertex's basis can contain at most $m$ columns. Hence, a way to find a vertex is to take $m$ independent columns as the basis, as illustrated in the figure below.
\begin{center}
\includegraphics[scale=0.22]{fig2.jpg}
\end{center}
Then, as $Ax = b$,
\begin{equation*}
Ax = \sum_{i=1}^{n} x_iA_i = \sum_{i \in B_{x}}x_iA_i \implies Ax = A_{B_x}x_{B} = b \implies x_{B} = A_{B_x}^{-1}b
\end{equation*}
where $x_{B}$ is the vector $x$ restricted to the indices in $B_x$. Thus, to find $x_{B}$, we can simply invert the matrix restricted to the basis columns; the remaining entries of $x$ are $0$. Next, we prove the claim.
\begin{proof}
We will show both directions by contraposition. First, we will show \fbox{$x \in P$ is not a vertex} $\implies$ \fbox{columns $(A_{B_x})$ are linearly dependent}.\\
If $x \in P$ is not a vertex, then $\exists\, y \neq 0$, s.t. $\quad\begin{aligned}
A(x+y) = b\\
A(x-y) = b
\end{aligned} \quad$ and $\quad \begin{aligned}
x+y \geq 0\\
x-y \geq 0
\end{aligned}$\\
Thus, $Ay = A_{B_y}y^{\prime} = 0$, and $B_y \subseteq B_x$ (since $x \pm y \geq 0$, $x_i = 0 \Rightarrow y_i = 0$). Hence, $A_{B_x}y^{\prime\prime} = 0$, and thus, the columns of $A_{B_x}$ are linearly dependent.
\begin{remark*}
$y$ is an $n$-dimensional vector; $y^{\prime}$ is the $\lvert B_y\rvert$-dimensional vector obtained from $y$ by ``chopping off'' all the zero entries; $y^{\prime\prime}$ is the $\lvert B_x\rvert$-dimensional vector obtained from $y^{\prime}$ by adding back zero entries for the indices in $B_x \setminus B_y$. Since $y \neq 0$, by definition $y^{\prime} \neq 0$, hence $y^{\prime\prime} \neq 0$, and the columns of $A_{B_x}$ are linearly dependent.
\end{remark*}
Next, we will show \fbox{columns $(A_{B_x})$ are linearly dependent} $\implies$ \fbox{$x \in P$ is not a vertex}.\\
Columns $(A_{B_x})$ are linearly dependent $\implies \exists\,y \neq 0$, s.t. $A_{B_x}y = 0$. Thus, $\exists\,y^{\prime} \in \mathbb{R}^n$, s.t. $Ay^{\prime} = 0$ and $\mathrm{supp}(y^{\prime}) \subseteq B_x$ (we can achieve this by padding the other entries with $0$'s). Thus, $y^{\prime}_i \neq 0 \Rightarrow x_i > 0$. Hence, $\exists\,t > 0$, s.t. $\begin{cases}
x + ty^{\prime} \in P\\
x - ty^{\prime} \in P
\end{cases} \implies x$ is not a vertex.
\end{proof}
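To make the identity $x_{B} = A_{B_x}^{-1}b$ concrete, here is a small sketch (the instance is made up for illustration): pick $m$ independent columns as the basis, solve for $x_B$, and set the remaining coordinates to $0$.

```python
import numpy as np

# Hypothetical standard-form data with m = 2 rows, n = 4 columns.
A = np.array([[1.0, 0.0, 2.0, 1.0],
              [0.0, 1.0, 1.0, 3.0]])
b = np.array([4.0, 5.0])

B = [0, 1]                          # basis: two linearly independent columns
x = np.zeros(4)
x[B] = np.linalg.solve(A[:, B], b)  # x_B = A_B^{-1} b; nonbasic entries stay 0

assert np.allclose(A @ x, b)
# This basic solution is a vertex of P precisely when it is also feasible
# (x_B >= 0); here x = (4, 5, 0, 0) >= 0, so it is a vertex.
assert (x >= 0).all()
```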
\subsection{The Simplex Algorithm}
Now, we will talk about the algorithm itself. The first thing we need to note is that for any vertex $x$ with $\lvert B_x\rvert < m$, we need to artificially add more linearly independent columns from $A$ to make $\lvert B_x\rvert = m$. This may cause problems, as we will see in the last part of today's lecture.\\
Next, we can (finally) give a more detailed version of the simplex algorithm.
\begin{enumerate}
\item Start at some basis $B$.
\item While $\exists$ a better neighbor, move there.
\item HALT.
\end{enumerate}
Naturally, the next questions are ``what does `a better neighbor' mean?'' and ``when should we halt?''. To answer them, given a particular basis $B$, we may rewrite the LP as
\begin{eqnarray*}
\min & c_{B}^{\top}x_{B} + c_{N}^{\top}x_{N}\\
\mathrm{s.t.} & \boxed{A_{B}x_{B} + A_{N}x_{N} =~b}\\
& x_{B}, x_{N} \geq~0
\end{eqnarray*}
where $N := [n]\backslash B$. Hence,
\begin{equation*}
x_{B} = A_{B}^{-1}b - A_{B}^{-1}A_{N}x_{N} \implies \mathsf{cost} = \underbrace{c_{B}^{\top}A_{B}^{-1}b}_{\mathrm{const}} - c_{B}^{\top}A_{B}^{-1}A_{N}x_{N} + c_{N}^{\top}x_{N}
\end{equation*}
Therefore, we need to minimize $(\underbrace{c_{N} - A_{N}^{\top}\left(A_{B}^{-1}\right)^{\top}c_{B}}_{\tilde{c}_{N}})^{\top}x_{N}$, where $x_{N}$ is currently all $0$.\\
Returning to our questions, we now know that ``$\exists$ a better neighbor'' means $\exists\, j$ s.t. $(\tilde{c}_{N})_{j} < 0$, and we halt once every entry of $\tilde{c}_{N}$ is $\geq 0$.\\
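The halting test can be sketched in a few lines (the instance is made up for illustration): compute $\tilde{c}_{N}$ and look for a negative entry.

```python
import numpy as np

def reduced_costs(A, c, B):
    """Return (N, c~_N) with c~_N = c_N - A_N^T (A_B^{-1})^T c_B."""
    N = [j for j in range(A.shape[1]) if j not in B]
    y = np.linalg.solve(A[:, B].T, c[B])     # y = (A_B^{-1})^T c_B
    return N, c[N] - A[:, N].T @ y

# Hypothetical instance at basis B = [0, 1].
A = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0]])
c = np.array([1.0, 1.0, 1.0])
N, ctilde = reduced_costs(A, c, B=[0, 1])
# Entry for column 2 is negative, so column 2 is an improving move.
assert N == [2] and ctilde[0] < 0
```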
Looking carefully at the algorithm, we see that changing entries in $x_{N}$ also changes the entries in $x_{B}$. In fact, each time we bring an index $j$ into the basis and increase $x_j$, some basic variables become $0$, and we kick one of them out of the basis. If there are multiple $j$'s that could enter $B$, we may choose among them freely or by some rule.\\
However, since we artificially pad $B$ with extra columns when $\lvert B_x\rvert < m$, some basic variables may already be $0$. In that case, for some entering $j \in N$, $x_j$ cannot be increased at all (a degenerate pivot)! We still add $j$ to $B$ and kick some ``bad'' index out of $B$.\\
That is still not the full story -- we may get into an infinite loop (cycling) if we do not pivot wisely! The good news is that there are ``pivot rules'' for choosing which index to bring into the basis when multiple $j$'s qualify, and which index to kick out when several basic variables are $0$. For instance, we can use Bland's rule \cite{Bla77} from the 1970s. Such rules guarantee that the algorithm terminates, but the bad news is that all known pivot rules take exponential time in the worst case!
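Putting the pieces together, here is a minimal sketch of the whole algorithm with Bland's rule (dense linear algebra, tolerance-based comparisons, no Phase I; an illustration, not a production solver). It assumes a starting feasible basis $B$ with $A_B$ invertible and $A_B^{-1}b \geq 0$; the toy instance is made up.

```python
import numpy as np

def simplex(A, b, c, B):
    """Minimal simplex sketch with Bland's rule, from a feasible basis B
    (A_B invertible, A_B^{-1} b >= 0). Returns (x, cost), or None if unbounded."""
    m, n = A.shape
    B = list(B)
    while True:
        xB = np.linalg.solve(A[:, B], b)
        y = np.linalg.solve(A[:, B].T, c[B])
        # Bland's rule: entering index = smallest j with negative reduced cost.
        enter = next((j for j in range(n)
                      if j not in B and c[j] - A[:, j] @ y < -1e-9), None)
        if enter is None:                 # every (c~_N)_j >= 0: current x optimal
            x = np.zeros(n)
            x[B] = xB
            return x, c @ x
        d = np.linalg.solve(A[:, B], A[:, enter])  # x_B shrinks along d as x_enter grows
        if (d <= 1e-9).all():
            return None                   # improving ray: LP unbounded
        # Ratio test; ties broken by smallest leaving variable index (Bland's rule).
        _, _, leave = min((xB[i] / d[i], B[i], i) for i in range(m) if d[i] > 1e-9)
        B[leave] = enter

# Toy instance: min -x1 - x2 s.t. x1 + x3 = 2, x2 + x4 = 3, x >= 0 (x3, x4 slacks).
A = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0]])
b = np.array([2.0, 3.0])
c = np.array([-1.0, -1.0, 0.0, 0.0])
x, cost = simplex(A, b, c, B=[2, 3])
assert np.allclose(x, [2.0, 3.0, 0.0, 0.0]) and np.isclose(cost, -5.0)
```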
\section{Conclusion}
The simplex method was developed by George Dantzig in 1947 \cite{Dan51}. There is a famous story about Dantzig solving open problems because he mistook them for homework when he was a Ph.D. student at UC Berkeley.\\
In the next lecture, we will cover \underline{strong duality} and its proof. It states that for a primal optimal $x$, $\exists$ a dual feasible $y$ s.t. $c^{\top}x = b^{\top}y$. The proof will be based on writing down such a $y$ using an optimal basis $B$.
\bibliographystyle{alpha}
\begin{thebibliography}{42}
\bibitem[Bla77]{Bla77}
Robert~G.~Bland.
\newblock New finite pivoting rules for the simplex method.
\newblock {\em Mathematics of Operations Research}, 2(2):103--107, 1977.
\bibitem[Dan51]{Dan51}
George~B.~Dantzig.
\newblock Maximization of a linear function of variables subject to linear inequalities.
\newblock {\em Activity analysis of production and allocation}, 13:339--347, 1951.
\end{thebibliography}
\end{document}