\documentclass[11pt]{article}
\usepackage{amsmath,amssymb,amsthm,graphicx}
\usepackage{fullpage}
\usepackage[capitalise,nameinlink]{cleveref}
\usepackage{float}
\crefname{lemma}{Lemma}{Lemmas}
\crefname{fact}{Fact}{Facts}
\crefname{theorem}{Theorem}{Theorems}
\crefname{corollary}{Corollary}{Corollaries}
\crefname{claim}{Claim}{Claims}
\crefname{example}{Example}{Examples}
\crefname{problem}{Problem}{Problems}
\crefname{setting}{Setting}{Settings}
\crefname{definition}{Definition}{Definitions}
\crefname{assumption}{Assumption}{Assumptions}
\crefname{subsection}{Subsection}{Subsections}
\crefname{section}{Section}{Sections}
\DeclareMathOperator*{\E}{\mathbb{E}}
\let\Pr\relax
\DeclareMathOperator*{\Pr}{\mathbb{P}}
\newcommand{\eps}{\varepsilon}
\newcommand{\inprod}[1]{\left\langle #1 \right\rangle}
\newcommand{\R}{\mathbb{R}}
\newcommand{\handout}[5]{
\noindent
\begin{center}
\framebox{
\vbox{
\hbox to 5.78in { {\bf CS 270: Combinatorial Algorithms and Data Structures
} \hfill #2 }
\vspace{4mm}
\hbox to 5.78in { {\Large \hfill #5 \hfill} }
\vspace{2mm}
\hbox to 5.78in { {\em #3 \hfill #4} }
}
}
\end{center}
\vspace*{4mm}
}
\newcommand{\lecture}[4]{\handout{#1}{#2}{#3}{Scribe: #4}{Lecture #1}}
\newtheorem{theorem}{Theorem}[section]
\newtheorem*{theorem*}{Theorem}
\newtheorem{itheorem}{Theorem}
\newtheorem{subclaim}{Claim}[theorem]
\newtheorem{proposition}[theorem]{Proposition}
\newtheorem*{proposition*}{Proposition}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem*{lemma*}{Lemma}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem*{conjecture*}{Conjecture}
\newtheorem{fact}[theorem]{Fact}
\newtheorem*{fact*}{Fact}
\newtheorem{exercise}[theorem]{Exercise}
\newtheorem*{exercise*}{Exercise}
\newtheorem{hypothesis}[theorem]{Hypothesis}
\newtheorem*{hypothesis*}{Hypothesis}
\newtheorem{conjecture}[theorem]{Conjecture}
\theoremstyle{definition}
\newtheorem{definition}[theorem]{Definition}
\newtheorem{setting}[theorem]{Setting}
\newtheorem{construction}[theorem]{Construction}
\newtheorem{example}[theorem]{Example}
\newtheorem{question}[theorem]{Question}
\newtheorem{openquestion}[theorem]{Open Question}
\newtheorem{algorithm}[theorem]{Algorithm}
\newtheorem{problem}[theorem]{Problem}
\newtheorem{protocol}[theorem]{Protocol}
\newtheorem{assumption}[theorem]{Assumption}
\newtheorem{exercise-easy}[theorem]{Exercise}
\newtheorem{exercise-med}[theorem]{Exercise}
\newtheorem{exercise-hard}[theorem]{Exercise$^\star$}
\newtheorem{claim}[theorem]{Claim}
\newtheorem*{claim*}{Claim}
\newtheorem{remark}[theorem]{Remark}
\newtheorem*{remark*}{Remark}
\newtheorem{observation}[theorem]{Observation}
\newtheorem*{observation*}{Observation}
% 1-inch margins, from fullpage.sty by H.Partl, Version 2, Dec. 15, 1988.
% \topmargin 0pt
% \advance \topmargin by -\headheight
% \advance \topmargin by -\headsep
% \textheight 8.9in
% \oddsidemargin 0pt
% \evensidemargin \oddsidemargin
% \marginparwidth 0.5in
% \textwidth 6.5in
% \parindent 0in
% \parskip 1.5ex
\begin{document}
\lecture{9 --- February 14, 2023}{Spring 2023}{Prof.\ Jelani Nelson}{Martin Toft}
\section{Overview}
In the last lecture we started looking at the Word RAM model. In this lecture we continue in the Word RAM model, focusing on two data structures: y-fast tries and fusion trees.
\section{y-fast tries}
We ended last lecture by looking at x-fast tries, which have:
\begin{itemize}
\item $O(nw)$ space
\item $O(\lg\lg u)$ query
\item $O(\lg u)$ update
\end{itemize}
y-fast tries improve on x-fast tries, achieving
\begin{itemize}
\item $O(n)$ space
\item Same query time
\item $O(\lg\lg u)$ (amortized) update
\end{itemize}
\smallskip
\subsection{How it works}
y-fast tries combine two data structures: the top half is an x-fast trie, and the bottom half consists of balanced binary search trees (BBSTs).
The keys are divided into groups of $O(\lg u)$ consecutive elements, each stored in its own BBST. To facilitate efficient insertion and deletion, each group contains at least $w/4$ and at most $2w$ elements. For each group a representative $r$ is chosen, and these representatives are stored in the x-fast trie. The groups also hold pointers to their predecessor and successor groups.
This separation means that a query first searches the x-fast trie to find the correct representative, then searches the corresponding BBST to find the correct element.
\begin{figure}[H]
\centering
\includegraphics[width=0.5\textwidth]{y-fast trie}
\caption{Y-trie}
\label{fig:y-trie1}
\end{figure}
\subsection{Runtimes and space}
Since the x-fast trie stores only the $O(n/w)$ representatives, and each representative appears in $O(w)$ hash tables (one per level of the trie), the trie requires $O(n/w) \cdot O(w) = O(n)$ space. The BBSTs store all $n$ elements, which uses $O(n)$ space. The total space used by a y-fast trie is therefore $O(n) + O(n) = O(n)$.
Querying is done by first finding the correct group with an x-fast query, then searching through that group's BBST. Since a group holds $O(w)$ elements, the BBST search takes $O(\lg w) = O(\lg\lg u)$ time, so the whole query runs in $O(\lg\lg u)$ time.
Insertion first finds the correct group, then inserts into that group's BBST. If the group becomes too big, i.e., larger than $2w$ elements, we split the BBST in two and remove the old representative. We then pick a representative for each of the two new BBSTs and insert these into the x-fast trie. One of the new representatives simply replaces the old one, while the other must be inserted into a new slot, i.e., by setting some 0 to a 1. Setting a 0 to a 1 forces us to walk up the x-fast trie updating the internal nodes, which takes $O(w)$ time. However, this happens at most once every $\Omega(w)$ insertions, so it costs $O(1)$ time amortized. In total, insertion takes $O(\lg\lg u)$ time to find the group plus $O(1)$ amortized to handle splitting the group and adding the new representatives to the x-fast trie, which equals $O(\lg\lg u)$.
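The grouping, query routing, and split-on-overflow logic above can be sketched in Python. This is a toy model only: a sorted list of representatives stands in for the x-fast trie (so lookups here are not actually $O(\lg\lg u)$), sorted Python lists stand in for the BBSTs, and the class and method names are invented for illustration; only the split-at-$2w$ rule mirrors the text.

```python
from bisect import bisect_right, insort

class YFastSketch:
    """Toy y-fast trie: a sorted list of representatives stands in for the
    x-fast trie; sorted Python lists stand in for the O(w)-size BBSTs."""

    def __init__(self, w):
        self.w = w
        self.reps = []    # sorted representatives (the x-fast trie's keys)
        self.groups = {}  # representative -> sorted list of group members

    def _group_of(self, x):
        # stand-in for the O(lg lg u) x-fast predecessor query on reps
        i = bisect_right(self.reps, x)
        return self.reps[max(i - 1, 0)]   # clamp: below all reps -> first group

    def insert(self, x):
        if not self.reps:
            self.reps, self.groups = [x], {x: [x]}
            return
        r = self._group_of(x)
        g = self.groups[r]
        insort(g, x)
        if len(g) > 2 * self.w:           # group too big: split it in two
            del self.groups[r]
            self.reps.remove(r)
            mid = len(g) // 2
            for part in (g[:mid], g[mid:]):
                insort(self.reps, part[0])        # new representative
                self.groups[part[0]] = part

    def predecessor(self, x):
        """Largest stored key <= x, or None."""
        if not self.reps:
            return None
        g = self.groups[self._group_of(x)]
        j = bisect_right(g, x)
        return g[j - 1] if j else None    # j == 0: x precedes every key
```

A small usage check: after inserting 5, 1, 9, 7, 3 with $w = 2$, the fifth insertion overflows the single group and splits it, after which predecessor queries route through the representatives.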
\smallskip
\begin{figure}[H]
\centering
\includegraphics[width=0.5\textwidth]{y-fast trie split}
\caption{BBST split into two new BBSTs where new representatives are inserted into x-fast trie}
\label{fig:y-trie2}
\end{figure}
\section{Fusion trees}
A fusion tree is essentially a B-tree with branching factor $w^{1/5}$. There are two types of fusion trees: static and dynamic. Dynamic fusion trees will not be discussed in this class, but interested readers can consult \cite{dynamicfusion:1996}. Instead, we will focus on static fusion trees \cite{statfusion:1990}. Our goal: static predecessor queries in $O(\log_w n)$ time.
\subsection{How it works}
To achieve the desired runtimes for updates and queries, the fusion tree must be able to search within a node containing up to $w^{1/5}$ keys in constant time. This is done by ``sketching'', which compresses the keys so that they all fit into one machine word, allowing comparisons to be done in parallel.
Reaching the required solution for predecessor in $O(\log_w n)$ time requires four main ``ingredients''/computations:
\begin{enumerate}
\item Sketching (has nothing to do with the research field of the prof.)
\item Word level parallelism, i.e., parallel comparison achieved through compression
\item Power of multiplication
\item Finding the Most Significant Set Bit (MSSB) in $O(1)$ time (connected to point (3))
\begin{itemize}
\item Note: Some machines today have an MSSB instruction built into their CPU, so that we don't have to think about finding it. We will however assume that our machine doesn't have MSSB built into its CPU, and show how to find it.
\end{itemize}
\end{enumerate}
\subsection{Sketching}
As we noted earlier, sketching is a kind of compression where we want each fusion node to fit in a single machine word. Sketching is done by storing what are called \emph{sketches} of our keys. A sketch is made by keeping only the bit positions where branching occurs (i.e., where an internal node of the trie over the keys has both children present). To distinguish two keys, it is sufficient to look at their branching point, i.e., the first bit where the two keys differ.
Since there are at most $k$ keys, there are at most $k-1$ branching points, which means that a key can be identified using at most $k-1$ bits. The number of branch bits $r$ therefore satisfies $r < k = w^{1/5}$.
An important property of the sketch function is that it preserves the order of the keys: $x_0 < \cdots < x_{k-1}$ implies $\mathrm{sk}(x_0) < \cdots < \mathrm{sk}(x_{k-1})$, where $\mathrm{sk}(x_i)$ denotes the sketch of the key $x_i$.
Take \cref{fig:sketching} below as an example of sketching: with $u = 16$ and $w = 4$, we have the keys 0000, 0010, 1100, 1111, and the dashed lines represent branching in that node. This branching gives us the sketches 00, 01, 10, and 11 for the keys, respectively.
\begin{figure}[H]
\centering
\includegraphics[width=1\textwidth]{Sketching}
\caption{An example of sketching}
\label{fig:sketching}
\end{figure}
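The example in the figure can be checked mechanically. Below is a small Python sketch (the helper name is invented for illustration) that extracts the bits at the branching positions, here positions 3 and 1 (counting from the least significant bit), and one can verify that the resulting sketches 00, 01, 10, 11 preserve the key order.

```python
def sketch(x, branch_bits):
    """Concatenate the bits of x at the branching positions, MSB first."""
    s = 0
    for b in sorted(branch_bits, reverse=True):
        s = (s << 1) | ((x >> b) & 1)
    return s

# The w = 4 example from the figure: keys 0000, 0010, 1100, 1111.
# Their trie branches at bit positions 3 and 1.
keys = [0b0000, 0b0010, 0b1100, 0b1111]
sketches = [sketch(x, [3, 1]) for x in keys]
# sketches are 0b00, 0b01, 0b10, 0b11, and the order is preserved
```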
\emph{Problem}: What if we add a new key, and this results in two keys having the same sketch, as shown in \cref{fig:sketching2} below? \\
\emph{Fix}: Say that $\mathrm{sk}(x_i) = \mathrm{pred}(\mathrm{sk}(q))$; we also learn $x_{i+1}$. First find the highest point where the keys with the same sketch branch differently, which we will call $y$. If the newly added key was added to the right of $y$ (if we fell off $y$ to the right), we set $e = y011\ldots1$. If we fell off $y$ to the left, we set $e = y100\ldots0$. If we have multiple leaves with the same sketch, we do this successively. \\
\emph{Claim}: If we search for $\mathrm{pred}(\mathrm{sk}(e))$, we will find the correct child branch.
\begin{figure}[H]
\centering
\includegraphics[width=1\textwidth]{sketching2}
\caption{Sketch after adding a new key to tree}
\label{fig:sketching2}
\end{figure}
\subsection{Word level parallelism (parallel comparison)}
The purpose of the compression achieved by sketching is to allow all of the keys to be stored in one $w$-bit word. The sketch of a fusion node is the bit string $\mathrm{sk}(\text{node}) = 1\,\mathrm{sk}(x_0)\,1\,\mathrm{sk}(x_1)\cdots 1\,\mathrm{sk}(x_{k-1})$. Essentially, we have packed all the key sketches together into one string by prepending a set bit to each of them.
The sketch function uses $b \le r^4$ bits, so each block uses $1 + b \le w^{4/5}$ bits. Because $k \le w^{1/5}$, the total number of bits in the node sketch is at most $k(1+b) \le w^{1/5} \cdot w^{4/5} = w$, so it fits into a single word.
We can pad the query sketch $\mathrm{sk}(q)$ to the same length as $\mathrm{sk}(\text{node})$, so that every block of $\mathrm{sk}(\text{node})$ can be compared with $\mathrm{sk}(q)$ in one operation, demonstrating word-level parallelism. We compute $\mathrm{sk}(\text{node})$ in preprocessing. At query time, we compute the repeated pattern $0\,\mathrm{sk}(q)\,0\,\mathrm{sk}(q)\cdots 0\,\mathrm{sk}(q)$ and subtract it from $\mathrm{sk}(\text{node})$. The leading bit of block $i$ in the difference is 1 iff $\mathrm{sk}(q) \le \mathrm{sk}(x_i)$, and 0 otherwise. Since the $x_i$'s in the fusion node are in sorted order, reading the difference from the most significant block to the least significant gives, in the leading bits, a run of 0s followed by a run of 1s (with arbitrary contents in the remaining bits): $\mathrm{sk}(\text{node}) - 0\,\mathrm{sk}(q)\cdots 0\,\mathrm{sk}(q) = 0\ldots0\ldots1\ldots1\ldots$.
\emph{Note}: This structure is not arbitrary: we know that the blocks have a fixed width, and that we only care about the leading bit of each. If we knew how many blocks have a 1 as their leading bit, we could figure out which block is the leftmost one with a 1, i.e., find the MSSB, which leads us to our next part.
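The subtraction-and-mask step can be simulated with Python's big integers standing in for a $w$-bit word (the function names are invented for illustration). With $b = 2$ and sketches $00 < 01 < 10 < 11$, a query sketch of $10$ leaves the leading bit set in exactly the blocks of the two sketches that are $\ge 10$.

```python
def node_sketch(sketches, b):
    """Pack 1 sk(x_0) 1 sk(x_1) ... into one integer, x_0 most significant."""
    node = 0
    for s in sketches:
        node = (node << (b + 1)) | (1 << b) | s
    return node

def leading_bits(sketches, q, b):
    """One subtraction compares sk(q) against every sk(x_i) at once."""
    k = len(sketches)
    node = node_sketch(sketches, b)
    qrep = 0
    for _ in range(k):                   # 0 sk(q) 0 sk(q) ... repeated k times
        qrep = (qrep << (b + 1)) | q
    diff = node - qrep                   # no borrows cross blocks: 1s >= 0q
    mask = sum(1 << ((b + 1) * i + b) for i in range(k))   # (10^b)^k
    return diff & mask   # block i keeps its lead bit iff sk(x_i) >= sk(q)

# b = 2, sketches 00 < 01 < 10 < 11, query sketch 10: only the two
# low-order blocks (sketches 10 and 11) keep their leading bit.
```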
\subsection{Power of multiplication}
In the last subsection, we ended up with the difference of $\mathrm{sk}(\text{node})$ and the repeated $\mathrm{sk}(q)$ being blocks with leading bit 0 followed by blocks with leading bit 1, with some other stuff in the remaining bits. We don't care about the other stuff, so by taking the bitwise AND of the difference and the constant $(10^b)^k$, we clear everything except the leading bit of each block.
We then multiply by a suitable constant, which packs all the leading bits together into one block, and mask away all the other blocks to 0. Finally, we shift right, leaving a number whose value is the count of blocks with leading bit 1. This prepares us for our next and last step, finding the MSSB.
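The whole rank computation of these two subsections, subtract, mask, multiply by a packing constant, shift, and mask, can be written out with Python big integers standing in for a machine word. This is a sketch of the trick under the assumption $k < 2^{b+1}$, so that no carries cross block boundaries in the product; the helper name is invented for illustration.

```python
def rank_by_multiplication(sketches, q, b):
    """Number of node sketches strictly less than q, via O(1) word ops."""
    k, B = len(sketches), b + 1
    node = 0
    for s in sketches:                   # 1 sk(x_i) blocks, x_0 on top
        node = (node << B) | (1 << b) | s
    qrep = 0
    for _ in range(k):                   # 0 sk(q) blocks, repeated k times
        qrep = (qrep << B) | q
    mask = sum(1 << (B * i + b) for i in range(k))
    lead = (node - qrep) & mask          # lead bit of block i: sk(x_i) >= q
    packer = sum(1 << (B * j) for j in range(k))
    # multiplying by `packer` sums the lead bits into one block; shifting
    # and masking reads off how many sketches are >= q (needs k < 2^B)
    count_geq = ((lead * packer) >> (B * (k - 1) + b)) & ((1 << B) - 1)
    return k - count_geq                 # rank of q among the sketches
```

For $b = 2$ and sketches $00 < 01 < 10 < 11$, a query sketch of $10$ has rank 2: two sketches lie below it.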
\subsection{Approximating the sketch}
Before we look at finding the MSSB, there is one more thing to address, namely approximating the sketch. With only the standard word operations, such as those of the C programming language, it is difficult to compute a perfect sketch in constant time. Instead, we pack the sketch bits into an \emph{approximate sketch}, which contains all the important bits but also some irrelevant bits; this we can compute in constant time. Just like the perfect sketch, the approximate sketch preserves the order of the keys, giving us $\mathrm{sk}(x_0) < \cdots < \mathrm{sk}(x_{k-1})$.
We compute the approximate sketch of a key $x$ by a bitwise AND between $x$ and $\sum_{i=0}^{r-1} 2^{b_i}$, which serves as a mask removing all the non-sketch bits from the key, followed by a multiplication with some constant $m$ that shifts the sketch bits into a small range, then a mask removing all but the shifted sketch bits.
\emph{Claim}: Given keys $x_0 < \cdots < x_{k-1}$ and branch bits $b_0 < \cdots < b_{r-1}$ ($r < k$), there exists a number $m \in \{0, \ldots, u-1\}$ with $m = \sum_{i=0}^{r-1} 2^{m_i}$ such that
\begin{enumerate}
\item For all $(i,j) \neq (i',j')$, $m_i + b_j \neq m_{i'} + b_{j'}$, i.e., the sums $m_i + b_j$ are distinct over all pairs $(i,j)$. This ensures that the sketch bits don't collide under the multiplication in a way we cannot undo to recover the keys later on.
\item $m_0 + b_0 < m_1 + b_1 < \cdots$, i.e., the sums $m_i + b_i$ are strictly increasing.
\item $(m_{r-1} + b_{r-1}) - (m_0 + b_0) \le r^4$, i.e., the sketch bits are packed into a range of size at most $r^4$.
\end{enumerate}
\emph{Proof}: Let $m_0 = 0$, and suppose that for some $1 \le t \le r-1$ we have already found $m_0, \ldots, m_{t-1}$. Pick the smallest value $m_t$ satisfying both (1) and (2). Condition (1) requires $m_t \neq b_i - b_j + m_s$ for all $1 \le i, j \le r$ and $0 \le s \le t-1$, so there are fewer than $tr^2 \le r^3$ values that $m_t$ must avoid. Since $m_t$ is chosen minimal, we have $(b_t + m_t) \le (b_{t-1} + m_{t-1}) + r^3$; summing over $t$ gives $(m_{r-1} + b_{r-1}) - (m_0 + b_0) \le r \cdot r^3 = r^4$, which is point (3) and proves the claim.
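The greedy choice of $m$ from the proof, and the mask-multiply-mask computation of the approximate sketch, can be sketched as follows. The helper names are invented for illustration, and the order-preservation property is only checked on the small four-key example from earlier (branch bits 1 and 3), not proven by the code.

```python
def choose_m(branch_bits):
    """Greedy choice from the proof: m_0 = 0, then each m_t is the smallest
    value keeping all sums m_i + b_j distinct (condition 1) while
    m_t + b_t > m_{t-1} + b_{t-1} (condition 2)."""
    r = len(branch_bits)
    m = [0]
    for t in range(1, r):
        forbidden = {bi - bj + ms for bi in branch_bits
                     for bj in branch_bits for ms in m}
        # smallest candidate already satisfying condition (2)
        cand = max(0, m[-1] + branch_bits[t - 1] - branch_bits[t] + 1)
        while cand in forbidden:
            cand += 1
        m.append(cand)
    return m

def approx_sketch(x, branch_bits, m):
    """(x AND branch-bit mask) * m, then keep the window of sketch bits."""
    mask = sum(1 << b for b in branch_bits)
    mult = sum(1 << mi for mi in m)
    lo = m[0] + branch_bits[0]           # window holding the shifted bits
    hi = m[-1] + branch_bits[-1]
    return ((x & mask) * mult >> lo) & ((1 << (hi - lo + 1)) - 1)
```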
\subsection{MSSB}
In the next lecture we will look at the last "ingredient", namely MSSB, and how we can find this in constant time.
\begin{thebibliography}{1}
\bibitem{statfusion:1990}
Michael L. Fredman and Dan E. Willard.
\newblock Blasting through the information theoretic barrier with fusion trees.
\newblock {\em Proceedings of the Twenty-Second Annual ACM Symposium on Theory
of Computing}, 1990.
\bibitem{dynamicfusion:1996}
Rajeev Raman.
\newblock Priority queues: small, monotone and trans-dichotomous.
\newblock {\em Fourth Annual European Symposium on Algorithms}, pages 121--137,
1996.
\end{thebibliography}
\end{document}