Algorithm_review
15. NP & NPC
- A reduces to B
- A $\leq_P$ B: A is equally or less difficult than B
- Let X and Y be the set of ‘yes’ instances for A and B. A reduces to B if there exists $f: A\rightarrow B$ , s.t. for all instances $x \in A$, $x \in X \Leftrightarrow f(x) \in Y$.
16-17. NPC & Reductions
-
CLIQUE: given a graph with n nodes, is there a clique with n/3 nodes?
- 3-CNF-SAT $\leq_P$ CLIQUE
-
SUBSET-SUM: given a set of numbers $ S = {a_1,…,a_n} $ and a target value $t$, and we want to find a subset $ S’\subseteq S$ summing to $t$.
- 3-CNF-SAT $\leq_P$ SUBSET-SUM
-
HAM-CYCLE: given an undirected graph G = (V, E), does there exist a simple cycle $\Gamma$ that contains every node in V.
- 3-CNF-SAT $\leq_P$ DIR-HAM-CYCLE
-
TSP: given a set of n cities and a pairwise distance function d(u, v), is there a tour of length ≤ D?
- HAM-CYCLE $\leq_P$ TSP
18. Lower bounds
-
Merging two lists: How many comparisons needed to merge two lists of size n into sorted order ?
- If two sorted lists, an upper bound is 2n-1comparisons
- For any two lists, the lower bound is 2n-1comparisons
-
Finding the max: How many comparisons to find the largest number in an unsorted array of n distinct numbers?
- upper bound is n-1 comparisons
- lower bound is n-1comparisons
- show that $w_k+b_k \geq n-k$, by induction
- show that as long as $w_k+b_k > 1$, algo would not terminate. (Either $w_k > 1$ or $b_k > 1$)
- algo terminates when $w_k+b_k = 1$
-
Finding the max and min.
- upper bound is 3n/2-2 comparisons
- compare each 2 elem in pairs (n/2 cp.), split into two arrays, Big and Small, both of size $n/2$
- get max from Big(n/2-1 cp.), get min from Small(n/2-1 cp.)
- lower bound is 3n/2-2 comparisons
- WBR-method
- upper bound is 3n/2-2 comparisons
-
Sorting n numbers: How many comparisons are needed to sort n numbers?
- upper bound is $O(n \log n)$ comparisons using merge sort
- lower bound is $\Omega(n \log n)$ comparisons (for comparison-based sorting algorithm)
- Complexity of algorithm is the length of the longest root-leaf path
- $2^h \geq (num\ of\ leaves\ of\ dectree) \geq n!$ $\Rightarrow$ $h\geq n/2 * (\log_2n-1)$
19-22. Randomized algorithms
Part 1
-
Las Vegas vs Monte Carlo
- A Las Vegas randomized algorithm: variable running time, always right answer
- A Monte Carlo randomizedalgorithm: fixed running time, sometimes wrong answer
-
Max-Cut: Given a graph G, split vertices into two sides to maximize the number of edges between the sides.
- NP-complete
- a Monte Carlo 2-approximation algorithm: Put each node in a random side with probability 1⁄2
- In a graph with e edges, the algorithm produces a cut with expected size e/2.
-
Quicksort
-
Pick a random pivot element s $\Rightarrow$ Partition $\Rightarrow$ Recursion
-
$R(n) = 1/n * (R(1)+R(n-1)+R(2)+R(n-2)+…+R(n- 1)+R(1)) + n * 1/n * (n-1)$
$=2/n * \sum_k R(k) + (n-1)$
-
$R(n) \leq an\log n + b$
-
-
Hash tables
-
Direct addressing:waste of space
-
Closed addressing: every entry in hash table points to a linked list
-
Load factor: $\alpha = num\ of\ items / table_ size$
- Regardless of the hash function, an adversary can choose a set of n inputs to make all operations $O(n)$ time $\Rightarrow$ Universal hashing
- Say H is a universal hash family if for any keys $x \neq y$, $Pr_{h\in H}[h(x) = h(y)] = 1/m$, where m is the table size
- …Constrcuting universal hash family…not finished…
- Open addressing
-
Part 2
- Bloom filter:
- Consists of
- an array A of size m, initially all set to 0
- k independent hash functions $h_1, …, h_k$, each mapping keys to {1, 2 , …, m}
- Operations
- to store key x: set $A[h_1(x)], A[h_2(x)], …, A[h_k(x)]$ all to 1
- to search key x: read array locations $A[h_1(x)], A[h_2(x)], …, A[h_k(x)]$: if all values are 1, output yes; otherwise no
-
Correctness:
- No false negative: returns no only if there is at least one of $A[h_1(x)], A[h_2(x)], …, A[h_k(x)]$ equal 0, but we set all of these to 1 during insertion
- Exist false positive: maybe we inserted some keys that hashed to the same k locations as x
- Deletes can be done by storing a count of how many keys hashed to that location, and inc / dec the counts when inserting or deleting (…may exist problems)
- Consists of
- String fingerprinting:
- $(a_1, …, a_n)$, $(b_1, …, b_n)$, $a=\sum_n a *2^{i-1}$, $b=\sum_n b *2^{i-1}$, F(a) = a mod p, F(b) = b mod p
- Correctness:
- No false positives: if $F(a)\neq F(b)$, then $a\neq b $
- Exist false positive: If $F(a)=F(b)$, then a mod p = b mod p. So either $a=b$, or $a\neq b$ but $p \mid (a-b)$
- String matching: Let X and Y be binary strings with length n and m, resp., where $n\geq m$, decide if string X contains Y. I.e. we want to match Y to a part of X.
- A Monte Carlo algo: FINGERPRINT(X(j), Y) $\Rightarrow$ $O(m+n)$
- A Las Vegas algo: Monte Carlo and double check, Prob=1/n for false positive
Part 3
- Chernoff Bounds:
- …Really..?
Part 4
- Distributed computing:
- Emmmmm…
23-24. Approximation algorithms
Part 1
-
Approximation ratio: Let X be a maxi(mini)mization problem. Let A be an algorithm for X. A is an α-approximation algorithm for X if A always returns an answer that’s at least(most) 1/α (α) times the optimal
- Makespan minimizing:
- SUBSET-SUM $\leq_P$ Makespan minimizing
- Graham’s list scheduling: (1) List the jobs in any order. (2) As long as there are unfinished jobs, if any processor doesn’t have a job now, give it the next job in the list.
- 2-approximation
-
The knapsack problem: We want to put items in the knapsack to maximize the total value, but not exceed the weight limit.
-
DP
- $OPT(i, v) = \min {OPT(i-1,v), wi+OPT(i-1,v-vi)}$
- $V = \max_{v≤nv^\star} OPT(n,v) \leq W$, where $v^\star$ is the maximum weight among all items
- time complexity: $O(n^2v^\star)$
-
polynomial time approximation scheme (PTAS)
- Rounding
-
Part 2
- TSP:
- MST+DFS+shortcutting $\Rightarrow$ 2-approximation
- MST+perfect-matching+Euler-cycle+shortcutting $\Rightarrow$ 1.5-approximation
- Euler cycle: each edge once // Hamilton cycle(TSP): each node once
- K-center: We want to minimize the max distance that any site is from its closest center.
- Gonzalez’s algorithm: initially C is empty.
more info: zhufm@shanghaitech.edu.cn