A Partial Review of Algorithms

Introduction to Algorithms, 2018 Fall

Posted by Fernando on January 4, 2019

Algorithm_review

15. NP & NPC

  • A reduces to B
    • A $\leq_P$ B: A is no harder than B
    • Let X and Y be the sets of ‘yes’ instances of A and B. A reduces to B if there exists a polynomial-time computable $f: A\rightarrow B$, s.t. for all instances $x \in A$, $x \in X \Leftrightarrow f(x) \in Y$.

16-17. NPC & Reductions

  • CLIQUE: given a graph with n nodes, is there a clique with n/3 nodes?

    • 3-CNF-SAT $\leq_P$ CLIQUE
  • SUBSET-SUM: given a set of numbers $S = \{a_1,\dots,a_n\}$ and a target value $t$, is there a subset $S’\subseteq S$ summing to $t$?

  • 3-CNF-SAT $\leq_P$ SUBSET-SUM
  • HAM-CYCLE: given an undirected graph G = (V, E), does there exist a simple cycle $\Gamma$ that contains every node in V?

    • 3-CNF-SAT $\leq_P$ DIR-HAM-CYCLE
  • TSP: given a set of n cities and a pairwise distance function d(u, v), is there a tour of length ≤ D?

    • HAM-CYCLE $\leq_P$ TSP

18. Lower bounds

  • Merging two lists: how many comparisons are needed to merge two sorted lists of size n into one sorted list?

    • upper bound: 2n-1 comparisons, via the standard merge
    • lower bound: 2n-1 comparisons for any comparison-based algorithm in the worst case (e.g. when the two lists interleave)
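
A minimal sketch of the merge behind the upper bound, instrumented to count comparisons (the name `merge_count` is mine, not from the lecture): each comparison moves exactly one element to the output, and no comparison is needed for the leftover tail, so at most 2n-1 comparisons are used.

```python
def merge_count(a, b):
    """Merge two sorted lists, counting comparisons.

    Each comparison appends exactly one element to the output, and once
    one list is exhausted the rest is copied without comparisons, so at
    most len(a) + len(b) - 1 comparisons are made (2n - 1 when both
    lists have size n).
    """
    out, comparisons = [], 0
    i = j = 0
    while i < len(a) and j < len(b):
        comparisons += 1
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out, comparisons
```
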
  • Finding the max: How many comparisons to find the largest number in an unsorted array of n distinct numbers?

    • upper bound is n-1 comparisons
    • lower bound is n-1 comparisons
      • show that $w_k+b_k \geq n-k$, by induction
      • show that as long as $w_k+b_k > 1$, the algorithm cannot terminate (since then either $w_k > 1$ or $b_k > 1$)
      • the algorithm terminates only when $w_k+b_k = 1$, so at least $n-1$ comparisons are needed
  • Finding the max and min.

    • upper bound is 3n/2-2 comparisons
      • compare the elements in pairs ($n/2$ comparisons) and split them into two arrays, Big and Small, each of size $n/2$
      • take the max from Big ($n/2-1$ comparisons) and the min from Small ($n/2-1$ comparisons)
    • lower bound is 3n/2-2 comparisons
      • WBR-method
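
A sketch of the pairing algorithm behind the upper bound; the handling of odd n (the leftover element joins both Big and Small) is an addition of mine.

```python
def max_and_min(arr):
    """Find max and min with about 3n/2 - 2 comparisons.

    Compare elements in pairs (n/2 comparisons), then take the max over
    the pair-winners (n/2 - 1) and the min over the pair-losers (n/2 - 1).
    The exact 3n/2 - 2 count assumes n is even.
    """
    big, small = [], []
    for i in range(0, len(arr) - 1, 2):
        x, y = arr[i], arr[i + 1]
        if x > y:                      # n/2 pair comparisons
            big.append(x); small.append(y)
        else:
            big.append(y); small.append(x)
    if len(arr) % 2 == 1:              # odd n: the leftover element joins both sides
        big.append(arr[-1]); small.append(arr[-1])
    mx = big[0]
    for v in big[1:]:                  # n/2 - 1 comparisons
        if v > mx:
            mx = v
    mn = small[0]
    for v in small[1:]:                # n/2 - 1 comparisons
        if v < mn:
            mn = v
    return mx, mn
```
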
  • Sorting n numbers: How many comparisons are needed to sort n numbers?

    • upper bound is $O(n \log n)$ comparisons using merge sort
    • lower bound is $\Omega(n \log n)$ comparisons (for comparison-based sorting algorithm)
      • the worst-case number of comparisons of an algorithm = the length of the longest root-to-leaf path in its decision tree
      • $2^h \geq (\text{number of leaves of the decision tree}) \geq n!$ $\Rightarrow$ $h \geq \log_2(n!) \geq \frac{n}{2}\log_2\frac{n}{2} = \frac{n}{2}(\log_2 n - 1)$

19-22. Randomized algorithms

Part 1
  • Las Vegas vs Monte Carlo

    • A Las Vegas randomized algorithm: variable running time, always right answer
    • A Monte Carlo randomized algorithm: fixed running time, sometimes wrong answer
  • Max-Cut: Given a graph G, split vertices into two sides to maximize the number of edges between the sides.

    • NP-complete
    • a Monte Carlo 2-approximation algorithm: put each node on a random side with probability 1/2
      • In a graph with e edges, the algorithm produces a cut with expected size e/2.
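
A one-function sketch of the random-assignment algorithm; vertices are assumed to be labeled 0..n-1 and `random_cut` is a name of my choosing. Each edge crosses the cut with probability 1/2, so by linearity of expectation the expected cut size is e/2, and since OPT ≤ e this is a 2-approximation in expectation.

```python
import random

def random_cut(n, edges):
    """Monte Carlo Max-Cut: put each vertex on a uniformly random side.
    Each edge {u, v} is cut with probability 1/2, so the expected number
    of cut edges is e/2 >= OPT/2."""
    side = [random.randrange(2) for _ in range(n)]
    cut_size = sum(1 for u, v in edges if side[u] != side[v])
    return side, cut_size
```
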
  • Quicksort

    • Pick a random pivot element s $\Rightarrow$ Partition $\Rightarrow$ Recursion

    • $R(n) = \frac{1}{n}\big(R(1)+R(n-1)+R(2)+R(n-2)+\dots+R(n-1)+R(1)\big) + n \cdot \frac{1}{n} \cdot (n-1)$

      $= \frac{2}{n}\sum_{k=1}^{n-1} R(k) + (n-1)$

    • $R(n) \leq an\log n + b$ for suitable constants $a, b$ (shown by induction)
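
A short sketch of the randomized quicksort above; the list-comprehension partition (not in place) is a simplification of mine, but the random pivot choice and the expected O(n log n) comparison count match the recurrence.

```python
import random

def quicksort(arr):
    """Randomized quicksort: pick a uniformly random pivot, partition,
    and recurse on the two sides.  Expected comparisons: O(n log n)."""
    if len(arr) <= 1:
        return arr
    pivot = random.choice(arr)                 # random pivot
    less    = [x for x in arr if x < pivot]    # partition costs about n comparisons
    equal   = [x for x in arr if x == pivot]
    greater = [x for x in arr if x > pivot]
    return quicksort(less) + equal + quicksort(greater)
```
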

  • Hash tables

    • Direct addressing: wastes space

    • Closed addressing (separate chaining): every entry in the hash table points to a linked list

    • Load factor: $\alpha = (\text{number of items}) / (\text{table size})$

    • Regardless of the hash function, an adversary can choose a set of n inputs to make all operations $O(n)$ time $\Rightarrow$ Universal hashing
      • We say H is a universal hash family if for any keys $x \neq y$, $Pr_{h\in H}[h(x) = h(y)] \leq 1/m$, where m is the table size
      • Constructing a universal hash family: one standard construction is sketched below
    • Open addressing
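
One standard universal family (the Carter–Wegman construction), sketched here to show what such a construction looks like; the function name, the concrete prime, and the table size are illustrative choices, not from the lecture.

```python
import random

def make_universal_hash(p, m):
    """Carter-Wegman universal family: h_{a,b}(x) = ((a*x + b) mod p) mod m,
    where p is a prime larger than every possible key, a is drawn uniformly
    from {1, ..., p-1} and b from {0, ..., p-1}.  For any fixed keys x != y,
    Pr[h(x) == h(y)] <= 1/m over the random choice of (a, b)."""
    a = random.randrange(1, p)
    b = random.randrange(0, p)
    return lambda x: ((a * x + b) % p) % m

# usage (illustrative prime and table size):
h = make_universal_hash(2**31 - 1, 97)
bucket = h(123456)          # bucket index in {0, ..., 96}
```
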
Part 2
  • Bloom filter:
    • Consists of
      • an array A of size m, initially all set to 0
      • k independent hash functions $h_1, …, h_k$, each mapping keys to $\{1, 2, …, m\}$
    • Operations
      • to store key x: set $A[h_1(x)], A[h_2(x)], …, A[h_k(x)]$ all to 1
      • to search key x: read array locations $A[h_1(x)], A[h_2(x)], …, A[h_k(x)]$: if all values are 1, output yes; otherwise no
    • Correctness:

      • No false negatives: the filter returns no only if at least one of $A[h_1(x)], A[h_2(x)], …, A[h_k(x)]$ equals 0, but if x was inserted, all of these were set to 1
      • False positives exist: other inserted keys may together have set all k locations of x to 1
    • Deletes can be supported by storing, at each location, a count of how many keys hashed there, incrementing on insert and decrementing on delete (a counting Bloom filter; this takes extra space and the counters can overflow)
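
A compact sketch of the Bloom filter above; simulating the k hash functions with Python's built-in `hash` on (seed, key) pairs is a simplification of mine and only approximates the independence the analysis assumes.

```python
import random

class BloomFilter:
    """Bloom filter sketch: an m-bit array and k hash functions."""

    def __init__(self, m, k):
        self.m = m
        self.bits = [0] * m
        self.seeds = [random.randrange(2**31) for _ in range(k)]

    def _locations(self, key):
        # k array positions for this key, one per simulated hash function
        return [hash((seed, key)) % self.m for seed in self.seeds]

    def insert(self, key):
        for i in self._locations(key):
            self.bits[i] = 1

    def lookup(self, key):
        # No false negatives; false positives are possible.
        return all(self.bits[i] == 1 for i in self._locations(key))
```
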
  • String fingerprinting:
    • $(a_1, …, a_n)$, $(b_1, …, b_n)$, $a=\sum_{i=1}^{n} a_i 2^{i-1}$, $b=\sum_{i=1}^{n} b_i 2^{i-1}$, $F(a) = a \bmod p$, $F(b) = b \bmod p$ for a prime $p$
    • Correctness:
      • No false negatives: if $F(a)\neq F(b)$, then $a\neq b$
      • False positives exist: if $F(a)=F(b)$, then $a \bmod p = b \bmod p$, so either $a=b$, or $a\neq b$ but $p \mid (a-b)$
  • String matching: Let X and Y be binary strings with length n and m, resp., where $n\geq m$, decide if string X contains Y. I.e. we want to match Y to a part of X.
    • A Monte Carlo algo: compare FINGERPRINT(X(j), Y) for every length-m window X(j), updating the window fingerprint in O(1) per shift $\Rightarrow$ $O(m+n)$ total (see the sketch below)
    • A Las Vegas algo: run the Monte Carlo algorithm and double check each reported match; the Monte Carlo version has false-positive probability 1/n
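
A sketch of the Monte Carlo matcher for binary strings (Karp–Rabin style). The fixed prime 2**61 - 1 and the most-significant-bit-first encoding are illustrative choices of mine; the analysis assumes p is drawn at random from a suitably large range.

```python
def fingerprint_match(X, Y, p=2**61 - 1):
    """Monte Carlo string matching via fingerprints.

    X and Y are strings of '0'/'1' with len(X) = n >= m = len(Y).
    Each length-m window of X is interpreted as a binary number and
    compared to Y modulo the prime p; the rolling update costs O(1) per
    shift, giving O(n + m) arithmetic operations in total.  Equal
    fingerprints may be a false positive; unequal ones are always right."""
    n, m = len(X), len(Y)
    target = 0
    window = 0
    for i in range(m):                      # fingerprints of Y and of X[0:m]
        target = (target * 2 + int(Y[i])) % p
        window = (window * 2 + int(X[i])) % p
    high = pow(2, m - 1, p)                 # weight of the window's leading bit
    matches = []
    for j in range(n - m + 1):
        if window == target:                # possible match (maybe a false positive)
            matches.append(j)
        if j + m < n:                       # roll the window one position to the right
            window = ((window - int(X[j]) * high) * 2 + int(X[j + m])) % p
    return matches
```

The Las Vegas variant would verify each reported index character by character before returning it.
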
Part 3
  • Chernoff Bounds:
    • a standard multiplicative form: for independent $X_1,\dots,X_n$ with $X_i\in\{0,1\}$, $X=\sum_i X_i$ and $\mu=E[X]$: $Pr[X\geq(1+\delta)\mu]\leq e^{-\delta^2\mu/3}$ and $Pr[X\leq(1-\delta)\mu]\leq e^{-\delta^2\mu/2}$ for $0<\delta\leq 1$
Part 4
  • Distributed computing:
    • Emmmmm…

23-24. Approximation algorithms

Part 1
  • Approximation ratio: let X be a maximization (minimization) problem and A an algorithm for X. A is an α-approximation algorithm for X if, on every instance, A returns an answer that is at least $1/\alpha$ times the optimum (at most $\alpha$ times the optimum)

  • Makespan minimization: schedule n jobs on k identical processors so as to minimize the maximum processor load (the makespan); a list-scheduling sketch follows this item
    • SUBSET-SUM $\leq_P$ Makespan minimization
    • Graham’s list scheduling: (1) List the jobs in any order. (2) As long as there are unfinished jobs, if any processor doesn’t have a job now, give it the next job in the list.
      • 2-approximation
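
A sketch of Graham's list scheduling with jobs given as processing times; keeping a heap of machine loads (instead of simulating the processors over time) is an implementation choice of mine that produces the same schedule lengths.

```python
import heapq

def list_scheduling(jobs, k):
    """Graham's list scheduling: walk the job list in the given order and
    always hand the next job to the currently least-loaded of the k
    processors.  This is a 2-approximation for the makespan
    (Graham's bound; more precisely 2 - 1/k)."""
    loads = [(0, i) for i in range(k)]      # (current load, processor id)
    heapq.heapify(loads)
    assignment = [[] for _ in range(k)]
    for job in jobs:
        load, i = heapq.heappop(loads)      # least-loaded processor gets the next job
        assignment[i].append(job)
        heapq.heappush(loads, (load + job, i))
    makespan = max(load for load, _ in loads)
    return assignment, makespan
```
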
  • The knapsack problem: We want to put items in the knapsack to maximize the total value, but not exceed the weight limit.

    • DP

      • $OPT(i, v) = \min\{OPT(i-1,v),\ w_i+OPT(i-1,v-v_i)\}$: the minimum total weight needed to achieve value $v$ using the first $i$ items
      • the answer is $V = \max\{v \leq nv^\star : OPT(n,v) \leq W\}$, where $v^\star$ is the maximum value among all items (a DP sketch follows this item)
      • time complexity: $O(n^2v^\star)$
    • polynomial time approximation scheme (PTAS)

      • Rounding
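
A sketch of the value-indexed DP above; the function name and the dense table over all values up to sum(values) are illustrative choices of mine.

```python
def knapsack_min_weight(values, weights, W):
    """Value-indexed knapsack DP: OPT(i, v) = minimum total weight of a
    subset of the first i items whose total value is exactly v.
    The answer is the largest v with OPT(n, v) <= W.
    Runs over O(n * V) table entries where V = sum(values) <= n * v_max,
    matching the O(n^2 v_max) bound."""
    n = len(values)
    V = sum(values)                          # upper bound on achievable value
    INF = float('inf')
    opt = [0] + [INF] * V                    # OPT(0, v): only v = 0 is achievable
    for i in range(n):
        new = opt[:]
        for v in range(values[i], V + 1):
            # either skip item i, or take it on top of a value-(v - v_i) solution
            new[v] = min(opt[v], weights[i] + opt[v - values[i]])
        opt = new
    return max(v for v in range(V + 1) if opt[v] <= W)
```
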
Part 2
  • TSP:
    • MST+DFS+shortcutting $\Rightarrow$ 2-approximation
    • MST + min-cost perfect matching on the odd-degree vertices + Euler cycle + shortcutting (Christofides) $\Rightarrow$ 1.5-approximation
      • Euler cycle: visits each edge exactly once // Hamiltonian cycle (TSP): visits each node exactly once
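
A sketch of the MST + DFS + shortcutting 2-approximation; `points` is any list of sites and `d` any symmetric distance satisfying the triangle inequality (Prim's O(n²) MST and the names are my choices).

```python
def tsp_two_approx(points, d):
    """2-approximation for metric TSP: build an MST (Prim, O(n^2)),
    DFS from the root, and shortcut the preorder walk into a tour.
    The tour costs at most twice the MST weight <= 2 * OPT, using the
    triangle inequality for the shortcuts."""
    n = len(points)
    in_tree = [False] * n
    parent = [None] * n
    best = [float('inf')] * n
    best[0] = 0
    children = [[] for _ in range(n)]
    for _ in range(n):                       # Prim's algorithm on the complete graph
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] is not None:
            children[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v] and d(points[u], points[v]) < best[v]:
                best[v] = d(points[u], points[v])
                parent[v] = u
    tour, stack = [], [0]
    while stack:                             # DFS preorder = shortcut Euler walk
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    return tour + [0]                        # return to the starting city
```
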
  • K-center: We want to minimize the max distance that any site is from its closest center.
    • Gonzalez’s algorithm: initially C is empty; repeatedly add to C the site farthest from the centers chosen so far, until $|C| = k$. This greedy rule gives a 2-approximation (sketch below).
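
A sketch of Gonzalez's greedy rule as completed above; starting from points[0] as the arbitrary first center is my choice.

```python
def gonzalez_k_center(points, k, d):
    """Gonzalez's greedy algorithm for k-center (a 2-approximation):
    start from an arbitrary point, then repeatedly add the point that is
    farthest from the centers chosen so far."""
    centers = [points[0]]                       # arbitrary first center
    dist = [d(p, points[0]) for p in points]    # distance to the nearest chosen center
    while len(centers) < k:
        i = max(range(len(points)), key=lambda j: dist[j])
        centers.append(points[i])
        dist = [min(dist[j], d(points[j], points[i])) for j in range(len(points))]
    return centers, max(dist)                   # max(dist) = radius achieved
```
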

more info: zhufm@shanghaitech.edu.cn