Data Science 常用算法

LaTex可以正常显示啦!😬

Posted by Fernando on February 15, 2019

Algorithms in Data Science

Support Vector Machine (SVM)

  • Hard margin SVM
  • Soft margin SVM
  • Non-linear SVM

Decision Tree

  • Feature selection
  • Tree spanning
    • Information gain
    • Gini index
  • Pruning

Term Frequency - Inverse Document Frequency (TF-IDF)

  • Preprocessing (vectorise)
  • Term frequency
    • $TF(x) = \frac{N(x)}{N}$
  • Inverse document frequency
    • $IDF(x) = \log \frac{N}{N(x)}$
    • Smoothing: $\log \frac{N+1}{N(x)+1}+1$
  • $TF-IDF(x) = TF(x) * IDF(x)$

Latent Semantic Analysis (LSA)

  • $\arg min_{C_k} X_F = \sqrt{\sum_{i=1}^M \sum_{j=1}^N|X_{ij}|^2},\ where\ X=C-C_k $
  • $C=U \Sigma V^T$, $C_k = U\Sigma_kV^T$

Adaptive Boosting (Adaboost) [not clear]

  • Steps
    1. Intilization
    2. Iteration
    3. Combine classifications
  • Pro and cons
    • High accuracy
    • Simple, no feature selection
    • no overfitting