Data Science

Common Algorithms in Data Science

LaTeX now renders properly! 😬

Posted by Fernando on February 15, 2019

Algorithms in Data Science

Support Vector Machine (SVM)

Hard margin SVM
Soft margin SVM
Non-linear SVM
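The soft-margin idea can be sketched as a linear SVM trained by full-batch subgradient descent on the regularised hinge loss $\frac{\lambda}{2}\|w\|^2 + \frac{1}{n}\sum_i \max(0, 1 - y_i(w \cdot x_i + b))$. This is a minimal illustration under stated assumptions: the function names, the learning rate and the value of $\lambda$ are hypothetical choices. Library implementations typically use dedicated solvers instead (for example SMO in LIBSVM). A hard-margin SVM corresponds to tolerating no margin violations, and a non-linear SVM would replace the dot product with a kernel.

```python
def train_soft_margin_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Linear soft-margin SVM via full-batch subgradient descent on the hinge loss.

    X: list of feature vectors; y: labels in {-1, +1}.
    lam: L2 regularisation strength (a smaller lam roughly corresponds to a larger C).
    lr, epochs: hypothetical optimisation settings chosen for this toy example.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        # Gradient of the regulariser (lam/2)||w||^2 is lam * w; b is not regularised
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # inside the margin or misclassified: the hinge term is active
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Classify x by the sign of the decision function w . x + b."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical, linearly separable toy data
X = [[2.0, 2.0], [3.0, 3.0], [2.5, 3.5], [-2.0, -1.0], [-3.0, -2.5], [-1.5, -3.0]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_soft_margin_svm(X, y)
print([predict(w, b, x) for x in X])
```

Points with margin at least 1 contribute nothing to the gradient, which is why only the points near the boundary (the support vectors) end up shaping $w$ and $b$.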

Decision Tree

Feature selection
Tree growing (split criteria)
- Information gain
- Gini index
Pruning
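To make the two split criteria concrete, here is a minimal sketch that scores categorical features by information gain (entropy reduction) and by size-weighted Gini impurity. The helper names and the weather-style toy data are hypothetical, and pruning is not shown.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity: 1 - sum over classes of p_k^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_by_feature(rows, labels, feature):
    """Group the labels by the value each row takes on a categorical feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    return list(groups.values())


def weighted_child_score(rows, labels, feature, impurity):
    """Impurity of the children, weighted by the fraction of samples in each child."""
    n = len(labels)
    return sum(len(g) / n * impurity(g) for g in split_by_feature(rows, labels, feature))


def information_gain(rows, labels, feature):
    """Entropy of the parent minus the weighted entropy of the children (higher is better)."""
    return entropy(labels) - weighted_child_score(rows, labels, feature, entropy)


# Hypothetical toy data: each row is (outlook, windy); the label is whether to play
rows = [("sunny", "no"), ("sunny", "yes"), ("rain", "no"), ("rain", "yes")]
labels = ["yes", "no", "yes", "no"]
for feature in (0, 1):
    ig = information_gain(rows, labels, feature)
    g = weighted_child_score(rows, labels, feature, gini)
    print(f"feature {feature}: information gain = {ig:.3f}, weighted Gini = {g:.3f}")
```

Here both criteria prefer feature 1 (windy), which splits the labels into two pure groups: its information gain is 1 bit and its weighted Gini impurity is 0, while feature 0 gains nothing.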

Term Frequency - Inverse Document Frequency (TF-IDF)

Preprocessing (vectorise)
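Term frequency
- $TF(x) = \frac{N(x)}{N}$, where $N(x)$ is the number of times term $x$ occurs in the document and $N$ is the total number of terms in that document
Inverse document frequency
- $IDF(x) = \log \frac{N}{N(x)}$, where here $N$ is the number of documents in the corpus and $N(x)$ is the number of documents that contain $x$
- Smoothing: $IDF(x) = \log \frac{N+1}{N(x)+1}+1$
$\text{TF-IDF}(x) = TF(x) \times IDF(x)$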
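The formulas above can be sketched directly in plain Python. This assumes whitespace tokenisation and the natural logarithm; the function names and the three-sentence corpus are hypothetical. Library implementations such as scikit-learn's `TfidfVectorizer` use different defaults (for example raw counts and L2 normalisation), so their numbers will differ.

```python
import math
from collections import Counter


def tf(doc_tokens):
    """TF(x) = N(x) / N: occurrences of x divided by the number of tokens in the document."""
    counts = Counter(doc_tokens)
    total = len(doc_tokens)
    return {term: c / total for term, c in counts.items()}


def idf(corpus, smooth=True):
    """IDF over a corpus, where N = number of documents and N(x) = documents containing x.

    smooth=False: log(N / N(x)); smooth=True: log((N + 1) / (N(x) + 1)) + 1.
    """
    n_docs = len(corpus)
    df = Counter(term for doc in corpus for term in set(doc))  # document frequency N(x)
    if smooth:
        return {t: math.log((n_docs + 1) / (d + 1)) + 1 for t, d in df.items()}
    return {t: math.log(n_docs / d) for t, d in df.items()}


def tf_idf(corpus, smooth=True):
    """Return one {term: TF(x) * IDF(x)} dict per document."""
    idf_values = idf(corpus, smooth)
    return [{t: v * idf_values[t] for t, v in tf(doc).items()} for doc in corpus]


corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs".split(),
]
weights = tf_idf(corpus, smooth=False)
print(weights[0])
```

In the first document "the" occurs twice but also appears in two of the three documents, so it ends up with a lower weight than "cat", which occurs once but only in this document.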

Latent Semantic Analysis (LSA)

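$\arg\min_{C_k:\ \operatorname{rank}(C_k) \le k} \|X\|_F$, where $\|X\|_F = \sqrt{\sum_{i=1}^M \sum_{j=1}^N |X_{ij}|^2}$ and $X = C - C_k$
$C = U \Sigma V^T$, $C_k = U \Sigma_k V^T$, where $\Sigma_k$ keeps the $k$ largest singular values of $\Sigma$ and sets the rest to zero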
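A small numerical check of the rank-$k$ approximation, assuming NumPy is available. `np.linalg.svd` returns the singular values in descending order, so zeroing all but the first $k$ gives $\Sigma_k$. The term-document matrix is made up for illustration. By the Eckart-Young theorem, the Frobenius error of $C_k$ should equal $\sqrt{\sum_{i>k}\sigma_i^2}$.

```python
import numpy as np


def rank_k_approximation(C, k):
    """Return C_k = U Sigma_k V^T (best rank-k approximation in Frobenius norm), s and V^T."""
    U, s, Vt = np.linalg.svd(C, full_matrices=False)  # s is sorted in descending order
    s_k = s.copy()
    s_k[k:] = 0.0  # Sigma_k: keep the k largest singular values, zero the rest
    return U @ np.diag(s_k) @ Vt, s, Vt


# Hypothetical term-document count matrix: rows = terms (M = 5), columns = documents (N = 4)
C = np.array([
    [2, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 2, 0, 1],
    [0, 0, 3, 1],
    [1, 0, 0, 2],
], dtype=float)

k = 2
C_k, s, Vt = rank_k_approximation(C, k)
error = np.linalg.norm(C - C_k, "fro")
optimal = np.sqrt(np.sum(s[k:] ** 2))  # root of the sum of the discarded squared singular values
# Each document (column of C) as a k-dimensional latent "concept" vector
doc_vectors = (np.diag(s[:k]) @ Vt[:k]).T
print(error, optimal)
print(doc_vectors.shape)
```

The two printed errors should agree up to floating-point rounding, and the rows of `doc_vectors` are what LSA would compare (for example with cosine similarity) instead of the raw sparse columns of $C$.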

Adaptive Boosting (Adaboost) [not clear]

Steps
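1. Initialization: give every training sample $(x_i, y_i)$, with $y_i \in \{-1, +1\}$, the same weight $w_i = 1/N$
2. Iteration: for $t = 1, \dots, T$, fit a weak classifier $h_t$ on the weighted data, compute its weighted error $\epsilon_t$ and its vote $\alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}$, then update $w_i \leftarrow w_i e^{-\alpha_t y_i h_t(x_i)}$ and renormalise, so misclassified samples gain weight
3. Combine classifications: $H(x) = \operatorname{sign}\left(\sum_{t=1}^T \alpha_t h_t(x)\right)$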
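The three steps can be sketched with decision stumps as the weak learners, assuming labels in {-1, +1}. The function names, the exhaustive stump search and the `rounds` parameter are illustrative choices, not a reference implementation.

```python
import math


def fit_stump(X, y, w):
    """Exhaustively pick the (feature, threshold, polarity) stump with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for polarity in (1, -1):
                preds = [polarity if x[j] >= thr else -polarity for x in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best


def stump_predict(stump, x):
    _, j, thr, polarity = stump
    return polarity if x[j] >= thr else -polarity


def adaboost(X, y, rounds=10):
    n = len(X)
    w = [1.0 / n] * n                       # 1. Initialization: uniform sample weights
    ensemble = []
    for _ in range(rounds):                 # 2. Iteration
        stump = fit_stump(X, y, w)
        err = max(stump[0], 1e-10)          # avoid division by zero for a perfect stump
        if err >= 0.5:                      # no better than chance: stop early
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Misclassified samples (y * h(x) = -1) are scaled up, correct ones down
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi)) for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]        # renormalise to a distribution
    return ensemble


def adaboost_predict(ensemble, x):
    """3. Combine: sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical 1-D data that no single stump can separate (negative class in the middle)
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]
model = adaboost(X, y, rounds=3)
print([adaboost_predict(model, x) for x in X])
```

No single threshold on this data gets all six points right, but the weighted vote of three stumps does, which illustrates how boosting builds an accurate classifier out of weak ones.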
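Pros and cons
- Pro: high accuracy, even when the weak learners are as simple as decision stumps
- Pro: simple to implement, with no separate feature selection step needed
- Pro: often resistant to overfitting in practice, although not immune to it
- Con: sensitive to noisy data and outliers, because misclassified samples keep gaining weight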