1. Basic idea
Build a sequence of weak learners, where each new weak learner corrects the mistakes of the previous ones
AdaBoost is specific to binary classification and uses {-1, +1} for the labels, NOT {0, 1}
This lets us use 0 as the decision boundary: the sign of the combined score gives the predicted class
Target: minimize the weighted error sum:
$\epsilon_m = \sum_{n=1}^{N} w_n^{(m)} I(f_m(x_n) \neq y_n)$
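As a quick sanity check of this formula, here is a minimal NumPy sketch; the toy predictions, labels, and variable names are made up for illustration:

```python
import numpy as np

# Toy example: predictions of one weak learner vs. the true labels.
y_pred = np.array([ 1, -1,  1,  1, -1])   # f_m(x_n)
y_true = np.array([ 1,  1,  1, -1, -1])   # y_n
w      = np.full(5, 1 / 5)                # uniform weights w_n^{(m)}

# Weighted error: the total weight of the misclassified points.
eps_m = np.sum(w * (y_pred != y_true))
print(eps_m)  # 0.4 -- two of the five points are wrong
```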
2. Steps:
Given a training set $T = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$, where the labels $y_i \in \{-1, +1\}$
Initialize the weight $w_i$ of each sample $x_i$ as $w_i = 1/N$, where $N$ is the number of samples
For m = 1 to M, where $M$ is the number of weak learners:
Fit the m-th weak learner $f_m(x)$ to minimize the weighted error sum $\epsilon_m$.
To build this weak learner, we can iterate over all features and all candidate values of each feature, and pick the split point that minimizes $\epsilon_m$ (see the sketch after these steps)
Then calculate $\epsilon_m = \sum_{n=1}^{N} w_n^{(m)} I(f_m(x_n) \neq y_n)$
Calculate the weight $\alpha_m$ of $f_m(x)$; the smaller the error, the larger $\alpha_m$:
$\alpha_m = \frac{1}{2} \ln\left(\frac{1 - \epsilon_m}{\epsilon_m}\right)$
If $\epsilon_m \le 1/2$, then $\alpha_m \ge 0$; again, the smaller the error, the larger $\alpha_m$
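For example, $\epsilon_m = 0.3$ gives $\alpha_m = \frac{1}{2}\ln(0.7/0.3) \approx 0.42$, $\epsilon_m = 0.1$ gives $\alpha_m \approx 1.10$, and $\epsilon_m = 0.5$ (no better than random guessing) gives $\alpha_m = 0$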
Update the weight of each point:
$w_i \leftarrow w_i \exp\left[-\alpha_m y_i f_m(x_i)\right], \quad i = 1, \dots, N$
Normalize the weights: $w_i \leftarrow \frac{w_i}{\sum_{j=1}^{N} w_j}$
If $f_m(x_i) = y_i$ (correct prediction), then $\exp[-\alpha_m y_i f_m(x_i)] < 1$ (assuming $\alpha_m > 0$), which reduces the weight
Otherwise, the weight increases, so the next weak learner focuses on the misclassified points
Sum up the weak learners to get the final classifier:
$G(x) = \operatorname{sign}\left(\sum_{m=1}^{M} \alpha_m f_m(x)\right)$
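Putting the steps above together, here is a minimal, self-contained Python sketch of the training loop, using single-feature threshold stumps as the weak learners. The function names (`fit_stump`, `adaboost_fit`, ...) and the toy data are my own, so treat this as an illustration of the loop rather than a reference implementation:

```python
import numpy as np

def fit_stump(X, y, w):
    """Exhaustive search over features, thresholds, and polarities for
    the decision stump that minimizes the weighted error epsilon_m."""
    n_samples, n_features = X.shape
    best_err, best_stump = np.inf, None
    for j in range(n_features):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = pol * np.where(X[:, j] <= thr, 1, -1)
                err = np.sum(w * (pred != y))
                if err < best_err:
                    best_err, best_stump = err, (j, thr, pol)
    return best_err, best_stump

def stump_predict(stump, X):
    j, thr, pol = stump
    return pol * np.where(X[:, j] <= thr, 1, -1)

def adaboost_fit(X, y, M=10):
    n = len(y)
    w = np.full(n, 1.0 / n)                    # initialize: w_i = 1/N
    ensemble = []
    for m in range(M):
        err, stump = fit_stump(X, y, w)        # fit f_m to minimize eps_m
        err = np.clip(err, 1e-10, 1 - 1e-10)   # guard against log(0)
        alpha = 0.5 * np.log((1 - err) / err)  # alpha_m
        pred = stump_predict(stump, X)
        w = w * np.exp(-alpha * y * pred)      # re-weight each point
        w = w / w.sum()                        # normalize
        ensemble.append((alpha, stump))
    return ensemble

def adaboost_predict(ensemble, X):
    """G(x) = sign(sum_m alpha_m f_m(x))."""
    score = sum(alpha * stump_predict(stump, X) for alpha, stump in ensemble)
    return np.sign(score)

# Tiny usage example on made-up data.
X = np.array([[0.1], [0.2], [0.35], [0.4], [0.8], [0.9]])
y = np.array([1, 1, 1, -1, -1, -1])
model = adaboost_fit(X, y, M=5)
print(adaboost_predict(model, X))              # recovers the labels
```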
3. Additive Model
$f(x) = \sum_{m=1}^{M} \beta_m b(x; \gamma_m)$, where $b(x; \gamma_m)$ is the base function, $\gamma_m$ is the set of parameters of $b(x; \gamma_m)$, and $\beta_m$ is the weight of $b(x; \gamma_m)$.
Target: minimize a loss function $L(y, f(x))$: $\min_{\beta_m, \gamma_m} \sum_{i=1}^{N} L\left(y_i, \sum_{m=1}^{M} \beta_m b(x_i; \gamma_m)\right)$
Forward Stagewise Algorithm: to simplify this procedure, we can learn one base function at each step: $\min_{\beta, \gamma} \sum_{i=1}^{N} L(y_i, \beta b(x_i; \gamma))$
Steps for training:
Input: training set $T = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$, loss function $L(y, f(x))$, base function set $\{b(x; \gamma)\}$
Output: additive model $f(x)$
Initialize $f_0(x) = 0$
for m = 1 to M:
Minimize the loss function to find $\beta_m$ and $\gamma_m$:
$(\beta_m, \gamma_m) = \arg\min_{\beta, \gamma} \sum_{i=1}^{N} L\left(y_i, f_{m-1}(x_i) + \beta b(x_i; \gamma)\right)$
Update $f_m(x)$:
$f_m(x) = f_{m-1}(x) + \beta_m b(x; \gamma_m)$
Sum up to get the final model:
$f(x) = f_M(x) = \sum_{m=1}^{M} \beta_m b(x; \gamma_m)$
Note that $f_{m-1}(x_i)$ is a constant at step $m$ in $\arg\min_{\beta, \gamma} \sum_{i=1}^{N} L(y_i, f_{m-1}(x_i) + \beta b(x_i; \gamma))$, so we only need to optimize the new term $\beta b(x_i; \gamma)$, i.e., find the values of $\beta_m$ and $\gamma_m$ that minimize the loss.
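As a concrete instance of this loop, here is a short Python sketch where, purely for illustration, the base functions are $\pm 1$ threshold stumps and the loss is squared error (both choices are my assumptions; the algorithm itself is loss-agnostic). With squared loss and residuals $r_i = y_i - f_{m-1}(x_i)$, the optimal weight for a fixed stump has the closed form $\beta = \frac{1}{N}\sum_i r_i b_i$, since $b_i^2 = 1$:

```python
import numpy as np

def best_stump(X, r):
    """Search the +/-1 threshold stumps b(x; gamma) for the one that,
    with its optimal beta, best fits the residuals r under squared
    loss.  Minimizing sum (r_i - beta*b_i)^2 over beta leaves a
    residual loss of sum r_i^2 - (sum r_i*b_i)^2 / N, so we just
    maximize |sum(r * b)|."""
    n_samples, n_features = X.shape
    best_score, best = -1.0, None
    for j in range(n_features):
        for thr in np.unique(X[:, j]):
            b = np.where(X[:, j] <= thr, 1.0, -1.0)
            score = abs(np.sum(r * b))
            if score > best_score:
                best_score, best = score, (j, thr)
    j, thr = best
    b = np.where(X[:, j] <= thr, 1.0, -1.0)
    beta = np.mean(r * b)                 # closed-form beta under squared loss
    return beta, j, thr

def forward_stagewise(X, y, M=20):
    """f_0 = 0; at each step add the beta_m * b(x; gamma_m) that most
    reduces the loss while holding f_{m-1} fixed."""
    f = np.zeros(len(y))                  # f_{m-1}(x_i), starts at f_0 = 0
    model = []
    for m in range(M):
        r = y - f                         # residuals: what is left to fit
        beta, j, thr = best_stump(X, r)
        f += beta * np.where(X[:, j] <= thr, 1.0, -1.0)
        model.append((beta, j, thr))
    return model, f

# Usage on made-up 1-D regression data.
X = np.linspace(0, 1, 20).reshape(-1, 1)
y = np.sin(4 * X[:, 0])
model, f = forward_stagewise(X, y, M=30)
print(np.mean((y - f) ** 2))              # training MSE shrinks as M grows
```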
4. Use the forward stagewise algorithm for AdaBoost