AdaBoost

1. Basic idea

Build a sequence of weak learners, where each new weak learner focuses on correcting the mistakes made by the previous ones.

  • AdaBoost uses {-1, +1} for labels, NOT {0, 1}

    • So we can use 0 as the decision boundary: classify by the sign of the output

  • The standard algorithm is specific to binary classification; the required labels are {-1, +1}

  • Target: minimize the weighted error sum:

    • $\epsilon_m = \sum_{n=1}^N w_n^{(m)} I\big(f_m(x_n) \neq y_n\big)$, where $f_m$ is the $m$-th weak learner, $y_n$ is the true label of sample $x_n$, and $w_n^{(m)}$ is its weight at round $m$
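As a quick illustration, a minimal NumPy snippet for this weighted error (the function name `weighted_error` is illustrative, not from any library):

```python
import numpy as np

def weighted_error(w, y_pred, y_true):
    """Weighted misclassification error: sum of the weights of the misclassified samples.
    Assumes the weights w sum to 1."""
    return np.sum(w * (y_pred != y_true))
```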

2. Steps:

  • Given a training set $T = \{(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)\}$, where each label $y_i \in \{-1, +1\}$

  • Initialize the weight $w_i$ of each sample $x_i$ to $w_i = 1/N$, where $N$ is the number of samples

  • For $m = 1$ to $M$, where $M$ is the number of weak learners:

    • Fit the $m$-th weak learner $f_m(\cdot)$ to minimize the weighted error sum $\epsilon_m$.

      • To build this weak learner (e.g., a decision stump), we can iterate over all features and all candidate split values of each feature, and pick the split that minimizes $\epsilon_m$ (see the Python sketch after these steps)

      • Then calculate $\epsilon_m = \sum_{n=1}^N w_n^{(m)} I\big(f_m(x_n) \neq y_n\big)$

    • Calculate the weight $\alpha_m$ of $f_m(\cdot)$: the smaller the error, the larger $\alpha_m$

      • $\alpha_m = \frac{1}{2}\ln\left(\frac{1-\epsilon_m}{\epsilon_m}\right)$

      • If $\epsilon_m \leq 1/2$, then $\alpha_m \geq 0$, and the smaller the error, the larger $\alpha_m$

    • Update the weight of each point:

      • $w_i = w_i \exp\big[-\alpha_m y_i f_m(x_i)\big], \quad i = 1, ..., N$

      • Normalize the weights: $w_i = \frac{w_i}{\sum_{j=1}^N w_j}$

        • If $f_m(x_i)$ and $y_i$ are equal (the point is classified correctly), then $\exp[-\alpha_m y_i f_m(x_i)] < 1$ (for $\alpha_m > 0$), so its weight is reduced

        • Otherwise (the point is misclassified), its weight is increased

  • Sum up the weak learners to get the final classifier:

    $G(x) = \mathrm{sign}\left(\sum_{m=1}^M \alpha_m f_m(x)\right)$
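Putting the steps above together, here is a minimal NumPy sketch of AdaBoost, assuming decision stumps as the weak learners; the helper names (`fit_stump`, `adaboost_train`, `adaboost_predict`) are illustrative, not from any library:

```python
import numpy as np

def fit_stump(X, y, w):
    """Search all features and all split values for the decision stump
    (feature, threshold, polarity) with the smallest weighted error."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for polarity in (1, -1):
                pred = np.where(X[:, j] <= thr, polarity, -polarity)
                err = np.sum(w * (pred != y))
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best  # (epsilon_m, feature index, threshold, polarity)

def stump_predict(X, j, thr, polarity):
    """Predict {-1, +1} with a single decision stump."""
    return np.where(X[:, j] <= thr, polarity, -polarity)

def adaboost_train(X, y, M=50):
    """AdaBoost training loop; y must use labels {-1, +1}."""
    N = len(y)
    w = np.full(N, 1.0 / N)                    # w_i = 1/N
    stumps, alphas = [], []
    for m in range(M):
        err, j, thr, polarity = fit_stump(X, y, w)
        err = np.clip(err, 1e-12, 1 - 1e-12)   # guard against log(0) / division by zero
        alpha = 0.5 * np.log((1 - err) / err)  # alpha_m = 1/2 ln((1 - eps_m) / eps_m)
        pred = stump_predict(X, j, thr, polarity)
        w = w * np.exp(-alpha * y * pred)      # shrink weights of correct points, grow the wrong ones
        w = w / np.sum(w)                      # normalize
        stumps.append((j, thr, polarity))
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(X, stumps, alphas):
    """G(x) = sign(sum_m alpha_m f_m(x))."""
    score = np.zeros(X.shape[0])
    for (j, thr, polarity), alpha in zip(stumps, alphas):
        score += alpha * stump_predict(X, j, thr, polarity)
    return np.sign(score)
```

Because the weights are kept normalized to sum to 1, the weighted error returned by `fit_stump` is exactly the $\epsilon_m$ defined above.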

3. Additive Model

$f(x) = \sum_{m=1}^M \beta_m b(x; \gamma_m)$

where $b(x; \gamma_m)$ is the base function, $\gamma_m$ is the set of parameters of $b(x; \gamma_m)$, and $\beta_m$ is the weight of $b(x; \gamma_m)$.

Target: minimize a loss function $L(y, f(x))$: $\min_{\beta_m, \gamma_m} \sum_{i=1}^N L\Big(y_i, \sum_{m=1}^M \beta_m b(x_i; \gamma_m)\Big)$

Forward Stagewise Algorithm: to simplify this optimization, we learn one base function at a time, solving $\min_{\beta, \gamma} \sum_{i=1}^N L\big(y_i, \beta b(x_i; \gamma)\big)$ for a single base function at each step, relative to the model built so far (as detailed below).

Steps for training:

  • Input: $T = \{(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)\}$, loss function $L(y, f(x))$, and a set of base functions $\{b(x; \gamma)\}$

  • Output: the additive model $f(x)$

  • Initialize $f_0(x) = 0$

  • For $m = 1$ to $M$:

    • Minimize the loss function to find $\beta_m$ and $\gamma_m$:

    $(\beta_m, \gamma_m) = \arg\min_{\beta, \gamma} \sum_{i=1}^N L\big(y_i, f_{m-1}(x_i) + \beta\, b(x_i; \gamma)\big)$

    • Update $f_m(x)$:

    $f_m(x) = f_{m-1}(x) + \beta_m b(x; \gamma_m)$

  • The final model after $M$ steps is the sum of all fitted base functions:

    $f(x) = f_M(x) = \sum_{m=1}^M \beta_m b(x; \gamma_m)$

Note that $f_{m-1}(x_i)$ is a fixed constant inside $\arg\min_{\beta, \gamma} \sum_{i=1}^N L\big(y_i, f_{m-1}(x_i) + \beta\, b(x_i; \gamma)\big)$, because it was already fitted in the previous steps. At step $m$ we therefore only need to optimize the new term $\beta\, b(x_i; \gamma)$, i.e., find the $\beta_m$ and $\gamma_m$ that minimize the loss given the current model.
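As a concrete illustration of this loop, here is a minimal sketch assuming squared loss and regression stumps as the base functions $b(x; \gamma)$; with squared loss, minimizing $L\big(y_i, f_{m-1}(x_i) + \beta\, b(x_i; \gamma)\big)$ amounts to fitting each new stump to the current residuals $y_i - f_{m-1}(x_i)$. The helper names are illustrative only:

```python
import numpy as np

def fit_regression_stump(X, r):
    """Find the stump (feature, threshold, left value, right value) with the
    smallest squared error against the current residuals r."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j])[:-1]:       # drop the largest value so both sides are non-empty
            left = X[:, j] <= thr
            c_left, c_right = r[left].mean(), r[~left].mean()
            pred = np.where(left, c_left, c_right)
            sse = np.sum((r - pred) ** 2)
            if best is None or sse < best[0]:
                best = (sse, j, thr, c_left, c_right)
    return best[1:]

def forward_stagewise(X, y, M=20):
    """Forward stagewise additive modeling: add one base function per step,
    keeping all previously fitted base functions fixed."""
    f = np.zeros(len(y))                          # f_0(x) = 0
    model = []
    for m in range(M):
        j, thr, c_left, c_right = fit_regression_stump(X, y - f)   # fit to residuals
        pred = np.where(X[:, j] <= thr, c_left, c_right)
        f = f + pred                              # f_m(x) = f_{m-1}(x) + beta_m b(x; gamma_m)
        model.append((j, thr, c_left, c_right))
    return model
```

In this sketch the split (feature, threshold) plays the role of $\gamma_m$, and the fitted leaf means absorb the coefficient $\beta_m$.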

4. Using the forward stagewise algorithm for AdaBoost
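A brief sketch of the standard textbook derivation: take the base functions to be the weak classifiers $f_m(x) \in \{-1, +1\}$ and use the exponential loss $L(y, f(x)) = \exp(-y f(x))$. The stagewise objective then factors into per-sample weights:

$$
(\beta_m, \gamma_m) = \arg\min_{\beta, \gamma} \sum_{i=1}^N \exp\big[-y_i \big(f_{m-1}(x_i) + \beta\, b(x_i; \gamma)\big)\big]
= \arg\min_{\beta, \gamma} \sum_{i=1}^N \bar{w}_i^{(m)} \exp\big[-\beta\, y_i\, b(x_i; \gamma)\big],
\quad \bar{w}_i^{(m)} = \exp\big[-y_i f_{m-1}(x_i)\big]
$$

Setting the derivative with respect to $\beta$ to zero gives $\beta_m = \frac{1}{2}\ln\left(\frac{1-\epsilon_m}{\epsilon_m}\right) = \alpha_m$, where $\epsilon_m$ is the normalized weighted error of the chosen classifier, and the implied update $\bar{w}_i^{(m+1)} = \bar{w}_i^{(m)} \exp\big[-\alpha_m y_i f_m(x_i)\big]$ matches the sample-weight update in Section 2 up to normalization. So AdaBoost is a forward stagewise additive model under the exponential loss.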
