1. Basic idea
Build a sequence of weak learners, where each new weak learner corrects the mistakes of the previous ones
AdaBoost is specific to binary classification and uses {-1, +1} for the labels, NOT {0, 1}
This lets us use 0 as the decision boundary: the sign of the combined score gives the predicted class
Target: minimize the weighted error sum:
$\epsilon_m = \sum_{n=1}^{N} w_n^{(m)} I(f_m(x_n) \neq y_n)$
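As a quick sanity check of this formula, here is a minimal NumPy sketch; the toy predictions, labels, and variable names are made up for illustration:

```python
import numpy as np

# Toy example: predictions of one weak learner vs. the true labels.
y_pred = np.array([ 1, -1,  1,  1, -1])   # f_m(x_n)
y_true = np.array([ 1,  1,  1, -1, -1])   # y_n
w      = np.full(5, 1 / 5)                # uniform weights w_n^{(m)}

# Weighted error: the total weight of the misclassified points.
eps_m = np.sum(w * (y_pred != y_true))
print(eps_m)  # 0.4 -- two of the five points are wrong
```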
2. Steps:
Given a training set $T = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$, where the labels $y_i \in \{-1, +1\}$
Initialize the weight $w_i$ of each sample $x_i$ as $w_i = 1/N$, where $N$ is the number of samples
For m = 1 to M, where $M$ is the number of weak learners:
Fit the m-th weak learner $f_m(x)$ to minimize the weighted error sum $\epsilon_m$.
To build this weak learner, we can iterate over all features and all candidate values of each feature, and pick the split point that minimizes $\epsilon_m$ (see the sketch after these steps)
Then calculate $\epsilon_m = \sum_{n=1}^{N} w_n^{(m)} I(f_m(x_n) \neq y_n)$
Calculate the weight $\alpha_m$ of $f_m(x)$; the smaller the error, the larger $\alpha_m$:
$\alpha_m = \frac{1}{2} \ln\left(\frac{1 - \epsilon_m}{\epsilon_m}\right)$
If $\epsilon_m \le 1/2$, then $\alpha_m \ge 0$; again, the smaller the error, the larger $\alpha_m$
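For example, $\epsilon_m = 0.3$ gives $\alpha_m = \frac{1}{2}\ln(0.7/0.3) \approx 0.42$, $\epsilon_m = 0.1$ gives $\alpha_m \approx 1.10$, and $\epsilon_m = 0.5$ (no better than random guessing) gives $\alpha_m = 0$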
Update the weight of each point:
$w_i \leftarrow w_i \exp\left[-\alpha_m y_i f_m(x_i)\right], \quad i = 1, \dots, N$
Normalize the weights: $w_i \leftarrow \frac{w_i}{\sum_{j=1}^{N} w_j}$
If $f_m(x_i) = y_i$ (correct prediction), then $\exp[-\alpha_m y_i f_m(x_i)] < 1$ (assuming $\alpha_m > 0$), which reduces the weight
Otherwise, the weight increases, so the next weak learner focuses on the misclassified points
Sum up the weak learners to get the final classifier:
$G(x) = \operatorname{sign}\left(\sum_{m=1}^{M} \alpha_m f_m(x)\right)$
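Putting the steps above together, here is a minimal, self-contained Python sketch of the training loop, using single-feature threshold stumps as the weak learners. The function names (`fit_stump`, `adaboost_fit`, ...) and the toy data are my own, so treat this as an illustration of the loop rather than a reference implementation:

```python
import numpy as np

def fit_stump(X, y, w):
    """Exhaustive search over features, thresholds, and polarities for
    the decision stump that minimizes the weighted error epsilon_m."""
    n_samples, n_features = X.shape
    best_err, best_stump = np.inf, None
    for j in range(n_features):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = pol * np.where(X[:, j] <= thr, 1, -1)
                err = np.sum(w * (pred != y))
                if err < best_err:
                    best_err, best_stump = err, (j, thr, pol)
    return best_err, best_stump

def stump_predict(stump, X):
    j, thr, pol = stump
    return pol * np.where(X[:, j] <= thr, 1, -1)

def adaboost_fit(X, y, M=10):
    n = len(y)
    w = np.full(n, 1.0 / n)                    # initialize: w_i = 1/N
    ensemble = []
    for m in range(M):
        err, stump = fit_stump(X, y, w)        # fit f_m to minimize eps_m
        err = np.clip(err, 1e-10, 1 - 1e-10)   # guard against log(0)
        alpha = 0.5 * np.log((1 - err) / err)  # alpha_m
        pred = stump_predict(stump, X)
        w = w * np.exp(-alpha * y * pred)      # re-weight each point
        w = w / w.sum()                        # normalize
        ensemble.append((alpha, stump))
    return ensemble

def adaboost_predict(ensemble, X):
    """G(x) = sign(sum_m alpha_m f_m(x))."""
    score = sum(alpha * stump_predict(stump, X) for alpha, stump in ensemble)
    return np.sign(score)

# Tiny usage example on made-up data.
X = np.array([[0.1], [0.2], [0.35], [0.4], [0.8], [0.9]])
y = np.array([1, 1, 1, -1, -1, -1])
model = adaboost_fit(X, y, M=5)
print(adaboost_predict(model, X))              # recovers the labels
```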
3. Additive Model
$f(x) = \sum_{m=1}^{M} \beta_m b(x; \gamma_m)$, where $b(x; \gamma_m)$ is the base function, $\gamma_m$ is the set of parameters of $b(x; \gamma_m)$, and $\beta_m$ is the weight of $b(x; \gamma_m)$.
Target: minimize a loss function $L(y, f(x))$: $\min_{\beta_m, \gamma_m} \sum_{i=1}^{N} L\left(y_i, \sum_{m=1}^{M} \beta_m b(x_i; \gamma_m)\right)$
Forward Stagewise Algorithm: to simplify this procedure, we can learn one base function at each step: $\min_{\beta, \gamma} \sum_{i=1}^{N} L(y_i, \beta b(x_i; \gamma))$
Steps for training:
Input: training set $T = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$, loss function $L(y, f(x))$, base function set $\{b(x; \gamma)\}$
Output: additive model $f(x)$
Initialize $f_0(x) = 0$
for m = 1 to M:
Minimize the loss function to find $\beta_m$ and $\gamma_m$:
$(\beta_m, \gamma_m) = \arg\min_{\beta, \gamma} \sum_{i=1}^{N} L\left(y_i, f_{m-1}(x_i) + \beta b(x_i; \gamma)\right)$
Update $f_m(x)$:
$f_m(x) = f_{m-1}(x) + \beta_m b(x; \gamma_m)$
Sum up to get the final model:
$f(x) = f_M(x) = \sum_{m=1}^{M} \beta_m b(x; \gamma_m)$
Note that $f_{m-1}(x_i)$ is a constant at step $m$ in $\arg\min_{\beta, \gamma} \sum_{i=1}^{N} L(y_i, f_{m-1}(x_i) + \beta b(x_i; \gamma))$, so we only need to optimize the new term $\beta b(x_i; \gamma)$, i.e., find the values of $\beta_m$ and $\gamma_m$ that minimize the loss.
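As a concrete instance of this loop, here is a short Python sketch where, purely for illustration, the base functions are $\pm 1$ threshold stumps and the loss is squared error (both choices are my assumptions; the algorithm itself is loss-agnostic). With squared loss and residuals $r_i = y_i - f_{m-1}(x_i)$, the optimal weight for a fixed stump has the closed form $\beta = \frac{1}{N}\sum_i r_i b_i$, since $b_i^2 = 1$:

```python
import numpy as np

def best_stump(X, r):
    """Search the +/-1 threshold stumps b(x; gamma) for the one that,
    with its optimal beta, best fits the residuals r under squared
    loss.  Minimizing sum (r_i - beta*b_i)^2 over beta leaves a
    residual loss of sum r_i^2 - (sum r_i*b_i)^2 / N, so we just
    maximize |sum(r * b)|."""
    n_samples, n_features = X.shape
    best_score, best = -1.0, None
    for j in range(n_features):
        for thr in np.unique(X[:, j]):
            b = np.where(X[:, j] <= thr, 1.0, -1.0)
            score = abs(np.sum(r * b))
            if score > best_score:
                best_score, best = score, (j, thr)
    j, thr = best
    b = np.where(X[:, j] <= thr, 1.0, -1.0)
    beta = np.mean(r * b)                 # closed-form beta under squared loss
    return beta, j, thr

def forward_stagewise(X, y, M=20):
    """f_0 = 0; at each step add the beta_m * b(x; gamma_m) that most
    reduces the loss while holding f_{m-1} fixed."""
    f = np.zeros(len(y))                  # f_{m-1}(x_i), starts at f_0 = 0
    model = []
    for m in range(M):
        r = y - f                         # residuals: what is left to fit
        beta, j, thr = best_stump(X, r)
        f += beta * np.where(X[:, j] <= thr, 1.0, -1.0)
        model.append((beta, j, thr))
    return model, f

# Usage on made-up 1-D regression data.
X = np.linspace(0, 1, 20).reshape(-1, 1)
y = np.sin(4 * X[:, 0])
model, f = forward_stagewise(X, y, M=30)
print(np.mean((y - f) ** 2))              # training MSE shrinks as M grows
```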
4. Use the forward stagewise algorithm for AdaBoost