CART (Classification and regression tree)
CART is the base learner used by XGBoost. It can be used to build (1) regression trees and (2) classification trees.
Handling categorical and numerical features
Categorical
If the number of categories is greater than 2:
Enumerate all binary partitions of the categories
Pick the partition with the lowest Gini index
If the number of categories is 2:
Split directly
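The enumeration step for a categorical feature can be sketched as below. This is a minimal illustration, not a reference implementation: the function name `binary_partitions` and the example colour categories are made up for this note. It assumes every split is binary, so a feature with k categories yields 2^(k-1) - 1 candidate partitions.

```python
from itertools import combinations

def binary_partitions(categories):
    """Yield every split of the categories into two non-empty groups."""
    cats = sorted(set(categories))
    first, rest = cats[0], cats[1:]
    # Pinning the first category to the left group avoids emitting each
    # partition twice (once as left/right and once as right/left).
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            left = {first, *extra}
            right = set(cats) - left
            yield left, right

# Hypothetical feature with three categories: 3 candidate partitions.
splits = list(binary_partitions(["red", "green", "blue"]))
for left, right in splits:
    print(sorted(left), "|", sorted(right))
```

With only two categories the generator yields a single partition, which matches the "split directly" case.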
Numerical
Sort the samples by the numerical feature
Take candidate splitting points between adjacent values
Calculate the Gini index of every candidate splitting point
Choose the splitting point with the minimum Gini index
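A possible sketch of the numerical case follows. The names `gini` and `best_numeric_split` and the toy data are illustrative assumptions. The notes only say the splitting point lies between values, so using the midpoint of adjacent sorted values as the candidate is also an assumption.

```python
from collections import Counter

def gini(labels):
    """Gini index of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_numeric_split(values, labels):
    """Return (threshold, weighted Gini) of the best binary split."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    n = len(pairs)
    best_threshold, best_score = None, float("inf")
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # identical values cannot be separated
        threshold = (lo + hi) / 2  # assumed: midpoint between neighbours
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score

# Toy data: the two classes separate cleanly between 3.0 and 10.0.
threshold, score = best_numeric_split([2.0, 3.0, 10.0, 11.0], ["a", "a", "b", "b"])
print(threshold, score)
```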
CART regression tree
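The target is
$$\min_{j,s}\left[\min_{c_1}\sum_{x_i\in R_1(j,s)}(y_i-c_1)^2+\min_{c_2}\sum_{x_i\in R_2(j,s)}(y_i-c_2)^2\right]$$
where $R_1(j,s)=\{x\mid x^{(j)}\le s\}$ and $R_2(j,s)=\{x\mid x^{(j)}>s\}$
The CART regression tree is
$$f(x)=\sum_{m=1}^{M}c_m\,I(x\in R_m)$$
where $c_m$ is the mean of the targets $y_i$ of the samples $x_i\in R_m$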
Steps to generate a CART regression tree:
Iterate over all features
For each feature, scan all possible splitting points
For each splitting point, measure the sum of squared errors
--> For each feature: find the best splitting point
--> Across all features: find the best feature and splitting point
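Split the samples on the best feature $j$ and splitting point $s$: $R_1(j,s)=\{x\mid x^{(j)}\le s\}$ and $R_2(j,s)=\{x\mid x^{(j)}>s\}$
Output the value of each child node: $c_m=\frac{1}{N_m}\sum_{x_i\in R_m(j,s)}y_i$, for $x\in R_m$ and $m=1,2$
Finally, after repeating the split on each child region, the input space is divided into $M$ areas $R_1,R_2,\dots,R_M$, and the CART model is output as $f(x)=\sum_{m=1}^{M}c_m\,I(x\in R_m)$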
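The regression steps above can be sketched in plain Python. This is a simplified illustration under stated assumptions: all features are numerical, candidate splitting points are midpoints between adjacent distinct values, and the only stopping rule is a hypothetical `max_depth` parameter. The function names and the toy data are not from the original notes.

```python
def sse(ys):
    """Sum of squared errors of ys around their mean (0.0 if empty)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((v - mean) ** 2 for v in ys)

def best_regression_split(X, y):
    """Return (feature j, threshold s, total SSE) of the best split."""
    best = (None, None, float("inf"))
    for j in range(len(X[0])):                    # iterate over all features
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):    # all candidate points
            s = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= s]
            right = [yi for row, yi in zip(X, y) if row[j] > s]
            total = sse(left) + sse(right)
            if total < best[2]:
                best = (j, s, total)
    return best

def build_regression_tree(X, y, max_depth=2):
    """Split recursively; a leaf stores the mean target of its region."""
    if max_depth == 0:
        return sum(y) / len(y)
    j, s, _ = best_regression_split(X, y)
    if j is None:                                 # no feature can be split
        return sum(y) / len(y)
    left = [i for i, row in enumerate(X) if row[j] <= s]
    right = [i for i, row in enumerate(X) if row[j] > s]
    return {
        "feature": j,
        "threshold": s,
        "left": build_regression_tree([X[i] for i in left], [y[i] for i in left], max_depth - 1),
        "right": build_regression_tree([X[i] for i in right], [y[i] for i in right], max_depth - 1),
    }

def predict(tree, row):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree

# Toy data with one feature and two obvious clusters.
X = [[1.0], [2.0], [8.0], [9.0]]
y = [1.0, 1.2, 5.0, 5.2]
tree = build_regression_tree(X, y, max_depth=1)
print(tree, predict(tree, [1.5]))
```

Each leaf value plays the role of the region mean, and the nested dictionary encodes which region an input falls into.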
CART classification tree
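The target is the Gini index:
$$\mathrm{Gini}(D)=1-\sum_{k=1}^{K}\left(\frac{|C_k|}{|D|}\right)^2,\qquad \mathrm{Gini}(D,A)=\frac{|D_1|}{|D|}\mathrm{Gini}(D_1)+\frac{|D_2|}{|D|}\mathrm{Gini}(D_2)$$
$D_1$ and $D_2$ are the samples of $D$ on each side of the split on feature $A$
$C_k$ is the set of samples in $D$ with label $k$
The smaller the Gini index, the better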
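A small worked example may help. The numbers are hypothetical: a node $D$ holds 10 samples, 6 with label 1 and 4 with label 2. A candidate split on feature $A$ sends 4 samples (all label 1) to $D_1$ and the other 6 samples (2 with label 1, 4 with label 2) to $D_2$. The calculation uses the standard definition of the Gini index, one minus the sum of squared class proportions, with each child weighted by its share of the samples.

```latex
\begin{align*}
\mathrm{Gini}(D)   &= 1 - \left(0.6^2 + 0.4^2\right) = 1 - 0.52 = 0.48 \\
\mathrm{Gini}(D_1) &= 1 - \left(1^2 + 0^2\right) = 0 \\
\mathrm{Gini}(D_2) &= 1 - \left(\left(\tfrac{2}{6}\right)^2 + \left(\tfrac{4}{6}\right)^2\right) = \tfrac{4}{9} \approx 0.444 \\
\mathrm{Gini}(D, A) &= \tfrac{4}{10}\cdot 0 + \tfrac{6}{10}\cdot\tfrac{4}{9} = \tfrac{4}{15} \approx 0.267
\end{align*}
```

The split lowers the weighted Gini index from 0.48 to about 0.267, so it would be preferred over leaving the node unsplit.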
Steps to generate a CART classification tree:
Iterate over all features
For each feature, scan all possible splitting points
For each splitting point, measure the Gini index of the split
--> For each feature: find the best splitting point
Select the feature and splitting point with the minimum Gini index as the best split
Recurse on the subsamples of each child node
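The classification steps above can be sketched as follows. This is an illustrative sketch only: it assumes numerical features, midpoint candidate thresholds, majority-vote leaves, and a hypothetical `max_depth` stopping rule. None of the names come from the original notes.

```python
from collections import Counter

def gini(labels):
    """Gini index of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Return (feature j, threshold s, weighted Gini) of the best split."""
    n = len(y)
    best = (None, None, float("inf"))
    for j in range(len(X[0])):                    # iterate over all features
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):    # all candidate points
            s = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= s]
            right = [yi for row, yi in zip(X, y) if row[j] > s]
            score = len(left) / n * gini(left) + len(right) / n * gini(right)
            if score < best[2]:
                best = (j, s, score)
    return best

def build_tree(X, y, max_depth=3):
    """Split recursively; a leaf stores the majority label of its samples."""
    majority = Counter(y).most_common(1)[0][0]
    if max_depth == 0 or gini(y) == 0.0:          # depth limit or pure node
        return majority
    j, s, _ = best_split(X, y)
    if j is None:                                 # no feature can be split
        return majority
    left = [i for i, row in enumerate(X) if row[j] <= s]
    right = [i for i, row in enumerate(X) if row[j] > s]
    return {
        "feature": j,
        "threshold": s,
        "left": build_tree([X[i] for i in left], [y[i] for i in left], max_depth - 1),
        "right": build_tree([X[i] for i in right], [y[i] for i in right], max_depth - 1),
    }

def predict(tree, row):
    """Walk from the root to a leaf and return the leaf label."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree

# Toy data: feature 0 separates the classes, feature 1 does not.
X = [[1.0, 5.0], [2.0, 4.0], [8.0, 5.0], [9.0, 4.0]]
y = ["a", "a", "b", "b"]
tree = build_tree(X, y)
print(tree, predict(tree, [1.5, 4.0]))
```

On this toy data the search picks feature 0, because its best split has a weighted Gini index of 0 while the only split on feature 1 scores 0.5.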