Decision Tree
A decision tree is a non-parametric, supervised learning method used for classification and regression.
There are three types of nodes:
Root
Internal node
Leaf node
To build a tree, we can use a top-down method: starting from the root and moving toward the leaves, at each node we select the feature whose split gives the largest information gain.
Information gain:
Entropy: $H(S) = -\sum_{i=1}^{c} p_i \log_2 p_i$, where $p_i$ is the proportion of samples in class $i$.
Information Gain: the difference of entropy before and after splitting a node on feature $A$: $IG(S, A) = H(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|} H(S_v)$, where $S_v$ is the subset of $S$ for which $A$ takes value $v$.
Example:
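As a worked sketch, assuming a tiny hypothetical play/outlook dataset, the snippet below computes the entropy of the labels and the information gain of splitting on one feature:

```python
import math
from collections import Counter

def entropy(labels):
    """H(S) = -sum(p_i * log2(p_i)) over the class proportions."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())

def information_gain(labels, feature_values):
    """Entropy before the split minus the weighted entropy of each branch."""
    total = len(labels)
    branches = {}
    for value, label in zip(feature_values, labels):
        branches.setdefault(value, []).append(label)
    weighted = sum(len(branch) / total * entropy(branch)
                   for branch in branches.values())
    return entropy(labels) - weighted

# Hypothetical toy data: is a match played, given the weather outlook?
play    = ["yes", "yes", "no", "no", "yes", "no"]
outlook = ["sunny", "overcast", "rain", "rain", "overcast", "sunny"]

print(entropy(play))                    # 1.0 (3 "yes" vs 3 "no")
print(information_gain(play, outlook))  # gain from splitting on outlook
```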

Pruning:
One of the main issues with decision trees is overfitting. We can use pruning to reduce it.
There are two methods:
Pre-pruning: early stopping during tree growth, e.g. based on the sample weight or sample count that would fall into a child/leaf node
Post-pruning: the basic idea is to remove nodes that do not provide additional information after the tree has been grown. There are mainly two techniques:
bottom-up method;
top-down method.
Bottom-up method: Reduced error pruning
Starting at the leaves, replace each node with its most popular class.
If prediction accuracy (e.g. on a validation set) is not affected --> keep this change
If not --> change back
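Below is a minimal sketch of reduced error pruning, assuming a tree has already been grown into a simple Node structure whose label field stores the majority class of the training samples that reached the node; the Node class, accuracy helper, and validation data are illustrative assumptions, not part of the original text:

```python
class Node:
    """Minimal tree node. `label` holds the majority class of the training
    samples that reached this node (used both for leaf predictions and when
    the node is collapsed to a leaf during pruning)."""
    def __init__(self, label, feature=None, children=None):
        self.label = label
        self.feature = feature          # index of the feature used to split
        self.children = children or {}  # feature value -> child Node

    def predict(self, x):
        if not self.children:
            return self.label
        child = self.children.get(x[self.feature])
        return child.predict(x) if child is not None else self.label

def accuracy(tree, X, y):
    return sum(tree.predict(x) == label for x, label in zip(X, y)) / len(y)

def reduced_error_prune(node, root, X_val, y_val):
    """Bottom-up: try replacing each internal node with a leaf predicting its
    majority class; keep the change unless validation accuracy drops."""
    for child in node.children.values():
        reduced_error_prune(child, root, X_val, y_val)
    if not node.children:
        return
    before = accuracy(root, X_val, y_val)
    saved_children, saved_feature = node.children, node.feature
    node.children, node.feature = {}, None      # collapse to a leaf
    if accuracy(root, X_val, y_val) < before:   # accuracy got worse
        node.children, node.feature = saved_children, saved_feature  # revert
```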
Top-down method: Cost complexity pruning
Generate a series of trees $T_0, T_1, \dots, T_m$, where $T_0$ is the initial tree and $T_m$ is the root alone.
At step $i$, create $T_i$ by removing a subtree from $T_{i-1}$ and replacing it with a leaf node whose value is chosen as in the tree-building algorithm.
The subtree to remove is chosen as follows:
Define the error rate of tree $T$ over dataset $S$ as $\mathrm{err}(T, S)$, and let $\mathrm{prune}(T, t)$ be the tree obtained by pruning subtree $t$ from $T$.
Enumerate all subtrees $t$ in $T_{i-1}$ and select the one that minimizes the function:
$$\frac{\mathrm{err}(\mathrm{prune}(T_{i-1}, t), S) - \mathrm{err}(T_{i-1}, S)}{|\mathrm{leaves}(T_{i-1})| - |\mathrm{leaves}(\mathrm{prune}(T_{i-1}, t))|}$$
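In practice, scikit-learn exposes a closely related scheme, minimal cost-complexity pruning, through the ccp_alpha parameter of DecisionTreeClassifier; the following is a minimal sketch using the built-in iris dataset (the dataset choice and parameter values are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Compute the effective alphas at which subtrees get pruned away.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

# Larger ccp_alpha => more aggressive pruning => smaller tree.
for alpha in path.ccp_alphas:
    clf = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    print(f"alpha={alpha:.4f}  leaves={clf.get_n_leaves()}  "
          f"test accuracy={clf.score(X_test, y_test):.3f}")
```

Pre-pruning, by contrast, corresponds to constructor parameters such as max_depth, min_samples_split, and min_samples_leaf.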
Pros & Cons
Pros:
Simple to understand and interpret
Can handle both categorical and numerical data
Handles missing values relatively easily
Robust to uncorrelated (irrelevant) features
Cons:
Prone to overfitting
Neglects relationships and correlations between features (each split considers one feature at a time)
Different node-splitting criteria lead to different biases:
Information gain is biased toward features with many distinct values (ID3); illustrated below
Information gain ratio is biased toward features with fewer distinct values (C4.5)
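To illustrate the first bias concretely (reusing compact versions of the entropy and information-gain helpers from the earlier sketch, with hypothetical data): a feature that takes a unique value for every sample, such as an ID column, yields pure single-sample branches and therefore the maximal possible information gain, even though it generalizes poorly:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(labels, feature):
    n = len(labels)
    groups = {}
    for v, label in zip(feature, labels):
        groups.setdefault(v, []).append(label)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())

labels    = ["yes", "no", "yes", "no", "yes", "no"]
sample_id = [1, 2, 3, 4, 5, 6]                          # unique per sample: no real signal
weather   = ["sun", "sun", "rain", "rain", "sun", "rain"]

print(info_gain(labels, sample_id))  # 1.0: maximal gain despite being useless
print(info_gain(labels, weather))    # much lower for an ordinary low-cardinality feature
```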