Decision Tree

A decision tree is a non-parametric, supervised learning method.

There are three types of nodes:

  • Root node

  • Inner node

  • Leaf node

To build a tree, we can use a top-down method: from the root to the leaves, at each node we select the feature whose split yields the largest information gain.
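A minimal sketch of building such a tree with scikit-learn (the library, dataset, and parameters here are illustrative assumptions, not part of the notes above); `criterion="entropy"` makes the splits use information gain:

```python
# Minimal sketch: fitting a decision tree with scikit-learn (illustrative only).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# criterion="entropy" selects each split by information gain.
clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```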

Information gain:

  • Entropy: $H(T) = I_E(p_1, p_2, \ldots, p_J) = -\sum_{i=1}^{J} p_i \log_2 p_i$

  • Information gain: the difference in entropy before and after splitting a node

$IG(T,a) = H(T) - \sum_a p(a)\, H(T \mid a) = -\sum_{i=1}^{J} p_i \log_2 p_i - \sum_a p(a) \left( -\sum_{i=1}^{J} p(i \mid a) \log_2 p(i \mid a) \right)$
  • Example: see the sketch below, which computes entropy and information gain on a toy dataset.
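A small sketch of these formulas in plain Python (the toy `play`/`outlook` data below is made up purely for illustration):

```python
import math
from collections import Counter

def entropy(labels):
    # H(T) = -sum_i p_i * log2(p_i) over the class distribution of `labels`
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, feature_values):
    # IG(T, a) = H(T) - sum_a p(a) * H(T | a)
    n = len(labels)
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

# Toy example (made-up values): how much does "outlook" tell us about "play"?
play    = ["yes", "yes", "no", "no", "yes", "no"]
outlook = ["sunny", "sunny", "rain", "rain", "overcast", "sunny"]
print(information_gain(play, outlook))
```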

Pruning:

One of the main issues with decision trees is overfitting. We can use pruning to reduce overfitting.

There are two methods:

  • Pre-pruning: early stopping during tree construction, e.g. requiring a minimum weight or minimum number of samples in a child/leaf node (see the sketch after this list)

  • Post-pruning: the basic idea is to remove nodes that do not provide additional information after the tree has been grown. There are two main techniques:

    • bottom-up method;

    • top-down method;
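A minimal pre-pruning sketch, assuming scikit-learn (the post-pruning methods are covered next): hyperparameters such as `max_depth` and `min_samples_leaf` stop splitting early instead of growing the full tree:

```python
# Pre-pruning sketch (illustrative): stop splitting early via hyperparameters.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5, min_samples_split=10)
clf.fit(X, y)
print("depth:", clf.get_depth(), "leaves:", clf.get_n_leaves())
```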

Bottom-up method: Reduced error pruning

  • Starting at the leaves, replace each node with a leaf predicting its most popular (majority) class (see the sketch after this list)

    • If prediction accuracy on a validation set is not affected --> keep the change

    • If accuracy degrades --> revert the change
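A rough sketch of reduced error pruning; the `Node` class and helper functions below are assumptions made up for illustration, and a held-out validation set decides whether each pruned subtree is kept:

```python
class Node:
    # Hypothetical node: a leaf has `label` set; an internal node has a splitting
    # `feature`, `children` keyed by feature value, and `majority` (its most
    # popular class on the training data).
    def __init__(self, label=None, feature=None, children=None, majority=None):
        self.label = label
        self.feature = feature
        self.children = children or {}
        self.majority = majority

def predict(node, x):
    if node.label is not None:
        return node.label
    child = node.children.get(x[node.feature])
    return predict(child, x) if child is not None else node.majority

def accuracy(tree, X_val, y_val):
    return sum(predict(tree, x) == y for x, y in zip(X_val, y_val)) / len(y_val)

def reduced_error_prune(node, tree, X_val, y_val):
    # Bottom-up: prune the children first, then try replacing `node` with a leaf
    # holding its majority class; keep the change only if validation accuracy
    # does not drop, otherwise revert.
    if node.label is not None:
        return
    for child in node.children.values():
        reduced_error_prune(child, tree, X_val, y_val)
    before = accuracy(tree, X_val, y_val)
    saved = (node.feature, node.children)
    node.label, node.feature, node.children = node.majority, None, {}
    if accuracy(tree, X_val, y_val) < before:
        node.label = None
        node.feature, node.children = saved
```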

Top-down method: Cost complexity pruning (a code sketch follows this list)

  • Generate a series of trees $T_0, \ldots, T_m$, where $T_0$ is the initial tree and $T_m$ is the root alone.

  • At step $i$, create $T_i$ by removing a subtree from $T_{i-1}$.

    • Replace it with a leaf node whose value is chosen as in the tree-building algorithm (e.g. the majority class)

    • The subtree to remove is chosen as follows:

      • Define the error rate of tree T over dataset S as err(T, S)

      • Enumerate all subtrees of $T_{i-1}$ and select the subtree $t$ that minimizes $\frac{err(prune(T,t),\,S) - err(T,\,S)}{|leaves(T)| - |leaves(prune(T,t))|}$

      • prune(T, t) denotes the tree obtained by pruning the subtree t from the tree T
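scikit-learn ships a closely related minimal cost-complexity pruning; a sketch of picking the pruning strength alpha on held-out data (the dataset and split here are illustrative assumptions, not from the notes above):

```python
# Sketch: cost-complexity pruning via scikit-learn's ccp_alpha (illustrative).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X_train, X_test, y_train, y_test = train_test_split(*load_iris(return_X_y=True), random_state=0)

# Each candidate alpha corresponds to pruning away one more "weakest" subtree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

best_score, best_alpha = max(
    (DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_train, y_train)
        .score(X_test, y_test), a)
    for a in path.ccp_alphas
)
print("best accuracy %.3f at ccp_alpha=%.4f" % (best_score, best_alpha))
```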

Pros & Cons

  • Pros:

    • Simple to understand and interpret

    • Handles both categorical and numerical data

    • Handles missing values relatively easily

    • Handles uncorrelated features easily

  • Cons:

    • Prone to overfitting

    • Neglects relationships and correlations between features (each split considers one feature at a time)

    • Different node-splitting criteria lead to different biases:

      • Information gain is biased towards features with many distinct values (used in ID3)

      • Information gain ratio corrects for this but is biased towards features with fewer distinct values (used in C4.5)
