The algorithm to fit a regression tree is as follows:
- Use the recursive binary splitting to obtain a large tree. Consider a minimum number of observations in the terminal nodes as a criterion to stop the recursive binary splitting.
- Apply the cost complexity pruning to get the optimum subtree. The cost complexity pruning is defined as a function of \alpha, which works as a tuning parameter.
- The tuning parameter \alpha is determined using the K-fold cross-validation. A training and a test set are then defined. In each fold, the training set is splitten applying the recursive binary splitting. To the resulting tree, the cost complexity pruning is applied. Then, for a different set of specific \alphas, the mean squared prediction error on the test set is evaluated. The results for the different folds and for the different \alphas are averaged, being \alpha the value that minimizes the average error.
- Once we get the \alpha value, we replace it in Step 2 equation to get the optimum subtree.
Some important definitions:
- Subtree. A tree that results from pruning a larger tree.
- Recursive binary splitting. Top-down greedy approach to divide the data into distinct and non-overlapping regions. It is top-down because it begins at the top of the tree (at which point all observations belong to a single region) and then successively splits the predictor space; each split is indicated via two new branches further down on the tree. It is greedy because at each step of the tree-building process, the best split is made at that particular step, rather than looking ahead and picking a split that will lead to a better tree in some future step.
- Cost complexity pruning. It is a way to get optimum subtrees. Mathematically, this is done adding a penalty term to the mean squared prediction error expression. This penalty term is governed by a tuning parameter, which controls the trade-off between the subtree’s complexity and its fit to the training data.