mail@pastecode.io avatar
a year ago
1.0 kB
[28/03, 5:21 pm] Rohit: In this case the threshold value for 1st split is 0.23. and the tree has 3 nodes 1 Root and 2 leaf. As in this case our dataset is created using normal distribution where 150 data samples are created with . mean = 5 and standard deviation = 2 and 150 data sample are created with mean = -5 and standard deviation = 2 we get exact 50-50% class distribution.
[28/03, 5:21 pm] Rohit: This tree has 21 nodes and on pruning we get 3 nodes. Here our dataset is created with rnorm values of (1,2) and (-1,2) against the (5,2), (-5,2) in the previous case, we are getting poor splits as compared to the previous results. Here, a greater number of nodes are created because there are morealues in a smaller range, which leads to overlapping thus, a greater number of split nodes are created. This may lead to overfitting when unseen data appears and hence we prune to tree to avoid this problem.
[28/03, 5:22 pm] Rohit: On Pruning the previous tree with complexity parameter, we get this above tree which has 3 nodes.