An Idea of How to Improve the Random Trees of Leo Breiman

Time

-

Locations

RE 036

Speaker: 

Romà Domènech Masana, Illinois Tech Ph.D. candidate (AMAT)

Description: 

The idea focuses on making more complex splits. A Random Tree is grown in several steps, starting from a dataset DF that must contain a response variable y and a set of predictors x1, ..., xm. In each step, the dataset DF is split in two according to a condition x1 <= c1, where x1 is one of the predictors in the dataset and c1 is a constant. x1 and c1 are chosen so that the standard deviation of y in each of the two resulting datasets is smaller than the standard deviation of y in the original dataset DF.

I suggest considering splits of the form:

s1*( m1*x1 + z1 ) <= s1*c1
s2*( m2*x1 + z1 ) <= s2*c2
s3*( m3*x1 + z1 ) <= s3*c3

where x1 and z1 are two predictors, and the m's and c's are non-negative constants selected to maximize the decrease in the objective function. The constants s1, s2, and s3 are each either -1 or 1 and are also selected by the algorithm during the optimization. I choose this kind of combination because it can isolate a bounded region in just 1 split, whereas the classical Random Tree would need at least 4.
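As a rough illustration of the two kinds of split described above, here is a minimal Python sketch. It is not the speaker's implementation. The function names best_axis_split and combined_split_mask are hypothetical, the size-weighted standard deviation used to rank thresholds is an assumed scoring rule (the abstract only asks that the standard deviation of y decrease in both parts), and the constants s, m, and c in the usage lines are picked by hand rather than found by any optimization.

```python
import numpy as np

def best_axis_split(x, y):
    """Classical split x <= c on one predictor.

    Assumption: candidate thresholds are ranked by the size-weighted
    standard deviation of y in the two parts (lower is better).
    """
    best_c, best_score = None, np.inf
    # Skip the largest value so that both parts are always non-empty.
    for c in np.unique(x)[:-1]:
        left, right = y[x <= c], y[x > c]
        score = (len(left) * left.std() + len(right) * right.std()) / len(y)
        if score < best_score:
            best_c, best_score = c, score
    return best_c, best_score

def combined_split_mask(x1, z1, s, m, c):
    """Rows that satisfy all conditions s_i*(m_i*x1 + z1) <= s_i*c_i."""
    mask = np.ones(len(x1), dtype=bool)
    for s_i, m_i, c_i in zip(s, m, c):
        mask &= s_i * (m_i * x1 + z1) <= s_i * c_i
    return mask

# Hand-picked constants: z1 >= 0, x1 + z1 <= 2, and 2*x1 + z1 >= 1
# together bound a triangle in the (x1, z1) plane.
s, m, c = [-1, 1, -1], [0, 1, 2], [0, 2, 1]
x1 = np.array([0.5, 3.0, -2.0, 1.0])
z1 = np.array([1.0, 0.0, 3.0, -1.0])
inside = combined_split_mask(x1, z1, s, m, c)  # only the first point is inside
```

The three conditions are combined with a logical AND, so a single split separates the rows inside the triangle from all the others. Axis-aligned thresholds can only enclose a bounded region by stacking several splits.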

Event Topic:

Computational Mathematics & Statistics
