Model Configuration Parameters

Beyond the feature-specific configurations, several parameters are set at the model level. Most of them are similar to those in other boosting packages.

mode: This can be ‘classification’ or ‘regression’. StructureBoost uses a single class to handle both situations.
num_trees: The number of trees to build. We recommend using an eval_set with early stopping so that the number of trees is learned during training. In that case, set num_trees to a large value, since it then serves only as an upper bound. Alternatively, you can specify a fixed number of trees.
max_depth: The maximum depth of each tree. Larger values make overfitting more likely and underfitting less likely. Default is 3.
learning_rate: The “step size” applied to each tree's contribution when it is added to the model. We recommend erring toward smaller steps and more trees, and using early stopping with an eval_set to choose the model size. StructureBoost works particularly well in this regime because many small steps give it more opportunities to explore the space of possible splits.
subsample: How much of the training data to use for each tree. An integer greater than 1 is interpreted as a number of rows; a float between 0 and 1 is interpreted as a fraction of the rows. Rows are drawn with or without replacement, depending on the value of replace.
replace: If True, the rows for each tree are drawn with replacement (as in the bootstrap). If False, the rows for each tree are drawn without replacement.
loss_fn: By default, we use log loss (also called cross-entropy or negative log-likelihood) for classification and mean squared error for regression. To use a custom loss, pass a tuple of two vectorized functions that return the first and second derivatives of the loss, respectively.
feat_sample_by_tree: The fraction (or number) of features to sample for each tree. An integer greater than 1 is interpreted as a number of features; a float between 0 and 1 is interpreted as a fraction of the features.
feat_sample_by_node: The fraction (or number) of features to sample at each node, interpreted the same way as feat_sample_by_tree. The effect compounds with feat_sample_by_tree: each node samples from the subset already chosen for its tree, not from the full feature set. For example, feat_sample_by_tree=0.5 combined with feat_sample_by_node=0.5 leaves each node roughly 25% of all features.
gamma: Minimum split gain, as in XGBoost. If the best split at a node yields a gain less than gamma, the split is not made.
reg_lambda: L2 regularization parameter, as in XGBoost. It “shrinks” the value assigned to each leaf toward zero.
na_unseen_action: How to choose the direction for missing values at a split when no missing values reach that node while the split is being chosen. The default, “weighted_random”, picks a direction at random with probability proportional to the number of data points that went each way, and then fixes that direction for the split. The alternative, “random”, picks either direction with equal probability, regardless of how many data points went each way.
min_sample_split: The minimum number of data points a node must contain to be considered for further splitting. Default is 2.
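To make the early-stopping advice under num_trees concrete, here is a generic sketch of how an eval_set decides the number of trees. It is not StructureBoost's code: `best_num_trees` and its `patience` argument are hypothetical names, and the library's own early-stopping options and defaults may differ. The idea is that num_trees only caps training, while the eval-set loss picks the tree count actually kept.

```python
def best_num_trees(eval_losses, patience):
    """Return (trees_built, best_tree_count) for per-tree eval-set losses.

    eval_losses[i] is the eval-set loss after i + 1 trees. Training stops
    once `patience` consecutive trees fail to beat the best loss so far.
    `patience` is a hypothetical name for that stopping window.
    """
    best_loss = float("inf")
    best_n = 0
    for i, loss in enumerate(eval_losses):
        n = i + 1
        if loss < best_loss:
            best_loss, best_n = loss, n
        elif n - best_n >= patience:
            return n, best_n  # stopped early, well before the num_trees cap
    return len(eval_losses), best_n  # ran all the way to the num_trees cap


losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.5]
print(best_num_trees(losses, patience=3))  # -> (6, 3)
```

With a short patience window, training stops after 6 trees and keeps the first 3; a longer window would have reached the later improvement at tree 7. This is why a large num_trees is harmless when early stopping is on.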
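The trade-off described under learning_rate (smaller steps need more trees) can be seen in a toy setting that is independent of StructureBoost. Assume squared-error loss and "trees" that are a single leaf whose output is the mean residual. Each round then moves the prediction a fraction learning_rate of the way toward the target. The function name is hypothetical.

```python
def steps_to_converge(target_mean, learning_rate, tol=1e-3, max_steps=10_000):
    """Count boosting rounds until a constant model is within tol of target_mean.

    Toy model: squared-error loss, and each "tree" is one leaf that outputs
    the current mean residual. Each round adds learning_rate * that output.
    """
    pred = 0.0
    for step in range(1, max_steps + 1):
        residual = target_mean - pred
        pred += learning_rate * residual  # shrunken step toward the target
        if abs(target_mean - pred) < tol:
            return step
    return max_steps


print(steps_to_converge(1.0, 0.5))  # -> 10
print(steps_to_converge(1.0, 0.1))  # -> 66
```

The remaining residual shrinks by a factor of (1 - learning_rate) per round. For small learning rates, halving the step size therefore roughly doubles the number of trees needed. This is why early stopping is paired with a small learning_rate: it finds that larger tree count automatically.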
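The subsample and replace entries describe two choices: how many rows each tree gets, and whether rows may repeat. The sketch below illustrates both using Python's `random` module. The helper names, the rounding rule for fractions, and the treatment of 1.0 as "all rows" are assumptions, not StructureBoost's implementation.

```python
import random


def resolve_subsample(subsample, n_rows):
    """Hypothetical helper: turn a subsample setting into a row count.

    An int > 1 is a row count; a float in (0, 1] is a fraction of n_rows.
    Rounding to the nearest row (minimum 1) is an assumption.
    """
    if isinstance(subsample, int) and subsample > 1:
        return subsample
    if isinstance(subsample, float) and 0.0 < subsample <= 1.0:
        return max(1, round(subsample * n_rows))
    raise ValueError(f"unsupported subsample value: {subsample!r}")


def draw_rows(n_rows, subsample, replace, rng):
    """Return the row indices used to build one tree."""
    k = resolve_subsample(subsample, n_rows)
    if replace:
        # Bootstrap-style: the same row may be drawn more than once.
        return [rng.randrange(n_rows) for _ in range(k)]
    # Without replacement: each row appears at most once (requires k <= n_rows).
    return rng.sample(range(n_rows), k)


rows = draw_rows(100, 0.5, replace=False, rng=random.Random(0))  # 50 distinct rows
boot = draw_rows(100, 1.0, replace=True, rng=random.Random(0))   # 100 draws, repeats likely
```

With replace=True and a subsample of the full data set, each tree sees a classic bootstrap sample. With replace=False and a fraction below 1, each tree sees a plain random subset.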
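For loss_fn, the text above says only that a custom loss is a tuple of two vectorized functions returning the first and second derivatives. The sketch below is a template under stated assumptions. Each function is assumed to take `(y_true, y_pred)` NumPy arrays and to differentiate with respect to `y_pred`. The argument order, and whether `y_pred` is a raw score or a probability, should be checked against the library before use. Squared error is used only because its derivatives are easy to verify; all names are hypothetical.

```python
import numpy as np


def sq_err(y_true, y_pred):
    """Per-point loss (y_pred - y_true)**2, used only to check the derivatives."""
    return (y_pred - y_true) ** 2


def sq_err_grad(y_true, y_pred):
    """First derivative of sq_err with respect to y_pred (assumed convention)."""
    return 2.0 * (y_pred - y_true)


def sq_err_hess(y_true, y_pred):
    """Second derivative of sq_err with respect to y_pred: constant 2."""
    return np.full_like(y_pred, 2.0, dtype=float)


def derivative_matches(f, df, y_true, y_pred, eps=1e-6):
    """Compare df with a central finite difference of f in y_pred."""
    numeric = (f(y_true, y_pred + eps) - f(y_true, y_pred - eps)) / (2 * eps)
    return np.allclose(numeric, df(y_true, y_pred), atol=1e-4)


# The tuple that would be passed as loss_fn: (first derivative, second derivative).
custom_loss_fn = (sq_err_grad, sq_err_hess)
```

The same `derivative_matches` helper checks both entries of the tuple: compare `sq_err` against `sq_err_grad`, then `sq_err_grad` against `sq_err_hess`. This is a cheap safeguard when writing derivatives for a less trivial loss.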
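The compounding effect of feat_sample_by_tree and feat_sample_by_node can be sketched as two nested draws: the node-level draw samples from the tree's subset, not from the full feature list. The helper names, sampling without replacement, and the rounding rule are assumptions made for illustration.

```python
import random


def resolve_feature_count(value, n_available):
    """Hypothetical helper: int > 1 -> that many features; float in (0, 1] -> fraction."""
    if isinstance(value, int) and value > 1:
        return value
    if isinstance(value, float) and 0.0 < value <= 1.0:
        # Rounding to the nearest whole feature (at least one) is an assumption.
        return max(1, round(value * n_available))
    raise ValueError(f"unsupported feature-sampling value: {value!r}")


def sample_tree_features(all_features, feat_sample_by_tree, rng):
    """Choose the features this tree may use, drawn from the full feature list."""
    k = resolve_feature_count(feat_sample_by_tree, len(all_features))
    return rng.sample(all_features, k)


def sample_node_features(tree_features, feat_sample_by_node, rng):
    """Choose the features one node may split on, drawn from the tree's subset only."""
    k = resolve_feature_count(feat_sample_by_node, len(tree_features))
    return rng.sample(tree_features, k)


features = [f"x{i}" for i in range(20)]
rng = random.Random(0)
tree_feats = sample_tree_features(features, 0.5, rng)    # 10 of the 20 features
node_feats = sample_node_features(tree_feats, 0.5, rng)  # 5 features, i.e. 25% of all 20
```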
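Both gamma and reg_lambda are described by reference to XGBoost. The sketch below uses XGBoost's published formulas, written in terms of sums of first derivatives (G) and second derivatives (H) over the points in a node. The optimal leaf value is -G / (H + reg_lambda), so a larger reg_lambda shrinks it toward zero. A split is made only if its gain reaches gamma. That StructureBoost uses exactly these formulas, including the factor of 1/2 in the gain, is an assumption, and the function names are hypothetical.

```python
def leaf_value(grad_sum, hess_sum, reg_lambda):
    """XGBoost-style optimal leaf value; larger reg_lambda shrinks it toward 0."""
    return -grad_sum / (hess_sum + reg_lambda)


def split_gain(g_left, h_left, g_right, h_right, reg_lambda):
    """XGBoost-style gain from splitting a node into left and right children."""
    def score(g, h):
        return g * g / (h + reg_lambda)

    parent = score(g_left + g_right, h_left + h_right)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right) - parent)


def should_split(g_left, h_left, g_right, h_right, reg_lambda, gamma):
    """Make the split only if its gain is not less than gamma."""
    return split_gain(g_left, h_left, g_right, h_right, reg_lambda) >= gamma


print(leaf_value(-4.0, 3.0, reg_lambda=0.0))  # unregularized: 4/3
print(leaf_value(-4.0, 3.0, reg_lambda=1.0))  # shrunk to 1.0
```

Because reg_lambda also appears in every denominator of the gain, increasing it both shrinks leaf values and makes marginal splits less attractive. Gamma, by contrast, acts only as a fixed threshold on the gain.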
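The two na_unseen_action options differ only in the probability used to pick a direction. The sketch below shows that choice for a single split, given how many training points the split sent left and right. The function name and the "left"/"right" return values are hypothetical, not StructureBoost's API.

```python
import random


def choose_na_direction(n_left, n_right, na_unseen_action, rng):
    """Pick a direction for missing values at a split that saw none in training.

    n_left / n_right: number of training points the split sent left / right.
    """
    if na_unseen_action == "weighted_random":
        # Probability of "left" is proportional to the points that went left.
        p_left = n_left / (n_left + n_right)
    elif na_unseen_action == "random":
        # Fair coin, ignoring the counts.
        p_left = 0.5
    else:
        raise ValueError(f"unknown na_unseen_action: {na_unseen_action!r}")
    return "left" if rng.random() < p_left else "right"


rng = random.Random(42)
direction = choose_na_direction(90, 10, "weighted_random", rng)  # "left" about 90% of the time
```

The direction is drawn once, when the split is created, and then kept. Every later missing value at that node follows it.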