Upweight categorical features in GFR sampler #167

@andrewherren

Description

Right now, the GFR sampler enumerates all possible cutpoints (for numeric and categorical features) and then samples according to the (unweighted) likelihood x prior for each split.

To account for the fact that categorical features often have fewer possible cutpoints, we should "up-weight" the likelihood x prior entries for categorical cutpoints by a factor of (# of numeric cutpoints) / (# of categorical cutpoints) so that the prior probability of splitting on any given feature is roughly uniform.
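
A rough sketch of the proposed reweighting, in plain Python rather than the sampler's actual code. Everything named here (`upweight_log_weights`, `normalize`, `feature_ids`, `is_categorical`, `reference_count`) is hypothetical and only illustrates the idea: add log(reference_count / n_j) to the log(likelihood x prior) of every cutpoint belonging to a categorical feature with n_j cutpoints, where `reference_count` is assumed to be the number of cutpoints enumerated for a numeric feature. Only split candidates are considered.

```python
import math


def upweight_log_weights(log_weights, feature_ids, is_categorical, reference_count):
    """Return adjusted log-weights for a flat list of candidate cutpoints.

    log_weights[i]   : log(likelihood x prior) of candidate cutpoint i
    feature_ids[i]   : index of the feature that candidate i splits on
    is_categorical[f]: True if feature f is categorical
    reference_count  : assumed number of cutpoints for a numeric feature
    """
    # Count how many candidate cutpoints each feature contributes.
    counts = {}
    for f in feature_ids:
        counts[f] = counts.get(f, 0) + 1

    adjusted = []
    for lw, f in zip(log_weights, feature_ids):
        if is_categorical[f]:
            # Multiply the weight by (# numeric cutpoints) / (# categorical cutpoints),
            # done on the log scale for numerical stability.
            lw = lw + math.log(reference_count / counts[f])
        adjusted.append(lw)
    return adjusted


def normalize(log_weights):
    """Convert log-weights to sampling probabilities (log-sum-exp trick)."""
    m = max(log_weights)
    w = [math.exp(lw - m) for lw in log_weights]
    total = sum(w)
    return [x / total for x in w]


# Toy case: feature 0 is numeric with 4 cutpoints, feature 1 is categorical
# with 2 cutpoints, and every candidate has the same likelihood x prior.
feature_ids = [0, 0, 0, 0, 1, 1]
is_categorical = {0: False, 1: True}
log_weights = [0.0] * 6

probs = normalize(upweight_log_weights(log_weights, feature_ids, is_categorical, 4))
mass = {f: sum(p for p, g in zip(probs, feature_ids) if g == f) for f in (0, 1)}
```

In this toy case the unweighted sampler would split on the numeric feature with probability 4/6; after the adjustment each feature carries half of the total mass when the likelihoods are equal. How `reference_count` should be chosen when numeric features have differing numbers of cutpoints is left open here.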
