Generator to (repeatedly) select subsets of a dataset.
The Balancer can equalize the number of samples/features in a dataset, or select an absolute number or a fraction of all available data. Selection is performed with respect to a particular attribute and can additionally be limited to a subset of the dataset defined by more complex criteria (see the limit argument). The node can either “mark” elements as selected by adding a corresponding attribute to the output dataset, or actually apply the selection by returning a new dataset containing only the selected elements.
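The selection logic described above can be illustrated with a small standalone sketch. This is not PyMVPA's implementation and does not use the library. The function names `select_balanced` and `generate_balanced` are hypothetical, the dataset is reduced to a plain list of attribute values, and treating a float amount as a fraction of each group is an assumption made for illustration only.

```python
import random
from collections import defaultdict


def select_balanced(labels, amount="equal", rng=None):
    """Return sorted indices of one random, per-value selection (hypothetical helper)."""
    if rng is None:
        rng = random.Random()
    # Group element indices by their attribute value.
    groups = defaultdict(list)
    for idx, label in enumerate(labels):
        groups[label].append(idx)
    smallest = min(len(indices) for indices in groups.values())

    selected = []
    for indices in groups.values():
        if amount == "equal":
            # Match the size of the smallest group.
            n = smallest
        elif isinstance(amount, float):
            # Assumption: a float is a fraction of each group's elements.
            n = int(amount * len(indices))
        else:
            # An int is an absolute number of elements per attribute value.
            n = amount
        selected.extend(rng.sample(indices, n))
    return sorted(selected)


def generate_balanced(labels, amount="equal", count=1,
                      apply_selection=False, seed=None):
    """Yield `count` independent random selections."""
    rng = random.Random(seed)
    for _ in range(count):
        chosen = set(select_balanced(labels, amount, rng))
        if apply_selection:
            # Apply: return only the selected elements.
            yield [label for i, label in enumerate(labels) if i in chosen]
        else:
            # Mark: return a boolean flag per element instead.
            yield [i in chosen for i in range(len(labels))]


labels = ["a"] * 5 + ["b"] * 2 + ["c"] * 3
for subset in generate_balanced(labels, count=3, apply_selection=True, seed=0):
    print(subset)
```

With `apply_selection=False` the same call yields one boolean mask per repetition, which mirrors the "mark" behaviour: the data stay untouched and only the selection flags change between repetitions.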
Notes
Available conditional attributes:
(Conditional attributes enabled by default suffixed with +)
Methods
generate(ds) | Generate the desired number of balanced datasets. |
get_postproc() | Returns the post-processing node or None. |
get_space() | Query the processing space name of this node. |
reset() | |
set_postproc(node) | Assigns a post-processing node. Set to None to disable post-processing. |
set_space(name) | Set the processing space name of this node. |
Parameters
amount : {‘equal’} or int or float
attr : str
count : int
limit : None or str or dict
apply_selection : bool
space : str
enable_ca : None or list of str
disable_ca : None or list of str
postproc : Node instance, optional
descr : str
generate(ds)
Generate the desired number of balanced datasets.