Accuracy of autosklearn can be improved with Intel® Extension for Scikit-learn #1171

@PivovarA

What is this?

Intel® Extension for Scikit-learn provides drop-in replacement patching, a seamless way to speed up scikit-learn applications.

Our results

I used automlbenchmark on large datasets to compare the accuracy of autosklearn with and without patching.

| datasetName | library | acc | auc | balacc | logloss |
|---|---|---|---|---|---|
| Airlines | autosklearn w/ patching | 0.667087 | 0.720931 | 0.654484 | 0.664864 |
| Albert | autosklearn w/ patching | 0.677265 | 0.738089 | 0.677265 | 0.642248 |
| Covertype | autosklearn w/ patching | 0.918092 | - | 0.835118 | 0.214061 |
| Airlines | autosklearn w/o patching | 0.654552 | 0.696719 | 0.663029 | 0.686289 |
| Albert | autosklearn w/o patching | 0.652432 | 0.706782 | 0.652432 | 0.691148 |
| Covertype | autosklearn w/o patching | 0.908678 | - | 0.829109 | 0.252917 |

The table below shows the difference between autosklearn with patching and without patching (with minus without). For logloss, a negative difference is an improvement.

| datasetName | diff accuracy | diff auc | diff balacc | diff logloss |
|---|---|---|---|---|
| Airlines | 0.012535 | 0.024212 | -0.008545 | -0.02132 |
| Albert | 0.024833 | 0.031307 | 0.024833 | -0.0489 |
| Covertype | 0.009414 | 0 | 0.006009 | -0.03886 |
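
The differences can be recomputed from the results table above. The following is only a small sketch: the dictionary layout and the `metric_diffs` helper are my own naming, not part of automlbenchmark, and `None` stands for the AUC value that is not reported for Covertype. Recomputed values may differ slightly in the last digits from the posted table.

```python
# Metrics copied from the results table above.
# None marks the AUC value that is not reported for Covertype.
with_patching = {
    "Airlines": {"acc": 0.667087, "auc": 0.720931, "balacc": 0.654484, "logloss": 0.664864},
    "Albert": {"acc": 0.677265, "auc": 0.738089, "balacc": 0.677265, "logloss": 0.642248},
    "Covertype": {"acc": 0.918092, "auc": None, "balacc": 0.835118, "logloss": 0.214061},
}
without_patching = {
    "Airlines": {"acc": 0.654552, "auc": 0.696719, "balacc": 0.663029, "logloss": 0.686289},
    "Albert": {"acc": 0.652432, "auc": 0.706782, "balacc": 0.652432, "logloss": 0.691148},
    "Covertype": {"acc": 0.908678, "auc": None, "balacc": 0.829109, "logloss": 0.252917},
}


def metric_diffs(patched, stock):
    """Return patched minus stock for every dataset and metric (hypothetical helper)."""
    diffs = {}
    for dataset, metrics in patched.items():
        diffs[dataset] = {}
        for name, value in metrics.items():
            baseline = stock[dataset][name]
            if value is None or baseline is None:
                diffs[dataset][name] = None  # metric not reported
            else:
                diffs[dataset][name] = round(value - baseline, 6)
    return diffs


diffs = metric_diffs(with_patching, without_patching)
for dataset, row in diffs.items():
    print(dataset, row)
```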

Accuracy improved because more models were trained within the same time budget. The full list of algorithms that can be accelerated with Intel® Extension for Scikit-learn can be found here.

| datasetName | Airlines | Albert | Covertype |
|---|---|---|---|
| total number of models w/ patching | 154 | 180 | 118 |
| total number of models w/o patching | 130 | 142 | 110 |
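
To put the model counts in perspective, here is a short sketch that computes the extra models trained with patching, in absolute terms and relative to the unpatched run. The `model_gain` helper is a hypothetical name used only for illustration.

```python
# Model counts copied from the table above.
models_with = {"Airlines": 154, "Albert": 180, "Covertype": 118}
models_without = {"Airlines": 130, "Albert": 142, "Covertype": 110}


def model_gain(patched, stock):
    """Return (extra models, percent increase) per dataset (hypothetical helper)."""
    gains = {}
    for dataset, count in patched.items():
        extra = count - stock[dataset]
        gains[dataset] = (extra, round(100.0 * extra / stock[dataset], 1))
    return gains


gains = model_gain(models_with, models_without)
for dataset, (extra, percent) in gains.items():
    print(f"{dataset}: +{extra} models ({percent}% more)")
```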

How to reproduce our results

To add Intel® Extension for Scikit-learn to the benchmark, you just need to add two lines at the beginning of the autosklearn exec file:

from sklearnex import patch_sklearn
patch_sklearn()

and add scikit-learn-intelex to the requirements.
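
If you want the exec file to keep working on machines where scikit-learn-intelex is not installed, the patch can be made optional. This is only a sketch under that assumption: `patch_if_available` is a hypothetical helper of mine, while `patch_sklearn` is the same call as above. The patch should run before any scikit-learn estimators are imported, so that the accelerated versions are picked up.

```python
import importlib.util


def patch_if_available() -> bool:
    """Patch scikit-learn when sklearnex is installed (hypothetical helper)."""
    if importlib.util.find_spec("sklearnex") is None:
        return False  # fall back to stock scikit-learn
    from sklearnex import patch_sklearn

    patch_sklearn()
    return True


# Call this before importing any sklearn estimators.
patched = patch_if_available()
print("sklearnex patching active:", patched)
```

With this guard, the same exec file can be used for both the patched and the unpatched run, depending only on whether the package is in the requirements.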

I also changed the constraints for a fairer comparison:

test:
  folds: 2
  max_runtime_seconds: 1800
  cores: 72

I also removed the following environment settings from the autosklearn exec file:

os.environ['OPENBLAS_NUM_THREADS'] = '1'
os.environ['MKL_NUM_THREADS'] = '1'
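
Instead of deleting these lines, the thread limit could also be made switchable. A minimal sketch, assuming a flag named `LIMIT_BLAS_THREADS` (my own name, not something in the benchmark). These variables are normally read when the numerical libraries are loaded, so this should run before numpy or scikit-learn are imported.

```python
import os

# Hypothetical switch: True restores the original single-thread setting,
# False lets OpenBLAS and MKL use their default number of threads.
LIMIT_BLAS_THREADS = False

for var in ("OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
    if LIMIT_BLAS_THREADS:
        os.environ[var] = "1"
    else:
        # Remove any inherited limit; do nothing if the variable is unset.
        os.environ.pop(var, None)
```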

All measurements were done on an AWS c5.18xlarge instance (Intel Xeon Platinum with 36 cores).

Some benefits of Intel® Extension for Scikit-learn

  • The library uses all capabilities of the hardware, which gives a significant performance boost for classic machine learning algorithms. Check their patching section and Medium articles for more details.

  • All optimizations can be easily integrated into a scikit-learn application by changing one line of code. Check their get started section for more details.

I also think that Intel® Extension for Scikit-learn can help to solve these problems: #445, #923, #1153

What do you think?
