Description
Accuracy of autosklearn can be improved with Intel® Extension for Scikit-learn
What is this?
Intel® Extension for Scikit-learn provides drop-in replacement patching, a seamless way to speed up scikit-learn applications.
Our results
I used automlbenchmarks on large datasets to compare the accuracy of autosklearn with and without patching.
datasetName | library | acc | auc | balacc | logloss |
---|---|---|---|---|---|
Airlines | autosklearn w patching | 0.667087 | 0.720931 | 0.654484 | 0.664864 |
Albert | autosklearn w patching | 0.677265 | 0.738089 | 0.677265 | 0.642248 |
Covertype | autosklearn w patching | 0.918092 | | 0.835118 | 0.214061 |
Airlines | autosklearn w/o patching | 0.654552 | 0.696719 | 0.663029 | 0.686289 |
Albert | autosklearn w/o patching | 0.652432 | 0.706782 | 0.652432 | 0.691148 |
Covertype | autosklearn w/o patching | 0.908678 | | 0.829109 | 0.252917 |
The table below shows the difference between autosklearn with and without patching (with patching minus without; for logloss, lower is better):
datasetName | diff accuracy | diff auc | diff balacc | diff logloss |
---|---|---|---|---|
Airlines | 0.012535 | 0.024212 | -0.008545 | -0.02132 |
Albert | 0.024833 | 0.031307 | 0.024833 | -0.0489 |
Covertype | 0.009414 | 0 | 0.006009 | -0.03886 |
Accuracy improved because more models were trained within the same runtime limit. The full list of algorithms that can be accelerated with Intel® Extension for Scikit-learn can be found here.
datasetName | Airlines | Albert | Covertype |
---|---|---|---|
total number of models w patching | 154 | 180 | 118 |
total number of models w/o patching | 130 | 142 | 110 |
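To put the model counts above in relative terms, here is a minimal sketch that computes the percentage increase per dataset. The counts are copied from the table; the dictionary and function names are illustrative only.

```python
# Model counts copied from the table above: (with patching, without patching).
MODEL_COUNTS = {
    "Airlines": (154, 130),
    "Albert": (180, 142),
    "Covertype": (118, 110),
}


def relative_increase(with_patching: int, without_patching: int) -> float:
    """Return the percentage increase in trained models due to patching."""
    return 100.0 * (with_patching - without_patching) / without_patching


# Percentage increase per dataset, rounded to one decimal place.
increases = {
    name: round(relative_increase(w, wo), 1)
    for name, (w, wo) in MODEL_COUNTS.items()
}

for name, pct in increases.items():
    print(f"{name}: +{pct}% more models trained")
```

This works out to roughly 18.5% more models for Airlines, 26.8% for Albert, and 7.3% for Covertype.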
How to reproduce our results
To add Intel® Extension for Scikit-learn to the benchmark, you just need to add two lines at the beginning of the autosklearn exec file:
from sklearnex import patch_sklearn
patch_sklearn()
and add scikit-learn-intelex to the requirements.
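If the dependency should stay optional, the two lines could be wrapped as in the sketch below, which falls back to stock scikit-learn when scikit-learn-intelex is not installed. The helper name `try_patch_sklearn` is hypothetical and not part of either library; only `patch_sklearn` comes from sklearnex. This is an illustration, not how the benchmark was actually modified.

```python
def try_patch_sklearn() -> bool:
    """Apply Intel Extension for Scikit-learn patching if it is available.

    Hypothetical helper: returns True when patching was applied, False when
    scikit-learn-intelex is not installed and stock scikit-learn is used.
    """
    try:
        from sklearnex import patch_sklearn
    except ImportError:
        return False
    patch_sklearn()
    return True


# Call this before importing any scikit-learn estimators, so that the
# patched implementations are the ones picked up by later imports.
patched = try_patch_sklearn()
print("Intel patching enabled:", patched)
```

Keeping the call at the very top of the exec file matters, because estimators imported before patching keep their original implementations.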
I also changed the constraints for a fairer comparison:
test:
  folds: 2
  max_runtime_seconds: 1800
  cores: 72
I also removed the following environment settings from the autosklearn exec file:
os.environ['OPENBLAS_NUM_THREADS'] = '1'
os.environ['MKL_NUM_THREADS'] = '1'
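The same change could also be made at runtime instead of by editing the file. The minimal sketch below drops the two thread caps from an environment mapping so the numerical libraries can pick their own thread counts. The function name is hypothetical, and the sketch assumes it runs before the numerical libraries are loaded, since they typically read these variables at initialization.

```python
import os

# The two variables that the autosklearn exec file sets to '1'.
THREAD_CAP_VARS = ("OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def clear_thread_caps(env=os.environ):
    """Remove the thread-cap variables from env and return the removed ones."""
    removed = {}
    for name in THREAD_CAP_VARS:
        if name in env:
            removed[name] = env.pop(name)
    return removed


# Demonstration on a plain dict so the real environment is left untouched.
demo_env = {"OPENBLAS_NUM_THREADS": "1", "MKL_NUM_THREADS": "1", "HOME": "/tmp"}
removed = clear_thread_caps(demo_env)
print("removed:", removed)
print("remaining:", demo_env)
```

Calling `clear_thread_caps()` with no argument would act on the real process environment.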
All measurements were done on an AWS c5.18xlarge instance (Intel Xeon Platinum with 36 physical cores, 72 vCPUs).
Some benefits of Intel® Extension for Scikit-learn
- The library uses the full capabilities of the hardware, which gives a significant performance boost for classic machine learning algorithms. Check their patching section and Medium articles for more details.
- All optimizations can be easily integrated into a scikit-learn application by changing one line of code. Check their get started section for more details.
I also think that Intel® Extension for Scikit-learn can help to solve these problems: #445, #923, #1153
What do you think?