Accuracy of autosklearn can be improved with Intel® Extension for Scikit-learn

# Accuracy of autosklearn can be improved with Intel® Extension for Scikit-learn

## What is this?

[Intel® Extension for Scikit-learn](https://github.com/intel/scikit-learn-intelex) provides drop-in replacement patching functionality as a seamless way to speed up Scikit-learn applications.

## Our results

I used [automlbenchmark](https://github.com/openml/automlbenchmark) on large datasets to compare the accuracy of autosklearn with and without patching.

| datasetName | library                  | acc      | auc      | balacc   | logloss  |
|-------------|--------------------------|----------|----------|----------|----------|
| Airlines    | autosklearn w patching   | 0.667087 | 0.720931 | 0.654484 | 0.664864 |
| Albert      | autosklearn w patching   | 0.677265 | 0.738089 | 0.677265 | 0.642248 |
| Covertype   | autosklearn w patching   | 0.918092 |          | 0.835118 | 0.214061 |
| Airlines    | autosklearn w/o patching | 0.654552 | 0.696719 | 0.663029 | 0.686289 |
| Albert      | autosklearn w/o patching | 0.652432 | 0.706782 | 0.652432 | 0.691148 |
| Covertype   | autosklearn w/o patching | 0.908678 |          | 0.829109 | 0.252917 |

The table below shows the difference between autosklearn with patching and without patching (with patching minus without patching):

| datasetName | diff accuracy | diff auc | diff balacc | diff logloss |
|-------------|---------------|----------|-------------|--------------|
| Airlines    | 0.012535      | 0.024212 | -0.008545   | -0.021425    |
| Albert      | 0.024833      | 0.031307 | 0.024833    | -0.0489      |
| Covertype   | 0.009414      |          | 0.006009    | -0.03886     |
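
The differences can be recomputed from the first table with a few lines of Python. This is only a sketch: the dictionaries copy the reported values by hand, and `metric_diff` is a hypothetical helper name, not part of automlbenchmark. For acc, auc and balacc a positive difference favors patching; for logloss a negative difference does, since lower is better.

```python
# Metric values copied from the results table above.
# AUC is not reported for Covertype, so it is stored as None.
with_patching = {
    "Airlines":  {"acc": 0.667087, "auc": 0.720931, "balacc": 0.654484, "logloss": 0.664864},
    "Albert":    {"acc": 0.677265, "auc": 0.738089, "balacc": 0.677265, "logloss": 0.642248},
    "Covertype": {"acc": 0.918092, "auc": None,     "balacc": 0.835118, "logloss": 0.214061},
}
without_patching = {
    "Airlines":  {"acc": 0.654552, "auc": 0.696719, "balacc": 0.663029, "logloss": 0.686289},
    "Albert":    {"acc": 0.652432, "auc": 0.706782, "balacc": 0.652432, "logloss": 0.691148},
    "Covertype": {"acc": 0.908678, "auc": None,     "balacc": 0.829109, "logloss": 0.252917},
}


def metric_diff(patched, baseline):
    """Return patched minus baseline per dataset and metric (None if a value is missing)."""
    diffs = {}
    for dataset, metrics in patched.items():
        row = {}
        for name, value in metrics.items():
            base = baseline[dataset][name]
            if value is None or base is None:
                row[name] = None
            else:
                row[name] = round(value - base, 6)
        diffs[dataset] = row
    return diffs


diffs = metric_diff(with_patching, without_patching)
for dataset, row in diffs.items():
    print(dataset, row)
```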

Accuracy improved because more models were trained within the same time limit. The full list of algorithms that can be accelerated with Intel® Extension for Scikit-learn can be found [here](https://intel.github.io/scikit-learn-intelex/algorithms.html).

| datasetName                         | Airlines | Albert | Covertype |
|-------------------------------------|----------|--------|-----------|
| total number of models w patching   | 154      | 180    | 118       |
| total number of models w/o patching | 130      | 142    | 110       |
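
To put the model counts in perspective, the relative increase per dataset can be computed as follows. This is a small sketch that only uses the numbers from the table above; `relative_increase` is a hypothetical helper name.

```python
# Total number of trained models, copied from the table above.
models_with = {"Airlines": 154, "Albert": 180, "Covertype": 118}
models_without = {"Airlines": 130, "Albert": 142, "Covertype": 110}


def relative_increase(new, old):
    """Percentage increase of `new` over `old`."""
    return 100.0 * (new - old) / old


increase = {
    dataset: relative_increase(models_with[dataset], models_without[dataset])
    for dataset in models_with
}
for dataset, pct in increase.items():
    extra = models_with[dataset] - models_without[dataset]
    print(f"{dataset}: +{extra} models ({pct:.1f}%)")
```

With these counts, patching gives roughly 18% more models on Airlines, 27% on Albert and 7% on Covertype.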

## How to reproduce our results

To add Intel® Extension for Scikit-learn to the benchmark, you just need to add two lines at the beginning of the [autosklearn exec](https://github.com/openml/automlbenchmark/blob/master/frameworks/autosklearn/exec.py) file:

```python
from sklearnex import patch_sklearn
patch_sklearn()
```

and add scikit-learn-intelex to the [requirements](https://github.com/openml/automlbenchmark/blob/master/frameworks/autosklearn/requirements.txt).
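
If the patch should stay optional, for example so the same file still runs where `scikit-learn-intelex` is not installed, a guarded variant could look like the sketch below. The helper name `try_patch_sklearn` is hypothetical and not part of either project; only `patch_sklearn` comes from `sklearnex`. This is not what I used for the measurements, just one possible way to wire it in.

```python
def try_patch_sklearn():
    """Patch scikit-learn if sklearnex is available; return True when patched."""
    try:
        from sklearnex import patch_sklearn
    except ImportError:
        # scikit-learn-intelex is not installed: keep stock scikit-learn.
        return False
    patch_sklearn()
    return True


# Call this before importing any scikit-learn estimators,
# so that the patched implementations are the ones picked up.
patched = try_patch_sklearn()
print("Intel Extension for Scikit-learn active:", patched)
```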

I also changed the constraints for a fairer comparison:

```
test:
  folds: 2
  max_runtime_seconds: 1800
  cores: 72
```

I also removed the following environment settings from the [autosklearn exec](https://github.com/openml/automlbenchmark/blob/master/frameworks/autosklearn/exec.py) file, since they limit OpenBLAS and MKL to a single thread:

```python
os.environ['OPENBLAS_NUM_THREADS'] = '1'
os.environ['MKL_NUM_THREADS'] = '1'
```
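
When reproducing the runs, it may be worth checking that these variables are not still set somewhere else (shell profile, container image). A minimal sketch, where `pinned_thread_vars` is a hypothetical helper:

```python
import os

# The two variables that the original exec.py sets to '1'.
THREAD_VARS = ("OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def pinned_thread_vars(environ=os.environ):
    """Return the thread-limit variables that are currently set, with their values."""
    return {name: environ[name] for name in THREAD_VARS if name in environ}


pinned = pinned_thread_vars()
if pinned:
    print("Thread limits still set:", pinned)
else:
    print("No OpenBLAS/MKL thread limits set")
```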

All measurements were done on an AWS c5.18xlarge instance (Intel Xeon Platinum with 36 cores).

## Some benefits of Intel® Extension for Scikit-learn

- The library uses the full capabilities of the hardware, which gives a significant performance boost for classic machine learning algorithms. Check their [patching section](https://github.com/oneapi-src/oneDAL#scikit-learn-patching) and [medium articles](https://github.com/intel/scikit-learn-intelex#-follow-us-on-medium) for more details.

![](https://raw.githubusercontent.com/intel/scikit-learn-intelex/master/doc/sources/_static/scikit-learn-acceleration-2021.2.3.PNG)

- All optimizations can be easily integrated into a scikit-learn application by changing one line of code. Check their [get started section](https://github.com/intel/scikit-learn-intelex#%EF%B8%8F-get-started) for more details.

I also think that Intel® Extension for Scikit-learn can help to solve these issues: https://github.com/automl/auto-sklearn/issues/445, https://github.com/automl/auto-sklearn/issues/923, https://github.com/automl/auto-sklearn/issues/1153

What do you think?



Accuracy of autosklearn can be improved with Intel® Extension for Scikit-learn #1171
