Experiment result analysis on the GutenTAG datasets
On this website, we present detailed results of the experiments on our synthetically generated datasets (with GutenTAG). We show errors, qualitative results, and the runtime of the different algorithms.
Result Overview
In this analysis, we consider only the results of all 60 relevant algorithms, each with its best parameter configuration, on the GutenTAG datasets. These datasets were generated synthetically and were also used to find the best parameter configurations for the algorithms:
- Experiments: 10428
- Algorithms: 60
- Datasets: 187
The number of experiments is smaller than \(\text{# Algos} \times \text{# Datasets}\) because univariate algorithms cannot process multivariate datasets and those combinations are excluded.
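The experiment count can be verified from the status tables below, which list 33 univariate algorithms (each run on the 163 univariate datasets) and 27 multivariate algorithms (run on all 187 datasets):

```python
# Sanity check for the experiment count. The algorithm counts (33
# univariate, 27 multivariate) and dataset counts are taken from the
# status tables in the error analysis below.
univariate_algos, multivariate_algos = 33, 27
univariate_datasets, all_datasets = 163, 187

experiments = (univariate_algos * univariate_datasets
               + multivariate_algos * all_datasets)
print(experiments)  # 5379 + 5049 = 10428
```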
The next table shows an excerpt of the result table with 9 of its 26 columns. The complete table, with the quality and runtime results of all algorithms on all datasets, can be downloaded here.
 | algorithm | dataset | status | ROC_AUC | AVERAGE_PRECISION | PR_AUC | RANGE_PR_AUC | execute_main_time | hyper_params
---|---|---|---|---|---|---|---|---|---
0 | ARIMA | cbf-combined-diff-1 | Status.OK | 0.815319 | 0.454742 | 0.465248 | 0.453215 | 71.414111 | {"differencing_degree": 1, "distance_metric": ... |
1 | ARIMA | cbf-combined-diff-3 | Status.OK | 0.955978 | 0.241877 | 0.127965 | 0.136431 | 129.666755 | {"differencing_degree": 1, "distance_metric": ... |
2 | ARIMA | cbf-diff-count-1 | Status.OK | 0.439091 | 0.014368 | 0.008516 | 0.016521 | 72.992341 | {"differencing_degree": 1, "distance_metric": ... |
3 | ARIMA | cbf-diff-count-3 | Status.OK | 0.868527 | 0.129214 | 0.090548 | 0.053913 | 75.303179 | {"differencing_degree": 1, "distance_metric": ... |
4 | ARIMA | cbf-diff-count-4 | Status.OK | 0.626002 | 0.082363 | 0.054644 | 0.034841 | 183.925331 | {"differencing_degree": 1, "distance_metric": ... |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
10423 | k-Means | sinus-type-pattern | Status.OK | 0.999999 | 0.999901 | 0.999900 | 0.577762 | 67.510581 | {"anomaly_window_size": 100, "n_clusters": 50,... |
10424 | k-Means | sinus-type-pattern-shift | Status.OK | 0.999738 | 0.957231 | 0.956725 | 0.544578 | 53.177865 | {"anomaly_window_size": 100, "n_clusters": 50,... |
10425 | k-Means | sinus-type-platform | Status.OK | 0.998038 | 0.738244 | 0.735666 | 0.555714 | 59.745593 | {"anomaly_window_size": 100, "n_clusters": 50,... |
10426 | k-Means | sinus-type-trend | Status.OK | 0.999994 | 0.999410 | 0.999407 | 0.560816 | 48.915376 | {"anomaly_window_size": 100, "n_clusters": 50,... |
10427 | k-Means | sinus-type-variance | Status.OK | 0.999990 | 0.999019 | 0.999014 | 0.579041 | 82.035531 | {"anomaly_window_size": 100, "n_clusters": 50,... |
10428 rows × 9 columns
Error analysis
We first look at the algorithms' ability to process the different datasets. Some algorithms are restricted by our time and memory constraints; others produce errors when specific invariants are violated or implementation deficits surface.
Algorithm problems grouped by algorithm training type
Unsupervised:
status | Status.ERROR | Status.OK | Status.TIMEOUT | ALL | |
---|---|---|---|---|---|
algo_input_dimensionality | algorithm | ||||
UNIVARIATE | SAND | 26 | 137 | 0 | 163 |
VALMOD | 6 | 157 | 0 | 163 | |
Series2Graph | 3 | 160 | 0 | 163 | |
Left STAMPi | 1 | 162 | 0 | 163 | |
ARIMA | 0 | 163 | 0 | 163 | |
DSPOT | 0 | 160 | 3 | 163 | |
DWT-MLEAD | 0 | 163 | 0 | 163 | |
FFT | 0 | 163 | 0 | 163 | |
GrammarViz | 0 | 163 | 0 | 163 | |
HOT SAX | 0 | 114 | 49 | 163 | |
MedianMethod | 0 | 163 | 0 | 163 | |
NormA | 0 | 153 | 10 | 163 | |
NumentaHTM | 0 | 163 | 0 | 163 | |
PCI | 0 | 163 | 0 | 163 | |
PST | 0 | 163 | 0 | 163 | |
PhaseSpace-SVM | 0 | 163 | 0 | 163 | |
S-H-ESD (Twitter) | 0 | 163 | 0 | 163 | |
SSA | 0 | 163 | 0 | 163 | |
STAMP | 0 | 163 | 0 | 163 | |
STOMP | 0 | 163 | 0 | 163 | |
Spectral Residual (SR) | 0 | 163 | 0 | 163 | |
Subsequence IF | 0 | 163 | 0 | 163 | |
Subsequence LOF | 0 | 163 | 0 | 163 | |
TSBitmap | 0 | 163 | 0 | 163 | |
Triple ES (Holt-Winter's) | 0 | 163 | 0 | 163 | |
MULTIVARIATE | DBStream | 155 | 32 | 0 | 187 |
CBLOF | 0 | 187 | 0 | 187 | |
COF | 0 | 187 | 0 | 187 | |
COPOD | 0 | 187 | 0 | 187 | |
Extended Isolation Forest (EIF) | 0 | 187 | 0 | 187 | |
HBOS | 0 | 187 | 0 | 187 | |
IF-LOF | 0 | 187 | 0 | 187 | |
Isolation Forest (iForest) | 0 | 187 | 0 | 187 | |
KNN | 0 | 187 | 0 | 187 | |
LOF | 0 | 187 | 0 | 187 | |
PCC | 0 | 187 | 0 | 187 | |
Torsk | 0 | 180 | 7 | 187 | |
k-Means | 0 | 187 | 0 | 187 |
Semi-supervised:
status | Status.ERROR | Status.OK | Status.TIMEOUT | ALL | |
---|---|---|---|---|---|
algo_input_dimensionality | algorithm | ||||
UNIVARIATE | TARZAN | 32 | 131 | 0 | 163 |
Bagel | 0 | 163 | 0 | 163 | |
Donut | 0 | 163 | 0 | 163 | |
ImageEmbeddingCAE | 0 | 163 | 0 | 163 | |
OceanWNN | 0 | 163 | 0 | 163 | |
Random Forest Regressor (RR) | 0 | 163 | 0 | 163 | |
SR-CNN | 0 | 163 | 0 | 163 | |
XGBoosting (RR) | 0 | 163 | 0 | 163 | |
MULTIVARIATE | LSTM-AD | 98 | 81 | 8 | 187 |
EncDec-AD | 39 | 17 | 131 | 187 | |
LaserDBN | 23 | 164 | 0 | 187 | |
DeepAnT | 10 | 177 | 0 | 187 | |
OmniAnomaly | 4 | 183 | 0 | 187 | |
HealthESN | 0 | 150 | 37 | 187 | |
Hybrid KNN | 0 | 187 | 0 | 187 | |
Random Black Forest (RR) | 0 | 174 | 13 | 187 | |
RobustPCA | 0 | 187 | 0 | 187 | |
TAnoGan | 0 | 73 | 114 | 187 | |
Telemanom | 0 | 187 | 0 | 187 |
Supervised:
status | Status.ERROR | Status.OK | Status.TIMEOUT | ALL | |
---|---|---|---|---|---|
algo_input_dimensionality | algorithm | ||||
MULTIVARIATE | MultiHMM | 95 | 92 | 0 | 187 |
Normalizing Flows | 9 | 66 | 112 | 187 | |
Hybrid Isolation Forest (HIF) | 0 | 187 | 0 | 187 |
As we can see in the above tables, most algorithms can process almost all of the datasets. In the next subsections, we highlight some outlying algorithms.
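Tables like the ones above can be reproduced from the downloadable result table with a pandas cross tabulation; a minimal sketch on toy data, assuming the column names shown earlier:

```python
import pandas as pd

# Toy stand-in for the full result table (the real table has 10428 rows;
# column names as in the excerpt above).
df = pd.DataFrame({
    "algorithm": ["SAND", "SAND", "ARIMA", "DBStream"],
    "algo_input_dimensionality": ["UNIVARIATE", "UNIVARIATE",
                                  "UNIVARIATE", "MULTIVARIATE"],
    "status": ["Status.ERROR", "Status.OK", "Status.OK", "Status.ERROR"],
})

# Count executions per status for each algorithm, grouped by input
# dimensionality, with an ALL margin as in the tables above.
table = pd.crosstab(
    [df["algo_input_dimensionality"], df["algorithm"]],
    df["status"],
    margins=True, margins_name="ALL",
)
print(table)
```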
Very slow algorithms
Algorithms for which more than 50% of all executions ran into the timeout:
status | Status.ERROR | Status.OK | Status.TIMEOUT | ALL | ||
---|---|---|---|---|---|---|
algo_training_type | algo_input_dimensionality | algorithm | ||||
SEMI_SUPERVISED | MULTIVARIATE | EncDec-AD | 39 | 17 | 131 | 187 |
TAnoGan | 0 | 73 | 114 | 187 | ||
SUPERVISED | MULTIVARIATE | Normalizing Flows | 9 | 66 | 112 | 187 |
All time series in the GutenTAG collection have the same length (of \( 10000\) points). The algorithms EncDec-AD, TAnoGan, and Normalizing Flows are large deep learning models that take a long time to train and execute. This forces them either into the 2h training time limit or the 2h test time limit.
Almost all unsupervised algorithms are fast enough to finish within our time limit on all datasets; only HOT SAX (49 timeouts), NormA (10), Torsk (7), and DSPOT (3) exceeded it on some datasets.
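The selection criterion for the table above (more than 50% timeouts) can be computed directly from the status counts; a sketch using values from the tables:

```python
import pandas as pd

# Status counts per algorithm; the values for EncDec-AD, TAnoGan, and
# Normalizing Flows are taken from the tables above, ARIMA serves as a
# counter-example without timeouts.
counts = pd.DataFrame(
    {"Status.TIMEOUT": [131, 114, 112, 0],
     "ALL": [187, 187, 187, 163]},
    index=["EncDec-AD", "TAnoGan", "Normalizing Flows", "ARIMA"],
)

# Keep algorithms whose executions timed out in more than 50% of cases.
slow = counts[counts["Status.TIMEOUT"] / counts["ALL"] > 0.5]
print(slow.index.tolist())  # ['EncDec-AD', 'TAnoGan', 'Normalizing Flows']
```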
Broken algorithms
Algorithms that failed for at least 50% of their executions:
status | Status.ERROR | Status.OK | Status.TIMEOUT | ALL | ||
---|---|---|---|---|---|---|
algo_training_type | algo_input_dimensionality | algorithm | ||||
SEMI_SUPERVISED | MULTIVARIATE | LSTM-AD | 98 | 81 | 8 | 187 |
SUPERVISED | MULTIVARIATE | MultiHMM | 95 | 92 | 0 | 187 |
UNSUPERVISED | MULTIVARIATE | DBStream | 155 | 32 | 0 | 187 |
Errors occur independently of the algorithm's learning type. Prominent algorithms in this category are LSTM-AD, MultiHMM, and DBStream, which failed for more than 50% of their executions. To better understand why algorithms fail, we distinguish between different error categories in the next section.
Categorization of errors
We categorize all observed errors into specific categories and then count the number of executions with errors belonging to each category. The next table shows how often each error category was observed for each algorithm.
algorithm | ALL (sum) | ARIMA | Bagel | CBLOF | COF | COPOD | DBStream | DSPOT | DWT-MLEAD | DeepAnT | Donut | EncDec-AD | Extended Isolation Forest (EIF) | FFT | GrammarViz | HBOS | HOT SAX | HealthESN | Hybrid Isolation Forest (HIF) | Hybrid KNN | IF-LOF | ImageEmbeddingCAE | Isolation Forest (iForest) | KNN | LOF | LSTM-AD | LaserDBN | Left STAMPi | MedianMethod | MultiHMM | NormA | Normalizing Flows | NumentaHTM | OceanWNN | OmniAnomaly | PCC | PCI | PST | PhaseSpace-SVM | Random Black Forest (RR) | Random Forest Regressor (RR) | RobustPCA | S-H-ESD (Twitter) | SAND | SR-CNN | SSA | STAMP | STOMP | Series2Graph | Spectral Residual (SR) | Subsequence IF | Subsequence LOF | TARZAN | TAnoGan | TSBitmap | Telemanom | Torsk | Triple ES (Holt-Winter's) | VALMOD | XGBoosting (RR) | k-Means |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
error_category | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
- OK - | 9443 | 163 | 163 | 187 | 187 | 187 | 32 | 160 | 163 | 177 | 163 | 17 | 187 | 163 | 163 | 187 | 114 | 150 | 187 | 187 | 187 | 163 | 187 | 187 | 187 | 81 | 164 | 162 | 163 | 92 | 153 | 66 | 163 | 163 | 183 | 187 | 163 | 163 | 163 | 174 | 163 | 187 | 163 | 137 | 163 | 163 | 163 | 163 | 160 | 163 | 163 | 163 | 131 | 73 | 163 | 187 | 180 | 163 | 157 | 163 | 187 |
- OOM - | 146 | 39 | 98 | 9 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||
- TIMEOUT - | 484 | 3 | 131 | 49 | 37 | 8 | 10 | 112 | 13 | 114 | 7 | ||||||||||||||||||||||||||||||||||||||||||||||||||
Bug | 177 | 98 | 10 | 23 | 25 | 3 | 12 | 6 | |||||||||||||||||||||||||||||||||||||||||||||||||||||
Incompatible parameters | 55 | 55 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Invariance/assumption not met | 1 | 1 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Max recursion depth exceeded | 20 | 20 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Model loading error | 4 | 4 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Not converged | 95 | 95 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Wrong shape error | 1 | 1 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
other | 2 | 2 |
We can, for example, see that the high error rate of LSTM-AD is mostly due to hitting the memory limit of 3 GB. The errors of MultiHMM, however, are due to its model not reaching a converged state during training. We suspect that some assumptions of the MultiHMM approach are not met by the failing datasets.
In general, our GutenTAG datasets are well defined and easy to process: 91% of all experiments were successful, and another 6% of all experiments failed only because of timeouts or OOMs.
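The percentages follow from the totals in the error-category table above:

```python
# Success and failure shares, computed from the error-category table:
# 9443 OK runs, 484 timeouts, and 146 OOMs out of 10428 experiments.
total = 10428
ok, timeout, oom = 9443, 484, 146

pct_ok = round(100 * ok / total)                # -> 91
pct_resource = round(100 * (timeout + oom) / total)  # -> 6
print(pct_ok, pct_resource)
```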
Algorithm quality assessment based on ROC_AUC
The next table shows the min, mean, median, and max ROC_AUC metric score computed over all datasets for each algorithm:
algorithm | LSTM-AD | Subsequence LOF | PhaseSpace-SVM | DWT-MLEAD | SAND | Donut | GrammarViz | Torsk | Left STAMPi | EncDec-AD | STOMP | STAMP | k-Means | Normalizing Flows | Telemanom | Series2Graph | Random Forest Regressor (RR) | VALMOD | XGBoosting (RR) | HealthESN | ImageEmbeddingCAE | Random Black Forest (RR) | ARIMA | PST | NormA | SSA | Subsequence IF | OceanWNN | HOT SAX | DeepAnT | DBStream | PCI | Triple ES (Holt-Winter's) | NumentaHTM | LaserDBN | MedianMethod | FFT | OmniAnomaly | TSBitmap | KNN | Extended Isolation Forest (EIF) | CBLOF | Isolation Forest (iForest) | HBOS | Hybrid Isolation Forest (HIF) | IF-LOF | LOF | Spectral Residual (SR) | S-H-ESD (Twitter) | COF | DSPOT | COPOD | PCC | Bagel | RobustPCA | SR-CNN | TAnoGan | MultiHMM | TARZAN | Hybrid KNN |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
min | 0.123730 | 0.341819 | 0.307866 | 0.125859 | 0.167172 | 0.151962 | 0.207808 | 0.077172 | 0.126879 | 0.344264 | 0.009910 | 0.009910 | 0.000000 | 0.004679 | 0.092208 | 0.069038 | 0.405782 | 0.055046 | 0.373735 | 0.107862 | 0.106465 | 0.130278 | 0.050505 | 0.016049 | 0.013301 | 0.114735 | 0.000020 | 0.156114 | 0.147374 | 0.000095 | 0.102123 | 0.022453 | 0.239234 | 0.377848 | 0.119141 | 0.004040 | 0.014141 | 0.077511 | 0.132703 | 0.000000 | 0.000000 | 0.039293 | 0.000051 | 0.144394 | 0.054343 | 0.000101 | 0.164697 | 0.002450 | 0.473684 | 0.000000 | 0.273283 | 0.000051 | 0.055375 | 0.058306 | 0.000000 | 0.500000 | 0.000960 | 0.047605 | 0.000571 | 0.000003 |
mean | 0.965738 | 0.941804 | 0.920328 | 0.907602 | 0.898257 | 0.894965 | 0.894852 | 0.885825 | 0.880459 | 0.877664 | 0.874267 | 0.874142 | 0.872913 | 0.869716 | 0.863892 | 0.861379 | 0.860457 | 0.858050 | 0.856619 | 0.853132 | 0.851142 | 0.818027 | 0.816814 | 0.803791 | 0.786847 | 0.771233 | 0.765155 | 0.734238 | 0.731207 | 0.726896 | 0.719925 | 0.696556 | 0.673339 | 0.670671 | 0.655523 | 0.648901 | 0.644080 | 0.644023 | 0.637278 | 0.614195 | 0.609879 | 0.606390 | 0.603377 | 0.599450 | 0.599160 | 0.587309 | 0.577457 | 0.568847 | 0.559200 | 0.555700 | 0.554605 | 0.543097 | 0.532033 | 0.525684 | 0.514437 | 0.502331 | 0.481889 | 0.478073 | 0.474698 | 0.449687 |
median | 0.996443 | 0.995904 | 0.980000 | 0.972041 | 0.984132 | 0.973340 | 0.991579 | 0.979313 | 0.981922 | 0.999900 | 0.988399 | 0.988399 | 0.997220 | 0.994933 | 0.977484 | 0.942775 | 0.883773 | 0.971650 | 0.886839 | 0.915416 | 0.944112 | 0.843654 | 0.895639 | 0.871631 | 0.954595 | 0.845423 | 0.841325 | 0.752219 | 0.760240 | 0.853177 | 0.783729 | 0.662587 | 0.668647 | 0.645183 | 0.659910 | 0.567188 | 0.593000 | 0.658707 | 0.624381 | 0.623641 | 0.594593 | 0.558942 | 0.589781 | 0.585596 | 0.584366 | 0.559124 | 0.534933 | 0.544846 | 0.500000 | 0.521308 | 0.501351 | 0.526410 | 0.508369 | 0.550798 | 0.500000 | 0.500000 | 0.481301 | 0.488418 | 0.486515 | 0.444118 |
max | 1.000000 | 1.000000 | 0.999928 | 0.999992 | 1.000000 | 1.000000 | 1.000000 | 0.999990 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 0.998586 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 0.999800 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 0.999600 | 0.999650 | 1.000000 | 1.000000 | 0.998544 | 0.998600 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 0.934293 | 1.000000 | 0.880000 | 0.999596 | 1.000000 | 0.999784 | 1.000000 |
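Aggregates like the ones above can be computed with a single groupby over the result table; a sketch on toy data, assuming the column names from the excerpt at the top:

```python
import pandas as pd

# Toy stand-in for the result table; the real table aggregates the
# ROC_AUC scores of each algorithm over all of its datasets.
df = pd.DataFrame({
    "algorithm": ["A", "A", "A", "B", "B"],
    "ROC_AUC": [0.2, 0.9, 1.0, 0.5, 0.7],
})

# min, mean, median, and max ROC_AUC per algorithm.
stats = df.groupby("algorithm")["ROC_AUC"].agg(["min", "mean", "median", "max"])
print(stats)
```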
The following boxplots give a more visual picture of the score distributions. The algorithms are ordered by their mean ROC_AUC score (the mean itself is not included in the visualization), and the first and last 10 algorithms are shown by default. Use the legend on the right to display additional algorithms.
Best algorithms (based on mean ROC_AUC)
min | mean | median | max | |
---|---|---|---|---|
algorithm | ||||
LSTM-AD | 0.123730 | 0.965738 | 0.996443 | 1.000000 |
Subsequence LOF | 0.341819 | 0.941804 | 0.995904 | 1.000000 |
PhaseSpace-SVM | 0.307866 | 0.920328 | 0.980000 | 0.999928 |
DWT-MLEAD | 0.125859 | 0.907602 | 0.972041 | 0.999992 |
SAND | 0.167172 | 0.898257 | 0.984132 | 1.000000 |
Worst algorithms (based on mean ROC_AUC)
min | mean | median | max | |
---|---|---|---|---|
algorithm | ||||
SR-CNN | 0.500000 | 0.502331 | 0.500000 | 0.880000 |
TAnoGan | 0.000960 | 0.481889 | 0.481301 | 0.999596 |
MultiHMM | 0.047605 | 0.478073 | 0.488418 | 1.000000 |
TARZAN | 0.000571 | 0.474698 | 0.486515 | 0.999784 |
Hybrid KNN | 0.000003 | 0.449687 | 0.444118 | 1.000000 |
Scores of best algorithms
In the next figure, we show the anomaly scores of the 4 best algorithms on the dataset “sinus-diff-count-2”:
Runtime-weighted ROC_AUC scores
In the next figure, we combine the runtime and result quality of the algorithms into one metric by weighting the ROC_AUC score with the inverse of the scaled overall runtime. Algorithms that take exceptionally long to process the datasets are penalized and receive a smaller weighted ROC_AUC score, while very fast algorithms keep their original ROC_AUC score.
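One plausible formulation of this weighting, sketched below on made-up numbers (the exact scaling used for the figure may differ): min-max-scale the mean overall runtime, then multiply each ROC_AUC score by one minus that scale, so the fastest algorithm keeps its score and the slowest is penalized most.

```python
import pandas as pd

# Made-up mean scores and runtimes for three hypothetical algorithms.
df = pd.DataFrame({
    "mean_roc_auc": [0.97, 0.87, 0.45],
    "mean_runtime": [10.0, 500.0, 7000.0],
}, index=["fast", "medium", "slow"])

# Min-max-scale the runtime to [0, 1] and use its inverse as a weight.
scaled = (df["mean_runtime"] - df["mean_runtime"].min()) / \
         (df["mean_runtime"].max() - df["mean_runtime"].min())
df["weighted_roc_auc"] = df["mean_roc_auc"] * (1.0 - scaled)
print(df["weighted_roc_auc"])
```

With this formulation, the fastest algorithm keeps its original score and the slowest is weighted down to zero.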
Algorithm runtime assessment
This section examines the runtime of the algorithms. In our paper, we distinguish between training and execution runtime; the following figures consider only the combined (overall) runtime.
The next table shows the min, mean, median, and max overall runtime aggregated over all GutenTAG datasets for each algorithm.
Keep in mind that all GutenTAG datasets have the same length of \(10000\) points and most contain only a single channel: only 24 of the 187 datasets are multivariate.
algorithm | DBStream | MedianMethod | TSBitmap | Spectral Residual (SR) | FFT | PCI | Extended Isolation Forest (EIF) | DWT-MLEAD | PCC | KNN | LOF | COPOD | NormA | IF-LOF | TARZAN | HBOS | LaserDBN | Isolation Forest (iForest) | STOMP | Subsequence IF | S-H-ESD (Twitter) | CBLOF | Subsequence LOF | SSA | GrammarViz | Series2Graph | PST | MultiHMM | RobustPCA | XGBoosting (RR) | STAMP | COF | Left STAMPi | VALMOD | PhaseSpace-SVM | SAND | NumentaHTM | k-Means | OceanWNN | Donut | Hybrid Isolation Forest (HIF) | EncDec-AD | ImageEmbeddingCAE | Normalizing Flows | HOT SAX | Torsk | ARIMA | DSPOT | Random Black Forest (RR) | SR-CNN | Telemanom | Hybrid KNN | Random Forest Regressor (RR) | Triple ES (Holt-Winter's) | Bagel | DeepAnT | HealthESN | LSTM-AD | TAnoGan | OmniAnomaly |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
min | 0.000000 | 2.167620 | 2.092893 | 2.978843 | 2.768183 | 4.014369 | 5.236932 | 5.203950 | 5.205858 | 5.194909 | 5.189140 | 6.203962 | 0.000000 | 6.299420 | 0.000000 | 7.425389 | 0.000000 | 8.048582 | 10.053667 | 9.187731 | 10.025252 | 8.141591 | 6.378969 | 9.810762 | 2.366554 | 0.000000 | 7.068686 | 0.000000 | 11.206035 | 29.423896 | 2.546820 | 27.155001 | 0.000000 | 0.000000 | 24.383494 | 0.000000 | 72.370925 | 5.902524 | 147.090236 | 238.261296 | 298.579914 | 0.000000 | 19.853276 | 0.000000 | 0.000000 | 0.000000 | 71.414111 | 0.000000 | 0.000000 | 734.100854 | 214.989800 | 332.475525 | 1081.153213 | 1662.362775 | 1730.130522 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
mean | 1.485884 | 3.646354 | 4.204380 | 4.448068 | 4.683874 | 6.203468 | 6.599313 | 6.958002 | 7.298360 | 7.352563 | 7.476626 | 8.305900 | 8.640119 | 8.946259 | 9.336935 | 9.983120 | 10.513870 | 11.019563 | 11.900743 | 13.441904 | 13.829732 | 14.455307 | 18.856173 | 18.941656 | 22.910533 | 23.010118 | 24.472725 | 33.024754 | 35.213852 | 36.827480 | 38.341096 | 39.020412 | 53.121713 | 53.299238 | 82.067981 | 85.901954 | 91.581986 | 98.254350 | 250.583056 | 339.671690 | 485.455303 | 504.099002 | 654.139068 | 680.576232 | 927.950220 | 1244.610911 | 1253.151379 | 1295.773654 | 1477.619365 | 1478.505805 | 1836.957278 | 1929.474488 | 2118.757325 | 2487.212241 | 2771.290942 | 3128.522926 | 3180.770797 | 3269.571687 | 3839.777410 | 7113.762881 |
median | 0.000000 | 2.796953 | 4.882090 | 3.889712 | 3.599423 | 5.278512 | 6.065740 | 6.412318 | 6.820794 | 6.541939 | 6.669530 | 7.604242 | 6.737077 | 8.490258 | 10.090874 | 9.324363 | 10.716289 | 10.572865 | 11.528972 | 13.585095 | 13.678194 | 9.720897 | 21.301228 | 19.145623 | 21.936855 | 21.989442 | 29.936323 | 0.000000 | 14.051364 | 35.782439 | 27.600829 | 39.622869 | 53.934283 | 59.732887 | 61.484677 | 47.611511 | 90.902945 | 74.116743 | 219.784961 | 355.775571 | 483.189570 | 0.000000 | 574.251242 | 0.000000 | 597.805815 | 1103.504400 | 698.592820 | 125.862472 | 1424.658453 | 1580.434546 | 1572.787677 | 1480.946807 | 1934.163334 | 2361.207052 | 2378.648069 | 2938.454040 | 3259.633157 | 0.000000 | 0.000000 | 7276.879391 |
max | 13.955889 | 7.528286 | 7.265769 | 7.908754 | 21.075458 | 9.260761 | 10.825573 | 11.201316 | 12.648174 | 32.120757 | 12.498324 | 14.220524 | 34.183225 | 18.329226 | 19.855369 | 15.828537 | 19.874952 | 14.398077 | 16.301006 | 22.905463 | 22.759653 | 105.374497 | 74.296122 | 41.621639 | 79.390667 | 53.055090 | 40.106281 | 392.575913 | 519.506858 | 43.991729 | 377.011789 | 65.967019 | 60.658616 | 101.013242 | 310.056020 | 595.575047 | 135.339259 | 605.901906 | 618.353864 | 459.234046 | 725.922349 | 7732.446555 | 1803.073701 | 7228.146324 | 4702.380252 | 6335.579315 | 6603.765796 | 6931.930536 | 7014.062132 | 3366.463148 | 7286.619047 | 7235.540268 | 3809.431023 | 5115.088311 | 9596.165659 | 7397.980260 | 7226.635755 | 8320.575837 | 13536.386278 | 7304.766755 |
The following boxplots give a more visual picture of the runtime distributions. The algorithms are ordered by their mean overall runtime and the first and last 10 algorithms are shown by default. Use the legend on the right to display additional algorithms.
In the next figure, we show the algorithm mean runtime in relation to the achieved mean ROC_AUC score. We distinguish between the different learning types because the runtime of an algorithm depends on its learning procedure.
Attention
The following figure does not account for OOM or TIMEOUT errors. This is especially visible for Normalizing Flows (supervised), which ran into the time limit for most of the datasets but shows a relatively small runtime in the figure below! For algorithms with many errors (cf. Section Error analysis), the aggregated runtimes and metric scores are not meaningful.
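A simple way to mitigate this bias when recomputing the aggregates yourself is to restrict them to successful runs before grouping; a sketch on toy data, assuming the column names from the result table:

```python
import pandas as pd

# Toy stand-in: one algorithm with a timed-out run and one without.
# Errored or timed-out executions contribute neither a meaningful
# runtime nor a meaningful score, so we drop them before aggregating.
df = pd.DataFrame({
    "algorithm": ["NF", "NF", "ARIMA"],
    "status": ["Status.TIMEOUT", "Status.OK", "Status.OK"],
    "execute_main_time": [7200.0, 50.0, 70.0],
})

ok_only = df[df["status"] == "Status.OK"]
mean_ok = ok_only.groupby("algorithm")["execute_main_time"].mean()
print(mean_ok)
```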
Detailed analysis of certain algorithm or dataset aspects
Best algorithms for base oscillations
Sine
ECG
Random Walk
CBF
Poly
Best algorithms for anomaly type
Extremum
Frequency
Mean Shift
Pattern
Pattern Shift
Platform
Variance
Amplitude
Trend
Most fluctuating algorithms based on anomaly type
Best algorithms for single/multiple-same/multiple-different anomalies
Single anomaly datasets
Multiple same anomalies datasets
Multiple different anomaly datasets
Best algorithm of algorithm family
algorithm | ROC_AUC | |
---|---|---|
algo_family | ||
trees | PST | 0.803791 |
reconstruction | Donut | 0.894965 |
forecasting | LSTM-AD | 0.965738 |
encoding | GrammarViz | 0.894852 |
distribution | DWT-MLEAD | 0.907602 |
distance | Subsequence LOF | 0.941804 |
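The per-family winners above can be computed with a groupby and idxmax over the mean ROC_AUC scores; a sketch on a subset of the values (the family assignments of KNN and ARIMA are illustrative, not taken from the table):

```python
import pandas as pd

# Mean ROC_AUC per algorithm; scores taken from the quality table above.
# The "algo_family" labels for KNN and ARIMA are assumptions for the
# sake of the example.
means = pd.DataFrame({
    "algo_family": ["distance", "distance", "forecasting", "forecasting"],
    "algorithm": ["Subsequence LOF", "KNN", "LSTM-AD", "ARIMA"],
    "ROC_AUC": [0.941804, 0.614195, 0.965738, 0.816814],
})

# Pick the row with the highest mean ROC_AUC within each family.
best = means.loc[means.groupby("algo_family")["ROC_AUC"].idxmax()]
print(best)
```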