# Case studies
On this page, we present some preliminary results in the form of three short case studies:
## No anomalies
In this case study, we want to see how the algorithms behave on datasets that do not contain any known anomalies. For this, we included the following datasets (all without known/labelled anomalies) as examples in our experiments:
| | collection | dataset | input dimensionality | learning type |
|---|---|---|---|---|
| 232 | KDD-TSAD | 079_UCR_Anomaly_DISTORTEDresperation2 | univariate | semi-supervised |
| 261 | KDD-TSAD | 108_UCR_Anomaly_NOISEresperation2 | univariate | semi-supervised |
| 340 | KDD-TSAD | 187_UCR_Anomaly_resperation2 | univariate | semi-supervised |
| 498 | NAB | art_daily_no_noise | univariate | unsupervised |
| 500 | NAB | art_daily_perfect_square_wave | univariate | unsupervised |
| 501 | NAB | art_daily_small_noise | univariate | unsupervised |
| 502 | NAB | art_flatline | univariate | unsupervised |
| 505 | NAB | art_noisy | univariate | unsupervised |
| 513 | NAB | ec2_cpu_utilization_c6585a | univariate | unsupervised |
Note that this list does not contain any supervised datasets without known anomalies. We therefore cannot inspect supervised algorithms and exclude them from further analysis. In addition, all of the above datasets are univariate. Since multivariate algorithms are also executed on univariate datasets, this does not further limit the number of considered algorithms.
The following table lists, for each of the remaining 57 algorithms, the number of datasets that were processed successfully (OK), failed with an error (ERROR), or ran into the time limit (TIMEOUT):
| training type | algorithm | ERROR | OK | TIMEOUT |
|---|---|---|---|---|
| SEMI_SUPERVISED | Bagel | 0 | 0 | 3 |
| | DeepAnT | 3 | 0 | 0 |
| | EncDec-AD | 3 | 0 | 0 |
| | HealthESN | 0 | 0 | 3 |
| | Hybrid KNN | 3 | 0 | 0 |
| | ImageEmbeddingCAE | 3 | 0 | 0 |
| | LSTM-AD | 1 | 0 | 2 |
| | SR-CNN | 0 | 0 | 3 |
| | TARZAN | 3 | 0 | 0 |
| | TAnoGan | 0 | 0 | 3 |
| | OmniAnomaly | 2 | 1 | 0 |
| | Random Forest Regressor (RR) | 0 | 1 | 2 |
| | Donut | 0 | 2 | 1 |
| | LaserDBN | 0 | 3 | 0 |
| | OceanWNN | 0 | 3 | 0 |
| | Random Black Forest (RR) | 0 | 3 | 0 |
| | RobustPCA | 0 | 3 | 0 |
| | Telemanom | 0 | 3 | 0 |
| | XGBoosting (RR) | 0 | 3 | 0 |
| UNSUPERVISED | HOT SAX | 9 | 0 | 0 |
| | Left STAMPi | 9 | 0 | 0 |
| | NormA | 9 | 0 | 0 |
| | SAND | 9 | 0 | 0 |
| | k-Means | 9 | 0 | 0 |
| | DBStream | 8 | 1 | 0 |
| | VALMOD | 6 | 2 | 1 |
| | S-H-ESD (Twitter) | 6 | 3 | 0 |
| | Triple ES (Holt-Winter's) | 3 | 5 | 1 |
| | COF | 3 | 6 | 0 |
| | PST | 3 | 6 | 0 |
| | PhaseSpace-SVM | 0 | 6 | 3 |
| | Series2Graph | 3 | 6 | 0 |
| | ARIMA | 0 | 7 | 2 |
| | CBLOF | 2 | 7 | 0 |
| | STAMP | 0 | 7 | 2 |
| | Subsequence LOF | 0 | 7 | 2 |
| | IF-LOF | 1 | 8 | 0 |
| | NumentaHTM | 1 | 8 | 0 |
| | Torsk | 0 | 8 | 1 |
| | COPOD | 0 | 9 | 0 |
| | DSPOT | 0 | 9 | 0 |
| | DWT-MLEAD | 0 | 9 | 0 |
| | Extended Isolation Forest (EIF) | 0 | 9 | 0 |
| | FFT | 0 | 9 | 0 |
| | GrammarViz | 0 | 9 | 0 |
| | HBOS | 0 | 9 | 0 |
| | Isolation Forest (iForest) | 0 | 9 | 0 |
| | KNN | 0 | 9 | 0 |
| | LOF | 0 | 9 | 0 |
| | MedianMethod | 0 | 9 | 0 |
| | PCC | 0 | 9 | 0 |
| | PCI | 0 | 9 | 0 |
| | SSA | 0 | 9 | 0 |
| | STOMP | 0 | 9 | 0 |
| | Spectral Residual (SR) | 0 | 9 | 0 |
| | Subsequence IF | 0 | 9 | 0 |
| | TSBitmap | 0 | 9 | 0 |
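A summary like the one above can be produced from the raw experiment results with a simple cross-tabulation. The sketch below is illustrative only: the column names (`algo_training_type`, `algorithm`, `status`) and the inline data are assumptions about the shape of the results table, not our actual output format.

```python
import pandas as pd

# Hypothetical excerpt of an experiment results table; in practice this
# would be loaded from the evaluation output (e.g. a results CSV file).
results = pd.DataFrame({
    "algo_training_type": ["SEMI_SUPERVISED"] * 3 + ["UNSUPERVISED"] * 3,
    "algorithm": ["Donut", "Donut", "Donut", "LOF", "LOF", "LOF"],
    "status": ["Status.OK", "Status.OK", "Status.TIMEOUT",
               "Status.OK", "Status.OK", "Status.OK"],
})

# Count, per training type and algorithm, how many datasets were
# processed with each status.
status_counts = pd.crosstab(
    [results["algo_training_type"], results["algorithm"]],
    results["status"],
)
print(status_counts)
```

`pd.crosstab` yields one row per (training type, algorithm) pair and one column per status value, which is essentially the layout of the table above.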
As the table shows, the algorithms Bagel, DeepAnT, EncDec-AD, HealthESN, Hybrid KNN, ImageEmbeddingCAE, LSTM-AD, SR-CNN, TARZAN, TAnoGan, HOT SAX, Left STAMPi, NormA, SAND, and k-Means fail to process any of these datasets (either with an error or a timeout). Our preliminary finding is that these algorithms make assumptions about the dataset that are violated when no anomalies are present. In contrast, the algorithms LaserDBN, OceanWNN, Random Black Forest (RR), RobustPCA, Telemanom, XGBoosting (RR), COPOD, DSPOT, DWT-MLEAD, Extended Isolation Forest (EIF), FFT, GrammarViz, HBOS, Isolation Forest (iForest), KNN, LOF, MedianMethod, PCC, PCI, SSA, STOMP, Spectral Residual (SR), Subsequence IF, and TSBitmap process all datasets successfully. The remaining algorithms process some of the datasets but fail on others; the reasons for those failures are subject to further analysis.
Our evaluation metrics are not defined for time series whose labels are all zeros (i.e., not a single anomaly is annotated), so we cannot report those metrics here. Instead, we show an example dataset and the scorings of selected algorithms. You can use the legend on the right side to enable or disable the display of individual algorithm scorings.
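To illustrate why the metrics are undefined here: ROC AUC, for example, ranks anomalous points against normal ones, so it cannot be computed when only one class occurs in the labels. The following sketch uses scikit-learn's `roc_auc_score` as a stand-in for our metric implementation; the labels and scores are made up.

```python
from sklearn.metrics import roc_auc_score

# Ground-truth labels of a dataset without any annotated anomaly:
# every point is labelled 0, so the positive class never occurs.
y_true = [0, 0, 0, 0, 0]
y_score = [0.1, 0.3, 0.2, 0.4, 0.1]  # an arbitrary anomaly scoring

try:
    auc = roc_auc_score(y_true, y_score)
except ValueError as err:
    # scikit-learn refuses to compute ROC AUC when y_true contains
    # only a single class, because the score is not defined then.
    auc = None
    print(err)
```

The same argument applies to the other threshold-based and AUC-style metrics: without any positive labels, precision and recall against the anomaly class are meaningless.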