…Usually such behavior is not conducive to good results, but this time I think that the change of perspective has been positive!
Chebyshev Theorem
In many real scenarios (under certain conditions) the Chebyshev Theorem provides a powerful algorithm to detect outliers.
The method is really easy to implement: it flags the values whose Z-score exceeds k in absolute value, i.e. the values that lie more than k standard deviations from the mean.
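To make the rule concrete, here is a minimal sketch of a Z-score threshold check in plain Python. The function name `chebyshev_outliers`, the sample data, the choice of k and the use of the population standard deviation are my own assumptions for illustration. Chebyshev's inequality only guarantees that at most 1/k² of the values can lie more than k standard deviations from the mean; which k counts as "outlier" is up to you.

```python
import statistics


def chebyshev_outliers(values, k=3.0):
    """Return (index, value, z) for every point whose |Z-score| exceeds k.

    Hypothetical helper: k and the population standard deviation
    are illustrative choices, not prescribed by the theorem.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values are identical, nothing can be flagged
    flagged = []
    for i, x in enumerate(values):
        z = (x - mean) / std
        if abs(z) > k:
            flagged.append((i, x, z))
    return flagged


# Made-up sample: nine values close to 10 and one suspicious value.
data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 25.0]
for index, value, z in chebyshev_outliers(data, k=2.0):
    print(f"index={index} value={value} z={z:.2f}")
```

With k=2 only the last value (25.0) is flagged. Note that on such a small sample no Z-score can exceed 3 in absolute value, so a threshold of k=3 would flag nothing here: the fixed threshold has to be chosen with the sample size in mind.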
…Surfing the internet you can find plenty of theoretical explanations of this pillar of descriptive statistics, so I don’t want to increase the entropy of the universe by explaining once again something that is already available, and better explained, everywhere 🙂
Approach based on Mutual Information
Before I explain my approach, I have to say that I have not had time to check the literature to see whether this method has already been published (please drop a comment if you find a reference! … I don’t want to take credit improperly).
The aim of the method is to iteratively remove the sorted Z-scores until the mutual information between the Z-scores and the candidate outlier, I(Z|outlier), increases.
At each step the candidate outlier is the Z-score having the highest absolute value.
Basically, compared with the Chebyshev method, there is no pre-fixed threshold.
Experiments
I compared the two methods on canonical distributions, and at a glance the results seem quite good.
Test on Normal Distribution
As you can see in the above experiment, the mutual information criterion seems to perform better at outlier detection.
Test on Normal Distribution having higher variance
The following experiments have been done with the Gamma and Negative Exponential distributions.
Results on Gamma seem comparable.
Experiment done using Negative Exponential distribution
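If you want to play with the same families of distributions, here is a minimal sketch that draws samples with Python's standard `random` module and measures how many points fall beyond k standard deviations. The sample size, the seed and all the distribution parameters are assumptions of mine (they are not the settings used for the plots above), and the sketch covers only the fixed-threshold rule, not the mutual information criterion.

```python
import random
import statistics


def tail_fraction(values, k):
    """Fraction of points lying more than k standard deviations from the mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    return sum(abs(x - mean) > k * std for x in values) / len(values)


rng = random.Random(0)  # fixed seed, only to make the run repeatable
n = 5000                # assumed sample size

# Assumed parameters for each family of distributions.
samples = {
    "normal": [rng.gauss(0.0, 1.0) for _ in range(n)],
    "normal, higher variance": [rng.gauss(0.0, 5.0) for _ in range(n)],
    "gamma": [rng.gammavariate(2.0, 2.0) for _ in range(n)],
    "negative exponential": [rng.expovariate(1.0) for _ in range(n)],
}

k = 3.0
for name, values in samples.items():
    fraction = tail_fraction(values, k)
    print(f"{name}: {fraction:.4f} beyond {k} sd (Chebyshev bound {1 / k**2:.4f})")
```

Whatever the distribution, the measured fraction can never exceed the Chebyshev bound 1/k². On skewed distributions such as the Gamma and the Negative Exponential, the flagged points all sit in the long right tail, which is exactly where a fixed threshold becomes a delicate choice.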
…In the coming days I’m going to test the procedure on data with a multimodal distribution.
Stay Tuned
Cristian