blog/outliers-detection-in-r/ #63
-
Comment written by Felix Kluxen on August 17, 2020 09:27:12: Dear Antoine, thank you for this helpful post. Just my two cents: I think it sometimes makes sense to formally distinguish two classes of outliers: extreme values and mistakes. Cheers. Reference: Hawkins, D. M., 1980. Identification of outliers. Chapman and Hall, London; New York.
-
Comment written by Antoine Soetewey on August 17, 2020 10:32:36: Dear Felix, Thanks for your comment, the article has been updated accordingly (see first and fourth paragraph of the introduction). Feel free to let me know if there is any inconsistency. Regards,
-
Comment written by Felix Kluxen on August 17, 2020 11:30:30: Excellent! The elephant in the room with statistically identified outliers (here, values that are probably not mistakes) is obviously that you cannot solve the issue of what researchers should do with the information, as you write. This really depends on the research question (e.g. subsets, responder/non-responder, etc.) and usually involves a surprising amount of reflection on the researcher's side, or the willingness to think the model assumptions through. If a statistical test result relies on a single influential value, this should caution the researcher against making overambitious claims. Cheers,
-
Comment written by Antoine Soetewey on August 17, 2020 12:15:18: You're totally right, outliers require thoughtful reflection and caution for many statistical analyses!
-
Dear Antoine
-
Glad you find it useful!
-
Hi Antoine
-
Comment written by vijayarajamanickam on December 03, 2020 12:26:17: Dear Antoine, I tried to detect outliers using this script
Most of them are working well, but in some cases it is showing integer(0). Many thanks
-
Comment written by Antoine Soetewey on December 03, 2020 18:00:30: Dear, When you have the result: it simply means that there is no outlier according to this method. If you run Hope this helps. Regards,
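A minimal sketch of how an empty result like integer(0) can arise. The script from the question is not shown in the thread, so this assumes it combines boxplot.stats() with which(); the data and variable names are hypothetical.

```r
# Hypothetical data: no value falls outside the boxplot whiskers
x <- c(10, 11, 12, 13, 14, 15)

out <- boxplot.stats(x)$out   # values flagged by the 1.5 * IQR rule
out_ind <- which(x %in% out)  # positions of those values in x

out_ind          # an empty integer vector, printed as integer(0)
length(out_ind)  # 0, i.e. nothing was flagged by this method

# Guard follow-up steps: x[-integer(0)] would return an empty vector,
# so only drop rows when something was actually flagged
x_clean <- if (length(out_ind) > 0) x[-out_ind] else x
```

Checking length(out_ind) before subsetting avoids accidentally discarding every observation when no outlier is found.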
-
If you do not want to simply remove outliers, you can indeed use "Winsorization", a technique that replaces extreme data values with less extreme values. See for instance the Winsorize() function in R, or this article. Hope this helps. Regards,
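As a rough illustration of the idea, here is a base R sketch that caps values at chosen percentiles. The helper name winsorize_sketch, the 5th/95th percentile limits and the data are assumptions made for this example; this is not the implementation of any package's Winsorize() function.

```r
# Hypothetical helper: cap values at chosen percentiles (base R only)
winsorize_sketch <- function(x, probs = c(0.05, 0.95)) {
  limits <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  # Values below the lower limit are raised to it,
  # values above the upper limit are lowered to it
  pmin(pmax(x, limits[1]), limits[2])
}

# Hypothetical data with one extreme value
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)
x_wins <- winsorize_sketch(x)
range(x_wins)
```

The number of observations is unchanged; only the most extreme values are pulled towards the bulk of the data.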
-
Antoine regards
-
Dear Antoine
-
Dear Antoine, Thank you very much. This is very helpful indeed. Regards,
-
Hi,
-
Thanks for this excellent post, Antoine. I just wanted to remind you of the
-
Very comprehensive and super helpful! Many thanks!
-
Hi there, I have a dataset that is seasonal, i.e. waves. I've fitted regression models with harmonic terms (sine and cosine). However, before model fitting, do you know if any of these methods would work with seasonal data? I thought the Hampel filter could work? Thanks in advance, SL
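One possible way to explore this, offered only as a sketch: apply the Hampel rule (median plus or minus 3 MAD) inside a moving window instead of over the whole series, so that each point is compared with its local neighbours. The helper name rolling_hampel, the window size and the simulated series are all assumptions made for this example, not something taken from the post.

```r
# Hypothetical helper: Hampel rule applied on a centred moving window
rolling_hampel <- function(x, half_window = 5, n_mad = 3) {
  n <- length(x)
  flagged <- logical(n)
  for (i in seq_len(n)) {
    # Window is truncated at the start and end of the series
    idx <- max(1, i - half_window):min(n, i + half_window)
    med <- median(x[idx], na.rm = TRUE)
    # mad() applies the 1.4826 consistency constant by default
    spread <- mad(x[idx], na.rm = TRUE)
    flagged[i] <- !is.na(x[i]) && spread > 0 &&
      abs(x[i] - med) > n_mad * spread
  }
  which(flagged)
}

# Hypothetical seasonal series: a sine wave with one injected spike
t <- 1:120
y <- 10 * sin(2 * pi * t / 24)
y[60] <- y[60] + 60
rolling_hampel(y)
```

The window should be short relative to the seasonal period; on steep parts of the wave the local MAD is large, so only fairly pronounced spikes are flagged.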
-
Very informative.
-
Hello! And the density plot has two peaks. How to explain the application of the Grubbs test in this case?
-
Hi. My name is Marlenildo and I just found your blog. I'm excited to read all the posts, to learn more about statistics and to blog too. Regarding detecting outliers, consider using the
-
@AntoineSoetewey, Another simple way I found to remove outliers using dplyr and the "lares" package on a variable with outliers: var_no_outliers <- if_else(outlier_turkey(var_with_outliers, 1.5), NA, var_with_outliers) FYI, Have a good week :-)
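For readers without those packages, the same idea can be sketched in base R. The helper is_tukey_outlier is hypothetical and uses R's default quantile definition; whether the "lares" function computes its quartiles the same way is an assumption.

```r
# Hypothetical helper: TRUE where a value lies outside the Tukey fences
is_tukey_outlier <- function(x, k = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  fence <- k * (q[2] - q[1])  # k times the interquartile range
  x < q[1] - fence | x > q[2] + fence
}

# Hypothetical data with one extreme value
var_with_outliers <- c(12, 14, 15, 15, 16, 17, 18, 95)

# Replace flagged values with NA, keep everything else unchanged
var_no_outliers <- ifelse(is_tukey_outlier(var_with_outliers),
                          NA, var_with_outliers)
var_no_outliers
```

Setting flagged values to NA rather than dropping them keeps the vector the same length as the original, which matters when it is a column of a data frame.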
-
You're welcome, @AntoineSoetewey -- I was wondering that myself :-) I'll ask the question on his GitHub. P.S. Any good websites to point me to (or perhaps something you might add to your chi-square tutorial) on using "permutation" tests for chi-square tests of independence when expected cell counts are low and you don't want to use Fisher's exact test? I ask because I've recently been using the "compareGroups" package, which (as a novice) I've found very helpful for the analysis I'm currently doing. However, the package developer for some reason decided not to have the package use Fisher's exact test and to use a "permutation" test instead, which requires the user to enter two additional parameters: Chisq.test.B = integer number of permutations [default = 2000] and Chisq.test.seed = integer [with no default]. Not much on the web about how to do that, and it wasn't mentioned in your blog post on chi-square tests of independence. Have a good rest of your week.
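For what it is worth, base R's chisq.test() can compute a simulated p-value by Monte Carlo sampling of tables with the observed margins. The sketch below assumes, without confirmation from the package, that the number of replicates and the seed mentioned above play the same roles as B and set.seed() here; the contingency table is hypothetical.

```r
# Hypothetical 2 x 3 contingency table with small counts
tab <- matrix(c(3, 1, 2,
                1, 4, 6),
              nrow = 2, byrow = TRUE,
              dimnames = list(group = c("A", "B"),
                              outcome = c("low", "mid", "high")))

set.seed(123)  # fix the random draws so the p-value is reproducible
res <- chisq.test(tab, simulate.p.value = TRUE, B = 2000)

res$method   # states that the p-value is simulated
res$p.value  # Monte Carlo p-value based on B simulated tables
```

Because the p-value comes from random draws, it changes slightly between runs unless a seed is set, which may be why a seed is requested alongside the number of replicates.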
-
@AntoineSoetewey,
-
Outliers detection in R - Stats and R
Learn how to detect outliers in R thanks to descriptive statistics and via the Hampel filter, the Grubbs, the Dixon and the Rosner tests for outliers
https://statsandr.com/blog/outliers-detection-in-r/