stat_density_2d error messages uninformative #6374

Open
oracle5th opened this issue Mar 21, 2025 · 1 comment · May be fixed by #6375
Labels
messages requests for improvements to error, warning, or feedback messages

Comments

@oracle5th

I found that an error thrown by stat_density_2d is not very informative. Behind the scenes it computes an invalid default bandwidth for my data, which causes an internal error that the message does not explain. Specifying the bandwidth explicitly fixes the problem. However, I would expect stat_density_2d either to handle this edge case or to point out that the default bandwidth is invalid for the given data and must be supplied manually.

The following example works fine:

library(ggplot2)
df <- data.frame(x=sample(0:10, 100, replace=TRUE), y=rep(0:10, length.out=100))
ggplot(df) + stat_density_2d(geom='density_2d', mapping=aes(x,y))

but the next one will throw an error:

df <- data.frame(x=sample(0:10, 100, replace=T), y=c(rep(5, 80), sample(0:10, 20, replace=T)))
ggplot(df) + stat_density_2d(geom='density_2d', mapping=aes(x,y))

Error in stat_density_2d():
! Problem while computing stat.
ℹ Error occurred in the 1st layer.
Caused by error in seq_len():
! argument must be coercible to non-negative integer

The error message is quite confusing. Digging into the warnings, I found the root cause of the problem:

1: Computation failed in stat_density2d()
Caused by error in MASS::kde2d():
! bandwidths must be strictly positive

In stat_density_2d, if h is not given, it is computed automatically before calling kde2d:

if (is.null(h)) {
  h <- c(MASS::bandwidth.nrd(data$x), MASS::bandwidth.nrd(data$y))
  h <- h * adjust
}

# calculate density
dens <- MASS::kde2d(
  data$x, data$y, h = h, n = n,
  lims = c(scales$x$dimension(), scales$y$dimension())
)
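A guard between the bandwidth computation and the kde2d call could produce the kind of message requested here. The following is only a sketch: `default_bandwidth_2d` is a hypothetical helper, not a ggplot2 function, and the wording of the message is an assumption.

```r
# Hypothetical helper (not part of ggplot2): compute the default 2D bandwidth
# the same way as the snippet above, but fail early with an actionable message.
default_bandwidth_2d <- function(x, y, adjust = c(1, 1)) {
  h <- c(MASS::bandwidth.nrd(x), MASS::bandwidth.nrd(y)) * adjust

  # MASS::kde2d() requires strictly positive bandwidths
  bad <- !is.finite(h) | h <= 0
  if (any(bad)) {
    stop(
      "The default bandwidth for ",
      paste(c("x", "y")[bad], collapse = " and "),
      " is not strictly positive. Please supply `h` manually.",
      call. = FALSE
    )
  }
  h
}
```

With the failing data from above, this would stop with a message naming y instead of surfacing the seq_len() error.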

and bandwidth.nrd uses the following formula by default:

function(x)
{
    r <- quantile(x, c(0.25, 0.75))
    h <- (r[2] - r[1])/1.34
    4 * 1.06 * min(sqrt(var(x)), h) * length(x)^(-1/5)
}

So if the first and third quartiles of data$x or data$y coincide, which is the case when more than 75% of the values are identical, the default bandwidth becomes 0 without warning, and kde2d immediately rejects it as invalid.
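The zero bandwidth can be checked directly on the y column from the failing example. The sketch below also shows one possible manual bandwidth; `bandwidth_sd` is a hypothetical helper that keeps only the standard-deviation term of the formula, which is my assumption about a reasonable fallback and not something ggplot2 or MASS does.

```r
library(MASS)

set.seed(42)  # arbitrary seed so the sketch is reproducible
y <- c(rep(5, 80), sample(0:10, 20, replace = TRUE))

quantile(y, c(0.25, 0.75))  # both quartiles are 5, so the IQR term is 0
bandwidth.nrd(y)            # min(sd, 0) is 0, so the bandwidth is 0

# Hypothetical fallback: drop the IQR term and use the standard deviation only
bandwidth_sd <- function(x) 4 * 1.06 * sqrt(var(x)) * length(x)^(-1/5)
h_y <- bandwidth_sd(y)
```

The result can then be passed through the h argument, for example `stat_density_2d(aes(x, y), h = c(MASS::bandwidth.nrd(df$x), h_y))`.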

@teunbrand
Collaborator

Thanks for the report! I can reproduce the issue on my end. I agree that the message should be more informative and perhaps suggest that the user provide the h argument manually.

@teunbrand teunbrand added the messages requests for improvements to error, warning, or feedback messages label Mar 21, 2025
@teunbrand teunbrand linked a pull request Mar 21, 2025 that will close this issue