get_predicted(): zero-inflation options #413

Closed
bwiernik opened this issue Aug 3, 2021 · 10 comments
Labels
Enhancement 💥 Implemented features can be improved or revised get_predicted Function specific issues

Comments

@bwiernik
Contributor

bwiernik commented Aug 3, 2021

Currently, for zero-inflated models, get_predicted() and downstream functions such as modelbased::estimate_expectation() always return the equivalent of type = "conditional" (predicted values assuming the observation is not a structural zero). It would be good to let users choose other options, such as unconditional predictions on the response scale (incorporating both parts of the model) or predictions from the zero-inflation part alone.

I also think it would be good to make the default equivalent to type = "response" (incorporating both model parts).

See the type argument in predict.glmmTMB():

type
Denoting mu as the mean of the conditional distribution and p as the zero-inflation probability, the possible choices are:

"link"
conditional mean on the scale of the link function, or equivalently the linear predictor of the conditional model
"response"
expected value; this is mu*(1-p) for zero-inflated models and mu otherwise
"conditional"
mean of the conditional response; mu for all models (i.e., synonymous with "response" in the absence of zero-inflation)
"zprob"
the probability of a structural zero (gives an error for non-zero-inflated models)
"zlink"
predicted zero-inflation probability on the scale of the logit link function
"disp"
dispersion parameter however it is defined for that particular family as described in sigma.glmmTMB
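A quick base-R simulation illustrates the difference between "conditional" and "response". This is a sketch of my own, not glmmTMB code: with a structural-zero probability p, the overall mean is mu*(1-p), while the mean among observations that are not structural zeros stays at mu. The values of mu, p, and the nbinom2 size parameter are arbitrary assumptions chosen for illustration.

```r
# Simulation sketch (base R only). mu, p and size are arbitrary illustration
# values, not estimates from any fitted model.
set.seed(2021)
n    <- 200000
mu   <- 3     # mean of the conditional (count) distribution: "conditional"
p    <- 0.25  # probability of a structural zero: "zprob"
size <- 1.5   # nbinom2 dispersion (size) parameter

# Draw structural zeros first, then counts from the conditional distribution
structural_zero <- rbinom(n, size = 1, prob = p) == 1
counts <- rnbinom(n, size = size, mu = mu)
y <- ifelse(structural_zero, 0, counts)

mean(y)                    # close to mu * (1 - p), i.e. type = "response"
mu * (1 - p)               # 2.25
mean(y[!structural_zero])  # close to mu, i.e. type = "conditional"
qlogis(p)                  # p on the logit scale, i.e. type = "zlink"
```

The zero-inflated mean is pulled below mu by the structural zeros, which is why a default that ignores p overstates the expected count.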

@DominiqueMakowski
Member

Do you have a reproducible example with a model that has these components, so I can play around with it?

@bwiernik
Contributor Author

bwiernik commented Aug 4, 2021

Away from my computer. See the second example in ?glmmTMB.

@DominiqueMakowski
Member

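library(glmmTMB)

# Zero-inflated negative binomial model (second example in ?glmmTMB):
# counts depend on spp and mined, with a random intercept per site;
# the zero-inflation part uses the same fixed effects
m <- glmmTMB(count ~ spp + mined + (1 | site),
             zi = ~ spp + mined,
             family = nbinom2, data = Salamanders)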


head(insight::get_predicted(m))
#> [1] 0.5387752 1.0768783 0.3554236 2.4701755 2.4950498 2.1819828
head(insight::get_predicted(m, type = "zprob"))
#> [1] 2.040119 2.040119 2.040119 1.174339 1.174339 1.174339

Created on 2021-08-04 by the reprex package (v2.0.0)

Custom types should work after the latest commit.

So now the question becomes whether we want to change or extend the behaviour of our main predict argument (this is also what would drive easystats/modelbased#136). We could have predict = "dispersion" or something like that, but then again, I'm not very familiar with these models, so I'm not sure.

@strengejacke strengejacke added the Enhancement 💥 Implemented features can be improved or revised label Aug 4, 2021
@bwiernik
Contributor Author

bwiernik commented Aug 4, 2021

Yes, I think the predict argument should have options for the zero-inflation and dispersion parameters.

@bwiernik
Contributor Author

bwiernik commented Aug 4, 2021

I think we should revert the type argument. Our predict argument fills the same role, and it's confusing to have two.

Instead, I think we add options to predict to include these:

Existing

  • "link"
    • linear predictor on the link scale (the conditional part for zero-inflated models)
    • with confidence intervals (uncertainty intervals on linear prediction)
    • same as glmmTMB's type = "link"

Existing labels that need adjusted behavior

  • "expectation"
    • expected value (mean) on the response scale, including both the zero-inflated and conditional parts
    • with confidence intervals (uncertainty intervals on the conditional mean)
    • currently, it ignores the zero-inflated part
    • should be mu*(1-p) for zero-inflated models and mu otherwise
    • This would be equivalent to glmmTMB's type = "response"
  • "prediction"
    • expected value (mean) on the response scale, including both the zero-inflated and conditional parts
    • with prediction intervals (uncertainty intervals on the individual cases)
  • "response"
    • "prediction", but classifying probabilities in binomial models as 0 or 1
    • uncertainty intervals would generally become [0, 1] for Bernoulli models, but could be a range of integers for binomial models with multiple trials

New labels

  • "conditional"
    • expected value (mean) on the response scale, conditional on non-structural zero
    • with confidence intervals (uncertainty intervals on the conditional mean)
    • what is currently returned by predict = "expectation"
  • "zprob"
    • expected probability of a structural zero
    • with confidence intervals (uncertainty intervals on the expected probability)
    • effectively the "expectation" for the zero-inflation part of the model
    • should give an error or message for non-zero-inflated models
  • "zlink"
    • linear predictor for a structural zero on the link scale
    • with confidence intervals (uncertainty intervals on the linear prediction)
    • effectively the "link" prediction for the zero-inflation part of the model
    • should give an error or message for non-zero-inflated models
  • "dispersion"
    • expected value for the dispersion parameter for the model (e.g., conditional SD for a normal linear model)
    • with confidence intervals (uncertainty intervals on the conditional dispersion)
    • For gaussian models with type = "disp", glmmTMB returns estimates/CI on the sigma (SD) scale; I don't see a need to offer other scales (variance or log-variance [this is what is actually modeled])
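One way to keep the mapping from the proposed predict values to glmmTMB's type argument in one place is a small lookup table. This is a sketch only: predict_to_glmmtmb_type() is a hypothetical helper, not part of insight, and its zero_inflated flag stands in for however the model info would actually be checked.

```r
# Hypothetical helper (not part of insight): map the proposed `predict` values
# to the `type` argument of predict.glmmTMB(), following the list above.
predict_to_glmmtmb_type <- function(predict, zero_inflated = TRUE) {
  lookup <- c(
    link        = "link",
    expectation = "response",
    prediction  = "response",
    response    = "response",
    conditional = "conditional",
    zprob       = "zprob",
    zlink       = "zlink",
    dispersion  = "disp"
  )
  predict <- match.arg(predict, names(lookup))
  # Zero-inflation options only make sense for zero-inflated models
  if (predict %in% c("zprob", "zlink") && !zero_inflated) {
    stop("predict = \"", predict, "\" requires a zero-inflated model.",
         call. = FALSE)
  }
  lookup[[predict]]
}

predict_to_glmmtmb_type("expectation")  # "response", i.e. mu * (1 - p)
```

Keeping the table as a named vector means adding a new option is a one-line change, and the error for non-zero-inflated models lives next to the options it guards.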

For predict = "prediction", we should include the dispersion parameter in the prediction intervals. @DominiqueMakowski If you could add a placeholder for that, I can fill in the necessary extractions from glmmTMB objects. How do we currently handle dispersion for things like Poisson models where there is a variance term in the model?
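For the Gaussian case, where the dispersion parameter is the residual SD (sigma), base R already shows how dispersion enters a prediction interval: the half-width uses sqrt(se_fit^2 + sigma^2) instead of se_fit alone. This sketch uses lm() on mtcars purely as an illustration; it is not how insight or glmmTMB compute their intervals.

```r
# Gaussian linear model: the dispersion parameter is the residual SD, sigma
fit <- lm(mpg ~ wt, data = mtcars)
new <- data.frame(wt = c(2.5, 3.5))

pred      <- predict(fit, newdata = new, se.fit = TRUE)
sigma_hat <- sigma(fit)
t_crit    <- qt(0.975, df = df.residual(fit))

# Confidence interval: uncertainty in the mean only
ci_half <- t_crit * pred$se.fit
# Prediction interval: adds the dispersion (sigma^2) to the variance
pi_half <- t_crit * sqrt(pred$se.fit^2 + sigma_hat^2)

manual_pi  <- cbind(fit = pred$fit,
                    lwr = pred$fit - pi_half,
                    upr = pred$fit + pi_half)
builtin_pi <- predict(fit, newdata = new, interval = "prediction")
all.equal(unname(manual_pi), unname(builtin_pi))
```

For other families the same idea would need the family-specific variance function rather than a single sigma, which is where the glmmTMB dispersion extraction would come in.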

@DominiqueMakowski
Member

I believe the steps needed are:

  • extending the list of possible arguments here:

predict = c("expectation", "link", "prediction", "response", "relation"),

  • editing the logic here for the type:

insight/R/get_predicted.R

Lines 645 to 652 in 1540c06

# Type (that's for the initial call to stats::predict)
if (!is.null(type) && all(type == "auto")) {
  if (info$is_linear) {
    type <- "response"
  } else {
    type <- "link"
  }
}

  • and here for the CI

insight/R/get_predicted.R

Lines 633 to 643 in 1540c06

# Prediction and CI type
if (predict == "link") {
  ci_type <- "confidence"
  scale <- "link"
} else if (predict == "expectation") {
  ci_type <- "confidence"
  scale <- "response"
} else if (predict %in% c("prediction", "response")) {
  ci_type <- "prediction"
  scale <- "response"
}

(Essentially, since we have one "master" argument, predict, it is passed to the .get_predicted_args() helper, which then assigns the traditional arguments.)
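A sketch of how the quoted "Prediction and CI type" block might look with the proposed options added. predict_ci_scale() is a hypothetical standalone function (in insight this logic lives inside .get_predicted_args()), and the scale chosen for "dispersion" (untransformed, like glmmTMB's sigma-scale output) is my assumption.

```r
# Hypothetical extension of the "Prediction and CI type" logic (not the
# actual insight code), with the proposed new `predict` values added.
predict_ci_scale <- function(predict = c("expectation", "link", "prediction",
                                         "response", "relation", "conditional",
                                         "zprob", "zlink", "dispersion")) {
  predict <- match.arg(predict)
  if (predict %in% c("link", "zlink")) {
    # Linear predictors of the conditional or zero-inflation part
    ci_type <- "confidence"
    scale <- "link"
  } else if (predict %in% c("expectation", "conditional", "zprob", "dispersion")) {
    # Means / probabilities / dispersion on the untransformed scale
    ci_type <- "confidence"
    scale <- "response"
  } else if (predict %in% c("prediction", "response")) {
    # Intervals for individual cases
    ci_type <- "prediction"
    scale <- "response"
  } else {
    # "relation" is not covered by the quoted snippet; left as NA here
    ci_type <- NA_character_
    scale <- NA_character_
  }
  list(ci_type = ci_type, scale = scale)
}

predict_ci_scale("zlink")
```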

@strengejacke
Member

@DominiqueMakowski can insight be submitted, or is there anything that needs to be addressed for modelbased?

@DominiqueMakowski
Member

I wouldn't say that this issue of adding more options for glmmTMB is urgent, so it's probably not a blocker for a CRAN update.

@strengejacke
Member

I think this will be closed in #501

@strengejacke
Member

Should be resolved in #501 to #503.

3 participants