
Probabilistic location of points of change for Bayesian models #37

Open
DominiqueMakowski opened this issue Nov 18, 2019 · 6 comments
Labels
Feature idea 🔥 New feature or request

Comments

@DominiqueMakowski
Member

By locating the points of change (using find_inversions) on every posterior draw of the link, we could obtain a uni- or multimodal distribution of points of change.

DominiqueMakowski self-assigned this Nov 18, 2019
DominiqueMakowski added the Feature idea 🔥 New feature or request label May 25, 2021
@DominiqueMakowski
Member Author

Let me clarify what you said, past-Dom:

The describe_nonlinear() function splits a nonlinear curve into segments by locating the points where its direction changes.

data <- modelbased::estimate_relation(lm(Sepal.Width ~ poly(Petal.Length, 3), data = iris))
modelbased::describe_nonlinear(data, x = "Petal.Length")
#> Start |  End | Length | Change | Slope |   R2
#> ---------------------------------------------
#> 1.00  | 3.62 |   0.36 |  -1.03 | -0.39 | 0.09
#> 3.62  | 6.90 |   0.54 |   0.51 |  0.16 | 0.09

Created on 2021-05-25 by the reprex package (v1.0.0)

However, in a Bayesian / bootstrapped context, we have many iterations of that "curve". So we could, in theory, get the location of a given inversion across all draws, and thus have a distribution of these locations. We could then conclude something like: "the relationship between x and y goes from negative to positive at around 0.33 (95% CI [0.21, 0.42])".
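As a rough illustration of that idea, here is a minimal base-R sketch that bootstraps the cubic fit from the reprex above and records where each resampled curve changes direction. The helper find_turning_points() is hypothetical (it is not the package's find_inversions), and keeping only the draws with exactly one turning point is a simplifying assumption that sidesteps the matching problem rather than solving it.

```r
set.seed(123)

# Common grid on which every resampled curve is evaluated
grid <- seq(min(iris$Petal.Length), max(iris$Petal.Length), length.out = 200)

# Hypothetical helper: x-locations where the slope of a curve changes sign
find_turning_points <- function(x, y) {
  s <- sign(diff(y))               # direction of each grid step
  idx <- which(diff(s) != 0) + 1   # grid point at which the direction flips
  x[idx]
}

# One set of turning points per bootstrap draw
draws <- replicate(200, {
  boot <- iris[sample(nrow(iris), replace = TRUE), ]
  fit <- lm(Sepal.Width ~ poly(Petal.Length, 3), data = boot)
  pred <- predict(fit, newdata = data.frame(Petal.Length = grid))
  find_turning_points(grid, pred)
}, simplify = FALSE)

# Simplifying assumption: summarise only draws with exactly one turning point
single <- unlist(draws[lengths(draws) == 1])
quantile(single, c(0.025, 0.5, 0.975))
```

The quantiles give a median location and a 95% interval for the turning point, conditional on the draws that show exactly one.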

This comes with some critical issues:

  • How to be sure that a given "point" of inversion is the same one across draws when its location varies
  • How to deal with a different pattern / number of inversions across draws, and how to summarize them
  • I'm not sure this would even be a valid approach, especially since some packages looking at "points of change" have emerged since the creation of this issue, I think.

I'm not sure it's an issue worth looking further into, especially since describe_nonlinear should not be used as an inferential procedure, but rather as a purely descriptive and exploratory insight into a pattern.

@mattansb
Member

Is the idea that this would work with smooths as well? If so, I suggest having a look here: https://gavinsimpson.github.io/gratia/reference/derivatives.html

Also, I think @lindeloev might know a thing or two about change points... (:

@lindeloev

lindeloev commented May 25, 2021

I think this is a pretty useful idea and something I could see myself using regularly! Random thoughts:

  • Terminology: Describing the x where dy/dx = 0 is quite distinct from change point models, where the change point is an extra model parameter and often marks a point of discontinuity. So it could be confusing to call it "points of change". I'd suggest sticking with "points of inversion" (though that also sounds somewhat discontinuous to me). If you have the freedom to change the terminology, perhaps "extrema points" would be good?

  • In my mind, identifying extrema really is descriptive because it's merely a property of the existing fitted parameter(s) - not a new model or inference per se. Just as, e.g., reporting the location of changes in curvature (extrema of the derivative) would be. So I think that it falls well within the domain of describe_nonlinear.

  • Number of extrema: There can be between 0 and N-1 extrema for a poly(N) model. For a given MCMC model, some draws may visit (N-1)-extrema models while other draws visit (N-2)-extrema models. So something like the probability of each extremum (proportion of samples) would need to be reported - at least in cases where the MCMC samples do not all yield the same number. Not sure what the best layout for a report would be.
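To make the "proportion of samples" idea concrete, here is a small sketch that counts extrema per draw for a cubic and tabulates the proportions. The coefficient draws are simulated from arbitrary normal distributions purely for illustration (they do not come from a fitted model), and count_extrema() and the interval [-3, 3] are hypothetical choices.

```r
set.seed(42)

# Simulated stand-ins for posterior draws of a cubic b0 + b1*x + b2*x^2 + b3*x^3
# (b0 is omitted because it does not affect the extrema)
n_draws <- 1000
coefs <- cbind(
  b1 = rnorm(n_draws, 0.3, 0.3),
  b2 = rnorm(n_draws, 0, 0.2),
  b3 = rnorm(n_draws, -0.1, 0.05)
)

# Hypothetical helper: number of extrema of the cubic inside (lower, upper),
# found as the real roots of the derivative b1 + 2*b2*x + 3*b3*x^2
count_extrema <- function(b, lower = -3, upper = 3) {
  roots <- polyroot(c(b[1], 2 * b[2], 3 * b[3]))
  real <- Re(roots)[abs(Im(roots)) < 1e-8]   # keep numerically real roots
  sum(real > lower & real < upper)
}

n_extrema <- apply(coefs, 1, count_extrema)

# Proportion of draws showing 0, 1 or 2 extrema within the interval
prop.table(table(n_extrema))
```

A report could then present the location summaries separately for each count, alongside these proportions.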

@bwiernik
Contributor

The mathematical term for this "point of inversion" is "inflection point". I would suggest using that language.

@bwiernik
Contributor

One important thing to bear in mind is that when summarizing multiple curves, the computation needs to be done on the curves (curvewise), not on points collapsing across curves (pointwise). See https://mjskay.github.io/ggdist/reference/curve_interval.html for discussion
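A toy illustration of the difference, using simulated sine curves with random phase shifts as stand-ins for posterior draws (all names and settings here are made up for the example): the curvewise summary locates the peak on each curve and then summarises the locations, while the pointwise summary averages the curves first and therefore yields a single flattened curve with a single peak and no spread.

```r
set.seed(1)

x <- seq(0, 2 * pi, length.out = 400)
phase <- runif(200, -pi / 2, pi / 2)

# One column per simulated curve (400 grid points x 200 curves)
curves <- sapply(phase, function(p) sin(x + p))

# Curvewise: find the peak on each curve, then summarise the peak locations
peaks <- x[apply(curves, 2, which.max)]
quantile(peaks, c(0.025, 0.5, 0.975))

# Pointwise: average across curves at each x, then find the peak of that average
mean_curve <- rowMeans(curves)
x[which.max(mean_curve)]

# The averaged curve is flatter than any individual curve
range(mean_curve)
```

Every individual curve peaks at a height close to 1, whereas the pointwise average does not, so features of the averaged curve are not features of any single draw.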

@lindeloev

Ah, sorry I understood it as the first derivative. "Inflection point" is good. My other thoughts are still relevant, I think.

Maybe it could be generalized so the user can choose which derivative to find the zero-crossings of:

f': Extrema
f'': Inflection
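A minimal sketch of what such a generalisation might look like, using finite differences on a gridded curve. The function name find_sign_changes() and its order argument are hypothetical, not part of the package, and the strict-inequality check means that a derivative landing exactly on zero at a grid point would be missed.

```r
# Hypothetical helper: locations where the order-th derivative of a gridded
# curve changes sign (order = 1: extrema, order = 2: inflection points)
find_sign_changes <- function(x, y, order = 1) {
  d <- y
  xd <- x
  for (i in seq_len(order)) {
    d <- diff(d) / diff(xd)                    # finite-difference derivative
    xd <- (head(xd, -1) + tail(xd, -1)) / 2    # it lives on the midpoints
  }
  # Adjacent derivative values with opposite signs bracket a zero-crossing
  idx <- which(d[-length(d)] * d[-1] < 0)
  (xd[idx] + xd[idx + 1]) / 2
}

# Example curve: y = x^3 - 3x has extrema at x = -1 and x = 1,
# and an inflection point at x = 0
x <- seq(-2, 2, length.out = 400)
y <- x^3 - 3 * x

find_sign_changes(x, y, order = 1)
find_sign_changes(x, y, order = 2)
```

The returned locations are only accurate to about one grid step, so a finer grid gives tighter estimates.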


4 participants