
[PoC] add low-rank fitting #134

Status: Open · wants to merge 1 commit into base: master

Conversation

@ljwolf (Member) commented Nov 7, 2023

This is a proof of concept for low-rank fitting for mgwr that aims to avoid multicollinearity errors during fit. It requires a few changes to spglm, but the main idea is:

  1. swap _compute_betas() and _compute_betas_gwr() to use more robust least-squares solvers
  2. implement an iterative low-rank solver for the GWR hat matrix.

This makes it possible to obtain a solution for degenerate test cases like #132, even though that solution is not unique. It would also allow us to implement post-fit local multicollinearity checks to help folks in cases like #116.

@knaaptime (Member) left a comment


This is sweet. Do we want to warn the user in the event the low-rank solver gets invoked, just to let them know they've been ridged? (Maybe something like what you get down at the bottom of the statsmodels summary?)

@Ziqi-Li (Member) commented Nov 9, 2023

@ljwolf This will be useful! And like @knaaptime said, a low-rank warning will be helpful. One consideration is whether we want to throw a warning at each location (which may produce many warnings) or just a single warning.

The other possibility for working around the problem is to increase the bandwidth. In most cases, local collinearity occurs when data are very similar or even identical at a small bandwidth; increasing the bandwidth will bring in more variation and potentially increase the rank. So if a small bandwidth gives an error during the search, skipping it and trying a larger one may fix the problem. This will not work if the data are collinear globally.
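To make the bandwidth idea concrete, here is a small illustrative sketch (the helper names `local_rank` and `smallest_full_rank_bandwidth` are hypothetical, and a bisquare kernel is assumed): it walks candidate bandwidths from small to large and returns the first one whose kernel weights give a full-rank local design at a given location.

```python
import numpy as np

def local_rank(x, w):
    """Effective rank of the kernel-weighted design matrix at one location."""
    return np.linalg.matrix_rank(x * np.sqrt(w).reshape(-1, 1))

def smallest_full_rank_bandwidth(x, coords, i, bandwidths):
    """Hypothetical helper: return the first candidate bandwidth whose
    (bisquare) kernel weights yield a full-rank local design at location i,
    or None if no candidate works (e.g. globally collinear data)."""
    d = np.linalg.norm(coords - coords[i], axis=1)
    for bw in sorted(bandwidths):
        # bisquare kernel: nonzero weight only for points within the bandwidth
        w = np.where(d < bw, (1 - (d / bw) ** 2) ** 2, 0.0)
        if local_rank(x, w) == x.shape[1]:
            return bw
    return None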

@martinfleis (Member) commented

> just to have one single warning

I would probably opt for a single warning to avoid it being too noisy.

> The other possibility to work around the problem is to increase the bandwidth

We can add a clear error message suggesting that, but the error often happens during bandwidth selection, so if there is a way of treating these corner cases in the implementation, I think that is preferable.

@TaylorOshan (Collaborator) commented

Yeah, I think there is a question as to whether you want warnings during bandwidth selection, while many bandwidths are being tried, or only during final model fitting after bandwidth selection. We would need to decide whether we want users to be able to end up with a final model that has been patched, or whether this is more about allowing the bandwidth selection routine to continue.

I have some other thoughts that I will try to gather shortly.

@Ziqi-Li (Member) commented Nov 9, 2023

A warning for the final model is useful; it would also be helpful to have the warning during the search when verbose is set to True.

@ljwolf (Member, Author) commented Nov 10, 2023

> @knaaptime: to let them know they've been ridged

Yes, 100%. This is what I mean by "post-fit diagnostics": the local rank needs to be added to the results classes.

> @Ziqi-Li: throw a warning at each location (which may give lots of warnings), or just to have one single warning

I would only warn if the final model has a low-rank solution. It doesn't really matter if the optimiser hits a "bad" region of the solution space, so long as that's not where we stay at the end of the day!

> @Ziqi-Li, @martinfleis, @TaylorOshan: increase the bandwidth

Sure, fine. What would make sense to you for this to look like?

I think we can check during golden-section search. If any local fit is degenerate, then the "score" for that bandwidth should be set to -numpy.inf, and the optimiser will end searches in that direction. Still, we need these permissive solvers to make the map/pool.map() pattern work.
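A minimal sketch of that scoring rule, assuming the search minimises AICc (so "infinitely bad" becomes +inf rather than the -inf mentioned above, which would apply to a maximised score); the function name `bandwidth_score` is hypothetical:

```python
import math

def bandwidth_score(aicc, local_ranks, n_covariates):
    """Hypothetical scoring rule for golden-section bandwidth search:
    if any local design at this bandwidth is rank-deficient, return an
    infinitely bad score (math.inf, since AICc is minimised) so the
    optimiser ends searches in that direction."""
    if any(rank < n_covariates for rank in local_ranks):
        return math.inf
    return aicc
```

Because the permissive solvers always return a rank instead of raising, every location in the map/pool.map() pattern can report back and the search simply routes around degenerate bandwidths.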

The alternative is to let the optimiser use the rank-deficient (non-unique) solutions. I somewhat like this approach, because we're not forcing the bandwidth to do the regularisation... but this style of API causes extremely common issues in LME4's mixed models, so I'm split.

Either way, we should warn in the results if there is a locally singular fit and suggest that a larger minimum bandwidth might be needed.
