-
Notifications
You must be signed in to change notification settings - Fork 126
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PoC] add low-rank fitting #134
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is sweet. Do we want to warn the user in the event the low-rank solver gets invoked, just to let them know they've been ridged? (maybe something like you get down at the bottom of the statsmodels summary?)
@ljwolf This will be useful! And like @knaaptime said, a low-rank warning will be helpful. One consideration is that whether we want to throw a warning at each location (which may give lots of warnings), or just to have one single warning. The other possibility to work around the problem is to increase the bandwidth. In most cases, local collinearity occurs when data are very similar or even identical at a small bandwidth, and increasing the bandwidth will bring more variation and potentially increase the rank. So if a small bandwidth gives error during the searching, skip it and try a larger one may fix it. This will not work if the data are collinear globally. |
I would probably opt for a single warning to avoid it being too noisy.
We can add a clear error message suggesting that but it often happens during the bandwidth selection and if there is a way of treating these corner cases in the implementation, I think it is preferable. |
Yeah I think there is a question as to whether you want warnings during bandwidth selection while trying many bandwidths or just during final model fitting after bandwidth selection. We would need to decide if we want users to be able to end up with a final model that has been patched or is it more to allow the bandwidth selection routine to continue. I have some other thoughts that I will try to gather shortly. |
A warning for the final model is useful, it would be also helpful to have the warning during the searching when the |
yes, 100%. This is what I mean by "post-fit diagnostics". This needs to be added to the results classes pending the local rank.
I would only warn if the final model has a low-rank solution. It doesn't really matter if the optimiser hits a "bad" region of the solution space, so long as that's not where we stay at the end of the day!
Sure, fine. I'd like to know what makes sense to you for this to look like? I think we can check during golden section. If any local fit is degenerate, then the "score" for that bandwidth should be set to The alternative is to let the optimiser use the overdetermined solutions. I like this approach somewhat, because we're not forcing the bandwidth to do the regularisation... but this style of API poses extremely common issues in LME4's mixed models, so I'm split. Either way, we should warn in the results if there is a local singular fit, and a larger minimum bandwidth might be needed. |
This is a proof of concept for low-rank fitting for
mgwr
that tries to avoid multicollinearity errors during fit. It requires a few changes to spglm, but the main idea is:_compute_betas()
and_compute_betas_gwr()
to use more robust least squares solversgwr
hat matrix.This allows you to get a solution for degenerate test cases like #132, even though the solution is not unique. Second, this would allow us to implement post-fitting local multicollinearity checks to help folks in cases like #116.