
[PoC] add low-rank fitting #134

Status: Open · wants to merge 1 commit into base: master

Conversation

@ljwolf (Member) commented Nov 7, 2023

This is a proof of concept for low-rank fitting for mgwr that aims to avoid multicollinearity errors during fit. It requires a few changes to spglm, but the main idea is:

  1. swap _compute_betas() and _compute_betas_gwr() to use more robust least-squares solvers
  2. implement an iterative low-rank solver for the GWR hat matrix.

This makes it possible to obtain a solution for degenerate test cases like #132, even though that solution is not unique. It would also allow us to implement post-fit local multicollinearity checks to help folks in cases like #116.

@knaaptime (Member) left a comment


This is sweet. Do we want to warn the user in the event the low-rank solver gets invoked, just to let them know they've been ridged? (Maybe something like what you get down at the bottom of the statsmodels summary?)

@Ziqi-Li (Member) commented Nov 9, 2023

@ljwolf This will be useful! And like @knaaptime said, a low-rank warning will be helpful. One consideration is whether we want to throw a warning at each location (which may produce many warnings) or just a single warning.

The other possibility for working around the problem is to increase the bandwidth. In most cases, local collinearity occurs when data are very similar or even identical at a small bandwidth; increasing the bandwidth will bring in more variation and potentially increase the rank. So if a small bandwidth gives an error during the search, skipping it and trying a larger one may fix the problem. This will not work if the data are collinear globally.
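To make the bandwidth idea concrete, here is a small illustrative sketch (the helper names `local_rank` and `smallest_full_rank_bandwidth` are hypothetical, and a bisquare kernel is assumed): it walks candidate bandwidths from small to large and returns the first one whose kernel weights give a full-rank local design at a given location.

```python
import numpy as np

def local_rank(x, w):
    """Effective rank of the kernel-weighted design matrix at one location."""
    return np.linalg.matrix_rank(x * np.sqrt(w).reshape(-1, 1))

def smallest_full_rank_bandwidth(x, coords, i, bandwidths):
    """Hypothetical helper: return the first candidate bandwidth whose
    (bisquare) kernel weights yield a full-rank local design at location i,
    or None if no candidate works (e.g. globally collinear data)."""
    d = np.linalg.norm(coords - coords[i], axis=1)
    for bw in sorted(bandwidths):
        # bisquare kernel: nonzero weight only for points within the bandwidth
        w = np.where(d < bw, (1 - (d / bw) ** 2) ** 2, 0.0)
        if local_rank(x, w) == x.shape[1]:
            return bw
    return None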

@martinfleis (Member) commented

> just to have one single warning

I would probably opt for a single warning to avoid it being too noisy.

> The other possibility to work around the problem is to increase the bandwidth

We can add a clear error message suggesting that, but the error often happens during bandwidth selection, so if there is a way of treating these corner cases in the implementation, I think that is preferable.

@TaylorOshan (Collaborator) commented

Yeah, I think there is a question as to whether you want warnings during bandwidth selection, while many bandwidths are being tried, or only during final model fitting after bandwidth selection. We would need to decide whether we want users to be able to end up with a final model that has been patched, or whether this is more about allowing the bandwidth selection routine to continue.

I have some other thoughts that I will try to gather shortly.

@Ziqi-Li (Member) commented Nov 9, 2023

A warning for the final model is useful; it would also be helpful to have the warning during the search when verbose is set to True.

@ljwolf (Member, Author) commented Nov 10, 2023

> @knaaptime: to let them know they've been ridged

Yes, 100%. This is what I mean by "post-fit diagnostics": the local rank needs to be added to the results classes.

> @Ziqi-Li: throw a warning at each location (which may give lots of warnings), or just to have one single warning

I would only warn if the final model has a low-rank solution. It doesn't really matter if the optimiser hits a "bad" region of the solution space, so long as that's not where we stay at the end of the day!

> @Ziqi-Li, @martinfleis, @TaylorOshan: increase the bandwidth

Sure, fine. What would make sense to you for this to look like?

I think we can check during golden-section search. If any local fit is degenerate, then the "score" for that bandwidth should be set to -numpy.inf, and the optimiser will end searches in that direction. Still, we need these permissive solvers to make the map/pool.map() pattern work.
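A minimal sketch of that scoring rule, assuming the search minimises AICc (so "infinitely bad" becomes +inf rather than the -inf mentioned above, which would apply to a maximised score); the function name `bandwidth_score` is hypothetical:

```python
import math

def bandwidth_score(aicc, local_ranks, n_covariates):
    """Hypothetical scoring rule for golden-section bandwidth search:
    if any local design at this bandwidth is rank-deficient, return an
    infinitely bad score (math.inf, since AICc is minimised) so the
    optimiser ends searches in that direction."""
    if any(rank < n_covariates for rank in local_ranks):
        return math.inf
    return aicc
```

Because the permissive solvers always return a rank instead of raising, every location in the map/pool.map() pattern can report back and the search simply routes around degenerate bandwidths.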

The alternative is to let the optimiser use the rank-deficient (non-unique) solutions. I somewhat like this approach, because we're not forcing the bandwidth to do the regularisation... but this style of API causes extremely common issues in LME4's mixed models, so I'm split.

Either way, we should warn in the results if there is a locally singular fit and suggest that a larger minimum bandwidth might be needed.
