Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add KMedoids #298

Merged
merged 8 commits into from
Mar 28, 2025
Merged

Add KMedoids #298

merged 8 commits into from
Mar 28, 2025

Conversation

juliohm
Copy link
Member

@juliohm juliohm commented Mar 27, 2025

No description provided.

@codecov-commenter
Copy link

codecov-commenter commented Mar 27, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.24%. Comparing base (1de5b52) to head (8ad1323).

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #298      +/-   ##
==========================================
+ Coverage   98.17%   98.24%   +0.07%     
==========================================
  Files          48       49       +1     
  Lines        1367     1424      +57     
==========================================
+ Hits         1342     1399      +57     
  Misses         25       25              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@juliohm juliohm marked this pull request as ready for review March 27, 2025 20:57
@juliohm juliohm requested a review from eliascarv March 27, 2025 20:57
@eliascarv
Copy link
Member

I think the only point that we don't have a simple way to resolve would be the type of dists = fill(Inf, nobs), as we don't have a practical way of knowing what the type returned by the parwise of the columns would be.

So I think starting it as Float64 is the simplest and safest solution, just like it is in the PR.

@juliohm
Copy link
Member Author

juliohm commented Mar 27, 2025 via email

@juliohm juliohm requested a review from eliascarv March 27, 2025 23:36
@juliohm
Copy link
Member Author

juliohm commented Mar 27, 2025

There is still an issue regarding unitful columns with different units, but that should be addressed in TableDistances.jl. If distances are computed with different units, and these units can't be added, we will get an error. Perhaps TableDistances.jl should remove the units to make it work, or warn users that columns must have compatible units.

@juliohm
Copy link
Member Author

juliohm commented Mar 28, 2025

I am going to merge assuming that the last review was an approval with comment 👍🏽

@juliohm juliohm merged commit 1bb3db8 into master Mar 28, 2025
8 checks passed
@juliohm juliohm deleted the kmedoids branch March 28, 2025 01:39
@eliascarv
Copy link
Member

Amazing PR @juliohm!

@juliohm
Copy link
Member Author

juliohm commented Mar 28, 2025

Quick correction: the StdFeats transform applied in the beginning of KMedoids takes care of the units since it applies the ZScore for Continuous variables. I pushed a fix to master renaming the resulting column from cluster to label and will do the same downstream for consistency.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants