[AnomalyDetection] Add base classes and specifiable protocol #33845

Open
wants to merge 6 commits into
base: master

shunping
Contributor

@shunping shunping commented Feb 4, 2025

This is the first part of the code push for the anomaly detection transform.

@github-actions github-actions bot added the python label Feb 4, 2025
@shunping
Contributor Author

shunping commented Feb 4, 2025

r: @damccorm

Contributor

github-actions bot commented Feb 4, 2025

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment "assign set of reviewers".

@shunping shunping changed the title from "[Anomaly Detection] Add base classes and specifiable protocol" to "[AnomalyDetection] Add base classes and specifiable protocol" Feb 4, 2025

codecov bot commented Feb 4, 2025

Codecov Report

Attention: Patch coverage is 91.16022% with 16 lines in your changes missing coverage. Please review.

Project coverage is 59.12%. Comparing base (5aae10d) to head (bc6fa8f).
Report is 1 commits behind head on master.

Files with missing lines                            Patch %   Lines
sdks/python/apache_beam/ml/anomaly/base.py          83.60%    10 Missing ⚠️
sdks/python/apache_beam/ml/anomaly/specifiable.py   95.00%    6 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #33845      +/-   ##
============================================
+ Coverage     59.08%   59.12%   +0.03%     
  Complexity     3237     3237              
============================================
  Files          1156     1158       +2     
  Lines        176907   177113     +206     
  Branches       3391     3391              
============================================
+ Hits         104532   104715     +183     
- Misses        69008    69031      +23     
  Partials       3367     3367              
Flag Coverage Δ
python 81.25% <91.16%> (+0.02%) ⬆️

Contributor

@damccorm damccorm left a comment

Mostly LGTM, but I had some questions. In general, docs would be really helpful here. I really like the concept.

sdks/python/apache_beam/ml/anomaly/base.py (resolved)
sdks/python/apache_beam/ml/anomaly/base_test.py (outdated, resolved)
key=None,
error_if_exists=True,
on_demand_init=True,
just_in_time_init=True):
Contributor

I'm having a hard time following this. Part of it is just that deeply indented Python is hard to read, but a walkthrough of what this function is doing would be very helpful here.

I think I've mostly figured out how it is working, but it would help to have this background.

Contributor Author

I added some more comments. PTAL.

sdks/python/apache_beam/ml/anomaly/specifiable.py (outdated, resolved)
sdks/python/apache_beam/ml/anomaly/specifiable.py (outdated, resolved)
def run_init(self):
  original_init(self, **self._init_params)

def new_getattr(self, name):
Contributor

Maybe I'm not understanding this well, but could we just do something a bit simpler like:

def new_getattr(self, name):
   if not self._initialized:
      original_init()
      self._initialized = True
   return orig_getattr(self, name)

if we keep track of the original getattr function in orig_getattr

Contributor Author

@shunping shunping Feb 6, 2025

That is the general idea, but there are some edge cases we need to avoid; otherwise we end up in an infinite loop. I refactored the code a bit to improve its readability. PTAL.
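For readers following this thread, here is a minimal sketch of how deferred initialization through `__getattr__` can avoid the recursion mentioned above. It is not the code in this PR. The names `run_init`, `new_getattr`, `original_init`, `_init_params` and `_initialized` echo the snippets quoted in the thread, while the `lazy_init` decorator, the `_init_args` attribute and the `Detector` class are hypothetical and exist only for illustration.

```python
import functools


def lazy_init(cls):
    """Hypothetical class decorator: run the real __init__ on first attribute miss."""
    original_init = cls.__init__

    @functools.wraps(original_init)
    def new_init(self, *args, **kwargs):
        # Store bookkeeping directly in __dict__ so that reading it later
        # never falls through to the __getattr__ hook below.
        self.__dict__["_init_args"] = args
        self.__dict__["_init_params"] = kwargs
        self.__dict__["_initialized"] = False

    def run_init(self):
        # Set the flag before calling the original __init__. If that __init__
        # touches a missing attribute, the hook raises instead of re-entering.
        self.__dict__["_initialized"] = True
        original_init(self, *self._init_args, **self._init_params)

    def new_getattr(self, name):
        # __getattr__ is only invoked after normal lookup has failed.
        # Dunder probes (copy, pickle) and already-initialized objects must
        # get a plain AttributeError, otherwise the lookup can loop forever.
        if name.startswith("__") or self.__dict__.get("_initialized", True):
            raise AttributeError(
                f"{type(self).__name__!r} object has no attribute {name!r}")
        run_init(self)
        return getattr(self, name)

    cls.__init__ = new_init
    cls.__getattr__ = new_getattr
    return cls


@lazy_init
class Detector:
    """Hypothetical example class; its __init__ is deferred."""
    def __init__(self, threshold=0.5):
        self.threshold = threshold
```

Constructing `Detector(threshold=0.9)` only records the arguments; the original `__init__` runs the first time an attribute such as `threshold` is read. The two guards in `new_getattr` are the kind of edge case the reply refers to, under the assumptions stated here.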


from typing_extensions import Self

ACCEPTED_SPECIFIABLE_SUBSPACES = [
Contributor

Why do we start with these registered? Would it be cleaner to just register them on import?

Contributor Author

These are the accepted subspaces for the known specifiables, not the specifiable classes themselves. The purpose is to avoid putting every specifiable class into a single global space.

I think they should be predefined before the true registration takes place. WDYT?
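To make the distinction concrete, here is a rough sketch of a registry partitioned by predefined subspaces. It is an illustration under assumptions, not the PR's implementation: `ACCEPTED_SUBSPACES`, its two entries, `register` and `lookup` are hypothetical names, and the meanings given to `key` and `error_if_exists` are guesses based only on the parameter names visible in the diff.

```python
# Hypothetical subspace names; the real list in the PR is not shown here.
ACCEPTED_SUBSPACES = ["detector", "threshold"]

# One empty registry per accepted subspace exists before any class registers.
_REGISTRY = {subspace: {} for subspace in ACCEPTED_SUBSPACES}


def register(subspace, key=None, error_if_exists=True):
    """Return a class decorator that records a class under subspace/key."""
    if subspace not in _REGISTRY:
        raise ValueError(f"unknown subspace: {subspace!r}")

    def decorator(cls):
        # Assumption: the key defaults to the class name when not given.
        name = key if key is not None else cls.__name__
        if error_if_exists and name in _REGISTRY[subspace]:
            raise ValueError(f"{name!r} already registered in {subspace!r}")
        _REGISTRY[subspace][name] = cls
        return cls

    return decorator


def lookup(subspace, key):
    """Find a registered class; keys only need to be unique per subspace."""
    return _REGISTRY[subspace][key]
```

With this layout the same key can be used in two different subspaces without a clash, which is the benefit of not sharing one global space, and a typo in a subspace name fails at registration time instead of silently creating a new namespace.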

@shunping shunping marked this pull request as draft February 6, 2025 19:59
@shunping shunping force-pushed the anomaly-detection branch 2 times, most recently from 371d465 to 0b3a7ca Compare February 7, 2025 04:45
@shunping shunping marked this pull request as ready for review February 7, 2025 04:55
@shunping
Contributor Author

shunping commented Feb 7, 2025

@damccorm, I've added docstrings and refactored some code for readability. PTAL. Thanks!
