-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Random time-based sampler #255
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving to unblock. But take a look at the comments to see if you can address them
src/torchcodec/samplers/_implem.py
Outdated
num_clips is not None and seconds_between_clip_starts is not None | ||
): | ||
# This is internal only and should never happen | ||
raise ValueError("Bad, bad programmer!") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add more details to this message, like the actual values of num_clips and seconds_between...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will change the error message to something more conventional, but to repeat the comment: this is non-user facing code, it's purely internal and should never ever be triggered
test/samplers/test_samplers.py
Outdated
# 2 different clips when teh sampling range is 2 seconds. | ||
# range to be 1 second or 2 seconds. Since we set | ||
# seconds_between_clip_starts to 1 we expect exactly one clip with the | ||
# sampling range is of size 1, and 2 different clips when teh sampling range |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/teh/the
Also why was this comment change made? Text seems the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had just edited a typo from seconds_between_clip_start to seconds_between_clip_starts
@@ -252,10 +274,10 @@ def test_sampling_range_default_behavior_random_sampler(): | |||
|
|||
num_clips = 20 | |||
num_frames_per_clip = 15 | |||
sampling_range_start = -20 | |||
sampling_range_start = -20 if sampler is clips_at_random_indices else 11 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a comment saying negative ranges are fine because they are like python indexes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is relevant to add in this test honestly. This should just be part of the docstring of samplers (which is still to be done)
@@ -321,14 +343,19 @@ def test_sampling_range_default_regular_sampler(sampler): | |||
partial( | |||
clips_at_regular_indices, sampling_range_start=-1, sampling_range_end=1000 | |||
), | |||
# Note: the hard-coded value of sampling_range_start=12 is because we know | |||
# Note: the hard-coded value of sampling_range_start=13 is because we know |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this not exactly 13.01? Can you explain the epsilon value in a comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming exactly 13.01 corresponds to end_stream_seconds
, then 13.01
would be an invalid value here.
We just need to set a value that is < end_stream_seconds
so that the clip start is valid, but large enough so that the clip span is such that it goes beyond end_stream_seconds
and thus triggers the "error" policy.
Similarly for the index-based case, we set this value to -1
, thus enforcing the clip to start on the last frame.
I will slightly update the comment so as to not claim that the video duration is exactly 13.01.
if sampler is clips_at_random_timestamps: | ||
with pytest.raises( | ||
ValueError, | ||
match=re.escape("num_clips (0) must be > 0"), | ||
): | ||
sampler(decoder, num_clips=0) | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it's just me but parameterizing tests and then adding if conditions for particular values with specific results seems like an anti-pattern.
It makes the test harder to read. Why not have separate tests so it's way more explicit
One of the reasons these tests exist is to show the user what not to do in a very clear and explicit manner
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that parametrization can lead to some quirks like this. If you feel strongly that these 2 unique checks should be written into 2 new separate test functions, I'll do it as a follow up.
I expect however these error checks to be a "write once and never go back to it" thing, so I'm personally OK with the if/else awkwardness.
sampling_range_width = sampling_range_end - sampling_range_start | ||
# torch.rand() returns in [0, 1) | ||
# which ensures all clip starts are < sampling_range_end | ||
clip_start_seconds = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should sort these clips starts by start_timestamps?
Do you know how the existing samplers do it and what the user expectations are here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
This PR adds a random time-based sampler, and also updates our benchmark.
It must take a
num_clips
parameter instead ofseconds_between_clip_starts
like inclips_at_regular_timestamps()
.This means the
_generic_time_based_sampler
takes bothnum_clips
andseconds_between_clip_starts
as parameter, and they're mutually exclusive. That is not user-facing.The
_implem.py
file is growing large. When this PR is merged, I think it'll be time to re-organize and split the implementation in different files. Typically the policies could be kept separate.