Add basic support for gifs/video files #12
The problem is the false positives. The normal 6% rate is OK for a single image, but in a video it will be almost a 100% hit rate across all the frames. A video has to be scanned using possibly some very clever thresholds, looping back to problem spots for a second pass.
Could that 6% number be used to our advantage in this case? If we know we'll be checking multiple photos, then we grant one positive as a false positive, but the chances of, say, 3 or more false positives are around 0.0216% if I remember my combinatorics right. So if 2 or more frames flag, then we are pretty sure it should be flagged?
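For reference, the 0.0216% figure is 0.06³, i.e. the chance that three *specific* independent checks all false-positive; the chance of at least three flags among a larger sample of frames is the binomial tail, which grows with the sample size. A quick sketch (the helper name `prob_at_least` is mine, not the project's):

```python
from math import comb

def prob_at_least(k, n, p):
    """Chance of at least k false positives among n independent checks,
    each with false-positive rate p (binomial tail)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Three specific frames all flagging, as in the comment above:
print(0.06 ** 3)  # 0.000216, i.e. 0.0216%

# Across many sampled frames the tail probability is much larger,
# so the rejection threshold has to scale with the number of frames checked:
print(prob_at_least(3, 42, 0.06))
```

This is why a fixed "2 or more flags" rule works for a handful of photos but not for thousands of frames, as the next comment points out.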
A video could have thousands or tens of thousands of frames, though. I'm not saying it's not possible, but it requires some thought.
That's why I think a solid sampling of frames would be sufficient, if a decent scheme can be worked out. Thinking from an attacker's perspective, they would want to try to get around it. So something like this example algorithm:
If N is 20, this would be 42 total frames analyzed. We would cover the entirety of the video, and it would be incredibly difficult to hide anything in the intervals we didn't scan because of the randomly grabbed frames. Of course those dials could be tweaked over time, or exposed as an environment variable for how finely the user wants videos analyzed. If more than some percentage of frames fail, consider the video failed: with 6% false positives, it'd be reasonable to say that if 10-20% of frames failed, we are reasonably sure the video should be rejected. (That could also be an environment variable.)
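The algorithm list itself didn't survive in this copy of the thread, but the arithmetic (2×N + 2 = 42 when N is 20) together with the later "sequenced frames plus random frames" description suggests one plausible reading: the first frame, the last frame, N evenly spaced frames, and N randomly chosen frames. A hypothetical sketch under that assumption (function name and spacing policy are my own):

```python
import random

def pick_frame_indices(total_frames, n=20, seed=None):
    """Hypothetical sampling sketch: first and last frame, n evenly
    spaced frames, and n random frames -- up to 2*n + 2 picks
    (42 when n is 20); duplicates collapse via the set."""
    rng = random.Random(seed)
    picks = {0, total_frames - 1}                    # first and last frame
    step = (total_frames - 1) / (n + 1)
    picks.update(round((i + 1) * step) for i in range(n))        # evenly spaced
    picks.update(rng.randrange(total_frames) for _ in range(n))  # random picks
    return sorted(picks)
```

Because duplicates collapse, very short clips simply end up with every frame checked, which seems like the right degenerate behaviour.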
Yes, that's what I meant by "more thought" :) Anyway, feel free to send a PR ;)
Wanted to give a heads up: don't worry, I haven't forgotten about this. I'm not a Python dev, so I'm learning Python while fiddling with it. It looks like the library can detect whether a file is a gif and can let me grab frames using seek. I plan to read the requested number of scanned frames from an environment variable, then follow the quick algorithm described above for every gif. I'm going to try to make the variable mean "for every gif, expect this many checks", so the higher the number, the more fine-grained the check will be. Then another variable, with a default value of 20% or so, will mean "if more than this fraction of frames register as positive, reject". If you could confirm, I'm planning on adding this logic to
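Pillow does expose exactly this: an animated gif opened with `Image.open` has `n_frames` and supports `seek()`. A minimal sketch of grabbing roughly evenly spaced frames, with the function name and spacing policy being my own assumptions:

```python
from PIL import Image

def gif_frames(path, wanted):
    """Yield up to `wanted` roughly evenly spaced frames from an
    animated gif, using Pillow's seek()/n_frames."""
    with Image.open(path) as im:
        total = getattr(im, "n_frames", 1)        # still images lack n_frames
        step = max(-(-total // max(wanted, 1)), 1)  # ceil division caps the count
        for index in range(0, total, step):
            im.seek(index)                        # jump to that frame
            yield im.convert("RGB").copy()        # detach a scannable copy
```

Each yielded frame is an ordinary RGB image, so it could be fed straight through the existing single-image check.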
I think we could add some basic video/gif support by re-using the image processing and taking frames of each gif/video that pictrs stores.
I don't know Python well, but I do know ffmpeg and video processing very well. Using a library like ffmpeg-python, I think we could add support for videos/gifs. ffmpeg could then be easily bundled with the containers and toggled with an environment variable.
The process could go like:
I think this would be a valid way to start checking videos as well. With the sequenced frames every n seconds combined with the randomly grabbed frames, it would be extremely difficult to hide NSFW content in a video.