This repository has been archived by the owner on Jun 10, 2024. It is now read-only.

SEI data lags behind image frames when stored with PyNvEncoder and read out using PyNvDecoder #525

Open
sakoay opened this issue Aug 31, 2023 · 2 comments

Comments

@sakoay

sakoay commented Aug 31, 2023

This might be related to issue 1262669666, but as there seems to be no resolution in that thread I am posting it again in case somebody can help.

I am trying to store user data SEI paired with each encoded frame using PyNvEncoder, but upon reading out this SEI message using PyNvDecoder, it seems to be out of sync with the frames. Specifically, the SEI data encoded for frame 4 is returned when decoding frame 1, the SEI encoded for frame 5 is returned when decoding frame 2, and so forth, and the last three decoded frames are returned with no SEI message. The code I am using to produce this effect is as follows:

import numpy as np
import PyNvCodec as nvc

def write_video(video_file, image_width=150, image_height=60, gpu_id=0):
    print(f"Encoding {video_file}...")
    nvenc = nvc.PyNvEncoder({'codec': 'h264', "fps": "60", 's': f"{image_width}x{image_height}"}, gpu_id, nvc.PixelFormat.NV12)

    input_frame = np.zeros((image_height, image_width), dtype=np.uint8)
    enc_frame = np.ndarray((0, ), dtype=np.uint8)
    sei = np.ndarray((4, ), dtype=np.uint8)
    enc_file = open(video_file, "wb")

    for index in range(10):
        input_frame[0:10, index] = 255
        sei.data[:] = b"%04d" % (index + 1)

        if nvenc.EncodeSingleFrame(input_frame, enc_frame, sei, sync=True):
            enc_frame.tofile(enc_file)
            print(f"  SEI: {sei} = {sei.tobytes().decode()}")

    while nvenc.FlushSinglePacket(enc_frame):
        enc_frame.tofile(enc_file)
        print("  (packet flushed)")
    enc_file.close()

def read_video(video_file, gpu_id=0):
    print(f"Decoding {video_file}...")
    nvdec = nvc.PyNvDecoder(video_file, gpu_id)

    frame_nv12 = np.ndarray((nvdec.Framesize() // nvdec.Width(), nvdec.Width()), dtype=np.uint8)
    sei = np.ndarray(0, dtype=np.uint8)
    packet_data = nvc.PacketData()

    num_decoded = 0
    while nvdec.DecodeSingleFrame(frame_nv12, sei, packet_data):
        num_decoded += 1
        print(f"  frame {num_decoded} = {frame_nv12[0, :15]} <-> SEI = {sei}")

if __name__ == "__main__":
    write_video("test_nv.mp4")
    read_video("test_nv.mp4")

Running this prints the following output (on a machine with an NVIDIA GeForce RTX 3070, and the same on an RTX 4090 GPU):

Encoding test_nv.mp4...
  SEI: [48 48 48 49] = 0001
  SEI: [48 48 48 50] = 0002
  SEI: [48 48 48 51] = 0003
  SEI: [48 48 48 52] = 0004
  SEI: [48 48 48 53] = 0005
  SEI: [48 48 48 54] = 0006
  SEI: [48 48 48 55] = 0007
  SEI: [48 48 48 56] = 0008
  SEI: [48 48 48 57] = 0009
  SEI: [48 48 49 48] = 0010
Decoding test_nv.mp4...
  frame 1 = [254   3   1   1   1   0   0   0   0   0   0   0   0   0   0] <-> SEI = [  0   0   0   1   6   5   4  48  48  48  52 128]
  frame 2 = [254 254   3   1   1   1   0   0   0   0   0   0   0   0   0] <-> SEI = [  0   0   0   1   6   5   4  48  48  48  53 128]
  frame 3 = [254 254 254   3   1   1   1   0   0   0   0   0   0   0   0] <-> SEI = [  0   0   0   1   6   5   4  48  48  48  54 128]
  frame 4 = [254 254 254 254   3   1   1   1   0   0   0   0   0   0   0] <-> SEI = [  0   0   0   1   6   5   4  48  48  48  55 128]
  frame 5 = [254 254 254 254 254   3   1   1   1   0   0   0   0   0   0] <-> SEI = [  0   0   0   1   6   5   4  48  48  48  56 128]
  frame 6 = [254 254 254 254 254 254   3   1   1   1   0   0   0   0   0] <-> SEI = [  0   0   0   1   6   5   4  48  48  48  57 128]
  frame 7 = [254 254 254 254 254 254 254   3   1   1   1   0   0   0   0] <-> SEI = [  0   0   0   1   6   5   4  48  48  49  48 128]
  frame 8 = [254 254 254 254 254 254 254 254   3   1   1   1   0   0   0] <-> SEI = []
  frame 9 = [254 254 254 254 254 254 254 254 254   3   1   1   1   0   0] <-> SEI = []
  frame 10 = [254 254 254 254 254 254 254 254 254 254   3   1   1   1   0] <-> SEI = []
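For anyone reading the decoded SEI arrays above: the bytes appear to follow the usual H.264 Annex B layout, i.e. a start code (0 0 0 1), an NAL header with type 6 (SEI), a payload type (5), a payload size (4), the payload itself, and a trailing 128 (0x80) stop byte. The helper below is only a minimal sketch under that assumption. `extract_sei_payload` is a hypothetical name, not part of PyNvCodec, and it does not undo emulation prevention bytes or handle more than one SEI message per NAL unit.

```python
def extract_sei_payload(nal) -> bytes:
    """Return the payload of the first SEI message in an Annex B NAL unit."""
    data = bytes(nal)

    # Skip the 4- or 3-byte start code (00 00 00 01 or 00 00 01).
    if data.startswith(b"\x00\x00\x00\x01"):
        pos = 4
    elif data.startswith(b"\x00\x00\x01"):
        pos = 3
    else:
        raise ValueError("missing Annex B start code")

    # The low 5 bits of the H.264 NAL header hold the unit type; 6 means SEI.
    if data[pos] & 0x1F != 6:
        raise ValueError("not an SEI NAL unit")
    pos += 1

    def read_ff_coded(pos):
        # Payload type and size are each coded as a run of 0xFF bytes
        # (worth 255 each) followed by one final byte.
        value = 0
        while data[pos] == 0xFF:
            value += 255
            pos += 1
        return value + data[pos], pos + 1

    _payload_type, pos = read_ff_coded(pos)
    payload_size, pos = read_ff_coded(pos)
    return data[pos:pos + payload_size]
```

Applied to the array printed for frame 1, `extract_sei_payload(sei).decode()` would give `0004`, which makes the offset relative to the encoded values easier to see.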

The frame image data is as expected, but as you can see, the SEI data originally entered for frames 1 to 3 seems to have been "lost" upon decoding. I am not sure what is happening, but if I switch the sync parameter of nvenc.EncodeSingleFrame() to False, this is what is printed instead:

Encoding test_nv.mp4...
  SEI: [48 48 48 52] = 0004
  SEI: [48 48 48 53] = 0005
  SEI: [48 48 48 54] = 0006
  SEI: [48 48 48 55] = 0007
  SEI: [48 48 48 56] = 0008
  SEI: [48 48 48 57] = 0009
  SEI: [48 48 49 48] = 0010
  (packet flushed)
  (packet flushed)
  (packet flushed)
Decoding test_nv.mp4...
  frame 1 = [254   3   1   1   1   0   0   0   0   0   0   0   0   0   0] <-> SEI = [  0   0   0   1   6   5   4  48  48  48  52 128]
  frame 2 = [254 254   3   1   1   1   0   0   0   0   0   0   0   0   0] <-> SEI = [  0   0   0   1   6   5   4  48  48  48  53 128]
  frame 3 = [254 254 254   3   1   1   1   0   0   0   0   0   0   0   0] <-> SEI = [  0   0   0   1   6   5   4  48  48  48  54 128]
  frame 4 = [254 254 254 254   3   1   1   1   0   0   0   0   0   0   0] <-> SEI = [  0   0   0   1   6   5   4  48  48  48  55 128]
  frame 5 = [254 254 254 254 254   3   1   1   1   0   0   0   0   0   0] <-> SEI = [  0   0   0   1   6   5   4  48  48  48  56 128]
  frame 6 = [254 254 254 254 254 254   3   1   1   1   0   0   0   0   0] <-> SEI = [  0   0   0   1   6   5   4  48  48  48  57 128]
  frame 7 = [254 254 254 254 254 254 254   3   1   1   1   0   0   0   0] <-> SEI = [  0   0   0   1   6   5   4  48  48  49  48 128]
  frame 8 = [254 254 254 254 254 254 254 254   3   1   1   1   0   0   0] <-> SEI = []
  frame 9 = [254 254 254 254 254 254 254 254 254   3   1   1   1   0   0] <-> SEI = []
  frame 10 = [254 254 254 254 254 254 254 254 254 254   3   1   1   1   0] <-> SEI = []

As expected for asynchronous operation, there are now extra packets at the end of the encoding loop that need to be flushed to the output file. Interestingly, the number of packets flushed is the same as the number of "lost" SEI messages.

I would be much obliged to receive any help on this issue! The VideoProcessingFramework has been a game changer in my efforts at high-speed encoding from multiple video sources, and I was so delighted that it supports SEI message storage.

@sakoay
Author

sakoay commented Aug 31, 2023

Ok, at least I found something like a workaround by running PyNvDecoder with a standalone demuxer (PyFFmpegDemuxer). The demuxer does return the correct SEI message for every packet, but PyNvDecoder.DecodeFrameFromPacket() only starts returning frames on the 4th call (the first three calls return no frame), hence resulting in an apparent SEI mismatch. This is the alternate code:

def demux_video(video_file, gpu_id=0):
    print(f"Decoding {video_file} with standalone demuxer...")

    nvdemux = nvc.PyFFmpegDemuxer(video_file)
    nvdec = nvc.PyNvDecoder(nvdemux.Width(), nvdemux.Height(), nvdemux.Format(), nvdemux.Codec(), gpu_id)

    frame_nv12 = np.ndarray((0, ), dtype=np.uint8)
    sei = np.ndarray(0, dtype=np.uint8)
    packet = np.ndarray(0, dtype=np.uint8)
    enc_packet = nvc.PacketData()
    dec_packet = nvc.PacketData()

    num_decoded = 0
    while nvdemux.DemuxSinglePacket(packet, sei):
        if nvdec.DecodeFrameFromPacket(frame_nv12, enc_packet, packet, dec_packet):
            num_decoded += 1
            frame_nv12 = frame_nv12.reshape(-1, nvdemux.Width())
            print(f"  frame {num_decoded} = {frame_nv12[0, :15]} <-> SEI = {sei}")
        else:
            print(f"  frame not ready <-> SEI = {sei}")

    while nvdec.FlushSingleFrame(frame_nv12, dec_packet):
        num_decoded += 1
        frame_nv12 = frame_nv12.reshape(-1, nvdemux.Width())
        print(f"  frame {num_decoded} = {frame_nv12[0, :15]}")

which now prints:

Decoding test_nv.mp4 with standalone demuxer...
  frame not ready <-> SEI = [  0   0   0   1   6   5   4  48  48  48  49 128]
  frame not ready <-> SEI = [  0   0   0   1   6   5   4  48  48  48  50 128]
  frame not ready <-> SEI = [  0   0   0   1   6   5   4  48  48  48  51 128]
  frame 1 = [254   3   1   1   1   0   0   0   0   0   0   0   0   0   0] <-> SEI = [  0   0   0   1   6   5   4  48  48  48  52 128]
  frame 2 = [254 254   3   1   1   1   0   0   0   0   0   0   0   0   0] <-> SEI = [  0   0   0   1   6   5   4  48  48  48  53 128]
  frame 3 = [254 254 254   3   1   1   1   0   0   0   0   0   0   0   0] <-> SEI = [  0   0   0   1   6   5   4  48  48  48  54 128]
  frame 4 = [254 254 254 254   3   1   1   1   0   0   0   0   0   0   0] <-> SEI = [  0   0   0   1   6   5   4  48  48  48  55 128]
  frame 5 = [254 254 254 254 254   3   1   1   1   0   0   0   0   0   0] <-> SEI = [  0   0   0   1   6   5   4  48  48  48  56 128]
  frame 6 = [254 254 254 254 254 254   3   1   1   1   0   0   0   0   0] <-> SEI = [  0   0   0   1   6   5   4  48  48  48  57 128]
  frame 7 = [254 254 254 254 254 254 254   3   1   1   1   0   0   0   0] <-> SEI = [  0   0   0   1   6   5   4  48  48  49  48 128]
  frame 8 = [254 254 254 254 254 254 254 254   3   1   1   1   0   0   0]
  frame 9 = [254 254 254 254 254 254 254 254 254   3   1   1   1   0   0]
  frame 10 = [254 254 254 254 254 254 254 254 254 254   3   1   1   1   0]

This seems like a better explanation of why PyNvDecoder with its internal demuxer reports SEI messages with an offset. What do you think?
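If the offset really is just decoder latency, one way to re-align things would be to queue each demuxed SEI and hand the oldest queued one to each frame as it comes out. The sketch below only illustrates that bookkeeping in plain Python. `pair_sei_with_frames` is a hypothetical helper, not a PyNvCodec function, and it assumes frames come out in the same order the packets went in (which the outputs above suggest, but which would not hold if B-frame reordering were in play).

```python
from collections import deque

def pair_sei_with_frames(events):
    """Pair each decoded frame with the oldest SEI not yet consumed.

    `events` is an iterable of (sei, frame) tuples, one per decode call.
    `frame` is None when the decoder has not produced a frame yet, and
    `sei` is None for frames obtained while flushing the decoder.
    """
    pending = deque()  # SEI messages seen but not yet matched to a frame
    pairs = []
    for sei, frame in events:
        if sei is not None:
            pending.append(sei)
        if frame is not None:
            # Assumption: output order equals submission order (FIFO).
            pairs.append((frame, pending.popleft() if pending else None))
    return pairs
```

In the demux loop above, each iteration would contribute one `(sei.copy(), frame_or_None)` event, and each flushed frame one `(None, frame)` event.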

@RomanArzumanyan
Contributor

RomanArzumanyan commented Sep 15, 2023

Hi @sakoay
Apologies for the big delay in reply.

It looks like frame reordering is happening. During this process, the encoder compresses video frames in an order different from the input order. This is done for better compression efficiency.
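To make the term concrete, here is a toy sketch of what reordering means in general: with B-frames, the anchor (I/P) frame that follows a group of B-frames has to be coded before them. This is a simplified model for illustration only. `coding_order` is a hypothetical function, and it is not a description of what NVENC actually does for this stream.

```python
def coding_order(display_frames, num_b_frames):
    """Reorder display-order frames so each anchor precedes the B-frames before it."""
    order = []
    pending_b = []  # B-frames waiting for their following anchor frame
    for i, frame in enumerate(display_frames):
        # Frame 0 and every (num_b_frames + 1)-th frame after it act as anchors (I/P).
        if i % (num_b_frames + 1) == 0:
            order.append(frame)
            order.extend(pending_b)
            pending_b = []
        else:
            pending_b.append(frame)
    order.extend(pending_b)  # trailing frames with no later anchor in this toy model
    return order
```

With `num_b_frames=0` the output equals the input order, which is the situation a low-latency configuration is meant to enforce.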

To check this assumption, you can initialize your PyNvEncoder in low-latency mode, which will eliminate frame reordering.

You can get the list of options supported by PyNvEncoder with this function:

m.def("GetNvencParams", &GetNvencInitParams, R"pbdoc(
Get list of params PyNvEncoder can be initialized with.
)pbdoc");
