Image grids at the codec level #1197

Closed
farindk opened this issue Jun 25, 2024 · 10 comments

Comments

@farindk
Contributor

farindk commented Jun 25, 2024

Some codecs support images composed of tiles at the codec level (ISO 23001-17, H.265).
Is there any use case for supporting this for writing and fast access, or would it be easier to just use the HEIF tiling mechanism and treat each coded image as a unit?

Note: clearly, we should support tiles in the codec when reading the image. The question is only whether we should provide some means to control the tiling at the codec level when writing the image and whether we should provide some means to decode these images only partially.

#1180

@farindk
Contributor Author

farindk commented Jun 25, 2024

Copy of @bradh's answer in #1180:

It should be possible to write tiles one-by-one. There is a way to say "each of these extents needs to be done separately", and to specify the range. That opens up the possibility to recycle the bits in one tile that is used to fill many grid spots (e.g. when using a map, and there is a lot of blue ocean; or when making black fill tiles for an image that is rotated to be north-up).

Ok. Now the question still is whether there is a use case to expose this tile organization of the codec in the libheif API. I mean, "reuse of tiles" could also be done with the methods of the HEIF grid image. (Or it could be done implicitly by the uncompressed codec as some form of compression.)
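To make the "reuse of tiles" idea concrete, here is a minimal sketch of how a grid's ID list could point many cells at one shared fill tile. The type `item_id_t` and the function `build_grid_ids` are hypothetical helpers, not part of libheif, and whether a given writer accepts repeated IDs in a grid is an assumption that would need checking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an item ID (libheif uses an unsigned 32-bit ID). */
typedef uint32_t item_id_t;

/*
 * Fill a row-major grid of item IDs. Every cell flagged in is_fill refers to
 * the same shared tile (fill_id), so its coded data needs to be stored once.
 * content_ids holds one ID per non-fill cell, in row-major order.
 * Returns the number of content IDs consumed.
 */
size_t build_grid_ids(const uint8_t* is_fill, size_t columns, size_t rows,
                      item_id_t fill_id, const item_id_t* content_ids,
                      item_id_t* out_ids)
{
  assert(is_fill && content_ids && out_ids);

  size_t next = 0;
  for (size_t i = 0; i < columns * rows; i++) {
    out_ids[i] = is_fill[i] ? fill_id : content_ids[next++];
  }
  return next;
}
```

For a map that is mostly ocean, only the land tiles would need their own coded items; all other cells would repeat the ID of a single blue tile.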

The design would be much easier if we do not have to handle grids at both the HEIF and the codec layer.

@silverbacknet

The main use case of tiling at the codec level is parallel processing, huge sizes, or splitting into regions (like half screen content, half talking head), all of which can be more easily and flexibly achieved at the heif tiling level imho. (H.264/AVC has tiling too, as slices.) I'd say the only reason to dive into codec level tiles is if you want to extract them into heif tiles, and there's no reason to expose them for encoding.

With heif, you aren't even limited to using the same encoder for tiles, which is one of the reasons I'm in love with the format. It might be simpler to use in some cases than a layered derived image.

@bradh
Contributor

bradh commented Jun 26, 2024

For a case where you might need both:

image

The grid is multiple cameras, and each camera is a high res image producer, so you'd like to tile the (potentially uncompressed or generically compressed) product from each camera.

There are also remote sensing systems (aka satellites with cameras) with multiple telescopes and multiple large sensors per camera. That might or might not make sense as a grid with fill pixels.

@farindk
Contributor Author

farindk commented Jun 26, 2024

When the images of several cameras are combined into one HEIF, the codec tiles might also be 'lifted up' into HEIF tiles. Probably even without touching the data by using appropriate extents. Moreover, I would assume that in order to combine the images from several cameras, the images have to be geometrically transformed anyway to be stitched seamlessly.

But let's take the "worst case": each camera does its own geometric transform, produces its own stream, and the HEIF is combining them together via dref into a grid. The implementation could:

  • check at the HEIF level which tiles are covered by the ROI,
  • pass the intersection of the ROI and the tile area to the decoder, and
  • let the decoder decode only that part of the image (if the codec supports this).
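The first two steps of this list can be sketched as plain rectangle arithmetic. This is only an illustration under the assumption of a regular grid with equally sized tiles; `rect_t`, `tile_range_t`, `tiles_covered_by_roi` and `roi_within_tile` are hypothetical names, not libheif API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper types: a pixel rectangle and an inclusive tile range. */
typedef struct { uint32_t x, y, w, h; } rect_t;
typedef struct { uint32_t col0, row0, col1, row1; } tile_range_t;

static uint32_t min_u32(uint32_t a, uint32_t b) { return a < b ? a : b; }
static uint32_t max_u32(uint32_t a, uint32_t b) { return a > b ? a : b; }

/* Step 1: which tile columns/rows does the ROI touch? */
tile_range_t tiles_covered_by_roi(rect_t roi, uint32_t tile_w, uint32_t tile_h)
{
  assert(tile_w > 0 && tile_h > 0 && roi.w > 0 && roi.h > 0);

  tile_range_t r;
  r.col0 = roi.x / tile_w;
  r.row0 = roi.y / tile_h;
  r.col1 = (roi.x + roi.w - 1) / tile_w;
  r.row1 = (roi.y + roi.h - 1) / tile_h;
  return r;
}

/* Step 2: intersection of the ROI with one tile, in tile-local coordinates.
 * This is the area that would be handed to the tile's decoder. */
rect_t roi_within_tile(rect_t roi, uint32_t col, uint32_t row,
                       uint32_t tile_w, uint32_t tile_h)
{
  uint32_t tx0 = col * tile_w;
  uint32_t ty0 = row * tile_h;

  uint32_t x0 = max_u32(roi.x, tx0);
  uint32_t y0 = max_u32(roi.y, ty0);
  uint32_t x1 = min_u32(roi.x + roi.w, tx0 + tile_w);
  uint32_t y1 = min_u32(roi.y + roi.h, ty0 + tile_h);

  /* The caller must pass a tile from the covered range. */
  assert(x1 > x0 && y1 > y0);

  rect_t out = { x0 - tx0, y0 - ty0, x1 - x0, y1 - y0 };
  return out;
}
```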

That helps with fast decoding, but it doesn't help with reducing the file size for interactive network streaming.

For network streaming I still think the tiling should be lifted into the HEIF container on the server in an (offline) reformatting step. That might be needed anyway for building the resolution pyramid.

@farindk
Contributor Author

farindk commented Jul 19, 2024

If we support tiles at the codec level (primarily 23001-17), we need a different API than the one currently drafted in the file-layout branch. As the codec-level tiles have no heif_item_id, we need a different way to address them, and we cannot use the standard heif_decode_image to get them.

Another question is: how can we build the 23001-17 tiled image tile by tile when it is not possible to hold the whole image in memory? Do we have to allocate the file space for the whole image and fill it in step by step? How could that be done when the data is compressed and thus of unknown size? Does 23001-17 have pointers to the tile start positions, or do the tiles have to come one after another in the file?
@bradh could you please give some insights into how this might work with 23001-17.

@bradh
Contributor

bradh commented Jul 19, 2024

iloc extents should be easy to predict when there is no compression and you know the tile size, number of components and bit depth + padding + alignment.
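As a rough illustration of why the extents are predictable, the sketch below computes the byte size and offset of a tile for one simplified layout: pixel-interleaved components, each sample stored in whole bytes, optional row and tile alignment, and tiles stored back to back. ISO 23001-17 allows other interleaving and packing options that this does not cover, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Round value up to a multiple of align; align of 0 or 1 means "no alignment". */
static uint64_t round_up(uint64_t value, uint64_t align)
{
  return align <= 1 ? value : (value + align - 1) / align * align;
}

/* Size of one uncompressed tile under the simplified layout described above. */
uint64_t tile_size_bytes(uint32_t tile_w, uint32_t tile_h,
                         uint32_t components, uint32_t bit_depth,
                         uint32_t row_align, uint32_t tile_align)
{
  assert(tile_w > 0 && tile_h > 0 && components > 0 && bit_depth > 0);

  uint64_t bytes_per_sample = (bit_depth + 7) / 8;   /* assumes byte-aligned samples */
  uint64_t row_bytes = round_up((uint64_t)tile_w * components * bytes_per_sample,
                                row_align);
  return round_up(row_bytes * tile_h, tile_align);
}

/* Offset of a tile relative to the first tile, assuming equal-sized tiles
 * stored back to back in index order. */
uint64_t tile_offset_bytes(uint32_t tile_index, uint64_t tile_size)
{
  return (uint64_t)tile_index * tile_size;
}
```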

The proposal for generic compression is in flux, but there will almost certainly be an associated property that will provide the compressed extents. That was icbr in the implementation I did (when it looked more stable than now).

@farindk
Contributor Author

farindk commented Jul 19, 2024

Right, iloc extents should make it possible to insert tiles into the file even if they are inserted non-sequentially.
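A minimal sketch of that idea: each tile's data is appended wherever the file currently ends, and a per-tile extent (offset, length) records where it landed, so the write order does not have to match the tile order. `extent_t` and `append_tile` are hypothetical names; the actual iloc box layout is not modeled here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-tile extent: where the tile's bytes live in the data area. */
typedef struct {
  uint64_t offset;
  uint64_t length;   /* 0 means "not written yet" in this sketch */
} extent_t;

/*
 * Record that tile_index was written as `length` bytes at the current end of
 * the data area. Returns the new end of the data area.
 */
uint64_t append_tile(extent_t* extents, uint32_t tile_count,
                     uint32_t tile_index, uint64_t length, uint64_t data_end)
{
  assert(tile_index < tile_count);
  assert(length > 0);
  assert(extents[tile_index].length == 0);   /* each tile is written once */

  extents[tile_index].offset = data_end;
  extents[tile_index].length = length;
  return data_end + length;
}
```

Because the length is stored per tile, this also works when the tiles are compressed and their sizes are only known after encoding.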

@NanGuoBean

Hi, I'm very interested in the topic you're discussing, particularly regarding how to encode and save extremely large images with limited memory.

I'm currently facing a specific challenge:

I need to store an image of billions of pixels. Most of this image consists of solid color blocks, so despite its large original size, the compressed file should be relatively small (around 10 MB). I've tried compressing the image in sections and then merging them together, but the merging process requires decoding to bitmap format, which exceeds memory limits.

Seeing your discussion about the tile-by-tile method, I think this might be a direction to solve my problem. Would the tile-by-tile method be applicable for the situation I've described?

Considering the characteristics of the image (large size but mostly solid color blocks), I thought many people would have the same problem when dealing with large images, but I couldn't find any discussion of it. :(

Are there other, more elegant solutions? Thanks!

@farindk
Contributor Author

farindk commented Sep 24, 2024

@NanGuoBean I've just added the functions to build images tile by tile. You need the current master branch for this; the functions are not included in the current release v1.18.2.

First add all the tile images to the heif_context and then use this function to combine them into a grid image:

struct heif_error heif_context_add_grid_image(struct heif_context* ctx,
                                              uint32_t image_width,
                                              uint32_t image_height,
                                              uint32_t tile_columns,
                                              uint32_t tile_rows,
                                              const heif_item_id* image_ids,
                                              struct heif_image_handle** out_grid_image_handle);

You might then want to make the grid image the primary image.
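When preparing the arguments for such a grid, the tile counts and the position of each tile in the ID array have to be computed from the image and tile sizes. The following is a small sketch, assuming that the IDs are expected in row-major order (as in the HEIF grid item definition) and that border tiles may extend past the image edge; `tiles_needed` and `grid_index` are hypothetical helpers, not libheif functions.

```c
#include <assert.h>
#include <stdint.h>

/* Number of tiles needed to cover image_size pixels (ceiling division,
 * written so that it cannot overflow). */
uint32_t tiles_needed(uint32_t image_size, uint32_t tile_size)
{
  assert(tile_size > 0);
  return image_size / tile_size + (image_size % tile_size != 0);
}

/* Index into the ID array for the tile at (column, row), assuming the IDs
 * are listed row by row, left to right. */
uint32_t grid_index(uint32_t column, uint32_t row, uint32_t tile_columns)
{
  assert(column < tile_columns);
  return row * tile_columns + column;
}
```

For example, a 100000-pixel-wide image with 512-pixel tiles needs 196 tile columns, with the last column only partly covered by image content.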

On the decoding side, you can query how the image is tiled using

struct heif_image_tiling heif_image_handle_get_image_tiling(const struct heif_image_handle* handle);

and then you can decode individual tiles with this function:

struct heif_error heif_image_handle_decode_image_tile(const struct heif_image_handle* in_handle,
                                                      struct heif_image** out_img,
                                                      enum heif_colorspace colorspace,
                                                      enum heif_chroma chroma,
                                                      const struct heif_decoding_options* options,
                                                      uint32_t x0, uint32_t y0);

Note that this is work in progress. Everything works well already, but the API might still slightly change for the official release.

There are two more methods that use a different internal representation (heif_context_add_tild_image, heif_context_add_unci_image), but they are still experimental.

@farindk
Contributor Author

farindk commented Sep 25, 2024

I think we can close this issue as we now have the implementation from #1318.
H.265 codec-level tiles seem irrelevant to me at this moment and would be much harder to do.

@farindk farindk closed this as completed Sep 25, 2024