Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hdf5 improvements #7

Open
wants to merge 114 commits into
base: master
Choose a base branch
from
Open

Conversation

johltn
Copy link

@johltn johltn commented Feb 3, 2022

No description provided.


string(REGEX REPLACE "([^;]+)" "${IFC_LIBRARY_DIR}/\\1.lib" IFC_LIBRARIES "${IFC_LIBRARY_NAMES}")

if (${ifcopenshell_major_version} STREQUAL "0.5")
set(ICU_LIBRARY_NAMES icuuc icudt)
set(ICU_LIBRARY_DIR ${IFCOPENSHELL_ROOT}/deps-vs${MSVC_YEAR}-x86-installed/icu/lib)
set(ICU_INCLUDE_DIR ${IFCOPENSHELL_ROOT}/deps-vs${MSVC_YEAR}-x86-installed/icu/include)
set(ICU_LIBRARY_DIR ${IFCOPENSHELL_ROOT}/_deps-vs${MSVC_YEAR}-${WIN_ARCH}-installed/icu/lib)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add the time of the 0.5 build script the dir was still called deps not _deps can you change that back?


// Write a 4D dataset

const H5std_string FILE_NAME("multidim.h5");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does the remainder of this test do? Should we open multi_dim_vox.h5 with RDONLY and check whether the dimensions and ranks of the datasets make sense?

Or even better: let's say we compare storage3->count() with std::accumulate(dataset.read()) (oversimplified the last) that way we have an actual test with ASSERT_EQUAL.

@@ -0,0 +1,105 @@
#include "../voxelizer.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the significant difference between this one and test_hdf?

class voxel_writer {
private:

#define BEGIN_LOOP(i0, i1, j0, j1, k0, k1) {\
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are already defined in storage.h which is included

BEGIN_LOOP(size_t(0), nchunks_x, 0U, nchunks_y, 0U, nchunks_z)
auto c = storage->get_chunk(ijk);

if (c == nullptr) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make the whitespace formatting a bit consistent here?

The general rule (although undocumented still) for this project is } else { and the indentation within BEGIN_LOOP is also not consistent.


const int RANK = 4;

hsize_t dimsf[4];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find all these variable declarations not immensely useful. I know they do it in the samples, but for me it creates an extra indirection between NX and storage->chunk_size(), I'd rather have a more meaningful name for the variables so that we understand the intent.

I'd also prefer the syntax:

hsize_t dimsf[4] = {continuous_count, chunk_size, chunk_size, chunk_size };

And please remove all the unnecessary spaces between type and variable name. If any of them remain after the reorganization.

hsize_t	   NZ
       ^^^^^^

dimsf[RANK] for consistency

With this advise most of the lines can be eliminated making it much more clear for the casual reader what's going on.

std::vector<uint32_t> hslab;
c->Get(ijk, &v);
hslab.push_back(v);
dataset.write(hslab.data(), H5::PredType::NATIVE_INT, mspace2, dataspace);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem correct, uint32_t is not the same as NATIVE_INT, I'm quite sure this will be completely wrong in 64bit systems. In case c->value_bits() == 32 , the dataset type should be H5::PredType::NATIVE_UINT32

offset[1] = ijk.get(0);
offset[2] = ijk.get(1);
offset[3] = ijk.get(2);
dataspace.selectHyperslab(H5S_SELECT_SET, slab_dimsf, offset);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest I'm not entirely sure of the performance overhead, could be minimal, could be huge, but writing every individual voxel to file separately seems intuitively to me to be very inefficient. HDF5 is not a simple format, it does a lot of stuff on reading and writing witch caching and chunking. So it seems to me that this will create a lot of overhead for individual writes. Even for simple formats it's faster to write large chunks of data at the same time.

I propose to leave it as is for now though, but please make a comment that we should investigate the performance of this.

if (c->value_bits() == 32) {

uint32_t v;
std::vector<uint32_t> hslab;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the very least for now, let's not create a vector, if you know it's always size one, just use a single uint32_t v and write that directly dataset.write(&v, ...

}

else {
std::vector<int> hslab = { c->Get(ijk) };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd make this more explicit.

int v = c->Get(ijk) ? 1 : 0.

So that it's clear what value we're using for the booleans (even though it's super obvious).

Again

dataset.write(&v, ...

In this case it seems wasteful to use NATIVE_INT (64bit) for a single bit. I would use NATIVE_UINT8, which corresponds to the c++ type uint8_t (8bit)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants