Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend the netcdf API to support programmatic changes to the plugin search path #3034

Merged
merged 16 commits into from
Nov 13, 2024

Conversation

DennisHeimbigner
Copy link
Collaborator

@DennisHeimbigner DennisHeimbigner commented Sep 30, 2024

Replaces PR #3024 and PR #3033

As suggested by Ed Hartnett, This PR extends the netcdf.h API to support programmatic control over the search path used to locate plugins.

I created several different APIs, but finally settled on the following API as being the simplest possible. It does have the disadvantage that it requires use of a global lock (not implemented) if used in a threaded environment.

Specifically, note that modifying the plugin path must be done "atomically". That is, in a multi-threaded environment, it is important that the sequence of actions involved in setting up the plugin path must be done by a single processor or in some other way as to guarantee that two or more processors are not simultaneously accessing the plugin path get/set operations.

As an example, assume there exists a mutex lock called PLUGINLOCK. Then any processor accessing the plugin paths should operate as follows:

lock(PLUGINLOCK);
nc_plugin_path_get(...);
<rebuild plugin path>
nc_plugin_path_set(...);
unlock(PLUGINLOCK);

Internal Architecture

It is assumed here that there only needs to be a single set of plugin path directories that is shared by all filter code and is independent of any file descriptor; it is global in other words. This means, for example, that the path list for NCZarr and for HDF5 will always be the same.

However internally, processing the set of plugin paths depends on the particular NC_FORMATX value (NC_FORMATX_NC_HDF5 and NC_FORMATX_NCZARR, currently). So the nc_plugin_path_set function, will take the paths it is given and propagate them to each of the NC_FORMATX dispatchers to store in a way that is appropriate to the given dispatcher.

There is a complication with respect to the nc_plugin_path_get function. It is possible for users to bypass the netcdf API and modify the HDF5 plugin paths directly. This can result in an inconsistent plugin path between the value used by HDF5 and the global value used by netcdf-c. Since there is no obvious fix for this, we warn the user of this possibility and otherwise ignore it.

Test Changes

  • New tests
    a. unit_test/run_pluginpaths.sh -- was created to test this new capability.
    b. A new test utility has been added as unit_test/run_dfaltpluginpath.sh to test the default plugin path list.
  • New test support utilities
    a. unit_test/ncpluginpath.c -- report current state of the plugin path
    b. unit_test/tst_pluginpaths.c -- test program to support run_pluginpaths.sh

Documentation

  • A new file -- docs/pluginpath.md -- provides documentation of the new API. It includes some
    material taken for filters.md.

Other Major Changes

  1. Cleanup the whole plugin path decision tree. This is described in the docs/pluginpath.md document and summarized in Addendum 2 below.
  2. I noticed that the ncdump/testpathcvt.sh had been disabled, so fixed and re-enabled it. This necessitated some significant changes to dpathmgr.c.

Misc. Changes

  1. Add some path manipulation utilities to netcf_aux.h
  2. Fix some minor bugs in netcdf_json.h
  3. Convert netcdf_json.h and netcdf_proplist.h to BUILT_SOURCE.
  4. Add NETCDF_ENABLE_HDF5 as synonym for USE_HDF5
  5. Fix some size_t <-> int conversion warnings.
  6. Encountered and fixed the Windows \r\n problem in tst_pluginpaths.c.
  7. Cleanup some minor CMakeLists.txt problems.
  8. Provide an implementation of echo -n since it appears to not be
    available on all platforms.
  9. Add a property list mechanism to pass environmental information to filters.
  10. Cleanup Doxyfile.in
  11. Fixed a memory leak in libdap2; surprised that I did not find this earlier.

Addendum 1: Proposed API

The API makes use of a counted vector of strings representing the sequence of directories in the path. The relevant type definition is as follows.

typedef struct NCPluginList {size_t ndirs; char** dirs;} NCPluginList;

The API proposed in this PR looks like this (from netcdf-c/include/netcdf_filter.h).

  • int nc_plugin_path_ndirs(size_t* ndirsp);
    Arguments: ndirsp -- store the number of directories in this memory.

    This function returns the number of directories in the sequence if internal directories of the internal plugin path list.

  • int nc_plugin_path_get(NCPluginList* dirs);
    Arguments: dirs -- counted vector for storing the sequence of directies in the internal path list.

    This function returns the current sequence of directories from the internal plugin path list. Since this function does not modify the plugin path, it does not need to be locked; it is only when used to get the path to be modified that locking is required. If the value of dirs.dirs is NULL (the normal case), then memory is allocated to hold the vector of directories. Otherwise, use the memory of dirs.dirs to hold the vector of directories.

  • int nc_plugin_path_set(const NCPluginList* dirs);
    Arguments: dirs -- counted vector for providing the new sequence of directories in the internal path list.

    This function empties the current internal path sequence and replaces it with the sequence of directories argument. Using an ndirs argument of 0 will clear the set of plugin paths.

Addendum 2: Build-Time and Run-Time Constants.

Build-Time Constants

Table showing the build-time computation of NETCDF_PLUGIN_INSTALL_DIR and NETCDF_PLUGIN_SEARCH_PATH.
--with-plugin-dir--prefixNETCDF_PLUGIN_INSTALL_DIRNETCDF_PLUGIN_SEARCH_PATH
undefinedundefinedundefinedPLATFORMDEFALT
undefined<abspath-prefix><abspath-prefix>/hdf5/lib/plugin<abspath-prefix>/hdf5/lib/plugin<SEP>PLATFORMDEFALT
<abspath-plugins>N.A.<abspath-plugins><abspath-plugins><SEP>PLATFORMDEFALT
Table showing the computation of the initial global plugin path
HDF5_PLUGIN_PATHInitial global plugin path
undefinedNETCDF_PLUGIN_SEARCH_PATH
<path1;...pathn><path1;...pathn>

@CLAassistant
Copy link

CLAassistant commented Sep 30, 2024

CLA assistant check
All committers have signed the CLA.

@WardF WardF self-assigned this Sep 30, 2024
@WardF WardF added this to the 4.9.3 milestone Sep 30, 2024
@WardF
Copy link
Member

WardF commented Sep 30, 2024

Thanks @DennisHeimbigner I'll take a look at it tomorrow, along with playing catchup on other PR's which cropped up last week while I was at the retreat & committee meetings.

@WardF
Copy link
Member

WardF commented Oct 15, 2024

@DennisHeimbigner , did you want me to take a look at merging the changes in re: a custom prefix? I'm happy to, I just don't want to step on your toes/duplicate effort if you were already doing so.

@DennisHeimbigner
Copy link
Collaborator Author

I am in the process of fixing things, so hang on a bit.

DennisHeimbigner and others added 5 commits October 18, 2024 19:23
…earch path

Replaces PR Unidata#3024
         and PR Unidata#3033

re: Unidata#2753

As suggested by Ed Hartnett, This PR extends the netcdf.h API to support programmatic control over the search path used to locate plugins.

I created several different APIs, but finally settled on the following API as being the simplest possible. It does have the disadvantage that it requires use of a global lock (not implemented) if used in a threaded environment.

Specifically, note that modifying the plugin path must be done "atomically". That is, in a multi-threaded environment, it is important that the sequence of actions involved in setting up the plugin path must be done by a single processor or in some other way as to guarantee that two or more processors are not simultaneously accessing the plugin path get/set operations.

As an example, assume there exists a mutex lock called PLUGINLOCK. Then any processor accessing the plugin paths should operate as follows:
````
lock(PLUGINLOCK);
nc_plugin_path_get(...);
<rebuild plugin path>
nc_plugin_path_set(...);
unlock(PLUGINLOCK);
````
## Internal Architecture

It is assumed here that there only needs to be a single set of plugin path directories that is shared by all filter code and is independent of any file descriptor; it is global in other words. This means, for example, that the path list for NCZarr and for HDF5 will always be the same.

However internally, processing the set of plugin paths depends on the particular NC_FORMATX value (NC_FORMATX_NC_HDF5 and NC_FORMATX_NCZARR, currently). So the *nc_plugin_path_set* function, will take the paths it is given and propagate them to each of the NC_FORMATX dispatchers to store in a way that is appropriate to the given dispatcher.

There is a complication with respect to the *nc_plugin_path_get* function. It is possible for users to bypass the netcdf API and modify the HDF5 plugin paths directly. This can result in an inconsistent plugin path between the value used by HDF5 and the global value used by netcdf-c. Since there is no obvious fix for this, we warn the user of this possibility and otherwise ignore it.

## Test Changes
* New tests<br>
    a. unit_test/run_pluginpaths.sh -- was created to test this new capability.<br>
    b. A new test utility has been added as *unit_test/run_dfaltpluginpath.sh* to test the default plugin path list.
* New test support utilities<br>
    a. unit_test/ncpluginpath.c -- report current state of the plugin path<br>
    b. unit_test/tst_pluginpaths.c -- test program to support run_pluginpaths.sh

## Documentation
* A new file -- docs/pluginpath.md -- provides documentation of the new API. It includes some
  material taken fro filters.md.

## Other Major Changes
1. Cleanup the whole plugin path decision tree. This is described in the *docs/pluginpath.md* document and summarized in Addendum 2 below.
2. I noticed that the ncdump/testpathcvt.sh had been disabled, so fixed and re-enabled it. This necessitated some significant changes to dpathmgr.c.

## Misc. Changes
1. Add some path manipulation utilities to netcf_aux.h
2. Fix some minor bugs in netcdf_json.h
3. Convert netcdf_json.h and netcdf_proplist.h to BUILT_SOURCE.
4. Add NETCDF_ENABLE_HDF5 as synonym for USE_HDF5
5. Fix some size_t <-> int conversion warnings.
6. Encountered and fixed the Windows \r\n problem in tst_pluginpaths.c.
7. Cleanup some minor CMakeLists.txt problems.
8. Provide an implementation of echo -n since it appears to not be
   available on all platforms.
9. Add a property list mechanism to pass environmental information to filters.
10. Cleanup Doxyfile.in
11. Fixed a memory leak in libdap2; surprised that I did not find this earlier.

## Addendum 1: Proposed API

The API makes use of a counted vector of strings representing the sequence of directories in the path. The relevant type definition is as follows.
````
typedef struct NCPluginList {size_t ndirs; char** dirs;} NCPluginList;
````

The API proposed in this PR looks like this (from netcdf-c/include/netcdf_filter.h).

* ````int nc_plugin_path_ndirs(size_t* ndirsp);````
    Arguments: *ndirsp* -- store the number of directories in this memory.

    This function returns the number of directories in the sequence if internal directories of the internal plugin path list.

* ````int nc_plugin_path_get(NCPluginList* dirs);````
    Arguments:  *dirs* -- counted vector for storing the sequence of directies in the internal path list.

    This function returns the current sequence of directories from the internal plugin path list. Since this function does not modify the plugin path, it does not need to be locked; it is only when used to get the path to be modified that locking is required.  If the value of *dirs.dirs* is NULL (the normal case), then memory is allocated to hold the vector of directories. Otherwise, use the memory of *dirs.dirs* to hold the vector of directories.

* ````int nc_plugin_path_set(const NCPluginList* dirs);````
    Arguments: *dirs* -- counted vector for providing the new sequence of directories in the internal path list.

    This function empties the current internal path sequence and replaces it with the sequence of directories argument. Using an *ndirs* argument of 0 will clear the set of plugin paths.

## Addendum 2: Build-Time and Run-Time Constants.

### Build-Time Constants
<table style="border:2px solid black;border-collapse:collapse">
<tr style="outline: thin solid;" align="center"><td colspan="4">Table showing the build-time computation of NETCDF_PLUGIN_INSTALL_DIR and NETCDF_PLUGIN_SEARCH_PATH.</td>
<tr style="outline: thin solid" ><th>--with-plugin-dir<th>--prefix<th>NETCDF_PLUGIN_INSTALL_DIR<th>NETCDF_PLUGIN_SEARCH_PATH
<tr style="outline: thin solid" ><td>undefined<td>undefined<td>undefined<td>PLATFORMDEFALT
<tr style="outline: thin solid" ><td>undefined<td>&lt;abspath-prefix&gt;<td>&lt;abspath-prefix&gt;/hdf5/lib/plugin<td>&lt;abspath-prefix&gt;/hdf5/lib/plugin&lt;SEP&gt;PLATFORMDEFALT
<tr style="outline: thin solid" ><td>&lt;abspath-plugins&gt;<td>N.A.<td>&lt;abspath-plugins&gt;<td>&lt;abspath-plugins&gt;&lt;SEP&gt;PLATFORMDEFALT
</table>

<table style="border:2px solid black;border-collapse:collapse">
<tr style="outline: thin solid" align="center"><td colspan="2">Table showing the computation of the initial global plugin path</td>
<tr style="outline: thin solid"><th>HDF5_PLUGIN_PATH<th>Initial global plugin path
<tr style="outline: thin solid"><td>undefined<td>NETCDF_PLUGIN_SEARCH_PATH
<tr style="outline: thin solid"><td>&lt;path1;...pathn&gt;<td>&lt;path1;...pathn&gt;
</table>
@DennisHeimbigner
Copy link
Collaborator Author

DennisHeimbigner commented Oct 19, 2024

Ok, I fixed things to include the use of CMAKE_INSTALL_PREFIX that you suggested.
I also fixed up configure.ac to do the same thing for automake.
I turned out to be a more extensive change that I originally thought
because there was internal plugin code that also needed to be fixed
to properly hand this case.
I also added a test case for this.

@WardF
Copy link
Member

WardF commented Oct 22, 2024

Thanks Dennis, reviewing!

@WardF
Copy link
Member

WardF commented Nov 11, 2024

@DennisHeimbigner is it correct behavior that we don't want to interpret any default HDF5_PLUGIN_PATH/DIR environmental variable, now? Just confirming, before merging this.

@DennisHeimbigner
Copy link
Collaborator Author

I don't understand the question. Can you elaborate?

@WardF
Copy link
Member

WardF commented Nov 11, 2024

Before this PR, CMakeLists.txt (for example) would check for HDF5_PLUGIN_PATH environmental variable, and if it is defined, would use it in the plugin directory logic. With your PR, the environmental variable HDF5_PLUGIN_PATH is no longer checked for or used when computing anything. Is that by design?

@DennisHeimbigner
Copy link
Collaborator Author

For better or worse, changed the rules.
Using HDF5_PLUGIN_PATH at build-time seems incorrect.
So, at build-time I use the install directory (as per your request)
plus the HDF5 defaults
to create the initial plugin path.
At run-time, it makes sense to use HDF5_PLUGIN_PATH.
So if it defined at run-time, then it completely replaces the
build-time path.
Then finally, any programmatic changes (using the API defined in this PR)
are applied to get the final set of directories to search for plugins.
If any of this seems incorrect to you, let me know.

@DennisHeimbigner
Copy link
Collaborator Author

BTW at build-time, there is an option --with-plugin-path that can be used instead of HDF5_PLUGIN_PATH

@edwardhartnett
Copy link
Contributor

I believe that @DennisHeimbigner is correct. HDF5_PLUGIN_PATH should be used at runtime, but not at build time.

@WardF
Copy link
Member

WardF commented Nov 12, 2024

Sounds good, thanks! Given the size of this PR, I just wanted to make sure something hadn't been overlooked.

@@ -3,7 +3,7 @@
# 2015, 2016, 2017, 2018
# University Corporation for Atmospheric Research/Unidata.

# See netcdf-c/COPYRIGHT file for more info.
# See netcdffff-c/COPYRIGHT file for more info.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll fix this typo, so far it's the only thing I've found that obviously needs to be fixed :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, are you still running AWS-based tests on your end? The are not currently part of our CI, and I will need to correct that.

Copy link
Member

@WardF WardF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, fixing the typo and then will check a basic AWS test.

Copy link
Member

@WardF WardF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Dennis, this looks great! The update to the documentation is particularly appreciated. Thanks!

@WardF WardF merged commit fb75ad6 into Unidata:main Nov 13, 2024
107 checks passed
@WardF
Copy link
Member

WardF commented Nov 13, 2024

Well, now that I've merged this and have moved on to some other things, I now have to wonder what the impact of this on netcdf-fortran will be. I'll test that tomorrow, but I suspect we will need to make some changes there as well, if we want Fortran to continue to use plugins. Currently, at configure time, netCDF-Fortran looks for HDF5_PLUGIN_PATH environmental variable. Ideas welcome @DennisHeimbigner @edwardhartnett

@DennisHeimbigner
Copy link
Collaborator Author

I always forget about fortran and C++.
One simple solution is to add the option --with-plugin-dir=$HDF5_PLUGIN_PATH (automake)
or -DNETCDF_PLUGIN_DIR=$HDF5_PLUGIN_PATH (cmake) to the build process.

@WardF
Copy link
Member

WardF commented Nov 14, 2024

Ideally Dennis, we can add the interface to netCDF-Fortran to invoke these pathing calls you added to the C library. In the meantime, as I prep the next release, we'll have to rely on setting HDF5_PLUGIN_PATH as we've been doing in Fortran. But that may not actually be the correct thing at this point, we'll need to sort that out.

@WardF
Copy link
Member

WardF commented Nov 14, 2024

I just noticed that nc-config also isn't working quite right, I'm going to put up a PR here after I untangle this a bit and then we can figure out netcdf-fortran.

WardF added a commit to WardF/netcdf-c that referenced this pull request Nov 14, 2024
@DennisHeimbigner
Copy link
Collaborator Author

ok, we can talk about it at the next netcdf meeting (or some other time if you prefer).

WardF added a commit that referenced this pull request Nov 14, 2024
Update nc-config in support of changes made in #3034
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

set HDF5_PLUGIN_PATH programmatically
4 participants