Rsdk 9697 support multiple nvs #376

Merged: 11 commits into viamrobotics:main, Jan 13, 2025

Conversation

npmenard (Member) commented Jan 8, 2025:

Implements the storage traits for any type we can iterate over; we can then easily pass an array or a Vec as storage to the viam server.
With this, something like
let storage = vec![ NVSStorage::new("nvs").unwrap(), NVSStorage::new("nvs2").unwrap(), ];
will allow us to store the robot data across multiple NVS partitions.

I didn't want to add an extra type to represent a collection of storages, and this way the logic is driven by the underlying storage implementation.

@npmenard npmenard requested a review from a team as a code owner January 8, 2025 20:51
}

#[derive(Clone)]
pub struct NVSStorage {
// esp-idf-svc partition driver ensures that only one handle of a type can be created
// so inner mutability can be achieved safely with RefCell
nvs: Rc<RefCell<EspCustomNvs>>,
partition_name: CString,
npmenard (Member, Author): Store a copy of the partition rather than the partition name, to avoid cloning a CString as we pass the Storage through different parts of the code.

fn reset_ota_metadata(&self) -> Result<(), Self::Error>;
}

pub trait StorageDiagnostic {
fn log_space_diagnostic(&self);
}

#[derive(Error, Debug)]
#[error("empty storage collection")]
pub struct EmptyStorageCollectionError;
npmenard (Member, Author), Jan 8, 2025: This is a special error to handle the case where a collection is empty.

@@ -158,6 +163,22 @@ struct RAMCredentialStorageInner {
#[derive(Default, Clone)]
pub struct RAMStorage(Rc<Mutex<RAMCredentialStorageInner>>);

#[derive(Error, Debug)]
pub enum RAMStorageError {
npmenard (Member, Author): Add an error type for RAMStorage to make it easier to test.

self.into_iter().any(OtaMetadataStorage::has_ota_metadata)
}
fn get_ota_metadata(&self) -> Result<OtaMetadata, Self::Error> {
self.into_iter().fold(
npmenard (Member, Author), Jan 8, 2025: The idea here is to use fold to call get_ota_metadata until Ok is returned. If all storages in a collection return an error, only the last error is reported.
We don't have an iterator adapter that behaves like try_fold but short-circuits on the first Ok, so we will iterate through the whole collection even after a get_ota_metadata call has succeeded; get_ota_metadata will, however, not be called again on the remaining storages once it has succeeded.
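For illustration, a minimal standalone sketch of this fold-until-Ok behavior, using plain function pointers in place of the real storage trait (everything here is hypothetical, not micro-rdk code):

// Fold an iterator of fallible calls, keeping the first Ok.
// `or_else` skips its closure once the accumulator is Ok, so later calls are
// not invoked, even though the fold still visits every element.
fn first_ok<T, E>(
    calls: impl IntoIterator<Item = fn() -> Result<T, E>>,
    empty_collection_error: E,
) -> Result<T, E> {
    calls
        .into_iter()
        .fold(Err(empty_collection_error), |acc, call| acc.or_else(|_| call()))
}

fn main() {
    fn fail_a() -> Result<i32, &'static str> { Err("error from A") }
    fn fail_b() -> Result<i32, &'static str> { Err("error from B") }
    fn succeed() -> Result<i32, &'static str> { Ok(42) }

    // fail_a runs, succeed supplies the value, the trailing fail_b is never called.
    let calls: Vec<fn() -> Result<i32, &'static str>> = vec![fail_a, succeed, fail_b];
    assert_eq!(first_ok(calls, "empty collection"), Ok(42));

    // With only failures, the last error is the one reported.
    let calls: Vec<fn() -> Result<i32, &'static str>> = vec![fail_a, fail_b];
    assert_eq!(first_ok(calls, "empty collection"), Err("error from B"));
}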

{
type Error = Storage::Error;
fn has_ota_metadata(&self) -> bool {
self.into_iter().any(OtaMetadataStorage::has_ota_metadata)
npmenard (Member, Author): Short-circuit on the first true.

#[cfg(feature = "ota")]
impl<Iterable, Storage: OtaMetadataStorage> OtaMetadataStorage for Iterable
where
for<'a> &'a Iterable: IntoIterator<Item = &'a Storage>,
npmenard (Member, Author), Jan 8, 2025: This will only work on collections that implement impl<'a, T> IntoIterator for &'a ...
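For reference, a minimal self-contained sketch of that bound with a made-up trait; it mirrors the shape of the impl above, but none of these names come from micro-rdk:

// Hypothetical storage trait standing in for the real storage traits.
trait HasKey {
    fn has_key(&self, key: &str) -> bool;
}

struct MemStorage {
    keys: Vec<String>,
}

impl HasKey for MemStorage {
    fn has_key(&self, key: &str) -> bool {
        self.keys.iter().any(|k| k == key)
    }
}

// Blanket impl for any collection whose reference is iterable over storages,
// i.e. anything providing `impl<'a> IntoIterator for &'a Collection`.
impl<Iterable, Storage: HasKey> HasKey for Iterable
where
    for<'a> &'a Iterable: IntoIterator<Item = &'a Storage>,
{
    fn has_key(&self, key: &str) -> bool {
        // short-circuits on the first storage that reports the key
        self.into_iter().any(|s| s.has_key(key))
    }
}

fn main() {
    let storages = vec![
        MemStorage { keys: vec![] },
        MemStorage { keys: vec!["robot_id".to_string()] },
    ];
    // Vec<MemStorage> and [MemStorage; N] both satisfy the bound.
    assert!(storages.has_key("robot_id"));
}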

acmorrow (Member):
I have one small concern here. I don't have a fully worked counterexample, but I wonder a bit about using something like a HashSet as the type backing IntoIterator, because the iteration order isn't deterministic.

@@ -136,7 +137,7 @@ impl StorageDiagnostic for NVSStorage {
fn log_space_diagnostic(&self) {
let mut stats: nvs_stats_t = Default::default();
if let Err(err) =
esp!(unsafe { nvs_get_stats(self.partition_name.as_ptr(), &mut stats as *mut _) })
esp!(unsafe { nvs_get_stats(self.part.handle() as _, &mut stats as *mut _) })
npmenard (Member, Author): The handle is the partition name.

fn reset_ota_metadata(&self) -> Result<(), Self::Error> {
self.into_iter().fold(
Err::<_, Self::Error>(EmptyStorageCollectionError.into()),
|val, s| val.or(s.reset_ota_metadata()),
npmenard (Member, Author): Taking advantage of the eagerness of or might not be the best approach.

acmorrow (Member): Can you elaborate a bit? I think I know what you are saying, but I'd rather not guess.

npmenard (Member, Author), Jan 8, 2025: As opposed to or_else, or will always evaluate its argument (because it doesn't accept a closure) before considering whether to replace val, hence the eagerness. We use it to our advantage to make sure all copies of an object are erased across all storages. In theory we shouldn't need this, since we use or_else elsewhere, so only one copy should be in the storage array at any time.
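A tiny self-contained demonstration of that eager-vs-lazy difference (not micro-rdk code):

fn noisy(tag: &str) -> Result<i32, String> {
    println!("evaluated {tag}"); // shows when the argument is actually computed
    Ok(1)
}

fn main() {
    let already_ok: Result<i32, String> = Ok(0);

    // `or` takes a value, so its argument is evaluated before `or` even runs:
    // "evaluated eager" is printed although `already_ok` is already Ok.
    let _ = already_ok.clone().or(noisy("eager"));

    // `or_else` takes a closure that only runs when the receiver is Err,
    // so "evaluated lazy" is never printed here.
    let _ = already_ok.or_else(|_| noisy("lazy"));
}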

acmorrow (Member) left a comment: This is neat, LGTM, a few small comments. I'd like to think about it a bit tomorrow before it merges so I can make sure I totally understand how the various folds are working and think about whether there are any tricky edge cases.

micro-rdk/src/common/credentials_storage.rs (resolved)
let mut inner_ref = self.0.lock().unwrap();
let _ = inner_ref.robot_creds.insert(creds);
Ok(())
}
fn get_robot_credentials(&self) -> Result<RobotCredentials, Self::Error> {
let inner_ref = self.0.lock().unwrap();
let cfg = inner_ref.robot_creds.clone().unwrap_or_default().clone();
Ok(cfg)
log::info!("get robot creds");
acmorrow (Member): Stray log, or debug!?

As an aside, I do miss some of the logging we used to have pre #316 that gave a little more context about progression through the startup sequence, whether we were using cached credentials or configs, etc. I'm usually for "silence is golden" but in this case that's very useful information to understand how things are coming up, especially in offline mode. Would you be OK with a ticket to re-introduce some of that logging? Or did you not propagate into the refactored world because you found it too chatty?

npmenard (Member, Author): Stray log, I will remove it.
Yes, let's make a ticket for some logging housekeeping.

acmorrow (Member): Done, in https://viam.atlassian.net/browse/RSDK-9715. Feel free to add more things you'd like to see added to a logging pass.

}
fn reset_robot_credentials(&self) -> Result<(), Self::Error> {
let mut inner_ref = self.0.lock().unwrap();
let _ = inner_ref.robot_creds.take();
log::info!("reset robot creds");
acmorrow (Member): As above.

assert!(!v.has_robot_credentials());
assert!(ram2
.store_robot_credentials(&CloudConfig {
app_address: "http://downloadramstorage.org".to_owned(),
acmorrow (Member): maybe .example.org

npmenard (Member, Author): Yes, that was for the joke :P

acmorrow (Member) commented Jan 8, 2025:

How do you envision configuring the space of available NVS, for, say, callers of viam_server_start?

npmenard (Member, Author) commented Jan 8, 2025:

> How do you envision configuring the space of available NVS, for, say, callers of viam_server_start?

What do you mean?

acmorrow (Member) commented Jan 9, 2025:

> > How do you envision configuring the space of available NVS, for, say, callers of viam_server_start?
>
> What do you mean?

As discussed in person, I was interested in how you saw the ability to configure the selection of available partitions expressed in the FFI.

acmorrow (Member) left a comment: LGTM, though I'm interested in your thoughts about my iteration order worry.

self.set_string(NVS_ROBOT_ID_KEY, &cfg.id)?;
self.set_string(NVS_ROBOT_APP_ADDRESS, &cfg.app_address)?;
self.set_string(NVS_ROBOT_ID_KEY, &cfg.id).or_else(|err| {
let _ = self.reset_robot_credentials();
acmorrow (Member):
  • By using reset_robot_credentials, I think there is now a requirement that it erase keys in the same order that store_robot_credentials writes them, since a failed erase_key ends the method. Maybe it is better to explicitly just unset the keys as acted on so far in the store methods rather than relying on ordering in a different method? Accumulate some guards?
  • ⬇️ - Is it a bug that reset_robot_credentials doesn't clear NVS_ROBOT_APP_ADDRESS?
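A rough sketch of the first bullet's suggestion, assuming a hypothetical key/value trait in place of the real NVS wrapper (the trait, method, and key names below are made up):

// Hypothetical key/value interface standing in for the NVS storage wrapper.
trait KeyValue {
    type Error;
    fn set_string(&self, key: &str, value: &str) -> Result<(), Self::Error>;
    fn erase_key(&self, key: &str) -> Result<(), Self::Error>;
}

// Store two keys; if the second write fails, erase only the key already written
// in this call, then surface the original error. No reliance on a separate reset
// method erasing keys in the same order they were stored.
fn store_credentials<S: KeyValue>(s: &S, id: &str, app_address: &str) -> Result<(), S::Error> {
    s.set_string("robot_id", id)?;
    if let Err(err) = s.set_string("robot_app_address", app_address) {
        let _ = s.erase_key("robot_id"); // best-effort rollback of the partial write
        return Err(err);
    }
    Ok(())
}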

@@ -284,7 +315,22 @@ pub unsafe extern "C" fn viam_server_start(ctx: *mut viam_server_context) -> via
#[cfg(not(target_os = "espidf"))]
let storage = micro_rdk::common::credentials_storage::RAMStorage::default();
#[cfg(target_os = "espidf")]
let storage = micro_rdk::esp32::nvs_storage::NVSStorage::new("nvs").unwrap();
//let storage = {micro_rdk::esp32::nvs_storage::NVSStorage::new("nvs").unwrap()};
acmorrow (Member): Remove commented line.

@@ -257,7 +288,7 @@ pub unsafe extern "C" fn viam_server_start(ctx: *mut viam_server_context) -> via
use micro_rdk::common::credentials_storage::RAMStorage;
use micro_rdk::common::credentials_storage::RobotConfigurationStorage;
use micro_rdk::proto::provisioning::v1::CloudConfig;

log::info!("when the FFI was built a robot config was supplied, defaulting to this robot");
acmorrow (Member): Maybe take this out for now but add a TODO(RSDK-9715), which is my init logging ticket, so we can do this in coordination with other startup logging changes?

@@ -310,6 +310,18 @@ void app_main(void)
return;
}

ret = viam_server_add_nvs_storage(viam_ctx, "nvs");
if (ret != VIAM_OK) {
ESP_LOGE(TAG,"couldn't set add nvs partition, error : %i", ret);
acmorrow (Member): space after , here and below please

acmorrow (Member) commented:

> LGTM, though I'm interested in your thoughts about my iteration order worry.

I think the fix you've made here about ensuring that writes of multiple keys are all-or-nothing in a given partition is necessary. But I'm still a little concerned about iteration order determinism over the set of storages. Per my Slack message:

The situation I had in mind looked something like this: you have NVS partitions named A and B, but they are referenced via a HashSet, so the iteration order is indeterminate. If the firmware has some sort of unconditional write on startup (e.g. writing the robot config), you might come up with A iterated first on one boot and B iterated first on another, so the value ends up stored in both. If those values differ, and you then come up in a mode where you need to consume the value (e.g. offline), you may get either value depending on whether you iterate A or B first, and potentially one of them is out of date.

I haven’t totally convinced myself that this is possible or a real problem, but it has me a little hesitant.

Maybe it can easily be fixed by iterating in sorted order, based on partition name or address?

Or by requiring the partitions to be paired with “priorities”, so we can say “prefer the nvs partition, but spill to config if needed”. Then just iterate sorted on prio, and it wouldn’t matter if the container supporting iteration was non-deterministic. Setting prio though wouldn’t be 100% as you could accidentally set two with prio N and then you are back to non-determinism.
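A hypothetical sketch of that "priorities" idea, not anything in this PR: pair each storage with a priority and sort before handing the collection to the blanket impls, so the backing container's iteration order stops mattering:

// Hypothetical wrapper pairing a storage with an explicit priority.
struct Prioritized<S> {
    priority: u8, // lower value = preferred partition
    storage: S,
}

// Return the storages in deterministic, priority-sorted order, regardless of how
// the original container iterates. As noted above, equal priorities would
// reintroduce non-determinism among the tied storages.
fn in_priority_order<S>(mut storages: Vec<Prioritized<S>>) -> Vec<S> {
    storages.sort_by_key(|p| p.priority);
    storages.into_iter().map(|p| p.storage).collect()
}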

npmenard (Member, Author) commented:

> LGTM, though I'm interested in your thoughts about my iteration order worry.
>
> I think the fix you've made here about ensuring that writes of multiple keys are all-or-nothing in a given partition is necessary. But I'm still a little concerned about iteration order determinism over the set of storages. Per my Slack message:
>
> The situation I had in mind looked something like this: you have NVS partitions named A and B, but they are referenced via a HashSet, so the iteration order is indeterminate. If the firmware has some sort of unconditional write on startup (e.g. writing the robot config), you might come up with A iterated first on one boot and B iterated first on another, so the value ends up stored in both. If those values differ, and you then come up in a mode where you need to consume the value (e.g. offline), you may get either value depending on whether you iterate A or B first, and potentially one of them is out of date.
>
> I haven't totally convinced myself that this is possible or a real problem, but it has me a little hesitant.
>
> Maybe it can easily be fixed by iterating in sorted order, based on partition name or address?
>
> Or by requiring the partitions to be paired with "priorities", so we can say "prefer the nvs partition, but spill to config if needed". Then just iterate sorted on prio, and it wouldn't matter if the container supporting iteration was non-deterministic. Setting prio though wouldn't be 100% as you could accidentally set two with prio N and then you are back to non-determinism.

I think this can be an issue only when storing the config or certificate, since we don't explicitly erase these objects when storing them during viam server initialization; we implicitly expect the storage driver to overwrite.
Everywhere else we either check whether an object is in storage or erase it unilaterally.
We can add a ticket to change the initialization behavior to do an erase before storing, but that would increase the wear on the flash.

@npmenard npmenard requested a review from acmorrow January 13, 2025 18:26
acmorrow (Member) left a comment: LGTM

@npmenard npmenard merged commit 0cbf5b1 into viamrobotics:main Jan 13, 2025
6 checks passed