
Logbook 2025 H1


March 2025

2025-03-12

SB on Model tests weird behavior

  • Currently I had to sprinkle threadDelay here and there in the model tests since otherwise they hang for a long time and eventually (I think) report the shrunk values that fail the test.

  • This problem is visible mainly in CI where fewer resources are available; locally the same tests pass.

  • If I remove the threadDelay the memory grows really big and I need to kill the process.

  • This started happening when I had to replace GetUTxO, which no longer exists, with queryState

  • I looked at this with NS and found out that we were failing to wait for all nodes to see a DecommitFinalized - we were waiting only for our own node to see it. This seemed to fix the model test and it was a bit surprising that it was this easy, since I expected a lot of problems finding out what went wrong.
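To make that concrete, here is a minimal, self-contained sketch of the idea; the names (waitUntilAllSee, nextOutput, Output) are illustrative stand-ins and not the actual model-test API:

```haskell
module WaitForAll where

import Control.Monad (forM_)

-- Stand-in output type; the real model tests match on the hydra server outputs.
data Output = DecommitFinalized | SomethingElse
  deriving (Eq, Show)

-- | Block until every node has seen the expected output. 'nextOutput' stands in
-- for whatever action pops the next observed output of a node.
waitUntilAllSee :: Monad m => (node -> m Output) -> Output -> [node] -> m ()
waitUntilAllSee nextOutput expected nodes =
  forM_ nodes $ \node ->
    let go = nextOutput node >>= \o -> if o == expected then pure () else go
     in go

-- usage (hypothetical): waitUntilAllSee nextOutput DecommitFinalized [n1, n2, n3]
```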

SB on figuring out what happened in our Head

  • The situation is that we are unable to close because of H13 MustNotChangeVersion

  • This happens because the version in the input datum (open datum) does not match with the version in the output (close datum).

  • Local state says I am on version 3 and on-chain the situation seems to be the same - 3! But that can't be, since then the on-chain check would pass. This is how the datum looks: https://preview.cexplorer.io/datum/8e4bd7ac38838098fbf23e5702653df2624bcfa4cf0c5236498deeede1fdca78

  • Looking at the state it seems like we try to close with a snapshot that contains the correct version (3), but openVersion is still at 2:

  ...
                  "utxoToCommit": null,
                  "utxoToDecommit": null,
                  "version": 3
                },
                "tag": "ConfirmedSnapshot"
              },
              "headId": "50bb0874ae28515a2cff9c074916ffe05500a3b4eddea4178d1bed0b",
              "headParameters": {
                "contestationPeriod": 300,
                "parties": [
...
              "openVersion": 2,
              "tag": "CloseTx"
            },
            "tag": "OnChainEffect"
          }
		  
  • Question is how did we get to this place? It must be that my node didn't observe and emit one CommitFinalized, which is when we do the version update - upon increment observation.

  • There are 24 lines with a CommitFinalized message - they only go up to version 2 - while there are 36 lines with CommitRecorded; it seems like one recorded commit was not finalized for whatever reason.

  • OnIncrementTx shows up 8 times in the logs but in reality it is tied to only two increments so the third one was never observed.

  • OnDepositTx shows up 12 times in the logs but they are related to only two deposits.

  • Could it be that the decommit failed instead?

  • There is one DecommitRecorded and one DecommitFinalized so it seems good.

  • Seems like we have CommitRecorded for:

    • "utxoToCommit":{"4b31dd7db92bde4359868911c1680ea28c0a38287a4e5b9f3c07086eca1ac26a#0"
    • "utxoToCommit":{"4b31dd7db92bde4359868911c1680ea28c0a38287a4e5b9f3c07086eca1ac26a#1"
    • "utxoToCommit":{"22cb19c790cd09391adf2a68541eb00638b8011593b3867206d2a12a97f4bf0d#0"
  • We received CommitFinalized for:

    • "theDeposit":"44fa1bc9b04d2ffee50fd84088517c3f7b530353834e7c678fdd05073881cb40"
    • "theDeposit":"5b93f95068148482a1e27979517e8ab467f85e72551cfc9baaa2086a60e7353a"
  • So one commit was never finalized but it is a bit hard to connect recorded and finalized commits.

  • OnDepositTx was seen for txids:

    • 44fa1bc9b04d2ffee50fd84088517c3f7b530353834e7c678fdd05073881cb40
    • 5b93f95068148482a1e27979517e8ab467f85e72551cfc9baaa2086a60e7353a
    • 83e7c36a9d4727e00169409f869d0f94737672c7e87850632b9efe1637f8ef8f

  • OnIncrementTx was seen for:

  • Question is what to do with this Head? Can it be closed somehow?

  • We should query the deposit address to see what kind of UTxOs are available there.

2025-03-10

FT on SideLoad-Snapshot

  • Added an endpoint to GET the latest confirmed snapshot, which is needed to construct the side-load request, but it does not include information about the latest seen snapshot. Waiting on pull#1860 to enhance it.

  • In our scenario, the head got stuck on InitialSnapshot. This means that during side-loading, we must act similarly to clear pending transactions (pull#1840).

  • Wonder if the side-loaded snapshot version should be exactly the same as the current one, given that version bumping requires L1 interaction.

  • Also unclear if we should validate utxoToCommit and utxoToDecommit on the provided snapshot to match the last known state.

  • Concerned that a head can become stuck during a Recover or Decommit client input.

  • SideLoadSnapshot is the first ClientInput that contains a headId and must be verified when received by the node.

  • Uncertain whether WaitOnNotApplicableTx for localTxs not present in the side-loaded confirmed snapshot would trigger automatic re-submission.

  • I think this feature should not be added to TUI since it is not part of the core protocol or user journey.

2025-03-07

FT on SideLoad-Snapshot

  • Now that we have a head stuck on the initial snapshot, I want to explore how we can introspect the node state from the client side, as this will be necessary to create the side-load request.

  • Projecting the latest SnapshotConfirmed seems straightforward, but projecting the latest SeenSnapshot introduces code duplication in HeadLogic.aggregate and the ServerOutput projection.

  • These projections currently conflict heavily with pull#1860. For that reason, we are postponing these changes until it is merged.

2025-03-06

FT on SideLoad-Snapshot

  • We need to break down withHydraNode into several pieces to allow starting a node with incorrect ledger-protocol-params in its running configuration.

  • In this e2e scenario, we exercise a three-party network where two nodes (node-1 and node-2) are healthy, and one (node-3) is misconfigured. In this setup, node-1 attempts to submit a NewTx which is accepted by both healthy members but rejected by node-3. Then, when node-3 goes offline and comes back online using healthy pparams, it is expected to stop cooperating and cause the head to become stuck.

  • It seems that after node-3 comes back online, it only sees a PeerConnected message within 20s. Adding a delay for it to catch up does not help. From its logs, we don’t see messages for WaitOnNotApplicableTx, WaitOnSeenSnapshot, or DroppedFromQueue.

  • If node-3 tries to re-submit the same transaction, it is now accepted by node-3 but rejected by node-1 and node-2 due to ValueNotConservedUTxO (because it was already applied). Since node-3 is not the leader, we don’t see any new SnapshotRequested round being signed.

  • Node-1 and node-2 have already signed and observed each other signing for snapshot number 1, while node-3 has not seen anything. This means node-1 and node-2 are waiting for node-3 to sign in order to proceed. Now the head is stuck and won’t make any progress because node-3 has stopped cooperating.

  • New issue raised for head getting stuck issue#1773, which proposes to forcibly sync the snapshots of the hydra-nodes in order to align local ledger states.

  • Updating the sequence diagram for a head getting stuck using latest findings.

  • Now thinking about how we could "Allow introspection of the current snapshot in a particular node", as we want to be able to notice if the head has become stuck. We want to be able to observe who has not yet signed the current snapshot in flight (which is preventing it from getting confirmed).

  • Noticed that in onOpenNetworkReqTx we keep TransactionReceived even if not applicable, resulting in a list with potentially duplicate elements (in case of resubmission).

  • Given that a head becoming stuck is an L2 issue due to network connectivity, I’m considering whether we could send more information about the local ledger state as part of PeerConnected to trigger auto-sync recovery based on discrepancies. Or perhaps we should broadcast InvalidTx instead?

  • Valid idea to explore after side-load.

2025-03-05

FT on SideLoad-Snapshot

  • Trying to reproduce a head becoming stuck in BehaviorSpec when a node starts with an invalid ledger.

  • Having oneMonth in BehaviorSpec's waitUntilMatch makes debugging harder. Reduced it to (6 * 24 * 3), allowing full output visibility.

  • After Bob reconnects using a valid ledger, we expected him to accept the transaction if re-submitted by him, but he rejects it instead.

  • It's uncertain whether Bob is rejecting the resubmission or something else, so I need to wait until all transactions are dropped from the queue.

  • Found that when Bob is resubmitting, he is in Idle state although he is expected to restart in Initial state.

  • This is interesting, as if a party suffers a disk error and loses persistence, side-loading may allow it to resume up to a certain point in time.

  • The idea is valid, but we should not accept a side-load when in Idle state—only when in Open state.

  • It seems this is the first time we attempt to restart a node in BehaviorSpec. Now checking if this is the right place or if I should design the scenario differently.

  • When trying to restart the node from existing sources, we noticed the need to use the hydrate function. This suggests we should not force reproducing this scenario in BehaviorSpec.

  • NodeSpec does not seem to be the right place either, as we don't have multiple peers connected to each other.

  • Trying to reproduce the scenario at the E2E level, now running on top of an etcd network.

2025-03-04

SB on fixing the persistence bug

  • Continuing where I left off yesterday - fixing a single test that should throw IncorrectAccessException but instead throws:
 uncaught exception: IOException of type ResourceBusy
  • When I sprinkle some spy' calls to see the actual thread ids I don't get this exception anymore, the test just fails. So the exception is tightly coupled with how we check for threads in the PersistenceIncremental handle.
  • I tried labeling the threads and using throwTo from MonadFork but the result is the same.
  • Tried using withBinaryFile in both source and append and use conduit to stream from/to file but that didn't help.
  • Tried using bracket with openBinaryFile and then sink/source handle in the callback but the results are the same.
  • What is happening here?
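For context: GHC's runtime locks files it has open for writing, so ending up with two handles on the same persistence file is enough to produce exactly this error. A tiny standalone illustration of just the locking behaviour (not of the persistence code itself):

```haskell
import System.IO (IOMode (WriteMode), hClose, openBinaryFile)

-- Opening the same file for writing twice triggers GHC's file locking:
-- the second open fails with "openBinaryFile: resource busy (file is locked)".
main :: IO ()
main = do
  h1 <- openBinaryFile "/tmp/file-lock-demo" WriteMode
  h2 <- openBinaryFile "/tmp/file-lock-demo" WriteMode -- throws ResourceBusy
  hClose h2
  hClose h1
```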

2025-03-03

SB on api server as the event sink

  • There are only two problems left to solve here. The first one is the IncorrectAccessException from persistence in the cluster tests; for this one I have a plan (have a way to register the thread that will append). The other problem is that some cluster tests fail since the expected message was not observed.

  • One example test is persistence can load with empty commit.

  • I wanted to verify whether the messages are coming through since the test fails at waitFor; I see the messages propagated (but I don't see HeadIsOpened twice!)

  • Looking at the messages, the Greetings message does not contain the correct HeadStatus anymore! There was a projection that made sure to update this field in the Greetings message, but now that we shuffled things around I don't think this projection works anymore.

  • I see all messages correct (except headStatus in Greetings) but only propagated once (and we do restart the node in our test).

  • I see the api server being spun up twice but the second time I don't see the message replay for some reason.

  • One funny thing is I see ChainRollback - perhaps something around this is broken?

  • I see one rebase mistake in Monitoring module that I reverted.

  • After some debugging I notice that the history loaded from the conduit is always an empty list. This is the cause of our problems here!

  • Still digging around the code to try and figure out what is happening. I see HeadOpened saved in the persistence file and can't for the life of me figure out why it is not loaded on restart. I even tried passing in the complete, intact event source conduit to make sure I am not consuming the conduit in the Server and leaving it empty for the WSServer, but this is not the problem I am having.

  • I remapped all projections to work with StateChanged instead of ServerOutput since it makes no sense to remap to ServerOutput just for that.

  • Suspecting that mapWhileC is the problem since it would stop each time it can't convert some StateEvent to ServerOutput from disk!

  • This was it - mapWhileC stops when it encounters Nothing, so it was not processing the complete list of events! So happy to fix this.
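To make the failure mode concrete, a small standalone example against the conduit package alone: mapWhileC truncates the stream at the first element it cannot convert, while mapMaybeC skips it and keeps going.

```haskell
import Conduit

-- 'mapWhileC' stops at the first Nothing, so everything after an event that
-- cannot be converted is silently dropped; 'mapMaybeC' skips it and continues.
main :: IO ()
main = do
  let decoded = [Just 1, Nothing, Just 2] :: [Maybe Int]
  truncated <- runConduit $ yieldMany decoded .| mapWhileC id .| sinkList
  complete <- runConduit $ yieldMany decoded .| mapMaybeC id .| sinkList
  print truncated -- [1]
  print complete  -- [1,2]
```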

  • Next is to tackle the IncorrectAccessException from persistence. I know why this happens (obviously we try to append from a different thread), and sourcing the contents of a persistence file should not be guarded by a correct thread id. In fact, we should allow all possible clients to consume the (streamed) persistence contents and make sure to only append from one thread, namely the one in which the hydra-node process is actually running.

  • I added another field to PersistenceIncremental called registerThread whose sole purpose is to register the thread we run in - so that we are able to append (I also removed the check for the thread id from source and moved it to append).

  • Ok, this was not the fix I was looking for. The registerThread is hidden in the persistence handle, so if you don't have access to it from the outside, how would you register a thread (for example in our tests)?

  • I ended up registering a thread id on append if it doesn't exist yet and checking against it if it is there (a simplified sketch of this guard is at the end of this entry), but I see one failure:


  test/Hydra/PersistenceSpec.hs:59:5:
  1) Hydra.Persistence.PersistenceIncremental it cannot load from a different thread once having started appending
       uncaught exception: IOException of type ResourceBusy
       /tmp/hydra-persistence-33802a411f862b7a/data: openBinaryFile: resource busy (file is locked)
       (after 1 test)
         []
         [String "WT",Null,Null]

I still need to investigate.
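For the record, the shape of the guard I am experimenting with, as a simplified standalone sketch (the real PersistenceIncremental handle in hydra-node has more fields and a different error type): the first appender registers its ThreadId and any append from another thread throws.

```haskell
{-# LANGUAGE LambdaCase #-}

import Control.Concurrent (ThreadId, myThreadId)
import Control.Concurrent.MVar (modifyMVar_, newMVar)
import Control.Exception (Exception, throwIO)

newtype IncorrectAccessException = IncorrectAccessException String
  deriving (Show)

instance Exception IncorrectAccessException

newtype PersistenceIncremental a = PersistenceIncremental
  { append :: a -> IO ()
  }

-- | Build a handle whose 'append' may only ever be used from the thread that
-- appended first; reads are deliberately left unguarded.
mkPersistence :: (a -> IO ()) -> IO (PersistenceIncremental a)
mkPersistence writeToDisk = do
  ownerVar <- newMVar (Nothing :: Maybe ThreadId)
  pure
    PersistenceIncremental
      { append = \a -> do
          me <- myThreadId
          modifyMVar_ ownerVar $ \case
            Nothing -> pure (Just me) -- first append registers the owning thread
            Just owner
              | owner == me -> pure (Just owner)
              | otherwise ->
                  throwIO (IncorrectAccessException "append called from an unexpected thread")
          writeToDisk a
      }
```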

February 2025

2025-02-27

SB on state of things regarding api server memory consumption

  • There are no CommandFailed and ClientEffect anymore

  • We don't have the GetUTxO client input anymore, therefore I had to call the api using a GET /snapshot/utxo request to obtain this information (in cluster tests)

  • For the tests that don't spin up the api server I used TestHydraClient and its queryState function to obtain the HeadState, which in turn contains the head UTxO.

  • One important thing to note is that I had to add utxoToCommit in the snapshot projection in order to get the expected UTxO. This was a bug we had and nobody noticed.

  • We return Greetings and InvalidInput types from the api server without wrapping them into TimedServerOutput, which is a bit annoying since now we need to double-parse json values in tests: if decoding as TimedServerOutput fails, we try to parse just the ServerOutput.
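The double parsing in the tests boils down to a generic helper like this (a sketch; the actual TimedServerOutput and ServerOutput types live in hydra-node):

```haskell
import Control.Applicative ((<|>))
import Data.Aeson (FromJSON, decode)
import Data.ByteString.Lazy (ByteString)

-- Try to decode the wrapped (timed) form first and fall back to the bare
-- server output, mirroring how the tests have to parse api messages now.
decodeTimedOrBare ::
  (FromJSON timed, FromJSON bare) =>
  ByteString ->
  Maybe (Either timed bare)
decodeTimedOrBare bytes =
  (Left <$> decode bytes) <|> (Right <$> decode bytes)
```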

Current problems:

  • After adding /?history=yes to the hydra-cluster tests' api client I started seeing IncorrectAccessException from the persistence. This is weird to me since all we do is read from the persistence event sink.

  • Querying the hydra node state in our Model tests to get the Head UTxO (instead of using GetUTxO client input) hangs sometimes and I don't see why. I suspect this has something to do with threads spawned in the model tests:

This is the diff, it looks benign:


 waitForUTxOToSpend ::
   forall m.
-  (MonadTimer m, MonadDelay m) =>
+  MonadDelay m =>
   UTxO ->
   CardanoSigningKey ->
   Value ->
   TestHydraClient Tx m ->
   m (Either UTxO (TxIn, TxOut CtxUTxO))
-waitForUTxOToSpend utxo key value node = go 100
+waitForUTxOToSpend utxo key value node = do
+  u <- headUTxO node
+  threadDelay 1
+  if u /= mempty
+    then case find matchPayment (UTxO.pairs u) of
+      Nothing -> pure $ Left utxo
+      Just (txIn, txOut) -> pure $ Right (txIn, txOut)
+    else pure $ Left utxo
  where
-  go :: Int -> m (Either UTxO (TxIn, TxOut CtxUTxO))
-  go = \case
-    0 ->
-      pure $ Left utxo
-    n -> do
-      node `send` Input.GetUTxO
-      threadDelay 5
-      timeout 10 (waitForNext node) >>= \case
-        Just (GetUTxOResponse _ u)
-          | u /= mempty ->
-              maybe
-                (go (n - 1))
-                (pure . Right)
-                (find matchPayment (UTxO.pairs u))
-        _ -> go (n - 1)
-
   matchPayment p@(_, txOut) =
     isOwned key p && value == txOutValue txOut

Model tests sometimes succeed but this is not good enough and we don't want any more flaky tests.
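One hypothetical way to de-flake this, reusing the names from the diff above (and its module's imports), would be to keep the old bounded-retry loop but drive it with the new headUTxO query - an untested sketch, not what is on the branch:

```haskell
waitForUTxOToSpend ::
  forall m.
  MonadDelay m =>
  UTxO ->
  CardanoSigningKey ->
  Value ->
  TestHydraClient Tx m ->
  m (Either UTxO (TxIn, TxOut CtxUTxO))
waitForUTxOToSpend utxo key value node = go (100 :: Int)
 where
  go 0 = pure $ Left utxo
  go n = do
    u <- headUTxO node
    threadDelay 1
    case find matchPayment (UTxO.pairs u) of
      Just (txIn, txOut) -> pure $ Right (txIn, txOut)
      Nothing -> go (n - 1)

  matchPayment p@(_, txOut) =
    isOwned key p && value == txOutValue txOut
```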

2025-02-26

SN troubleshooting unclean restarts on etcd branch

4) Test.EndToEnd, End-to-end on Cardano devnet, restarting nodes, close of an initial snapshot from re-initialized node is contested
    Process "hydra-node (2)" exited with failure code: 1
    Process stderr: RunServerException {ioException = Network.Socket.bind: resource busy (Address already in use), host = 0.0.0.0, port = 4002}
  • Seems like the hydra-node is not shutting down cleanly and scenarios like this are the result
  • Isolated test scenarios where we simply expect withHydraNode to start/stop and restart within a certain time and not fail
  • Running these tests on master worked fine?! Seems to have something to do with etcd?
  • When debugging withHydraNode and trying to port it to typed-process, I noticed that we don't need the withHydraNode' variant really -> merged them
  • Back to the tests.. why are they failing while the hydra-node binary seems to behave just fine interactively?
  • With several threadDelay and prints all over the place I saw that the hydra-node spawns etcd as a sub-process, but when withProcess (any of its variants) results in stopProcess, the etcd child stays alive!
  • Issuing a ctrl+c in ghci has the etcd process log that a signal was detected and it shuts down
  • We are not sending SIGINT to the etcd process? Tried interruptProcessGroupOf in the Etcd module
  • My handlers (finally or bracket) are not called!? WTF moment
  • Found this issue which mentions that withProcess sends SIGTERM that is not handled by default
    • Some familiar faces on this one
    • This is also an interesting paragraph about how ctrl+c can be delegated to sub-process (not what we needed)
  • So the solution is two-fold:
    • First, we need to make sure to send SIGINT to the etcd process whenever we are asked to shut down too (in the Etcd module)
    • Also, we should initiate a graceful shutdown when the hydra-node receives SIGTERM
      • This is a better approach than making withHydraNode send a SIGINT to hydra-node
      • While that would work too, dealing with SIGTERM in hydra-node is more generally useful
      • For example a docker stop sends SIGTERM to the main process in a container
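A condensed sketch of both parts (simplified and POSIX-only; this is not the actual Etcd module or hydra-node main, and the process invocation is an assumption):

```haskell
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (bracket)
import System.Posix.Signals (Handler (Catch), installHandler, sigINT, sigTERM, signalProcess)
import System.Process (createProcess, getPid, proc, waitForProcess)

-- Part 1: make sure the etcd sub-process gets a SIGINT (which it handles)
-- when we shut down, instead of relying on the SIGTERM sent by stopProcess.
withEtcd :: IO a -> IO a
withEtcd action = bracket start stop (const action)
 where
  start = createProcess (proc "etcd" [])
  stop (_, _, _, ph) = do
    mpid <- getPid ph
    maybe (pure ()) (signalProcess sigINT) mpid
    _ <- waitForProcess ph
    pure ()

-- Part 2: turn a SIGTERM received by the node itself (e.g. from docker stop)
-- into a graceful shutdown instead of an abrupt kill.
main :: IO ()
main = do
  shutdownRequested <- newEmptyMVar
  _ <- installHandler sigTERM (Catch $ putMVar shutdownRequested ()) Nothing
  withEtcd $ takeMVar shutdownRequested
```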

2025-02-19

SN working on etcd gprc client integration

  • When starting to use grapesy I had a conflict: ouroboros-network needs an older network than grapesy. Made me drop the ouroboros modules first.
  • Turns out we still depend transitively on the ouroboros-network packages (via cardano-api), but the cabal resolver errors are even worse.
  • Adding an allow-newer: network still works
  • Is it fine to just use a newer version of network in the ouroboros-network?
  • The commits that bumped the upper bound do not indicate otherwise
  • Explicitly listed all packages in allow-newer and moved on with life

2025-02-17

SN working on etcd network connectivity

  • Working on PeerConnected (or an equivalent) for etcd network.
  • Changing the inbound type to Either Connectivity msg does not work well with the Authentication layer?
  • The composition using components (ADR7: https://hydra.family/head-protocol/adr/7) is quite complicated and only allows for an all-or-nothing interface out of a component without much support for optional parts.
  • In particular, an Etcd component that delivers Either Connectivity msg as inbound messages cannot be composed easily with the Authenticate component that verifies signatures of incoming messages (it would need to understand that this is an Either and only do it for Right msg).
  • Instead, I explore expanding NetworkCallback to not only deliver, but also provide an onConnectivity callback.
  • After designing a more composable onConnectivity handling, I wondered how the Etcd component would be determining connectivity.
  • The etcdctl command line tool offers a member list which returns a list of members if on a majority cluster, e.g.
{"header":{"cluster<sub>id</sub>":8903038213291328342,"member<sub>id</sub>":1564273230663938083,"raft<sub>term</sub>":2},"members":\[{"ID":1564273230663938083,"name":"127.0.0.1:5001","peerURLs":\["<http://127.0.0.1:5001>"\],"clientURLs":\["<http://127.0.0.1:2379>"\]},{"ID":3728543818779710175,"name":"127.0.0.1:5002","peerURLs":\["<http://127.0.0.1:5002>"\],"clientURLs":\["<http://127.0.0.1:2380>"\]}\]}
  • But when invoked on a minority cluster it returns
  {"level":"warn","ts":"2025-02-17T22:49:48.211708+0100","logger":"etcd-client","caller":"[email protected]/retry<sub>interceptor</sub>.<go:63>","msg":"retrying
  of unary invoker
  failed","target":"etcd-endpoints://0xc000026000/127.0.0.1:2379","attempt":0,"error":"rpc
  error: code = DeadlineExceeded desc = context deadline exceeded"}
  Error: context deadline exceeded
  • When it cannot connect to an etcd instance it returns
  {"level":"warn","ts":"2025-02-17T22:49:32.583103+0100","logger":"etcd-client","caller":"[email protected]/retry<sub>interceptor</sub>.<go:63>","msg":"retrying
  of unary invoker
  failed","target":"etcd-endpoints://0xc0004b81e0/127.0.0.1:2379","attempt":0,"error":"rpc
  error: code = DeadlineExceeded desc = latest balancer error: last
  connection error: connection error: desc = \\transport: Error while
  dialing: dial tcp 127.0.0.1:2379: connect: connection refused\\"}
  Error: context deadline exceeded
  • When implementing pollMembers, suddenly the waitMessages was not blocked anymore?
  • While a little crude, polling member list works nicely to get a full list of members (if we are connected to the majority cluster) - see the sketch below.
  • All this will change when we switch to a proper grpc client anyways
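A rough sketch of that polling approach (hypothetical helper, not the actual Etcd module; the etcdctl flags come from its documentation and the one-second interval is arbitrary):

```haskell
import Control.Concurrent (threadDelay)
import Control.Monad (forever)
import System.Exit (ExitCode (..))
import System.Process.Typed (proc, readProcess)

data Connectivity = Connected | Disconnected
  deriving (Eq, Show)

-- Poll `etcdctl member list`; it only succeeds when we can reach a majority
-- cluster, so its exit code doubles as a crude connectivity signal.
pollMembers :: (Connectivity -> IO ()) -> IO ()
pollMembers onConnectivity = forever $ do
  (exitCode, _stdout, _stderr) <- readProcess (proc "etcdctl" ["member", "list", "--write-out", "json"])
  onConnectivity $ case exitCode of
    ExitSuccess -> Connected
    ExitFailure _ -> Disconnected
  threadDelay 1000000 -- poll once per second
```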

2025-02-05

SB on running conduit only once for projections

  • The current problem we want to solve: instead of passing a conduit to the mkProjection function and running it inside, we would like to stream data to all of the projections we have.

  • Seems like this is easier said than done since we also rely on the projection result, which is a Projection handle used to update the TVar inside.

  • I thought it might be a good idea to alter mkProjection and make it run in ConduitT so it can receive events and propagate them further and then, in the end return the Projection handle.

  • I made changes to mkProjection that compile:

mkProjection ::
-  (MonadSTM m, MonadUnliftIO m) =>
+  MonadSTM m =>
   model ->
   -- | Projection function
   (model -> event -> model) ->
-  ConduitT () event (ResourceT m) () -> 
-  m (Projection (STM m) event model)
-mkProjection startingModel project eventSource = do
-  tv <- newTVarIO startingModel
-  runConduitRes $
-    eventSource .| mapM_C (lift . atomically . update tv)
-  pure
+  ConduitT event (Projection (STM m) event model) m ()
+mkProjection startingModel project = do
+  tv <- lift $ newTVarIO startingModel
+  meventSource <- await
+  _ <- case meventSource of
+    Nothing -> pure ()
+    Just eventSource ->
+      void $ yield eventSource .| mapM_C (atomically . update tv)
+  yield $
     Projection
       { getLatest = readTVar tv
       , update = update tv

but the main issue is that I can't get the results of all the projections we need in the end that easily.

-- does not compile
headStatusP <- runConduitRes $ yield outputsC .| mkProjection Idle projectHeadStatus
  • We need to be able to process streamed data from disk and also output like 5 of these projections that do different things.
  • I discovered sequenceConduits which allows collection of the conduit result values.
  • Idea was to collect all projections which have the capability of receiving events as the conduit input.
[headStatusP] <- runConduit $ sequenceConduits [mkProjection Idle projectHeadStatus] >> sinkList
  • Oh, just realized the conduits given to sequenceConduits need to have exactly the same type, so my plan just failed - one possible workaround is sketched at the end of this entry

I think I need to revisit our approach and start from scratch.
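One zip-like option to revisit, which side-steps the identical-type restriction of sequenceConduits, is combining sinks applicatively with ZipSink so a single pass over the event stream feeds several differently-typed projections. A toy, self-contained example using plain conduit types rather than the hydra projections:

```haskell
import Conduit
import Data.Conduit (ZipSink (..), getZipSink)

-- Two toy "projections" with different result types.
countEvents :: Monad m => ConduitT Int o m Int
countEvents = lengthC

lastEvent :: Monad m => ConduitT Int o m (Maybe Int)
lastEvent = lastC

main :: IO ()
main = do
  results <-
    runConduit $
      yieldMany [1 .. 5 :: Int]
        .| getZipSink ((,) <$> ZipSink countEvents <*> ZipSink lastEvent)
  print results -- (5, Just 5): both projections fed by a single pass
```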

January 2025

2025-01-23

SB on state events streaming

  • So what we want to do is to reduce the memory footprint in hydra-node as the final outcome

  • There are a couple of ADRs related to persisting a stream of events and having different sinks that can read from the streams

  • Our API needs to become one of these event sinks

  • The first step is to prevent history output by default as history can grow pretty large and it is all kept in memory

  • We need to remove ServerOutput type and map all missing fields to StateChange type since that is what we will use to persist the changes to disk

  • I understand that we will keep existing projections but they will work on the StateChange type and each change will be forwarded to any existing sinks as the state changes over time

  • We already have the PersistenceIncremental type that appends to disk - can we use a similar handle? Most probably yes - but we need to pick the most performant function to write/read to/from disk.

  • Seems like we currently use eventPairFromPersistenceIncremental to set up the event stream/sink. What we do is load all events from disk. We also have a TVar holding the event id. Ideally what we would like is to output every new event in our api server. I should take a look at our projections to see how we output individual messages.

  • Ok, yeah, projections are displaying the last message but looking at this code I am realizing how complex everything is. We should strive for simplicity here.

  • Another thought - would it help us to use Servant at least to separate the routing and handlers? I think it could help but otoh Servant can get crazy complex really fast.

  • So after looking at the relevant code and the issue https://github.com/cardano-scaling/hydra/issues/1618 I believe the most complex thing would be this: the Websocket needs to emit this information on new state changes. But even this is not hard, I believe, since we control what happens when setting up the event source/sink pair.

SN on streaming events

  • Streaming events using conduit makes us buy into the unliftio and resourcet environment. Does this go well with our MonadThrow et al classes?
  • When using conduits in createHydraNode, the runConduitRes requires a MonadUnliftIO context. We have an IOSim usage of this though and it's not clear if there can even be a MonadUnliftIO (IOSim s) instance?
  • We are not only loading [StateEvent] fully into memory, but also [ServerOutput].
  • Made mkProjection take a conduit, but then we are running it for each projection (3 times). Should do something with fuseBoth or a zip-like conduit combination.

2025-01-22

SN on multi version explorer

  • Started simplifying the hydra-explorer and wanted to get rid of all hydra-node, hydra-tx etc. dependencies because they include most of the cardano ecosystem. However, on the observer api we will need to refer to cardano specifics like UTxO and some hydra entities like Party or HeadId. So a dependency on hydra-tx is most likely needed.
  • Shouldn't these hydra specific types be in an actual hydra-api package? The hydra-tx or a future hydra-client could depend on that then.
  • When defining the observer API I was reaching for the OnChainTx data type as it has json instances and enumerates the things we need to observe. However, this would mean we need to depend on hydra-node in the hydra-explorer.
  • Could use the HeadObservation type, but that one is maybe a bit too low level and does not have JSON instances?
  • OnChainTx is really the level of detail we want (instantiated for cardano transactions, but not corrupted by cardano internal specifics)
  • Logging in the main entry point of Hydra.Explorer depends on hydra-node anyways. Could we explore something different to get rid of this? Got https://hackage.haskell.org/package/Blammo recommended to me.
  • Got everything to compile (with a cut-off hydra-chain-observer). Now I want to have an end-to-end integration test for hydra-explorer that does not concern itself with individual observations, but rather checks that the (latest) hydra-chain-observer can be used with hydra-explorer. That, plus some (golden) testing against the openapi schemas, should be enough test coverage.
  • Modifying hydra and hydra-explorer repositories to integration test new http-based reporting.
    • Doing so offline from a plane is a bit annoying as both nix and cabal would be pulling dependencies from the internet.
    • Working around using an alias to the cabal built binary:
        alias hydra-chain-observer=../../hydra/dist-newstyle/build/x86_64-linux/ghc-9.6.6/hydra-chain-observer-0.19.0/x/hydra-chain-observer/build/hydra-chain-observer/hydra-chain-observer
  • cabal repl is not picking up the alias, maybe need to add it to PATH?
  • Adding an export PATH=<path to binary>:$PATH to .envrc is quite convenient
  • After connecting the two servers via a bounded queue, the test passes but sub-processes are not gracefully stopped.

2025-01-21

SB on stake certificate registration

  • I created a relevant issue to track this new feature request to enable stake certificates on L2 ledger.
  • Didn't plan on working on this right away but wanted to explore a problem with PPViewHashesDontMatch when trying to submit a new tx on L2.
  • This happens both when obtaining the protocol-parameters from the hydra-node and when I query them from the cardano-node (the latter is expected to fail on L2 since we reduce the fees to zero)
  • I added a line to print the protocol-parameters in our tx printer and it seems like changePParams is not setting the protocol-parameters correctly for whatever reason:
changePParams :: PParams (ShelleyLedgerEra Era) -> TxBodyContent BuildTx -> TxBodyContent BuildTx
changePParams pparams tx =
  tx{txProtocolParams = BuildTxWith $ Just $ LedgerProtocolParameters pparams}
 
  • There is setTxProtocolParams I should probably use instead (a sketch of that variant is at the end of this entry).
  • No luck, how come this didn't work? I don't see why setting the protocol-parameters like this doesn't work....
  • I even compared the protocol-parameters loaded into the hydra-node and the ones I get back from hitting the hydra-node api and they are the same as expected
  • Running out of ideas
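For reference, the setter-based variant mentioned above would look roughly like this (assuming the usual shape of cardano-api's setTxProtocolParams; it builds the same TxBodyContent, which is consistent with switching not helping):

```haskell
changePParams :: PParams (ShelleyLedgerEra Era) -> TxBodyContent BuildTx -> TxBodyContent BuildTx
changePParams pparams =
  setTxProtocolParams (BuildTxWith $ Just $ LedgerProtocolParameters pparams)
```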

2025-01-20

SB on looking at withdraw zero problem

  • I want to know why I get a mismatch between pparams on L2.
  • It is because we start the hydra-node in a separate temp directory from the test driver so I got rid of the problem by querying hydra-node to obtain L2 protocol-parameters
  • The weird issue I get is that the budget is overspent and it seems bumping the ExecutionUnits doesn't help at all.
  • When pretty-printing the L2 tx I noticed that the cpu and memory units for the cert redeemer are both zero, so that must be the culprit
  • Adding the cert redeemer separately fixed the issue but I am now back to PPViewHashesDontMatch.
  • Not sure why this happens since I am doing a query to obtain hydra-node protocol parameters and using those to construct the transaction.
  • Note that even if I don't change protocol-parameters the error is the same
  • This whole chunk of work is to register a script address as a stake certificate and I still need to try to withdraw zero after this is working.
  • One thing I wanted to do is to use the dummy script as the provided Data in the Cert Redeemers - is this even possible?

2025-01-08

SN on aiken pinning & cleanup

  • When trying to align aiken version in our repository with what is generated into plutus.json, I encountered errors in hydra-tx tests even with the same aiken version as claimed.

  • Error: Expected the B constructor but got a different one

  • Seems to originate from plutus-core when it tries to run the builtin unBData on data that is not a B (bytestring)

  • The full error in hydra-tx tests actually includes what it tried to unBData: Caused by: unBData (Constr 0 [ Constr 0 [ List [ Constr 0 [ Constr 0 [ B #7db6c8edf4227f62e1233880981eb1d4d89c14c3c92b63b2e130ede21c128c61 , I 21 ] , Constr 0 [ Constr 0 [ Constr 0 [ B #b0e9c25d9abdfc5867b9c0879b66aa60abbc7722ed56f833a3e2ad94 ] , Constr 1 [] ] , Map [(B #, Map [(B #, I 231)])] , Constr 0 [] , Constr 1 [] ] ] , Constr 0 .... This looks a lot like a script context. Maybe something off with validator arguments?

  • How can I inspect the uplc of an aiken script?

  • It must be the "compile-time" parameter of the initial script, which expects the commit script hash. If we use that unapplied on the transaction, the script context trips the validator code.

  • How was the initialValidatorScript used on master such that these tests / usages pass?

  • Ahh .. someone applied the commit script parameter and stored the resulting script in the plutus.json! Most likely using aiken blueprint apply -v initial and then passing the aiken blueprint hash -v commit into that.

  • Realized that the plutus.json blueprint would have said that a script has parameters.
