Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use JSON REST request for blocks #27

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

andrewtoth
Copy link
Contributor

This keeps the same behavior as before, except for the second commit which should remove false positives for anchor spends.

For now this just serializes bitcoin::Transactions to pass in to rawtx-rs, but if we like this approach we can update that lib to use the JSON response or just get the stats in this crate instead.

I suspect this would be at least an order of magnitude slower than fetching blocks in binary format, but the extra data we get might make it worth it.

@andrewtoth andrewtoth force-pushed the json-blocks branch 8 times, most recently from dbb89e7 to 84d0814 Compare February 9, 2025 18:44
@0xB10C
Copy link
Owner

0xB10C commented Feb 10, 2025

Thanks for picking this up! I plan to benchmark this and then have a closer look at the code soon.

0xB10C added a commit to 0xB10C/nix that referenced this pull request Feb 12, 2025
@0xB10C 0xB10C mentioned this pull request Feb 12, 2025
@0xB10C
Copy link
Owner

0xB10C commented Feb 12, 2025

Ran this, and it failed with the error message:

Could not collect statistics: REST error: MinReq HTTP GET request error: SerdeJsonError(Error("invalid value: integer `2591798512`, expected i32", line: 1, column: 1354528))

That's probably the version of transaction c659729a7fea5071361c2c1a68551ca2bf77679b27086cc415adeeb03852e369:

{
  "txid": "c659729a7fea5071361c2c1a68551ca2bf77679b27086cc415adeeb03852e369",
  "hash": "c659729a7fea5071361c2c1a68551ca2bf77679b27086cc415adeeb03852e369",
  "version": 2591798512, <----------- "invalid value: integer `2591798512`, expected i32"
  "size": 258,
  "vsize": 258,
  "weight": 1032,
  "locktime": 32767,
  "vin": [
    {
      "txid": "fcdd9029a49414ad4ae294b314e93224220b8b707fdcec711fcff2bbc3e5f5ec",
      "vout": 0,
      "scriptSig": {
        "asm": "3045022100852744642305a99ad74354e9495bf43a1f96ded470c256cd32e129290f1fa191022030c11d294af6a61b3da6ed2c0c296251d21d113cfd71ec11126517034b0dcb70[ALL] 04a0fe6e4a600f859a0932f701d3af8e0ecd4be886d91045f06a5a6b931b95873aea1df61da281ba29cadb560dad4fc047cf47b4f7f2570da4c0b810b3dfa7e500",
        "hex": "483045022100852744642305a99ad74354e9495bf43a1f96ded470c256cd32e129290f1fa191022030c11d294af6a61b3da6ed2c0c296251d21d113cfd71ec11126517034b0dcb70014104a0fe6e4a600f859a0932f701d3af8e0ecd4be886d91045f06a5a6b931b95873aea1df61da281ba29cadb560dad4fc047cf47b4f7f2570da4c0b810b3dfa7e500"
      },
      "prevout": {
        "generated": false,
        "height": 255477,
        "value": 0.01923413,
        "scriptPubKey": {
          "asm": "OP_DUP OP_HASH160 7d5618af4923d24a4edcae088a98da75db83a4cc OP_EQUALVERIFY OP_CHECKSIG",
          "desc": "addr(1CRic4fon11bRk7CwQNfZXy4RhokrH5miy)#s8tph49m",
          "hex": "76a9147d5618af4923d24a4edcae088a98da75db83a4cc88ac",
          "address": "1CRic4fon11bRk7CwQNfZXy4RhokrH5miy",
          "type": "pubkeyhash"
        }
      },
      "sequence": 4294967295
    }
  ],
  "vout": [
    {
      "value": 0.01000000,
      "n": 0,
      "scriptPubKey": {
        "asm": "OP_DUP OP_HASH160 7eeacb8a9265cd68c92806611f704fc55a21e1f5 OP_EQUALVERIFY OP_CHECKSIG",
        "desc": "addr(1Ca5R26xpiQCwjz3aFq1fuCR3fuEe8tmjE)#k6ake6dz",
        "hex": "76a9147eeacb8a9265cd68c92806611f704fc55a21e1f588ac",
        "address": "1Ca5R26xpiQCwjz3aFq1fuCR3fuEe8tmjE",
        "type": "pubkeyhash"
      }
    },
    {
      "value": 0.00913413,
      "n": 1,
      "scriptPubKey": {
        "asm": "OP_DUP OP_HASH160 eb3bd8ccd3ba6f1570f844b59ba3e0a667024a6a OP_EQUALVERIFY OP_CHECKSIG",
        "desc": "addr(1NSoVKD8ciGUUQE5rN4AMbKSg9SEXb34Q3)#dfq48nqq",
        "hex": "76a914eb3bd8ccd3ba6f1570f844b59ba3e0a667024a6a88ac",
        "address": "1NSoVKD8ciGUUQE5rN4AMbKSg9SEXb34Q3",
        "type": "pubkeyhash"
      }
    }
  ],
  "fee": 0.00010000,
  "hex": "f0b47b9a01ecf5e5c3bbf2cf1f71ecdc7f708b0b222432e914b394e24aad1494a42990ddfc000000008b483045022100852744642305a99ad74354e9495bf43a1f96ded470c256cd32e129290f1fa191022030c11d294af6a61b3da6ed2c0c296251d21d113cfd71ec11126517034b0dcb70014104a0fe6e4a600f859a0932f701d3af8e0ecd4be886d91045f06a5a6b931b95873aea1df61da281ba29cadb560dad4fc047cf47b4f7f2570da4c0b810b3dfa7e500ffffffff0240420f00000000001976a9147eeacb8a9265cd68c92806611f704fc55a21e1f588ac05f00d00000000001976a914eb3bd8ccd3ba6f1570f844b59ba3e0a667024a6a88acff7f0000"
},

@@ -6,6 +6,7 @@ edition = "2021"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
bitcoin = { version = "0.32", features = ["serde"] }
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll have a look if re-exporting bitcoin with serde through rawtx is possible.

To ensure the versions of bitcoin and rawtx-rs::bitcoin are compatible, I've been using the rawtx-rs rexport of bitcoin (i.e. rawtx-rs::bitcoin). If possible, I'd like to keep it that way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, we can do that but I would have to update rawtx's bitcoin dep to use the serde feature. I didn't want to update any packages yet for this proof of concept. I think we can actually get rid of rawtx after this PR, since we have all the info from the JSON response so we don't need to serialize the hex.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can actually get rid of rawtx after this PR, since we have all the info from the JSON response so we don't need to serialize the hex.

Hm I would avoid getting rid of rawtx-rs anytime soon. Next to input and output type detection, rawtx-rs can:

  • count sigops
  • detect multisig
  • extract information about pubkeys and signatures
  • detect op_return flavors
  • detect inscriptions
  • ...

I wouldn't want to re-implement that here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sure. Hopefully the performance penalty of deserializing the raw hex is not too bad.

All the things you mention can be done with the scripts that are decoded from the JSON response in input.script_sig.script and input.witness. They are also bitcoin::ScriptBuf and bitcoin::Witness types which is what rawtx-rs uses.

@andrewtoth
Copy link
Contributor Author

andrewtoth commented Feb 12, 2025

That's probably the version of transaction c659729a7fea5071361c2c1a68551ca2bf77679b27086cc415adeeb03852e369

Interesting. It's using bitcoin:: transaction::Version for that field.

I guess the JSON is returning a u32 instead of a negative number in this case.

@andrewtoth
Copy link
Contributor Author

andrewtoth commented Feb 12, 2025

Updated to use u32. Will open an issue in rust-bitcoin.

Edit: rust-bitcoin/rust-bitcoin#4039

0xB10C added a commit to 0xB10C/nix that referenced this pull request Feb 12, 2025
vsize: block.txdata.iter().map(Transaction::vsize).sum::<usize>() as i64,
weight: block.weight().to_wu() as i64,
size: block.size,
stripped_size: block.stripped_size,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While working on #30, I noticed that this actually fixes a bug in the old stripped_size calculation.

The old one is missing the 80 header bytes and the 1 or 3 bytes for the block transaction count.

@0xB10C
Copy link
Owner

0xB10C commented Feb 13, 2025

I suspect this would be at least an order of magnitude slower than fetching blocks in binary format, but the extra data we get might make it worth it.

My initial run of this is still in progress. Currently around height 584100. While I don't have exact numbers, master usually took only a few hours with the bitcoin data dir on an HDD.

@andrewtoth
Copy link
Contributor Author

I see we are fetching blocks from bitcoind over REST single threaded. While this was faster than processing blocks before, now it is definitely much slower. This is effectively making this single threaded.

I would recommend using async and fetching up to rpcthreads number of blocks concurrently. Thoughts?

@andrewtoth
Copy link
Contributor Author

Added a commit to fetch 16 blocks at a time concurrently from bitcoind. That will utilize bitcoind's rpc thread pool, so should be significantly faster.

@0xB10C
Copy link
Owner

0xB10C commented Feb 14, 2025

I would recommend using async and fetching up to rpcthreads number of blocks concurrently. Thoughts?

I think a possible alternative to adding tokio and other dependencies would be to uses rayon's https://docs.rs/rayon/latest/rayon/struct.ThreadPoolBuilder.html to spawn 16 block-fetching threads. Slight preference towards this, if practical, over async and adding more dependencies. Happy to compare the speeds. I'd assume async might be a bit faster, but I'm not sure.

@andrewtoth
Copy link
Contributor Author

We can definitely use 16 block fetching threads.
Since they will be running continuously for the duration of the program, I don't think we need a thread pool. We can just spawn all 16 threads in a loop. They will share an atomic height counter and exit when they have fetched to tip.

@andrewtoth
Copy link
Contributor Author

Reverted the commit for fetching concurrently, so you can check it out if you want to compare.
Pushed a commit on top to spawn 15 threads that fetch blocks from bitcoind.

@0xB10C
Copy link
Owner

0xB10C commented Feb 20, 2025

Reverted the commit for fetching concurrently, so you can check it out if you want to compare.
Pushed a commit on top to spawn 15 threads that fetch blocks from bitcoind.

Thanks! Haven't had a chance to take a look yet, but plan to do soonish. I think benchmarking the three options should be interesting.

@0xB10C
Copy link
Owner

0xB10C commented Feb 21, 2025

Did some benchmarking starting at height 700000 https://gist.github.com/0xB10C/34e2771136cce901f319b8ce3c898fa6:

image

As expected, master is faster due to not deserializing JSON, however, I was surprised that 16threads (44d53e5 - Use 16 worker threads to fetch blocks from bitcoind) is significantly faster than async (4f0b5ed - Fetch 16 blocks at a time concurrently from bitcoind).

I noticed that 16threads never finishes - probably a channel isn't closes somewhere. Didn't have a close look at the code yet. Given that it's faster and less changed code and less new dependencies, it might be favorable to use it.

Will continue reviewing soon

@andrewtoth
Copy link
Contributor Author

andrewtoth commented Feb 21, 2025

Are you running bitcoind with rpcthreads=16? That should make it much faster. (unless you are running master with bitcoin/bitcoin#31215).

I would also expect master to run faster with just the last commit cherry-picked as well.

I noticed that 16threads never finishes - probably a channel isn't closes somewhere.

Hmm, I can't see the logic error. I'm unsure how to test this part though.

@0xB10C
Copy link
Owner

0xB10C commented Feb 25, 2025

I did a re-run with --rpcthreads=16 - tiny improvement on the async side, but not too much. Generally, very similar result.
image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants