Bitcoin

What is the data format layout for txindex LevelDB values?

As I understand it, the key is the byte t followed by the 32-byte transaction hash.

My problem is the value. From sources such as "What are the keys used in blockchain levelDB (e.g. what are the key:value pairs)?", I understand that the value must encode three fields: the dat file number, the block offset, and the tx offset within the block.

However, looking at the first 1,000 entries, I’m not sure how to decode those values into three fields, because the length of each value varies between 5 and 10 bytes. Are those fields simply three varint values?

Here is the Plyvel code I used to print each value's length and attempt the decoding, using plyvel==1.5.1 and Bitcoin Core v26.0.0 on Ubuntu 23.10:

#!/usr/bin/env python3

import struct

import plyvel

def decode_varint(data):
    """
    https://github.com/alecalve/python-bitcoin-blockchain-parser/blob/c06f420995b345c9a193c8be6e0916eb70335863/blockchain_parser/utils.py#L41
    """
    assert(len(data) > 0)
    size = int(data[0])
    assert(size <= 255)

    if size < 253:
        return size, 1

    if size == 253:
        format_ = '<H'
    elif size == 254:
        format_ = '<I'
    elif size == 255:
        format_ = '<Q'
    else:
        # Should never be reached
        assert 0, "unknown format_ for size : %s" % size

    size = struct.calcsize(format_)
    return struct.unpack(format_, data[1:size+1])[0], size + 1

ldb = plyvel.DB('/home/ciro/snap/bitcoin-core/common/.bitcoin/indexes/txindex/', compression=None)
for key, value in ldb:
    if key[0:1] == b't':
        txid = bytes(reversed(key[1:])).hex()
        print(txid)
        print(len(value))
        print(value.hex(' '))
        file, off = decode_varint(value)
        blk_off, off = decode_varint(value[off:])
        tx_off, off = decode_varint(value[off:])
        print((txid, file, blk_off, tx_off))
        print()

But it eventually fails on an entry like this one:

ec4de461b0dd1350b7596f95c0d7576aa825214d9af0e8c54de567ab0ce70800
7
42 ff c0 43 8b 94 35
Traceback (most recent call last):
  File "/home/ciro/bak/git/bitcoin-strings-with-txids/./tmp.py", line 39, in <module>
    blk_off, off = decode_varint(value[off:])
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ciro/bak/git/bitcoin-strings-with-txids/./tmp.py", line 29, in decode_varint
    return struct.unpack(format_, data[1:size+1])[0], size + 1
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
struct.error: unpack requires a buffer of 8 bytes

So I’m wondering whether I guessed the format wrong or whether it’s just a bug in my code.
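If the format guess is the problem, one candidate worth testing is Bitcoin Core's internal VARINT encoding (base-128, most significant group first, with 1 added for every continuation byte), which is different from the CompactSize encoding that decode_varint handles. The sketch below assumes the value is three such VARINTs in a row (file number, block offset, tx offset); that layout is an assumption, not something confirmed by the sources cited above. The helper names read_core_varint and decode_txindex_value are hypothetical. It is run against the 7-byte value that triggered the traceback.

```python
def read_core_varint(data, pos=0):
    """Decode one Bitcoin Core style VARINT starting at data[pos].

    Returns (value, next_pos). Hypothetical helper for illustration.
    """
    n = 0
    while True:
        ch = data[pos]  # an IndexError here means the buffer ended early
        pos += 1
        n = (n << 7) | (ch & 0x7F)  # low 7 bits carry the payload
        if ch & 0x80:
            n += 1  # high bit set: more bytes follow, and the encoding adds 1
        else:
            return n, pos


def decode_txindex_value(value):
    """Assumed layout: VARINT file number, VARINT block offset, VARINT tx offset."""
    file_no, pos = read_core_varint(value, 0)
    blk_off, pos = read_core_varint(value, pos)
    tx_off, pos = read_core_varint(value, pos)
    return file_no, blk_off, tx_off, pos  # pos = number of bytes consumed


sample = bytes.fromhex("42ffc0438b9435")  # the value from the failing entry
print(decode_txindex_value(sample))  # → (66, 2105539, 199349, 7)
```

Under this assumption the failing value decodes cleanly and uses exactly all 7 bytes, which would also explain why the lengths vary between 5 and 10. Unlike the script above, the position is carried forward as an absolute index, so each field starts where the previous one ended.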
