What is the data format layout for txindex LevelDB values?
As I understand it, the key is the byte 't' followed by the 32-byte transaction hash.
My problem is the value. From sources such as "What are the keys used in blockchain levelDB (e.g. what are the key:value pairs)?", I understand that the value must encode three fields: the dat file number, the block offset, and the tx offset within the block.
However, I'm not sure how to decode the values into those three fields, because among the first 1,000 entries the length of each value varies between 5 and 10 bytes. Are the fields simply three varint values?
Here is the Plyvel code I used to print the length and attempt the decoding, with plyvel==1.5.1 and Bitcoin Core v26.0.0 on Ubuntu 23.10:
#!/usr/bin/env python3

import struct

import plyvel

def decode_varint(data):
    """
    https://github.com/alecalve/python-bitcoin-blockchain-parser/blob/c06f420995b345c9a193c8be6e0916eb70335863/blockchain_parser/utils.py#L41
    """
    assert(len(data) > 0)
    size = int(data[0])
    assert(size <= 255)

    if size < 253:
        return size, 1

    if size == 253:
        format_ = '<H'
    elif size == 254:
        format_ = '<I'
    elif size == 255:
        format_ = '<Q'
    else:
        # Should never be reached
        assert 0, "unknown format_ for size : %s" % size

    size = struct.calcsize(format_)
    return struct.unpack(format_, data[1:size+1])[0], size + 1

ldb = plyvel.DB('/home/ciro/snap/bitcoin-core/common/.bitcoin/indexes/txindex/', compression=None)
for key, value in ldb:
    if key[0:1] == b't':
        txid = bytes(reversed(key[1:])).hex()
        print(txid)
        print(len(value))
        print(value.hex(' '))
        file, off = decode_varint(value)
        blk_off, off = decode_varint(value[off:])
        tx_off, off = decode_varint(value[off:])
        print((txid, file, blk_off, tx_off))
        print()
But it eventually blows up on the following entry:
ec4de461b0dd1350b7596f95c0d7576aa825214d9af0e8c54de567ab0ce70800
7
42 ff c0 43 8b 94 35
Traceback (most recent call last):
  File "/home/ciro/bak/git/bitcoin-strings-with-txids/./tmp.py", line 39, in <module>
    blk_off, off = decode_varint(value[off:])
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ciro/bak/git/bitcoin-strings-with-txids/./tmp.py", line 29, in decode_varint
    return struct.unpack(format_, data[1:size+1])[0], size + 1
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
struct.error: unpack requires a buffer of 8 bytes
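For comparison, here is a minimal sketch that decodes the failing value above under a different assumption: that the three fields use Bitcoin Core's base-128 VARINT serialization (7 data bits per byte, high bit as a continuation flag, with 1 added on each continuation) rather than the CompactSize encoding that decode_varint implements. The helper names read_core_varint and decode_txindex_value are hypothetical, and the field order (file number, block offset, tx offset) is assumed from the description above.

```python
def read_core_varint(data, pos=0):
    """Decode one base-128 VARINT (Bitcoin Core style) starting at pos.

    Returns (value, next_pos). Assumption: this is the encoding used
    for the txindex values; it is NOT the CompactSize format.
    """
    n = 0
    while True:
        byte = data[pos]
        pos += 1
        n = (n << 7) | (byte & 0x7F)
        if byte & 0x80:
            # Continuation bit set: more bytes follow, and the encoding
            # adds 1 here so that every number has a unique representation.
            n += 1
        else:
            return n, pos


def decode_txindex_value(value):
    """Split a txindex value into (file, blk_off, tx_off).

    Assumes the value is exactly three consecutive VARINTs.
    """
    file_no, pos = read_core_varint(value)
    blk_off, pos = read_core_varint(value, pos)
    tx_off, pos = read_core_varint(value, pos)
    assert pos == len(value), "trailing bytes: the layout assumption is wrong"
    return file_no, blk_off, tx_off


# The 7-byte value that made the CompactSize decoder fail.
print(decode_txindex_value(bytes.fromhex("42 ff c0 43 8b 94 35")))
# → (66, 2105539, 199349)
```

Under that assumption the ff byte is an ordinary continuation byte rather than a marker for an 8-byte integer, and the value is consumed exactly, with no bytes left over. The sketch also tracks a single absolute position instead of re-slicing the buffer, so the offsets returned by successive reads do not have to be added up by hand.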
So I’m wondering if I guessed the format wrong or if it’s just a bug in my code. Related: