The incorrectness of BLOM implementations
Posted: 11 Oct 2012, 12:51
As I was implementing BLOM for ncdc, I compared my results with the test vectors of Jucy (provided by Quicksilver) and of DC++, as provided by poy. I figured out why DC++ and Jucy give different results for the last test vector, and also figured out that both results are actually incorrect.
The algorithm, according to the spec:
Pseudocode of what Jucy does:
And the same for DC++
Both are logically correct. However, in both implementations, the shift operation is performed with 32 bit arithmetic, whereas k (and thus, i) may be larger than 32. Since the resulting value of the incorrect shift has more than one bit set, + and | give different values for 'index' and thus result in different bloom filters. The fix is to ensure that the shift operation is performed with 64 bit arithmetic, in which case the last test vector gives the following filter:
The practical implication of this bug is that, if a hub requests a bloom filter with h>32, no client will currently respond with a correct filter. Are there any hubs that request a filter with h>32? And if so, do they rely on the buggy behaviour?
The algorithm, according to the spec:
The bold part is where the problem lies.The client constructs the bloom filter by creating a bit array of m bits set to 0 (zero). For each file it then sets to "1" k positions constructed from the file hash. Seeing the file hash as a stream of bits (starting from the lowest bit of the first byte, ending at the highest bit of the last byte), the client should use h bits starting at the first bit of the first byte to create an integer and apply modulo m to get the position in the bit array, then redo the process k times advancing the starting position by h each time.
Pseudocode of what Jucy does:
Code: Select all
index = 0;
for(i=0; i<k; i++)
index += is_bit_in_hash_set(i) << i;
Code: Select all
index = 0;
for(i=0; i<k; i++)
index |= is_bit_in_hash_set(i) << i;
Code: Select all
AAAAAAACAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA