The file list needs to be modified so that it can contain a hash set attribute. The hash set value is represented in the same way as a file's TTH attribute (if TTH is the chosen algorithm), but the attribute is named "hashset". The previously negotiated hash method should be assumed. This means that both the Directory element and the File element may contain a "hashset" attribute. If file list size is a concern, the shorter name "set" should be used instead.
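A minimal sketch of where the attribute would sit in a file list, assuming a Directory element whose hash set is computed over its files' hashes (only the Directory case is shown). SHA-256 stands in for the negotiated algorithm, unpadded base32 is assumed for the hash set value, and the helper names `b32` and `build_listing` are hypothetical. The `set_attr` parameter switches to the shorter "set" name.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET


def b32(digest: bytes) -> str:
    # Unpadded base32, as TTH values appear in file lists (assumed for the hash set too).
    return base64.b32encode(digest).decode("ascii").rstrip("=")


def build_listing(dir_name: str, files: list[tuple[str, bytes]],
                  set_attr: str = "hashset") -> ET.Element:
    """Build a FileListing whose Directory carries a hash set attribute.

    SHA-256 is a stand-in for the negotiated hash (e.g. Tiger/TTH).
    Pass set_attr="set" for the shorter name when list size matters.
    """
    root = ET.Element("FileListing", {"Version": "1"})
    directory = ET.SubElement(root, "Directory", {"Name": dir_name})
    digests = []
    for name, content in files:
        digest = hashlib.sha256(content).digest()
        digests.append(digest)
        # Attribute kept as "TTH" to mirror real file lists; the value here is
        # the SHA-256 stand-in, not an actual Tiger tree hash.
        ET.SubElement(directory, "File",
                      {"Name": name, "Size": str(len(content)), "TTH": b32(digest)})
    # Hash set: the stand-in hash over the concatenated per-file hashes.
    directory.set(set_attr, b32(hashlib.sha256(b"".join(digests)).digest()))
    return root


listing = build_listing("text", [("file1.txt", b"hello"), ("file2.txt", b"world")])
print(ET.tostring(listing, encoding="unicode"))
```

Here the per-file hashes are concatenated in listing order; whether the order should be normalized is left open by the text above.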
There are various sources from which a hash set can be created:
1. The selected files' names, concatenated.
2. The selected files' names together with their paths, concatenated (paths are absolute within the file list's directory tree, not the local file system).
3. The selected files' hashes, concatenated.
4. The selected files' content, concatenated.
5. Randomly generated data.
6. A combination of the above items.
Investigation:
1. Likely to be sufficiently unique, especially when many files are selected.
2. While more unique than item 1, it may be inaccurate: if two users both have the files "file1.txt" and "file2.txt", but one keeps them in "shared/text" and the other in "downloaded/text", their hash sets will differ even though the files are the same.
3. Likely to be more unique than item 1, as the hashes are derived from the files' content.
4. Likely to be as unique as item 3, but not viable, since it requires (re-)hashing that much content.
5. Not likely to be unique unless a good seed is used; the random range needs to be sufficiently large to generate a trustworthy hash.
6. Items 1, 3 and 4 are likely to be similar in their uniqueness (or to differ only slightly). Therefore, it is unlikely that any combination will yield a more unique hash.
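The caveat for item 2 can be illustrated with a small sketch comparing items 1 and 2 for the two users described above. SHA-256 is a stand-in for the negotiated hash, the function names are hypothetical, and joining entries with a newline (rather than plain concatenation) is an assumption made so that different splits of the same characters cannot collide.

```python
import hashlib


def name_set(paths: list[str]) -> str:
    # Item 1: hash over the files' names only (last path component).
    names = [p.rsplit("/", 1)[-1] for p in paths]
    return hashlib.sha256("\n".join(names).encode("utf-8")).hexdigest()


def path_set(paths: list[str]) -> str:
    # Item 2: hash over the names together with their paths in the file list's tree.
    return hashlib.sha256("\n".join(paths).encode("utf-8")).hexdigest()


user_a = ["shared/text/file1.txt", "shared/text/file2.txt"]
user_b = ["downloaded/text/file1.txt", "downloaded/text/file2.txt"]

print(name_set(user_a) == name_set(user_b))  # True: same names
print(path_set(user_a) == path_set(user_b))  # False: different paths
```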
Items 1, 3 and 4 thus appear to be similarly unique. Item 4 is unlikely to be used due to its inefficiency. When choosing between items 1 and 3, item 3 should provide more unique base data. Therefore:
Selected files' hashes shall be concatenated.
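A minimal sketch of this rule, assuming the selected files' hashes are available as unpadded base32 strings (as TTH values are in file lists). SHA-256 stands in for the negotiated algorithm, and sorting the raw hashes before concatenating is an added assumption so that the result does not depend on file-list order; the helper names are hypothetical.

```python
import base64
import hashlib
from typing import Iterable


def b32(digest: bytes) -> str:
    # Unpadded base32, as TTH values appear in file lists.
    return base64.b32encode(digest).decode("ascii").rstrip("=")


def b32_to_bytes(text: str) -> bytes:
    # Restore the padding that unpadded base32 omits before decoding.
    return base64.b32decode(text + "=" * (-len(text) % 8))


def hash_set(file_hashes: Iterable[str]) -> str:
    """Hash set over the selected files' hashes (item 3).

    SHA-256 stands in for the negotiated algorithm. Sorting the raw hashes
    is an assumption so the result does not depend on file-list order.
    """
    h = hashlib.sha256()
    for raw in sorted(b32_to_bytes(x) for x in file_hashes):
        h.update(raw)  # successive updates hash the concatenation
    return b32(h.digest())


tth_a = b32(hashlib.sha256(b"file1 content").digest())
tth_b = b32(hashlib.sha256(b"file2 content").digest())
print(hash_set([tth_a, tth_b]) == hash_set([tth_b, tth_a]))  # True
```

Using the actual negotiated algorithm (for example a Tiger tree hash implementation) would replace the `hashlib.sha256` calls; the concatenation step stays the same.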