UHFS - UnHashed File Sharing

Post by klondike » 03 Nov 2012, 00:56

The UHFS extension is mainly intended for two situations:
  • LAN parties (where time is limited, so hashing may delay the acceptance of new clients)
  • Users with big shares who switch from one client to another and, as a result, may lose their hashed share
The extension has not been written down yet; the ideas can be read on the wiki: http://www.dcbase.org/wiki/UHFS

The ideas currently under consideration:
  • Create a "please hash this file" command.
  • Send a priority value along with the query. Currently 0 = "I want to download this file: I'm trying to download an unhashed file from your list", and 1 = "I got a name+size match on a file you haven't hashed but for which I have a TTH; hash it to see whether you are an alternate source".
  • Create a "no more requests for this priority" RET, with a retry based on time or, if desired, a push notification. Its use is recommended to ensure that a client can't request hashing of (or vote for) a large number of files.
  • Add a "hash while uploading" command, which sends the TTH root of the file at the end of the transfer (by which time the file has been hashed too). This optimizes disk access: the data is hashed as it is sent, so the file doesn't have to be read twice. When transferring data in this mode, only data from the same path on the same client (same CID) should be requested, to prevent file corruption.
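The "hash while uploading" idea can be sketched as a loop that reads each chunk once, sends it, and feeds the same bytes into a tree hash, returning the root only at the end of the transfer. This is a minimal sketch under stated assumptions: the 1024-byte leaves and the 0x00 (leaf) / 0x01 (internal node) prefixes follow the THEX layout that TTH uses, but SHA-256 stands in for Tiger because Python's `hashlib` is not guaranteed to provide Tiger, so the root is not a real TTH. `upload_and_hash` and its `send` callback are hypothetical names.

```python
import hashlib
import io
from typing import BinaryIO, Callable, List

LEAF_SIZE = 1024  # THEX/TTH leaf size


def _h(data: bytes) -> bytes:
    # Stand-in for Tiger: SHA-256 is used only because hashlib always has it.
    return hashlib.sha256(data).digest()


def merkle_root(leaves: List[bytes]) -> bytes:
    """Combine leaf hashes pairwise, level by level, until one root is left."""
    level = leaves or [_h(b"\x00")]  # an empty file hashes as one empty leaf
    while len(level) > 1:
        nxt = [_h(b"\x01" + level[i] + level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # an odd node is promoted unchanged, as in THEX
        level = nxt
    return level[0]


def upload_and_hash(src: BinaryIO, send: Callable[[bytes], None],
                    chunk_size: int = 64 * LEAF_SIZE) -> bytes:
    """Send the file and build its tree from the same reads (one disk pass).

    The returned root is what would be sent to the peer after the last byte.
    """
    leaves: List[bytes] = []
    pending = b""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        send(chunk)                      # the upload itself
        pending += chunk                 # ...and the same bytes for hashing
        while len(pending) >= LEAF_SIZE:
            leaves.append(_h(b"\x00" + pending[:LEAF_SIZE]))
            pending = pending[LEAF_SIZE:]
    if pending:
        leaves.append(_h(b"\x00" + pending))  # short final leaf
    return merkle_root(leaves)
```

Because the root only exists after the last byte has been sent, this also illustrates the concern raised later in the thread: if the uploader disconnects before sending it, the downloader has no hash to verify against.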

As I said, this extension is still very much open for discussion, so please sharpen your tongues and start discussing.

Post by Pretorian » 12 Nov 2012, 20:21

I guess I'll have to chime in here, since it was originally my idea...

As Klondike notes, the intention of this extension is to accommodate users who haven't hashed their files but still want to be part of the community as fast as possible. This is of course not restricted to LANs, but that is where I think we'll see the largest benefit.

Basically, in the past people have not upgraded to newer versions of DC++ (or, well, any client that requires hashing) because of the hashing requirement. LAN organizers therefore recommend older versions (up to the point where hashing could still be turned off), because they let users get up and share their content immediately. This extension would a) show all of a user's files immediately, but b) still require hashing to complete the transfer.

The idea is to add a file id to each file in the file list that isn't hashed. Call it an "unhashed hash" or whatever you want, as long as it's relatively unique. It can simply be the hash of the file name (and path). It is, however, important that different clients don't end up with the same "unhashed hash" (to avoid clashes).
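One way to get an id that is cheap to compute (no file content is read) and does not clash between clients is to hash the share path together with the sharing client's CID. This is a minimal sketch under stated assumptions: mixing in the CID, using SHA-256, and the `UH/` prefix (modelled on the `TTH/` identifiers ADC uses in `GET`) are all hypothetical choices, and `unhashed_id` is an illustrative name.

```python
import base64
import hashlib


def unhashed_id(cid: str, path: str) -> str:
    """Hypothetical UHFS id for a not-yet-hashed file.

    The CID is mixed in so two clients sharing the same path get different
    ids; only the name/path is hashed, so the file itself is never read.
    """
    digest = hashlib.sha256(f"{cid}\0{path}".encode("utf-8")).digest()
    # Base32 without padding, as ADC does for its own hash values.
    return "UH/" + base64.b32encode(digest).decode("ascii").rstrip("=")
```

If a file is renamed or moved, its id changes too, which is acceptable here since the id only has to live until the file has been hashed.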

Each new file id would be added to each file in the file list that isn't hashed yet. A client (user) that wants to download one of these files sends "GFI UHid"/"GET UHid", indicating to the other client that someone wants this file. The uploading client would then increase this file's priority in the hash queue. Klondike has suggested a "voting system": each time a file is "voted on" by some client, its queue position changes as well. Klondike's suggestion would also entail allowing only a certain number of votes per client (to prevent abuse), but I think this is something that can be left up to the clients. (A sane value would probably be 3 file votes per client.)
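The voting idea can be sketched as an uploader-side hash queue that combines Klondike's priority values (0 = wants to download, 1 = alternate-source check) with a per-client vote limit of 3. This is a minimal sketch under stated assumptions: treating 0 as more urgent than 1, breaking ties by voter count and then arrival order, the `NOT_REQUESTED` level, and freeing a client's votes once a file is hashed are all hypothetical design choices, and `HashQueue` is an illustrative name.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

NOT_REQUESTED = 2  # hypothetical level for files nobody has asked for yet


@dataclass
class HashQueue:
    """Hypothetical uploader-side hash queue with per-client vote limits."""
    max_votes_per_client: int = 3
    pending: List[str] = field(default_factory=list)              # unhashed ids, arrival order
    voters: Dict[str, Set[str]] = field(default_factory=dict)     # file id -> voting CIDs
    priority: Dict[str, int] = field(default_factory=dict)        # file id -> most urgent value seen
    votes_cast: Dict[str, Set[str]] = field(default_factory=dict) # CID -> files with open votes

    def add_unhashed(self, file_id: str) -> None:
        if file_id not in self.priority:
            self.pending.append(file_id)
            self.voters[file_id] = set()
            self.priority[file_id] = NOT_REQUESTED

    def vote(self, cid: str, file_id: str, priority: int) -> bool:
        """Record a hash request. False means it was refused: unknown file,
        or the client has no votes left (where a "no more requests" reply fits)."""
        if file_id not in self.priority:
            return False
        cast = self.votes_cast.setdefault(cid, set())
        if file_id not in cast and len(cast) >= self.max_votes_per_client:
            return False
        cast.add(file_id)
        self.voters[file_id].add(cid)
        self.priority[file_id] = min(self.priority[file_id], priority)
        return True

    def next_to_hash(self) -> Optional[str]:
        """Pop the next file: most urgent priority, then most voters, then oldest."""
        if not self.pending:
            return None
        _, best = min(enumerate(self.pending),
                      key=lambda item: (self.priority[item[1]],
                                        -len(self.voters[item[1]]), item[0]))
        self.pending.remove(best)
        for cid in self.voters.pop(best):
            self.votes_cast[cid].discard(best)  # a hashed file frees that vote
        del self.priority[best]
        return best
```

Freeing votes on completion means the limit caps a client's *outstanding* requests rather than its lifetime total; a client that prefers a hard total would simply skip the `discard`.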

When the file is completely hashed, the uploading client could send a "PUSH" command or something similar to signal that the file is now hashed (perhaps a RES?). However, I don't think this is really necessary; the downloading client would probably retry within X minutes anyway.

A queue position flag could also be sent, similar to how "QP" is done, but this time for the files themselves (something like "queue position for the files to be hashed"), possibly together with an estimated time until hashing...
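The position and estimated time could be derived from the hash queue itself: the position is the file's index, and the estimate is the number of bytes still to be hashed up to and including the file, divided by the client's hashing speed. This is a minimal sketch under stated assumptions: the queue is represented as a list of (id, size) pairs, the hashing speed is assumed to be measured elsewhere, and `hash_queue_eta` is a hypothetical name.

```python
from typing import List, Tuple


def hash_queue_eta(queue: List[Tuple[str, int]], file_id: str,
                   bytes_per_second: float) -> Tuple[int, float]:
    """Return (1-based position, estimated seconds until file_id is hashed).

    queue holds (file id, size in bytes) in hashing order, head first; the
    estimate counts every byte ahead of the file plus the file itself.
    """
    bytes_until_done = 0
    for position, (fid, size) in enumerate(queue, start=1):
        bytes_until_done += size
        if fid == file_id:
            return position, bytes_until_done / bytes_per_second
    raise KeyError(file_id)
```

For example, a 50 MB file queued behind a 100 MB file on a client hashing 50 MB/s would report position 2 and roughly 3 seconds. Since votes can reorder the queue, any such estimate is only a snapshot.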

Once a file is hashed, its "new file id" entry in the file list should be removed, so that clients use the proper hash id. However, the uploading client would probably have to wait a certain time before deleting its "new file id -> hash" list, so as not to strand clients that have already grabbed the incomplete list.
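The "new file id -> hash" list with a delay before deletion can be sketched as a map whose entries expire after a grace period. This is a minimal sketch under stated assumptions: the class name `UnhashedIdMap`, the grace period as a constructor parameter (no value is proposed in the thread), and lazy expiry on lookup are all hypothetical; the injectable clock only exists to make the behaviour easy to test.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class UnhashedIdMap:
    """Hypothetical uploader-side map from retired unhashed ids to TTH roots."""

    def __init__(self, grace_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.grace_seconds = grace_seconds
        self.clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # id -> (TTH, expiry)

    def file_hashed(self, unhashed_id: str, tth: str) -> None:
        """Called when hashing finishes; the id stays resolvable for a while."""
        self._entries[unhashed_id] = (tth, self.clock() + self.grace_seconds)

    def resolve(self, unhashed_id: str) -> Optional[str]:
        """Map a request for an old id to the real TTH, or None once expired."""
        entry = self._entries.get(unhashed_id)
        if entry is None:
            return None
        tth, expires = entry
        if self.clock() >= expires:
            del self._entries[unhashed_id]  # grace period over: forget the id
            return None
        return tth
```

A client holding an old file list can then keep requesting the unhashed id and be served the hashed file until the grace period runs out.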

I want to note that downloading while hashing is Klondike's proposal. I am still unsure of its stability, especially regarding when the full hash (with tree) should be sent. My original proposal was that clients shouldn't be allowed to transfer these files until they have a proper hash, but Klondike's suggestion means that clients would get the hash during or after the file transfer. The problem with "during or after" is that clients may go offline before the hash is sent, effectively making the file (and the transfer) useless. There is also the suggestion that unhashed files could be matched against other clients' files as alternative sources, and that this should be signalled as such in the request for the file hash. I am unsure about this, as they wouldn't be able to fully match the file anyway until both sources have hashed their files.
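The alternate-source idea amounts to a cheap pre-filter: a downloader compares (name, size) of another user's unhashed files with files it already has a TTH for, and only the resulting candidates are worth a priority-1 hash request. This is a minimal sketch under stated assumptions: exact name comparison and the function name `alternate_source_candidates` are hypothetical, and, as noted above, a candidate is only confirmed once the other side has hashed the file and its TTH matches.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def alternate_source_candidates(
        local_hashed: Iterable[Tuple[str, int, str]],
        remote_unhashed: Iterable[Tuple[str, int, str]]) -> Dict[str, List[str]]:
    """Map each local TTH to remote unhashed ids with the same name and size.

    local_hashed:    (name, size, tth) for files we already have a TTH for.
    remote_unhashed: (name, size, unhashed id) from another client's list.
    Results are only candidates; a name+size match proves nothing by itself.
    """
    by_key: Dict[Tuple[str, int], List[str]] = defaultdict(list)
    for name, size, tth in local_hashed:
        by_key[(name, size)].append(tth)
    candidates: Dict[str, List[str]] = defaultdict(list)
    for name, size, uh_id in remote_unhashed:
        for tth in by_key.get((name, size), []):
            candidates[tth].append(uh_id)
    return dict(candidates)
```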

I am aware that clients might need to provide two file lists, but I don't think that's much of a problem... If it is, they can simply always provide one list. Clients that don't support this extension would simply not be able to do anything with those files (not even add them to the queue).
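Serving two file lists could look like the sketch below: hashed files carry a `TTH` attribute, while unhashed files carry their id in an extra attribute and are omitted from the list served to clients without the extension. This is a minimal sketch under stated assumptions: the flat layout (no `Directory` elements) is only loosely modelled on DC++'s `FileListing` XML, and the `UH` attribute and `build_file_list` function are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Optional, Tuple


def build_file_list(files: Iterable[Tuple[str, int, Optional[str], str]],
                    include_unhashed: bool) -> bytes:
    """files: (name, size, TTH or None, unhashed id).

    With include_unhashed=False this yields the list for clients that don't
    support the extension; with True, unhashed files get a hypothetical UH attribute.
    """
    root = ET.Element("FileListing", Version="1")
    for name, size, tth, uh_id in files:
        if tth is not None:
            ET.SubElement(root, "File", Name=name, Size=str(size), TTH=tth)
        elif include_unhashed:
            ET.SubElement(root, "File", Name=name, Size=str(size), UH=uh_id)
    return ET.tostring(root, encoding="utf-8")
```

If the uploader always serves the full list instead, an older client would still see entries without a `TTH` attribute; whether it ignores or rejects them depends on its parser, which is worth checking before choosing the single-list option.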
