DLTA extension proposal

Ideas for ADC may be presented here so that others can review them, point out flaws, or improve them further.
Forum rules
If you have an account on the wiki, remember to update the ADC Proposals page for new ideas.

http://dcbase.org/wiki/ADC_Proposals_list
darkKlor
Senior Member


Post by darkKlor » 27 Oct 2009, 05:50

This proposal, Delta (DLTA), is a client-only extension aimed at making the indexing of new content more efficient, e.g. by release bots. It is advertised by broadcasting DLTA in the SU field of the INF command.
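As a rough illustration of how a bot might detect support, this sketch parses a BINF broadcast and checks whether DLTA appears in the comma-separated SU field. It is written in Python for illustration only; the function names (`parse_binf`, `supports_dlta`, `unescape`) are hypothetical, and it assumes the standard ADC escaping of spaces, newlines and backslashes.

```python
def unescape(value: str) -> str:
    r"""Undo ADC escaping: \s -> space, \n -> newline, \\ -> backslash."""
    out = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value):
            nxt = value[i + 1]
            out.append({"s": " ", "n": "\n", "\\": "\\"}.get(nxt, nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def parse_binf(line: str) -> tuple[str, dict[str, str]]:
    """Split a BINF broadcast into the sender's SID and its two-letter named fields."""
    parts = line.rstrip("\n").split(" ")
    if len(parts) < 2 or parts[0] != "BINF":
        raise ValueError("not a BINF message")
    sid = parts[1]
    fields = {}
    for token in parts[2:]:
        if len(token) >= 2:
            # Field name is the first two characters, the rest is the escaped value.
            fields[token[:2]] = unescape(token[2:])
    return sid, fields


def supports_dlta(fields: dict[str, str]) -> bool:
    """True if DLTA is listed among the comma-separated features in SU."""
    return "DLTA" in fields.get("SU", "").split(",")
```

For example, `parse_binf("BINF AAAB NIalice SUTCP4,UDP4,DLTA SS1048576 SF42")` yields the SID `AAAB` and a field map in which `supports_dlta` finds DLTA. Since INF updates only carry the fields that changed, a real bot would merge these into its stored copy of the user's INF rather than replace it.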

A common problem in hubs is the discovery of new content. Two existing methods are:
a) for users to manually add items to a list of releases via a release bot, which then announces the latest x releases to users when they connect to the hub, and
b) for a bot to scan the file list of all connecting users and assess differences with existing known content.

This proposal would add a new file, delta.xml, to the client's unnamed root, where files.xml also lives. It would have the same structure as files.xml but would only include the files that have been added to the share since the last file list refresh. An indexing bot would notice the change through a BINF message carrying new SSx and SFy values (share size and shared file count), and would then download this file from the client. If the client supports BZIP, it would serve this file as delta.xml.bz2.
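A minimal sketch of the client side, assuming the share is tracked as a hypothetical snapshot mapping virtual paths to (size, TTH) pairs: it keeps only the paths absent from the previous refresh, nests them in `Directory`/`File` elements shaped like files.xml, and compresses the result with bzip2 when requested. Attributes a real file list carries, such as CID and Generator, are omitted, and `build_delta`/`write_delta` are illustrative names.

```python
import bz2
import xml.etree.ElementTree as ET

# Hypothetical snapshot format: virtual path -> (size in bytes, TTH root).
Snapshot = dict[str, tuple[int, str]]


def build_delta(previous: Snapshot, current: Snapshot) -> ET.Element:
    """Build a files.xml-shaped tree holding only paths added since the previous refresh."""
    root = ET.Element("FileListing", Version="1", Base="/")
    dirs: dict[str, ET.Element] = {}
    for path in sorted(current.keys() - previous.keys()):
        *folders, name = path.split("/")
        parent = root
        prefix = ""
        for folder in folders:
            # Reuse Directory elements already created for this path prefix.
            prefix = f"{prefix}/{folder}"
            if prefix not in dirs:
                dirs[prefix] = ET.SubElement(parent, "Directory", Name=folder)
            parent = dirs[prefix]
        size, tth = current[path]
        ET.SubElement(parent, "File", Name=name, Size=str(size), TTH=tth)
    return root


def write_delta(root: ET.Element, use_bzip: bool) -> tuple[str, bytes]:
    """Serialize the delta; name and compress it as delta.xml.bz2 when BZIP is supported."""
    data = ET.tostring(root, encoding="utf-8", xml_declaration=True)
    if use_bzip:
        return "delta.xml.bz2", bz2.compress(data)
    return "delta.xml", data
```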

Possible improvements to this idea include a way to signal files that have been removed, which would let a bot maintain a reference counter and show the total number of sources for a file in the network when all known users are connected. The best way to store this data and present it to users would be a web page backed by a database that the bot updates.
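One way a bot could keep such a reference counter, assuming deltas eventually carry removed entries as well as added ones: store, per TTH, the set of users sharing the file, so the source count is simply the size of that set. `SourceIndex` and its methods are hypothetical names.

```python
from collections import defaultdict


class SourceIndex:
    """Hypothetical bot-side index: TTH -> set of users sharing that file."""

    def __init__(self) -> None:
        self._sources: dict[str, set[str]] = defaultdict(set)

    def apply_delta(self, user: str, added: list[str], removed: list[str]) -> None:
        """Record one user's delta; 'removed' assumes the proposed removal extension."""
        for tth in added:
            self._sources[tth].add(user)
        for tth in removed:
            holders = self._sources.get(tth)
            if holders is not None:
                holders.discard(user)
                if not holders:
                    del self._sources[tth]  # no known sources left

    def source_count(self, tth: str) -> int:
        """Known sources for a file; only complete when all known users are connected."""
        return len(self._sources.get(tth, ()))
```

Keeping a set of users rather than a bare integer makes re-applying the same delta harmless, which matters if a bot downloads a delta it has already processed.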

Pretorian
Site Admin

Re: DLTA extension proposal

Post by Pretorian » 31 Oct 2009, 23:51

First of all, I would refrain from using "release bots" (or similar terms), since they can carry connotations of pirated material, which we should NOT reference.

On to the actual extension...

1) If a client decides to update its list frequently (or at the user's choice), we'd get lots and lots of updates to this list, each one essentially wiping out whatever changes were made a few minutes earlier.
2) Is this feature anything more than a mere simplification on the bot's side? (That is, without it, diffing list A against list B means downloading entire file lists over and over.)
3) As you already mentioned, the feature can be a problem when it comes to removal of files etc. However, it may be possible to extend delta.xml's list with a "removed" attribute. (This could come in handy for other features as well.)
4) Clients are not required to send SS or SF in their INF the moment they update their lists. How would this system handle clients that only send out INF messages every X minutes? (The bot would then miss some items.)
5) Could this not be done using the same list (files.xml) but with an attribute "New" (or something similar) on files and directories? "If DLTA is in the SU field, only send items with the attribute 'New'; otherwise do as usual."
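Point 5 could look roughly like this on the serving side, assuming a hypothetical `New="1"` attribute on files and directories: for a DLTA-capable requester, everything not marked new is pruned, and a directory survives only if it is new itself or still contains new items. `prune_to_new` and `list_for_requester` are illustrative names, not part of any spec.

```python
import xml.etree.ElementTree as ET


def prune_to_new(elem: ET.Element) -> bool:
    """Remove children not marked New="1"; return True if anything new remains under elem."""
    keep_any = False
    for child in list(elem):  # copy, since children are removed while iterating
        if child.get("New") == "1":
            keep = True  # a new file, or a new directory kept with its whole subtree
        elif child.tag == "Directory":
            keep = prune_to_new(child)  # keep old directories only if they hold new items
        else:
            keep = False
        if keep:
            keep_any = True
        else:
            elem.remove(child)
    return keep_any


def list_for_requester(files_xml: str, requester_su: set[str]) -> str:
    """Send only New-marked items to DLTA-capable requesters, the full list otherwise."""
    root = ET.fromstring(files_xml)
    if "DLTA" in requester_su:
        prune_to_new(root)
    return ET.tostring(root, encoding="unicode")
```

Clients that do not understand the attribute would simply ignore it, which is what makes reusing files.xml attractive here.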

I can see a different approach to this type of problem: a new message to ask "send me the list of all new files since X" or "send me the list restricted to content X". Of course, a sane default value would be, as per the suggestion, "send me the list of all new files since the last refresh". (I guess these could be additional fields for SCH.)
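A sketch of the "all new files since X" idea, assuming the client timestamps each path when a refresh first sees it (integer times here, e.g. Unix seconds). It also shows the weakness raised in points 1 and 4: after two refreshes between bot polls, a "since the last refresh" delta only holds the latest additions, while a since-X query returns both. `AddedLog` is a hypothetical name, and how X would travel in a message (e.g. as an SCH field) is left open.

```python
from dataclasses import dataclass, field


@dataclass
class AddedLog:
    """Hypothetical client-side log of when each shared path was first seen."""
    first_seen: dict[str, int] = field(default_factory=dict)
    last_refresh_added: list[str] = field(default_factory=list)

    def refresh(self, paths: set[str], now: int) -> None:
        """Record a share refresh at time `now`; `paths` is the full current share."""
        new = sorted(p for p in paths if p not in self.first_seen)
        for p in new:
            self.first_seen[p] = now
        # Forget paths no longer shared, so a later re-add counts as new again.
        for p in list(self.first_seen):
            if p not in paths:
                del self.first_seen[p]
        self.last_refresh_added = new

    def added_since(self, since: int) -> list[str]:
        """Answer "send me the list of all new files since X"."""
        return sorted(p for p, t in self.first_seen.items() if t > since)
```

For instance, refreshing at times 100, 200 and 300 with one new file each time, `added_since(100)` returns the files from the last two refreshes, whereas `last_refresh_added` only holds the one from time 300.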
