Adding dates to filelist

Toast

Re: Adding dates to filelist

Post by Toast » 18 Dec 2010, 10:45

So I see that everyone is concerned that dates are going to take up too much space, but the thing I thought of might just fix that issue. If we make it an extension, the client checks whether the user allows dates, and if not, dates aren't sent. A little bit of client-to-client communication before the filelist is sent should solve it.
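A minimal sketch of that check, assuming a hypothetical feature name `DATE` (not a registered ADC extension) and a hypothetical per-user setting. It only illustrates the decision taken before the filelist is sent:

```python
# Hypothetical feature name; it is not a registered ADC extension.
DATE_FEATURE = "DATE"


def should_send_dates(peer_features, user_allows_dates):
    """Return True only if the peer advertised the feature and the local user opted in."""
    return user_allows_dates and DATE_FEATURE in peer_features


# The peer's advertised features would come from the client-client handshake.
print(should_send_dates({"BASE", "BZIP", "DATE"}, user_allows_dates=True))   # True
print(should_send_dates({"BASE", "BZIP"}, user_allows_dates=True))           # False
print(should_send_dates({"BASE", "BZIP", "DATE"}, user_allows_dates=False))  # False
```

Either side can veto: a peer that never advertises the feature gets the plain list, and so does everyone if the user switches dates off.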

darkKlor
Senior Member
Posts: 100
Joined: 30 Dec 2008, 14:59

Re: Adding dates to filelist

Post by darkKlor » 18 Dec 2010, 13:07

pR0Ps wrote:$Get MyList.DcLst$1|
Careful posting that trash in here :P it's >>ADC<<Portal :D

@Toast: an advertised extension is okay in terms of keeping the file size down for the clients who don't support it, but consider the downside... clients that DO support it must either keep two copies of the file list at all times, or generate one appropriate to the requesting client. The latter approach could amplify the impact of CTM attacks.

The concerns about one field today, another ten tomorrow are certainly valid.. I recall they've come up in discussion of using different hash algorithms in the past too (coincidentally, I notice the final round candidates for SHA-3 came out a week ago). I don't have an answer for what the correct decision is there, but I do think that clients which choose to support features that add to the file list (and probably any other feature that impacts the user experience) should have a simple interface to be disabled by the user.

Concerns about size and general load issues may be an area where ADC can be innovative. Microsoft's Group Policy features have long been a key reason (along with backcompat) for Internet Explorer remaining dominant in corporate networks. Designing the protocol, and a suitable extension, such that a hub can tweak individual client features (some even overriding user preference, though we must be careful here - some kind of 'let the hub tell me what to do' switch could be needed) would potentially be a very desirable addition in some network environments.

Back to the filelist specifically, there must be a general tradeoff between features and performance. In a high-bandwidth environment, performance may be optimised by the client progressively caching peer filelists, and then using DLTA for updates (with the peers potentially caching the last X deltas), such that when the user goes to search for a file, the request may be executed locally. The caching approach would potentially be untenable in a hub with a high number of users who share things. Thus, high-bandwidth may allow a high feature set, given the lower resource limitations of the environment. However, in a low-bandwidth environment you're likely to find the BASE feature set more desirable. In the cases of both the low-bandwidth hub, and the high-bandwidth + high-user hub, efficient and effective search becomes critical for the user.. and hopefully BLOM takes care of that.
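A rough sketch of the caching idea, assuming a purely hypothetical delta layout (a base version plus added and removed paths); the thread does not define what a DLTA update would actually carry:

```python
class CachedFilelist:
    """Local copy of one peer's filelist, kept current by applying deltas."""

    def __init__(self, files=None):
        self.files = dict(files or {})  # path -> size in bytes
        self.version = 0

    def apply_delta(self, delta):
        """Apply a delta if it builds on our version; return False if a full refetch is needed."""
        if delta["base_version"] != self.version:
            return False
        for path in delta.get("removed", []):
            self.files.pop(path, None)
        self.files.update(delta.get("added", {}))
        self.version += 1
        return True

    def search(self, term):
        """Case-insensitive substring search, executed locally without asking the peer."""
        term = term.lower()
        return sorted(path for path in self.files if term in path.lower())


cache = CachedFilelist({"/music/a.mp3": 4_000_000})
cache.apply_delta({"base_version": 0, "added": {"/docs/Readme.txt": 1200}, "removed": []})
print(cache.search("readme"))  # ['/docs/Readme.txt']
```

The memory cost grows with the number of peers times the size of their lists, which is why this only suits the high-bandwidth, moderate-user case described above.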

In regards to dates being faked... well, yeah. I've noticed a lot of lovely people like to name files to make people think they're something they're not... I think that might be a little more common than changing the dates. The combination is more of a nuisance, but it is at least restricted to the file list, rather than being broadcast by an announcer, in this proposal. I think it comes down to hub owners and their operators managing the citizenship of their hub, rather than the protocol or client trying to defend against every social engineering attack.

Another little note on bandwidth... try putting a number of users on an 802.11g network and downloading a large file list at 60kb/s (all too common) and then see how you feel about bandwidth today vs 5-10 years ago. The wireless LAN was a necessary solution at one point for us due to the wired network setup, which we had no control over. The creation of ad-hoc wireless networks would give a DC hub resiliency in local environments against threats from network owners, but it would also drag down the capability list quite quickly.

Toast

Re: Adding dates to filelist

Post by Toast » 18 Dec 2010, 13:23

It seems that the whole filelist part is kind of flawed: it's not really extendable without causing too many problems, which dooms clients unless someone has some bright ideas on how to rework it so that it can become extendable without it turning into World War 3 between developers :)

Crise
Senior Member
Posts: 139
Joined: 10 Nov 2007, 21:34

Re: Adding dates to filelist

Post by Crise » 18 Dec 2010, 18:41

Big Muscle wrote: I didn't say that I am concretely against timestamps. I talked in general, because now we add timestamps, next time it will be bitrate
I highly doubt that, because bitrate unlike a timestamp is specific to a certain type of file(s) and (A)DC does not favour any specific file types. That said extensions can add anything to filelists, as I noted in my previous post, so this is not really an issue where consensus is needed. What I am implying is that it is more of a question whether DC++ will do it or not, because nothing will stop other apps from doing it.
Big Muscle wrote:but remember that BZIP is only ADC extensions and not all clients must support it.
BZIP is an ADC extension, yes. However, does dcpp support uncompressed filelists? (And more importantly, does the protocol require this?) I don't have the time to go checking up on this now, just thought I'd bring it up in case it is relevant.
Big Muscle wrote:Also not all users have fast connections - maybe they have in US or in Sweden, but there are tens of other countries. And it will be a pain in ass for such clients and countries.
This depends entirely on how you define fast connection... for me fast connection is anything non-dialup. (ie. generally connection types where the time a file transfer takes is not relevant in terms of costs). As for those unlucky that pay for outside bandwidth (either WAN, or outside country borders), or based on actual data transferred then partial filelists are for you.

Toast's point about named extension is valid (as are the drawbacks mentioned), as is the point concerning trade-offs with additions which applies not only to filelists but any protocol additions in fact.

About the whole filelist format and its (lack of) flexibility... it is flexible and extendable on paper.
More information may be added to the file by extensions, but is not guaranteed to be interpreted by other clients.
That said, the xml filelist format as well as the compression extension dcpp uses significantly predate ADC 1.0 (afaik) and thus probably have not been paid much attention lately, while the other parts of the protocol have been adjusted and actual extensions have been created.

Plus there are no existing filelist based extensions as far as I am aware and this is an area that needs to be pioneered sooner or later anyways so might as well do it now.

Big Muscle
Junior Member
Posts: 39
Joined: 01 Jul 2008, 19:27

Re: Adding dates to filelist

Post by Big Muscle » 19 Dec 2010, 11:18

Crise wrote:I highly doubt that, because bitrate unlike a timestamp is specific to a certain type of file(s) and (A)DC does not favour any specific file types. That said extensions can add anything to filelists, as I noted in my previous post, so this is not really an issue where consensus is needed. What I am implying is that it is more of a question whether DC++ will do it or not, because nothing will stop other apps from doing it.
I don't quite understand this. How is adding more information to the filelist connected with ADC favouring file types?
Crise wrote: BZIP is an ADC extension yes, however, does dcpp support uncompressed filelists? (and more importantly does the protocol require this?) I don't have the time to go checking up on this now, just thought I'd bring it up in case it is relevant.
Of course DC++ supports uncompressed filelists, and the protocol requires it. How else would it work when someone doesn't implement the BZIP extension?
Crise wrote: This depends entirely on how you define fast connection... for me fast connection is anything non-dialup. (ie. generally connection types where the time a file transfer takes is not relevant in terms of costs). As for those unlucky that pay for outside bandwidth (either WAN, or outside country borders), or based on actual data transferred then partial filelists are for you.
A fast connection is any connection where you don't have to wait for ages to download a filelist. I have 20MBps, it's ok. But yesterday, I took a filelist from a random user. It was 10 MB compressed and it contained over 280 000 files. Most files were MP3s. His speed was about 10 kB/s and I was downloading it for 17 minutes. It's not acceptable, I wanted to see what that user shares and not to fall asleep. What will happen when such a filelist contains timestamps? ... Also, iceman told me about some 80 MB filelist. Yes, it is extreme and rare but we must consider such cases too.
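As a quick check of those numbers, a small sketch that assumes decimal units (1 MB = 1,000,000 bytes) and, purely for illustration, a 10 % growth of the compressed list once timestamps are added; the real growth on a list like this is not known:

```python
def transfer_minutes(size_bytes, rate_bytes_per_s):
    """Time to move size_bytes at a constant rate, in minutes."""
    return size_bytes / rate_bytes_per_s / 60


MB = 1_000_000  # decimal megabyte (assumption)
KB = 1_000      # decimal kilobyte (assumption)

base = transfer_minutes(10 * MB, 10 * KB)
print(round(base, 1))  # 16.7

# Assumed 10 % growth from timestamps; an illustrative figure, not a measurement of this list.
with_dates = transfer_minutes(10 * MB * 1.10, 10 * KB)
print(round(with_dates, 1))  # 18.3
```

So 10 MB at 10 kB/s is indeed about 17 minutes, and under that assumed growth the same download would take a little under two minutes longer.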

I would also question how important the timestamp is. Yes, it allows sorting files in the filelist, but how many users really need to sort files in a filelist? Most users just select files and click download. So is it worth implementing when it has such a negative impact on users?

Crise
Senior Member
Posts: 139
Joined: 10 Nov 2007, 21:34

Re: Adding dates to filelist

Post by Crise » 19 Dec 2010, 20:54

Big Muscle wrote: I don't understand this much. How is adding more information into filelist connected with ADC favour to file types?
I was simply saying that things such as bitrate are not as likely to be added (or proposed to be added by large number of people) because DC does not favour any specific filetypes (ie. my comment here was specific to your mentioned example of bitrates not generally).
Big Muscle wrote: Of course, DC++ supports uncompressed filelists and protocol requires it. How would it work when someone doesn't implement BZIP extension?
So it does, that's good... because it wouldn't be surprising if it didn't considering that every client that supports ADC 1.0 does implement BZIP as far as I know, (besides it wouldn't be surprising if I had missed this because transferring of uncompressed filelist just doesn't happen right now, and it looks like dcpp in fact uncompresses the compressed filelist on the fly before sending if someone indeed does happen to request an uncompressed filelist).
Big Muscle wrote: Fast connection is whatever where you don't have to wait for ages to download filelist.
That is about as ambiguous as just saying fast connection, how long is ages is quite relative... and it also has to be relative to the file size in my opinion. Also don't forget filelists are still mostly regular files with few exceptions to rules here and there but that's it.

Regarding the examples given here, yes filelist downloads should ideally be "fast" but if you really just want to go and browse users filelist, in case they happen to have something that interests you, then you should have picked partial filelist to begin with if possible (I always make my pick based on users total share and what he has set as his connection type).

Regarding speed of filelist transfers in general, if we were really trying our best to guarantee fast transfer times for them, should we not start by looking at prioritising upload bandwidth first, rather than using it as an excuse to limit the data in a filelist to the bare minimum? (ie. even if my filelist is 1 MB compressed but my upstream is currently used up by other files, the resulting transfer time could be just as bad as in your examples).
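One way to picture that kind of prioritisation, as a sketch only: an upload queue in which filelist requests jump ahead of regular files. The class and priority names are made up for illustration and do not describe how any existing client schedules uploads:

```python
import heapq
import itertools

FILELIST, REGULAR = 0, 1  # lower value is served first


class UploadQueue:
    """Priority queue of pending uploads; ties are served in arrival order."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()

    def push(self, name, is_filelist=False):
        priority = FILELIST if is_filelist else REGULAR
        heapq.heappush(self._heap, (priority, next(self._arrival), name))

    def pop(self):
        return heapq.heappop(self._heap)[2]


queue = UploadQueue()
queue.push("movie.avi")
queue.push("album.zip")
queue.push("files.xml.bz2", is_filelist=True)
print(queue.pop())  # files.xml.bz2
print(queue.pop())  # movie.avi
```

This only reorders the queue; it does nothing for a transfer that is already saturating the upstream, which would need bandwidth shaping as well.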

Point: why limit expressive power of filelists in an attempt to guarantee something that can never be truly guaranteed. I am not saying we should add new data to the filelist recklessly but at the same time I dislike that impact on size and (thus) transfer speeds are the first counter arguments when there should be better ways than just plain size control to keep the transfer times reasonable.

Also, you mentioned the GFI command in your first reply. As nice as that would be, I am sure you realise that if we want to collect data on multiple files in a user's filelist through it, it will quickly become an undesirable method. Not to mention that GFI simply asks for a RES, ie. to use GFI effectively we would need to extend RES, or both of them, through an extension in order to add more info about files.
Big Muscle wrote: I would also think how timestamp is important. Yes, it allows sorting files in filelist, but how many users really need to sort files in filelist? Most of users just select files and click download. So is it worth to implement it when it has such negative impact on users?
I am not willing to evaluate negative impact without actual data from a real-life implementation. Speculation is a good place to start, but it should not end there unless the result is unambiguously clear.

Pretorian
Site Admin
Posts: 214
Joined: 21 Jul 2009, 10:21

Re: Adding dates to filelist

Post by Pretorian » 20 Dec 2010, 17:31

As cologic stated, the sample file list in the specification increased by 10% with BZIP. Without BZIP, the file increased by 20-25% (depending on the size of the actual attribute name). This is using the sample file list; I have no data on actual lists. Let us assume that without BZIP, the file list increases by a factor of Y and with BZIP by a factor of Z. The values will most likely be constant. Actually, this would be the same case if you added "author" or any equivalent data that exists for each file and directory.
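A sketch of how Y and Z could be estimated, using a synthetic list loosely modelled on the XML filelist layout. The `Date` attribute name, the file names and the random sizes and hashes are all assumptions, so the printed factors say nothing about real shares:

```python
import bz2
import random

BASE32 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"


def build_filelist(n_files, with_dates, seed=0):
    """Build a synthetic XML file list; 'Date' is a hypothetical attribute name."""
    rng = random.Random(seed)           # drives sizes and hashes
    date_rng = random.Random(seed + 1)  # separate stream so both lists share the same base data
    lines = ['<FileListing Version="1">', '<Directory Name="share">']
    for i in range(n_files):
        size = rng.randrange(1, 10**9)
        tth = "".join(rng.choice(BASE32) for _ in range(39))
        attrs = f'Name="file{i:05d}.bin" Size="{size}" TTH="{tth}"'
        if with_dates:
            # Unix timestamp between 2000-01-01 and 2010-12-18
            attrs += f' Date="{date_rng.randrange(946684800, 1292630400)}"'
        lines.append(f"<File {attrs}/>")
    lines += ["</Directory>", "</FileListing>"]
    return "\n".join(lines).encode("utf-8")


def growth_factors(n_files=2000):
    """Return (Y, Z): size growth from adding dates, uncompressed and bzip2-compressed."""
    plain = build_filelist(n_files, with_dates=False)
    dated = build_filelist(n_files, with_dates=True)
    y = len(dated) / len(plain)
    z = len(bz2.compress(dated)) / len(bz2.compress(plain))
    return y, z


y, z = growth_factors()
print(f"Y (uncompressed) = {y:.2f}, Z (bzip2) = {z:.2f}")
```

Running the same comparison over real file lists would give the data that is missing here.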

Now, since this is something that can be generalized, you have two options; 1) create one file list with everything you can supply or 2) create multiple lists with what you can supply. 2) does not mean you need to have one list for each extension, but it does mean you need to be prepared to supply at least two.

With option 1), you will always supply one list and your concern is bandwidth.

With option 2), you will have to have some signalling (SUP/SU field etc) and the client will have to generate a list depending on which extensions the other party supports. Say, if you support timestamp and author, but the other party only supports timestamp, you will only send the list with timestamp. You could of course have one list for BASE-only file lists and one list for anything additional to BASE (that is, you always send timestamp and author in this list). The two file list approach would mean the client would still use the same amount of bandwidth for clients that don't implement known extensions. It's possible that you may want to have every type of file list combination that is available. That will have less impact on bandwidth but more on the user's computer (memory, drive space etc).

It's also possible that the hub could mandate which file list combinations are available...

So, either "send everything to everyone and take the bandwidth hit for those who don't support the extensions" or "send some things to some and take a computation/storage hit for the different extension combinations".
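Option 2) could be sketched roughly like this: intersect what we support with what the peer advertised, and cache one generated list per combination. The extension names and the generator callback are hypothetical:

```python
# Hypothetical extension names for the fields this client can add to its list.
KNOWN_EXTENSIONS = frozenset({"TIMESTAMP", "AUTHOR"})


class FilelistCache:
    """Keeps one generated file list per combination of mutually supported extensions."""

    def __init__(self, generate):
        self._generate = generate  # callable: frozenset of extensions -> list contents
        self._cache = {}

    def get(self, peer_extensions):
        # Only fields both sides understand go into the list; an empty set means BASE only.
        key = KNOWN_EXTENSIONS & frozenset(peer_extensions)
        if key not in self._cache:
            self._cache[key] = self._generate(key)
        return self._cache[key]


def describe(extensions):
    """Stand-in generator that just names the fields a real list would contain."""
    return "BASE" + "".join("+" + name for name in sorted(extensions))


cache = FilelistCache(describe)
print(cache.get({"TIMESTAMP"}))            # BASE+TIMESTAMP
print(cache.get({"TIMESTAMP", "AUTHOR"}))  # BASE+AUTHOR+TIMESTAMP
print(cache.get({"BZIP"}))                 # BASE
```

With n extensions the cache can hold up to 2^n lists, which is the computation and storage hit mentioned above; collapsing every non-empty set to a single "everything" list gives the two-list variant.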
