Page 1 of 1

Clients doing a blom type get

Posted: 20 Feb 2013, 03:12
by klondike
Hi here are some thoughts out of the back of my mind.

In order to prevent the latencies caused by the sending of BLOM filters the hub could send a CTM request to the client to get it to open a second connection and then get the filter in the background. I really couldn't see the standard saying clients can request BLOM filters and this can also be useful for a second thing.

Currently some clients download a user's Filelist in order to match their queues against them and see if they share anything we are interested on having, the heuristics vary by client but I think it will be more eficient to instead download the BLOM and match the queue and then depending on the filter send the appropriate GFIs thus reducing the total load. What do you think?

Re: Clients doing a blom type get

Posted: 26 Mar 2013, 16:57
by maksis
klondike wrote:In order to prevent the latencies caused by the sending of BLOM filters the hub could send a CTM request to the client to get it to open a second connection and then get the filter in the background. I really couldn't see the standard saying clients can request BLOM filters and this can also be useful for a second thing.
This sounds like a complicated workaround which wouldn't fix the actual problem: the size of a bloom filter may be way too big.

I've spoken about this before, but there was a hub where the average size of a bloom filter was more than 6 MB per user. Imagine that an user like this is in 10 hubs, so you can count the amount of data that needs to be transferred each time when the share is updated or the user adds new files in queue (for partial sharing). I know, this is a rather extreme example as there were less than 40 users sharing more than 1 PB in that hub, but it won't change the problem. This is the same reason why AirDC tries to avoid transferring full file lists (which is a lot more worse than a bloom filter in regards of the size... oh, I could write a whole post about this :P).

One way that might make sense would be that the client sends the full filter only when it joins the hub, and after that it only adds new items (or partial filters) in it. This could be done every time when the user adds something new in queue or when new items are added in share. The client can also compare the new TTH set to an old one when the share has been refreshed and get the difference to send. I haven't looked into the bloom filter at all so I don't know whether it's is possible to make it work like that or not.

Re: Clients doing a blom type get

Posted: 26 Mar 2013, 17:17
by poy
on bloom tables being too big, i have written simple solutions on <http://www.dcbase.org/forums/viewtopic.php?f=18&t=798> that would be worth evaluating before trying too hard (alternate conn / partial update / whathaveyou).
klondike wrote:Currently some clients download a user's Filelist in order to match their queues against them and see if they share anything we are interested on having, the heuristics vary by client but I think it will be more eficient to instead download the BLOM and match the queue and then depending on the filter send the appropriate GFIs thus reducing the total load. What do you think?
sounds interesting... i wonder how many files would have to be matched before the bloom+GFI solution costs more than a regular file list download (keep in mind file lists are usually compressed)?

Re: Clients doing a blom type get

Posted: 26 Mar 2013, 18:23
by klondike
maksis wrote:
klondike wrote:In order to prevent the latencies caused by the sending of BLOM filters the hub could send a CTM request to the client to get it to open a second connection and then get the filter in the background. I really couldn't see the standard saying clients can request BLOM filters and this can also be useful for a second thing.
This sounds like a complicated workaround which wouldn't fix the actual problem: the size of a bloom filter may be way too big.
No, on the lan parties where i set these up the link speed is at least 100 Mbit and the other users aren't experiencing big latencies. Even if that were the case, by establishing a new connection we change from a state in which the client keeps queuing commands since he is sending the filter to a state where the client can send commands which would fairly compete with the filter connection and would thus experience only a bit of latency (and that's supposing the client doesn't set ToS).
maksis wrote:I've spoken about this before, but there was a hub where the average size of a bloom filter was more than 6 MB per user. Imagine that an user like this is in 10 hubs, so you can count the amount of data that needs to be transferred each time when the share is updated or the user adds new files in queue (for partial sharing). I know, this is a rather extreme example as there were less than 40 users sharing more than 1 PB in that hub, but it won't change the problem. This is the same reason why AirDC tries to avoid transferring full file lists (which is a lot more worse than a bloom filter in regards of the size... oh, I could write a whole post about this :P).
Usually we speak of user in a single hub, even if they were in more than one the internet uplink would be the limiter. Also in lan parties sending a whole file list isn't a big problem :P
maksis wrote:One way that might make sense would be that the client sends the full filter only when it joins the hub, and after that it only adds new items (or partial filters) in it. This could be done every time when the user adds something new in queue or when new items are added in share. The client can also compare the new TTH set to an old one when the share has been refreshed and get the difference to send. I haven't looked into the bloom filter at all so I don't know whether it's is possible to make it work like that or not.
Yes and no, if you keep the filter parameters constant then it is possible to just send the differences with the old one (i.e. the places were bits have been set to one). Anyway the hub should still have the power to request a new filter from the client since the new shares may require it to change the parameters to make the filter effective again.
poy wrote:on bloom tables being too big, i have written simple solutions on <http://www.dcbase.org/forums/viewtopic.php?f=18&t=798> that would be worth evaluating before trying too hard (alternate conn / partial update / whathaveyou).
Little you can do if the client is traffic limitting the filter sending since you will either end up with either an useless filter if the client has to many files or with old invalid filterchunks you'd have to renew otherwise. Because yes, I have seen clients taking 15 minutes to send the filter over a gigabit link because either of them was limiting the traffic (the captures hinted it was the client).
poy wrote:
klondike wrote:Currently some clients download a user's Filelist in order to match their queues against them and see if they share anything we are interested on having, the heuristics vary by client but I think it will be more eficient to instead download the BLOM and match the queue and then depending on the filter send the appropriate GFIs thus reducing the total load. What do you think?
sounds interesting... i wonder how many files would have to be matched before the bloom+GFI solution costs more than a regular file list download (keep in mind file lists are usually compressed)?
Depends on the size of the compressed filelist and the m parameter you use. Normally since hashes are incomprensible it should be faster though.

Re: Clients doing a blom type get

Posted: 28 Aug 2013, 21:34
by maksis
poy wrote:on bloom tables being too big, i have written simple solutions on <http://www.dcbase.org/forums/viewtopic.php?f=18&t=798> that would be worth evaluating before trying too hard (alternate conn / partial update / whathaveyou).
If the size of the bloom filter would be 50% of the current one used by ADCH++, the amount of false positive would be increased from 0,4% to ~10%. If the filter size would be 25% of the original one, the amount of false positives is almost 60% already. If 10% can be considered as an acceptable level of false positives, how much would the split filter help?

Some calculations on partial filters: increasing the number of shared files by 10% would mean that the percentage of false positives increases from 0,4% to 0,7%. An increase of 25% would mean 1,3% of false positives, and respectively 50% is 3% and doubling the share means 10% of false positives. How many users do really double their shared files during a single session? Sure it's more likely to happen if the original share is really small (<1000 files?), but neither the filter size should be a problem then. Possibly the hub could have a minimum size for a filter so it could request a bit larger filters for such users... or even a completely new filter when the share grows that fast. I have no idea about the users in LAN hubs.

klondike wrote: Usually we speak of user in a single hub, even if they were in more than one the internet uplink would be the limiter. Also in lan parties sending a whole file list isn't a big problem :P
Out of curiosity: how many users there are in those hubs, what's the internet connection speed (the LAN side and the hub) and how much bandwidth is the hub using?

All ADC hubs meant for sharing I'm staying in are hosted on 10 Mbit connection by minimum (12 hubs ranging from 30 to 1000 users, total of ~5000 users). I also did a really quick analyze on the commands broadcasted in the hubs I'm in with bloom filter disabled from my own client. TTH searches accounted less than 10% of the total amount of the command count (BINF accounted more than 80% of the commands). When calculating from the total bytes received, the percentage of TTH searches was 16,5%. I'm not sure how much difference does it make, but more than 90% of the users in those hubs are using AirDC 2.30 or newer, which should significantly reduce the number of TTH searches (I haven't collected any statistics about this though).