Segmented downloading : Why the big deal?

iceman50
Junior Member
Posts: 26
Joined: 10 Jun 2010, 15:10

Segmented downloading : Why the big deal?

Post by iceman50 » 30 Dec 2010, 17:07

Well, it has come to my attention (and has been for quite some time) that people dislike segmented downloading for some of the most ridiculous reasons, so I thought I would write this up to debunk some of those silly problems (well, the two main ones I have seen most people complain about).

#1) "Segmented downloading kills hard drives"

OK, so first off: if you use any kind of torrent program (uTorrent, Transmission, etc.) and you are complaining about segmented downloading in DC++ (or any other DC client), you really can't expect to be taken seriously. If you were to monitor the I/O usage of your torrent app of choice and compare it to the I/O usage of your DC app, the torrent app is more than likely doing a loooot more I/O transfers (obviously, like anything, it depends on how many files you are downloading in each app), so more than likely the torrent app is working your hard drive a lot harder.

Secondly, hard drives today (ca. 2010/2011) have MTBFs of over 1,000,000 hours (which comes to around 113 years or so), so that is kind of a bad argument against segmented downloading.

Also, consider logging for a moment (yes, I know you are probably wondering why I would mention logging, but hold on). For everyone who logs main chats and PMs or anything else in your DC app of choice: every time you receive a message (main chat or private message), your client writes it to a log file, and that write hits the hard disk immediately (although, if I recall correctly, some clients *may* hold the messages in a buffer and write them every X minutes, which is stupid in and of itself, but I digress). That means a whole lot more writes to the hard disk than, say, segmented downloads would cause. (I hope you see where I'm going with this.)

#2) "Segmented takes more slots and leaves less for other users"

Well, consider this: the majority of internet connections are asymmetrical (download speed is faster than upload speed), so in reality downloading from one user at a time is likely to take MORE time (and thus hold up other users waiting to download the file in the end). If you have an 8 Mbit download, does it make more sense to download from one person with a 1 Mbit upload, or from 8 users with 1 Mbit uploads each? I don't think anyone would opt for the slower download in the end, so what's the complaint? I don't see any major valid reason not to use segmented downloading.

Well that's that on this subject, but I do encourage (and greatly appreciate) any feedback or discussion on this topic.

Quicksilver
Member
Posts: 56
Joined: 17 Aug 2009, 21:32

Re: Segmented downloading : Why the big deal?

Post by Quicksilver » 30 Dec 2010, 19:18

I won't comment much on #2... I would just ask this instead: does anyone here think BitTorrent is slow? And BitTorrent is all about downloading from as many sources as possible.


As for #1:
This was already discussed in the dev hub. I gave a possible explanation that fits, and nobody has been able to debunk it yet.
The idea is that the implementation of DC++ fights against the caching algorithm of the OS, e.g.:
1) A 1 MiB segment is uploaded.
2) The OS thinks more will be read and caches the next 20 (or x) MiB.
3) DC++ closes the file, which the OS takes as a hint that the file will no longer be needed, so it discards the cached data.
4) Go back to 1).
In the end the HDD has to put up with 20/x times more strain than with a normal upload.

Solution: 1. change the implementation of DC++ so it does not close the file, or
2. enlarge the segment size... (a rough sketch of both upload patterns is below)
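
To make the hypothesis concrete, here is a minimal C++ sketch of the two upload patterns (file names and sizes are invented for illustration; this is not DC++'s actual code). The first loop reopens and closes the file for every segment, which is the pattern blamed above for defeating the OS read-ahead; the second keeps one handle open for the whole transfer.

    #include <fstream>
    #include <string>
    #include <vector>

    const std::size_t kSegment = 1 << 20;   // 1 MiB segments, as in the steps above

    // The pattern the hypothesis blames: reopen and close the file around every segment.
    void uploadPerSegment(const std::string& path, std::size_t segments) {
        std::vector<char> buf(kSegment);
        for (std::size_t i = 0; i < segments; ++i) {
            std::ifstream in(path, std::ios::binary);           // opened for each segment
            in.seekg(static_cast<std::streamoff>(i * kSegment));
            in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
            // ... send buf over the connection ...
        }                                                        // closed here, every iteration
    }

    // Solution 1: keep one handle open across all segments.
    void uploadKeepOpen(const std::string& path, std::size_t segments) {
        std::vector<char> buf(kSegment);
        std::ifstream in(path, std::ios::binary);                // opened once
        for (std::size_t i = 0; i < segments; ++i) {
            in.seekg(static_cast<std::streamoff>(i * kSegment));
            in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
            // ... send buf over the connection ...
        }
    }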

MTBF hours are really not worth much to us... I would go by experience.
In a 30-40 user hub I hear about 3-4 dead HDDs per year.
That amounts to, let's say, 5-10 years of lifetime under filesharing conditions...
Probably less, as some might replace HDDs earlier...
If 10 times more strain were put on the HDD, that should be pretty noticeable... though I doubt that 10 times more reads/writes leads to hardware failure 10 times sooner.

Big Muscle
Junior Member
Posts: 39
Joined: 01 Jul 2008, 19:27

Re: Segmented downloading : Why the big deal?

Post by Big Muscle » 30 Dec 2010, 21:25

DC++ always uses the FILE_FLAG_SEQUENTIAL_SCAN flag on Windows to tell the system that the file will be processed sequentially. As a second option, Windows supports the FILE_FLAG_RANDOM_ACCESS flag to hint at random access. The problem is that no other OS supports it.

Or there is the possibility of disabling caching completely with FILE_FLAG_NO_BUFFERING and handling it our own way (I'm still not sure about OSes other than Windows). Or could memory-mapped files help with this?

The solution of not closing the file runs into the same problem as logging finished uploads: when is the file really finished, so that it can be closed? Or should it be left open for the whole session? That's not correct behaviour, because it locks the file completely.
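
For reference, a minimal Win32 sketch of the flags being discussed (just an illustration, not DC++'s actual file code; "upload.dat" is a made-up name). The only thing that changes between the three variants is the dwFlagsAndAttributes argument to CreateFile.

    #include <windows.h>

    void openVariants() {
        // Hint that the file will be read front to back (what DC++ uses today).
        HANDLE hSeq = CreateFileW(L"upload.dat", GENERIC_READ, FILE_SHARE_READ, NULL,
                                  OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, NULL);

        // Hint that access will jump around, so aggressive read-ahead is pointless.
        HANDLE hRnd = CreateFileW(L"upload.dat", GENERIC_READ, FILE_SHARE_READ, NULL,
                                  OPEN_EXISTING, FILE_FLAG_RANDOM_ACCESS, NULL);

        // Bypass the system cache entirely; reads must then be done at sector-aligned
        // offsets and in multiples of the volume sector size.
        HANDLE hRaw = CreateFileW(L"upload.dat", GENERIC_READ, FILE_SHARE_READ, NULL,
                                  OPEN_EXISTING, FILE_FLAG_NO_BUFFERING, NULL);

        CloseHandle(hSeq);
        CloseHandle(hRnd);
        CloseHandle(hRaw);
    }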

Big Muscle
Junior Member
Posts: 39
Joined: 01 Jul 2008, 19:27

Re: Segmented downloading : Why the big deal?

Post by Big Muscle » 30 Dec 2010, 21:38

It reminds me that RevConnect used memory-mapped files for segmented downloading. Does that make any sense?
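
For context, writing a received segment through the Win32 memory-mapping API might look roughly like this (a hedged sketch of the general technique, not how RevConnect actually implemented it; the function name and parameters are invented):

    #include <windows.h>
    #include <cstring>

    // Write one received segment into the target file through a memory mapping.
    // Assumes the file was preallocated to its full size, and that 'offset' is a
    // multiple of the system allocation granularity (64 KiB on typical systems).
    void writeSegmentMapped(const wchar_t* path, ULONGLONG offset,
                            const char* data, DWORD len) {
        HANDLE hFile = CreateFileW(path, GENERIC_READ | GENERIC_WRITE, 0, NULL,
                                   OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
        HANDLE hMap  = CreateFileMappingW(hFile, NULL, PAGE_READWRITE, 0, 0, NULL);

        char* view = static_cast<char*>(MapViewOfFile(hMap, FILE_MAP_WRITE,
                         (DWORD)(offset >> 32), (DWORD)(offset & 0xFFFFFFFF), len));
        std::memcpy(view, data, len);        // the OS writes the pages back lazily
        UnmapViewOfFile(view);

        CloseHandle(hMap);
        CloseHandle(hFile);
    }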

arnetheduck
Newbie
Posts: 8
Joined: 17 Mar 2009, 13:37

Re: Segmented downloading : Why the big deal?

Post by arnetheduck » 30 Dec 2010, 21:53

QS: could you please back up your claim that the file is removed from the cache when it is closed? I see no reason why the OS should do this, modulo memory constraints, which usually don't come into play... For example, create a large file and an app that reads it, closes it, then rereads it, and post your timings...
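
A minimal sketch of the kind of test being suggested (plain C++; the file name is a placeholder, and a file that is large but still fits comfortably in RAM makes the effect easiest to see): time a full read, close the file, time a second read, and compare. If the second pass is much faster, the data survived the close in the OS cache.

    #include <chrono>
    #include <cstdio>
    #include <vector>

    // Read the whole file once and return the elapsed seconds.
    static double timedRead(const char* path) {
        auto start = std::chrono::steady_clock::now();
        std::FILE* f = std::fopen(path, "rb");
        if (!f) return -1.0;
        std::vector<char> buf(1 << 20);                 // read in 1 MiB chunks
        while (std::fread(buf.data(), 1, buf.size(), f) == buf.size()) { }
        std::fclose(f);                                 // file is closed between the two passes
        return std::chrono::duration<double>(std::chrono::steady_clock::now() - start).count();
    }

    int main() {
        const char* path = "bigfile.bin";               // hypothetical large test file
        double first  = timedRead(path);                // likely comes from disk
        double second = timedRead(path);                // comes from cache, if the close didn't evict it
        std::printf("first: %.2fs  second: %.2fs\n", first, second);
    }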

cologic
Junior Member
Posts: 41
Joined: 21 Jul 2009, 19:34

Re: Segmented downloading : Why the big deal?

Post by cologic » 30 Dec 2010, 22:31

Quicksilver wrote:MTBF hours are really not worth much to us... I would go by experience.
In a 30-40 user hub I hear about 3-4 dead HDDs per year.
That amounts to, let's say, 5-10 years of lifetime under filesharing conditions...
Probably less, as some might replace HDDs earlier...
Post hoc ergo propter hoc? Really?

Flow84
Newbie
Posts: 6
Joined: 18 Oct 2008, 11:05

Re: Segmented downloading : Why the big deal?

Post by Flow84 » 30 Dec 2010, 23:06

Quicksilver wrote:Solution: 1. change the implementation of DC++ so it does not close the file, or
2. enlarge the segment size...
I have tested solution one and a dynamic segment size in FlowLib (thanks Hackward for giving me some pointers).
Solution one gave me a huge performance improvement :)

What I do (I don't know if this solution is checked into SVN) is to have a global file handler.
When a user-to-user connection starts, I set the segment size to 1 MiB.
When I receive data to save, I call Write on the global file handler.
If the file is not already open, I open it and add an object (including the file handle and a last-used timestamp) to a list.
Then I lock the specific section of the file I want to write to and write that part.
Then I update the last-used timestamp.
The global file handler has a thread that tries to close unused files (not used for X seconds).
I also have a trigger on file completion (yes, I know when I have all the content of a file) which forces the file handle closed once the file is complete. A rough sketch of this idea follows below.

About the segment size: I have a function that is called after every successful segment.
This function works out whether I could download more if the segment size were bigger (more or less by looking at the time it took to download X bytes).

FlowLib uses non-sequential writes for files and should work on all platforms supporting it (it is part of .NET, so it might work in Mono :))
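
A rough C++ sketch of the global file handler idea described above (FlowLib itself is .NET; the class and names here are invented, and the per-section locking is simplified to a single pool-wide mutex): handles stay open between writes, get stamped on every use, and a periodic sweep closes anything idle for too long.

    #include <chrono>
    #include <cstdint>
    #include <fstream>
    #include <map>
    #include <mutex>
    #include <string>

    class FileHandlePool {
        struct Entry {
            std::fstream stream;
            std::chrono::steady_clock::time_point lastUsed;
        };
        std::map<std::string, Entry> open_;
        std::mutex mtx_;

    public:
        // Write one segment, reusing an already-open handle when possible.
        // Assumes the target file was preallocated when the download was queued.
        void write(const std::string& path, std::uint64_t offset,
                   const char* data, std::size_t len) {
            std::lock_guard<std::mutex> lock(mtx_);
            auto& e = open_[path];
            if (!e.stream.is_open())
                e.stream.open(path, std::ios::in | std::ios::out | std::ios::binary);
            e.stream.seekp(static_cast<std::streamoff>(offset));
            e.stream.write(data, static_cast<std::streamsize>(len));
            e.lastUsed = std::chrono::steady_clock::now();
        }

        // Called periodically from a background thread: close idle handles.
        void closeIdle(std::chrono::seconds maxIdle) {
            std::lock_guard<std::mutex> lock(mtx_);
            auto now = std::chrono::steady_clock::now();
            for (auto it = open_.begin(); it != open_.end(); ) {
                if (now - it->second.lastUsed > maxIdle)
                    it = open_.erase(it);        // fstream destructor closes the handle
                else
                    ++it;
            }
        }
    };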

Quicksilver
Member
Posts: 56
Joined: 17 Aug 2009, 21:32

Re: Segmented downloading : Why the big deal?

Post by Quicksilver » 31 Dec 2010, 15:21

@arnetheduck and cologic

This is a hypothesis I put forward. As long as nobody comes up with a tool that records the actual reads on the HDD (or some documentation of the caching algorithm Windows uses), I doubt we will get much further here.
It is really pure guesswork to provide a plausible hypothesis for the seemingly increased HDD failures. The primary point is: stay open-minded about the possibility that the reports of higher HDD failure rates are real and not just users' imagination.

andyhhp
Junior Member
Posts: 30
Joined: 18 Feb 2010, 17:44
Location: England

Re: Segmented downloading : Why the big deal?

Post by andyhhp » 01 Jan 2011, 16:17

Open/close operations are very expensive in terms of operating system time. They require setup/teardown of tables in the kernel, as well as ensuring that write buffers are flushed (including any journaling). From that point of view, any code which avoids needlessly opening and closing files will see a performance increase, irrespective of any other contributing factors.

(@iceman50: whether or not an application does its own write buffering, the operating system certainly will. It is not a stupid idea at all. It follows exactly the same logic as an L1/L2 cache, combining many small seek/write/seek-back operations into a single seek/write-sector/seek-back.) I have lost the link to the article, but when this was introduced to Linux, it resulted in 40% less HD activity under 'average' load.

As for hard drive failures, don't fall into the trap of assuming that more reports of failure imply more failures. (In the past, before the days of SMART etc., you wouldn't know about hard drive failures until a failure hit a key file, at which point the chances were that your computer wouldn't boot. Then you just blamed the computer and got a new one, without identifying the underlying cause.) On the other hand, the consumer market these days is constantly trying to sell products made more cheaply and with cheaper materials, which itself can have negative effects on longevity.

My personal opinion is that there are far worse things that happen to disks than segmented downloading.
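
To illustrate the buffering point (a hypothetical sketch, not DC++'s actual logging code): messages accumulate in memory and only hit the disk once enough has built up, turning many tiny writes into one larger one.

    #include <cstddef>
    #include <fstream>
    #include <string>

    class BufferedLog {
        std::ofstream out_;
        std::string buf_;
        static constexpr std::size_t kFlushAt = 64 * 1024;  // flush after ~64 KiB accumulated

    public:
        explicit BufferedLog(const std::string& path) : out_(path, std::ios::app) {}

        void write(const std::string& line) {
            buf_ += line;
            buf_ += '\n';
            if (buf_.size() >= kFlushAt)    // a periodic timer could also call flush()
                flush();
        }

        void flush() {                      // one large write instead of many small ones
            out_ << buf_;
            out_.flush();
            buf_.clear();
        }

        ~BufferedLog() { flush(); }
    };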

~Andrew

iceman50
Junior Member
Posts: 26
Joined: 10 Jun 2010, 15:10

Re: Segmented downloading : Why the big deal?

Post by iceman50 » 01 Jan 2011, 19:54

@andyhhp: I agree 100%, and as cologic implied, users will blame it on the first thing they can. I.e., a hard drive fails while up/downloading with segmented downloading, so they default to blaming that and never dig deeper to find the actual cause of the failure (and this is going on the assumption, a strong one I might add, that it isn't the segmented downloading causing it). And as human nature goes, it spreads like wildfire... one user tells another that segmented downloading is evil and causes drive failure, which leads one user to tell ten, and so on and so forth. Thankfully we are getting a lot of quality posts on this subject to show that maybe, just maybe, segmented downloading isn't such a horribly awful thing. =)
