Metadata extension

Pretorian
Site Admin
Posts: 214
Joined: 21 Jul 2009, 10:21

Post by Pretorian » 19 Jul 2011, 19:48

Support for supplying metadata has long been discussed, but no concrete proposal seems to have come out of those talks. Here is my suggestion for metadata support in ADC. It is followed by the discussion cologic and I had about it, so this post is basically a summary of that discussion.
Metadata allows users to get more information about files and directories: when they were created, a video's length, a document's author and more. This extension intends to add all of that and hopefully remain compatible with future metadata that may be of interest.

All metadata should be sent in the parameter 'MD', which contains a metadata name and value. Name and value are separated by a '/'. If a name contains a '/', it should be escaped accordingly. The column names presented in Windows Explorer (see <link> [note: I can't find an actual list available online, so if you do find one, post below]) should be the default that implementations send, but clients may extend the names at their own peril. Implementations should be able to handle unknown names but may ignore them. Implementations may decide when metadata should be sent.

Example
====
MDbit%srate/256 MDauthor/john%sdoe MDdate/2011-01-01
====
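A minimal sketch of how a receiver might collect MD parameters, assuming the message has already been split into parameters and unescaped, and assuming names never contain a '/'. The function name parse_md_params and the sample TO parameter are illustrative only and not part of the proposal.

```python
def parse_md_params(params):
    """Collect MD parameters into a name -> value dict.

    `params` is a list of parameters that were already split on spaces
    and unescaped by the ADC layer (assumption of this sketch).
    """
    metadata = {}
    for param in params:
        if not param.startswith("MD"):
            continue  # some other named parameter, not metadata
        # Split on the first '/' only, so values may still contain '/'.
        name, sep, value = param[2:].partition("/")
        if not sep or not name:
            continue  # malformed MD parameter: ignore it
        metadata[name] = value
    return metadata


if __name__ == "__main__":
    sample = ["MDbit rate/256", "MDauthor/john doe", "MDdate/2011-01-01", "TOabc"]
    print(parse_md_params(sample))
```

Splitting on the first '/' only keeps the parsing trivial, at the cost of forbidding '/' in names.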
Discussion
[2011-06-06 23:16:13] <Pretorian> Also; presumably people would want bitrate etc for audio files. Are there some general meta-data that could be added that would apply to documents, audio files, video files etc.?
[2011-06-06 23:17:13] <cologic> The meta-data thing has come up at least by 2005, probably earlier (when there was a DC++ forum at SF, e.g.)
[2011-06-06 23:17:25] <cologic> The lack of such universal metadata was one of the main problems.
[2011-06-06 23:17:27] <Pretorian> Yeah, well, with ADC, it can actually be done.
[2011-06-06 23:17:44] <Pretorian> (Sending meta-data at all.)
[2011-06-06 23:17:52] <cologic> Indeed, that's a new (hah) development.
[2011-06-06 23:19:04] <Pirre> would be intresting in a few cases
[2011-06-06 23:19:05] <Pretorian> One could handle metadata in one way that is effective, but ugly; "MDname/value". Every 'name' become a column in the search frame etc.
[2011-06-06 23:19:26] <Pretorian> So MDbitrate/256 MDauthor/johndoe
[2011-06-06 23:19:32] <cologic> And dynamically create search frame columns?
[2011-06-06 23:19:34] <Pretorian> Right
[2011-06-06 23:19:53] <cologic> How does that handle different RES's with different MD names?
[2011-06-06 23:19:54] <Pretorian> No need to specify a huge list of things to support.
[2011-06-06 23:20:05] <Pretorian> It wouldn't.
[2011-06-06 23:20:24] <cologic> Just a preset list that it would recognize?
[2011-06-06 23:20:46] <Pretorian> No, rather a preset list that it would send.
[2011-06-06 23:20:58] <cologic> I mean on the receiving/display side
[2011-06-06 23:21:18] <Pretorian> Yes, I would completely ignore any merging etc.
[2011-06-06 23:21:27] <Pretorian> New name, new column.
[2011-06-06 23:21:39] <cologic> Sounds DoSable
[2011-06-06 23:22:08] <Pretorian> Not so more than individual flags for bitrate, author etc.
[2011-06-06 23:23:40] <Pretorian> I suppose supported names could be specified as well.
[2011-06-06 23:24:00] <Pretorian> The idea with using a name would be to avoid individual flags for metadata.
[2011-06-06 23:24:37] <cologic> Right, I get that. Metadata could be useful and your approach seems like a good way to approach it.
[2011-06-06 23:25:38] <Pretorian> (I guess one could use abbreviations etc for names.)
[2011-06-06 23:29:19] <Pirre> will be a nice use of mem size on clients side to store this to
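
The "new name, new column" idea and the DoS concern raised in the chat can be sketched together: create columns on the fly, but stop at a cap. The cap value and the name collect_columns are hypothetical; nothing in the proposal specifies a limit.

```python
MAX_COLUMNS = 16  # hypothetical cap, chosen only for illustration


def collect_columns(results, max_columns=MAX_COLUMNS):
    """Return metadata names in first-seen order, up to max_columns.

    `results` is an iterable of name -> value dicts, one per search result.
    Names that arrive after the cap is reached get no column.
    """
    columns = []
    seen = set()
    for metadata in results:
        for name in metadata:
            if name in seen or len(columns) >= max_columns:
                continue
            seen.add(name)
            columns.append(name)
    return columns


if __name__ == "__main__":
    results = [
        {"bitrate": "256"},
        {"author": "john doe", "bitrate": "128"},
        {"a": "1", "b": "2"},
    ]
    print(collect_columns(results, max_columns=3))
```

A cap bounds the number of columns a hostile peer can force, but it does not address the memory cost of storing the values themselves.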

Ayo
Junior Member
Posts: 27
Joined: 23 Feb 2011, 13:50

Re: Metadata extension

Post by Ayo » 20 Jul 2011, 08:55

I am all for the idea of supporting metadata, just a few notes:

- To which commands will this parameter apply? I would assume at least RES, but would searching on these parameters be allowed with SCH?

- How should a '/' be escaped? '\/'? That will cause any conforming client to not only ignore the MD parameter, but the entire message:
This version of the protocol reserves all other escapes for future use; any message containing unknown escapes must be discarded.
- I don't really like the idea of putting another escaping mechanism within an escaping mechanism, or a protocol-within-a-protocol, depending on how you look at it. I'd prefer to use regular parameters instead, though I can understand how it is less flexible.

- IMO, metadata is useless if you can't reliably and automatically interpret it, so each field will have to be specified and documented anyway. The "date" field, for example, is that creation time, last modification time, access time, some file-type-specific date fetched from the file? What are the allowed formats, anything described in http://www.cl.cam.ac.uk/~mgk25/iso-time.html? What is the minimum granularity, one day? I may be going too far here, but I definitely prefer something that is properly specified over a quick hack that can't be relied on.
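
To illustrate the escaping point: the ADC base protocol defines `\s` (space), `\n` (newline) and `\\` (backslash), and the rule quoted above says anything else must cause the message to be discarded. A rough sketch of an unescape routine that behaves that way; the name adc_unescape is made up for this example.

```python
# The three escapes defined by the ADC base protocol.
_ESCAPES = {"s": " ", "n": "\n", "\\": "\\"}


def adc_unescape(text):
    """Unescape one ADC parameter; raise ValueError on any unknown escape.

    The caller is expected to discard the whole message on ValueError.
    """
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        # A backslash must be followed by one of the known escape characters.
        if i + 1 >= len(text) or text[i + 1] not in _ESCAPES:
            raise ValueError("unknown escape, message must be discarded")
        out.append(_ESCAPES[text[i + 1]])
        i += 2
    return "".join(out)


if __name__ == "__main__":
    print(adc_unescape("john\\sdoe"))
    try:
        adc_unescape("bit\\/rate")
    except ValueError as exc:
        print("discarded:", exc)
```

With this behaviour, a name escaped as '\/' would make every conforming client drop the whole message, not just the MD parameter.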

Pretorian
Site Admin
Posts: 214
Joined: 21 Jul 2009, 10:21

Re: Metadata extension

Post by Pretorian » 20 Jul 2011, 11:06

Ayo wrote:To which commands will this parameter apply? I would assume at least RES, but would searching on these parameters be allowed with SCH?
I purposely left that out, as I'm not entirely sure myself, nor am I sure whether it's something clients should be able to request per se. In fact, I can imagine people only getting metadata when they do GFI.
Ayo wrote: - How should a '/' be escaped? '\/'? That will cause any conforming client to not only ignore the MD parameter, but the entire message:
This version of the protocol reserves all other escapes for future use; any message containing unknown escapes must be discarded.
You're right, and on second thought I'd a) use a backslash as delimiter or b) not allow forward slash in names.
Ayo wrote: - I don't really like the idea of putting another escaping mechanism within an escaping mechanism, or a protocol-within-a-protocol, depending on how you look at it. I'd prefer to use regular parameters instead, though I can understand how it is less flexible.
The problem is, as you say, the flexibility. I'd prefer not to 'waste' parameters (one for author, one for date, one for bitrate etc). I know that we have a fairly large span to use; my preference is simply to confine the parameters used for metadata.
Ayo wrote: - IMO, metadata is useless if you can't reliably and automatically interpret it, so each field will have to be specified and documented anyway. The "date" field, for example, is that creation time, last modification time, access time, some file-type-specific date fetched from the file? What are the allowed formats, anything described in http://www.cl.cam.ac.uk/~mgk25/iso-time.html? What is the minimum granularity, one day? I may be going too far here, but I definitely prefer something that is properly specified over a quick hack that can't be relied on.
I see your concerns and I don't have an immediate response. Do clients want to format data in their own way? (For dates, I can imagine simply sending the number of (milli)seconds since the Unix epoch.) Bit rates for audio files are commonly given in kbit/s; does anyone use e.g. Mbit/s instead? In any case, I'd imagine (which is why I took Explorer as an example) that most of the metadata is simply retrieved from the system. How the system defines each piece of metadata is... well, up to the system. (Do different systems define 'date' differently?)


More people need to chime in, methinks, on a) what metadata people could want, b) what the format of the extension should be (single or multiple parameters) and c) whether a hard specification of each respective metadata field is required.

Ayo
Junior Member
Posts: 27
Joined: 23 Feb 2011, 13:50

Re: Metadata extension

Post by Ayo » 20 Jul 2011, 14:29

Pretorian wrote:
Ayo wrote: - How should a '/' be escaped? '\/'? That will cause any conforming client to not only ignore the MD parameter, but the entire message:
This version of the protocol reserves all other escapes for future use; any message containing unknown escapes must be discarded.
You're right, and on second thought I'd a) use a backslash as delimiter or b) not allow forward slash in names.
Both solutions are fine with me. They're actually pretty much equivalent anyway, option (a) would disallow a backslash in names.
Pretorian wrote:
Ayo wrote: - I don't really like the idea of putting another escaping mechanism within an escaping mechanism, or a protocol-within-a-protocol, depending on how you look at it. I'd prefer to use regular parameters instead, though I can understand how it is less flexible.
The problem is, as you say, the flexibility. I'd prefer not to 'waste' parameters (one for author, one for date, one for bitrate etc). I know that we have a fairly large span to use; my preference is simply to confine the parameters used for metadata.
With the escaping out of the way, the only further parsing required is to simply split off the name from the value. That is too trivial in any programming language to make me complain about it. :-) So I'm fine with the original idea.
Pretorian wrote:I see your concerns and I don't have an immediate response. Do clients want to format data in their own way? (For dates, I can imagine simply sending the number of (milli)seconds since the Unix epoch.) Bit rates for audio files are commonly given in kbit/s; does anyone use e.g. Mbit/s instead? In any case, I'd imagine (which is why I took Explorer as an example) that most of the metadata is simply retrieved from the system. How the system defines each piece of metadata is... well, up to the system. (Do different systems define 'date' differently?)
Not sure what "system" you are referring to, but with the proper libraries a client will be able to retrieve *any* file information. I can also imagine that libraries return bit rates in bit/s rather than kbit/s, but I am indeed not aware of any application that displays them in anything other than kbit/s. Also, "bit rate" is ambiguous: video files have separate bit rates for the audio and video streams. And then there is the problem that these rates are often not even constant, in which case different applications display different rates (overall average vs. first few frames, etc.). But specifying all that may be overkill.
That aside, at least the representation of data should be specified: e.g. the string "256kbit/s" can't be interpreted by a client that only expects a number.
Pretorian wrote:More people need to chime in, methinks, on a) what metadata people could want, b) how the format of the extension should be (single or multiple parameters) and if a hard specification on each respective metadata is required.
a) Actually, I am only interested in the last modification time at this point. Zip extractors, web browsers and many other applications tend to set the last modification date of extracted/downloaded files from the metadata they fetch from the archive/HTTP headers. I'd love to see DC clients do the same. (Yes, I realize last modification dates are not always reliable, but they can be quite useful in certain situations.)

b) If you mean combining everything into a single MD parameter vs. having multiple MD parameters, then I prefer the latter since that allows easier code re-use - no need to define and implement more escaping/unescaping.

c) (hard specification): Preferably, yes. At least for a few common fields. We don't have to be very strict on less common fields, otherwise client developers may get scared away by all the specification needed in order to add new fields. >_>
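
For point (a), a minimal sketch of what a client could do after a download finishes, assuming the date arrives as a decimal string of whole seconds since the Unix epoch (one of the suggestions in this thread, not something that has been agreed on). The helper name apply_remote_mtime is hypothetical.

```python
import os
import time


def apply_remote_mtime(path, md_date):
    """Set the file's last modification time from a received date value.

    `md_date` is assumed to be a decimal string of seconds since the Unix
    epoch. Returns False and leaves the file untouched if it is not.
    """
    try:
        mtime = int(md_date)
    except (TypeError, ValueError):
        return False  # not a plain integer, e.g. "2011-01-01"
    if mtime < 0:
        return False
    # os.utime takes (access time, modification time) in seconds.
    os.utime(path, (time.time(), mtime))
    return True
```

A caller would invoke it once the file is complete, e.g. `apply_remote_mtime(target_path, metadata.get("date"))`, and simply carry on if it returns False.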

pR0Ps
Junior Member
Posts: 29
Joined: 05 Dec 2010, 11:35

Re: Metadata extension

Post by pR0Ps » 25 Jul 2011, 15:36

I think that common metadata types (e.g. date modified, bitrate) should have standard formats that clients are expected to follow. For example, dates are always sent as seconds since the Unix epoch, the bit rate of files (regardless of type) is sent in KiB/s, etc.

Without some kind of standard, values will have to be sent in strings, making it confusing if people use different units, especially KB/s vs. KiB/s.
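
One way to picture such a standard on the receiving side is a small table of known field names with a parser each, while unknown names stay plain strings. This is only a sketch: the field names, the choice of integers, and the units mentioned in the comments are assumptions drawn from suggestions in this thread, not a specification.

```python
def _non_negative_int(text):
    """Parse a non-negative integer; raises ValueError for text like '256kbit/s'."""
    value = int(text)
    if value < 0:
        raise ValueError("negative value")
    return value


# Hypothetical registry of specified fields and their representations.
FIELD_PARSERS = {
    "date": _non_negative_int,     # assumed: seconds since the Unix epoch
    "bitrate": _non_negative_int,  # assumed: a bare number in one agreed unit
}


def decode_metadata(raw):
    """Decode a name -> string dict into typed values.

    Known fields that fail to parse are dropped; unknown names are kept
    as strings so the client can still display them.
    """
    decoded = {}
    for name, value in raw.items():
        parser = FIELD_PARSERS.get(name)
        if parser is None:
            decoded[name] = value
            continue
        try:
            decoded[name] = parser(value)
        except ValueError:
            pass  # value does not follow the agreed representation
    return decoded


if __name__ == "__main__":
    print(decode_metadata({"date": "1293840000", "bitrate": "256kbit/s", "author": "john doe"}))
```

Dropping a malformed known field instead of the whole result keeps one sloppy sender from hiding otherwise usable metadata.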

OCTAGRAM
Junior Member
Posts: 10
Joined: 31 Jul 2011, 07:25
Location: Barnaul, Russia

Re: Metadata extension

Post by OCTAGRAM » 13 Aug 2011, 06:32

GreyLink uses GFI requests and seems to provide metadata in a RES reply.

I think it would be nice to include metadata in a filelist as well (like Shareaza)

Pretorian
Site Admin
Posts: 214
Joined: 21 Jul 2009, 10:21

Re: Metadata extension

Post by Pretorian » 13 Aug 2011, 23:35

Then the question is how GreyLink provides that data...

Pretorian
Site Admin
Posts: 214
Joined: 21 Jul 2009, 10:21

Re: Metadata extension

Post by Pretorian » 14 Aug 2011, 11:39

Please keep any discussion of GreyLink here: <http://www.adcportal.com/forums/viewtop ... f=13&t=767>. This thread is ONLY about the protocol data and should not be cluttered with other stuff. Next time you post irrelevant stuff here and not there, I'll remove your post. (This applies to everyone.)

OCTAGRAM
Junior Member
Posts: 10
Joined: 31 Jul 2011, 07:25
Location: Barnaul, Russia

Re: Metadata extension

Post by OCTAGRAM » 14 Aug 2011, 13:30

OK, then once again, I attach tcpdump of GreyLink's metadata query and response.
Attachment: MediaInfo.bin.zip (113.01 KiB)

Big Muscle
Junior Member
Posts: 39
Joined: 01 Jul 2008, 19:27

Re: Metadata extension

Post by Big Muscle » 14 Aug 2011, 15:33

If anyone is interested in how metadata (esp. movie and music info) can be implemented, they can look at the FlylinkDC++ source, which is public. That way we don't have to deal with license violations and can avoid any illegal client.

http://code.google.com/p/flylinkdc/source/browse/
