DCBase

Posted: **30 Jul 2009, 00:45**

I've been looking at the latest protocol spec to tighten the conformance of Netfraction. When I looked at the NI and DE fields of the INF message and saw that valid text is "all characters in the Unicode character set with code point equal to or greater than 32 [U+0020]", I went about crafting a regular expression to match this.

I am using C# and the .NET Framework, which defines a number of character classes that correspond to the Unicode general categories (http://msdn.microsoft.com/en-us/library ... egory.aspx and http://www.unicode.org/Public/UNIDATA/U ... ory_Values).

The category 'Cc' in the Unicode specification captures the 'control' characters: U+0000-U+001F (C0 controls, http://www.unicode.org/charts/PDF/U0000.pdf), U+007F (DEL) and U+0080-U+009F (C1 controls, http://www.unicode.org/charts/PDF/U0080.pdf). The C0 and C1 sets originate in ISO/IEC 6429.

Thus, the ADC Protocol currently restricts the use of C0 controls, plus the space character U+0020. Unicode also defines a Separator category 'Z' which includes Line (Zl), Paragraph (Zp) and Space (Zs) components. I'm having trouble finding the exact definitions of these on the official Unicode site, but they appear to be listed here: http://www.fileformat.info/info/unicode ... /index.htm
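To make the categories above a bit more concrete, here is a minimal sketch that asks the Framework which general category a character belongs to. The class and method names (`CategoryDemo`, `CategoryOf`) are made up for illustration; only `CharUnicodeInfo.GetUnicodeCategory` is a real Framework call.

```csharp
using System;
using System.Globalization;

public static class CategoryDemo
{
    // Hypothetical helper: returns the .NET name of the Unicode general
    // category of a single UTF-16 code unit, e.g. "Control" for Cc,
    // "SpaceSeparator" for Zs, "LineSeparator" for Zl,
    // "ParagraphSeparator" for Zp.
    public static string CategoryOf(char c)
    {
        return CharUnicodeInfo.GetUnicodeCategory(c).ToString();
    }
}
```

For example, U+0009 and U+0085 come back as Control, U+0020, U+00A0 and U+3000 as SpaceSeparator, U+2028 as LineSeparator and U+2029 as ParagraphSeparator. Only the first of those is below code point 32.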

I propose that the ADC Protocol exclude, for the purposes of nicknames and descriptions, not just C0 controls and U+0020 but all controls in the Cc category and all separator characters in the Z category.

I include here a C# function containing a regular expression which verifies text against this restriction:

Code:

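private static bool ContainsUnicodeControlOrSeparator(string text)
{
    // True if any character is in Unicode category Cc (control) or Z (separator).
    // Searching for one offending character avoids '$' matching before a trailing newline.
    return System.Text.RegularExpressions.Regex.IsMatch(text, @"[\p{Cc}\p{Z}]");
}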
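To show where the two rules differ, here is a rough side-by-side sketch. It assumes the quoted spec text is read literally (every code unit must be U+0020 or above); if the space itself is meant to be excluded, the comparison would need to be `<=` instead. The class and method names (`TextRules`, `PassesCurrentRule`, `PassesProposedRule`) are invented for this example and are not part of Netfraction or the spec.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TextRules
{
    // Assumed reading of the current rule: reject anything below U+0020.
    public static bool PassesCurrentRule(string text)
    {
        foreach (char c in text)
        {
            if (c < '\u0020') return false;
        }
        return true;
    }

    // Proposed rule: reject any character in category Cc or Z.
    public static bool PassesProposedRule(string text)
    {
        return !Regex.IsMatch(text, @"[\p{Cc}\p{Z}]");
    }
}
```

A nickname containing U+00A0 (no-break space), U+3000 (ideographic space) or U+0085 (a C1 control) passes the first check but fails the second, which is the gap the proposal is meant to close.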

Also, I believe the Protocol should specify a minimum Nickname length of at least one character.

Posted: **31 Jul 2009, 11:13**

darkKlor wrote: Also, I believe the Protocol should specify a minimum Nickname length of at least one character.

Why? That's a hub rule.

(Edit: BTW, NI already states "although hubs may limit this further as they like with an appropriate error message." [not DE, though, apparently].)

Posted: **31 Jul 2009, 11:38**

Maybe it's just me, but a blank nickname seems silly.

Personally, I'm not opposed to allowing spaces in nicknames, but I think if you're going to do it, you might as well do it properly and therefore all spaces should be ignored, and the same goes for the control characters. By just disallowing the first 32 codepoints the spec is a little bit Euro-centric, I think.

Posted: **02 Aug 2009, 10:26**

darkKlor wrote: Maybe it's just me, but a blank nickname seems silly.

A blank NI parameter means "I don't have a nick". That is, it's a hub requirement that you should have a nick. It's as if we required (per the protocol) that DE should be present. (It's just that nick has grown to mean something significant.)

Posted: **02 Aug 2009, 10:45**

Mmm. Anyway, all this blank nickname chatter is detracting from the main point of my original post.

Posted: **03 Aug 2009, 07:12**

Pretorian wrote:
darkKlor wrote: Maybe it's just me, but a blank nickname seems silly.
A blank NI parameter means "I don't have a nick". That is, it's a hub requirement that you should have a nick. It's as if we required (per the protocol) that DE should be present. (It's just that nick has grown to mean something significant.)

I totally agree with Pret. A nick is just something to make addressing people easier. Nick and SID (CID) can be compared with a name and a social security number. Perhaps a client could allow users not to fill in any nick, so that they can share files more anonymously (do you have a nick in BitTorrent networks?).
And it's up to the hub to restrict whatever the hub owner wants to.

Unicode Text Limits
