Unicode Text Limits

darkKlor
Senior Member
Posts: 100
Joined: 30 Dec 2008, 14:59

Unicode Text Limits

Post by darkKlor » 30 Jul 2009, 00:45

I've been looking at the latest protocol spec to tighten the conformance of Netfraction. When I looked at the NI and DE fields of the INF message and saw that valid text is "all characters in the Unicode character set with code point equal to or greater than 32 [U+0020]", I went about crafting a regular expression to match this.

I'm using C# and the .NET Framework, which defines a number of character classes that correspond to Unicode general categories (http://msdn.microsoft.com/en-us/library ... egory.aspx and http://www.unicode.org/Public/UNIDATA/U ... ory_Values).

The category 'Cc' in the Unicode specification captures the 'control' characters: U+0000-U+001F (C0 controls, http://www.unicode.org/charts/PDF/U0000.pdf), U+007F (DELETE) and U+0080-U+009F (C1 controls, http://www.unicode.org/charts/PDF/U0080.pdf). The C0 and C1 sets originate in ISO/IEC 6429.

Thus, the ADC Protocol currently restricts the use of C0 controls, plus the space character U+0020. Unicode also defines a Separator category 'Z' which includes Line (Zl), Paragraph (Zp) and Space (Zs) components. I'm having trouble finding the exact definitions of these on the official Unicode site, but they appear to be listed here: http://www.fileformat.info/info/unicode ... /index.htm

I propose that the ADC Protocol exclude, for the purposes of nicknames and descriptions, not just C0 controls and U+0020, but all controls in the Cc category and all separator characters in the Z category.

I include here a C# function containing a regular expression which verifies text against this restriction:

Code:

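private static bool ContainsUnicodeControlOrSeparator(string text)
{
    // True if any character is a control (Cc) or a separator (Z: Zs, Zl, Zp).
    // Searching for an offending character avoids the ^...$ anchors: in .NET,
    // $ also matches just before a trailing newline, so "abc\n" would slip through.
    return System.Text.RegularExpressions.Regex.IsMatch(text, @"[\p{Cc}\p{Z}]");
}

Also, I believe the Protocol should specify a minimum nickname length of one character.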
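
For comparison, the same restriction can be checked without a regular expression by asking the Framework for each character's category. This is only a sketch: NickValidator and IsAcceptableNick are hypothetical names, not part of Netfraction or the ADC spec, and the empty-string check assumes the one-character minimum suggested above.

```csharp
using System.Globalization;

public static class NickValidator
{
    // Hypothetical helper: true if text is non-empty and contains no
    // control (Cc) or separator (Zs, Zl, Zp) characters.
    public static bool IsAcceptableNick(string text)
    {
        // Assumption: a minimum length of one character is required.
        if (string.IsNullOrEmpty(text))
            return false;

        foreach (char c in text)
        {
            switch (char.GetUnicodeCategory(c))
            {
                case UnicodeCategory.Control:            // Cc
                case UnicodeCategory.SpaceSeparator:     // Zs
                case UnicodeCategory.LineSeparator:      // Zl
                case UnicodeCategory.ParagraphSeparator: // Zp
                    return false;
            }
        }
        return true;
    }
}
```

All Cc and Z characters live in the Basic Multilingual Plane, so walking the string one UTF-16 code unit at a time should be enough for this particular check.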

Pretorian
Site Admin
Posts: 214
Joined: 21 Jul 2009, 10:21

Re: Unicode Text Limits

Post by Pretorian » 31 Jul 2009, 11:13

darkKlor wrote:Also, I believe the Protocol should specify a minimum Nickname length of at least one character.
Why? That's a hub rule.

(Edit: BTW, NI already states "although hubs may limit this further as they like with an appropriate error message." [not DE, though, apparently].)

darkKlor
Senior Member
Posts: 100
Joined: 30 Dec 2008, 14:59

Re: Unicode Text Limits

Post by darkKlor » 31 Jul 2009, 11:38

Maybe it's just me, but a blank nickname seems silly.

Personally, I'm not opposed to allowing spaces in nicknames, but I think if you're going to do it, you might as well do it properly and therefore all spaces should be ignored, and the same goes for the control characters. By just disallowing the first 32 codepoints the spec is a little bit Euro-centric, I think.
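
To illustrate that point, here is a rough sketch comparing the two rules on individual characters. Both helper names are hypothetical, and PassesCodePointRule is only an assumed reading of the current spec (everything up to and including U+0020 rejected), not a quote from it.

```csharp
using System.Globalization;

public static class SeparatorRules
{
    // Assumed reading of the current rule: only U+0000-U+0020 are rejected.
    public static bool PassesCodePointRule(char c)
    {
        return c > '\u0020';
    }

    // Proposed rule: reject every control (Cc) and separator (Zs, Zl, Zp).
    public static bool PassesCategoryRule(char c)
    {
        UnicodeCategory cat = char.GetUnicodeCategory(c);
        return cat != UnicodeCategory.Control
            && cat != UnicodeCategory.SpaceSeparator
            && cat != UnicodeCategory.LineSeparator
            && cat != UnicodeCategory.ParagraphSeparator;
    }
}
```

Characters such as U+00A0 (NO-BREAK SPACE), U+0085 (NEXT LINE, a C1 control), U+2028 (LINE SEPARATOR) and U+3000 (IDEOGRAPHIC SPACE) pass the first check but fail the second.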

Pretorian
Site Admin
Posts: 214
Joined: 21 Jul 2009, 10:21

Re: Unicode Text Limits

Post by Pretorian » 02 Aug 2009, 10:26

darkKlor wrote:Maybe it's just me, but a blank nickname seems silly.
A blank NI parameter means "I don't have a nick". That is, it's a hub requirement that you should have a nick. It's as if we required (per the protocol) that DE should be present. (It's just that nick has grown to mean something significant.)

darkKlor
Senior Member
Posts: 100
Joined: 30 Dec 2008, 14:59

Re: Unicode Text Limits

Post by darkKlor » 02 Aug 2009, 10:45

Mmm. Anyway, all this blank nickname chatter is detracting from the main point of my original article :P

Pietry
Senior Member
Posts: 328
Joined: 04 Dec 2007, 07:25
Location: Bucharest

Re: Unicode Text Limits

Post by Pietry » 03 Aug 2009, 07:12

Pretorian wrote:
darkKlor wrote:Maybe it's just me, but a blank nickname seems silly.
A blank NI parameter means "I don't have a nick". That is, it's a hub requirement that you should have a nick. It's as if we required (per the protocol) that DE should be present. (It's just that nick has grown to mean something significant.)
I totally agree with Pret. A nick is just something that makes it easier to address someone. Nick and SID (CID) can be compared with name and social security number. Perhaps a client could allow users not to fill in any nick; that way they could use file sharing more anonymously (do you have a nick in BitTorrent networks?).
And it's up to the hub to restrict whatever the hubowner wants to.
Just someone
