I am using C# and the .NET Framework, which defines a number of character classes that correspond to Unicode general categories (http://msdn.microsoft.com/en-us/library ... egory.aspx and http://www.unicode.org/Public/UNIDATA/U ... ory_Values).
The category 'Cc' in the Unicode specification covers the 'control' characters: U+0000-U+001F (C0 controls, http://www.unicode.org/charts/PDF/U0000.pdf), U+007F (DELETE), and U+0080-U+009F (C1 controls, http://www.unicode.org/charts/PDF/U0080.pdf). The C0 and C1 sets originate in ISO/IEC 6429.
In these terms, the ADC Protocol currently restricts only the C0 controls, plus the space character U+0020. Unicode also defines a Separator category 'Z', which comprises the Line (Zl), Paragraph (Zp) and Space (Zs) subcategories. I have had trouble finding exact definitions of these on the official Unicode site, but they appear to be listed here: http://www.fileformat.info/info/unicode ... /index.htm
I propose that, for the purposes of nicknames and descriptions, the ADC Protocol exclude not just the C0 controls and U+0020 but all controls in the Cc category and all separator characters in the Z category.
I include here a C# function containing a regular expression which verifies text against this restriction:
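private static bool ContainsUnicodeControlOrSeparator(string text)
{
    // True if any character is in Unicode category Cc (control) or Z (separator).
    // Searching for a forbidden character directly avoids the anchored form's pitfalls:
    // '$' also matches before a trailing "\n", and "+" fails on the empty string.
    return System.Text.RegularExpressions.Regex.IsMatch(text, @"[\p{Cc}\p{Z}]");
}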
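For anyone who would rather not use a regular expression, the same restriction can probably be checked with char.GetUnicodeCategory and the UnicodeCategory enumeration. This is only a minimal sketch: the class name NickCheck and the method name HasControlOrSeparator are my own hypothetical names, not part of ADC or the Framework. It walks the string one UTF-16 code unit at a time, which I believe is sufficient here because the Cc and Z categories have no members outside the Basic Multilingual Plane.

```csharp
using System;
using System.Globalization;

public static class NickCheck
{
    // Hypothetical helper: returns true if text contains any character whose
    // Unicode general category is Cc (control) or Z (Zs, Zl, Zp separators).
    public static bool HasControlOrSeparator(string text)
    {
        foreach (char c in text)
        {
            switch (char.GetUnicodeCategory(c))
            {
                case UnicodeCategory.Control:            // Cc
                case UnicodeCategory.SpaceSeparator:     // Zs
                case UnicodeCategory.LineSeparator:      // Zl
                case UnicodeCategory.ParagraphSeparator: // Zp
                    return true;
            }
        }
        return false;
    }

    public static void Main()
    {
        Console.WriteLine(HasControlOrSeparator("nick"));         // False
        Console.WriteLine(HasControlOrSeparator("nick\n"));       // True: U+000A is a C0 control
        Console.WriteLine(HasControlOrSeparator("my\u00A0nick")); // True: U+00A0 NO-BREAK SPACE is Zs
        Console.WriteLine(HasControlOrSeparator("nick\u0085"));   // True: U+0085 is a C1 control
    }
}
```

The second and third cases are the ones that matter for the proposal: a no-break space or a C1 control passes a check that only looks for C0 controls and U+0020.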