Unicode strings

Post by **Big Muscle** » 25 Jan 2011, 11:25

I would like to discuss something that has been on my mind for a long time: I can't understand why it is done this way. Maybe someone can explain it to me.

All strings in the core are stored as single-byte strings (std::string), while all strings in the GUI are handled as wide-character strings (std::wstring), so a conversion (Text::toT/fromT) between the two types must be done for every GUI-core interaction. Why not rewrite the core so that it handles std::wstring natively and no conversion is needed? (The same applies to char<->wchar_t.)

There are a lot of places where the "stupidity" of the conversion can be seen. Take log files, for example: the user enters the log file name in the GUI (std::wstring), and it is passed to SettingsManager, where it is stored as std::string (so a conversion is needed). When LogManager wants to save the log, it passes the filename (std::string) to the File class, where it is converted back into the original std::wstring. So there are two conversions (std::wstring->std::string, std::string->std::wstring) for such a simple operation. Another example is listview updates, where a simple item update first has to run Text::toT (although almost all of the strings were std::wstring at their origin and were only converted to std::string for storage). That doesn't look right.
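The log file round trip described above can be sketched as follows. This is only an illustration: `fromWide` and `toWide` are hypothetical, ASCII-only stand-ins for Text::fromT/toT, and `Settings` and `pathForOs` are made-up names rather than DC++'s SettingsManager or File classes. The counter just makes the two conversions visible.

```cpp
#include <cassert>
#include <string>

// Counts how many conversions were performed (for illustration only).
int conversions = 0;

// Hypothetical stand-in for Text::fromT: wide -> narrow, ASCII input only.
std::string fromWide(const std::wstring& w) {
    ++conversions;
    std::string s;
    for (wchar_t c : w) {
        assert(c >= 0 && c < 0x80);  // this sketch does not handle non-ASCII
        s.push_back(static_cast<char>(c));
    }
    return s;
}

// Hypothetical stand-in for Text::toT: narrow -> wide, ASCII input only.
std::wstring toWide(const std::string& s) {
    ++conversions;
    return std::wstring(s.begin(), s.end());
}

// Made-up stand-in for the core settings store, which keeps narrow strings.
struct Settings {
    std::string logFile;
};

// Made-up stand-in for the file layer, which needs a wide path again.
std::wstring pathForOs(const Settings& s) {
    return toWide(s.logFile);
}
```

Storing a GUI string with `fromWide` and then calling `pathForOs` leaves `conversions` at 2, even though the path ends up identical to what the user typed.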

My proposal is to get rid of all single-byte strings in the core and replace them with wide-character strings (or with tstring to support non-Unicode builds). For network communication: since ADC is fully Unicode, its commands can be handled as wstrings, and conversion would be needed for NMDC only (and why care about a performance drop in an obsolete protocol?). This change would increase DC++ performance a lot: I profiled the code, and the functions Text::toT/fromT seem to be the biggest brake in the code.

And in the end, when the application is fully Unicode, it shouldn't use std::string at all.

Post by **poy** » 26 Jan 2011, 19:33

ADC requires UTF-8 whereas GUI strings are UTF-16, so I doubt this could ever be possible. Even if all core strings were changed to std::wstring (if std::wstring supports UTF-8 at all?), the conversion between UTF-8 and UTF-16 would still be necessary.
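To make the UTF-8/UTF-16 point concrete, here is a minimal sketch of what such a conversion involves. The function names `utf8ToUtf16` and `utf16ToUtf8` are hypothetical and this is not how Text::toT/fromT is implemented; it uses std::u16string so that the result is UTF-16 on every platform, and it does not reject overlong UTF-8 encodings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into UTF-16 code units.
std::u16string utf8ToUtf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (i + extra >= in.size()) throw std::invalid_argument("truncated UTF-8");
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) throw std::invalid_argument("bad continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point");
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;  // split into a surrogate pair
            assert(cp <= 0xFFFFF);
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

// Hypothetical helper: encode UTF-16 code units as UTF-8 bytes.
std::string utf16ToUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {  // high surrogate: needs a low one
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::invalid_argument("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::invalid_argument("unpaired low surrogate");
        }
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

Each call walks the whole string and allocates a new one, which is why the number of places where the conversion happens matters more than the cost of any single call.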

There have, however, been ideas about storing both versions of the string in high-usage cases, such as nicks.

Post by **Big Muscle** » 27 Jan 2011, 13:33

True, conversion would still be necessary, but only for receiving/sending protocol commands (network communication in general). Now it is needed every time the GUI and core co-operate - displaying something in the GUI, storing a GUI string in the core, etc. - which happens in a lot of places. Rewriting all strings to wstrings (or tstrings) and calling Text::toT/fromT only for protocol communication would clean up the code a lot and bring some performance improvement as a bonus.

I don't think that storing both versions of a string is a good idea. It could increase memory usage a lot.

Post by **Pretorian** » 28 Jan 2011, 17:12

Note that sizeof(wchar_t) on Windows is 2 bytes while on Linux it's 4 bytes...
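A small sketch of what that size difference means for a std::wstring-based core. The helper names `wideUnitsFor` and `wideBytesFor` are made up for this example, and it assumes that a 2-byte wchar_t holds UTF-16 code units while a 4-byte wchar_t holds whole code points.

```cpp
#include <cassert>
#include <cstddef>

static_assert(sizeof(wchar_t) == 2 || sizeof(wchar_t) == 4,
              "this sketch only covers 2-byte and 4-byte wchar_t");

// True where wchar_t is 2 bytes (assumed to mean UTF-16, as on Windows).
constexpr bool kWideIsUtf16 = sizeof(wchar_t) == 2;

// Number of wchar_t elements needed to store one Unicode code point.
std::size_t wideUnitsFor(char32_t cp) {
    assert(cp <= 0x10FFFF);  // outside this range is not a valid code point
    return (kWideIsUtf16 && cp >= 0x10000) ? 2 : 1;
}

// Number of bytes the same code point occupies inside a std::wstring.
std::size_t wideBytesFor(char32_t cp) {
    return wideUnitsFor(cp) * sizeof(wchar_t);
}
```

An ASCII letter such as `U'A'` costs 2 bytes with a 2-byte wchar_t and 4 bytes with a 4-byte one, so a nick list held in std::wstring takes roughly twice the memory on Linux, and code that indexes by wchar_t sees surrogate pairs only on Windows.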

Post by **grooup** » 17 Apr 2015, 18:34

This change would increase DC++ performance a lot, because I profiled the code and functions Text::toT/fromT seem to be the biggest brake in the code?
