Netfraction Inception

Site Announcements
Pietry
Senior Member
Posts: 328
Joined: 04 Dec 2007, 07:25
Location: Bucharest
Contact:

Re: Netfraction Inception

Post by Pietry » 01 Mar 2009, 15:13

Perhaps a Bloom filter would help on the SID checking, but that also means that you periodically need to refresh the filter and also the performance bonus won't be that big and you still need to use the random factor.
I was thinking to use SIDs in some preestablished order until the maximum SID number ( around 65000) and then start over. However, after this step when the given SIDs restart from 0, the hub still needs to check what SIDs are still used and which are not. Then , performance is highly degraded.
Or you can keep a queue and add to the queue the removed sids, so you dont need to check all the users on the hub eg:
- add all possible sids to queue in ascending order
- client logs in, assign the first sid from end of the queue
- client logs out, add his sid to the start of the queue

Seems a little more scalable , but for few users you still need to use some memory for keeping all the sids in some form of queue ( or the maximum user count ), and the startup will take a little more time ( to generate the queue ).

This queue idea was one of my plans for DSHub enhancement :roll:
Just someone

darkKlor
Senior Member
Posts: 100
Joined: 30 Dec 2008, 14:59

Re: Netfraction Inception

Post by darkKlor » 01 Mar 2009, 16:12

Could you explain to me why the maximum SID number is around 2^16 aka 65000? With a 20-bit SID I would expect 2^20 unique values (approx a million), which is also what 32^4 is i.e. the number of permutations of 4 base32 characters.

I store the SID in it's base32 encoded form as a string. In C#, strings occupy 16 bytes of memory + 2 bytes per character allocated + 2 bytes for the final null character, meaning 26 bytes per SID (http://www.codeproject.com/KB/dotnet/strings.aspx).

So maintaining a queue of even 2^16 SIDs is going to use approx 1.7mb of memory. A queue of 2^20 SIDs would use 27mb of memory :o Storing the 20-bit number the SID is based on would obviously be more efficient for this purpose, but I still think it's going to use a lot of memory for such a very small part of the hub. I wonder if there's any useful computer science literature relating to this task...

Storing the encoded string for all other operations enables me to avoid constant base32 encode calls and only costs 250kb of memory for 10,000 connected users, so I'm happy with that choice at least.

darkKlor
Senior Member
Posts: 100
Joined: 30 Dec 2008, 14:59

Re: Netfraction Inception

Post by darkKlor » 02 Mar 2009, 02:35

@Pietry:
Well I see I was wrong about the million unique SIDs, it is indeed approx 65000.

I realised that the unique SID issue was the same one a filesystem faces when looking for a free block of space, so I went and looked at how they handle it. A bitmap is a common method, but doesn't scale well and, more importantly, is difficult to deal with in C# without using unsafe code. The method used by FAT et al is a list of blocks which handles both allocated and free space in the same list. Since this was overkill, your queue idea looked like the way to go, so I decided to implement it.

Initially it was using a bit more memory than I would have liked, but I realised that there was no point making the queue larger than the maximum number of allowed users, because there would still be a unique SID for everyone. This is my implementation:

Code: Select all

        private void InitialiseSIDs()
        {
            // The size of this queue only really needs to be MAX_USERS, not 2^16.
            // A Queue is a first-in, first-out (FIFO) collection implemented as a circular array.
            // Objects are inserted at one end and removed from the other.
            int limit = Settings.HubSettings.HubConnection.MaximumUsersCount < 65536 ? Settings.HubSettings.HubConnection.MaximumUsersCount : 65536;
            _sids = new Queue<int>(limit);
            for (int i = 0; i <= (limit - 1); i++)
            {
                _sids.Enqueue(i);
            }
        }

        public static string GenerateNewSessionId(RemoteMachine.Node node)
        {
            string sidString = string.Empty;
            bool creationSuccessful = false;

            while (!creationSuccessful)
            {
                int newSid = _sids.Dequeue();
                creationSuccessful = true;
                sidString = Base32.Encode(BitConverter.GetBytes(newSid)).Substring(0, 4);

                // by default, use ABCD for operator chat and DCBA for the security bot
                if (sidString == Settings.HubSettings.OperatorBot.SessionId || sidString == Settings.HubSettings.SecurityBot.SessionId)
                {
                    creationSuccessful = false;
                    continue;
                }
            }
            _instance._clients[_instance._clients.IndexOf(node as RemoteMachine.Client)].SessionId = sidString;
            return sidString;
        }
        
        internal static void RemoveDisconnectedClient(NetfractionHub.RemoteMachine.Node disconnectedClient)
        {
            _instance._clients.Remove(disconnectedClient);
            _sids.Enqueue(BitConverter.ToUInt16(Base32.Decode(disconnectedClient.SessionId), 0));
        }
Since the majority of hubs would only have a few hundred users, the average memory usage of this implementation turns out to be quite small, and the performance hit is shifted to a user disconnecting, where the SID must be re-queued, rather than a search through all existing users when somebody connects. The time spent to initialise the SIDs queue is negligible.

Thanks again to Pietry for this idea.

Pietry
Senior Member
Posts: 328
Joined: 04 Dec 2007, 07:25
Location: Bucharest
Contact:

Re: Netfraction Inception

Post by Pietry » 02 Mar 2009, 09:37

but I realised that there was no point making the queue larger than the maximum number of allowed users
That was my idea too :
keeping all the sids in some form of queue ( or the maximum user count )
You also need to implement the following situation : while the hub is running, the maximum user count changes via the settings or some command. Then, the queue needs adding/removing.

This method limits the pool of usable SIDs but I do not see any harm in that.

Now it depends on the queue implementation if adding to the last position is cheaper in time and space then iterating through all users ( I do hope Microsoft guys use some pointer to the end ), but again that is .NET implementation which hopefully has been optimized at maximum.

Could you explain to me why the maximum SID number is around 2^16 aka 65000? With a 20-bit SID I would expect 2^20 unique values (approx a million), which is also what 32^4 is i.e. the number of permutations of 4 base32 characters.
Last time I checked SID processing I remember there were around 65000 possible SIDs although I'm slipping the reason right now.

The queue implementation has the following advantages:
  • No random generation penalty
    No iteration performance penalty
    Same sid cannot occur ( unless you have a bug )
    You don't need to generate random sids until you get a free one ( also checking if free means iteration again) ( like dshub horribly does :D )
Just someone

darkKlor
Senior Member
Posts: 100
Joined: 30 Dec 2008, 14:59

Re: Netfraction Inception

Post by darkKlor » 02 Mar 2009, 12:43

You also need to implement the following situation : while the hub is running, the maximum user count changes via the settings or some command.
I don't really want to... I'd much rather force a hub restart for this. In recreating the queue you would have to check that each element was not a SID that was in use as it was added to the queue. So this means scanning the entire user list x times, where x is the new maximum users count. Also, what if there are more than x users currently connected? You either have to decline the request or do a mass kick, because there wont be enough SIDs available for them.

Generally I'd like to avoid hub restarts, but this will create some messy code using the queue implementation. I realised this when I decided to implement it, but I thought it was a reasonable trade-off.

Pietry
Senior Member
Posts: 328
Joined: 04 Dec 2007, 07:25
Location: Bucharest
Contact:

Re: Netfraction Inception

Post by Pietry » 02 Mar 2009, 13:06

If you keep the queue sorted : eg for 3200 users use SIDs like EAAA-FAAA (E***), then:
- if max_users_count is increased, you just add another bunch of sids to the queue
- if max_users_count is decreased, you do nothing about the queue, you keep more users entering via some other way, and your queue will still have elements in it even when hub full.

This is the only idea that springs to my mind now...
Just someone

darkKlor
Senior Member
Posts: 100
Joined: 30 Dec 2008, 14:59

Re: Netfraction Inception

Post by darkKlor » 03 Mar 2009, 16:08

Too much bother for not enough reward. It's not a feature that would be used very often, and if people can't handle 2 minutes downtime... well, I'm sure they can find something else to do, like find some fresh air.

On a completely different topic: It appears that the project actually contains build scripts, usable by anybody with the .Net Framework installed... at least if what I read is correct.

*.csproj files are actually MSBuild scripts, and MSBuild is included with the .Net Framework. So, if you have the Netfraction source extracted to C:\Netfraction\, then opening a command prompt window and doing the following:

Code: Select all

cd C:\Netfraction\NetfractionHub\
c:\Windows\Microsoft.NET\Framework\v3.5\msbuild.exe NetfractionHub.csproj
Should result in a compiled exectuable in the bin directory. I'm not sure if this does a Debug or Release build, or anything else about the process really, but perhaps changing the Configuration element on line 4 of the .csproj file to Release would change the active configuration.

Dj_Offset
Member
Posts: 53
Joined: 15 Sep 2008, 21:48
Location: adcs://adcs.uhub.org:1511
Contact:

Re: Netfraction Inception

Post by Dj_Offset » 15 Apr 2009, 23:22

About the maximum allowed SIDs.

The SIDs are a 20 bit integer encoded in BASE32 alphabet (A-Z + 2-7) as 4 bytes. As such,
0 becomes AAAA, 1 becomes AAAB ... (2^20)-1 becomes 7777. That means, there can be 1048576 different SIDs.

I just reviewed the wiki page article "SID generation" which I just cleaned up, and
I found the initial comparison of the algorithm discussed here to the "random SID allocator" to be very odd. I'd like to argue that the random SID allocator at worst is better than the one described in this thread.

First of all, why is the random SID allocator bad?
Generating pseudo random numbers is very cheap since there are no requirements for good entropy in this case. A simple, multiplication plus modulus calculation is good enough.

Okay, so generating a random SID is cheap, but looking for collisions are not, right?
It can be, but that all depends on how fast the algorithm for looking up users based on a SID is. A fast way to do that is to insert users into a balanced binary tree, using the SID as the key, making the lookup process O(log n) instead of O(n).
Having this essential piece of code in place will also increase performance when routing messages, such as direct messages (D and E) between users.

Third, the algorithm described in this thread, requires either a large chunk of memory allocated, or an even larger piece of memory depending on how the "queue" is implemented.
If it is implemented as a queue, or linked list, it will require memory for both the SID plus pointers to previous (and/or next) element.
If it is implemented as an array it just requires the memory for the pre-allocated SIDs, but one have to do alot of memory moving/swapping everytime the queue needs to be cycled.

Personally, I went for a different algorithm, but that's all described in the wiki.

Pietry
Senior Member
Posts: 328
Joined: 04 Dec 2007, 07:25
Location: Bucharest
Contact:

Re: Netfraction Inception

Post by Pietry » 16 Apr 2009, 08:46

In the solution with the balanced binary tree, you will use O(log n ) for looking up and insertion/removing. And I'm not sure, but I think that balancing the tree of keeping it balanced uses more computing resources ( correct me if I'm wrong ), also keeping the tree with the least possible levels ( otherwise you fall in the linked list again )
In SID recycling you will only use O(1) for adding/removing and no looking up for SID checking.
However, from other points of view you can use the tree for finding users for some other purpose ( dispatching direct messages, like you said eg. )
But strictly speaking for the SID only problem, having a pointer ( 4 bytes ) for around 1000 users is around 4 kB of memory which I consider negligible ( hubs tend to use hundreds of megabytes ).
Your tree solution improves the hub performance in the long run ( if you use it for all the looking up )
Just someone

Dj_Offset
Member
Posts: 53
Joined: 15 Sep 2008, 21:48
Location: adcs://adcs.uhub.org:1511
Contact:

Re: Netfraction Inception

Post by Dj_Offset » 16 Apr 2009, 11:15

Inserting and deleting in a balanced tree is fairly cheap, depending on the tree.

About memory issues, one will need to use an array for compactness, that leads to many memory moves when removing users in order to shift the queue. This can of course be alleaviated by using a sort of linked list, however, when using the linked list approach one has to do many iterations on the list in order to find which element to remove, depending on where in the queue it was located in the first place, and depending on how the list is created.

So, keeping a max_user limit at 10000 users, with the SID being a 32 bit, one would have to keep 40K just for the SIDs, then an additional 40K (or 80K on 64 bit architectures) for a singly linked list. That's 80-120K of allocations even for an empty hub.

uHub currently use about 3-5MB for a 1000 user hub.

Locked