ADC wireshark dissector


Post by Pretorian » 22 Dec 2012, 13:08

This is a future text for the ADC wireshark dissector project: https://launchpad.net/adcwireshark The text is not final (plenty of spelling errors etc.) and this is simply me wanting to put this text somewhere (and I think more people will see it here and be able to comment if something is incredibly wrong).
Wireshark is a network monitor and analyser. It uses 'dissectors' (plugins) to parse the network traffic and present it to the user based on the underlying protocol. Wireshark has dissectors at multiple levels of the seven-layer OSI model. At the topmost level, the application layer, are protocols such as FTP, HTTP and BitTorrent, as well as NMDC and ADC.

An ADC message is essentially broken up into three sections: what is being sent (type and command), who is sending it and who the intended recipient is, and the parameters for the command.

ADC has two delimiters: space and newline. The former separates the command, parameters etc. Each message ends in a newline character. It is allowed to send an empty message, consisting only of a newline.

Length and version are not specified in messages.

Messages are escaped using the following table:
|\s |A space
|\n |A newline
|\\ |The '\' character.
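A minimal sketch of the escaping in Python, just to illustrate the table above. The function names adc_escape and adc_unescape are made up for this example and are not part of the dissector; treating an unknown escape as an error is also my assumption.

```python
def adc_escape(value):
    """Escape a raw value so it can travel inside an ADC message."""
    # The backslash must be handled first, otherwise the backslashes
    # produced for spaces and newlines would be escaped a second time.
    return value.replace("\\", "\\\\").replace(" ", "\\s").replace("\n", "\\n")


def adc_unescape(value):
    """Reverse adc_escape; raise ValueError on an unknown escape."""
    escapes = {"s": " ", "n": "\n", "\\": "\\"}
    out = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\":
            # A backslash must be followed by one of the three known codes.
            if i + 1 >= len(value) or value[i + 1] not in escapes:
                raise ValueError("invalid escape at offset %d" % i)
            out.append(escapes[value[i + 1]])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Unescaping has to happen after the message is split on spaces, since an escaped space must not be treated as a separator.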

The message type is the first character of every message and determines the routing (who is sending and who should be the recipient). For clients this is a matter of parsing, while the hub is the one in charge of the actual routing. The type may be one of the following (copied from the spec).

The command is always three characters and specifies what to do. This also implies what parameters shall follow. There are two types of parameters: named and positional. Positional parameters always come first, in the specified order, followed by the named parameters, which can come in any order. A positional parameter consists of only a value and may be empty (no value). A named parameter consists of a name and a value, where the value may be empty. A command need not have any parameters at all. The parameters themselves are not relevant from the view of a simplistic parsing model.

The sender and recipient each consist of four characters. As noted earlier, the presence of a sender or recipient is implied by the type.

This relatively simple structure of a message in ADC means that a message has a minimum length of five characters (type, command and newline). A sender increases the message length by 4 characters and a recipient by an additional 5 characters (including the space). Each parameter may be unbounded, but a named parameter consists of at least two characters (its name).
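As a rough illustration of how the type decides whether a sender and recipient follow, here is a small Python sketch. The type letters and how many SIDs each one carries come from my reading of the ADC specification rather than from the text above, so treat the table as an assumption. F (sender plus a feature list) and U (a CID instead of a SID) are left out to keep it short, and parse_header is a made-up name.

```python
# Assumption: number of four-character SIDs that follow the command for
# each type, as I read the ADC specification. F and U are not handled.
SIDS_PER_TYPE = {"B": 1, "D": 2, "E": 2, "C": 0, "H": 0, "I": 0}


def parse_header(message):
    """Split one message (newline already removed) into its header parts.

    Returns (type, command, sender, recipient, remaining_fields).
    """
    fields = message.split(" ")
    head = fields[0]
    if len(head) != 4:
        raise ValueError("expected a type character and a three-character command")
    msg_type, command = head[0], head[1:]
    if msg_type not in SIDS_PER_TYPE:
        raise ValueError("type not handled by this sketch: " + msg_type)
    count = SIDS_PER_TYPE[msg_type]
    sids = fields[1:1 + count]
    if len(sids) != count or any(len(sid) != 4 for sid in sids):
        raise ValueError("missing or malformed SID")
    sender = sids[0] if count >= 1 else None
    recipient = sids[1] if count == 2 else None
    return msg_type, command, sender, recipient, fields[1 + count:]
```

For example, parse_header("DCTM AAAA BBBB ADC/1.0 3000 tok") yields the type "D", the command "CTM", both SIDs and the three remaining fields, while "HSUP ADBASE" has neither a sender nor a recipient.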

As the length of a message is not specified, any parser must traverse the message until a newline is found. TCP is nice enough to deliver data in order, so it's possible to reassemble messages that were split across packets, while UDP is not so kind. A relatively low amount of data is sent with UDP, but the parser should be agnostic to the transport.
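The "read until newline" idea can be sketched in Python as a small buffer that collects chunks in arrival order and only hands out complete messages. This is not how the dissector itself is written (Wireshark has its own reassembly support); LineAssembler is a hypothetical helper for illustration only.

```python
from typing import List


class LineAssembler:
    """Collect in-order chunks and return only newline-terminated messages."""

    def __init__(self):
        self._pending = b""  # bytes seen so far without a closing newline

    def feed(self, chunk: bytes) -> List[bytes]:
        data = self._pending + chunk
        # Everything before the last newline is complete; whatever follows
        # it (possibly nothing) is kept until the next chunk arrives.
        *complete, self._pending = data.split(b"\n")
        return complete
```

Feeding b"HSUP ADBA" returns nothing, and feeding b"SE\n" afterwards returns the single message b"HSUP ADBASE". An empty message (a lone newline) comes out as an empty byte string.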

Parameter values may differ in type, and the following list shows all the types of data to expect:
Integer
Float
String
IPv4/IPv6 address
Enumeration
Bitfield
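A small Python sketch of what parsing a value according to its type could look like for the types listed above. The kind strings, the parse_value name and the flag labels in the usage note are all invented for the example; unescaping of strings is assumed to be a separate step.

```python
import ipaddress


def parse_value(kind, raw, names=None):
    """Convert the raw text of one parameter according to its declared type.

    `names` maps numbers to labels for enumerations, and single bit
    values to labels for bitfields.
    """
    names = names or {}
    if kind == "integer":
        return int(raw)
    if kind == "float":
        return float(raw)
    if kind == "string":
        return raw  # unescaping is left to a separate step
    if kind == "ip":
        # ip_address accepts both IPv4 and IPv6 notation.
        return ipaddress.ip_address(raw)
    if kind == "enum":
        return names.get(int(raw), "unknown (" + raw + ")")
    if kind == "bitfield":
        bits = int(raw)
        # Report the label of every bit that is set, lowest bit first.
        return [label for bit, label in sorted(names.items()) if bits & bit]
    raise ValueError("unknown parameter type: " + kind)
```

With made-up labels, parse_value("bitfield", "5", {1: "bot", 4: "operator"}) gives ["bot", "operator"], since 5 has bits 1 and 4 set.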

The ADC wireshark dissector is divided into three parts: basic message parsing (separating commands from parameters), parameter parsing according to type, and a generic structure describing all commands and parameters. This structure allows future versions of this dissector to simply add a command or parameter without having to care about parsing or other components.

The following is a command, CTM, with its parameters

It is easy to see that any additional parameter can be added without knowledge of how the protocol handles the data. The positional parameters are required per the specification and are not subject to revision unless a new version of the specification is created. This structure doesn't necessarily care about the order of the parameters or how they are treated by normal implementations, only that the type is correct.
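To give an idea of what such a generic structure might look like, here is a hypothetical Python version. It is not the dissector's actual table: CommandSpec and its field names are invented, and the CTM positional parameters (protocol, port, token) are taken from my reading of the ADC specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class CommandSpec:
    name: str
    description: str
    # Positional parameters in their required order: (label, type).
    positional: List[Tuple[str, str]]
    # Named parameters keyed by their two-character name: (label, type).
    named: Dict[str, Tuple[str, str]] = field(default_factory=dict)


COMMANDS = {
    "CTM": CommandSpec(
        name="CTM",
        description="Connect to me",
        positional=[("protocol", "string"), ("port", "integer"), ("token", "string")],
    ),
}

# Adding a parameter touches only the table, not the parsing code.
# "XX" is a made-up name used purely for illustration.
COMMANDS["CTM"].named["XX"] = ("hypothetical extension parameter", "string")
```

The parser would only consult this table for the number of positional parameters and the type of each value, so a new command is just one more entry.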

The initial parsing of the message is simply to discover what the command is and whether there is a sender and a recipient. The sender and recipient are not stored further as they do not yield additional information of worth. The dissector could create a list of clients logging in (a BINF message with a nickname) but that list might never be complete, so the dissector simply does not try this. Additionally, saving any such information would increase the memory footprint by 4 bytes for the SID plus additional bytes for the nick, per client.

Wireshark recommends that dissectors use TCP-foo for messages of known length and TCP-bar for others. ADC falls in the latter category. One must not be blinded by commands that, for now, only contain positional parameters, because a) that may change tomorrow and b) the positional parameters may be of variable length.

It is also important to make sure that messages aren't broken or partial. The dissector must therefore wait until it has received a full message that it can completely parse. Wireshark doesn't allow a dissector to display partial messages anyway.

The dissector does not treat extensions differently from mandated commands or parameters as they all have the same structure. It therefore makes no sense to parse or display them differently, apart from showing e.g. "extension" next to the command or parameter.

The basic parsing in the dissector works as follows:
Scan for the next newline. This is one message.
Check the first character; this determines whether there is a sender and a recipient.
Keep the characters in positions 1-3 as these indicate the command.
Remove everything before the start of the parameters.
Split the remaining string on the separator.
The first n parameters are the positional parameters as per the command structure. Parse each value according to its type.
The remaining parameters are the named parameters. The first two characters are the name of the parameter and the remaining characters are its value. Parse the value according to its type.
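The steps above could be sketched in Python roughly like this. It is a simplification under a few assumptions of mine: the type table (B, D, E carry SIDs; C, H, I do not) and the CTM positional names come from my reading of the ADC spec, the F and U types are not handled, and typed parsing of the values is left out so everything stays a string.

```python
# Assumption: number of SIDs following the command, per type.
SID_COUNT = {"B": 1, "D": 2, "E": 2, "C": 0, "H": 0, "I": 0}
# Illustrative command table: labels of the positional parameters.
POSITIONAL = {"CTM": ["protocol", "port", "token"], "INF": []}


def dissect(message):
    """Parse one message whose trailing newline has already been removed."""
    if message == "":
        return None  # an empty message carries nothing to parse
    fields = message.split(" ")
    head = fields[0]
    msg_type, command = head[0], head[1:4]
    # Drop the command and any SIDs; only the parameters remain.
    params = fields[1 + SID_COUNT[msg_type]:]
    labels = POSITIONAL.get(command, [])
    positional = dict(zip(labels, params))
    # A named parameter is a two-character name followed by its value.
    named = [(p[:2], p[2:]) for p in params[len(labels):]]
    return {"type": msg_type, "command": command,
            "positional": positional, "named": named}


def dissect_stream(data):
    """Split on newline; text after the last newline is incomplete and skipped."""
    return [dissect(m) for m in data.split("\n")[:-1]]
```

For "BINF AAAA NIuser SS1024" the named parameters come out as [("NI", "user"), ("SS", "1024")]; note that the values are still plain strings at this point.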

This is a fairly simple algorithm but it has one major drawback: data almost always needs to be stored as strings in the dissector.
