% *** an embryo of a technical report describing the LabComm design rationale and implementation ***

\documentclass[a4paper]{article}
\usepackage{listings}
%\usepackage{verbatims}
%\usepackage{todo}

\begin{document}
\title{Labcomm tech report}
\author{Anders Blomdell and Sven Gesteg\aa{}rd Robertz }
\date{draft, \today}
\maketitle

\begin{abstract}

LabComm is a binary protocol suitable for transmitting and storing samples of
process data. It is self-describing and independent of the programming language,
processor, and network used (e.g., of byte order). It is primarily intended
for situations where the overhead of communication has to be kept at a minimum;
hence, LabComm only requires one-way communication to operate. The one-way
operation also has the added benefit of making LabComm suitable as a storage
format.

LabComm provides self-describing channels, as communication starts with the
transmission of an encoded description of all possible sample types that can
occur, followed by any number of actual samples in any order the sending
application sees fit.

The LabComm system is based on a binary protocol and
a compiler that generates encoder/decoder routines for popular languages
including C, Java, and Python. There is also a standard library for the
languages supported by the compiler, containing generic routines for
encoding and decoding types and samples, and for interaction with
application code.

The LabComm compiler accepts type and sample declarations in a small language
that is similar to C or Java type declarations.
\end{abstract}
\section{Introduction}

%[[http://rfc.net/rfc1057.html|Sun RPC]]
%[[http://asn1.org|ASN1]].

LabComm draws its inspiration from Sun RPC~\cite{SunRPC}
and ASN.1~\cite{ASN1}. LabComm is primarily intended for situations
where the overhead of communication has to be kept at a minimum; hence, LabComm
only requires one-way communication to operate. The one-way operation also has
the added benefit of making LabComm suitable as a storage format.

Two-way communication adds complexity, in particular for hand-shaking
during establishment of connections, and the LabComm library provides
support for, for instance, avoiding deadlocks during such hand-shaking.

\pagebreak
\section{Communication model}

LabComm provides self-describing communication channels by always transmitting
a machine-readable description of the data before actual data is
sent\footnote{Sometimes referred to as \emph{in-band} self-describing}.
Therefore, communication on a LabComm channel has two phases:

\begin{enumerate}
\item the transmission of signatures (an encoded description including data
types and names, see appendix~\ref{sec:ProtocolGrammar} for details) for all sample types
that can be sent on the channel
\item the transmission of any number of actual samples in any order
\end{enumerate}

During operation, LabComm will ensure (i.e., monitor) that a communication
channel is fully configured, meaning that both ends agree on which messages
may be passed over that channel.  If an unregistered sample type is sent or
received, the LabComm encoder or decoder will detect it and take action.
In more dynamic applications, it is possible to reconfigure a channel in order to add,
remove, or change the set of registered sample types. This is discussed
in Section~\ref{sec:reconfig}.

The roles in setting up, and maintaining, the configuration of a channel are as follows:

\paragraph{The application software} (or higher-level protocol) is required to

\begin{itemize}
\item register all samples to be sent on a channel with the encoder
\item register handlers for all samples to be received  on a channel with the decoder
\end{itemize}

\paragraph{The transmitter (encoder)}

\begin{itemize}
 \item ensures that the signature of a sample is transmitted on the channel before samples are
       written to that channel
\end{itemize}

\paragraph{The receiver (decoder)}

\begin{itemize}
 \item checks, for each signature, that the application has registered a handler for that sample type
 \item if an unhandled signature is received, pauses the channel and informs the application
\end{itemize}
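
The division of responsibilities above can be illustrated with a minimal
Python sketch. It is not the LabComm library API: the class names
\verb+SketchEncoder+ and \verb+SketchDecoder+, the tuple-based message
format, and the in-memory list standing in for the channel are all
assumptions made for illustration.

\begin{lstlisting}[basicstyle=\footnotesize\ttfamily,language=Python]
# Illustrative sketch only; names and message format are hypothetical,
# not the LabComm library API.

class SketchEncoder:
    def __init__(self, channel):
        self.channel = channel          # any list-like sink
        self.registered = set()

    def register(self, name, signature):
        # The signature is written when the sample type is registered,
        # i.e., before any sample of that type.
        self.channel.append(("signature", name, signature))
        self.registered.add(name)

    def encode(self, name, value):
        if name not in self.registered:
            raise KeyError("unregistered sample type: " + name)
        self.channel.append(("sample", name, value))


class SketchDecoder:
    def __init__(self):
        self.handlers = {}
        self.paused = False
        self.unhandled = []

    def register_handler(self, name, handler):
        self.handlers[name] = handler

    def decode(self, message):
        kind, name, payload = message
        if kind == "signature":
            if name not in self.handlers:
                # Unhandled signature: pause and inform the application.
                self.paused = True
                self.unhandled.append(name)
        elif not self.paused:
            self.handlers[name](payload)


channel = []
enc = SketchEncoder(channel)
enc.register("data", "float")
enc.encode("data", 1.0)

received = []
dec = SketchDecoder()
dec.register_handler("data", received.append)
for msg in channel:
    dec.decode(msg)
\end{lstlisting}

In this sketch the signature always precedes the first sample on the
channel, and a decoder without a matching handler stops delivering
samples until the application has reacted.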

The fundamental communication model applies to all LabComm channels and
deals with the individual unidirectional channels. In addition to that,
the LabComm libraries support the implementation of higher-level
handshaking and establishment of bidirectional channels, either by
interacting with the underlying transport layer (e.g., for
marking packets containing signatures as \emph{important}, for
transports that handle resending of dropped packets selectively) or
by requesting retransmission of signatures.

In order to be both lean and generic, LabComm does not provide a
complete protocol for establishing and negotiating bidirectional
channels, but does provide support for building such protocols on top
of LabComm.
\subsection{Reconfiguration}
\label{sec:reconfig}

The fundamental communication model can be generalized to the life-cycle
of a concrete communication channel, including the transport layer,
between two end-points. Then, the communication phases are
\begin{enumerate}
  \item \emph{Establishment} of communication channel at the transport layer
  \item \emph{Configuration} of the LabComm channel (registration of sample
    types)
  \item \emph{Operation} (transmission of samples)
\end{enumerate}
where it is possible to \emph{reconfigure} a channel by transitioning
back from phase 3 to phase 2. That allows dynamic behaviour, as a sample
type can be added or redefined at run-time. It also facilitates error
handling in two-way channels.
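
The three phases and the reconfiguration step can be read as a small
state machine. The following Python sketch is only an illustration of
that reading; the names \verb+Phase+, \verb+TRANSITIONS+ and
\verb+ChannelLifeCycle+ are hypothetical and do not appear in the
LabComm library.

\begin{lstlisting}[basicstyle=\footnotesize\ttfamily,language=Python]
# Hypothetical sketch of the channel life cycle; not LabComm library code.
from enum import Enum


class Phase(Enum):
    ESTABLISHMENT = 1
    CONFIGURATION = 2
    OPERATION = 3


# Allowed transitions; OPERATION -> CONFIGURATION is reconfiguration.
TRANSITIONS = {
    Phase.ESTABLISHMENT: {Phase.CONFIGURATION},
    Phase.CONFIGURATION: {Phase.OPERATION},
    Phase.OPERATION: {Phase.CONFIGURATION},
}


class ChannelLifeCycle:
    def __init__(self):
        self.phase = Phase.ESTABLISHMENT

    def advance(self, target):
        if target not in TRANSITIONS[self.phase]:
            raise ValueError(
                "illegal transition %s -> %s"
                % (self.phase.name, target.name))
        self.phase = target


channel = ChannelLifeCycle()
channel.advance(Phase.CONFIGURATION)   # register sample types
channel.advance(Phase.OPERATION)       # transmit samples
channel.advance(Phase.CONFIGURATION)   # reconfigure at run-time
\end{lstlisting}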

One example of this more dynamic view of a LabComm channel is that the
action taken when an unregistered sample is sent or received is to
revert to the configuration phase and redo the handshaking to
ensure that both sides agree on the set of sample types (i.e.,
signatures) that are currently configured for the channel.

From the system perspective, the LabComm protocol is involved in
phases 2 and 3. The establishment of the \emph{transport-layer}
channels is left to external application code. In the LabComm library,
that application code is connected to the LabComm routines through
the \emph{reader} and \emph{writer} interfaces,
with default implementations for sockets or file descriptors (i.e.,
files and streams).


\section{The LabComm language}

The LabComm language is used to describe data types, and from such data
descriptions, the compiler generates code for encoding and decoding
samples. The language is quite similar to C struct declarations, with
some exceptions. We will now introduce the language through a set of
examples.

These examples do not cover the entire language
specification (see appendix~\ref{sec:LanguageGrammar} for the complete
grammar), but serve as a gentle introduction to the LabComm
language covering most common use-cases.

\subsection{Primitive types}

\begin{verbatim}
  sample boolean a_boolean;
  sample byte a_byte;
  sample short a_short;
  sample int an_int;
  sample long a_long;
  sample float a_float;
  sample double a_double;
  sample string a_string;
\end{verbatim}

\subsection{The void type}

There is a type, \verb+void+, which can be used to send
a sample that contains no data. 

\begin{verbatim}
typedef void an_empty_type;

sample an_empty_type no_data1;
sample void no_data2;
\end{verbatim}

The \verb+void+ type may not be used as a field in a struct or
as the element type of an array.


\subsection{Arrays}

\begin{verbatim}
  sample int fixed_array[3];
  sample int variable_array[_];                // Note 1
  sample int fixed_array_of_array[3][4];       // Note 2
  sample int fixed_rectangular_array[3, 4];    // Note 2
  sample int variable_array_of_array[_][_];    // Notes 1 & 2
  sample int variable_rectangular_array[_, _]; // Notes 1 & 2
\end{verbatim}

\begin{enumerate}
\item In contrast to C, LabComm supports both fixed and variable (denoted
by~\verb+_+) sized arrays.

\item In contrast to Java, LabComm supports multidimensional arrays and not
only arrays of arrays.

\end{enumerate}

\subsection{Structures}

\begin{verbatim}
  sample struct {
    int an_int_field;
    double a_double_field;
  } a_struct;
\end{verbatim}

\subsection{Sample type references}

In addition to the primitive types, a sample may contain
a reference to a sample type. References are declared using
the \verb+sample+ keyword.

Examples:

\begin{verbatim}
sample sample a_ref;

sample sample ref_list[4];

sample struct { 
    sample ref1;
    sample ref2;
    int x;
    int y;
} refs_and_ints;
\end{verbatim}

Sample references need to be registered on both the encoder and the decoder
side, using the functions

\begin{verbatim}
int labcomm_decoder_sample_ref_register(
    struct labcomm_decoder *decoder,
    const struct labcomm_signature *signature);

int labcomm_encoder_sample_ref_register(
    struct labcomm_encoder *encoder,
    const struct labcomm_signature *signature);
\end{verbatim}

The value of an unregistered sample reference will be decoded as \verb+null+.

\subsection{User defined types}

User defined types are declared with the \verb+typedef+ reserved word,
and can then be used in type and sample declarations.

\begin{verbatim}
  typedef struct {
    int field_1;
    byte field_2;
  } user_type[_];
  sample user_type a_user_type_instance;
  sample user_type another_user_type_instance;
\end{verbatim}

\section{The LabComm system}

The LabComm system consists of a compiler for generating code from the data
descriptions, and libraries providing LabComm communication facilities in,
currently, C, Java, Python, C\#, and RAPID\footnote{excluding variable
size samples, as RAPID has limited support for dynamic memory allocation}.


\subsection{The LabComm compiler}

The LabComm compiler generates code for the declared samples, including marshalling and
demarshalling code, in the supported target languages.

The compiler itself is implemented in Java using the JastAdd~\cite{jastadd} compiler compiler.

\subsection{The LabComm library}

The LabComm libraries contain functionality for the end-to-end transmission
of samples. They are divided into two layers, where the upper layer implements
the general encoding and decoding of samples, and the lower one deals with
the transmission of the encoded byte stream on a particular transport layer.

Thus, the LabComm communication stack looks like this:
\begin{figure}[h!]
\begin{verbatim}
    _______________________
    |     Application     |
    +---------------------+
    | encoder  | decoder  |    to/from labcomm encoded byte stream
    +----------+----------+
    | writer   | reader   |    transmit byte stream over particular transport
    +----------+----------+
    | transport layer / OS|
    +---------------------+
\end{verbatim}
\end{figure}
\subsubsection{LabComm actions}

The encoder/writer and decoder/reader interfaces consist of a set of
\emph{actions} (similar to \verb+ioctl()+).

One example of this is that there is a separate writer action for
transmitting signatures, allowing the writer to treat a signature differently
from encoded samples, e.g., to allow handshaking during channel setup.

User actions allow the application or a higher-level
protocol to communicate with the underlying transport layer through the LabComm
encoder.

One example is reliable communication, which is controlled from the application
but needs to be implemented for each transport at the reader/writer level
(unless the transport already provides it, as TCP does).
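
As a rough illustration of the action mechanism, the Python sketch below
models a writer with a separate action for signatures and an
\verb+ioctl()+-like user action that is forwarded by the encoder. All
names (\verb+SketchWriter+, \verb+write_signature+,
\verb+user_action+, \verb+set_reliable+) are hypothetical and chosen
for this example; the actual action set of the LabComm library is not
reproduced here.

\begin{lstlisting}[basicstyle=\footnotesize\ttfamily,language=Python]
# Hypothetical sketch; action names are illustrative, not the LabComm API.

class SketchWriter:
    """Lower layer: puts encoded bytes on a particular transport."""

    def __init__(self):
        self.sent = []                 # stands in for a transport
        self.reliable = False

    def write_signature(self, data):
        # Separate action: a signature may be treated differently
        # from samples, e.g., marked as important.
        self.sent.append(("important", bytes(data)))

    def write_sample(self, data):
        tag = "reliable" if self.reliable else "normal"
        self.sent.append((tag, bytes(data)))

    def user_action(self, name, value):
        # ioctl()-like hook: unknown actions are reported, not guessed.
        if name == "set_reliable":
            self.reliable = bool(value)
            return True
        return False


class SketchEncoder:
    """Upper layer: forwards user actions to the writer."""

    def __init__(self, writer):
        self.writer = writer

    def user_action(self, name, value):
        return self.writer.user_action(name, value)


writer = SketchWriter()
encoder = SketchEncoder(writer)
writer.write_signature(b"\x02\x08")
encoder.user_action("set_reliable", True)
writer.write_sample(b"\x41\x04")
\end{lstlisting}

The point of the design is that the application only talks to the
encoder, while the transport-specific behaviour stays in the writer.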

\section{LabComm is not...}

\begin{itemize}
\item a protocol for two-way connections
\item intrinsically supporting reliable communication
\item providing semantic service-descriptions
\end{itemize}

But

\begin{itemize}
\item it is suitable for the individual channels of a structured connection
\item the user action mechanism allows using features of different transport layers
  through LabComm (i.e., it allows encapsulation of the transport layer)
\item the names of samples can be chosen and mapped according to a suitable taxonomy or ontology
\end{itemize}



\section{Example and its encoding}

With the following \verb+example.lc+ file:

\lstinputlisting[basicstyle=\footnotesize\ttfamily]{../examples/wiki_example/example.lc}
and this \verb+example_encoder.c+ file
\lstinputlisting[basicstyle=\footnotesize\ttfamily,language=C]{../examples/wiki_example/example_encoder.c}

\newpage

Running \verb+./example_encoder one two+ will yield the following result in \verb+example.encoded+:
\begin{verbatim}
00000000  01 0c 0b 4c 61 62 43 6f  6d 6d 32 30 31 34 02 30 |...LabComm2014.0|
00000010  40 0b 6c 6f 67 5f 6d 65  73 73 61 67 65 22 11 02 |@.log_message"..|
00000020  08 73 65 71 75 65 6e 63  65 23 04 6c 69 6e 65 10 |.sequence#.line.|
00000030  01 00 11 02 04 6c 61 73  74 20 04 64 61 74 61 27 |.....last .data'|
00000040  02 08 41 04 64 61 74 61  01 25 40 04 00 00 00 01 |..A.data.%@.....|
00000050  00 40 09 00 00 00 02 01  01 03 6f 6e 65 40 0e 00 |.@........one@..|
00000060  00 00 03 02 00 03 6f 6e  65 01 03 74 77 6f 41 04 |......one..twoA.|
00000070  00 00 00 00 41 04 3f 80  00 00 41 04 40 00 00 00 |....A.?...A.@...|
00000080
\end{verbatim}

i.e.,
\begin{verbatim}
<version> <length: 12> <string: <len: 11> <"LabComm2014">>
<sample_decl> <length: 48>
              <user_id: 0x40>
              <string: <len: 11> <"log_message">>
  <signature_length: 34>
  <struct_decl:
    <number_of_fields: 2>
    <string: <len: 8> <"sequence">> <type: <integer_type>>
    <string: <len: 4> <"line">> <type: <array_decl
      <number_indices: 1> <variable_index>
      <type: <struct_decl:
        <number_of_fields: 2>
        <string: <len: 4> <"last">> <type: <boolean_type>>
        <string: <len: 4> <"data">> <type: <string_type>>
      >>
    >>
  >
<sample_decl> <length: 8> <user_id: 0x41> <string: <len: 4> <"data">>
  <signature_length: 1> <float_type>
<sample_data> <user_id: 0x40> <length: 4>  <packed_sample_data>
<sample_data> <user_id: 0x40> <length: 9>  <packed_sample_data>
<sample_data> <user_id: 0x40> <length: 14>  <packed_sample_data>
\end{verbatim}
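
The packet framing can be checked mechanically. The Python sketch below
splits a byte stream into \verb+<id> <length> <data>+ packets and is
applied to the first 74 bytes of the dump above, i.e., the version
packet and the two sample declarations. The function names
\verb+read_varint+ and \verb+split_packets+ are chosen for this example,
and the varint decoding assumes the chunk order stated in
appendix~\ref{sec:ProtocolGrammar}; all varints in this excerpt fit in a
single byte.

\begin{lstlisting}[basicstyle=\footnotesize\ttfamily,language=Python]
# Sketch: split a LabComm2014 byte stream into <id> <length> <data>.
# Assumes the varint rule from the protocol appendix: 7 data bits per
# byte, high bit set when more bytes follow, least significant first.

def read_varint(buf, pos):
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


def split_packets(buf):
    packets, pos = [], 0
    while pos < len(buf):
        packet_id, pos = read_varint(buf, pos)
        length, pos = read_varint(buf, pos)
        packets.append((packet_id, buf[pos:pos + length]))
        pos += length
    return packets


# Bytes 0x00..0x49 of the dump: version and two sample declarations.
stream = bytes.fromhex(
    "010c0b4c6162436f6d6d323031340230"
    "400b6c6f675f6d657373616765221102"
    "0873657175656e636523046c696e6510"
    "01001102046c61737420046461746127"
    "02084104646174610125")
packets = split_packets(stream)
for packet_id, data in packets:
    print(hex(packet_id), len(data))
\end{lstlisting}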

\section{Type and sample declarations}

LabComm has two constructs for declaring sample types, \emph{sample
declarations} and \emph{type declarations}. A sample declaration is used
for the concrete sample types that may be transmitted, and is always
encoded as a \emph{flattened} signature. That means that a sample
containing user types, like

\begin{verbatim}
typedef struct {
  int x;
  int y;
} point;

sample struct {
  point start;
  point end;
} line;
\end{verbatim}

is flattened to 

\begin{verbatim}
sample struct {
  struct {
    int x;
    int y;
  } start;
  struct {
    int x;
    int y;
  } end;
} line;
\end{verbatim}

Sample declarations are always sent, and they are the fundamental identity of
a type in LabComm.
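
Flattening amounts to recursively replacing every user type name by its
definition. The following Python sketch shows this for the \verb+point+
and \verb+line+ declarations above. The tuple-based type model and the
function name \verb+flatten+ are assumptions made for illustration; they
are not the representation used by the LabComm compiler.

\begin{lstlisting}[basicstyle=\footnotesize\ttfamily,language=Python]
# Sketch of signature flattening; the tuple-based type model is an
# assumption for illustration, not the compiler's representation.

TYPEDEFS = {
    "point": ("struct", [("x", "int"), ("y", "int")]),
}


def flatten(type_, typedefs):
    """Replace every user type name by its (flattened) definition."""
    if isinstance(type_, str):
        if type_ in typedefs:
            return flatten(typedefs[type_], typedefs)
        return type_                      # primitive type
    kind, fields = type_                  # ("struct", [(name, type)])
    return (kind, [(name, flatten(t, typedefs)) for name, t in fields])


line = ("struct", [("start", "point"), ("end", "point")])
flat_line = flatten(line, TYPEDEFS)
\end{lstlisting}

After the call, \verb+flat_line+ contains no user type names; both
fields hold the expanded \verb+struct+ with the two \verb+int+ fields.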

Type declarations are the hierarchical counterpart to sample
declarations: here, fields of user types are encoded as a reference to
the type instead of being flattened. As the flattened sample declaration is the
fundamental identity of a type, type declarations can be regarded as
meta-data, describing the internal structure of a sample. They are
intended to be read by higher-level software and human system developers
and integrators.

Sample declarations and type declarations have separate name-spaces in
the sense that the numbers assigned to them by a LabComm encoder
come from two independent number series. To identify which
\verb+TYPE_DECL+ a particular \verb+SAMPLE_DECL+ corresponds to, the
\verb+TYPE_BINDING+ packet is used.

\subsection{Example}

The LabComm declaration
\lstinputlisting[basicstyle=\footnotesize\ttfamily]{../examples/user_types/test.lc}
can be encoded as
\begin{lstlisting}[basicstyle=\footnotesize\ttfamily]
TYPE_DECL 0x40 "coord" <int> val
TYPE_DECL 0x41 "point" <struct> <2 fields> 
                                "x" <type: 0x40> 
                                "y" <type: 0x40>
TYPE_DECL 0x42 "line" <struct> <2 fields> 
                                "start" <type: 0x41> 
                                "end" <type: 0x41>
TYPE_DECL 0x43 "foo" <struct> <3 fields> 
                                "a" <int> 
                                "b" <int> 
                                "c" <boolean>
TYPE_DECL 0x44 "twolines" <struct> <3 fields> 
                                "l1" <type:0x42> 
                                "l2" <type:0x42> 
                                "f" <type:0x43>

SAMPLE_DECL 0x40 "twolines" <flat signature>

TYPE_BINDING 0x40 0x44
\end{lstlisting}

Note that the id \verb+0x40+ is used both for the \verb+TYPE_DECL+ of
\verb+coord+ and the \verb+SAMPLE_DECL+ of \verb+twolines+, and that the
\verb+TYPE_BINDING+ binds the sample id \verb+0x40+ to the type id
\verb+0x44+.

\subsection{Run-time behaviour}

When a sample type is registered on an encoder, a \verb+SAMPLE_DECL+
(i.e., the flat signature) is always generated on that encoder channel.

If the sample depends on user types (i.e., typedefs), \verb+TYPE_DECL+
packets are encoded, recursively, for the dependent types and a 
corresponding \verb+TYPE_BINDING+ is encoded.

If a \verb+TYPE_DECL+ is included via multiple sample types, or
dependency paths, an encoder may choose to only encode it once, but is
not required to do so. However, if multiple \verb+TYPE_DECL+ packets are
sent for the same \verb+typedef+, the encoder must use the same
\verb+type_id+.
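
The registration behaviour can be sketched in Python as follows. The
type model, the packet tuples and the class \verb+SketchEncoder+ are
assumptions made for illustration, and the contents of
\verb+TYPEDEFS+ are reconstructed from the encoding listed in the
example above rather than taken from \verb+test.lc+ itself. The sketch
takes the option of encoding each \verb+TYPE_DECL+ only once.

\begin{lstlisting}[basicstyle=\footnotesize\ttfamily,language=Python]
# Sketch of the registration behaviour described above. The type model
# and packet tuples are assumptions; the declarations are reconstructed
# from the encoding listed in the example.

TYPEDEFS = {
    "coord": "int",
    "point": {"x": "coord", "y": "coord"},
    "line": {"start": "point", "end": "point"},
    "foo": {"a": "int", "b": "int", "c": "boolean"},
    # hierarchical description of the sample "twolines"
    "twolines": {"l1": "line", "l2": "line", "f": "foo"},
}


class SketchEncoder:
    def __init__(self):
        self.packets = []
        self.type_ids = {}            # type name -> type id
        self.next_type_id = 0x40      # two independent number series
        self.next_sample_id = 0x40

    def _type_decl(self, name):
        if name in self.type_ids:     # same id, encoded only once
            return self.type_ids[name]
        body = TYPEDEFS[name]
        fields = body.values() if isinstance(body, dict) else [body]
        for dep in fields:            # dependencies first, recursively
            if dep in TYPEDEFS:
                self._type_decl(dep)
        type_id = self.next_type_id
        self.next_type_id += 1
        self.type_ids[name] = type_id
        self.packets.append(("TYPE_DECL", type_id, name))
        return type_id

    def register_sample(self, name):
        sample_id = self.next_sample_id
        self.next_sample_id += 1
        type_id = self._type_decl(name)
        self.packets.append(("SAMPLE_DECL", sample_id, name))
        self.packets.append(("TYPE_BINDING", sample_id, type_id))
        return sample_id


encoder = SketchEncoder()
encoder.register_sample("twolines")
for packet in encoder.packets:
    print(packet)
\end{lstlisting}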



\section{Ideas/Discussion}

The LabComm language is more expressive than its target languages regarding data types.
E.g., LabComm can declare both arrays of arrays and matrices, where Java only has arrays of arrays.
In the generated Java code, a LabComm matrix is implemented as an array of arrays.

Another case (not yet included) is unsigned types, which Java does not have. If we include
unsigned long in LabComm, it has a larger range of values than can be expressed using
Java primitive types. However, it is unlikely that the entire range is actually used, so one
way of supporting the common cases is to include run-time checks for overflow in the Java encoders
and decoders.
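
Such a run-time check could look roughly like the Python sketch below,
which decodes an unsigned 64-bit value and rejects anything that would
not fit in a signed 64-bit Java \verb+long+. Since unsigned types are
not part of LabComm, both the wire format (8 bytes in network order) and
the function name \verb+decode_unsigned_long_checked+ are assumptions.

\begin{lstlisting}[basicstyle=\footnotesize\ttfamily,language=Python]
# Sketch of the proposed run-time check; the function name and the
# 8-byte network-order format are assumptions.
import struct

JAVA_LONG_MAX = 2**63 - 1


def decode_unsigned_long_checked(data):
    """Decode 8 bytes as unsigned; reject values that would not fit
    in a signed 64-bit Java long."""
    (value,) = struct.unpack(">Q", data)
    if value > JAVA_LONG_MAX:
        raise OverflowError("value exceeds range of Java long")
    return value


small = decode_unsigned_long_checked(bytes.fromhex("0000000000000001"))
\end{lstlisting}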

\bibliography{refs}{}
\bibliographystyle{plain}

\appendix
\newpage

\section{The LabComm language}
\label{sec:LanguageGrammar}

\subsection{Abstract syntax}
\begin{verbatim}
Program ::= Decl*;

abstract Decl ::= Type <Name:String>;
TypeDecl : Decl;
SampleDecl : Decl;

Field ::= Type <Name:String>;

abstract Type;
VoidType          : Type;
SampleRefType     : Type;
PrimType          : Type ::= <Name:String> <Token:int>;
UserType          : Type ::= <Name:String>;
StructType        : Type ::= Field*;
ParseArrayType    : Type ::= Type Dim*;
abstract ArrayType : Type ::= Type Exp*;
VariableArrayType : ArrayType;
FixedArrayType    : ArrayType;

Dim ::= Exp*;

abstract Exp;
IntegerLiteral : Exp ::= <Value:String>;
VariableSize : Exp;
\end{verbatim}

\newpage
\section{The LabComm protocol}
\label{sec:ProtocolGrammar}

Each LabComm2014 packet has the layout
\begin{verbatim}
<id> <length> <data...>
\end{verbatim}
where \verb+length+ is the number of bytes of the \verb+data+ part
(i.e., excluding the \verb+id+ and \verb+length+ fields), and 
the \verb+id+ gives the layout of the \verb+data+ part as defined 
in Section~\ref{sec:ConcreteGrammar}.
\subsection{Data encoding}
LabComm primitive types are encoded as fixed width values, sent in
network order.  Type fields, user IDs, number of indices and lengths are
sent in a variable length (\emph{varint}) form.  A varint integer value
is sent as a sequence of bytes where the lower seven bits contain a
chunk of the actual number and the high bit indicates if more chunks
follow. The sequence of chunks is sent with the least significant chunk
first.
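
As an illustration, the varint rule can be written in a few lines of
Python. This is a sketch that follows the description in this section
(seven data bits per byte, high bit as continuation flag, least
significant chunk first); the function names \verb+encode_varint+ and
\verb+decode_varint+ are chosen for this example and are not taken from
the LabComm library.

\begin{lstlisting}[basicstyle=\footnotesize\ttfamily,language=Python]
# Sketch of the varint rule as described in the text above.

def encode_varint(value):
    if value < 0:
        raise ValueError("varint values are unsigned")
    out = bytearray()
    while True:
        chunk = value & 0x7F
        value >>= 7
        if value:
            out.append(chunk | 0x80)   # more chunks follow
        else:
            out.append(chunk)          # last chunk, high bit clear
            return bytes(out)


def decode_varint(buf, pos=0):
    """Return (value, position after the varint)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


encoded = encode_varint(300)
decoded, next_pos = decode_varint(encoded)
\end{lstlisting}

Values below 128, such as the lengths and ids in the example dump,
occupy a single byte.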

The built-in data types are encoded as follows:
\begin{center}
\begin{tabular}{ll}
\hline
Type    & Encoding/Size \\
\hline
boolean & 8 bits \\
byte    & 8 bits \\
short   & 16 bits \\
integer & 32 bits \\
long    & 64 bits \\
float   & 32 bits \\
double  & 64 bits \\
string  & length (varint), followed by UTF-8 encoded string \\
array   & each variable index (varint), followed by encoded elements \\
struct  & concatenation of the encoding of each element, in declaration order \\
\hline
\end{tabular}
\end{center}
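
The following Python sketch applies these encoding rules to the payload
of a sample shaped like \verb+log_message+ from the example section: an
\verb+int+ followed by a variable-size array of structs with a
\verb+boolean+ and a \verb+string+ field. The helper names are
hypothetical, and the sketch assumes that the string length counts
UTF-8 bytes and that \verb+true+ is encoded as 1. Only the sample
payload is produced, not the surrounding packet header.

\begin{lstlisting}[basicstyle=\footnotesize\ttfamily,language=Python]
# Sketch: payload encoding for
#   struct { int sequence; struct { boolean last; string data; } line[_]; }
# Helper names are hypothetical.
import struct


def varint(value):
    out = bytearray()
    while True:
        chunk, value = value & 0x7F, value >> 7
        out.append(chunk | 0x80 if value else chunk)
        if not value:
            return bytes(out)


def encode_int(v):
    return struct.pack(">i", v)            # 32 bits, network order


def encode_boolean(v):
    return struct.pack(">B", 1 if v else 0)


def encode_string(s):
    data = s.encode("utf-8")
    return varint(len(data)) + data        # assumed: length in bytes


def encode_log_message(sequence, lines):
    out = encode_int(sequence)
    out += varint(len(lines))              # the variable index
    for last, data in lines:               # fields in declaration order
        out += encode_boolean(last) + encode_string(data)
    return out


payload = encode_log_message(2, [(True, "one")])
print(payload.hex())
\end{lstlisting}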

\subsection{Protocol grammar}
\label{sec:ConcreteGrammar}
\begin{lstlisting}[basicstyle=\footnotesize\ttfamily]
<packet>       := <id> <length> ( <version>      | 
                                  <type_decl>    | 
                                  <sample_decl>  |
                                  <type_binding> |
                                  <sample_data> )
<version>      := <string>
<sample_decl>  := <sample_id> <string> <type>
<type_decl>    := <type_id> <string> <type>
<type_binding> := <sample_id> <type_id>
<user_id>      := 0x40..0xffffffff  
<sample_id> : <user_id>
<type_id>   : <user_id>
<string>       := <string_length> <char>*
<string_length>:= 0x00..0xffffffff  
<char>         := any UTF-8 char
<type>         := <length> ( <basic_type> | <array_decl> | <struct_decl> | <type_id> )
<basic_type>   := ( <void_type> | <boolean_type> | <byte_type> | <short_type> |
                  <integer_type> | <long_type> | <float_type> |
                  <double_type> | <string_type> | <sample_ref>)
<void_type>    := <struct_decl> 0 //void is encoded as empty struct
<boolean_type> := 0x20 
<byte_type>    := 0x21 
<short_type>   := 0x22 
<integer_type> := 0x23 
<long_type>    := 0x24 
<float_type>   := 0x25 
<double_type>  := 0x26 
<string_type>  := 0x27 
<sample_ref>   := 0x28 
<array_decl>   := 0x10  <number_of_indices> <indices> <type>
<number_of_indices> := 0x00..0xffffffff  
<indices>      := ( <variable_index> | <fixed_index> )*
<variable_index> := 0x00  
<fixed_index>  := 0x01..0xffffffff  
<struct_decl>  := 0x11  <number_of_fields> <field>*
<number_of_fields> := 0x00..0xffffffff  
<field>        := <string> <type>
<sample_data>  := packed sample data sent in network order, with
                  primitive type elements encoded according to
                  the sizes above
\end{lstlisting}
where the \verb+<id>+ in \verb+<packet>+ signals the type of payload,
and may be either a \verb+<sample_id>+ or a system packet id.
The LabComm system packet ids are:
\begin{lstlisting}[basicstyle=\footnotesize\ttfamily]
version:      0x01 
sample_decl:  0x02 
type_decl:    0x03 
type_binding: 0x04          
\end{lstlisting}
Note that since the signature transmitted in a \verb+<sample_decl>+ is
flattened, the \verb+<type>+ transmitted in a \verb+<sample_decl>+ may
not contain any \verb+<type_id>+ fields.
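
To illustrate how the type tags in the grammar are used, the Python
sketch below parses a flattened signature into a nested structure. It is
applied to the 34 signature bytes of \verb+log_message+ from the section
\emph{Example and its encoding}. The sketch follows the byte layout of
that example, where only the top-level signature is preceded by a
length, so nested types are read without a length prefix. The function
names and the tuple representation of the result are assumptions made
for this example.

\begin{lstlisting}[basicstyle=\footnotesize\ttfamily,language=Python]
# Sketch of a signature parser for flattened signatures; layout follows
# the example dump, names and result format are hypothetical.

PRIMITIVES = {
    0x20: "boolean", 0x21: "byte", 0x22: "short", 0x23: "int",
    0x24: "long", 0x25: "float", 0x26: "double", 0x27: "string",
    0x28: "sample_ref",
}


def read_varint(buf, pos):
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


def read_string(buf, pos):
    length, pos = read_varint(buf, pos)
    return buf[pos:pos + length].decode("utf-8"), pos + length


def read_type(buf, pos):
    tag, pos = read_varint(buf, pos)
    if tag in PRIMITIVES:
        return PRIMITIVES[tag], pos
    if tag == 0x10:                             # array_decl
        count, pos = read_varint(buf, pos)
        indices = []
        for _ in range(count):
            index, pos = read_varint(buf, pos)  # 0 = variable index
            indices.append(index)
        element, pos = read_type(buf, pos)
        return ("array", indices, element), pos
    if tag == 0x11:                             # struct_decl
        count, pos = read_varint(buf, pos)
        fields = []
        for _ in range(count):
            name, pos = read_string(buf, pos)
            field_type, pos = read_type(buf, pos)
            fields.append((name, field_type))
        return ("struct", fields), pos
    raise ValueError("unknown type tag 0x%02x" % tag)


signature = bytes.fromhex(
    "1102"
    "0873657175656e636523046c696e6510"
    "01001102046c61737420046461746127")
parsed, end = read_type(signature, 0)
\end{lstlisting}

With this representation, \verb+void+ (an empty struct, bytes
\verb+0x11 0x00+) parses to a struct with an empty field list.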
\end{document}