tech_report.tex 13.2 KB
Newer Older
Sven Robertz's avatar
Sven Robertz committed
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
% *** en embryo of a technical report describing the labcomm design rationale and implementation ***

\documentclass[a4paper]{article}
%\usepackage{verbatims}
%\usepackage{todo}

\begin{document}
\title{Labcomm tech report}
\author{Anders Blomdell and Sven Gesteg\aa{}rd Robertz }
\date{embryo of draft, \today}

\maketitle 

\begin{abstract}

LabComm is a binary protocol suitable for transmitting and storing samples of
process data. It is self-describing and independent of programming language,
processor, and network used (e.g., byte order, etc).  It is primarily intended
for situations where the overhead of communication has to be kept at a minimum,
hence LabComm only requires one-way communication to operate. The one-way
operation also has the added benefit of making LabComm suitable as a storage
format. 

LabComm provides self-describing channels, as communication starts with the
transmission of an encoded description of all possible sample types that can
occur, followed by any number of actual samples in any order the sending
application sees fit.

The LabComm system is based on a binary protocol and
and a compiler that generates encoder/decoder routines for popular languages
including C, Java, and Python.

The LabComm compiler accepts type and sample declarations in a small language
that is similar to C or Java type-declarations. 
\end{abstract}
\section{Introduction}

%[[http://rfc.net/rfc1057.html|Sun RPC]]
%[[http://asn1.org|ASN1]].

LabComm has got it's inspiration from Sun RPC~\cite{SunRPC}
and ASN1~\cite{ANS1}. LabComm is primarily intended for situations
where the overhead of communication has to be kept at a minimum, hence LabComm
only requires one-way communication to operate. The one-way operation also has
the added benefit of making LabComm suitable as a storage format. 

\section{Communication model}

LabComm provides self-describing communication channels, by always transmitting
a machine readable description of the data before actual data is sent.
Therefore, communication on a LabComm channel has two phases

\begin{enumerate}
\item the transmission of signatures (an encoded description including data
types and names, see appendix~\ref{sec:ProtocolGrammar} for details) for all sample types
that can be sent on the channel
\item the transmission of any number of actual samples in any order
\end{enumerate}

During operation, LabComm will ensure (i.e., monitor) that a communication
channel is fully configured, meaning that both ends agree on what messages that
may be passed over that channel.  If an unregistered sample type is sent or
received, the LabComm encoder or decoder will detect it and take action.

The roles in setting up, and maintaining, the configuration of a channel are as follows:

\paragraph{The application software} (or higher-level protocol) is required to

\begin{itemize}
\item register all samples to be sent on a channel with the encoder
\item register handlers for all samples to be received  on a channel with the decoder
\end{itemize}

\paragraph{The transmitter (encoder)}

\begin{itemize}
 \item ensures that the signature of a sample is transmitted on the channel before samples are 
       written to that channel
\end{itemize}

\paragraph{The receiver (decoder)}

\begin{itemize}
 \item checks, for each signature, that the application has registered a handler for that sample type
 \item if an unhandled signature is received, pauses the channel and informs the application
\end{itemize}

\section{The Labcomm language}

The following examples do not cover the entire language
specification (see appendix~\ref{language_grammar}), but might serve as a
gentle introduction to the LabComm language.

\subsection{Primitive types}

\begin{verbatim}
  sample boolean a_boolean;
  sample byte a_byte;
  sample short a_short;
  sample int an_int;
  sample long a_long;
  sample float a_float;
  sample double a_double;
  sample string a_string;
\end{verbatim}

\subsection{Arrays}

\begin{verbatim}
  sample int fixed_array[3];
  sample int variable_array[_];                // Note 1
  sample int fixed_array_of_array[3][4];       // Note 2
  sample int fixed_rectangular_array[3, 4];    // Note 2
  sample int variable_array_of_array[_][_];    // Notes 1 & 2
  sample int variable_rectangular_array[_, _]; // Notes 1 & 2
\end{verbatim}

\begin{enumerate}
\item In contrast to C, LabComm supports both fixed and variable (denoted
by \verb+_+) sized arrays.

\item In contrast to Java, LabComm supports multidimensional arrays and not
only arrays of arrays.

\end{enumerate}

\subsection{Structures}

\begin{verbatim}
  sample struct {
    int an_int_field;
    double a_double_field;
  } a_struct;
\end{verbatim}

\section{User defined types}

\begin{verbatim}
  typedef struct {
    int field_1;
    byte field_2;
  } user_type[_];
  sample user_type a_user_type_instance;
  sample user_type another_user_type_instance;
\end{verbatim}

147
\section{The LabComm system}
Sven Robertz's avatar
Sven Robertz committed
148

149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
The LabComm system consists of a compiler for generating code from the data
descriptions, and libraries providing LabComm communication facilities in,
currently, C, Java, Python, and C#.

\subsection{The LabComm compiler}

\subsection{The LabComm library}

The LabComm libraries contain functionality for the end-to-end transmission
of samples. They are divided into two layers, where the upper layer implements
the general encoding and decoding of samples, and the lower one deals with 
the transmission of the encoded byte stream on a particular transport layer.

Thus, the LabComm communication stack looks like this:
\begin{verbatim}
    _______________________
    |     Application     |
    +---------------------+
    | encoder  | decoder  |    to/from labcomm encoded byte stream
    +----------+----------+
    | writer   | reader   |    transmit byte stream over particular transport
    +----------+----------+
    | transport layer / OS|
    +---------------------+
\end{verbatim}

\subsubsection{LabComm actions}

(similar to ioctl()) 
The encoder/writer and decoder/reader interfaces consist of a set of actions

One example of this is that there is a a separate writer action for
transmitting signatures, allowing the writer to treat a signature differently
from encoded samples, e.g., to allow handshaking during channel setup.

User actions allow the application or a higher level
Sven Robertz's avatar
Sven Robertz committed
185
protocol to communicate with the underlying transport layer through the LabComm
186
encoder. 
Sven Robertz's avatar
Sven Robertz committed
187

188
189
190
One example is reliable communication, which is controlled from the application
but needs to be implemented for each transport at at the reader/writer level.
(Or not, e.g., TCP)
Sven Robertz's avatar
Sven Robertz committed
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292

\section{LabComm is not...}

\begin{itemize}
\item a protocol for two-way connections
\item intrinsically supporting reliable communication 
\item providing semantic service-descriptions
\end{itemize}

But

\begin{itemize}
\item it is suitable for the individual channels of a structured connection
\item the user action mechanism allows using feature of different transport layers
  through labcomm (i.e., it allows encapsulation of the transport layer)
\item the names of samples can be chosen and mapped according to a suitable taxonomy or ontology
\end{itemize}



\section{Example and its encoding}

With the following `example.lc` file:

\begin{verbatim}
sample struct {
  int sequence;
  struct {
    boolean last;
    string data;
  } line[_];
} log_message;
sample float data;
\end{verbatim}

and this \verb+example_encoder.c+ file 

\begin{verbatim}
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <labcomm_fd_reader.h>
#include <labcomm_fd_writer.h>
#include "example.h"

int main(int argc, char *argv[]) {
  int fd;
  struct labcomm_encoder *encoder;
  int i, j;

  fd = open("example.encoded", O_WRONLY|O_CREAT|O_TRUNC, 0644);
  encoder = labcomm_encoder_new(labcomm_fd_writer, &fd);
  labcomm_encoder_register_example_log_message(encoder);
  labcomm_encoder_register_example_data(encoder);
  for (i = 0 ; i < argc ; i++) {
    example_log_message message;

    message.sequence = i + 1;
    message.line.n_0 = i;
    message.line.a = malloc(message.line.n_0*sizeof(message.line));
    for (j = 0 ; j < i ; j++) {
      message.line.a[j].last = (j == message.line.n_0 - 1);
      message.line.a[j].data = argv[j + 1];
    }
    labcomm_encode_example_log_message(encoder, &message);
    free(message.line.a);
  }
  for (i = 0 ; i < argc ; i++) {
    float f = i;
    labcomm_encode_example_data(encoder, &f);
  }
}
\end{verbatim}

Running \verb+./example_encoder one two+, will yield the following result in example.encoded:


\begin{verbatim}
00000000  02 40 0b 6c 6f 67 5f 6d  65 73 73 61 67 65 11 02  |.@.log_message..|
00000010  08 73 65 71 75 65 6e 63  65 23 04 6c 69 6e 65 10  |.sequence#.line.|
00000020  01 00 11 02 04 6c 61 73  74 20 04 64 61 74 61 27  |.....last .data'|
00000030  02 41 04 64 61 74 61 25  40 00 00 00 01 00 40 00  |.A.data%@.....@.|
00000040  00 00 02 01 01 03 6f 6e  65 40 00 00 00 03 02 00  |......one@......|
00000050  03 6f 6e 65 01 03 74 77  6f 41 00 00 00 00 41 3f  |.one..twoA....A?|
00000060  80 00 00 41 40 00 00 00                           |...A@...|
00000068
\end{verbatim}


\section{Ideas/Discussion}:

The labcomm language is more expressive than its target languages regarding data types.
E.g., labcomm can declare both arrays of arrays and matries where Java only has arrays of arrays
In the generated Java code, a labcomm matrix is implemented as an array of arrays.

Another case (not yet included) is unsigned types, which Java doesn't have. If we include
unsigned long in labcomm, that has a larger range of values than is possible to express using
Java primitive types. However, it is unlikely that the entire range is actually used, so one
way of supporting the common cases is to include run-time checks for overflow in the Java encoders
and decoders.

\appendix
Sven Robertz's avatar
Sven Robertz committed
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
\section{The LabComm language}
\label{sec:LanguageGrammar}

\subsection{Abstract syntax}
\begin{verbatim}
Program ::= Decl*;

abstract Decl ::= Type <Name:String>;
TypeDecl : Decl;
SampleDecl : Decl;

Field ::= Type <Name:String>;

abstract Type;
VoidType          : Type;
PrimType          : Type ::= <Name:String> <Token:int>;
UserType          : Type ::= <Name:String>;
StructType        : Type ::= Field*;
ParseArrayType    : Type ::= Type Dim*;
abstract ArrayType : Type ::= Type Exp*;
VariableArrayType : ArrayType;
FixedArrayType    : ArrayType;

Dim ::= Exp*;

abstract Exp;
IntegerLiteral : Exp ::= <Value:String>;
VariableSize : Exp;
\end{verbatim}
Sven Robertz's avatar
Sven Robertz committed
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380

\section{The LabComm protocol}
\label{sec:ProtocolGrammar}

\begin{verbatim}
<packet> := ( <type_decl> | <sample_decl> | <sample_data> )*
<type_decl> := 0x01 ''(packed)'' <user_id> <string> <type>
<sample_decl> := 0x02 ''(packed)''<user_id> <string> <type>
<user_id> := 0x60..0xffffffff  ''(packed)''
<string> := <string_length> <char>*
<string_length> := 0x00..0xffffffff  ''(packed)''
<char> := any UTF-8 char
<type> := ( <basic_type> | <user_id> | <array_decl> | <struct_decl> )
<basic_type> := ( <boolean_type> | <byte_type> | <short_type> |
                  <integer_type> | <long_type> | <float_type> | 
                  <double_type> | <string_type> )
<boolean_type> := 0x20 ''(packed)''
<byte_type> := 0x21 ''(packed)''
<short_type> := 0x22 ''(packed)''
<integer_type> := 0x23 ''(packed)''
<long_type> := 0x24 ''(packed)''
<float_type> := 0x25 ''(packed)''
<double_type> := 0x26 ''(packed)''
<string_type> := 0x27 ''(packed)''
<array_decl> := 0x10 ''(packed)'' <number_of_indices> <indices> <type>
<number_of_indices> := 0x00..0xffffffff  ''(packed)''
<indices> := ( <variable_index> | <fixed_index> )*
<variable_index> := 0x00  ''(packed)''
<fixed_index> := 0x01..0xffffffff  ''(packed)''
<struct_decl> := 0x11 ''(packed)'' <number_of_fields> <field>*
<number_of_fields> := 0x00..0xffffffff  ''(packed)''
<field> := <string> <type>
<sample_data> := <user_id> <packed_sample_data>
<packed_sample_data> := is sent in network order, sizes are as follows:

||Type           ||Encoding/Size                                         ||
||---------------||------------------------------------------------------||
||boolean        ||  8 bits                                              ||
||byte           ||  8 bits                                              ||
||short          || 16 bits                                              ||
||integer        || 32 bits                                              ||
||long           || 64 bits                                              ||
||float          || 32 bits                                              ||
||double         || 64 bits                                              ||
||string         || length ''(packed)'', followed by UTF8 encoded string ||
||array          || each variable index ''(packed)'',                    ||
||               || followed by encoded elements                         ||
||struct         || concatenation of encoding of each element            ||
||               || in declaration order                                 ||
\end{verbatim}

Type fields, user IDs, number of indices and lengths are sent in a packed, or
variable length, form.  An integer is sent as a sequence of bytes where the
lower seven bits contain a chunk of the actual number and the high bit
indicates if more chunks follow. The sequence of chunks are sent with the least
significant chunk first.  (The numbers encoded in this form are indicated above
with \textit{(packed)}.)

\end{document}