% Sven Robertz committed May 17, 2013
% *** an embryo of a technical report describing the LabComm
%     design rationale and implementation ***

\documentclass[a4paper]{article}
%\usepackage{verbatims}
%\usepackage{todo}

\begin{document}
\title{LabComm tech report}
\author{Anders Blomdell and Sven Gesteg\aa{}rd Robertz}
\date{embryo of draft, \today}
\maketitle

\begin{abstract}
LabComm is a binary protocol suitable for transmitting and storing samples of
process data. It is self-describing and independent of the programming
language, processor, and network used (e.g., with respect to byte order). It is
primarily intended for situations where the overhead of communication has to be
kept to a minimum; hence LabComm requires only one-way communication to
operate. The one-way operation has the added benefit of making LabComm
suitable as a storage format.
LabComm provides self-describing channels: communication starts with the
transmission of an encoded description of all sample types that can occur,
followed by any number of actual samples in any order the sending application
sees fit.

The LabComm system is based on a binary protocol and a compiler that generates
encoder/decoder routines for popular languages including C, Java, and Python.
The LabComm compiler accepts type and sample declarations in a small language
that is similar to C or Java type declarations.
\end{abstract}

\section{Introduction}

%[[http://rfc.net/rfc1057.html|Sun RPC]]
%[[http://asn1.org|ASN1]].

LabComm takes its inspiration from Sun RPC~\cite{SunRPC} and ASN.1~\cite{ANS1}.

LabComm is primarily intended for situations where the overhead of
communication has to be kept to a minimum; hence LabComm requires only one-way
communication to operate. The one-way operation has the added benefit of
making LabComm suitable as a storage format.

\section{Communication model}

LabComm provides self-describing communication channels by always transmitting
a machine-readable description of the data before the actual data is sent.
Therefore, communication on a LabComm channel has two phases:

\begin{enumerate}
\item the transmission of signatures (an encoded description including data
  types and names, see appendix~\ref{sec:ProtocolGrammar} for details) for all
  sample types that can be sent on the channel
\item the transmission of any number of actual samples in any order
\end{enumerate}

During operation, LabComm ensures (i.e., monitors) that a communication channel
is fully configured, meaning that both ends agree on which messages may be
passed over that channel. If an unregistered sample type is sent or received,
the LabComm encoder or decoder will detect it and take action.
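
As a rough illustration of the two phases, the following C sketch models one
end of a channel that accepts a sample only after its type has been announced.
It is a toy model under stated assumptions, not the LabComm API: the names
\verb+toy_channel+, \verb+toy_register+, and \verb+toy_send_sample+ are
hypothetical, and a registered name merely stands in for a transmitted
signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_TYPES 8

/* Hypothetical toy model of one end of a channel; not the LabComm API. */
struct toy_channel {
    const char *registered[TOY_MAX_TYPES]; /* names standing in for sent signatures */
    size_t count;
};

enum toy_result { TOY_OK, TOY_UNREGISTERED, TOY_FULL };

/* Phase 1: announce a sample type (stands in for sending its signature). */
enum toy_result toy_register(struct toy_channel *ch, const char *name)
{
    assert(ch != NULL && name != NULL);
    if (ch->count == TOY_MAX_TYPES) {
        return TOY_FULL;
    }
    ch->registered[ch->count++] = name;
    return TOY_OK;
}

/* Returns true if the named sample type has been announced on the channel. */
bool toy_is_registered(const struct toy_channel *ch, const char *name)
{
    for (size_t i = 0; i < ch->count; i++) {
        if (strcmp(ch->registered[i], name) == 0) {
            return true;
        }
    }
    return false;
}

/* Phase 2: a sample is accepted only if its type was announced first. */
enum toy_result toy_send_sample(const struct toy_channel *ch, const char *name)
{
    assert(ch != NULL && name != NULL);
    return toy_is_registered(ch, name) ? TOY_OK : TOY_UNREGISTERED;
}
```

In this sketch an unregistered sample is simply rejected with an error code;
what action a real encoder or decoder takes is up to the implementation.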

The roles in setting up and maintaining the configuration of a channel are as
follows:

\paragraph{The application software} (or higher-level protocol) is required to
\begin{itemize}
\item register all samples to be sent on a channel with the encoder
\item register handlers for all samples to be received on a channel with the
  decoder
\end{itemize}

\paragraph{The transmitter (encoder)}
\begin{itemize}
\item ensures that the signature of a sample is transmitted on the channel
  before samples are written to that channel
\end{itemize}

\paragraph{The receiver (decoder)}
\begin{itemize}
\item checks, for each signature, that the application has registered a handler
  for that sample type
\item if an unhandled signature is received, pauses the channel and informs the
  application
\end{itemize}

\section{The LabComm language}

The following examples do not cover the entire language specification (see
appendix~\ref{sec:LanguageGrammar}), but may serve as a gentle introduction to
the LabComm language.

\subsection{Primitive types}

\begin{verbatim}
sample boolean a_boolean;
sample byte a_byte;
sample short a_short;
sample int an_int;
sample long a_long;
sample float a_float;
sample double a_double;
sample string a_string;
\end{verbatim}

\subsection{Arrays}

\begin{verbatim}
sample int fixed_array[3];
sample int variable_array[_];                 // Note 1
sample int fixed_array_of_array[3][4];        // Note 2
sample int fixed_rectangular_array[3, 4];     // Note 2
sample int variable_array_of_array[_][_];     // Notes 1 & 2
sample int variable_rectangular_array[_, _];  // Notes 1 & 2
\end{verbatim}

\begin{enumerate}
\item In contrast to C, LabComm supports both fixed and variable (denoted by
  \verb+_+) sized arrays.
\item In contrast to Java, LabComm supports multidimensional arrays and not
  only arrays of arrays.
\end{enumerate}

\subsection{Structures}

\begin{verbatim}
sample struct {
  int an_int_field;
  double a_double_field;
} a_struct;
\end{verbatim}

\section{User defined types}

\begin{verbatim}
typedef struct {
  int field_1;
  byte field_2;
} user_type[_];
sample user_type a_user_type_instance;
sample user_type another_user_type_instance;
\end{verbatim}

\section{User actions}

User actions (similar to ioctl()) allow the application or a higher-level
protocol to communicate with the underlying transport layer through the LabComm
encoder.

A special case is a specific action informing the underlying transport that a
signature is being sent (to allow handshaking).

\section{LabComm is not...}

\begin{itemize}
\item a protocol for two-way connections
\item intrinsically supporting reliable communication
\item providing semantic service-descriptions
\end{itemize}

But

\begin{itemize}
\item it is suitable for the individual channels of a structured connection
\item the user action mechanism allows using features of different transport
  layers through LabComm (i.e., it allows encapsulation of the transport layer)
\item the names of samples can be chosen and mapped according to a suitable
  taxonomy or ontology
\end{itemize}

\section{Example and its encoding}

With the following example.lc file:

\begin{verbatim}
sample struct {
  int sequence;
  struct {
    boolean last;
    string data;
  } line[_];
} log_message;
sample float data;
\end{verbatim}

and this \verb+example_encoder.c+ file

\begin{verbatim}
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>   /* open() and the O_* flags */
#include <stdlib.h>  /* malloc() and free() */
#include <labcomm_fd_reader_writer.h>  /* LabComm header name is assumed */
#include "example.h"

int main(int argc, char *argv[]) {
  int fd;
  struct labcomm_encoder *encoder;
  int i, j;

  fd = open("example.encoded", O_WRONLY|O_CREAT|O_TRUNC, 0644);
  encoder = labcomm_encoder_new(labcomm_fd_writer, &fd);
  labcomm_encoder_register_example_log_message(encoder);
  labcomm_encoder_register_example_data(encoder);
  for (i = 0 ; i < argc ; i++) {
    example_log_message message;

    message.sequence = i + 1;
    message.line.n_0 = i;
    message.line.a =
      /* allocate n_0 elements; the size is that of one element */
      malloc(message.line.n_0 * sizeof(*message.line.a));
    for (j = 0 ; j < i ; j++) {
      message.line.a[j].last = (j == message.line.n_0 - 1);
      message.line.a[j].data = argv[j + 1];
    }
    labcomm_encode_example_log_message(encoder, &message);
    free(message.line.a);
  }
  for (i = 0 ; i < argc ; i++) {
    float f = i;

    labcomm_encode_example_data(encoder, &f);
  }
}
\end{verbatim}

Running \verb+./example_encoder one two+ yields the following result in
example.encoded:

\begin{verbatim}
00000000  02 40 0b 6c 6f 67 5f 6d  65 73 73 61 67 65 11 02  |.@.log_message..|
00000010  08 73 65 71 75 65 6e 63  65 23 04 6c 69 6e 65 10  |.sequence#.line.|
00000020  01 00 11 02 04 6c 61 73  74 20 04 64 61 74 61 27  |.....last .data'|
00000030  02 41 04 64 61 74 61 25  40 00 00 00 01 00 40 00  |.A.data%@.....@.|
00000040  00 00 02 01 01 03 6f 6e  65 40 00 00 00 03 02 00  |......one@......|
00000050  03 6f 6e 65 01 03 74 77  6f 41 00 00 00 00 41 3f  |.one..twoA....A?|
00000060  80 00 00 41 40 00 00 00                           |...A@...|
00000068
\end{verbatim}

\section{Ideas/Discussion}

The LabComm language is more expressive than its target languages regarding
data types. E.g., LabComm can declare both arrays of arrays and matrices,
whereas Java only has arrays of arrays. In the generated Java code, a LabComm
matrix is implemented as an array of arrays.

Another case (not yet included) is unsigned types, which Java does not have. If
we include unsigned long in LabComm, it has a larger range of values than can
be expressed using Java primitive types. However, it is unlikely that the
entire range is actually used, so one way of supporting the common cases is to
include run-time checks for overflow in the Java encoders and decoders.
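
The run-time check mentioned above can be sketched as follows. The sketch is
written in C, like the encoder example, although the check itself would live in
the generated Java code. It is a minimal sketch under stated assumptions: the
helper \verb+fits_in_signed_64+ is hypothetical and not part of LabComm. It
accepts an unsigned 64-bit wire value only if it fits in a signed 64-bit
integer, which is the range of a Java \verb+long+.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of LabComm: converts an unsigned 64-bit
 * value to a signed 64-bit one, reporting overflow instead of wrapping.
 * Returns true and stores the value in *out if it fits, false otherwise.
 */
bool fits_in_signed_64(uint64_t wire_value, int64_t *out)
{
    assert(out != NULL);
    if (wire_value > (uint64_t)INT64_MAX) {
        return false; /* outside the signed 64-bit range */
    }
    *out = (int64_t)wire_value;
    return true;
}
```

A decoder using such a check could report the overflow to the application
rather than silently delivering a negative number.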

\appendix

\section{The LabComm language}
\label{sec:LanguageGrammar}

\subsection{Abstract syntax}

\begin{verbatim}
Program ::= Decl*;

abstract Decl ::= Type ;
TypeDecl : Decl;
SampleDecl : Decl;

Field ::= Type ;

abstract Type;
VoidType : Type;
PrimType : Type ::= ;
UserType : Type ::= ;
StructType : Type ::= Field*;
ParseArrayType : Type ::= Type Dim*;
abstract ArrayType : Type ::= Type Exp*;
VariableArrayType : ArrayType;
FixedArrayType : ArrayType;

Dim ::= Exp*;

abstract Exp;
IntegerLiteral : Exp ::= ;
VariableSize : Exp;
\end{verbatim}

\section{The LabComm protocol}
\label{sec:ProtocolGrammar}

\begin{verbatim}
 := ( | | )*
 := 0x01 ''(packed)''
 := 0x02 ''(packed)''
 := 0x60..0xffffffff ''(packed)''
 := *
 := 0x00..0xffffffff ''(packed)''
 := any UTF-8 char
 := ( | | | )
 := ( | | | | | | | )
 := 0x20 ''(packed)''
 := 0x21 ''(packed)''
 := 0x22 ''(packed)''
 := 0x23 ''(packed)''
 := 0x24 ''(packed)''
 := 0x25 ''(packed)''
 := 0x26 ''(packed)''
 := 0x27 ''(packed)''
 := 0x10 ''(packed)''
 := 0x00..0xffffffff ''(packed)''
 := ( | )*
 := 0x00 ''(packed)''
 := 0x01..0xffffffff ''(packed)''
 := 0x11 ''(packed)'' *
 := 0x00..0xffffffff ''(packed)''
 :=
 :=
 := is sent in network order, sizes are as follows:

||Type           ||Encoding/Size                                         ||
||---------------||------------------------------------------------------||
||boolean        || 8 bits                                               ||
||byte           || 8 bits                                               ||
||short          || 16 bits                                              ||
||integer        || 32 bits                                              ||
||long           || 64 bits                                              ||
||float          || 32 bits                                              ||
||double         || 64 bits                                              ||
||string         || length ''(packed)'', followed by UTF8 encoded string ||
||array          || each variable index ''(packed)'',                    ||
||               || followed by encoded elements                         ||
||struct         || concatenation of encoding of each element            ||
||               || in declaration order                                 ||
\end{verbatim}

Type fields, user IDs, numbers of indices, and lengths are sent in a packed, or
variable-length, form. An integer is sent as a sequence of bytes where the
lower seven bits of each byte contain a chunk of the actual number and the high
bit indicates whether more chunks follow. The sequence of chunks is sent with
the least significant chunk first. (The numbers encoded in this form are
indicated above with \textit{(packed)}.)

\end{document}