Commit 019e5a77, authored 10 years ago by Sven Gestegård Robertz
started comparing with Avro and EDN
parent 5f273d10
doc/tech_report.tex
+107
-0
107 additions, 0 deletions
@@ -539,6 +539,113 @@ Java primitive types. However, it is unlikely that the entire range is actually
way of supporting the common cases is to include run-time checks for overflow in the Java encoders
and decoders.

\section{Related work}

Two in-band self-describing communication protocols are Apache
Avro \cite{avro} and EDN, the extensible data notation developed for
Clojure and Datomic \cite{EDN}.

EDN encodes \emph{values} as UTF-8 strings. The documentation says
``edn is a system for the conveyance of values. It is not a type system,
and has no schemas.'' That said, it is \emph{extensible} in the sense
that it has a special \emph{dispatch character}, \verb+#+, which can
be used to add a \emph{tag} to a value. A tag indicates a semantic
interpretation of a value, which allows the reader to support
handlers for specific tags, enabling functionality similar to that of
LabComm.

\subsection{Apache Avro}

Apache Avro is similar to LabComm in that it has a textual language
for declaring data, a binary protocol for transmitting data, and code
generation for several languages.

\subsubsection*{Data types}

The table lists the Avro type names and, for each, the wire encoding
used by the corresponding LabComm type and by Avro:

\begin{tabular}{|l|c|c|}
\hline
Type          & LabComm             & Avro \\
\hline
Primitive types \\
\hline
int           & 4 bytes             & varint \\
long          & 8 bytes             & varint \\
float         & 4 bytes             & 4 bytes \\
double        & 8 bytes             & 8 bytes \\
string        & varint + utf8[]     & varint + utf8[] \\
bytes         & varint + byte[]     & varint + byte[] \\
\hline
Complex types \\
\hline
struct/record & concat of fields    & concat of fields \\
arrays        & varIdx[] : elements & block[] \\
map           & n/a                 & block[] \\
union         & n/a                 & (varint idx) : value \\
fixed         & byte[n]             & the number of bytes declared in
                                      the schema \\
\hline
\end{tabular}

where

\begin{verbatim}
block ::= (varint count) : elem[count]   [*1]
          count == 0 --> no more blocks

[*1] for arrays, count == 0 --> end of array
     if count < 0, there are |count| elements
     preceded by a varint block_size to allow
     fast skipping
\end{verbatim}

In maps, keys are strings, and values according to the schema.
In unions, the index indicates the kind of value and the
value is encoded according to the schema.
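
The block grammar above can be sketched as a small decoder. This is only an illustration under stated assumptions, not code from LabComm or from the Avro libraries: the names \verb+read_long+ and \verb+read_array+ are hypothetical, the input is assumed to be a file-like byte stream, and the block size that follows a negative count is simply read and ignored instead of being used for skipping.

```python
import io


def read_long(stream):
    """Read one Avro varint (zig-zag coded, least significant chunk first)."""
    shift = 0
    z = 0
    while True:
        raw = stream.read(1)
        if not raw:
            raise EOFError("stream ended inside a varint")
        b = raw[0]
        z |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:  # high bit clear: this was the last chunk
            break
    # Undo the zig-zag mapping: 0, 1, 2, 3, ... -> 0, -1, 1, -2, ...
    return (z >> 1) ^ -(z & 1)


def read_array(stream, read_elem):
    """Decode a sequence of blocks until a block with count == 0 is seen."""
    items = []
    while True:
        count = read_long(stream)
        if count == 0:
            return items
        if count < 0:
            # Negative count: |count| elements follow, preceded by the
            # block size in bytes. This sketch reads and discards it.
            count = -count
            read_long(stream)
        for _ in range(count):
            items.append(read_elem(stream))


# One block of three longs (1, -1, 2), then the terminating empty block.
plain = io.BytesIO(bytes.fromhex("0602010400"))
# One block with count -2 and block size 2, holding the longs 1 and 2.
sized = io.BytesIO(bytes.fromhex("0304020400"))
```

Calling \verb+read_array(plain, read_long)+ walks the single block and stops at the zero count. A reader that wants to skip an array without decoding its elements would use the block size instead of discarding it.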

Note that the Avro data type \verb+bytes+ corresponds to the
LabComm declaration \verb+byte[_]+, i.e. a variable length byte array.

\subsubsection*{The wire protocol}

\begin{tabular}{|l|c|c|}
\hline
What                            & LabComm          & Avro \\
\hline
Data description                & Binary signature & JSON schema \\
Signature sent only once        & possible         & possible (stateful) \\
Signature sent with each sample & possible         & possible (stateless) \\
Data encoding                   & binary           & binary \\
\hline
\end{tabular}

Both Avro and LabComm use varints when encoding data. They are similar
in that both send a sequence of bytes, each carrying a 7-bit chunk (with
the eighth bit signalling that more chunks follow), but they differ in
range, endianness and signedness.

\begin{verbatim}
LabComm                     Avro
unsigned 32 bit             signed zig-zag coding
most significant chunk      least significant chunk
first                       first

  0 -> 00                     0 -> 00
  1 -> 01                    -1 -> 01
  2 -> 02                     1 -> 02
  ...                        -2 -> 03
                              2 -> 04
                             ...
127 -> 7f                   -64 -> 7f
128 -> 81 00                 64 -> 80 01
129 -> 81 01                -65 -> 81 01
130 -> 81 02                 65 -> 82 01
...                         ...
\end{verbatim}
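
The two columns of the table can be reproduced with the following sketch. It is an illustration derived from the examples above, not the LabComm or Avro implementation: the function names \verb+labcomm_varint+ and \verb+avro_varint+ are hypothetical, and the Avro input is assumed to fit in a signed 64-bit integer.

```python
def labcomm_varint(n):
    """Unsigned 32-bit value, 7-bit chunks, most significant chunk first."""
    if not 0 <= n < 2**32:
        raise ValueError("expected an unsigned 32-bit value")
    chunks = [n & 0x7F]  # last byte on the wire: high bit clear
    n >>= 7
    while n:
        chunks.append((n & 0x7F) | 0x80)  # high bit set: more chunks follow
        n >>= 7
    return bytes(reversed(chunks))


def avro_varint(n):
    """Signed value, zig-zag coded, 7-bit chunks, least significant first."""
    if not -(2**63) <= n < 2**63:
        raise ValueError("expected a signed 64-bit value")
    # Zig-zag maps 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ...
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while True:
        chunk = z & 0x7F
        z >>= 7
        if z:
            out.append(chunk | 0x80)  # high bit set: more chunks follow
        else:
            out.append(chunk)
            return bytes(out)


# Rows taken from the table above.
assert labcomm_varint(128).hex() == "8100"
assert avro_varint(64).hex() == "8001"
assert avro_varint(-65).hex() == "8101"
```

The difference in byte order shows up as soon as a value needs two bytes: LabComm puts the continuation bit on the leading high-order chunk, while Avro puts it on the leading low-order chunk.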

\bibliography{refs}{}
\bibliographystyle{plain}