Posted to dev@tinkerpop.apache.org by Jorge Bay Gondra <jo...@gmail.com> on 2017/10/24 13:27:05 UTC

[DISCUSS] Binary serialization format

Hi,
I wanted to bring up the possibility of including a specific (as in
non-generic) serialization format for graph data. I didn't want to bump
the last dev discussion on serialization formats
<https://lists.apache.org/thread.html/f27d5cad1382a57c91a4e6486329d9d2d0146a571dc66f973c9ee9c5@%3Cdev.tinkerpop.apache.org%3E>
(I'm late to the party :) ) as I wanted to dedicate a separate thread to it.

GraphSON2 and GraphSON3 have some nice features (readability, extensibility,
...) but they also have some disadvantages, mainly around verbosity and
performance. As for Gryo, I think the interoperability issues with Kryo
outside the JVM have already been discussed.

I'm proposing a binary serialization format that is specifically designed
for the types we handle (+ room for extensibility), with a compact payload
and fast serialization.

For example:
a) GraphSON3: {"@type": "g:Int32","@value":1}
byte length = 31
b) GraphBinary: <type_id><value> (a byte representing the type and 4 bytes
representing the value), in bytes: 0x01 0x00 0x00 0x00 0x01
byte length = 5
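For illustration, here is a minimal Python sketch that reproduces the two byte counts above. It assumes the int32 value is written big-endian (the byte order suggested later in this thread) and uses 0x01 as the Int32 type id, as in the example; the constant name is hypothetical.

```python
import struct

# GraphSON3 payload for the Int32 value 1, exactly as written in the example.
graphson = '{"@type": "g:Int32","@value":1}'.encode("utf-8")

# Sketch of the proposed binary layout: <type_id><value>, i.e. one unsigned
# type byte followed by a 4-byte big-endian signed int32.
INT32_TYPE_ID = 0x01  # hypothetical constant name; 0x01 taken from the example
graphbinary = struct.pack(">Bi", INT32_TYPE_ID, 1)

print(len(graphson))         # 31
print(len(graphbinary))      # 5
print(graphbinary.hex(" "))  # 01 00 00 00 01
```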

The deserialization logic for each format is:
a) GraphSON3: perform generic JSON deserialization; navigate through the object
tree to read the type name; convert the value to an int32
representation.
b) GraphBinary: read the first byte to get the type; move the offset by 1;
convert the next 4 bytes into an int32 representation.
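The two read paths can be sketched in Python as follows. This is not an existing TinkerPop API: the function names are hypothetical, and the binary reader assumes the same big-endian layout and 0x01 Int32 type id as in the example.

```python
import json
import struct
from typing import Tuple

INT32_TYPE_ID = 0x01  # hypothetical type id, as in the example


def read_graphson_int32(payload: bytes) -> int:
    # a) Generic JSON deserialization of the whole document...
    doc = json.loads(payload.decode("utf-8"))
    # ...then navigate the object tree to read the type name...
    if doc["@type"] != "g:Int32":
        raise ValueError(f"unexpected type {doc['@type']!r}")
    # ...then convert the value to an int32 representation.
    return int(doc["@value"])


def read_graphbinary_int32(payload: bytes, offset: int = 0) -> Tuple[int, int]:
    # b) Read the first byte to get the type.
    type_id = payload[offset]
    if type_id != INT32_TYPE_ID:
        raise ValueError(f"unexpected type id {type_id:#04x}")
    offset += 1  # move the offset past the type byte
    # Convert the next 4 bytes (big-endian, signed) into an int32.
    (value,) = struct.unpack_from(">i", payload, offset)
    # Also return the offset of the next field, so values can be read in sequence.
    return value, offset + 4
```

Both `read_graphson_int32(b'{"@type": "g:Int32","@value":1}')` and `read_graphbinary_int32(b"\x01\x00\x00\x00\x01")` recover the value 1 (the binary reader also returns the next offset, 5); the binary path never builds an intermediate object tree.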

Any thoughts?
Thanks,
Jorge

Re: [DISCUSS] Binary serialization format

Posted by Jorge Bay Gondra <jo...@gmail.com>.
I think there are inherent benefits in creating our own serialization
format, which can be based on existing conventions (like big-endian byte
order for all numeric types) but has the advantage of being minimally
descriptive. For complex composite types, we don't have to specify the meta
properties that are expected for a type, i.e.:

Vertex is currently serialized as:
{"@type": "g:Vertex", "@value": {"id": { /**/ }, "label": { /**/ },
"properties": { /**/ }}}

As all vertices have an id, a label and properties, a vertex can be described
in a binary format as 3 placeholders: <id><label><properties> (when the id
finishes, the label starts, and so on).
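A rough Python sketch of this positional layout follows. The type ids (0x01 for int32, 0x03 for string), the length-prefixed UTF-8 string encoding, the int32 property count and the restriction to int32 property values are all assumptions made for the example, not part of the proposal; the point is that no field names are written, because the reader knows the order.

```python
import struct
from typing import Dict, Tuple

# Hypothetical type ids; the thread does not define a type table.
INT32 = 0x01
STRING = 0x03


def write_int32(value: int) -> bytes:
    return struct.pack(">Bi", INT32, value)


def write_string(text: str) -> bytes:
    data = text.encode("utf-8")
    return struct.pack(">Bi", STRING, len(data)) + data


def encode_vertex(vid: int, label: str, props: Dict[str, int]) -> bytes:
    # <id><label><properties>: fixed order, no "id"/"label"/"properties" keys.
    out = write_int32(vid) + write_string(label)
    out += struct.pack(">i", len(props))  # number of properties
    for key, value in props.items():
        out += write_string(key) + write_int32(value)
    return out


def _expect(buf: bytes, off: int, type_id: int) -> None:
    if buf[off] != type_id:
        raise ValueError(f"expected type {type_id:#04x} at offset {off}")


def read_int32(buf: bytes, off: int) -> Tuple[int, int]:
    _expect(buf, off, INT32)
    (value,) = struct.unpack_from(">i", buf, off + 1)
    return value, off + 5


def read_string(buf: bytes, off: int) -> Tuple[str, int]:
    _expect(buf, off, STRING)
    (length,) = struct.unpack_from(">i", buf, off + 1)
    start = off + 5
    return buf[start:start + length].decode("utf-8"), start + length


def decode_vertex(buf: bytes) -> Tuple[int, str, Dict[str, int]]:
    vid, off = read_int32(buf, 0)        # when the id finishes...
    label, off = read_string(buf, off)   # ...the label starts, and so on
    (count,) = struct.unpack_from(">i", buf, off)
    off += 4
    props: Dict[str, int] = {}
    for _ in range(count):
        key, off = read_string(buf, off)
        props[key], off = read_int32(buf, off)
    return vid, label, props
```

Under these assumptions, `decode_vertex(encode_vertex(1, "person", {"age": 29}))` returns `(1, 'person', {'age': 29})`, and the encoded vertex is 33 bytes.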

Furthermore, since the label is always a string, the serialization format
doesn't have to include meta-type information for it. That way the format
can be only as descriptive as needed, keeping it very compact and easy to
read.
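A small Python sketch of the difference, using the same kind of hypothetical layout (0x03 as a string type id, an int32 length prefix, then UTF-8 bytes): a string in a position where any type may appear carries its type byte, while the label, whose type is implied by its position, does not.

```python
import struct
from typing import Tuple

STRING = 0x03  # hypothetical type id for a fully typed string


def write_typed_string(text: str) -> bytes:
    # Position where any type may appear: <type_id><length><utf-8 bytes>.
    data = text.encode("utf-8")
    return struct.pack(">Bi", STRING, len(data)) + data


def write_label(text: str) -> bytes:
    # Label position: always a string, so only <length><utf-8 bytes>.
    data = text.encode("utf-8")
    return struct.pack(">i", len(data)) + data


def read_label(buf: bytes, off: int) -> Tuple[str, int]:
    # No type byte to check; the reader already knows a string comes next.
    (length,) = struct.unpack_from(">i", buf, off)
    start = off + 4
    return buf[start:start + length].decode("utf-8"), start + length
```

Here `write_label("person")` is 10 bytes versus 11 for `write_typed_string("person")`: one type byte saved on every label.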

On Wed, Oct 25, 2017 at 2:17 PM, Stephen Mallette <sp...@gmail.com>
wrote:

> We might want to use TinkerPop 3.x to experiment with the "perfect"
> serialization format for some distant future TinkerPop 4.x. We haven't
> gotten it completely right with anything we have thus far, but
> on the other hand I don't think we'd conceived of the manner in which we
> use IO today back at the start of TinkerPop 3.x. One of the bad things we
> have going on is the proliferation of types that comes from building our
> serialization type system around Java. That would hopefully be
> resolved more nicely in TinkerPop 4.x, but I don't think there's much we
> could do about it for 3.x - we're sorta stuck with supporting all the stuff
> we have now. I do think it's worth taking a look at existing serialization
> formats before building our own, like the one Marko mentioned in the link
> initially posted in this thread, Neo4j's Bolt protocol, etc.

Re: [DISCUSS] Binary serialization format

Posted by Stephen Mallette <sp...@gmail.com>.
We might want to use TinkerPop 3.x to experiment with the "perfect"
serialization format for some distant future TinkerPop 4.x. We haven't
gotten it completely right with anything we have thus far, but
on the other hand I don't think we'd conceived of the manner in which we
use IO today back at the start of TinkerPop 3.x. One of the bad things we
have going on is the proliferation of types that comes from building our
serialization type system around Java. That would hopefully be
resolved more nicely in TinkerPop 4.x, but I don't think there's much we
could do about it for 3.x - we're sorta stuck with supporting all the stuff
we have now. I do think it's worth taking a look at existing serialization
formats before building our own, like the one Marko mentioned in the link
initially posted in this thread, Neo4j's Bolt protocol, etc.

On Tue, Oct 24, 2017 at 3:37 PM, David Brown <da...@gmail.com> wrote:

> JSON is comfortable and easy, but something like this makes sense to
> me. This idea could easily be extended to request/response messages
> as well. For example, the desired op ('eval', 'bytecode', 'close',
> etc.) could be represented with a 4-bit group, and so on. This would
> allow driver authors to do a lot of optimizations for better
> performance. Python, for instance, can use C extensions or Cython to
> get huge performance gains when working with binary data...

Re: [DISCUSS] Binary serialization format

Posted by David Brown <da...@gmail.com>.
JSON is comfortable and easy, but something like this makes sense to
me. This idea could easily be extended to request/response messages
as well. For example, the desired op ('eval', 'bytecode', 'close',
etc.) could be represented with a 4-bit group, and so on. This would
allow driver authors to do a lot of optimizations for better
performance. Python, for instance, can use C extensions or Cython to
get huge performance gains when working with binary data...
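As a sketch of the 4-bit idea, in Python: the op code goes in the high nibble of a header byte, leaving the low nibble free. The numeric op codes and the low-nibble "flags" field are hypothetical; only the op names come from the message. Four bits allow up to 16 distinct ops.

```python
from typing import Tuple

# Hypothetical op codes; only the op names come from the message above.
OPS = {"eval": 0x0, "bytecode": 0x1, "close": 0x2}
OP_NAMES = {code: name for name, code in OPS.items()}


def pack_header(op: str, flags: int = 0) -> int:
    # High 4 bits: op code. Low 4 bits: a hypothetical flags field.
    if not 0 <= flags <= 0xF:
        raise ValueError("flags must fit in 4 bits")
    return (OPS[op] << 4) | flags


def unpack_header(header: int) -> Tuple[str, int]:
    if not 0 <= header <= 0xFF:
        raise ValueError("header must be a single byte")
    # Shift out the low nibble to get the op; mask to get the flags.
    return OP_NAMES[header >> 4], header & 0x0F
```

For example, `bytes([pack_header("bytecode", 1)])` yields the single byte `0x11`, and `unpack_header(0x11)` gives back `("bytecode", 1)`. Decoding is a shift and a mask rather than a string comparison.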

On Tue, Oct 24, 2017 at 6:27 AM, Jorge Bay Gondra
<jo...@gmail.com> wrote:
> Hi,
> I wanted to bring up the possibility of including a specific (as in
> non-generic) serialization format for graph data. I didn't want to bump
> the last dev discussion on serialization formats
> <https://lists.apache.org/thread.html/f27d5cad1382a57c91a4e6486329d9d2d0146a571dc66f973c9ee9c5@%3Cdev.tinkerpop.apache.org%3E>
> (I'm late to the party :) ) as I wanted to dedicate a separate thread to it.
>
> GraphSON2 and GraphSON3 have some nice features (readability, extensibility,
> ...) but they also have some disadvantages, mainly around verbosity and
> performance. As for Gryo, I think the interoperability issues with Kryo
> outside the JVM have already been discussed.
>
> I'm proposing a binary serialization format that is specifically designed
> for the types we handle (+ room for extensibility), with a compact payload
> and fast serialization.
>
> For example:
> a) GraphSON3: {"@type": "g:Int32","@value":1}
> byte length = 31
> b) GraphBinary: <type_id><value> (a byte representing the type and 4 bytes
> representing the value), in bytes: 0x01 0x00 0x00 0x00 0x01
> byte length = 5
>
> The deserialization logic for each format is:
> a) GraphSON3: perform generic JSON deserialization; navigate through the object
> tree to read the type name; convert the value to an int32
> representation.
> b) GraphBinary: read the first byte to get the type; move the offset by 1;
> convert the next 4 bytes into an int32 representation.
>
> Any thoughts?
> Thanks,
> Jorge



-- 
David M. Brown
R.A. CulturePlex Lab, Western University