Posted to dev@tinkerpop.apache.org by Stephen Mallette <sp...@gmail.com> on 2018/06/04 21:29:12 UTC

Re: [DISCUSS] Serialization Symmetry [was: [DISCUSS] Handling of problematic GraphSON types]

Jorge, you sound like you have a pretty strong feeling about this issue so
I'm fine to stick to your direction. I really don't feel strongly about it
either way and since .NET isn't my strong suit I'll defer to you on this
one.

On Wed, May 30, 2018 at 11:18 AM Jorge Bay Gondra <jo...@gmail.com>
wrote:

> > There are also some cases where it logically makes some sense to safely
> have one but not the other. A GLV likely doesn't need to support a Bytecode
> deserializer because it doesn't receive bytecode from the server
>
> Agree, I don't think it's necessary to support deserialization for
> types that are never going to be sent from the server, like some of the
> types under Graph Process
> <http://tinkerpop.apache.org/docs/current/dev/io/#_graph_process>.
>
> I also agree that for GLVs that have a more limited type system (like
> JavaScript or Python), we should do what is best for the user and solve it
> case by case.
>
> I wanted to stress the need for symmetry for GLVs where we have rich type
> systems (in this case .NET/Java) for Core and Extended types, for which
> supporting deserialization and not serialization can cause obscure errors,
> like:
> TinkerPop "Type A" is deserialized as "Type C1" in Gremlin-X GLV, but "Type
> C1" instances can't be serialized to "Type A".
>
> TypeC1 value = g.V().has("name", "jorge").values("propA").next();
> // The following would fail
> g.V().has("name", "jorge").property("propA", value).next();
>
> I think in this case, it's preferred to have a 1-to-1 mapping or no
> mapping at all (implementors/vendors could support it, if interested).
>
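
A minimal sketch of the failure mode described above, written in Python only for brevity. Every name here (TypeC1, the "x:TypeA" tag, READERS, WRITERS) is hypothetical and not taken from any GLV; the point is only that a type with a reader but no writer fails when the user sends a value back.

```python
# Hypothetical sketch: none of these names come from TinkerPop or any GLV.
class TypeC1:
    """Stand-in for the client-side type that "Type A" is deserialized to."""
    def __init__(self, raw):
        self.raw = raw

# Wire type tag -> client class used to build the value (deserialization).
READERS = {"x:TypeA": TypeC1}

# Client class -> function producing a typed wire value (serialization).
# TypeC1 is deliberately missing: that is the asymmetry under discussion.
WRITERS = {int: lambda v: {"@type": "g:Int64", "@value": v}}

def read(wire):
    return READERS[wire["@type"]](wire["@value"])

def write(value):
    try:
        return WRITERS[type(value)](value)
    except KeyError:
        raise TypeError(f"no serializer for {type(value).__name__}") from None

def asymmetric_tags():
    """Tags that can be read into a class that cannot be written back."""
    return sorted(tag for tag, cls in READERS.items() if cls not in WRITERS)
```

A check like asymmetric_tags() could run in a test suite so that the gap is caught before a user hits it at write time.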
>
> 2018-05-30 12:45 GMT+02:00 Stephen Mallette <sp...@gmail.com>:
>
> > I think the original thread spread off in too many different directions.
> > I'm going to leave that original one to talk about future binary format
> > stuff, type deprecation, etc. and make this new one to focus on getting
> > this PR to close:
> >
> > https://github.com/apache/tinkerpop/pull/842
> >
> > which is currently stuck on whether or not it is important for us to have
> > symmetry in serialization (i.e. everything a GLV can serialize must also
> > be deserialized). I'll paste up my last thoughts on that from my previous
> > post below:
> >
> > >  Regarding serialization and deserialization asymmetry on GLVs (for Core
> > and Extended types), I think we should avoid it as it could lead to
> > obscure error messages on the user side.
> >
> > In the past, I think TinkerPop (going back to 2.x) has been ok with it and
> > I'm not so sure that I recall any specific problems that were ever voiced
> > by users on the subject. As it stands, I think we already have some
> > asymmetry in gremlin-python so there is some precedent for it. There are
> > also some cases where it logically makes some sense to safely have one but
> > not the other. A GLV likely doesn't need to support a Bytecode deserializer
> > because it doesn't receive bytecode from the server. It only needs to send
> > bytecode and thus only has a serializer - at least until we have GVMs
> > instead of GLVs :)  Does that change your thinking at all Jorge?
> >
> >
> >
> >
> >
> > On Tue, May 29, 2018 at 12:45 PM Stephen Mallette <sp...@gmail.com>
> > wrote:
> >
> > > >  Regarding serialization and deserialization asymmetry on GLVs (for
> > > Core and Extended types), I think we should avoid it as it could lead to
> > > obscure error messages on the user side.
> > >
> > > In the past, I think TinkerPop (going back to 2.x) has been ok with it
> > > and I'm not so sure that I recall any specific problems that were ever
> > > voiced by users on the subject. As it stands, I think we already have some
> > > asymmetry in gremlin-python so there is some precedent for it. There are
> > > also some cases where it logically makes some sense to safely have one
> > > but not the other. A GLV likely doesn't need to support a Bytecode
> > > deserializer because it doesn't receive bytecode from the server. It only
> > > needs to send bytecode and thus only has a serializer - at least until we
> > > have GVMs instead of GLVs :)  Does that change your thinking at all Jorge?
> > >
> > > >   First would be: Gremlin should not concern itself with storage
> > > schemas.....
> > >
> > > I like all of Robert's first paragraph because it makes Jorge's binary
> > > format proposal that much easier to get right. JanusGraph, DSE Graph and
> > > others won't have any trouble with this approach because the backend will
> > > simply know that the particular property that this number is going into
> > > will be a float and will coerce it as such on storage. I just wonder
> > > exactly how graphs that don't have schemas, like Neo4j/TinkerGraph, will
> > > deal with someone sending a "Number". What happens in that case?
> > >
> > > On Mon, May 28, 2018 at 4:20 AM, Florian Hockmann <
> > fh@florian-hockmann.de>
> > > wrote:
> > >
> > >> > these should be dropped: Class (unless this is used for something
> > >> important? Too many results on 'Class'
> > >> in the codebase.
> > >>
> > >> 'Class' is for example used for 'withoutStrategies' but I agree that this
> > >> would probably be better handled just as a string. 'Class' is Java-specific
> > >> which doesn't make much sense when graph providers want to implement
> > >> TinkerPop in a language other than Java.
> > >>
> > >> Apart from that, I'm not sure I get your reasoning behind dropping types
> > >> like Date, Int32, and float. It's really trivial in most languages to add
> > >> serializers for more numerical types so I don't really see why we should
> > >> drop them when they make the storage more efficient and reduce the need
> > >> for type casts in user code.
> > >> For Date, you say that it's just a long. Sure, but how does the receiver
> > >> know that the long should be deserialized to a Date in this case? As a
> > >> user I want to work with a Date object and not just with a long. Also, we
> > >> nevertheless need a convention of what this long represents: Milliseconds
> > >> since January 1, 1970 (POSIX)? Since January 1, 1 (.NET)? Since December
> > >> 31, 1899 (C++ 7.0)? (There are a lot more epoch dates [1].) g:Date is
> > >> basically just this convention which is why I would keep it.
> > >>
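
To illustrate the epoch question above, here is a small Python sketch, not anything from TinkerPop. The EPOCHS table and from_millis are hypothetical names, and treating every convention as a millisecond count is a simplification. The same bare long decodes to a different instant under each epoch, which is why the receiver needs a type tag plus an agreed convention.

```python
from datetime import datetime, timedelta, timezone

# Epochs named in the message above. Treating all of them as millisecond
# counts is an assumption made only to keep the comparison simple.
EPOCHS = {
    "posix": datetime(1970, 1, 1, tzinfo=timezone.utc),
    "dotnet": datetime(1, 1, 1, tzinfo=timezone.utc),
    "1899-12-31": datetime(1899, 12, 31, tzinfo=timezone.utc),
}

def from_millis(millis, epoch_name):
    """Interpret a bare long as milliseconds since the named epoch."""
    return EPOCHS[epoch_name] + timedelta(milliseconds=millis)

raw = 1527256980000  # a long with no type information attached
for name in EPOCHS:
    print(name, from_millis(raw, name).isoformat())
```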
> > >> > There should be a boolean (which seems to be completely missing??).
> > >>
> > >> Yeah, boolean and string are both just serialized without type
> > >> information right now. Maybe we want to change that if we ever
> > >> introduce GraphSON 4.
> > >>
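
A rough Python sketch of what "without type information" means on the wire, assuming the usual GraphSON shape of an object with "@type" and "@value" keys for typed values. The describe function is a hypothetical helper for illustration only.

```python
import json

def describe(text):
    """Report the GraphSON type tag, or note that the value is untyped JSON."""
    node = json.loads(text)
    if isinstance(node, dict) and "@type" in node:
        return node["@type"]
    return f"untyped JSON {type(node).__name__}"

print(describe('{"@type": "g:Int32", "@value": 1}'))  # typed number
print(describe("true"))                               # boolean, no tag
print(describe('"marko"'))                            # string, no tag
```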
> > >>
> > >> Jorge's suggestion to drop all extended types except for the five he
> > >> listed sounds like a good idea to me. I would only add dropping of either
> > >> Timestamp or Date from Core and probably also Class, like Robert
> > >> suggested.
> > >>
> > >> [1]
> > >> https://en.wikipedia.org/wiki/Epoch_%28reference_date%29#Notable_epoch_dates_in_computing
> > >>
> > >> -----Original Message-----
> > >> From: Robert Dale <ro...@gmail.com>
> > >> Sent: Friday, May 25, 2018 15:43
> > >> To: dev@tinkerpop.apache.org
> > >> Subject: Re: [DISCUSS] Handling of problematic GraphSON types
> > >>
> > >> There should be a guiding principle on this to make these decisions
> > >> clearer.  First would be: Gremlin should not concern itself with storage
> > >> schemas. As an extension of that, Gremlin should not concern itself with
> > >> storage size. Next would be: Gremlin should not be Java-specific. Finally,
> > >> it should be hard to add a new type, i.e. a proposal should demonstrate
> > >> that it is difficult to do a real-world traversal without this type, how
> > >> GLVs would map it, and what functions on that type should be a part of
> > >> Gremlin, and n>1 people should positively affirm this direction.
> > >>
> > >> Thus, there should be a minimal Core on which most else can be built.
> > >> All extended types should be dropped. From Core, these should be dropped:
> > >> Class (unless this is used for something important? Too many results on
> > >> 'Class' in the codebase. Otherwise, it's just a string), Date (is a long),
> > >> Timestamp (is a long, what's the diff to Date anyway?).  There should be
> > >> one floating point type which is 64-bit. There should be one integer type
> > >> which is 64-bit. There should be a boolean (which seems to be completely
> > >> missing??).
> > >>
> > >>
> > >> Robert Dale
> > >>
> > >> On Fri, May 25, 2018 at 3:37 AM, Jorge Bay Gondra <
> > >> jorgebaygondra@gmail.com>
> > >> wrote:
> > >>
> > >> > Thanks Florian for starting the discussion on this topic!
> > >> >
> > >> > I think it's a good exercise to evaluate which types are necessary for
> > >> > a GLV to support.
> > >> >
> > >> > I went through a similar exercise when designing the binary
> > >> > serialization format. I'll go ahead and propose:
> > >> > All types that are considered "Core", "Graph Structure" and "Graph
> > >> > Process" in GraphSON3
> > >> > <http://tinkerpop.apache.org/docs/current/dev/io/#_core_2>
> > >> > plus the following from the "Extended" list:
> > >> > - Short
> > >> > - Byte
> > >> > - ByteBuffer
> > >> > - BigInteger
> > >> > - BigDecimal
> > >> >
> > >> > The rationale is to select types that *can't be represented and
> > >> > stored* using other types.
> > >> > For example:
> > >> > - Short can be stored using an int backing field, but it would take
> > >> > twice the space.
> > >> > - BigDecimal can be stored using a ByteBuffer but ordering on a buffer
> > >> > doesn't align with decimal ordering.
> > >> >
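
The BigDecimal point above can be sketched in Python. The byte layout used here (a 4-byte scale followed by the two's-complement unscaled value) is an assumption chosen for illustration, not a format defined in this thread. Sorting the encoded buffers byte-wise gives a different order than sorting the decimals numerically.

```python
from decimal import Decimal

def encode(d):
    """Hypothetical layout: 4-byte big-endian scale + two's-complement unscaled value.

    Only finite decimals are handled in this sketch.
    """
    sign, digits, exponent = d.as_tuple()
    unscaled = int("".join(map(str, digits)))
    if sign:
        unscaled = -unscaled
    scale = -exponent
    length = unscaled.bit_length() // 8 + 1  # leaves room for the sign bit
    return (scale.to_bytes(4, "big", signed=True)
            + unscaled.to_bytes(length, "big", signed=True))

values = [Decimal("10"), Decimal("9.5"), Decimal("-1")]
numeric_order = sorted(values)            # decimal ordering
byte_order = sorted(values, key=encode)   # ordering a byte store would see
print(numeric_order)
print(byte_order)
```

With a dedicated decimal type the backend can compare values numerically, which a generic buffer cannot offer.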
> > >> > Regarding serialization and deserialization asymmetry on GLVs (for
> > >> > Core and Extended types), I think we should avoid it as it could lead
> > >> > to obscure error messages on the user side.
> > >> >
> > >> > I think we should provide a comprehensive type representation but it
> > >> > doesn't have to contain every type imaginable. The Gremlin Server and
> > >> > the GLVs provide extension mechanisms that vendors and users can use
> > >> > to support other types.
> > >> >
> > >> > 2018-05-24 14:31 GMT+02:00 Florian Hockmann <fh@florian-hockmann.de>:
> > >> >
> > >> > > As part of the discussion for the pull request by Daniel C. Weber
> > >> > > that adds support for more extended GraphSON types to Gremlin.Net [1]
> > >> > > we identified several of those types to be problematic for non-Java
> > >> > > languages (or at least for .NET in this case) as they don't really
> > >> > > have counterparts in other languages and for some it was even
> > >> > > difficult to say where they differ from each other.
> > >> > >
> > >> > >
> > >> > >
> > >> > > Now the question is basically what we want to do with those
> > >> > > problematic types.
> > >> > >
> > >> > >
> > >> > >
> > >> > > My suggestion would be an approach like this:
> > >> > >
> > >> > > 1.      Identify types that are problematic and that we therefore
> > >> > > don't want to support across all GLVs.
> > >> > > 2.      Communicate to users somehow which types are problematic
> > >> > > (something like a deprecation) as we won't support them in all GLVs
> > >> > > and maybe even stop supporting them at all at some point in the future.
> > >> > > 3.      Support the remaining types in all GLVs.
> > >> > >
> > >> > >
> > >> > >
> > >> > > Does that sound like a good plan? Are there any good ideas for the
> > >> > > deprecation of those problematic types? My first idea would be to
> > >> > > put them in a different section in the I/O docs [2] that explains at
> > >> > > the beginning that they are deprecated and why, but maybe someone
> > >> > > here has a better idea.
> > >> > >
> > >> > >
> > >> > >
> > >> > > Another question that was brought up during the review of the
> > >> > > mentioned PR by Jorge was whether types should only be supported
> > >> > > symmetrically or whether GLVs should try to support types as well as
> > >> > > they can. If someone has good arguments or a strong opinion for either
> > >> > > side then it would of course also be good to hear them.
> > >> > >
> > >> > > To give a concrete example of what is meant by symmetric support:
> > >> > >
> > >> > > In its current form the PR deserializes both GraphSON types
> > >> > > gx:Duration and gx:Period to the .NET type TimeSpan and it serializes
> > >> > > TimeSpan back to gx:Duration. This means that gx:Duration is supported
> > >> > > symmetrically, but gx:Period is not as there exists no .NET
> > >> > > serializer that creates a gx:Period.
> > >> > >
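
The Duration/Period example above can be mimicked in Python with timedelta standing in for TimeSpan. This is a sketch, not the PR's code: read and write are hypothetical, the ISO-8601 style string values are assumed, and only whole hours and whole days are parsed. It shows that a gx:Period read from the server comes back as a gx:Duration when written.

```python
import re
from datetime import timedelta

def read(wire):
    """Map both tags onto one client type, as the PR is described as doing."""
    tag, text = wire["@type"], wire["@value"]
    if tag == "gx:Duration":
        match = re.fullmatch(r"PT(\d+)H", text)   # whole hours only
        if match:
            return timedelta(hours=int(match.group(1)))
    elif tag == "gx:Period":
        match = re.fullmatch(r"P(\d+)D", text)    # whole days only
        if match:
            return timedelta(days=int(match.group(1)))
    raise ValueError(f"unsupported value: {wire!r}")

def write(value):
    """The single client type can only be written back under one tag."""
    hours = int(value.total_seconds() // 3600)
    return {"@type": "gx:Duration", "@value": f"PT{hours}H"}

period = {"@type": "gx:Period", "@value": "P2D"}
print(write(read(period)))  # the tag is no longer gx:Period
```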
> > >> > >
> > >> > >
> > >> > > [1] https://github.com/apache/tinkerpop/pull/842
> > >> > >
> > >> > > [2] http://tinkerpop.apache.org/docs/current/dev/io/#_extended_2
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> >
> > >>
> > >>
> > >
> >
>