Posted to dev@tinkerpop.apache.org by David Bechberger <da...@bechberger.com> on 2020/12/08 03:16:02 UTC

[DISCUSS] Add functions for casting properties

One of the recent items I have heard quite a few requests for, and had
several on and off conversations about, is the ability to cast values
within a Gremlin traversal.  Due to the schemaless nature of Gremlin it is
very easy to get into a situation where you have a single property that
contains different datatypes across elements and it would be very nice to
be able to create a consistent datatype to allow comparisons.  I'd like to
throw this proposal out here to see what people think.

What if we added a step to handle converting to a defined subset of
datatypes:


Function Signature:

cast(Enum, string)

Enum - This would be an enum containing the datatypes you can cast to:
* DT.short
* DT.int
* DT.long
* DT.float
* DT.double
* DT.char
* DT.boolean
* DT.string

string - The property name to cast

Example usages:

g.V().project('name', 'age').by(cast(DT.string, 'name')).by(cast(DT.int, 'age'))
g.V().order().by(cast(DT.int, 'age'))
g.V().group().by(cast(DT.int, 'age'))
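To make the proposed semantics concrete, here is a rough Python sketch that models the cast over plain dicts rather than real graph elements; the DT table and the cast() helper are hypothetical stand-ins for the proposed step, not an existing API:

```python
# Hypothetical sketch of the proposed cast() semantics, modeled over plain
# Python dicts rather than real graph elements. The DT table and cast()
# helper are assumptions drawn from this proposal, not an existing API.
DT = {
    "int": int,
    "long": int,
    "float": float,
    "double": float,
    "string": str,
    "boolean": lambda v: str(v).lower() in ("true", "1"),
}

def cast(dt, key, element):
    # Cast element[key] to the requested type; raises on failure,
    # which is the proposed default behavior.
    return DT[dt](element[key])

# A mixed-type 'age' property across two "vertices":
people = [{"name": "alice", "age": "34"}, {"name": "bob", "age": 29}]

# Mimics g.V().order().by(cast(DT.int, 'age')): string and int values
# of 'age' both become comparable integers before ordering.
ordered = sorted(people, key=lambda e: cast("int", "age", e))
```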

So what happens if the value cannot be cast to the defined type:

I think that the default behavior in this case should be to throw an error.

I would like to provide two configuration options (via with)

with(CAST.remove) - in any case where the value cannot be cast, the
traverser would be filtered out

with(CAST.default, -999) - in any case where the value cannot be cast, the
specified default value (e.g. -999) would be used.  This would reduce the
complexity and increase the readability of many queries by no longer
requiring the use of a coalesce() to handle creation of default values.
This would also enable customers to handle use cases where data is of
varying types which cannot be handled by the current coalesce() pattern.
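The three failure modes (throw, remove, default) could be sketched roughly like this; the on_error argument and its values are hypothetical stand-ins for the proposed with(CAST...) options:

```python
# Hypothetical sketch of the proposed failure modes for cast():
# throw (the default), remove (filter the traverser out), or
# default (substitute a supplied value such as -999).
def cast_values(values, to_type, on_error="throw", default=None):
    out = []
    for v in values:
        try:
            out.append(to_type(v))
        except (TypeError, ValueError):
            if on_error == "remove":     # models with(CAST.remove)
                continue
            if on_error == "default":    # models with(CAST.default, -999)
                out.append(default)
                continue
            raise                        # proposed default: propagate error
    return out

ages = ["34", 29, "unknown"]
print(cast_values(ages, int, on_error="remove"))                # [34, 29]
print(cast_values(ages, int, on_error="default", default=-999)) # [34, 29, -999]
```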

Dave

Re: [DISCUSS] Add functions for casting properties

Posted by Kelvin Lawrence <gf...@yahoo.com.INVALID>.
The subject of casting does seem to come up fairly often, mainly in cases where a property value type has been "overloaded". There is also an argument here for having some (optional) more formal schema that could prevent things like a property called age from having values that are sometimes integers and sometimes strings in the same graph instantiation.
The other use case I have seen that supports casting is where all the values for a given property were initially created as, say, strings, and later people find they need to do true numeric comparisons on those values but are not able (for some reason) to run a process over the graph to just fix up all of the property values.
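The comparison problem in that second use case is easy to demonstrate: numbers stored as strings compare lexicographically, not numerically.

```python
# Ages that were loaded as strings sort lexicographically,
# so "9" sorts after "100":
ages = ["10", "9", "100"]
print(sorted(ages))           # ['10', '100', '9']  (lexicographic)
print(sorted(ages, key=int))  # ['9', '10', '100']  (after casting to int)
```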

Cheers,
Kelvin
Kelvin R. Lawrence 



    On Monday, December 14, 2020, 09:20:26 AM CST, Stephen Mallette <sp...@gmail.com> wrote:  
 

Re: [DISCUSS] Add functions for casting properties

Posted by Stephen Mallette <sp...@gmail.com>.
Thanks for leading up another important Gremlin step discussion. Some
initial thoughts inline below:

On Mon, Dec 7, 2020 at 10:16 PM David Bechberger <da...@bechberger.com>
wrote:

> One of the recent items I have heard quite a few requests for, and had
> several on and off conversations about, is the ability to cast values
> within a Gremlin traversal.  Due to the schemaless nature of Gremlin it is
> very easy to get into a situation where you have a single property that
> contains different datatypes across elements and it would be very nice to
> be able to create a consistent datatype to allow comparisons.  I'd like to
> throw this proposal out here to see what people think.
>

I agree that type conversion is a consistent problem with Gremlin. Having a
form of casting makes sense to me.


> What if we added a step to handle converting to a defined subset of
> datatypes:
>
> Function Signature:
>
> cast(Enum, string)
>
> Enum - This would be an enum containing the datatypes you can cast to:
> * DT.short
> * DT.int
> * DT.long
> * DT.float
> * DT.double
> * DT.char
> * DT.boolean
> * DT.string
>

I'm a bit concerned with doubling down on Java number representations, but
perhaps we are stuck with that? I'd love to think of a direction that
somehow makes Gremlin better with numbers and therefore less weird with
variants that don't align well with Java's view of numerics, rather than
even more formally encoding numbers to Java.


> string - The property name to cast
>

I'm not sure that I like this argument because it binds the cast() to an
incoming Element (and maybe Map) type. Any reason to not allow cast() to
work more generally?
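For illustration, a general cast() of the kind suggested here would just operate on whatever the incoming traverser carries; a hypothetical Python sketch:

```python
# Hypothetical sketch: a general cast() that operates on the incoming
# value itself (values('age').cast(DT.int) style), so it is not bound
# to an Element or Map plus a property-key lookup.
def cast(value, to_type):
    return to_type(value)

assert cast("34", int) == 34        # a raw property value
assert cast(7, str) == "7"          # any other value a traverser carries
row = {"name": "alice", "age": "34"}
assert cast(row["age"], int) == 34  # or a value pulled out of a Map
```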


> Example usages:
>
> g.V().project('name', 'age').by(cast(DT.string, 'name')).by(cast(DT.int, 'age'))
> g.V().order().by(cast(DT.int, 'age'))
> g.V().group().by(cast(DT.int, 'age'))
>

I suppose your reasoning is represented in the examples. You're considering
their use in the by() modulation and thus a succinct syntax as opposed to:

g.V().project('name', 'age').by(values('name').cast(DT.string)).by(values('age').cast(DT.int))

I don't mind that syntax myself all that much, especially since I don't
imagine we'd see this syntax used for every by() in a
project()/group()/etc. I would imagine that it would typically be used for
specific property keys where there were some mixed types. If we only cared
about Elements (and Map) for casting and we went with DT as you suggested
then I would prefer that we introduce a new by(String,DT) for this use
case, thus:

g.V().project('name', 'age').by('name', DT.string).by('age', DT.int)
g.V().order().by('age',DT.int)
g.V().group().by('age',DT.int)

which lets us avoid the cast() step altogether, but right now I sense cast()
as a step sorta needs to exist as a general step.


> So what happens if the value cannot be cast to the defined type:
>
> I think that the default behavior in this case should be to throw an error.
>

agreed


> I would like to provide two configuration options (via with)
>
> with(CAST.remove) - in any case where the value cannot be cast, the
> traverser would be filtered out
>
> with(CAST.default, -999) - in any case where the value cannot be cast, the
> specified default value (e.g. -999) would be used.  This would reduce the
> complexity and increase the readability of many queries by no longer
> requiring the use of a coalesce() to handle creation of default values.
> This would also enable customers to handle use cases where data is of
> varying types which cannot be handled by the current coalesce() pattern.
>

I see where you're going but I think I'll reserve further thoughts on this
part until some more discussion comes up on the points I've already made.


>
> Dave
>