Posted to dev@tinkerpop.apache.org by Dmitry Novikov <dm...@neueda.com> on 2019/06/03 08:49:05 UTC

Re: [DISCUSS] Null Handling 3.5.x

Hello Stephen,

Sounds like a great idea!

One more use case: returning a `null` object when a property does not exist:

g.V().limit(1).coalesce(values('notSureIfExists'), constant(Null.instance()))

This would be very useful when working with steps that fail on a non-existent value, for example the `project` step:

gremlin> g.V().limit(1).project('a', 'b').by(values('name')).by(values('notSureIfExists'))
The provided traverser does not map to a value: v[1]->[PropertiesStep([notSureIfExists],value)]

This could be improved to:

gremlin> g.V().limit(1).project('a', 'b').by(values('name')).by(coalesce(values('notSureIfExists'), constant(Null.instance())))
==>[a:marko,b:null]

`null` is better than any custom constant here, because it clearly represents a missing value.
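
To illustrate the fallback outside of Gremlin, here is a minimal Python sketch. The `Null` class, `values` and `coalesce` below are hypothetical stand-ins written for this example; they are not TinkerPop APIs, and a plain dict plays the role of the vertex.

```python
class Null:
    """Hypothetical singleton marker for a missing value (not a TinkerPop class)."""
    _instance = None

    def __new__(cls):
        # Always hand back the same object so identity checks work.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __repr__(self):
        return "null"


def values(element, key):
    # Mimics values(key): yields no results when the property is absent.
    return [element[key]] if key in element else []


def coalesce(*options):
    # Each option is a zero-argument callable returning a list of results.
    # The first option that produces at least one result wins.
    for option in options:
        results = option()
        if results:
            return results[0]
    raise LookupError("no option produced a value")


vertex = {"name": "marko"}
row = {
    "a": coalesce(lambda: values(vertex, "name")),
    "b": coalesce(lambda: values(vertex, "notSureIfExists"), lambda: [Null()]),
}
print(row)  # {'a': 'marko', 'b': null}
```

Because the marker is a singleton, callers can test for it with `is` rather than comparing against an arbitrary custom constant.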

To avoid the need for `null` guards, a rule could be defined: "steps that take `null` as input will produce `null` as output":

gremlin> g.inject(Null.instance()).id()
==> null
gremlin> g.inject(Null.instance()).math("_ + 1")
==> null
gremlin> g.inject(Null.instance()).properties().as('a').key()
==> null
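
The "null in, null out" rule can be sketched in Python as a wrapper around an ordinary function. This is only an illustration: `NULL`, `null_propagating` and `plus_one` are hypothetical names for this example and say nothing about how a real step would be implemented.

```python
import functools

NULL = object()  # hypothetical stand-in for Null.instance()


def null_propagating(step):
    """Wrap a one-argument step so that a NULL input short-circuits to NULL."""
    @functools.wraps(step)
    def wrapper(value):
        return NULL if value is NULL else step(value)
    return wrapper


@null_propagating
def plus_one(value):
    # Plays the role of math("_ + 1") in the example above.
    return value + 1


print(plus_one(41))            # 42
print(plus_one(NULL) is NULL)  # True
```

With such a wrapper the step body never sees the marker, so it needs no guard of its own.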

I see two approaches to handling `null` in aggregation steps:

Steps like `max`, `count`... could fail on a `null` object, requiring a predicate to filter it out first:

gremlin> g.inject(1).inject(Null.instance()).inject(3).max()
Max step does not work with `null` values
gremlin> g.inject(1).inject(Null.instance()).inject(3).is(neq(Null.instance())).max()
==>3

Alternatively, they could exclude `null` values from the calculation:

gremlin> g.inject(1).inject(Null.instance()).inject(3).max()
==>3
gremlin> g.inject(1).inject(Null.instance()).inject(3).count()
==>2
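
The two options can be compared side by side in a small Python sketch. All names here (`NULL`, `max_strict`, `max_skipping`, `count_skipping`) are hypothetical and exist only for this example.

```python
NULL = object()  # hypothetical stand-in for Null.instance()


def max_strict(values):
    # Option 1: refuse to aggregate when a null is present.
    values = list(values)
    if any(v is NULL for v in values):
        raise ValueError("max does not work with null values")
    return max(values)


def max_skipping(values):
    # Option 2: silently drop nulls before aggregating.
    # Note: raises ValueError if nothing but nulls remain.
    return max(v for v in values if v is not NULL)


def count_skipping(values):
    # Option 2 applied to count: nulls are not counted.
    return sum(1 for v in values if v is not NULL)


stream = [3, NULL, 1]

# Option 1 needs an explicit filter, like is(neq(Null.instance())).
print(max_strict(v for v in stream if v is not NULL))  # 3

# Option 2 handles the marker internally.
print(max_skipping(stream))    # 3
print(count_skipping(stream))  # 2
```

The strict variant makes nulls visible at the cost of an extra filter; the skipping variant is shorter to use but hides the fact that a value was missing.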

On 2019/05/31 17:01:34, Stephen Mallette <sp...@gmail.com> wrote: 
> I just spent some time fixing:
> 
> https://issues.apache.org/jira/browse/TINKERPOP-2099
> 
> which dealt with inconsistencies in null handling for property() step when
> there is a null value. That's all nice now, but null handling still isn't
> so good overall. It's generally inconsistent in how it behaves in a variety
> of uses in Gremlin - here are a couple of examples:
> 
> gremlin> g.inject(null)
> java.lang.NullPointerException
> Type ':help' or ':h' for help.
> Display stack trace? [yN]n
> gremlin> g.V().constant(null)
> gremlin>
> 
> I've also heard the concern on several occasions that mutation traversals
> are often difficult to write when you want to remove a property and update
> others at the same time, because it forces you into conditional logic where
> you have to somehow work in a side effect of property("name").drop() as
> opposed to just inlining property('name',null).
> 
> I think we should be a bit more respectful of the concept of null with
> Gremlin and while we probably shouldn't allow a literal null into the
> traversal stream, it seems like we could provide for our own Null class
> that could be used in its place where users/providers needed it, so that
> we could do:
> 
> gremlin> g.inject(Null.instance())
> ==> null
> gremlin> g.V(1).property("x", 1).property("y",
> Null.instance()).property("z", 2)
> ==> v[1]
> 
> Perhaps we'd add a new Graph.Feature to allow providers to specify how they
> handle such things. Taking this approach creates a position where we aren't
> really changing core engine behavior. Instead, we're just adding a marker
> that can be used by providers/Gremlin to identify the notion of null and
> then updating serialization/GLVs to support it.
> 
> Haven't thought much past that point. Any other implications of taking this
> direction?
> 

Re: [DISCUSS] Null Handling 3.5.x

Posted by Stephen Mallette <sp...@gmail.com>.
For further discussion/planning:

https://issues.apache.org/jira/browse/TINKERPOP-2235

UNSET is interesting. Perhaps it's too specific to Cassandra for us to
include in TinkerPop though. I'd be curious if there are other graph
systems that are not backed by Cassandra that might make use of such a
feature.

On Mon, Jun 3, 2019 at 6:21 AM Jorge Bay Gondra <jo...@gmail.com>
wrote:

> I think having a null literal makes sense. It plays well with existing SQL
> providers, where there are representations for null:
> https://docs.microsoft.com/en-us/dotnet/api/system.dbnull.value
>
> I would propose another literal: UNSET. On some db providers, there's a
> distinction between NULL, that causes a write to occur (overwriting
> existing value), and UNSET, which is ignored at write time.
>
> See UNSET ticket in Cassandra:
> https://issues.apache.org/jira/browse/CASSANDRA-7304

Re: [DISCUSS] Null Handling 3.5.x

Posted by Jorge Bay Gondra <jo...@gmail.com>.
I think having a null literal makes sense. It plays well with existing SQL
providers, where there are representations for null:
https://docs.microsoft.com/en-us/dotnet/api/system.dbnull.value

I would propose another literal: UNSET. On some db providers, there's a
distinction between NULL, that causes a write to occur (overwriting
existing value), and UNSET, which is ignored at write time.

See UNSET ticket in Cassandra:
https://issues.apache.org/jira/browse/CASSANDRA-7304
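
To make the NULL/UNSET distinction concrete, here is a rough Python sketch of a write path. The `NULL` and `UNSET` markers and the `apply_write` helper are hypothetical; storing `None` for NULL is just one possible choice, and a provider could equally drop the property.

```python
NULL = object()   # hypothetical marker: write a null, overwriting any existing value
UNSET = object()  # hypothetical marker: leave the stored value untouched


def apply_write(stored, updates):
    """Return a new property map after applying the updates to the stored one."""
    result = dict(stored)
    for key, value in updates.items():
        if value is UNSET:
            continue            # ignored at write time
        if value is NULL:
            result[key] = None  # overwrite; a provider might drop the key instead
        else:
            result[key] = value
    return result


stored = {"x": 1, "y": 2, "z": 3}
updated = apply_write(stored, {"x": 10, "y": NULL, "z": UNSET})
print(updated)  # {'x': 10, 'y': None, 'z': 3}
```

The useful property of UNSET in this sketch is that a caller can send the same set of keys on every write and still leave some stored values alone.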
