Posted to dev@tinkerpop.apache.org by Stephen Mallette <sp...@gmail.com> on 2017/10/25 12:47:53 UTC

[DISCUSS] bothE() on self-reference

The test suite doesn't seem to enforce behavior related to self-relating
edges. TinkerGraph does this:

gremlin> g = TinkerGraph.open().traversal()
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> g.addV().as('a').addE('self').to('a')
==>e[1][0-self->0]
gremlin> g.E()
==>e[1][0-self->0]
gremlin> g.V().bothE().count()
==>2

Should bothE() return 2 in this case or 1? I think that we've said in the
past that g.E() is the same as g.V().outE() or g.V().inE(), but not
necessarily g.V().bothE().  Thoughts?
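
A minimal sketch of the question above, written as a plain Python model rather than against the TinkerPop API. The `Edge` tuple and the `out_e`, `in_e`, and `both_e` helpers are hypothetical names used only for illustration, and `both_e` assumes the "outgoing edges followed by incoming edges" reading that yields the count of 2 shown in the console session.

```python
# Toy model, not the TinkerPop API: an edge is (id, out vertex, label, in vertex).
from collections import namedtuple

Edge = namedtuple("Edge", ["id", "out_v", "label", "in_v"])


def out_e(edges, v):
    """Edges leaving vertex v."""
    return [e for e in edges if e.out_v == v]


def in_e(edges, v):
    """Edges arriving at vertex v."""
    return [e for e in edges if e.in_v == v]


def both_e(edges, v):
    """Assumed semantics: outgoing edges followed by incoming edges, no dedup."""
    return out_e(edges, v) + in_e(edges, v)


# One vertex (0) with a single self-referencing edge, as in the console session.
edges = [Edge(1, 0, "self", 0)]

counts = {
    "E": len(edges),
    "outE": len(out_e(edges, 0)),
    "inE": len(in_e(edges, 0)),
    "bothE": len(both_e(edges, 0)),
    "bothE_dedup": len({e.id for e in both_e(edges, 0)}),
}
print(counts)
```

In this model the self-referencing edge is seen once as outgoing and once as incoming, so the open question is only whether bothE() should report it once per direction or once per edge.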

Re: [DISCUSS] bothE() on self-reference

Posted by Stephen Mallette <sp...@gmail.com>.
This might have been a false alarm. I think the behavior TinkerGraph
currently exhibits is correct, and we do have a test in the Structure suite
that enforces it. I didn't see it when I first looked because it is in a
bit of an odd place: StarGraphTest. I will also add a process test to help
enforce it, since I think traversal strategies can sometimes be used to
subvert the structure API.

On Thu, Oct 26, 2017 at 9:55 AM, Robert Dale <ro...@gmail.com> wrote:

> Maybe outE() and inE() should only return non-self-referencing edges. Then
> there should be a new step called selfE(). And every time it's called, it
> posts a picture of Gremlin holding that edge to Facebook.  :-)
>
> Hmm.. it seems like you made the engineering case.  What's the user
> perspective?  I don't know. I don't use self-referencing edges.  I think
> logically it would return what I expected.  But then again, I don't think
> like most users.  Why would a user expect or even want it to show up only
> once? So a user would expect bothE() to return the set of inE and outE
> edges (unique) not the union of inE,outE?  I could see that. I'm not
> against it.  Just need to make sure that's clearly documented.
>
>
>
> Robert Dale

Re: [DISCUSS] bothE() on self-reference

Posted by Robert Dale <ro...@gmail.com>.
Maybe outE() and inE() should only return non-self-referencing edges. Then
there should be a new step called selfE(). And every time it's called, it
posts a picture of Gremlin holding that edge to Facebook.  :-)

Hmm.. it seems like you made the engineering case.  What's the user
perspective?  I don't know. I don't use self-referencing edges.  I think
logically it would return what I expected.  But then again, I don't think
like most users.  Why would a user expect or even want it to show up only
once? So a user would expect bothE() to return the set of inE and outE
edges (unique) not the union of inE,outE?  I could see that. I'm not
against it.  Just need to make sure that's clearly documented.



Robert Dale

On Thu, Oct 26, 2017 at 9:32 AM, Daniel Kuppitz <me...@gremlin.guru> wrote:

> Well, if you want to get it duplicated, you can just do
> union(outE(), inE()), that's easy and inexpensive. However, any way to
> get rid of the duplicates can be expensive:
>
>    - local(bothE().dedup())
>      // needs to keep track of all edges; requires internal memory
>      // structures
>    - union(outE(), __.as("a").inE().not(where(outV().as("a"))))
>      // enables partial path tracking; again that requires internal
>      // memory structures
>
> If we / providers implement the deduplication though, we wouldn't require
> any extra memory structures. Returning the duplicates makes sense to me
> from an engineering perspective, but not from a user perspective.
>
> Cheers,
> Daniel

Re: [DISCUSS] bothE() on self-reference

Posted by Daniel Kuppitz <me...@gremlin.guru>.
Well, if you want to get it duplicated, you can just do union(outE(), inE()),
that's easy and inexpensive. However, any way to get rid of the duplicates
can be expensive:

   - local(bothE().dedup())
     // needs to keep track of all edges; requires internal memory structures
   - union(outE(), __.as("a").inE().not(where(outV().as("a"))))
     // enables partial path tracking; again that requires internal memory
     // structures

If we / providers implement the deduplication though, we wouldn't require
any extra memory structures. Returning the duplicates makes sense to me
from an engineering perspective, but not from a user perspective.

Cheers,
Daniel
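
The two deduplication strategies listed above can be sketched in a plain Python model (not the TinkerPop API; `Edge` and both helper functions are hypothetical names for illustration). The first keeps a set of edge ids already emitted, loosely mirroring local(bothE().dedup()). The second only compares an incoming edge's out vertex with the current vertex, which is roughly the kind of check a provider could perform natively without extra bookkeeping.

```python
# Toy model, not the TinkerPop API: an edge is (id, out vertex, label, in vertex).
from collections import namedtuple

Edge = namedtuple("Edge", ["id", "out_v", "label", "in_v"])


def both_e_dedup_seen(edges, v):
    """Dedup by remembering every edge id already emitted (needs a seen-set)."""
    seen = set()
    result = []
    candidates = [e for e in edges if e.out_v == v] + [e for e in edges if e.in_v == v]
    for e in candidates:
        if e.id not in seen:
            seen.add(e.id)
            result.append(e)
    return result


def both_e_skip_incoming_loops(edges, v):
    """Dedup by dropping incoming self-loops; only an endpoint comparison is needed."""
    outgoing = [e for e in edges if e.out_v == v]
    incoming = [e for e in edges if e.in_v == v and e.out_v != v]
    return outgoing + incoming


# Vertex 0 has a self-loop, one outgoing edge to vertex 1, and one incoming edge from it.
edges = [
    Edge(1, 0, "self", 0),
    Edge(2, 0, "knows", 1),
    Edge(3, 1, "knows", 0),
]

ids_seen = [e.id for e in both_e_dedup_seen(edges, 0)]
ids_skip = [e.id for e in both_e_skip_incoming_loops(edges, 0)]
print(ids_seen, ids_skip)
```

Both functions return each edge once in this model. The difference is the memory the first one needs for its seen-set, which grows with the number of edges visited.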


On Wed, Oct 25, 2017 at 6:53 PM, Robert Dale <ro...@gmail.com> wrote:

> I think that bothE() == union(outE(),inE()) and outE().count() +
> inE().count() == bothE().count().  If you don't want the self-referencing
> edge to be returned twice, then either make it a unidirected edge (if
> supported) so that it would still satisfy the two previous conditions or
> dedup(). In either case, it's left to the user to determine what edges are
> returned.
>
> Also, I think it makes sense that g.E() == g.V().outE().  It should not be
> g.V().inE() due to potential for unidirected edges.
>
>
> Robert Dale

Re: [DISCUSS] bothE() on self-reference

Posted by Robert Dale <ro...@gmail.com>.
I think that bothE() == union(outE(),inE()) and outE().count() +
inE().count() == bothE().count().  If you don't want the self-referencing
edge to be returned twice, then either make it a unidirected edge (if
supported) so that it would still satisfy the two previous conditions or
dedup(). In either case, it's left to the user to determine what edges are
returned.

Also, I think it makes sense that g.E() == g.V().outE().  It should not be
g.V().inE() due to potential for unidirected edges.


Robert Dale
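
A small sketch of the two conditions stated above, again as a plain Python model rather than the TinkerPop API (`Edge`, `both_e_union`, `both_e_unique`, and `count_identity_holds` are hypothetical names). It checks outE().count() + inE().count() == bothE().count() per vertex under both readings of bothE(), and checks that summing outE() over all vertices visits each edge exactly once. Unidirected edges are not modelled here.

```python
# Toy model, not the TinkerPop API: an edge is (id, out vertex, label, in vertex).
from collections import namedtuple

Edge = namedtuple("Edge", ["id", "out_v", "label", "in_v"])


def out_e(edges, v):
    """Edges leaving vertex v."""
    return [e for e in edges if e.out_v == v]


def in_e(edges, v):
    """Edges arriving at vertex v."""
    return [e for e in edges if e.in_v == v]


def both_e_union(edges, v):
    """bothE() read as union(outE(), inE()): duplicates are kept."""
    return out_e(edges, v) + in_e(edges, v)


def both_e_unique(edges, v):
    """bothE() read as the set of incident edges: each edge appears once."""
    unique = {e.id: e for e in both_e_union(edges, v)}
    return list(unique.values())


vertices = [0, 1]
edges = [Edge(1, 0, "self", 0), Edge(2, 0, "knows", 1)]


def count_identity_holds(both_e):
    """True if outE count + inE count equals bothE count at every vertex."""
    return all(
        len(out_e(edges, v)) + len(in_e(edges, v)) == len(both_e(edges, v))
        for v in vertices
    )


union_ok = count_identity_holds(both_e_union)
unique_ok = count_identity_holds(both_e_unique)
all_out = sum(len(out_e(edges, v)) for v in vertices)
print(union_ok, unique_ok, all_out == len(edges))
```

In this model the count identity holds only for the union reading. The unique reading breaks it at the vertex that carries the self-referencing edge.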

On Wed, Oct 25, 2017 at 9:32 AM, Daniel Kuppitz <me...@gremlin.guru> wrote:

> IMO it should return the edge only once.
>
> Cheers,
> Daniel
>

Re: [DISCUSS] bothE() on self-reference

Posted by Daniel Kuppitz <me...@gremlin.guru>.
IMO it should return the edge only once.

Cheers,
Daniel


On Wed, Oct 25, 2017 at 5:47 AM, Stephen Mallette <sp...@gmail.com>
wrote:

> The test suite doesn't seem to enforce behavior related to self-relating
> edges. TinkerGraph does this:
>
> gremlin> g = TinkerGraph.open().traversal()
> ==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
> gremlin> g.addV().as('a').addE('self').to('a')
> ==>e[1][0-self->0]
> gremlin> g.E()
> ==>e[1][0-self->0]
> gremlin> g.V().bothE().count()
> ==>2
>
> Should bothE() return 2 in this case or 1? I think that we've said in the
> past that g.E() is the same as g.V().outE() or g.V().inE(), but not
> necessarily g.V().bothE().  Thoughts?
>