Posted to dev@tinkerpop.apache.org by Marko Rodriguez <ok...@gmail.com> on 2015/10/07 21:36:17 UTC

[DISCUSS] Remove GroupCountStep in favor of new GroupStep with "on the fly" reduction.

Hello,

Check it:
	https://issues.apache.org/jira/browse/TINKERPOP3-872

GraphTraversal.groupCount() just uses GroupStep. Why is this so cool? Well, because there are two groupCount steps -- GroupCountStep and GroupCountSideEffectStep. That is two steps we don't have to maintain, and we can focus our energies on further optimizing GroupStep and GroupSideEffectStep.

Thoughts?
 
<drop mic/>,
Marko.

http://markorodriguez.com


Re: [DISCUSS] Remove GroupCountStep in favor of new GroupStep with "on the fly" reduction.

Posted by Stephen Mallette <sp...@gmail.com>.
"Graph Systems" does have increased bad-ass factor over "implementations"

On Thu, Oct 8, 2015 at 8:50 AM, Marko Rodriguez <ok...@gmail.com>
wrote:

> I was thinking of what we have to change to remove Vendor.
>
> In the code:
>
>         VendorOptimizationStrategy
>                 => ImplementerOptimizationStrategy ?
>                 => ProviderOptimizationStrategy ?
>
> In the documentation:
>
>         Graph System Vendor
>                 => Graph System Implementer ?
>                 => Graph System Provider ?
>                 => TinkerPop3-Enabled Graph System ?
>
> … yeah, "provider" is much better than "implementer."
>
> For the homepage we have "Implementations"; perhaps that could be
> "TinkerPop3-Enabled Graph Systems"? It sounds a bit more "serious" than
> "Implementations"…
>
> Marko.
>
> http://markorodriguez.com

Re: [DISCUSS] Remove GroupCountStep in favor of new GroupStep with "on the fly" reduction.

Posted by Marko Rodriguez <ok...@gmail.com>.
I was thinking of what we have to change to remove Vendor.

In the code:

	VendorOptimizationStrategy
		=> ImplementerOptimizationStrategy ?
		=> ProviderOptimizationStrategy ?
	
In the documentation:

	Graph System Vendor
		=> Graph System Implementer ?
		=> Graph System Provider ?
		=> TinkerPop3-Enabled Graph System ?

… yeah, "provider" is much better than "implementer."

For the homepage we have "Implementations"; perhaps that could be "TinkerPop3-Enabled Graph Systems"? It sounds a bit more "serious" than "Implementations"…

Marko.

http://markorodriguez.com

On Oct 8, 2015, at 5:29 AM, Stephen Mallette <sp...@gmail.com> wrote:

>> implementer specific strategy
> 
> Huh - James's offering from the "vendor VOTE" thread of "provider" actually
> plugs in well there:
> 
> provider specific strategy


Re: [DISCUSS] Remove GroupCountStep in favor of new GroupStep with "on the fly" reduction.

Posted by Stephen Mallette <sp...@gmail.com>.
> implementer specific strategy

Huh - James's offering from the "vendor VOTE" thread of "provider" actually
plugs in well there:

provider specific strategy



On Wed, Oct 7, 2015 at 9:40 PM, Marko Rodriguez <ok...@gmail.com>
wrote:

> Hey,
>
> > What exactly is being optimized by replacing two classes with a single
> > class that has two modes?
>
> I want to make Gremlin's instruction set as small as possible. This way,
> implementers (previously known as vendors) don't have to write so many
> custom steps/strategies for their graph system. This is made readily
> apparent in this ticket:
>         https://issues.apache.org/jira/browse/TINKERPOP3-844
> Less "building blocks" and we can focus on optimizing the specific
> building blocks.
>
> >  Are there cases where you can perform
> > optimizations without caring about the mode?
>
> Yes -- if we can make GroupStepMapReduce faster, well, we get it faster
> for both "steps." In essence, we can focus our energies on making one step
> super efficient. … not a great argument, but less "things" (to me) is
> better.
>
> >  Why not an abstract GroupStep
> > with two concrete subclasses?
>
> This is possible and yes, instead of the "boolean" we can have this.
> Though, note that the "boolean" is just a hack right now. I'm sure we can
> figure out how to do it right. However, with two subclasses we are back to
> the instruction set being 4 steps when they could be 2 (GroupStep and
> GroupSideEffectStep).
>
> >  My understanding of this code is somewhat
> > superficial, so this is more of a reflexive, object-oriented programmer's
> > reaction.
>
> I'm more coming from a virtual machine and TraversalStrategy perspective
> on this one. If this were just "all about Java," I would say let's have more
> classes, each cherry-picked for its use case. However, this causes problems
> for implementers (I actually don't like this word over "vendor"): if they
> have an implementer (vendor) specific strategy for GroupStep, well, now they
> have to have one for GroupCountStep, etc… Again, this is made more apparent
> in TINKERPOP3-844.
>
> Thoughts?,
> Marko.
>
> http://markorodriguez.com

Re: [DISCUSS] Remove GroupCountStep in favor of new GroupStep with "on the fly" reduction.

Posted by Marko Rodriguez <ok...@gmail.com>.
Hey,

> What exactly is being optimized by replacing two classes with a single
> class that has two modes?

I want to make Gremlin's instruction set as small as possible. This way, implementers (previously known as vendors) don't have to write so many custom steps/strategies for their graph system. This is made readily apparent in this ticket:
	https://issues.apache.org/jira/browse/TINKERPOP3-844
With fewer "building blocks," we can focus on optimizing those specific building blocks.

>  Are there cases where you can perform
> optimizations without caring about the mode?

Yes -- if we can make GroupStepMapReduce faster, well, both "steps" get faster. In essence, we can focus our energies on making one step super efficient. … not a great argument, but fewer "things" (to me) is better.

>  Why not an abstract GroupStep
> with two concrete subclasses?

This is possible and yes, instead of the "boolean" we can have this. Though, note that the "boolean" is just a hack right now. I'm sure we can figure out how to do it right. However, with two subclasses we are back to the instruction set being 4 steps when they could be 2 (GroupStep and GroupSideEffectStep).

>  My understanding of this code is somewhat
> superficial, so this is more of a reflexive, object-oriented programmer's
> reaction.

I'm more coming from a virtual machine and TraversalStrategy perspective on this one. If this were just "all about Java," I would say let's have more classes, each cherry-picked for its use case. However, this causes problems for implementers (I actually don't like this word over "vendor"): if they have an implementer (vendor) specific strategy for GroupStep, well, now they have to have one for GroupCountStep, etc… Again, this is made more apparent in TINKERPOP3-844.
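
To make the provider-strategy argument concrete, here is a minimal sketch under loud assumptions: Step, GroupStep, OtherStep, ProviderGroupStep and apply() are hypothetical stand-ins invented for illustration, not the real TinkerPop classes or the TraversalStrategy API. It only shows that a strategy which matches on step class needs one branch per step class in the instruction set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of steps and a provider strategy;
// none of these types are the real TinkerPop classes.
final class StrategySketch {

    interface Step { }

    static final class GroupStep implements Step {
        final boolean counting;   // mirrors the "groupCount?" flag from the thread
        GroupStep(boolean counting) { this.counting = counting; }
    }

    static final class OtherStep implements Step { }

    // A provider-optimized replacement for any grouping step.
    static final class ProviderGroupStep implements Step {
        final boolean counting;
        ProviderGroupStep(boolean counting) { this.counting = counting; }
    }

    // With a single GroupStep class, the strategy needs one instanceof branch.
    // A separate GroupCountStep class would require a second branch here.
    static List<Step> apply(List<Step> traversal) {
        List<Step> rewritten = new ArrayList<>();
        for (Step step : traversal) {
            if (step instanceof GroupStep) {
                rewritten.add(new ProviderGroupStep(((GroupStep) step).counting));
            } else {
                rewritten.add(step);   // leave every other step untouched
            }
        }
        return rewritten;
    }

    public static void main(String[] args) {
        List<Step> traversal =
                Arrays.asList(new OtherStep(), new GroupStep(false), new GroupStep(true));
        for (Step step : apply(traversal)) {
            System.out.println(step.getClass().getSimpleName());
        }
    }
}
```

In this toy model, adding a dedicated counting step class would force every provider to extend apply() with another branch, which is the maintenance cost described above.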

Thoughts?,
Marko.

http://markorodriguez.com




Re: [DISCUSS] Remove GroupCountStep in favor of new GroupStep with "on the fly" reduction.

Posted by Matt Frantz <ma...@gmail.com>.
What exactly is being optimized by replacing two classes with a single
class that has two modes?  Are there cases where you can perform
optimizations without caring about the mode?  Why not an abstract GroupStep
with two concrete subclasses?  My understanding of this code is somewhat
superficial, so this is more of a reflexive, object-oriented programmer's
reaction.

On Wed, Oct 7, 2015 at 3:01 PM, Marko Rodriguez <ok...@gmail.com>
wrote:

> Hi,
>
> So I branched and implemented this.
>         https://github.com/apache/incubator-tinkerpop/tree/TINKERPOP3-872
>
> Dead simple implementation.
>
> https://github.com/apache/incubator-tinkerpop/commit/db1cd97e3ec98938961e394489dd902acbb3009f
>
> The one thing I don't like is the GroupStep(boolean) constructor that asks
> "groupCount?" … If you look at GroupStep you will see why it's needed.
>
> https://github.com/apache/incubator-tinkerpop/blob/db1cd97e3ec98938961e394489dd902acbb3009f/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/GroupStep.java#L72-L75
>
> https://github.com/apache/incubator-tinkerpop/blob/db1cd97e3ec98938961e394489dd902acbb3009f/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/GroupStep.java#L100-L101
>
> If people like this, then we can put it into 3.1.0 and then we have 2 fewer
> steps to maintain (i.e. GroupCountStep and GroupCountSideEffectStep).
>
> Enjoy,
> Marko.
>
> http://markorodriguez.com

Re: [DISCUSS] Remove GroupCountStep in favor of new GroupStep with "on the fly" reduction.

Posted by Marko Rodriguez <ok...@gmail.com>.
Hi,

So I branched and implemented this.
	https://github.com/apache/incubator-tinkerpop/tree/TINKERPOP3-872

Dead simple implementation.
	https://github.com/apache/incubator-tinkerpop/commit/db1cd97e3ec98938961e394489dd902acbb3009f

The one thing I don't like is the GroupStep(boolean) constructor that asks "groupCount?" … If you look at GroupStep you will see why it's needed.
	https://github.com/apache/incubator-tinkerpop/blob/db1cd97e3ec98938961e394489dd902acbb3009f/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/GroupStep.java#L72-L75
	https://github.com/apache/incubator-tinkerpop/blob/db1cd97e3ec98938961e394489dd902acbb3009f/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/GroupStep.java#L100-L101

If people like this, then we can put it into 3.1.0 and then we have 2 fewer steps to maintain (i.e. GroupCountStep and GroupCountSideEffectStep).
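
For readers who want the idea without opening the commit, the following is a rough sketch of "on the fly" reduction, assuming nothing about the actual GroupStep code: GroupSketch, group() and groupCount() are hypothetical names invented here, and the sketch uses only java.util. It shows how a count per key can be expressed as a grouping whose values are folded as they arrive.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;

// Hypothetical stand-in for a grouping step; NOT the actual TinkerPop GroupStep.
final class GroupSketch {

    // Groups items by key and folds each value into the running result for its
    // key as soon as it arrives ("on the fly"), instead of collecting a list
    // per key and reducing it at the end.
    static <T, K, V> Map<K, V> group(Iterable<T> items,
                                     Function<T, K> keyOf,
                                     Function<T, V> valueOf,
                                     BinaryOperator<V> reducer) {
        Map<K, V> result = new HashMap<>();
        for (T item : items) {
            result.merge(keyOf.apply(item), valueOf.apply(item), reducer);
        }
        return result;
    }

    // groupCount expressed through group: every item contributes 1 and the
    // reducer sums the contributions.
    static <T, K> Map<K, Long> groupCount(Iterable<T> items, Function<T, K> keyOf) {
        return group(items, keyOf, item -> 1L, Long::sum);
    }

    public static void main(String[] args) {
        List<String> labels = Arrays.asList("person", "software", "person", "person");
        System.out.println(groupCount(labels, Function.identity()));
    }
}
```

In this sketch the counting case needs no class of its own; it is just one choice of value function and reducer, which is the spirit of dropping the dedicated groupCount steps.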

Enjoy,
Marko.

http://markorodriguez.com

On Oct 7, 2015, at 1:36 PM, Marko Rodriguez <ok...@gmail.com> wrote:

> Hello,
> 
> Check it:
> 	https://issues.apache.org/jira/browse/TINKERPOP3-872
> 
> GraphTraversal.groupCount() just uses GroupStep. Why is this so cool? Well, because there are two groupCount steps -- GroupCountStep and GroupCountSideEffectStep. That is two steps we don't have to maintain, and we can focus our energies on further optimizing GroupStep and GroupSideEffectStep.
> 
> Thoughts?
>  
> <drop mic/>,
> Marko.
> 
> http://markorodriguez.com
>