Posted to dev@tinkerpop.apache.org by "Marko A. Rodriguez (JIRA)" <ji...@apache.org> on 2015/10/03 00:52:26 UTC
[jira] [Commented] (TINKERPOP3-866) GroupStep and Traversal-Based Reductions
[ https://issues.apache.org/jira/browse/TINKERPOP3-866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941917#comment-14941917 ]
Marko A. Rodriguez commented on TINKERPOP3-866:
-----------------------------------------------
{code}
g.V.group.by(label).by(bothE.weight.fold).by(dedup(local)) // currently
g.V.group.by(label).by(bothE.weight).by(dedup) // new model
{code}
It's actually more elegant in the new model because the {{valueTraversal}} just concatenates with the {{reduceTraversal}}, so it's a full stream. Each incoming traverser is keyed (by label), and then, for that key's value/reduce traversal (key->valueReduceTraversal), you {{addStart(traverser)}}.
[~dkuppitz] was concerned about reductions that require collections -- e.g. {{mean()}}.
Simple. It's just:
{code}
g.V.group.by(label).by(bothE.weight).by(mean)
{code}
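A rough plain-Java sketch of why {{mean}} does not need a collection per key. This is not TinkerPop code: {{StreamingReducer}}, {{MeanReducer}}, {{LabeledWeights}} and {{StreamingGroupSketch}} are hypothetical names made up for illustration, and {{addStart}} is only borrowed from the wording above. Each key lazily gets its own reducer instance (standing in for a cloned {{reduceTraversal}}), and values are pushed into it one at a time.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical stand-in for a per-key clone of reduceTraversal.
interface StreamingReducer<V, R> {
    void addStart(V value); // name borrowed from addStart(traverser) above
    R result();
}

// mean() without a Collection<V>: only a running sum and a count are kept.
final class MeanReducer implements StreamingReducer<Double, Double> {
    private double sum = 0.0;
    private long count = 0L;

    @Override public void addStart(Double value) { sum += value; count++; }
    @Override public Double result() { return count == 0 ? Double.NaN : sum / count; }
}

// Hypothetical element: a label plus the weights of its incident edges.
final class LabeledWeights {
    final String label;
    final List<Double> weights;

    LabeledWeights(String label, List<Double> weights) {
        this.label = label;
        this.weights = weights;
    }
}

final class StreamingGroupSketch {
    // Keys each start, creates one reducer per new key, and streams values into it.
    static <S, K, V, R> Map<K, R> group(Iterable<S> starts,
                                        Function<S, K> keyFn,
                                        Function<S, Iterable<V>> valueFn,
                                        Supplier<StreamingReducer<V, R>> reducerFactory) {
        Map<K, StreamingReducer<V, R>> reducers = new LinkedHashMap<>();
        for (S start : starts) {
            StreamingReducer<V, R> reducer =
                    reducers.computeIfAbsent(keyFn.apply(start), k -> reducerFactory.get());
            for (V value : valueFn.apply(start)) {
                reducer.addStart(value); // no per-key collection is ever built
            }
        }
        Map<K, R> result = new LinkedHashMap<>();
        reducers.forEach((key, reducer) -> result.put(key, reducer.result()));
        return result;
    }

    public static void main(String[] args) {
        List<LabeledWeights> vertices = Arrays.asList(
                new LabeledWeights("person", Arrays.asList(0.5, 1.0)),
                new LabeledWeights("software", Arrays.asList(0.4)),
                new LabeledWeights("person", Arrays.asList(0.0)));
        // Roughly analogous to: g.V.group.by(label).by(bothE.weight).by(mean)
        System.out.println(group(vertices, v -> v.label, v -> v.weights, MeanReducer::new));
    }
}
```

The memory held per key is whatever state the reducer needs (two numbers for a mean), not the full list of values.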
> GroupStep and Traversal-Based Reductions
> ----------------------------------------
>
> Key: TINKERPOP3-866
> URL: https://issues.apache.org/jira/browse/TINKERPOP3-866
> Project: TinkerPop 3
> Issue Type: Improvement
> Components: process
> Affects Versions: 3.0.1-incubating
> Reporter: Marko A. Rodriguez
> Assignee: Marko A. Rodriguez
> Labels: breaking
> Fix For: 3.1.0-incubating
>
>
> Right now {{GroupStep}} is defined as:
> {code}
> public final class GroupStep<S, K, V, R> extends ReducingBarrierStep<S, Map<K, R>> implements MapReducer, TraversalParent {
> private Traversal.Admin<S, K> keyTraversal = null;
> private Traversal.Admin<S, V> valueTraversal = null;
> private Traversal.Admin<Collection<V>, R> reduceTraversal = null;
> ...
> {code}
> Look at {{reduceTraversal}}. It takes a {{Collection<V>}} of "values" and reduces them to a "reduction" {{R}}. Why are we using {{Collection<V>}}? Why is this not:
> {code}
> private Traversal.Admin<V, R> reduceTraversal = null;
> {code}
> Now, when a new {{K}} is created (and reduce is defined), we clone {{reduceTraversal}}. Thus, each key has its own {{reduceTraversal}} (identical clones) that operates in a stream-like fashion on {{V}} to yield {{R}}. This enables us to remove the {{Collection<V>}} (memory hog) and allows us to define {{GroupCountStep}} in terms of {{GroupStep}} without (?limited?) computational cost. HOWEVER, this changes the API, as people who did this:
> {code}
> g.V.group.by(label()).by(outE().count()).by(sum(local))
> {code}
> would now have to do this:
> {code}
> g.V.group.by(label()).by(outE().count()).by(sum())
> {code}
> It's very minor, given the speedup we would gain and the ability for us to now do "groupCount" efficiently on arbitrary values -- not just bulks (e.g. sacks).
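As a loose analogy only (plain {{java.util.stream}}, not TinkerPop), the JDK's {{Collectors.groupingBy}} with a downstream collector has the same shape as the proposal: each key gets its own streaming reducer, so "groupCount" is just grouping with a counting reducer, and summing an arbitrary per-element value is the same call with a different reducer. The class name {{GroupCountAnalogy}} and the sample (label, out-edge count) pairs are made up for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

final class GroupCountAnalogy {
    // "groupCount" expressed as group + a counting reducer per key.
    static Map<String, Long> groupCount(List<String> labels) {
        return labels.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    // Same grouping, but the per-key reducer sums an arbitrary value
    // (here a hypothetical out-edge count), loosely mirroring
    // group.by(label()).by(outE().count()).by(sum()).
    static Map<String, Long> groupSum(List<Map.Entry<String, Long>> labelAndOutDegree) {
        return labelAndOutDegree.stream()
                .collect(Collectors.groupingBy(Map.Entry::getKey,
                        Collectors.summingLong(Map.Entry::getValue)));
    }

    public static void main(String[] args) {
        System.out.println(groupCount(Arrays.asList("person", "software", "person")));
        List<Map.Entry<String, Long>> degrees = Arrays.asList(
                new SimpleEntry<>("person", 3L),
                new SimpleEntry<>("software", 0L),
                new SimpleEntry<>("person", 2L));
        System.out.println(groupSum(degrees));
    }
}
```

In both methods the values for a key are folded into a single running number as they arrive, which is the behavior the {{Traversal.Admin<V, R>}} signature is meant to allow.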
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)