Posted to dev@tinkerpop.apache.org by "Marko A. Rodriguez (JIRA)" <ji...@apache.org> on 2016/02/18 18:53:18 UTC

[jira] [Comment Edited] (TINKERPOP-1166) Add Memory.reduce() as option to Memory implementations.

    [ https://issues.apache.org/jira/browse/TINKERPOP-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15152717#comment-15152717 ] 

Marko A. Rodriguez edited comment on TINKERPOP-1166 at 2/18/16 5:53 PM:
------------------------------------------------------------------------

Jotted this out in a notebook and this feels like the right way to do it.

{code}
Memory.merge(String, Merge)
{code}

{code}
public class SumMerge<T extends Number> implements Merge<T> {
  public SumMerge<T> merge(final SumMerge<T> other);
  public T get();
}
{code}

Both {{CountGlobalStep}} and {{SumGlobalStep}} would use the same {{SumMerge}} class. However, the value {{CountGlobalStep}} feeds into the merge is just {{traverser.bulk()}}; for {{SumGlobalStep}}, it's {{traverser.get() * traverser.bulk()}}.
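
To make that concrete, here is a minimal, self-contained sketch of the idea -- the field name, the {{add()}} helper, and doing the arithmetic in {{double}} are all just illustration, not part of the proposal:

{code}
// Minimal sketch only: the Merge interface below restates the proposal above;
// the exact signatures are still up for discussion.
public interface Merge<T> {
  public Merge<T> merge(final Merge<T> other); // combine two partial results
  public T get();                              // the current merged value
}

public class SumMerge implements Merge<Number> {

  private Number sum = 0L;

  // What each step would feed in: CountGlobalStep -> add(traverser.bulk()),
  // SumGlobalStep -> add(traverser.get().doubleValue() * traverser.bulk()).
  public SumMerge add(final Number value) {
    this.sum = this.sum.doubleValue() + value.doubleValue();
    return this;
  }

  public Merge<Number> merge(final Merge<Number> other) {
    return this.add(other.get());
  }

  public Number get() {
    return this.sum;
  }
}
{code}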

Next, we can start to slide this into {{GraphComputer}} and push out {{MapReduce}} (maybe)... Check it:

Let's say we have another interface called {{VertexMerge}} that extends {{Merge}} and adds this method:

{code}
public Merge<T> initial(final Vertex vertex)
{code}

{code}
graph.compute(SparkGraphComputer).program(MyVertexProgram).merge(MyMerge).merge(...).merge(...)
{code}

{{GraphComputer.merge(VertexMerge)}} simply gets its initial value by first processing the current vertex. Also, it can access the edges of the vertex -- which is something our current MapReduce model doesn't support! That's it. At that point, this is identical to {{MapReduce}}, EXCEPT that in {{MapReduce}}, if you ONLY do a Map with no Reduce, you still have output splits distributed across the cluster, whereas in this model that would be VERY BAD to do without some sort of filtering, or else you would merge a massive list onto a single machine.
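
As a rough illustration, a {{VertexMerge}} implementation could look something like the following. The {{EdgeCountMerge}} name and the edge-counting example are made up here, and it builds on the {{Merge}} sketch above:

{code}
import org.apache.tinkerpop.gremlin.structure.Direction;
import org.apache.tinkerpop.gremlin.structure.Vertex;

// Rough sketch only: EdgeCountMerge is an illustrative VertexMerge that seeds
// itself from the vertex, here by counting the vertex's edges, which is
// exactly the kind of access the current MapReduce model does not give us.
public class EdgeCountMerge implements VertexMerge<Long> {

  private long count = 0L;

  public Merge<Long> initial(final Vertex vertex) {
    // the initial value comes straight from the vertex and its edges
    vertex.edges(Direction.BOTH).forEachRemaining(edge -> this.count++);
    return this;
  }

  public Merge<Long> merge(final Merge<Long> other) {
    this.count = this.count + other.get(); // combine two partial counts
    return this;
  }

  public Long get() {
    return this.count;
  }
}
{code}

It would then be hooked in through the {{merge(...)}} call in the fluent API above (whether {{merge()}} takes a class or an instance is a detail to settle).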

This is all very simple to do, and I believe it is easier to grok than the {{MapReduce}} extension we added because it's all part of the {{VertexProgram}} execution and not some auxiliary appendage.


was (Author: okram):
Jotted this out in a notebook and this feels like the right way to do it.

{code}
Memory.merge(String, Mergeable)
{code}

{code}
public class SumMergeable<T extends Number> implements Mergeable<T> {
  public SumMergeable<T> merge(final SumMergeable<T> other);
  public T get();
}
{code}

Both {{CountGlobalStep}} and {{SumGlobalStep}} would use the same {{SumMergeable}} class. However, the value {{CountGlobalStep}} feeds into the merge is just {{traverser.bulk()}}; for {{SumGlobalStep}}, it's {{traverser.get() * traverser.bulk()}}.

Next, we can start to slide this into {{GraphComputer}} and push out {{MapReduce}} (maybe)... Check it:

Let's say we have another interface called {{VertexMergeable}} that extends {{Mergeable}} and adds this method:

{code}
public Mergeable<T> initial(final Vertex vertex)
{code}

{code}
graph.compute(SparkGraphComputer).program(MyVertexProgram).merge(MyMergeable).merge(...).merge(...)
{code}

{{GraphComputer.merge(VertexMergeable)}} simply gets its initial value by first processing the current vertex. That's it. At that point, this is identical to {{MapReduce}}, EXCEPT that in {{MapReduce}}, if you ONLY do a Map with no Reduce, you still have output splits distributed across the cluster, whereas in this model that would be VERY BAD to do without some sort of filtering, or else you would merge a massive list onto a single machine.

This is all very simple to do, and I believe it is easier to grok than the {{MapReduce}} extension we added because it's all part of the {{VertexProgram}} execution and not some auxiliary appendage.

> Add Memory.reduce() as option to Memory implementations.
> --------------------------------------------------------
>
>                 Key: TINKERPOP-1166
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1166
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: hadoop, process, tinkergraph
>    Affects Versions: 3.1.2-incubating
>            Reporter: Marko A. Rodriguez
>
> Currently {{Memory}} supports {{incr}}, {{and}}, {{or}}, ... These are great and what people will typically use. However, we should also provide the generalization, which is simply {{Memory.reduce}}. In this situation, {{incr}}, {{or}}, {{and}}, etc. are just specializations of {{Memory.reduce}}.
> How would it work?
> When memory is initialized in a {{VertexProgram}}, it would be like this:
> {code}
> memory.set("myReduction", new MyReducingFunction(0))
> {code}
> Then {{ReducingFunction}} would look like this:
> {code}
> public class ReducingFunction<A> implements BinaryOperator<A> {
>   public A getInitialValue();
>   public A apply(A first, A second);
> }
> {code}
> Easy peasy. Note that both Spark and Giraph support this type of function-based reduction in their respective "memory engines." It will, of course, be easy to add this functionality to TinkerGraphComputer as well.
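> For illustration only (the class name here is made up), a summing reduction over longs could be as small as:
> {code}
> import java.util.function.BinaryOperator;
>
> // Throwaway sketch: a reducer that sums longs, seeded with an initial value.
> public class SumReducingFunction implements BinaryOperator<Long> {
>   private final Long initial;
>
>   public SumReducingFunction(final Long initial) {
>     this.initial = initial;
>   }
>
>   public Long getInitialValue() {
>     return this.initial;
>   }
>
>   public Long apply(final Long first, final Long second) {
>     return first + second;
>   }
> }
> {code}
> which a {{VertexProgram}} would register as {{memory.set("myReduction", new SumReducingFunction(0L))}}.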
> Why do this? For two reasons:
> 1. We get extra flexibility in {{Memory}}.
> 2. https://issues.apache.org/jira/browse/TINKERPOP-1164


