Posted to user@crunch.apache.org by 陈竞 <cj...@gmail.com> on 2016/04/28 08:17:38 UTC

confused about node split in MSCRPlanner.prepareFinalGraph

I'm reading the Crunch source code, and I am confused by this code in
MSCRPlanner.prepareFinalGraph():

if (baseVertex.isGBK()) {
  Vertex vertex = graph.getVertexAt(baseVertex.getPCollection());
  for (Edge e : baseVertex.getIncomingEdges()) {
    if (e.getHead().isOutput()) {
      // Execute an edge split.
      Vertex splitTail = e.getHead();
      PCollectionImpl<?> split = splitTail.getPCollection();
      InputCollection<?> inputNode = handleSplitTarget(split);
      Vertex splitHead = graph.addVertex(inputNode, false);

      // Divide up the node paths in the edge between the two GBK nodes so
      // that each node is either owned by GBK1 -> newTail or newHead -> GBK2.
      for (NodePath path : e.getNodePaths()) {
        NodePath headPath = path.splitAt(split, splitHead.getPCollection());
        graph.getEdge(vertex, splitTail).addNodePath(headPath);
        graph.getEdge(splitHead, vertex).addNodePath(path);
      }

      // Note the dependency between the vertices in the graph.
      graph.markDependency(splitHead, splitTail);
    }


My question is: since $vertex is e's tail, why does the code call
graph.getEdge(vertex, splitTail).addNodePath(headPath)?

It seems like Crunch reverses the edge e?


-- 

Jing Chen HPCC.ICT.AC China

Re: confused about node split in MSCRPlanner.prepareFinalGraph

Posted by 陈竞 <cj...@gmail.com>.

Thank you! I really appreciate it!

2016-04-30 4:29 GMT+08:00 Josh Wills <jo...@gmail.com>:

> Yikes, what idiot wrote this code? ;-)
>
> So I'll be honest, I couldn't figure out what the code was doing either--
> so I went back in time (to 2012!) using git blame to figure out when/why I
> added it, and discovered that it was a special case to handle map-side
> outputs that had a downstream GBK operation, via this commit:
>
>
> https://github.com/apache/crunch/commit/28e51b6a4505ff406c0d9472303c28cd2e2d6aaa
>
> After staring at it for a while, I *think* the reason this works is that
> this line:
>
> graph.getEdge(vertex, splitTail).addNodePath(headPath);
>
> doesn't actually do anything-- the headPath here is always empty, so
> there's no impact on the final graph. I'd be curious whether anything would
> fail if we removed that line.
>
> Josh


-- 
陈竞 (Jing Chen), High Performance Computer Center, Institute of Computing Technology, Chinese Academy of Sciences
Jing Chen HPCC.ICT.AC China

Re: confused about node split in MSCRPlanner.prepareFinalGraph

Posted by Josh Wills <jo...@gmail.com>.

Yikes, what idiot wrote this code? ;-)

So I'll be honest, I couldn't figure out what the code was doing either--
so I went back in time (to 2012!) using git blame to figure out when/why I
added it, and discovered that it was a special case to handle map-side
outputs that had a downstream GBK operation, via this commit:

https://github.com/apache/crunch/commit/28e51b6a4505ff406c0d9472303c28cd2e2d6aaa

After staring at it for a while, I *think* the reason this works is that
this line:

graph.getEdge(vertex, splitTail).addNodePath(headPath);

doesn't actually do anything-- the headPath here is always empty, so
there's no impact on the final graph. I'd be curious whether anything would
fail if we removed that line.

Josh
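
To make the "headPath is always empty" reading concrete, here is a minimal Java sketch of a path split. It is a toy model, not Crunch's NodePath: paths are plain lists of node names, and the class NodePathSplitSketch, the SplitResult type, and the node names are all hypothetical. It assumes the split returns only the nodes strictly before the split point; whether the real NodePath.splitAt behaves that way is not confirmed in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: node paths are lists of names, not Crunch PCollections.
public class NodePathSplitSketch {

  /** Holds both halves of a split path. */
  static final class SplitResult {
    final List<String> headPath;   // nodes strictly before the split node
    final List<String> remainder;  // newHead followed by the nodes after the split node

    SplitResult(List<String> headPath, List<String> remainder) {
      this.headPath = headPath;
      this.remainder = remainder;
    }
  }

  /**
   * Splits a path at the given node. ASSUMPTION: the split node itself is
   * dropped from both halves and replaced by newHead in the remainder.
   */
  static SplitResult splitAt(List<String> path, String split, String newHead) {
    int idx = path.indexOf(split);
    if (idx < 0) {
      throw new IllegalArgumentException("split node not on path: " + split);
    }
    List<String> headPath = new ArrayList<>(path.subList(0, idx));
    List<String> remainder = new ArrayList<>();
    remainder.add(newHead);
    remainder.addAll(path.subList(idx + 1, path.size()));
    return new SplitResult(headPath, remainder);
  }

  public static void main(String[] args) {
    // The edge starts at the map-side output, so the split node is the first node.
    List<String> path = Arrays.asList("mapSideOutput", "parallelDo", "gbk");
    SplitResult r = splitAt(path, "mapSideOutput", "inputNode");
    System.out.println("headPath  = " + r.headPath);
    System.out.println("remainder = " + r.remainder);
  }
}
```

Under that assumption, splitting at the first node of a path leaves headPath with no nodes, while the remainder becomes inputNode, parallelDo, gbk. Adding such a headPath to any edge would contribute nothing, which is consistent with the guess above that the line has no effect on the final graph.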
