Posted to dev@pig.apache.org by Rohini Palaniswamy <ro...@gmail.com> on 2014/02/17 08:33:24 UTC

Review Request 18181: [PIG-3766] Use ONE_TO_ONE edge and IdentityInOut in skewed join intermediate vertex

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18181/
-----------------------------------------------------------

Review request for pig, Cheolsoo Park and Daniel Dai.


Bugs: PIG-3766
    https://issues.apache.org/jira/browse/PIG-3766


Repository: pig


Description
-------

Changes done:
  1) Removed the POLocalRearrange in SampleVertex and replaced it with a POValueOutTez for both order by and skewed join. POValueOutTez takes multiple outputs, so the POSplit in the skewed join sample vertex is gone as well.
  2) Replaced the POPackage+POLocalRearrange in the partition vertex of the left table (vertex 3) with a POIdentityInOutTez, moving the project from that POLocalRearrange into the POLocalRearrange in vertex 1. Also made the edge between vertex 1 and vertex 3 a 1-1 (ONE_TO_ONE) edge.
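
For readers less familiar with the edge types, the difference between a 1-1 edge and a shuffle-style edge can be sketched roughly as below. This is a plain Python illustration, not Pig or Tez code. The function names are hypothetical, and the sketch only models which downstream tasks each upstream task feeds, assuming that task i feeds task i on a 1-1 edge and that every upstream task feeds every downstream task on a scatter-gather edge.

```python
def one_to_one_routing(num_source_tasks):
    """Hypothetical model of a 1-1 edge: source task i feeds only destination task i.

    The destination vertex therefore needs as many tasks as the source vertex.
    """
    return {src: [src] for src in range(num_source_tasks)}


def scatter_gather_routing(num_source_tasks, num_dest_tasks):
    """Hypothetical model of a shuffle edge: every source task feeds every destination task."""
    return {src: list(range(num_dest_tasks)) for src in range(num_source_tasks)}


if __name__ == "__main__":
    # Three upstream tasks, e.g. vertex 1 feeding vertex 3 over a 1-1 edge.
    print(one_to_one_routing(3))
    # Three upstream tasks feeding two downstream tasks over a shuffle edge.
    print(scatter_gather_routing(3, 2))
```

Under this model a 1-1 edge ties the task count of vertex 3 to that of vertex 1, while a shuffle edge leaves the downstream task count free to differ.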


Diffs
-----

  http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POIdentityInOutTez.java 1568862 
  http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POLocalRearrangeTez.java 1568862 
  http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java 1568862 
  http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java 1568862 
  http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC16.gld 1568862 
  http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC17.gld PRE-CREATION 
  http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC7.gld 1568862 
  http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/tez/TestTezCompiler.java 1568862 

Diff: https://reviews.apache.org/r/18181/diff/


Testing
-------

TestSkewedJoin and the SkewedJoin tests in nightly.conf (-t SkewedJoin) pass, except SkewedJoin_6 (PIG-3727).


Thanks,

Rohini Palaniswamy


Re: Review Request 18181: [PIG-3766] Use ONE_TO_ONE edge and IdentityInOut in skewed join intermediate vertex

Posted by Cheolsoo Park <pi...@gmail.com>.

> On Feb. 17, 2014, 7:54 a.m., Cheolsoo Park wrote:
> > http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java, line 1464
> > <https://reviews.apache.org/r/18181/diff/2/?file=490429#file490429line1464>
> >
> >     Remove this comment since it's no longer applicable?
> 
> Rohini Palaniswamy wrote:
>     Left that on purpose. We want to try unsorted shuffle to reduce the number of stages when the data is small. For example, if there are 7K input splits and parallel is set to 100, with 1-1 there will be 7K tasks in the load vertex, 7K tasks in the partition vertex and 100 in the join vertex. We want to see whether 7K in the load vertex, 3.5K in the partition vertex and 100 in the join vertex performs better. In theory it might, as each final join task only needs to merge 3.5K map outputs instead of 7K. If that does not work out, we will stick with 1-1.

Oh, I see. Makes sense. Thanks for the clarification!


- Cheolsoo


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18181/#review34628
-----------------------------------------------------------




Re: Review Request 18181: [PIG-3766] Use ONE_TO_ONE edge and IdentityInOut in skewed join intermediate vertex

Posted by Rohini Palaniswamy <ro...@gmail.com>.

> On Feb. 17, 2014, 7:54 a.m., Cheolsoo Park wrote:
> > http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java, line 1464
> > <https://reviews.apache.org/r/18181/diff/2/?file=490429#file490429line1464>
> >
> >     Remove this comment since it's no longer applicable?

Left that on purpose. We want to try unsorted shuffle to reduce the number of stages when the data is small. For example, if there are 7K input splits and parallel is set to 100, with 1-1 there will be 7K tasks in the load vertex, 7K tasks in the partition vertex and 100 in the join vertex. We want to see whether 7K in the load vertex, 3.5K in the partition vertex and 100 in the join vertex performs better. In theory it might, as each final join task only needs to merge 3.5K map outputs instead of 7K. If that does not work out, we will stick with 1-1.
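
The arithmetic in the example above can be written out as a small sketch. It is only an illustration in Python, not part of the patch. The function name and dictionary keys are hypothetical, and it assumes that each join task fetches one output from every partition task, so the number of outputs merged per join task equals the number of partition tasks.

```python
def task_plan(input_splits, parallel, partition_tasks=None):
    """Return task counts per vertex for the skewed join's left input.

    partition_tasks=None models the 1-1 edge, where the partition vertex
    gets exactly one task per load task. Passing a number models the
    unsorted-shuffle alternative with a smaller partition vertex.
    """
    if partition_tasks is None:
        partition_tasks = input_splits  # 1-1 edge: same task count as the load vertex
    return {
        "load": input_splits,
        "partition": partition_tasks,
        "join": parallel,
        # Assumption: every join task merges one output from each partition task.
        "outputs_merged_per_join_task": partition_tasks,
    }


if __name__ == "__main__":
    print(task_plan(7000, 100))        # 1-1 edge between load and partition
    print(task_plan(7000, 100, 3500))  # unsorted shuffle with half as many partition tasks
```

Whether the smaller merge per join task outweighs the extra data movement between the load and partition vertices is the open question that the experiment is meant to answer.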


- Rohini


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18181/#review34628
-----------------------------------------------------------




Re: Review Request 18181: [PIG-3766] Use ONE_TO_ONE edge and IdentityInOut in skewed join intermediate vertex

Posted by Cheolsoo Park <pi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18181/#review34628
-----------------------------------------------------------

Ship it!


I just have one minor comment below. Looks good, and thank you for fixing this!


http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java
<https://reviews.apache.org/r/18181/#comment64815>

    Remove this comment since it's no longer applicable?


- Cheolsoo Park




Re: Review Request 18181: [PIG-3766] Use ONE_TO_ONE edge and IdentityInOut in skewed join intermediate vertex

Posted by Rohini Palaniswamy <ro...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18181/
-----------------------------------------------------------

(Updated Feb. 17, 2014, 7:34 a.m.)


Review request for pig, Cheolsoo Park and Daniel Dai.


Bugs: PIG-3766
    https://issues.apache.org/jira/browse/PIG-3766


Repository: pig


Description
-------

Changes done:
  1) Removed the POLocalRearrange in SampleVertex and replaced it with a POValueOutTez for both order by and skewed join. POValueOutTez takes multiple outputs, so the POSplit in the skewed join sample vertex is gone as well.
  2) Replaced the POPackage+POLocalRearrange in the partition vertex of the left table (vertex 3) with a POIdentityInOutTez, moving the project from that POLocalRearrange into the POLocalRearrange in vertex 1. Also made the edge between vertex 1 and vertex 3 a 1-1 (ONE_TO_ONE) edge.


Diffs (updated)
-----

  http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POIdentityInOutTez.java 1568862 
  http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POLocalRearrangeTez.java 1568862 
  http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java 1568862 
  http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java 1568862 
  http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC16.gld 1568862 
  http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC17.gld PRE-CREATION 
  http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/test/data/GoldenFiles/TEZC7.gld 1568862 
  http://svn.apache.org/repos/asf/pig/branches/tez/test/org/apache/pig/tez/TestTezCompiler.java 1568862 

Diff: https://reviews.apache.org/r/18181/diff/


Testing
-------

TestSkewedJoin and the SkewedJoin tests in nightly.conf (-t SkewedJoin) pass, except SkewedJoin_6 (PIG-3727).


Thanks,

Rohini Palaniswamy