Posted to dev@pig.apache.org by Pallavi Rao <pa...@inmobi.com> on 2016/02/01 13:33:22 UTC

Review Request 43044: PIG-4766 Ensure GroupBy is optimized for all algebraic Operations

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43044/
-----------------------------------------------------------

Review request for pig, Xianda Ke, liyun zhang, Mohit Sabharwal, and Xuefu Zhang.


Bugs: PIG-4766
    https://issues.apache.org/jira/browse/PIG-4766


Repository: pig-git


Description
-------

PIG-4709 introduced Combiner optimization for Group By. However, the patch did not handle cases where constant/conditional expressions were used. It also did not handle limit.

This patch is to address those gaps.
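
For context, a combiner can only pre-aggregate a Group By when the aggregate is algebraic, that is, when it can be computed in an initial, an intermediate, and a final stage. The following is a minimal, language-neutral sketch of that idea for an average. It is not Pig's API and not the code in this patch; the function names (`initial`, `intermediate`, `final`, `combine_partition`, `group_by_average`) and the sample data are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical three-stage algebraic average (illustration only, not Pig code).

def initial(value):
    # Turn one input value into a partial result: (sum, count).
    return (value, 1)

def intermediate(partials):
    # Merge partial results; safe to apply any number of times.
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    return (total, count)

def final(partials):
    # Merge the remaining partials and produce the final average.
    total, count = intermediate(partials)
    return total / count if count else None

def combine_partition(records):
    # Map side: group one partition by key and pre-aggregate each group.
    grouped = defaultdict(list)
    for key, value in records:
        grouped[key].append(initial(value))
    return {key: intermediate(parts) for key, parts in grouped.items()}

def group_by_average(partitions):
    # Reduce side: only one partial per key per partition is shuffled.
    shuffled = defaultdict(list)
    for records in partitions:
        for key, partial in combine_partition(records).items():
            shuffled[key].append(partial)
    return {key: final(parts) for key, parts in shuffled.items()}

partitions = [
    [("a", 1), ("b", 10), ("a", 3)],
    [("a", 5), ("b", 20)],
]
result = group_by_average(partitions)
```

The point of the sketch is that each partition ships one `(sum, count)` pair per key instead of every raw value, which is the saving the combiner provides.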


Diffs
-----

  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/PORelationToExprProject.java 5fb49e2 
  src/org/apache/pig/backend/hadoop/executionengine/spark/optimizer/CombinerOptimizer.java a05d009 
  src/org/apache/pig/backend/hadoop/executionengine/util/CombinerOptimizerUtil.java 5c0919f 
  src/org/apache/pig/data/SelfSpillBag.java 4e08b99 
  test/org/apache/pig/test/TestCombiner.java b2e81ac 

Diff: https://reviews.apache.org/r/43044/diff/


Testing
-------

With this patch, all tests in TestCombiner pass.


Thanks,

Pallavi Rao


Re: Review Request 43044: PIG-4766 Ensure GroupBy is optimized for all algebraic Operations

Posted by Pallavi Rao <pa...@inmobi.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43044/
-----------------------------------------------------------

(Updated Feb. 5, 2016, 4:25 a.m.)


Review request for pig, Xianda Ke, liyun zhang, Mohit Sabharwal, and Xuefu Zhang.


Changes
-------

Rebased patch


Bugs: PIG-4766
    https://issues.apache.org/jira/browse/PIG-4766


Repository: pig-git


Description
-------

PIG-4709 introduced Combiner optimization for Group By. However, the patch did not handle cases where constant/conditional expressions were used. It also did not handle limit.

This patch is to address those gaps.


Diffs (updated)
-----

  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/PORelationToExprProject.java 5fb49e2 
  src/org/apache/pig/backend/hadoop/executionengine/spark/converter/ReduceByConverter.java d4b521a 
  src/org/apache/pig/backend/hadoop/executionengine/spark/optimizer/CombinerOptimizer.java a05d009 
  src/org/apache/pig/backend/hadoop/executionengine/util/CombinerOptimizerUtil.java 5c0919f 
  test/org/apache/pig/newplan/logical/relational/TestLocationInPhysicalPlan.java 0e45434 
  test/org/apache/pig/test/TestCombiner.java b2e81ac 

Diff: https://reviews.apache.org/r/43044/diff/


Testing
-------

With this patch, all tests in TestCombiner pass.


Thanks,

Pallavi Rao


Re: Review Request 43044: PIG-4766 Ensure GroupBy is optimized for all algebraic Operations

Posted by Pallavi Rao <pa...@inmobi.com>.

> On Feb. 4, 2016, 8:56 a.m., kelly zhang wrote:
> > src/org/apache/pig/backend/hadoop/executionengine/spark/converter/ReduceByConverter.java, line 181
> > <https://reviews.apache.org/r/43044/diff/2/?file=1230607#file1230607line181>
> >
> >     Can we consider all tuples with a null key to be the same?
> >     
> >     I explained the details on the JIRA page.

Answered your question on the JIRA :-)


> On Feb. 4, 2016, 8:56 a.m., kelly zhang wrote:
> > src/org/apache/pig/data/SelfSpillBag.java, line 55
> > <https://reviews.apache.org/r/43044/diff/2/?file=1230610#file1230610line55>
> >
> >     This modification was already checked in as part of PIG-4611.

Oh! I missed that change. I will revert it. Thanks for pointing it out.


- Pallavi


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43044/#review117779
-----------------------------------------------------------


On Feb. 3, 2016, 6:23 a.m., Pallavi Rao wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43044/
> -----------------------------------------------------------
> 
> (Updated Feb. 3, 2016, 6:23 a.m.)
> 
> 
> Review request for pig, Xianda Ke, liyun zhang, Mohit Sabharwal, and Xuefu Zhang.
> 
> 
> Bugs: PIG-4766
>     https://issues.apache.org/jira/browse/PIG-4766
> 
> 
> Repository: pig-git
> 
> 
> Description
> -------
> 
> PIG-4709 introduced Combiner optimization for Group By. However, the patch did not handle cases where constant/conditional expressions were used. It also did not handle limit.
> 
> This patch is to address those gaps.
> 
> 
> Diffs
> -----
> 
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/PORelationToExprProject.java 5fb49e2 
>   src/org/apache/pig/backend/hadoop/executionengine/spark/converter/ReduceByConverter.java d4b521a 
>   src/org/apache/pig/backend/hadoop/executionengine/spark/optimizer/CombinerOptimizer.java a05d009 
>   src/org/apache/pig/backend/hadoop/executionengine/util/CombinerOptimizerUtil.java 5c0919f 
>   src/org/apache/pig/data/SelfSpillBag.java 4e08b99 
>   test/org/apache/pig/newplan/logical/relational/TestLocationInPhysicalPlan.java 0e45434 
>   test/org/apache/pig/test/TestCombiner.java b2e81ac 
> 
> Diff: https://reviews.apache.org/r/43044/diff/
> 
> 
> Testing
> -------
> 
> With this patch, all tests in TestCombiner pass.
> 
> 
> Thanks,
> 
> Pallavi Rao
> 
>


Re: Review Request 43044: PIG-4766 Ensure GroupBy is optimized for all algebraic Operations

Posted by kelly zhang <li...@intel.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43044/#review117779
-----------------------------------------------------------




src/org/apache/pig/backend/hadoop/executionengine/spark/converter/ReduceByConverter.java (line 181)
<https://reviews.apache.org/r/43044/#comment179047>

    Can we consider all tuples with a null key to be the same?
    
    I explained the details on the JIRA page.
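
    To make the question concrete, here is a small sketch of what "treating all null keys as the same" would mean in a reduce-by-key step. It is plain Python, not the ReduceByConverter code; the helper `reduce_by_key` and the sample pairs are hypothetical, and the sketch does not claim which behaviour the patch should have (that is discussed on the JIRA).

```python
# Hypothetical reduce-by-key helper (illustration only, not Pig or Spark code).

def reduce_by_key(pairs, merge):
    # Merge the values of all pairs that share an equal key.
    merged = {}
    for key, value in pairs:
        if key in merged:
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# None stands in for a null key. In Python, None == None, so every
# null-keyed pair falls into a single group here.
pairs = [(None, 1), ("x", 2), (None, 3), ("x", 4)]
sums = reduce_by_key(pairs, lambda a, b: a + b)
```

    In this sketch both null-keyed values are merged into one entry; if null keys were instead treated as distinct, each would remain its own group.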



src/org/apache/pig/data/SelfSpillBag.java (line 55)
<https://reviews.apache.org/r/43044/#comment179046>

    This modification was already checked in as part of PIG-4611.


- kelly zhang




Re: Review Request 43044: PIG-4766 Ensure GroupBy is optimized for all algebraic Operations

Posted by Pallavi Rao <pa...@inmobi.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43044/
-----------------------------------------------------------

(Updated Feb. 3, 2016, 6:23 a.m.)


Review request for pig, Xianda Ke, liyun zhang, Mohit Sabharwal, and Xuefu Zhang.


Changes
-------

Fixed some unit test failures in tests other than TestCombiner.


Bugs: PIG-4766
    https://issues.apache.org/jira/browse/PIG-4766


Repository: pig-git


Description
-------

PIG-4709 introduced Combiner optimization for Group By. However, the patch did not handle cases where constant/conditional expressions were used. It also did not handle limit.

This patch is to address those gaps.


Diffs (updated)
-----

  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/PORelationToExprProject.java 5fb49e2 
  src/org/apache/pig/backend/hadoop/executionengine/spark/converter/ReduceByConverter.java d4b521a 
  src/org/apache/pig/backend/hadoop/executionengine/spark/optimizer/CombinerOptimizer.java a05d009 
  src/org/apache/pig/backend/hadoop/executionengine/util/CombinerOptimizerUtil.java 5c0919f 
  src/org/apache/pig/data/SelfSpillBag.java 4e08b99 
  test/org/apache/pig/newplan/logical/relational/TestLocationInPhysicalPlan.java 0e45434 
  test/org/apache/pig/test/TestCombiner.java b2e81ac 

Diff: https://reviews.apache.org/r/43044/diff/


Testing
-------

With this patch, all tests in TestCombiner pass.


Thanks,

Pallavi Rao