Posted to dev@pig.apache.org by Rohini Palaniswamy <ro...@gmail.com> on 2014/09/14 07:20:38 UTC

Review Request 25617: PIG-4104: Accumulator UDF throws OOM in Tez

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25617/
-----------------------------------------------------------

Review request for pig, Cheolsoo Park and Daniel Dai.


Bugs: PIG-4104
    https://issues.apache.org/jira/browse/PIG-4104


Repository: pig


Description
-------

Use a separate TezAccumulativeTupleBuffer that iterates through the inputs and returns tuples in batches instead of making a full copy.
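A minimal sketch of the batching idea, assuming the values for one key arrive as a plain Java Iterator. BatchingValueBuffer, batchSize, hasNextBatch() and nextBatch() are hypothetical names for illustration, not Pig's TezAccumulativeTupleBuffer API. Values are pulled lazily and handed out a batch at a time, so nothing has to copy the whole group up front.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/**
 * Hypothetical sketch, not Pig's TezAccumulativeTupleBuffer: instead of
 * copying every value for a key into one bag up front, values are pulled
 * lazily from the input iterator and handed out batchSize at a time.
 */
class BatchingValueBuffer<T> {
    private final int batchSize;
    private final Iterator<T> input;

    BatchingValueBuffer(int batchSize, Iterator<T> input) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.input = input;
    }

    /** True while the underlying input still has values for this key. */
    boolean hasNextBatch() {
        return input.hasNext();
    }

    /** Returns up to batchSize values; the buffer retains nothing between calls. */
    List<T> nextBatch() {
        if (!input.hasNext()) {
            throw new NoSuchElementException();
        }
        List<T> batch = new ArrayList<>(batchSize);
        while (batch.size() < batchSize && input.hasNext()) {
            batch.add(input.next());
        }
        return batch;
    }
}
```

An Accumulator UDF could then receive one batch per accumulate() call, so peak memory is bounded by the batch size rather than by the size of the largest group.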


Diffs
-----

  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/PigConfiguration.java 1624398 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POForEach.java 1624398 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPackage.java 1624398 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/POShuffleTezLoad.java 1624398 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/util/AccumulatorOptimizerUtil.java 1624398 

Diff: https://reviews.apache.org/r/25617/diff/


Testing
-------

Ran TestAccumulator in the unit tests and the Accumulator and SecondarySort test groups in e2e; they all passed. Will run the full suite before committing.


Thanks,

Rohini Palaniswamy


Re: Review Request 25617: PIG-4104: Accumulator UDF throws OOM in Tez

Posted by Rohini Palaniswamy <ro...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25617/
-----------------------------------------------------------

(Updated Sept. 14, 2014, 5:56 a.m.)


Review request for pig, Cheolsoo Park and Daniel Dai.


Changes
-------

Reuse the same TezAccumulativeTupleBuffer for all input keys.


Bugs: PIG-4104
    https://issues.apache.org/jira/browse/PIG-4104


Repository: pig


Description
-------

Use a separate TezAccumulativeTupleBuffer that iterates through the inputs and returns tuples in batches instead of making a full copy.


Diffs (updated)
-----

  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/PigConfiguration.java 1624398 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POForEach.java 1624398 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPackage.java 1624398 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/POShuffleTezLoad.java 1624398 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/util/AccumulatorOptimizerUtil.java 1624398 

Diff: https://reviews.apache.org/r/25617/diff/


Testing
-------

Ran TestAccumulator in the unit tests and the Accumulator and SecondarySort test groups in e2e; they all passed. Will run the full suite before committing.


Thanks,

Rohini Palaniswamy


Re: Review Request 25617: PIG-4104: Accumulator UDF throws OOM in Tez

Posted by Rohini Palaniswamy <ro...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25617/#review53276
-----------------------------------------------------------



http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/POShuffleTezLoad.java
<https://reviews.apache.org/r/25617/#comment92868>

    Daniel,
       Would there be a problem if a single instance of TezAccumulativeTupleBuffer were reused across records, since the ArrayList bags can be cleared and the min key reset? My only concern is streaming. I am still not familiar with the internals of streaming, and I believe there were cases where copies of the data had to be made for streaming.
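A minimal sketch of the reuse pattern being asked about, assuming each input is a key-sorted iterator of key/value pairs. ReusableGroupBuffer, nextKey(), currentKey() and bag() are hypothetical names, not Pig's API. The per-input ArrayList bags are allocated once and cleared for every group, and the min key is reset and recomputed from the heads of the inputs.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not Pig's TezAccumulativeTupleBuffer): one instance is
 * reused for every key. Each input is a key-sorted iterator; for the current
 * minimum key, the matching values of each input go into that input's bag.
 */
class ReusableGroupBuffer<K extends Comparable<K>, V> {
    private final List<PeekingInput<K, V>> inputs = new ArrayList<>();
    private final List<List<V>> bags = new ArrayList<>();
    private K minKey;

    ReusableGroupBuffer(List<Iterator<Map.Entry<K, V>>> sources) {
        for (Iterator<Map.Entry<K, V>> s : sources) {
            inputs.add(new PeekingInput<>(s));
            bags.add(new ArrayList<>()); // allocated once, cleared per key
        }
    }

    /** Loads the next group; returns false when every input is exhausted. */
    boolean nextKey() {
        for (List<V> bag : bags) {
            bag.clear(); // reuse the bags instead of reallocating them
        }
        minKey = null; // reset the min key before scanning the input heads
        for (PeekingInput<K, V> in : inputs) {
            if (in.head != null && (minKey == null || in.head.getKey().compareTo(minKey) < 0)) {
                minKey = in.head.getKey();
            }
        }
        if (minKey == null) {
            return false;
        }
        for (int i = 0; i < inputs.size(); i++) {
            PeekingInput<K, V> in = inputs.get(i);
            while (in.head != null && in.head.getKey().compareTo(minKey) == 0) {
                bags.get(i).add(in.head.getValue());
                in.advance();
            }
        }
        return true;
    }

    K currentKey() { return minKey; }

    /** Returns the same list object on every call; its contents change per key. */
    List<V> bag(int input) { return bags.get(input); }

    /** Wraps an iterator so the next entry can be inspected before consuming it. */
    private static final class PeekingInput<K, V> {
        private final Iterator<Map.Entry<K, V>> it;
        Map.Entry<K, V> head;

        PeekingInput(Iterator<Map.Entry<K, V>> it) {
            this.it = it;
            advance();
        }

        void advance() { head = it.hasNext() ? it.next() : null; }
    }
}
```

Because bag(i) hands back the same list for every key, any consumer that keeps a reference past the next nextKey() call (the streaming case raised above) would see the contents cleared and would need to copy the bag first.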


- Rohini Palaniswamy


On Sept. 14, 2014, 5:20 a.m., Rohini Palaniswamy wrote:
> (Updated Sept. 14, 2014, 5:20 a.m.)