Posted to dev@pig.apache.org by Mark Wagner <wa...@gmail.com> on 2013/10/16 21:12:07 UTC

Review Request 14679: Initial implementation of PigProcessor

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14679/
-----------------------------------------------------------

Review request for pig, Cheolsoo Park, Daniel Dai, and Rohini Palaniswamy.


Bugs: PIG-3521
    https://issues.apache.org/jira/browse/PIG-3521


Repository: pig-git


Description
-------

This patch adds the PigProcessor and related changes. The current patch supports MR* jobs.

* Updates the Tez dependency to match Tez's trunk
* Adds PigProcessor, which roughly follows the existing Mappers and Reducers in Pig.
* The handling of input has been factored out of the PigProcessor into a new interface: InputHandler. Two implementations of InputHandler have been added: FileInputHandler and ShuffledInputHandler.
* Makes changes to TezDagBuilder to serialize and ship the necessary information from the frontend. These changes are mostly inspired by/stolen from the JobControlCompiler.
* Adds a TezPOPackageAnnotator which is analogous to the POPackageAnnotator, but for Tez.
* Fixes a problem with edge creation in the TezDagBuilder.
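
To illustrate the InputHandler split described above, here is a minimal sketch of the idea, not the code in the patch. Every name in it (InputHandlerSketch, FileInputSketch, ShuffledInputSketch, ProcessorSketch, next(), current()) is hypothetical, and records are plain strings rather than Pig tuples. It assumes only what the description says: the processor reads through one interface, and file-backed and shuffled input are two implementations of it.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical interface; the real InputHandler in the patch may look different.
interface InputHandlerSketch {
    /** Advances to the next record; returns false once input is exhausted. */
    boolean next();

    /** Returns the current record, rendered as a string for illustration. */
    String current();
}

// Stand-in for file-backed input: records are handed over in their original order.
final class FileInputSketch implements InputHandlerSketch {
    private final Iterator<String> lines;
    private String current;

    FileInputSketch(List<String> lines) {
        this.lines = lines.iterator();
    }

    public boolean next() {
        if (!lines.hasNext()) {
            return false;
        }
        current = lines.next();
        return true;
    }

    public String current() {
        return current;
    }
}

// Stand-in for shuffled input: values are grouped by key, and keys arrive sorted.
final class ShuffledInputSketch implements InputHandlerSketch {
    private final Iterator<Map.Entry<String, List<String>>> groups;
    private String current;

    ShuffledInputSketch(List<String[]> keyValuePairs) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] kv : keyValuePairs) {
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        this.groups = grouped.entrySet().iterator();
    }

    public boolean next() {
        if (!groups.hasNext()) {
            return false;
        }
        Map.Entry<String, List<String>> group = groups.next();
        current = group.getKey() + "=" + group.getValue();
        return true;
    }

    public String current() {
        return current;
    }
}

// The processing loop is the same no matter where the input comes from.
final class ProcessorSketch {
    static List<String> drain(InputHandlerSketch input) {
        List<String> out = new ArrayList<>();
        while (input.next()) {
            out.add(input.current());
        }
        return out;
    }
}
```

In this sketch the first stage of a job would be handed a FileInputSketch and later stages a ShuffledInputSketch, while ProcessorSketch.drain stays unchanged. That separation is the point of factoring input handling out of the processor.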


Diffs
-----

  ivy.xml c603def 
  src/org/apache/pig/backend/hadoop/executionengine/tez/FileInputHandler.java PRE-CREATION 
  src/org/apache/pig/backend/hadoop/executionengine/tez/InputHandler.java PRE-CREATION 
  src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java 6724f2b 
  src/org/apache/pig/backend/hadoop/executionengine/tez/ShuffledInputHandler.java PRE-CREATION 
  src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java 48c0955 
  src/org/apache/pig/backend/hadoop/executionengine/tez/TezJobControlCompiler.java 05b0c54 
  src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java 4cc9ab4 
  src/org/apache/pig/backend/hadoop/executionengine/tez/TezPOPackageAnnotator.java PRE-CREATION 

Diff: https://reviews.apache.org/r/14679/diff/


Testing
-------

Only integration testing has been done. Jobs with 1, 2, and 3 stages have been executed successfully. I'll be adding unit tests.


Thanks,

Mark Wagner


Re: Review Request 14679: Initial implementation of PigProcessor

Posted by Daniel Dai <da...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14679/#review27104
-----------------------------------------------------------

Ship it!


Simple load/store works for me (with some minor fixes, although the frontend throws an exception after the job finishes). I'm still trying complex queries, but we can commit this patch first and fix the remaining issues on top of it.

- Daniel Dai

