You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pig.apache.org by Praveen Rachabattuni <pr...@sigmoidanalytics.com> on 2015/03/13 12:18:28 UTC

Review Request 32036: PIG-4422: Implement visitMergeJoin in SparkCompiler

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32036/
-----------------------------------------------------------

Review request for pig, liyun zhang and Mohit Sabharwal.


Bugs: PIG-4422
    https://issues.apache.org/jira/browse/PIG-4422


Repository: pig-git


Description
-------

POMergeJoin operator is added as parent to load operators and a regular join is performed as part of the initial implementation and the MergeJoinConverter should later be modified to achieve the specialized join.

TODO:
- Perform join considering the input data is sorted.
- Fix failing test cases in TestMergeJoin


Diffs
-----

  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POMergeJoin.java 87249e4c9d6c890e8ac864c3faea32e3d6aa872d 
  src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java ca7a45f33320064e22628b40b34be7b9f7b07c36 
  src/org/apache/pig/backend/hadoop/executionengine/spark/converter/MergeJoinConverter.java PRE-CREATION 
  src/org/apache/pig/backend/hadoop/executionengine/spark/plan/SparkCompiler.java PRE-CREATION 

Diff: https://reviews.apache.org/r/32036/diff/


Testing
-------

Tested TestMergeJoin and we now have all tests passing except the following:
- testMergeJoinWithCommaSeperatedFilePaths
- testMergeJoinEmptyIndex
- testMergeJoinOutPipeline
- testExpressionFail


Thanks,

Praveen Rachabattuni


Re: Review Request 32036: PIG-4422: Implement visitMergeJoin in SparkCompiler

Posted by Mohit Sabharwal <mo...@cloudera.com>.

> On March 27, 2015, 8:02 p.m., Mohit Sabharwal wrote:
> > Thanks, Praveen! Looks good to me. I have some comments about error handling.

Any idea why some merge join tests are still failing after this patch ?


- Mohit


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32036/#review78099
-----------------------------------------------------------


On March 13, 2015, 11:18 a.m., Praveen Rachabattuni wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/32036/
> -----------------------------------------------------------
> 
> (Updated March 13, 2015, 11:18 a.m.)
> 
> 
> Review request for pig, liyun zhang and Mohit Sabharwal.
> 
> 
> Bugs: PIG-4422
>     https://issues.apache.org/jira/browse/PIG-4422
> 
> 
> Repository: pig-git
> 
> 
> Description
> -------
> 
> POMergeJoin operator is added as parent to load operators and a regular join is performed as part of the initial implementation and the MergeJoinConverter should later be modified to achieve the specialized join.
> 
> TODO:
> - Perform join considering the input data is sorted.
> - Fix failing test cases in TestMergeJoin
> 
> 
> Diffs
> -----
> 
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POMergeJoin.java 87249e4c9d6c890e8ac864c3faea32e3d6aa872d 
>   src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java ca7a45f33320064e22628b40b34be7b9f7b07c36 
>   src/org/apache/pig/backend/hadoop/executionengine/spark/converter/MergeJoinConverter.java PRE-CREATION 
>   src/org/apache/pig/backend/hadoop/executionengine/spark/plan/SparkCompiler.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/32036/diff/
> 
> 
> Testing
> -------
> 
> Tested TestMergeJoin and we now have all tests passing except the following:
> - testMergeJoinWithCommaSeperatedFilePaths
> - testMergeJoinEmptyIndex
> - testMergeJoinOutPipeline
> - testExpressionFail
> 
> 
> Thanks,
> 
> Praveen Rachabattuni
> 
>


Re: Review Request 32036: PIG-4422: Implement visitMergeJoin in SparkCompiler

Posted by Mohit Sabharwal <mo...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32036/#review78099
-----------------------------------------------------------


Thanks, Praveen! Looks good to me. I have some comments about error handling.


src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POMergeJoin.java
<https://reviews.apache.org/r/32036/#comment126549>

    you may want to leave it private, and add a public getter instead. Also, pig-core changes like this needs to go in a separate patch, right ?



src/org/apache/pig/backend/hadoop/executionengine/spark/converter/MergeJoinConverter.java
<https://reviews.apache.org/r/32036/#comment126545>

    We should check the return status for lrOut. Plus, detaching the input after getNextTuple() call is a good practice:
    
    See POMergeJoin.extractKeysFromTuple():
    
    lr.detachInput();
    if(lrOut.returnStatus!=POStatus.STATUS_OK){
                int errCode = 2167;
                String errMsg = "LocalRearrange used to extract keys from tuple isn't configured correctly";
                throw new ExecException(...);
    }



src/org/apache/pig/backend/hadoop/executionengine/spark/converter/MergeJoinConverter.java
<https://reviews.apache.org/r/32036/#comment126546>

    Let's not eat up these exceptions. Please log an error and re-throw the exception.
    
    LOG.error(msg + e, e)
    throw new ExecException(msg, e)



src/org/apache/pig/backend/hadoop/executionengine/spark/converter/MergeJoinConverter.java
<https://reviews.apache.org/r/32036/#comment126548>

    Same here. Log and re-throw.



src/org/apache/pig/backend/hadoop/executionengine/spark/plan/SparkCompiler.java
<https://reviews.apache.org/r/32036/#comment126550>

    i never quite understood what these error codes mean or a central place where these are defined...


- Mohit Sabharwal


On March 13, 2015, 11:18 a.m., Praveen Rachabattuni wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/32036/
> -----------------------------------------------------------
> 
> (Updated March 13, 2015, 11:18 a.m.)
> 
> 
> Review request for pig, liyun zhang and Mohit Sabharwal.
> 
> 
> Bugs: PIG-4422
>     https://issues.apache.org/jira/browse/PIG-4422
> 
> 
> Repository: pig-git
> 
> 
> Description
> -------
> 
> POMergeJoin operator is added as parent to load operators and a regular join is performed as part of the initial implementation and the MergeJoinConverter should later be modified to achieve the specialized join.
> 
> TODO:
> - Perform join considering the input data is sorted.
> - Fix failing test cases in TestMergeJoin
> 
> 
> Diffs
> -----
> 
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POMergeJoin.java 87249e4c9d6c890e8ac864c3faea32e3d6aa872d 
>   src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java ca7a45f33320064e22628b40b34be7b9f7b07c36 
>   src/org/apache/pig/backend/hadoop/executionengine/spark/converter/MergeJoinConverter.java PRE-CREATION 
>   src/org/apache/pig/backend/hadoop/executionengine/spark/plan/SparkCompiler.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/32036/diff/
> 
> 
> Testing
> -------
> 
> Tested TestMergeJoin and we now have all tests passing except the following:
> - testMergeJoinWithCommaSeperatedFilePaths
> - testMergeJoinEmptyIndex
> - testMergeJoinOutPipeline
> - testExpressionFail
> 
> 
> Thanks,
> 
> Praveen Rachabattuni
> 
>