You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pig.apache.org by Praveen Rachabattuni <pr...@sigmoidanalytics.com> on 2015/03/13 12:18:28 UTC
Review Request 32036: PIG-4422: Implement visitMergeJoin in
SparkCompiler
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32036/
-----------------------------------------------------------
Review request for pig, liyun zhang and Mohit Sabharwal.
Bugs: PIG-4422
https://issues.apache.org/jira/browse/PIG-4422
Repository: pig-git
Description
-------
POMergeJoin operator is added as parent to load operators and a regular join is performed as part of the initial implementation and the MergeJoinConverter should later be modified to achieve the specialized join.
TODO:
- Perform join considering the input data is sorted.
- Fix failing test cases in TestMergeJoin
Diffs
-----
src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POMergeJoin.java 87249e4c9d6c890e8ac864c3faea32e3d6aa872d
src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java ca7a45f33320064e22628b40b34be7b9f7b07c36
src/org/apache/pig/backend/hadoop/executionengine/spark/converter/MergeJoinConverter.java PRE-CREATION
src/org/apache/pig/backend/hadoop/executionengine/spark/plan/SparkCompiler.java PRE-CREATION
Diff: https://reviews.apache.org/r/32036/diff/
Testing
-------
Tested TestMergeJoin and we now have all tests passing except the following:
- testMergeJoinWithCommaSeperatedFilePaths
- testMergeJoinEmptyIndex
- testMergeJoinOutPipeline
- testExpressionFail
Thanks,
Praveen Rachabattuni
Re: Review Request 32036: PIG-4422: Implement visitMergeJoin in
SparkCompiler
Posted by Mohit Sabharwal <mo...@cloudera.com>.
> On March 27, 2015, 8:02 p.m., Mohit Sabharwal wrote:
> > Thanks, Praveen! Looks good to me. I have some comments about error handling.
Any idea why some merge join tests are still failing after this patch ?
- Mohit
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32036/#review78099
-----------------------------------------------------------
On March 13, 2015, 11:18 a.m., Praveen Rachabattuni wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/32036/
> -----------------------------------------------------------
>
> (Updated March 13, 2015, 11:18 a.m.)
>
>
> Review request for pig, liyun zhang and Mohit Sabharwal.
>
>
> Bugs: PIG-4422
> https://issues.apache.org/jira/browse/PIG-4422
>
>
> Repository: pig-git
>
>
> Description
> -------
>
> POMergeJoin operator is added as parent to load operators and a regular join is performed as part of the initial implementation and the MergeJoinConverter should later be modified to achieve the specialized join.
>
> TODO:
> - Perform join considering the input data is sorted.
> - Fix failing test cases in TestMergeJoin
>
>
> Diffs
> -----
>
> src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POMergeJoin.java 87249e4c9d6c890e8ac864c3faea32e3d6aa872d
> src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java ca7a45f33320064e22628b40b34be7b9f7b07c36
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/MergeJoinConverter.java PRE-CREATION
> src/org/apache/pig/backend/hadoop/executionengine/spark/plan/SparkCompiler.java PRE-CREATION
>
> Diff: https://reviews.apache.org/r/32036/diff/
>
>
> Testing
> -------
>
> Tested TestMergeJoin and we now have all tests passing except the following:
> - testMergeJoinWithCommaSeperatedFilePaths
> - testMergeJoinEmptyIndex
> - testMergeJoinOutPipeline
> - testExpressionFail
>
>
> Thanks,
>
> Praveen Rachabattuni
>
>
Re: Review Request 32036: PIG-4422: Implement visitMergeJoin in
SparkCompiler
Posted by Mohit Sabharwal <mo...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32036/#review78099
-----------------------------------------------------------
Thanks, Praveen! Looks good to me. I have some comments about error handling.
src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POMergeJoin.java
<https://reviews.apache.org/r/32036/#comment126549>
you may want to leave it private, and add a public getter instead. Also, pig-core changes like this needs to go in a separate patch, right ?
src/org/apache/pig/backend/hadoop/executionengine/spark/converter/MergeJoinConverter.java
<https://reviews.apache.org/r/32036/#comment126545>
We should check the return status for lrOut. Plus, detaching the input after getNextTuple() call is a good practice:
See POMergeJoin.extractKeysFromTuple():
lr.detachInput();
if(lrOut.returnStatus!=POStatus.STATUS_OK){
int errCode = 2167;
String errMsg = "LocalRearrange used to extract keys from tuple isn't configured correctly";
throw new ExecException(...);
}
src/org/apache/pig/backend/hadoop/executionengine/spark/converter/MergeJoinConverter.java
<https://reviews.apache.org/r/32036/#comment126546>
Let's not eat up these exceptions. Please log an error and re-throw the exception.
LOG.error(msg + e, e)
throw new ExecException(msg, e)
src/org/apache/pig/backend/hadoop/executionengine/spark/converter/MergeJoinConverter.java
<https://reviews.apache.org/r/32036/#comment126548>
Same here. Log and re-throw.
src/org/apache/pig/backend/hadoop/executionengine/spark/plan/SparkCompiler.java
<https://reviews.apache.org/r/32036/#comment126550>
i never quite understood what these error codes mean or a central place where these are defined...
- Mohit Sabharwal
On March 13, 2015, 11:18 a.m., Praveen Rachabattuni wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/32036/
> -----------------------------------------------------------
>
> (Updated March 13, 2015, 11:18 a.m.)
>
>
> Review request for pig, liyun zhang and Mohit Sabharwal.
>
>
> Bugs: PIG-4422
> https://issues.apache.org/jira/browse/PIG-4422
>
>
> Repository: pig-git
>
>
> Description
> -------
>
> POMergeJoin operator is added as parent to load operators and a regular join is performed as part of the initial implementation and the MergeJoinConverter should later be modified to achieve the specialized join.
>
> TODO:
> - Perform join considering the input data is sorted.
> - Fix failing test cases in TestMergeJoin
>
>
> Diffs
> -----
>
> src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POMergeJoin.java 87249e4c9d6c890e8ac864c3faea32e3d6aa872d
> src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java ca7a45f33320064e22628b40b34be7b9f7b07c36
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/MergeJoinConverter.java PRE-CREATION
> src/org/apache/pig/backend/hadoop/executionengine/spark/plan/SparkCompiler.java PRE-CREATION
>
> Diff: https://reviews.apache.org/r/32036/diff/
>
>
> Testing
> -------
>
> Tested TestMergeJoin and we now have all tests passing except the following:
> - testMergeJoinWithCommaSeperatedFilePaths
> - testMergeJoinEmptyIndex
> - testMergeJoinOutPipeline
> - testExpressionFail
>
>
> Thanks,
>
> Praveen Rachabattuni
>
>