Posted to dev@pig.apache.org by "Mohit Sabharwal (JIRA)" <ji...@apache.org> on 2015/05/07 07:30:00 UTC

[jira] [Commented] (PIG-4422) Implement visitMergeJoin in SparkCompiler

    [ https://issues.apache.org/jira/browse/PIG-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532035#comment-14532035 ] 

Mohit Sabharwal commented on PIG-4422:
--------------------------------------

fyi [~kellyzly], [~praveenr019], [~xuefuz]

The attached patch implements merge join in the Spark engine as a regular
join.
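To illustrate why a regular join can stand in for a merge join, here is a minimal sketch. It assumes a plain in-memory hash join as a simplified model of a "regular" join; the distributed, shuffle-based join the Spark engine actually runs is not shown. The class `HashJoinSketch`, the `Row` record and `hashJoin` are hypothetical names used only for illustration. Unlike a merge join, this approach does not need either input to be sorted on the join key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: an in-memory hash join as a simplified
// model of a "regular" join. Not Pig's or Spark's actual code.
public class HashJoinSketch {
    // A row is a (key, value) pair; inputs are joined on the key.
    record Row(int key, String value) {}

    // Inner equi-join: build a hash table from the right input, then probe it
    // with each left row. Neither input has to be sorted.
    static List<String> hashJoin(List<Row> left, List<Row> right) {
        Map<Integer, List<String>> build = new HashMap<>();
        for (Row r : right) {
            build.computeIfAbsent(r.key(), k -> new ArrayList<>()).add(r.value());
        }
        List<String> out = new ArrayList<>();
        for (Row l : left) {
            for (String rv : build.getOrDefault(l.key(), List.of())) {
                out.add(l.key() + ":" + l.value() + "," + rv);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Deliberately unsorted left input.
        List<Row> left = List.of(new Row(3, "c"), new Row(1, "a"), new Row(2, "b"));
        List<Row> right = List.of(new Row(2, "x"), new Row(3, "y"), new Row(3, "z"));
        System.out.println(hashJoin(left, right)); // [3:c,y, 3:c,z, 2:b,x]
    }
}
```

For an inner join, the set of output rows is the same as a merge join would produce; only the output order (and the cost profile) differs.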

There seem to be three flavors of Merge Join (aka Sort Merge Join) 
in Pig as described here: http://pig.apache.org/docs/r0.10.0/perf.html#merge-joins
 1) Inner join with at most 2 tables.
 2) Outer join (full, left, right) with at most 2 tables. Inner join with 3+ tables.
 3) Sparse merge join
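For reference, the core of a textbook sort-merge inner join of two tables (flavor 1) can be sketched as below. This is the generic algorithm, not Pig's merge join operator, which differs in detail; the names `SortMergeJoinSketch`, `Row` and `mergeJoin` are hypothetical. The sketch assumes both inputs are already sorted on the join key, which is what lets it produce the join in a single forward pass over each input.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a textbook sort-merge inner join.
// Precondition (assumed, not checked): both inputs are sorted by key.
public class SortMergeJoinSketch {
    record Row(int key, String value) {}

    static List<String> mergeJoin(List<Row> left, List<Row> right) {
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.size() && j < right.size()) {
            int lk = left.get(i).key();
            int rk = right.get(j).key();
            if (lk < rk) {
                i++; // left key is smaller: advance the left cursor
            } else if (lk > rk) {
                j++; // right key is smaller: advance the right cursor
            } else {
                // Equal keys: find the end of the run of this key on the right...
                int jEnd = j;
                while (jEnd < right.size() && right.get(jEnd).key() == lk) {
                    jEnd++;
                }
                // ...then pair every left row with this key against that run.
                while (i < left.size() && left.get(i).key() == lk) {
                    for (int k = j; k < jEnd; k++) {
                        out.add(lk + ":" + left.get(i).value() + "," + right.get(k).value());
                    }
                    i++;
                }
                j = jEnd;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> left = List.of(new Row(1, "a"), new Row(2, "b"), new Row(3, "c"));
        List<Row> right = List.of(new Row(2, "x"), new Row(3, "y"), new Row(3, "z"));
        System.out.println(mergeJoin(left, right)); // [2:b,x, 3:c,y, 3:c,z]
    }
}
```

The sortedness precondition is the key difference from a regular join: without it the cursors would skip matching rows, which is why a merge join depends on how its inputs are laid out.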

This patch addresses 1) only.

Both 2) and 3) require the input LoadFuncs to implement certain interfaces. Since the Spark
engine does not implement the merge join algorithm, it cannot take advantage of these
interfaces. As such, this patch disables those tests for now.

> Implement visitMergeJoin in SparkCompiler
> -----------------------------------------
>
>                 Key: PIG-4422
>                 URL: https://issues.apache.org/jira/browse/PIG-4422
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: Mohit Sabharwal
>             Fix For: spark-branch
>
>         Attachments: PIG-4422.patch
>
>
> In PIG-4374_6.patch, SparkCompiler#visitMergeJoin is marked "TODO".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)