Posted to dev@pig.apache.org by "Xianda Ke (JIRA)" <ji...@apache.org> on 2017/03/16 01:33:42 UTC

[jira] [Updated] (PIG-4858) Implement Skewed join for spark engine

     [ https://issues.apache.org/jira/browse/PIG-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xianda Ke updated PIG-4858:
---------------------------
    Attachment: PIG-4858_4.patch

Hi [~kellyzly], please help review this patch when you are free.

1. Based on PIG-5044. Rewrote SparkCompiler.getSkewedJoinJob() to broadcast the sampling index. Some code is duplicated from SparkCompiler.getSamplingJob(), because SparkCompiler.getSamplingJob() is too big and needs to be refactored. I will file a new JIRA for this.

2. Merged the fix from PIG-3417.

3. Currently, skewed join does not support outer join. I will file a new JIRA for this.
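
For readers unfamiliar with the technique, here is a minimal sketch of the general idea behind a skewed inner join that uses a shared sampling index. It is plain Python, not Pig or Spark code, and it is not taken from the patch: the names build_skew_index and skewed_join, the threshold parameter, and the choice to spread hot keys over all partitions are assumptions made for illustration only. The dictionary returned by build_skew_index stands in for the broadcast sampling index.

```python
from collections import Counter, defaultdict


def build_skew_index(left_rows, num_partitions, threshold):
    """Hypothetical stand-in for the sampling index: map each hot key
    (seen at least `threshold` times) to the number of partitions it
    is spread over. A real engine would estimate this from a sample."""
    counts = Counter(key for key, _ in left_rows)
    return {k: num_partitions for k, c in counts.items() if c >= threshold}


def skewed_join(left_rows, right_rows, num_partitions=4, threshold=3):
    """Inner join of (key, value) rows that spreads hot keys of the
    left relation across partitions instead of sending them to one."""
    index = build_skew_index(left_rows, num_partitions, threshold)
    left_parts = defaultdict(list)
    right_parts = defaultdict(list)
    next_slot = Counter()

    # Left side: hot keys go round-robin over their partitions,
    # all other keys are hash-partitioned as in a regular join.
    for key, value in left_rows:
        if key in index:
            p = next_slot[key] % index[key]
            next_slot[key] += 1
        else:
            p = hash(key) % num_partitions
        left_parts[p].append((key, value))

    # Right side: rows with a hot key are replicated to every partition
    # that may hold matching left rows; other rows go to one partition.
    for key, value in right_rows:
        targets = range(index[key]) if key in index else [hash(key) % num_partitions]
        for p in targets:
            right_parts[p].append((key, value))

    # Each partition performs an ordinary local hash join.
    joined = []
    for p in range(num_partitions):
        lookup = defaultdict(list)
        for key, value in right_parts[p]:
            lookup[key].append(value)
        for key, lvalue in left_parts[p]:
            for rvalue in lookup.get(key, []):
                joined.append((key, lvalue, rvalue))
    return joined
```

Because each left row lands in exactly one partition and every right row with a hot key is copied to all partitions that key uses, the result matches a regular inner join while the work for a hot key is shared across partitions. This sketch covers inner joins only.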


> Implement Skewed join for spark engine
> --------------------------------------
>
>                 Key: PIG-4858
>                 URL: https://issues.apache.org/jira/browse/PIG-4858
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: Xianda Ke
>             Fix For: spark-branch
>
>         Attachments: PIG-4858_2.patch, PIG-4858_3.patch, PIG-4858_4.patch, PIG-4858.patch, SkewedJoinInSparkMode.pdf
>
>
> Now we use regular join to replace skewed join. It needs to be implemented.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)