Posted to dev@pig.apache.org by "Xianda Ke (JIRA)" <ji...@apache.org> on 2017/03/16 01:33:42 UTC
[jira] [Updated] (PIG-4858) Implement Skewed join for spark engine
[ https://issues.apache.org/jira/browse/PIG-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xianda Ke updated PIG-4858:
---------------------------
Attachment: PIG-4858_4.patch
Hi [~kellyzly], please help review this patch when you are free.
1. Based on PIG-5044: rewrote SparkCompiler.getSkewedJoinJob() to broadcast the sampling index. Some code is duplicated from SparkCompiler.getSamplingJob() because that method is too big and needs to be refactored. I will file a new JIRA for this.
2. Merged the fix from PIG-3417.
3. Currently, skewed join does not support outer join. I will file a new JIRA for this.
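For reviewers who want the general idea behind broadcasting a sampling index, here is a minimal sketch in plain Python. It is not code from the patch and does not use Pig or Spark APIs. All names (build_skew_index, skewed_join, default_partition, skew_threshold) are hypothetical, and the way partitions are assigned to a skewed key is an assumed heuristic. It assumes the usual skewed-join scheme: rows of the sampled (left) input that carry a skewed key are spread round-robin over several partitions, and matching rows of the other input are copied to each of those partitions. Inner join only.

```python
import zlib
from collections import Counter, defaultdict


def default_partition(key, num_partitions):
    # Deterministic stand-in for a hash partitioner (assumption, not Pig's).
    return zlib.crc32(repr(key).encode("utf-8")) % num_partitions


def build_skew_index(sampled_keys, num_partitions, skew_threshold):
    """Map each frequent key in the sample to the partitions it is spread over."""
    counts = Counter(sampled_keys)
    index, next_start = {}, 0
    for key in sorted(counts):
        n = counts[key]
        if n >= skew_threshold:
            # Assumed heuristic: one partition per `skew_threshold` sampled rows.
            span = min(num_partitions, -(-n // skew_threshold))
            index[key] = [(next_start + i) % num_partitions for i in range(span)]
            next_start = (next_start + span) % num_partitions
    return index


def skewed_join(left, right, index, num_partitions):
    """Inner join of (key, value) rows; `index` plays the role of the broadcast."""
    partitions = defaultdict(lambda: ([], []))
    cursors = Counter()
    for key, value in left:
        targets = index.get(key)
        if targets is None:
            p = default_partition(key, num_partitions)
        else:
            # Skewed key: spread left rows round-robin over its partitions.
            p = targets[cursors[key] % len(targets)]
            cursors[key] += 1
        partitions[p][0].append((key, value))
    for key, value in right:
        # Skewed key: copy the right row to every partition holding that key.
        targets = index.get(key) or [default_partition(key, num_partitions)]
        for p in targets:
            partitions[p][1].append((key, value))
    joined = []
    for left_rows, right_rows in partitions.values():
        for lk, lv in left_rows:
            for rk, rv in right_rows:
                if lk == rk:
                    joined.append((lk, lv, rv))
    return joined
```

Each left row lands in exactly one partition, so copying the right rows does not create duplicate matches; the output has the same rows as a regular inner join, only distributed differently.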
> Implement Skewed join for spark engine
> --------------------------------------
>
> Key: PIG-4858
> URL: https://issues.apache.org/jira/browse/PIG-4858
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: liyunzhang_intel
> Assignee: Xianda Ke
> Fix For: spark-branch
>
> Attachments: PIG-4858_2.patch, PIG-4858_3.patch, PIG-4858_4.patch, PIG-4858.patch, SkewedJoinInSparkMode.pdf
>
>
> Currently we use a regular join in place of a skewed join. We need to implement skewed join.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)