Posted to dev@pig.apache.org by "Rohini Palaniswamy (JIRA)" <ji...@apache.org> on 2014/02/03 23:19:07 UTC

[jira] [Assigned] (PIG-3727) Fix split + skewed join

     [ https://issues.apache.org/jira/browse/PIG-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy reassigned PIG-3727:
---------------------------------------

    Assignee: Rohini Palaniswamy  (was: Cheolsoo Park)

We also need to add a ONE_ONE edge and PoIdentityInOutTez, similar to PIG-3732. Without them, Orderby with the roundrobin partitioner took far longer than MR (I aborted the run when it had already taken 30 minutes longer than MR and still had the last 40 reducers pending).

> Fix split + skewed join
> -----------------------
>
>                 Key: PIG-3727
>                 URL: https://issues.apache.org/jira/browse/PIG-3727
>             Project: Pig
>          Issue Type: Sub-task
>          Components: tez
>    Affects Versions: tez-branch
>            Reporter: Cheolsoo Park
>            Assignee: Rohini Palaniswamy
>             Fix For: tez-branch
>
>
> The e2e SkewedJoin_6 test runs the following query:
> {code}
> a = load ':INPATH:/singlefile/studenttab10k';
> b = filter a by $1 > 25;
> c = join a by $0, b by $0 using 'skewed' parallel 7;
> store c into ':OUTPATH:';
> {code}
> Currently, this fails with a compilation error in TezCompiler. Because a feeds both the filter and the join, a POSplit is inserted between the load and the join, and visitSkewedJoin() doesn't handle it.


