Posted to dev@pig.apache.org by "Mohit Sabharwal (JIRA)" <ji...@apache.org> on 2015/05/13 03:59:01 UTC

[jira] [Commented] (PIG-4549) Set CROSS operation parallelism for Spark engine

    [ https://issues.apache.org/jira/browse/PIG-4549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541187#comment-14541187 ] 

Mohit Sabharwal commented on PIG-4549:
--------------------------------------

FYI: [~kellyzly], [~praveenr019], [~xuefuz]

This patch addresses regular CROSS, which uses the GFCross UDF, for the Spark engine.

It adds a ParallelismSetter visitor that visits each Spark operator in the Spark plan
and sets the key PigImplConstants.PIG_CROSS_PARALLELISM + "." + crossKey in the job conf.

The rest of the ParallelismSetter implementation is left as a future item.
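
To illustrate the key convention, here is a minimal, self-contained sketch. It is not the code from the patch. A plain Map stands in for the job conf, and the class name, the method names, the literal value of the key prefix, and the placeholder default are all assumptions made for illustration. Only the key shape (prefix + "." + crossKey) and the exception message come from the issue description.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CrossParallelismSketch {
    // Stand-in for PigImplConstants.PIG_CROSS_PARALLELISM; the literal value is assumed.
    static final String PIG_CROSS_PARALLELISM = "pig.cross.parallelism";

    // Placeholder only; the real default is whatever GFCross defines.
    static final int PLACEHOLDER_DEFAULT_PARALLELISM = 1;

    // Hypothetical setter side: write one entry per CROSS key into the conf.
    static void setCrossParallelism(Map<String, String> conf,
                                    List<String> crossKeys,
                                    int parallelism) {
        for (String crossKey : crossKeys) {
            conf.put(PIG_CROSS_PARALLELISM + "." + crossKey,
                     Integer.toString(parallelism));
        }
    }

    // Reader side, modeled on the GFCross check quoted in the issue description.
    static int readCrossParallelism(Map<String, String> conf, String crossKey)
            throws IOException {
        String s = conf.get(PIG_CROSS_PARALLELISM + "." + crossKey);
        if (s == null) {
            throw new IOException("Unable to get parallelism hint from job conf");
        }
        return Integer.parseInt(s);
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> conf = new HashMap<>();
        // "1" and "2" are made-up cross keys.
        setCrossParallelism(conf, Arrays.asList("1", "2"),
                            PLACEHOLDER_DEFAULT_PARALLELISM);
        System.out.println(readCrossParallelism(conf, "1"));
    }
}
```

In this sketch, a lookup for a cross key that was never set fails with the same IOException, which is the failure the patch is meant to avoid on Spark.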

> Set CROSS operation parallelism for Spark engine
> ------------------------------------------------
>
>                 Key: PIG-4549
>                 URL: https://issues.apache.org/jira/browse/PIG-4549
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>    Affects Versions: spark-branch
>            Reporter: Mohit Sabharwal
>            Assignee: Mohit Sabharwal
>             Fix For: spark-branch
>
>
> Spark engine should set parallelism to be used for CROSS operation by GFCross UDF.
> If not set, GFCross throws an exception:
> {code}
>                 String s = cfg.get(PigImplConstants.PIG_CROSS_PARALLELISM + "." + crossKey);
>                 if (s == null) {
>                     throw new IOException("Unable to get parallelism hint from job conf");
>                 }
> {code}
> Estimating parallelism for Spark engine is a TBD item. Until that is done, for CROSS to work, we should use the default parallelism value in GFCross.


