Posted to dev@pig.apache.org by "Mohit Sabharwal (JIRA)" <ji...@apache.org> on 2015/05/13 03:59:01 UTC
[jira] [Commented] (PIG-4549) Set CROSS operation parallelism for Spark engine
[ https://issues.apache.org/jira/browse/PIG-4549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541187#comment-14541187 ]
Mohit Sabharwal commented on PIG-4549:
--------------------------------------
FYI: [~kellyzly], [~praveenr019], [~xuefuz]
This patch addresses regular CROSS, which uses the GFCross UDF, for the Spark engine.
It adds a ParallelismSetter visitor that visits each Spark operator in the Spark plan
and sets the key PigImplConstants.PIG_CROSS_PARALLELISM + "." + crossKey.
The rest of the ParallelismSetter implementation is a future item.
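
For illustration only, here is a minimal, self-contained sketch of the idea, not the code in the patch. A plain Map stands in for the job conf, the Operator class and both method names are hypothetical, and the string value of the constant is a placeholder rather than a confirmed value from PigImplConstants. The read side mirrors the GFCross lookup quoted in the issue description.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CrossParallelismSketch {
    // Placeholder for PigImplConstants.PIG_CROSS_PARALLELISM (value assumed).
    static final String PIG_CROSS_PARALLELISM = "pig.cross.parallelism";

    // Hypothetical stand-in for a Spark operator that records its CROSS keys.
    static final class Operator {
        final List<String> crossKeys;

        Operator(List<String> crossKeys) {
            this.crossKeys = crossKeys;
        }
    }

    // Visit every operator in the plan and write one conf entry per cross key.
    static void setCrossParallelism(List<Operator> plan,
                                    Map<String, String> conf,
                                    int parallelism) {
        for (Operator op : plan) {
            for (String crossKey : op.crossKeys) {
                conf.put(PIG_CROSS_PARALLELISM + "." + crossKey,
                         Integer.toString(parallelism));
            }
        }
    }

    // Read side, shaped like the GFCross snippet: fail if the key is absent.
    static int readCrossParallelism(Map<String, String> conf, String crossKey)
            throws IOException {
        String s = conf.get(PIG_CROSS_PARALLELISM + "." + crossKey);
        if (s == null) {
            throw new IOException("Unable to get parallelism hint from job conf");
        }
        return Integer.parseInt(s);
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> conf = new HashMap<>();
        List<Operator> plan = Arrays.asList(
                new Operator(Arrays.asList("cross1")),
                new Operator(Arrays.asList("cross2", "cross3")));

        // The parallelism passed here is an arbitrary example value.
        setCrossParallelism(plan, conf, 4);
        System.out.println(readCrossParallelism(conf, "cross2"));
    }
}
```

Writing one entry per cross key keeps the lookup in GFCross unchanged; only the value passed to the setter would need to change once parallelism estimation for the Spark engine exists.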
> Set CROSS operation parallelism for Spark engine
> ------------------------------------------------
>
> Key: PIG-4549
> URL: https://issues.apache.org/jira/browse/PIG-4549
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Affects Versions: spark-branch
> Reporter: Mohit Sabharwal
> Assignee: Mohit Sabharwal
> Fix For: spark-branch
>
>
> The Spark engine should set the parallelism to be used for the CROSS operation by the GFCross UDF.
> If not set, GFCross throws an exception:
> {code}
> String s = cfg.get(PigImplConstants.PIG_CROSS_PARALLELISM + "." + crossKey);
> if (s == null) {
>     throw new IOException("Unable to get parallelism hint from job conf");
> }
> {code}
> Estimating parallelism for the Spark engine is still to be done. Until then, for CROSS to work, we should use the default parallelism value in GFCross.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)