Posted to dev@pig.apache.org by "Xuefu Zhang (JIRA)" <ji...@apache.org> on 2015/04/29 02:46:07 UTC

[jira] [Resolved] (PIG-4518) SparkOperator should correspond to complete Spark job

     [ https://issues.apache.org/jira/browse/PIG-4518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuefu Zhang resolved PIG-4518.
------------------------------
    Resolution: Fixed

Committed to Spark branch. Thanks, Mohit.

> SparkOperator should correspond to complete Spark job
> -----------------------------------------------------
>
>                 Key: PIG-4518
>                 URL: https://issues.apache.org/jira/browse/PIG-4518
>             Project: Pig
>          Issue Type: Bug
>          Components: spark
>            Reporter: Mohit Sabharwal
>            Assignee: Mohit Sabharwal
>             Fix For: spark-branch
>
>         Attachments: PIG-4518.1.patch, PIG-4518.patch
>
>
> SparkPlan, which was added in PIG-4374, creates a new SparkOperator for every shuffle boundary (denoted by the presence of POGlobalRearrange in the corresponding physical plan).  This is unnecessary for the Spark engine, since it relies on Spark to do the shuffle (using groupBy(), reduceByKey() and CoGroupRDD) and does not need to explicitly identify "map" and "reduce" operations.
> It is also cleaner if a single SparkOperator represents a single complete Spark job.
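For illustration only (this is not the actual patch, and the function and operator names below are hypothetical stand-ins, not Pig source), the change described above can be thought of as dropping the split-at-shuffle rule when compiling the physical plan: instead of starting a new SparkOperator at every POGlobalRearrange, the whole pipeline compiles to one operator and Spark's own primitives perform the shuffle. A minimal Python sketch:

```python
# Hypothetical sketch of the plan-splitting difference described in PIG-4518.
# Operator names mimic Pig physical operators; this is not Pig source code.

SHUFFLE_OP = "POGlobalRearrange"

def split_at_shuffles(ops):
    """Old behavior: cut a new SparkOperator at every shuffle boundary."""
    segments, current = [], []
    for op in ops:
        current.append(op)
        if op == SHUFFLE_OP:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments

def single_operator(ops):
    """New behavior: one SparkOperator covers the complete Spark job;
    the shuffle is left to Spark (groupBy()/reduceByKey()/CoGroupRDD)."""
    return [list(ops)]

# A toy physical plan with one shuffle boundary:
plan = ["POLoad", "POLocalRearrange", "POGlobalRearrange",
        "POPackage", "POForEach", "POStore"]

print(len(split_at_shuffles(plan)))  # old: 2 operators (split at the shuffle)
print(len(single_operator(plan)))    # new: 1 operator for the whole job
```

Under the new scheme the shuffle boundary still exists at runtime inside Spark, but it no longer forces an operator split in Pig's SparkPlan.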



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)