Posted to issues@spark.apache.org by "Jork Zijlstra (JIRA)" <ji...@apache.org> on 2017/02/16 13:07:41 UTC

[jira] [Comment Edited] (SPARK-19628) Duplicate Spark jobs in 2.1.0

    [ https://issues.apache.org/jira/browse/SPARK-19628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15869880#comment-15869880 ] 

Jork Zijlstra edited comment on SPARK-19628 at 2/16/17 1:07 PM:
----------------------------------------------------------------

I have just attached a screenshot showing the duplicate jobs that appear when executing the example code given above.

The example code uses show(), but in our application we use collect(). Both seem to trigger the duplication.
The issue is that both jobs take time and are executed sequentially, so the execution time for the same action has doubled.
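
To quantify this, here is a minimal sketch (not part of the original repro; the CountJobs object, the jobCount counter, and the /tmp/some.orc path are placeholders) that registers a SparkListener and counts how many jobs a single show() launches. On 2.0.1 I would expect it to print 1; on 2.1.0 it should print 2 if the duplication described above occurs.

{code}
import org.apache.spark.scheduler.{SparkListener, SparkListenerJobStart}
import org.apache.spark.sql.SparkSession

object CountJobs {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local[4]")
      .appName("count jobs per action")
      .getOrCreate()

    // Count every job the scheduler starts.
    var jobCount = 0
    spark.sparkContext.addSparkListener(new SparkListener {
      override def onJobStart(jobStart: SparkListenerJobStart): Unit =
        jobCount += 1
    })

    spark.read.orc("/tmp/some.orc").show(20) // placeholder path, any ORC source works

    // The listener bus is asynchronous, so let it drain before reading the counter.
    Thread.sleep(1000)
    println(s"jobs started by one show(): $jobCount")
    spark.stop()
  }
}
{code}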


was (Author: jzijlstra):
I have just attached a screenshot showing the duplicate jobs that appear when executing the example code given above.

The example code uses show(), but in our application we use collect(). Both seem to trigger the duplication.
The issue is that both jobs take time, so the execution time for the same action has doubled.

> Duplicate Spark jobs in 2.1.0
> -----------------------------
>
>                 Key: SPARK-19628
>                 URL: https://issues.apache.org/jira/browse/SPARK-19628
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.1.0
>            Reporter: Jork Zijlstra
>             Fix For: 2.0.1
>
>         Attachments: spark2.0.1.png, spark2.1.0-examplecode.png, spark2.1.0.png
>
>
> After upgrading to Spark 2.1.0, we noticed that duplicate jobs are executed. Going back to Spark 2.0.1, they are gone again.
> {code}
> import org.apache.spark.sql._
> object DoubleJobs {
>   def main(args: Array[String]): Unit = {
>     System.setProperty("hadoop.home.dir", "/tmp")
>     val sparkSession: SparkSession = SparkSession.builder
>       .master("local[4]")
>       .appName("spark session example")
>       .config("spark.driver.maxResultSize", "6G")
>       .config("spark.sql.orc.filterPushdown", true)
>       .config("spark.sql.hive.metastorePartitionPruning", true)
>       .getOrCreate()
>     sparkSession.sqlContext.setConf("spark.sql.orc.filterPushdown", "true")
>     val paths = Seq(
>       "" // some orc source
>     )
>     def dataFrame(path: String): DataFrame = {
>       sparkSession.read.orc(path)
>     }
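>     // Each show() below is a single action; on 2.1.0 it appears as two duplicate jobs, on 2.0.1 as one.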
>     paths.foreach(path => {
>       dataFrame(path).show(20)
>     })
>   }
> }
> {code}


