Posted to user@spark.apache.org by Srikanth <sr...@gmail.com> on 2016/09/08 19:31:44 UTC

"Job duration" and "Processing time" don't match

Hello,

I was looking at Spark streaming UI and noticed a big difference between
"Processing time" and "Job duration"

[image: Inline image 1]

Processing time/Output Op duration is shown as 50s, but the sum of all job
durations is ~25s.
What is causing this difference? Based on logs I know that the batch
actually took 50s.

[image: Inline image 2]

The job that takes most of the time is:
    joinRDD.toDS()
           .write.format("com.databricks.spark.csv")
           .mode(SaveMode.Append)
           .options(Map("mode" -> "DROPMALFORMED", "delimiter" -> "\t",
"header" -> "false"))
           .partitionBy("entityId", "regionId", "eventDate")
           .save(outputPath)

Removing SaveMode.Append really speeds things up, and the mismatch
between job duration and processing time also disappears.
I'm not able to explain what is causing this, though.
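One way I could try to narrow this down is to time the write call directly in the driver, independently of what the UI reports. A minimal sketch (plain Scala, no Spark dependency; the `Timed` helper is just a name I made up for illustration):

```scala
// Minimal timing wrapper (a generic sketch, not part of any Spark API)
// to bracket the write call and confirm where the wall-clock time goes.
object Timed {
  def timed[T](label: String)(body: => T): T = {
    val start = System.nanoTime()
    val result = body // run the wrapped block
    val elapsedMs = (System.nanoTime() - start) / 1e6
    println(f"$label took $elapsedMs%.1f ms")
    result
  }
}
```

With this in scope, the save above could be wrapped as `Timed.timed("csv write") { ... .save(outputPath) }` once with SaveMode.Append and once without, to compare the two directly from driver logs.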

Srikanth