Posted to user@spark.apache.org by Srikanth <sr...@gmail.com> on 2016/09/08 19:31:44 UTC
"Job duration" and "Processing time" don't match
Hello,
I was looking at Spark streaming UI and noticed a big difference between
"Processing time" and "Job duration"
[image: Inline image 1]
The Processing time / Output Op duration is shown as 50s, but the sum of all
job durations is ~25s.
What is causing this difference? Based on the logs, I know the batch
actually took 50s.
[image: Inline image 2]
The job that takes most of the time is:
joinRDD.toDS()
  .write.format("com.databricks.spark.csv")
  .mode(SaveMode.Append)
  .options(Map("mode" -> "DROPMALFORMED", "delimiter" -> "\t",
               "header" -> "false"))
  .partitionBy("entityId", "regionId", "eventDate")
  .save(outputPath)
Removing SaveMode.Append really speeds things up, and the mismatch
between job duration and processing time disappears.
I'm not able to explain what is causing this, though.
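For reference, a minimal sketch of the faster variant I tested, assuming the
same joinRDD and outputPath as above; the only change is dropping the
explicit SaveMode.Append (so the writer falls back to Spark's default
SaveMode.ErrorIfExists):

  // Same write as above, minus SaveMode.Append. In my test this ran in
  // roughly the job-duration sum, and the Output Op duration matched.
  // Note: with the default ErrorIfExists mode, a later batch writing to
  // an existing outputPath would fail, so this is only a diagnostic.
  joinRDD.toDS()
    .write.format("com.databricks.spark.csv")
    .options(Map("mode" -> "DROPMALFORMED", "delimiter" -> "\t",
                 "header" -> "false"))
    .partitionBy("entityId", "regionId", "eventDate")
    .save(outputPath)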
Srikanth