Posted to user@spark.apache.org by alvarobrandon <al...@gmail.com> on 2016/05/30 10:48:00 UTC
DAG of Spark Sort application spanning two jobs
I've written a very simple Sort scala program with Spark.
import org.apache.spark.{SparkConf, SparkContext}

object Sort {
  def main(args: Array[String]): Unit = {
    if (args.length < 2) {
      System.err.println("Usage: Sort <data_file> <save_file>" +
        " [<slices>]")
      System.exit(1)
    }
    val conf = new SparkConf().setAppName("BigDataBench Sort")
    val spark = new SparkContext(conf)
    // JobPropertiesLogger is my own helper class for timing and logging.
    val logger = new JobPropertiesLogger(spark, "/home/abrandon/log.csv")
    val filename = args(0)
    val save_file = args(1)
    var splits = spark.defaultMinPartitions
    if (args.length > 2) {
      splits = args(2).toInt
    }
    val lines = spark.textFile(filename, splits)
    logger.start_timer()
    // Pair each line with a dummy value so we can sort by key.
    val data_map = lines.map(line => (line, 1))
    val result = data_map.sortByKey().map(line => line._1)
    logger.stop_timer()
    logger.write_log("Sort By Key: Sort App")
    result.saveAsTextFile(save_file)
    println("Result has been saved to: " + save_file)
  }
}
Now, since there is only one wide transformation ("sortByKey"), I was
expecting the application to span two stages. However, I see two jobs: one
stage in Job 0 and two stages in Job 1. Am I missing something? What I
don't understand is the first stage of the second job: it seems to do the
same work as the stage of Job 0.
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n27047/cbKDZ.png>
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n27047/GXIkS.png>
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n27047/H9LXF.png>
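For what it's worth, my current guess is that sortByKey has to pick the
partition boundaries before the shuffle, which would mean an extra pass
over the data. A rough pure-Scala sketch of that two-pass idea (this is
only my own illustration, not Spark's actual implementation; the names
`boundaries` and `rangeSort` are made up):

```scala
object RangeSortSketch {
  // Pass 1: look at the keys to pick partition boundaries.
  // (A real implementation would sample a random subset; here we use
  // all keys for simplicity. My guess is this pass is the extra job.)
  def boundaries(keys: Seq[String], numPartitions: Int): Seq[String] = {
    val sorted = keys.sorted
    (1 until numPartitions).map(i => sorted(i * sorted.size / numPartitions))
  }

  // Pass 2: route each key to the partition whose range contains it,
  // then sort each partition locally. Concatenated, the partitions
  // yield a globally sorted sequence.
  def rangeSort(keys: Seq[String], numPartitions: Int): Seq[String] = {
    val bs = boundaries(keys, numPartitions)
    val parts = keys.groupBy(k => bs.count(_ <= k)) // partition index
    (0 until numPartitions).flatMap(i => parts.getOrElse(i, Nil).sorted)
  }
}
```

If Spark does something like pass 1 as a separate job, that would explain
Job 0; but then I don't see why Job 1 would re-read the input.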
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/DAG-of-Spark-Sort-application-spanning-two-jobs-tp27047.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org