Posted to user@spark.apache.org by alvarobrandon <al...@gmail.com> on 2016/05/30 10:48:00 UTC

DAG of Spark Sort application spanning two jobs

I've written a very simple Sort program in Scala with Spark.

import org.apache.spark.{SparkConf, SparkContext}

// JobPropertiesLogger is a custom helper class, not part of Spark (not shown here).
object Sort {

    def main(args: Array[String]): Unit = {
        if (args.length < 2) {
            System.err.println("Usage: Sort <data_file> <save_file> [<slices>]")
            System.exit(1)
        }

        val conf = new SparkConf().setAppName("BigDataBench Sort")
        val spark = new SparkContext(conf)
        val logger = new JobPropertiesLogger(spark, "/home/abrandon/log.csv")
        val filename = args(0)
        val save_file = args(1)
        // Use the optional third argument as the number of input partitions.
        val splits = if (args.length > 2) args(2).toInt else spark.defaultMinPartitions

        val lines = spark.textFile(filename, splits)
        logger.start_timer()
        // Key each line by its own content so it can be sorted with sortByKey.
        val data_map = lines.map(line => (line, 1))

        // Sort by key, then drop the dummy value to get the sorted lines back.
        val result = data_map.sortByKey().map(_._1)
        logger.stop_timer()
        logger.write_log("Sort By Key: Sort App")
        result.saveAsTextFile(save_file)

        println("Result has been saved to: " + save_file)
    }

}


Since there is only one wide transformation ("sortByKey"), I expected the
application to run as a single job with two stages. However, I see two jobs:
Job 0 with one stage and Job 1 with two stages. Am I missing something? What I
don't get is the first stage of the second job: it seems to do the same work
as the single stage of Job 0.

<http://apache-spark-user-list.1001560.n3.nabble.com/file/n27047/cbKDZ.png> 
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n27047/GXIkS.png> 
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n27047/H9LXF.png> 
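
One way to narrow this down might be to check whether a job is already started when sortByKey is called, before any action runs, and to print the lineage of the sorted RDD. Below is a minimal sketch under some assumptions: it runs in local mode on a small in-memory dataset instead of the real input file, and the object name `SortJobProbe`, the sample data and the one-second waits are hypothetical choices for illustration, not part of the Sort application.

```scala
import java.util.concurrent.atomic.AtomicInteger

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.scheduler.{SparkListener, SparkListenerJobStart}

// Hypothetical diagnostic probe; it only mirrors the map + sortByKey + map pipeline.
object SortJobProbe {
  def main(args: Array[String]): Unit = {
    // Assumption: local mode with 2 threads, purely for experimenting.
    val conf = new SparkConf().setAppName("SortJobProbe").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Count every job the scheduler starts and report how many stages it has.
    val jobsStarted = new AtomicInteger(0)
    sc.addSparkListener(new SparkListener {
      override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
        jobsStarted.incrementAndGet()
        println(s"Job ${jobStart.jobId} started with ${jobStart.stageInfos.size} stage(s)")
      }
    })

    // Small in-memory stand-in for the input file, split into 4 partitions.
    val lines = sc.parallelize(Seq("pear", "apple", "fig", "kiwi", "plum", "lime"), 4)
    val sorted = lines.map(line => (line, 1)).sortByKey().map(_._1)

    // Listener events are delivered asynchronously, so wait briefly before
    // reading the counter. No action has been called at this point.
    Thread.sleep(1000)
    println(s"Jobs started before any action: ${jobsStarted.get}")

    // Print the lineage of the sorted RDD to see where the shuffle sits.
    println(sorted.toDebugString)

    // The only action in this probe.
    sorted.collect()
    Thread.sleep(1000)
    println(s"Jobs started after collect(): ${jobsStarted.get}")

    sc.stop()
  }
}
```

If the first counter is already non-zero, the extra job is started by the sortByKey call itself rather than by saveAsTextFile, which would at least tell me where to look next.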
