Posted to user@spark.apache.org by Vidya Sujeet <sj...@gmail.com> on 2017/02/28 17:17:40 UTC

toDebugString Vs Spark UI

Hi,

How detailed is the toDebugString method compared to the Spark UI? The RDD
join below doesn't show any shuffling when I call toDebugString. However,
when I run the job, the UI shows some shuffle read and write. Can anyone
explain the difference, and which one is accurate? Is my code below
shuffling at some point? I have attached a screenshot as well.



RDD: com.datastax.spark.connector.rdd.CassandraTableScanRDD[((Int, Int),
(String, String, Int, Int, Int, Int))] = CassandraTableScanRDD[2] at RDD at
CassandraRDD.scala:15

RDD2: com.datastax.spark.connector.rdd.CassandraTableScanRDD[((Int, Int),
(Int, Int))] = CassandraTableScanRDD[5] at RDD at CassandraRDD.scala:15

RDD3: com.datastax.spark.connector.rdd.CassandraTableScanRDD[((Int,), (Int,
String))] = CassandraTableScanRDD[11] at RDD at CassandraRDD.scala:15

val joinedRDD = RDD.leftOuterJoin(RDD2)

val mjrdd = joinedRDD.map { x => (Tuple1(x._1._1), x._2) }

val result = mjrdd.leftOuterJoin(RDD3)

scala> result.toDebugString

res19: String = (6) MapPartitionsRDD[27] at leftOuterJoin at <console>:66 []
 |  MapPartitionsRDD[26] at leftOuterJoin at <console>:66 []
 |  CoGroupedRDD[25] at leftOuterJoin at <console>:66 []
 +-(6) MapPartitionsRDD[21] at map at <console>:60 []
 |  |  MapPartitionsRDD[17] at leftOuterJoin at <console>:58 []
 |  |  MapPartitionsRDD[16] at leftOuterJoin at <console>:58 []
 |  |  CoGroupedRDD[15] at leftOuterJoin at <console>:58 []
 |  +-(6) CassandraTableScanRDD[2] at RDD at CassandraRDD.scala:15 []
 |  +-(6) CassandraTableScanRDD[5] at RDD at CassandraRDD.scala:15 []
 +-(6) CassandraTableScanRDD[11] at RDD at CassandraRDD.scala:15 []

scala> result.take(10).foreach(println)
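
For anyone who wants to reproduce this without a Cassandra cluster, here is a minimal sketch of the same two-join pipeline. It assumes a spark-shell session where `sc` is already defined. The names rdd1, rdd2, rdd3, joined and rekeyed and all of the sample rows are made up for illustration; only the key/value shapes follow the RDDs above. In-memory collections stand in for the Cassandra table scans, so the leaf RDDs in the lineage will have a different type.

```scala
// Run inside spark-shell, where `sc` (the SparkContext) is predefined.
// The rows are invented; only the tuple shapes mirror the original RDDs.
val rdd1 = sc.parallelize(Seq(
  ((1, 10), ("a", "b", 1, 2, 3, 4)),
  ((2, 20), ("c", "d", 5, 6, 7, 8))
))
val rdd2 = sc.parallelize(Seq(((1, 10), (100, 200))))
val rdd3 = sc.parallelize(Seq((Tuple1(1), (7, "x"))))

// First join on the composite (Int, Int) key.
val joined = rdd1.leftOuterJoin(rdd2)

// Re-key on the first component only, wrapped in Tuple1 as in the original code.
val rekeyed = joined.map { x => (Tuple1(x._1._1), x._2) }

// Second join on the new single-element key.
val result = rekeyed.leftOuterJoin(rdd3)

// Print the lineage, then run an action so the job shows up in the Spark UI.
println(result.toDebugString)
result.take(10).foreach(println)
```

Comparing the printed lineage with the stages listed for this job in the Spark UI should show whether the two views disagree in the same way on a plain in-memory dataset.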