You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Allen Liang (JIRA)" <ji...@apache.org> on 2016/01/07 08:51:39 UTC

[jira] [Created] (SPARK-12691) Multiple unionAll on Dataframe seems to cause repeated calculations in a "Fibonacci" manner

Allen Liang created SPARK-12691:
-----------------------------------

             Summary: Multiple unionAll on Dataframe seems to cause repeated calculations in a "Fibonacci" manner
                 Key: SPARK-12691
                 URL: https://issues.apache.org/jira/browse/SPARK-12691
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
         Environment: Tested in Spark 1.3 and 1.4.
            Reporter: Allen Liang


Multiple unionAll on Dataframe seems to cause repeated calculations. Here is the sample code to reproduce this issue.

val dfs = for (i<-0 to 100) yield {
  val df = sc.parallelize((0 to 10).zipWithIndex).toDF("A", "B")
  df
}

var i = 1
val s1 = System.currentTimeMillis()
dfs.reduce{(a,b)=>{
  val t1 = System.currentTimeMillis()
  val dd = a unionAll b
  val t2 = System.currentTimeMillis()
  println("Round " + i + " unionAll took " + (t2 - t1) + " ms")
  i = i + 1
  dd
  }
}
val s2 = System.currentTimeMillis()
println((i - 1) + " unionAll took totally " + (s2 - s1) + " ms")

And it printed as follows. And as you can see, it looks like each unionAll seems to redo all the previous unionAll and therefore took self time plus all previous time, which, not precisely speaking, makes each unionAll look like a "Fibonacci" action.

BTW, this behaviour doesn't happen if I directly union all the RDDs in Dataframe.

----- output start ----
Round 1 unionAll took 1 ms
Round 2 unionAll took 1 ms
Round 3 unionAll took 1 ms
Round 4 unionAll took 1 ms
Round 5 unionAll took 1 ms
Round 6 unionAll took 1 ms
Round 7 unionAll took 1 ms
Round 8 unionAll took 2 ms
Round 9 unionAll took 2 ms
Round 10 unionAll took 2 ms
Round 11 unionAll took 3 ms
Round 12 unionAll took 3 ms
Round 13 unionAll took 3 ms
Round 14 unionAll took 3 ms
Round 15 unionAll took 3 ms
Round 16 unionAll took 4 ms
Round 17 unionAll took 4 ms
Round 18 unionAll took 4 ms
Round 19 unionAll took 4 ms
Round 20 unionAll took 4 ms
Round 21 unionAll took 5 ms
Round 22 unionAll took 5 ms
Round 23 unionAll took 5 ms
Round 24 unionAll took 5 ms
Round 25 unionAll took 5 ms
Round 26 unionAll took 6 ms
Round 27 unionAll took 6 ms
Round 28 unionAll took 6 ms
Round 29 unionAll took 6 ms
Round 30 unionAll took 6 ms
Round 31 unionAll took 6 ms
Round 32 unionAll took 7 ms
Round 33 unionAll took 7 ms
Round 34 unionAll took 7 ms
Round 35 unionAll took 7 ms
Round 36 unionAll took 7 ms
Round 37 unionAll took 8 ms
Round 38 unionAll took 8 ms
Round 39 unionAll took 8 ms
Round 40 unionAll took 8 ms
Round 41 unionAll took 9 ms
Round 42 unionAll took 9 ms
Round 43 unionAll took 9 ms
Round 44 unionAll took 9 ms
Round 45 unionAll took 9 ms
Round 46 unionAll took 9 ms
Round 47 unionAll took 9 ms
Round 48 unionAll took 9 ms
Round 49 unionAll took 10 ms
Round 50 unionAll took 10 ms
Round 51 unionAll took 10 ms
Round 52 unionAll took 10 ms
Round 53 unionAll took 11 ms
Round 54 unionAll took 11 ms
Round 55 unionAll took 11 ms
Round 56 unionAll took 12 ms
Round 57 unionAll took 12 ms
Round 58 unionAll took 12 ms
Round 59 unionAll took 12 ms
Round 60 unionAll took 12 ms
Round 61 unionAll took 12 ms
Round 62 unionAll took 13 ms
Round 63 unionAll took 13 ms
Round 64 unionAll took 13 ms
Round 65 unionAll took 13 ms
Round 66 unionAll took 14 ms
Round 67 unionAll took 14 ms
Round 68 unionAll took 14 ms
Round 69 unionAll took 14 ms
Round 70 unionAll took 14 ms
Round 71 unionAll took 14 ms
Round 72 unionAll took 14 ms
Round 73 unionAll took 14 ms
Round 74 unionAll took 15 ms
Round 75 unionAll took 15 ms
Round 76 unionAll took 15 ms
Round 77 unionAll took 15 ms
Round 78 unionAll took 16 ms
Round 79 unionAll took 16 ms
Round 80 unionAll took 16 ms
Round 81 unionAll took 16 ms
Round 82 unionAll took 17 ms
Round 83 unionAll took 17 ms
Round 84 unionAll took 17 ms
Round 85 unionAll took 17 ms
Round 86 unionAll took 17 ms
Round 87 unionAll took 18 ms
Round 88 unionAll took 17 ms
Round 89 unionAll took 18 ms
Round 90 unionAll took 18 ms
Round 91 unionAll took 18 ms
Round 92 unionAll took 18 ms
Round 93 unionAll took 18 ms
Round 94 unionAll took 19 ms
Round 95 unionAll took 19 ms
Round 96 unionAll took 20 ms
Round 97 unionAll took 20 ms
Round 98 unionAll took 20 ms
Round 99 unionAll took 20 ms
Round 100 unionAll took 20 ms
100 unionAll took totally 1337 ms

----- output end ----






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org