Posted to issues@spark.apache.org by "Prashant Sharma (JIRA)" <ji...@apache.org> on 2017/06/22 10:54:00 UTC
[jira] [Created] (SPARK-21177) Append to hive slows down linearly,
with number of appends.
Prashant Sharma created SPARK-21177:
---------------------------------------
Summary: Append to hive slows down linearly, with number of appends.
Key: SPARK-21177
URL: https://issues.apache.org/jira/browse/SPARK-21177
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.3.0
Reporter: Prashant Sharma
The following spark-shell transcript reproduces the issue.
{code:java}
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.0-SNAPSHOT
      /_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_91)
Type in expressions to have them evaluated.
Type :help for more information.
scala> def printTimeTaken(str: String, f: () => Unit) {
     |   val start = System.nanoTime()
     |   f()
     |   val end = System.nanoTime()
     |   val timetaken = end - start
     |   import scala.concurrent.duration._
     |   println(s"Time taken for $str is ${timetaken.nanos.toMillis}\n")
     | }
printTimeTaken: (str: String, f: () => Unit)Unit
scala> for(i <- 1 to 10000) {printTimeTaken("time to append to hive:", () => { Seq(1, 2).toDF().write.mode("append").saveAsTable("t1"); })}
Time taken for time to append to hive: is 284
Time taken for time to append to hive: is 211
...
...
Time taken for time to append to hive: is 2615
Time taken for time to append to hive: is 3055
Time taken for time to append to hive: is 22425
....
{code}
Why does it matter?
In a streaming job, which appends to the same table repeatedly, this slowdown makes it impractical to append to Hive using this DataFrame operation.