Posted to user@spark.apache.org by KhajaAsmath Mohammed <md...@gmail.com> on 2017/05/25 21:57:04 UTC
shuffle write is very slow
Hi,
I am converting a Hive job to a Spark job. I have tested on a small data set, and
the logic is correct in both Hive and Spark.
When I started testing on large data, Spark is very slow compared to Hive.
The shuffle write is taking a long time. Any suggestions?
I am creating a temporary table in Spark and overwriting a partitioned Hive
table from that temporary table:
dataframe_transposed.registerTempTable(srcTable)

import sqlContext._
import sqlContext.implicits._

val query = s"INSERT OVERWRITE TABLE ${destTable} SELECT * FROM ${srcTable}"
logger.info(s"Executing Query ${query}")
sqlContext.sql(query)
The total size of the dataframe is around 190 GB, and the Spark job runs forever
in this case, while the Hive job completes in about 4 hours.
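For context (not part of my job above, just a sketch of settings that are often
involved when shuffle writes are slow on an INSERT OVERWRITE into a partitioned
table): the default shuffle partition count and the Hive dynamic-partition mode
are commonly tuned in this situation. The option names below are standard
Spark/Hive configuration keys; the values are illustrative assumptions, not
measured recommendations.

```scala
// Sketch only: values are assumptions, tune for your cluster and data size.
// With ~190 GB shuffled, the default spark.sql.shuffle.partitions (200) can
// leave each shuffle partition near 1 GB, which makes shuffle writes slow.
sqlContext.setConf("spark.sql.shuffle.partitions", "2000")

// INSERT OVERWRITE into a partitioned Hive table typically needs dynamic
// partitioning enabled in nonstrict mode.
sqlContext.setConf("hive.exec.dynamic.partition", "true")
sqlContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
```
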
Thanks,
Asmath.