Posted to issues@spark.apache.org by "Madhavi Vaddepalli (JIRA)" <ji...@apache.org> on 2017/08/07 06:47:02 UTC
[jira] [Created] (SPARK-21650) Insert into hive partitioned table from spark-sql taking hours to complete
Madhavi Vaddepalli created SPARK-21650:
------------------------------------------
Summary: Insert into hive partitioned table from spark-sql taking hours to complete
Key: SPARK-21650
URL: https://issues.apache.org/jira/browse/SPARK-21650
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.6.0
Environment: Linux machines
Spark version - 1.6.0
Hive Version - 1.1
Number of executors: 200
Executor cores: 3
Executor and driver memory: 10g
Dynamic allocation: enabled
Reporter: Madhavi Vaddepalli
We are trying to execute some logic using Spark SQL:
Input to program: 7 billion records (60 GB, gzip-compressed, text format).
Output: 7 billion records (260 GB, gzip-compressed, partitioned on a few columns).
The output has 10000 partitions (10000 distinct combinations of the partition-column values).
We are trying to insert this output into a Hive table (text format, gzip-compressed).
All the spawned tasks finished within 33 minutes and all the executors were decommissioned, leaving only the driver active. *It remained in this state, with no active stage or task shown in the Spark UI, for about 2.5 hours,* before completing successfully.
Please let us know what can be done to improve the performance here. (Is this fixed in later versions?)
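For reference, the slow phase described above is characteristic of a dynamic-partition insert of roughly the following shape (the table and column names here are illustrative, not taken from the report). With ~10000 distinct partition-column combinations, after the tasks finish the driver must move each staged output file into its final partition directory, which is what keeps only the driver busy for hours:

```sql
-- Hypothetical sketch of the reported workload; output_table, input_table,
-- and the column names are assumptions for illustration only.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Each distinct (country, dt) pair in the SELECT output becomes one Hive
-- partition directory; with ~10000 combinations, the per-partition file
-- moves performed by the driver during the commit phase can dominate the
-- total runtime even after all executor tasks have completed.
INSERT OVERWRITE TABLE output_table PARTITION (country, dt)
SELECT col1, col2, country, dt
FROM input_table;
```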
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org