
Posted to user@spark.apache.org by kachau <um...@gmail.com> on 2015/07/10 18:48:20 UTC

dataFrame.coalesce(1) or dataFrame.repartition(1) does not seem to work for me

Hi, I have a Hive INSERT INTO query which creates new Hive partitions. The
table has two partition columns, server and date. I execute the insert
queries using the following code and then try to save the result:

import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SaveMode;

// hiveContext is an existing HiveContext
DataFrame dframe = hiveContext.sql("insert into summary1 "
    + "partition(server='a1',date='2015-05-22') select from sourcetbl bla bla");
// The above query creates ORC files at /user/db/a1/20-05-22.
// I want only one part-00000 file at the end of the above query, so I tried
// the following, and none of them worked:
dframe.coalesce(1).write().format("orc").mode(SaveMode.Overwrite).saveAsTable("summary1");
dframe.repartition(1).write().format("orc").mode(SaveMode.Overwrite).saveAsTable("summary1");
dframe.coalesce(1).write().format("orc").mode(SaveMode.Overwrite).save("/user/db/a1/20-05-22");
dframe.repartition(1).write().format("orc").mode(SaveMode.Overwrite).save("/user/db/a1/20-05-22");
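A minimal sketch of the relationship at play, assuming Spark's usual behavior that each partition of the DataFrame being written becomes one part-NNNNN file. coalesce(n) merges existing partitions into at most n without a full shuffle, while repartition(n) shuffles every row into n new partitions; either way, only the DataFrame they are called on changes. If hiveContext.sql("insert into ...") runs the insert immediately and returns a result with no rows (which appears to be the case for commands in Spark 1.x), then coalescing that returned DataFrame cannot merge the files the insert already wrote; coalesce(1) would have to be applied to the DataFrame produced by the SELECT itself, and that DataFrame written. The plain-Java toy model below has no Spark dependency; CoalesceModel and its methods are hypothetical names and only approximate Spark's behavior.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model (hypothetical, no Spark dependency): a dataset is a list of
// partitions, and writing it produces one part file per partition.
public class CoalesceModel {

    // Merge adjacent partitions into at most n groups without redistributing
    // individual rows (an approximation of coalesce(n)).
    static List<List<String>> coalesce(List<List<String>> parts, int n) {
        int groups = Math.min(n, parts.size());
        List<List<String>> out = new ArrayList<>();
        for (int g = 0; g < groups; g++) out.add(new ArrayList<>());
        for (int i = 0; i < parts.size(); i++) {
            // Partition i joins group i * groups / parts.size(), preserving order.
            out.get(i * groups / parts.size()).addAll(parts.get(i));
        }
        return out;
    }

    // Send every row round-robin to one of n new partitions
    // (a stand-in for the full shuffle performed by repartition(n)).
    static List<List<String>> repartition(List<List<String>> parts, int n) {
        List<List<String>> out = new ArrayList<>();
        for (int g = 0; g < n; g++) out.add(new ArrayList<>());
        int k = 0;
        for (List<String> p : parts)
            for (String row : p) out.get(k++ % n).add(row);
        return out;
    }

    // One output file name per partition: part-00000, part-00001, ...
    static List<String> write(List<List<String>> parts) {
        List<String> files = new ArrayList<>();
        for (int i = 0; i < parts.size(); i++) files.add(String.format("part-%05d", i));
        return files;
    }

    public static void main(String[] args) {
        List<List<String>> data = new ArrayList<>();
        for (int i = 0; i < 200; i++) data.add(Collections.singletonList("row" + i));
        System.out.println(write(data).size());           // 200
        System.out.println(write(coalesce(data, 1)));     // [part-00000]
        System.out.println(write(repartition(data, 1)));  // [part-00000]
    }
}
```

The point of the model is the order of operations: the single file only appears when the coalesced dataset is the one being written.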

No matter whether I use coalesce or repartition, the above query creates
around 200 files at /user/db/a1/20-05-22. I thought that calling
coalesce(1) would produce a single final part file. Am I wrong?
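The figure of 200 is suggestive: it is the default value of spark.sql.shuffle.partitions, the number of partitions Spark SQL uses after a shuffle (for example a join or aggregation in the SELECT). Assuming the SELECT behind the insert shuffles, each of those partitions is written by its own task, which would explain roughly 200 files regardless of any later coalesce call. A commonly suggested workaround is to lower the setting before running the insert, e.g. hiveContext.setConf("spark.sql.shuffle.partitions", "1"), at the cost of write parallelism. The toy model below (plain Java, hypothetical names, not Spark's implementation) hash-partitions keys into N buckets and counts how many receive rows, under the labeled assumption that every non-empty bucket becomes one part file.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.stream.IntStream;

// Toy model (hypothetical, no Spark dependency): rows are hash-partitioned by
// key into `shufflePartitions` buckets, and we ASSUME each bucket that
// receives at least one row is written as one part file.
public class ShuffleFileCount {

    static int filesWritten(int[] keys, int shufflePartitions) {
        Set<Integer> nonEmptyBuckets = new HashSet<>();
        for (int key : keys) {
            // Non-negative modulo of the key's hash, in the spirit of a hash partitioner.
            nonEmptyBuckets.add(Math.floorMod(Integer.hashCode(key), shufflePartitions));
        }
        return nonEmptyBuckets.size();
    }

    public static void main(String[] args) {
        int[] keys = IntStream.range(0, 10_000).toArray();
        // 200 is the default value of spark.sql.shuffle.partitions.
        System.out.println(filesWritten(keys, 200)); // 200
        // Lowering the setting to 1 before the insert leaves a single file.
        System.out.println(filesWritten(keys, 1));   // 1
        // With fewer distinct keys than buckets, some buckets stay empty.
        System.out.println(filesWritten(IntStream.range(0, 50).toArray(), 200)); // 50
    }
}
```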

Please guide. Thanks in advance.



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/dataFrame-colaesce-1-or-dataFrame-reapartition-1-does-not-seem-work-for-me-tp23769.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
