Posted to user@spark.apache.org by Nikolay Skovpin <ko...@gmail.com> on 2018/08/07 14:47:43 UTC
Dynamic partitioning weird behavior
Hi guys.
I was investigating the Spark property
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic"). It
works perfectly on a local filesystem, but on S3 I stumbled into strange
behavior: if the Hive table does not exist yet, or is empty, Spark won't
save any data into the table with SaveMode.Overwrite.
What I did:

import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder()
  .appName("Test for dynamic partitioning")
  .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
  .getOrCreate()
import spark.implicits._ // needed for toDF

val users = Seq(
  ("11", "Nikolay", "1900", "1"),
  ("12", "Nikolay", "1900", "1"),
  ("13", "Sergey", "1901", "1"),
  ("14", "Jone", "1900", "2"))
  .toDF("user_id", "name", "year", "month")

users.write
  .partitionBy("year", "month")
  .mode(SaveMode.Overwrite)
  .option("path", "s3://dynamicPartitioning/users")
  .saveAsTable("test.users")
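
A variant worth trying (a sketch only, not verified on S3; it reuses the
`spark` session and `users` DataFrame from above) is to create the
partitioned table first and then write into it with insertInto, which also
honors dynamic partition overwrite for an existing table:

// Create the table once, then overwrite only the touched partitions.
spark.sql(
  """CREATE TABLE IF NOT EXISTS test.users (user_id STRING, name STRING)
    |PARTITIONED BY (year STRING, month STRING)
    |LOCATION 's3://dynamicPartitioning/users'""".stripMargin)

users.write
  .mode(SaveMode.Overwrite)
  .insertInto("test.users") // columns are matched by position, not by name

Note that insertInto matches DataFrame columns to the table schema by
position, so the partition columns (year, month) must come last, as they
do in `users`.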
I can see from the logs that Spark populates the .spark-staging directory
with the data and then executes a rename command. But
AlterTableRecoverPartitionsCommand shows me the message: "Found 0
partitions ... Finished to gather the fast stats for all 0 partitions".
After that the directory on S3 is empty (except for the _SUCCESS flag).
Is this expected behavior, or a bug?
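
For reference, the partition-recovery step that log line comes from can
also be run by hand after the write (a sketch, assuming the test.users
table from the example above):

// Manual equivalent of the recover-partitions step
spark.sql("ALTER TABLE test.users RECOVER PARTITIONS")
// or, through the catalog API:
spark.catalog.recoverPartitions("test.users")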
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/