Posted to user@spark.apache.org by Artur Sukhenko <ar...@gmail.com> on 2019/06/24 14:45:41 UTC

Spark locking Hive partition

Hi,
I have a Spark Streaming app (1-minute batch interval) writing Parquet data
to a partition, e.g.:
import org.apache.spark.sql.SaveMode

val hdfsPath = s"$dbPath/$tableName/year=$year/month=$month/day=$day"

df.write.mode(SaveMode.Append).parquet(hdfsPath)

I wonder whether I would lose data if I overwrite this partition from Hive
(for compaction/deduplication) while Spark is appending more data to it every
minute. (The Hive query can take more than 2 minutes to run.)
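
For context, the compaction/deduplication step I have in mind is roughly the
sketch below, issued through spark.sql here purely for illustration; the
column names c1 and c2 are placeholders, and the real job would be the
equivalent INSERT OVERWRITE run in Hive itself:

// A minimal sketch of the Hive-side compaction step. With a static
// partition spec, the SELECT lists only the non-partition columns
// (placeholders c1, c2 here).
spark.sql(
  s"""INSERT OVERWRITE TABLE $tableName
     |PARTITION (year=$year, month=$month, day=$day)
     |SELECT DISTINCT c1, c2 FROM $tableName
     |WHERE year = $year AND month = $month AND day = $day
     |""".stripMargin)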

Thanks,
Artur Sukhenko