Posted to dev@spark.apache.org by "golokeshpatra.patra" <go...@gmail.com> on 2020/08/19 07:43:47 UTC

Re: Spark deletes all existing partitions in SaveMode.Overwrite - Expected behavior ?

Adding this simple setting helped me overcome the issue:

    spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
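
As a side note, the documentation for spark.sql.sources.partitionOverwriteMode
also mentions that the same behaviour can be requested for a single write via a
DataFrameWriter option of the same name, which takes precedence over the
session setting. A rough sketch, with a placeholder DataFrame, column and path:

    df.write
      .option("partitionOverwriteMode", "dynamic")
      .mode("overwrite")
      .partitionBy("ingestiontime")
      .parquet("s3a://my-bucket/output/")
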
My Issue -

In an S3 folder, I previously had data partitioned by ingestiontime.
Now I wanted to reprocess this data and partition it by
businessname & ingestiontime.

Whenever I wrote my dataframe in Overwrite mode,
all of my data that was present prior to this operation was
TRUNCATED/DELETED.

After setting the above Spark configuration,
only the required partitions are truncated and overwritten, and all
others stay intact.
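
For anyone hitting the same thing, here is a minimal sketch of what my
reprocessing write looks like with the dynamic mode enabled (the bucket and
folder names below are placeholders, not my real paths):

    import org.apache.spark.sql.SaveMode

    // With the default "static" mode, SaveMode.Overwrite first deletes everything
    // under the target path; with "dynamic" only the partitions present in the
    // DataFrame being written are truncated and replaced.
    spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

    // Data that was previously written partitioned only by ingestiontime.
    val df = spark.read.parquet("s3a://my-bucket/events/")

    // Rewrite it partitioned by businessname and ingestiontime; partitions under
    // the target path that do not appear in df are left untouched.
    df.write
      .mode(SaveMode.Overwrite)
      .partitionBy("businessname", "ingestiontime")
      .parquet("s3a://my-bucket/events-by-business/")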

In addition to this, if you have the Hadoop trash feature enabled, you might be
able to get the deleted data back.
For more, see
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html#File_Deletes_and_Undeletes
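
In case it helps, here is a rough sketch of how one could peek into the current
user's trash directory from a Spark shell to see whether anything was preserved.
This assumes an HDFS-backed path with fs.trash.interval > 0; object stores like
S3 typically have no trash, so treat it purely as a best-effort check, and the
paths in the comments are illustrative only:

    import org.apache.hadoop.fs.{FileSystem, Path}

    val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)

    // Per the HDFS docs, deleted files are kept under the user's .Trash/Current
    // directory for fs.trash.interval minutes before being purged.
    val trashCurrent = new Path(fs.getHomeDirectory, ".Trash/Current")

    if (fs.exists(trashCurrent)) {
      fs.listStatus(trashCurrent).foreach(status => println(status.getPath))
      // A preserved partition could be restored by renaming it back out, e.g.:
      // fs.rename(new Path(trashCurrent, "warehouse/events/ingestiontime=2020-08-01"),
      //           new Path("/warehouse/events/ingestiontime=2020-08-01"))
    }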


