Posted to user@spark.apache.org by yohann jardin <yo...@hotmail.com> on 2017/03/28 08:59:02 UTC

Writing dataframe to a final path using another temporary path

Hello,



I’m using Spark 2.1.

Once a job completes, I want to write a Parquet file to, let’s say, the folder /user/my_user/final_path/
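
For reference, the write itself is just the plain Parquet writer, going straight to the final folder (df being my result dataframe):

    // Current behavior: write Parquet output directly into the final folder
    df.write.parquet("/user/my_user/final_path/")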


However, I have other jobs reading files from that specific folder, so those files need to be completely written by the time they appear in that folder.

So while the file is being written, I need it to go to a temporary location like /user/my_user/tmp_path/, the path of my application, or any other path that can serve as temporary storage. Once fully written, the file can then be moved to the real destination folder /user/my_user/final_path/.


So I was wondering: is this the default behavior? If not, did I miss an option that enables it? I looked in the documentation and in org.apache.spark.sql.execution.datasources.parquet.ParquetOptions.scala, but I can’t find any information about this.

Or should I save to a temporary location myself and then move the file to the right location?
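
In case it helps frame the question, here is a rough sketch of what that manual workaround would look like on my side, assuming the data lives on HDFS (writeThenMove and the sub-folder names are just made up for illustration):

    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.spark.sql.{DataFrame, SparkSession}

    // Sketch of the manual workaround: write to a temporary directory first,
    // then rename it into the final directory so readers never see partial files.
    def writeThenMove(spark: SparkSession, df: DataFrame,
                      tmpPath: String, finalPath: String): Unit = {
      // Write the Parquet output to the temporary location.
      df.write.parquet(tmpPath)

      // Once the write has fully completed, move the whole directory into place.
      // On HDFS, a rename within the same filesystem is a cheap metadata operation.
      val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
      fs.rename(new Path(tmpPath), new Path(finalPath))
    }

    // e.g. writeThenMove(spark, result,
    //   "/user/my_user/tmp_path/my_output", "/user/my_user/final_path/my_output")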


Any input is greatly appreciated,

Yohann