Posted to user@spark.apache.org by Divya Narayan <na...@gmail.com> on 2019/06/26 12:22:15 UTC
Hadoop replication property from Spark code not working
Hi,
I have a use case where I want to override the default HDFS replication
factor from my Spark code. To do this, I set the Hadoop replication
factor like this:
val sc = new SparkContext(conf)
sc.hadoopConfiguration.set("dfs.replication", "1")
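For context, the full setup looks roughly like this (the app name and
outputPath below are placeholders, not the real values from my job):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("hourly-job")  // placeholder app name
val sc = new SparkContext(conf)

// Set before any output is written, since the Hadoop OutputFormat
// reads dfs.replication from this configuration when it creates files.
sc.hadoopConfiguration.set("dfs.replication", "1")

val outputPath = "/data/output/2019062612"  // placeholder hourly directory
sc.parallelize(Seq("a", "b", "c")).saveAsTextFile(outputPath)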
My Spark job runs as a cron job at a fixed interval and creates an
output directory for the corresponding hour. The problem I am facing is
that for about 80% of the runs the files are created with replication
factor 1 (as desired), but for the remaining 20% the files are created
with the default replication factor of 2. I am not sure why this is
happening. Any help would be appreciated.
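For what it is worth, this is roughly how I check the replication factor
of the output files after a run (the path is again a placeholder):

import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(sc.hadoopConfiguration)
val hourDir = new Path("/data/output/2019062612")  // placeholder hourly directory
fs.listStatus(hourDir).filter(_.isFile).foreach { status =>
  println(s"${status.getPath.getName} replication=${status.getReplication}")
}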
Thank you
Divya