Posted to user@spark.apache.org by Divya Narayan <na...@gmail.com> on 2019/06/26 12:22:15 UTC

hadoop replication property from spark code not working

Hi,

I have a use case for which I want to override the default hdfs replication
factor from my spark code. For this I have set the hadoop replication like
this:

val sc = new SparkContext(conf)
sc.hadoopConfiguration.set("dfs.replication", "1")

My Spark job runs as a cron job at a fixed interval and creates an output
directory for the corresponding hour. The problem I am facing is that for
about 80% of the runs the files are created with replication factor 1 (which
is what I want), but for the remaining 20% the files are created with the
default replication factor of 2. I am not sure why that is happening. Any
help would be appreciated.
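For completeness, one alternative I am aware of (untested in my setup, so treat it as a sketch) is to set the property on the SparkConf before the SparkContext is created, using Spark's documented spark.hadoop.* prefix, which Spark copies into the Hadoop configuration it hands to executors. The app name below is just a placeholder:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Setting the property via the spark.hadoop.* prefix *before* the context
// exists means the value is baked into the Hadoop configuration that is
// shipped to every executor, rather than mutated on the driver afterwards.
val conf = new SparkConf()
  .setAppName("hourly-job") // placeholder name
  .set("spark.hadoop.dfs.replication", "1")
val sc = new SparkContext(conf)
```

The same property can also be passed at submit time, e.g. --conf spark.hadoop.dfs.replication=1 on the spark-submit command line.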

Thank you
Divya