Posted to dev@spark.apache.org by "Ulanov, Alexander" <al...@hp.com> on 2015/06/23 03:44:29 UTC

Force Spark to save Parquet files with a replication factor other than 3 (the default)

Hi,

My Hadoop cluster is configured with a replication factor of 2. I've added $HADOOP_HOME/config to the PATH, as suggested in http://apache-spark-user-list.1001560.n3.nabble.com/hdfs-replication-on-saving-RDD-td289.html. With this setup, Spark (1.4) writes rdd.saveAsTextFile output with replication = 2. However, DataFrame.saveAsParquet output is written with replication = 3. How can I force a Spark DataFrame to save Parquet files with a replication factor other than 3 (the default)?
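
For anyone who wants to compare the two outputs, the replication factor of each written file appears in the second column of an `hdfs dfs -ls -R` listing (directories show `-` there). Below is a minimal sketch of reading that column, assuming the standard listing layout; the helper name `replication_by_path` and the sample listing are hypothetical and made up for illustration, not taken from my cluster.

```python
def replication_by_path(ls_output):
    """Map each file path in `hdfs dfs -ls -R` output to its replication factor."""
    result = {}
    for line in ls_output.splitlines():
        # Columns: permissions, replication, owner, group, size, date, time, path.
        # Limiting the split keeps paths that contain spaces in one piece.
        fields = line.split(None, 7)
        # Skip "Found N items" headers and anything that is not a full entry.
        if len(fields) < 8:
            continue
        permissions, replication, path = fields[0], fields[1], fields[7]
        # Directories start with "d" and show "-" instead of a replication factor.
        if permissions.startswith("d") or not replication.isdigit():
            continue
        result[path] = int(replication)
    return result


# Hypothetical listing, invented for illustration only.
sample = """\
Found 3 items
drwxr-xr-x   - user hadoop          0 2015-06-22 18:40 /tmp/out.parquet
-rw-r--r--   3 user hadoop       1024 2015-06-22 18:40 /tmp/out.parquet/part-r-00001.parquet
-rw-r--r--   2 user hadoop        512 2015-06-22 18:39 /tmp/out.txt/part-00000
"""

for path, factor in sorted(replication_by_path(sample).items()):
    print(factor, path)
```

Feeding the real output of `hdfs dfs -ls -R` on the two output directories into the helper should show whether the Parquet part files and the text part files were written with different replication factors.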

Best regards, Alexander