
Posted to user@spark.apache.org by Femi Anthony <fe...@gmail.com> on 2019/06/11 12:50:26 UTC

AWS EMR slow write to HDFS

I'm writing a large dataset in Parquet format to HDFS using Spark, and it runs rather slowly on EMR compared with, say, Databricks. My understanding is that if I were able to use Hadoop 3.1, the write would be much faster because it has a high-performance output committer. Is this the case, and if so, when will there be a version of EMR that uses Hadoop 3.1? The version I'm currently using is 5.21.
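For context, the committer behaviour in question is controlled through Spark/Hadoop configuration. The sketch below only illustrates how such a setting could be passed to spark-submit. It assumes the existing Hadoop option mapreduce.fileoutputcommitter.algorithm.version is the one of interest. The script name write_parquet.py and the helper functions are hypothetical, and whether this setting helps on EMR 5.21 has not been tested here.

```python
import shlex

# Assumption: the "v2" FileOutputCommitter algorithm is the knob to try.
# The property is a real Hadoop setting; its benefit on EMR is unverified.
COMMITTER_CONF = {
    "spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version": "2",
}


def spark_submit_args(app, conf=COMMITTER_CONF):
    """Build a spark-submit argument list carrying the given --conf pairs."""
    args = ["spark-submit"]
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app)  # hypothetical application script
    return args


def spark_submit_command(app, conf=COMMITTER_CONF):
    """Render the argument list as a single shell-safe command line."""
    return " ".join(shlex.quote(a) for a in spark_submit_args(app, conf))


if __name__ == "__main__":
    print(spark_submit_command("write_parquet.py"))
```

The same key could also be set on the SparkSession builder instead of the command line; the command-line form is used here only because it needs no running cluster to illustrate.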