Posted to user@spark.apache.org by "Parsian, Mahmoud" <mp...@illumina.com> on 2017/03/11 06:33:23 UTC
How to improve performance of saveAsTextFile()
How can I improve the performance of JavaRDD<String>.saveAsTextFile("hdfs://…")?
The call takes over 30 minutes on a 10-node cluster.
I am running Spark on YARN.
The JavaRDD<String> has 120 million entries.
Thank you,
Best regards,
Mahmoud
Re: How to improve performance of saveAsTextFile()
Posted by "颜发才 (Yan Facai)" <fa...@gmail.com>.
How about increasing the number of partitions in the RDD, or rebalancing the data across them?
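
As a rough illustration of that suggestion, here is a minimal sketch for picking a partition count before the write. It assumes the Spark tuning guide's rule of thumb of about 2-3 tasks per CPU core. The class PartitionSizing and its method targetPartitions are hypothetical names, not part of Spark, and the figure of 8 cores per node is an assumption, since the thread only gives the node count. The Spark calls appear only in comments because they need a cluster and the Spark jars to run.

```java
// Hypothetical helper, not part of Spark: estimates a partition count from cluster size.
class PartitionSizing {

    // Returns nodes * coresPerNode * tasksPerCore; rejects non-positive arguments.
    static int targetPartitions(int nodes, int coresPerNode, int tasksPerCore) {
        if (nodes <= 0 || coresPerNode <= 0 || tasksPerCore <= 0) {
            throw new IllegalArgumentException("all arguments must be positive");
        }
        // multiplyExact throws ArithmeticException on int overflow instead of wrapping.
        return Math.multiplyExact(Math.multiplyExact(nodes, coresPerNode), tasksPerCore);
    }

    public static void main(String[] args) {
        // 10 nodes comes from the question; 8 cores per node is an assumed value.
        int n = targetPartitions(10, 8, 3);
        System.out.println("target partitions: " + n);
        // With Spark on the classpath, the write might then look like:
        //   System.out.println(rdd.getNumPartitions());   // inspect the current count first
        //   rdd.repartition(n).saveAsTextFile("hdfs://…");
    }
}
```

Note that repartition() triggers a full shuffle, and saveAsTextFile() writes one part file per partition, so this is only likely to help if the RDD currently has too few partitions or the data is unevenly spread across them.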