Posted to user@spark.apache.org by "Parsian, Mahmoud" <mp...@illumina.com> on 2017/03/11 06:33:23 UTC

How to improve performance of saveAsTextFile()

How can I improve the performance of JavaRDD<String>.saveAsTextFile("hdfs://…")?
It is taking over 30 minutes on a 10-node cluster.
I am running Spark on YARN.

The JavaRDD<String> has 120 million entries.

Thank you,
Best regards,
Mahmoud
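Some background on why the partition count matters here: saveAsTextFile writes one part file (part-00000, part-00001, and so on) per partition, and each partition is written by a single task. The number of partitions therefore caps how many cores can be writing at once. The plain-Java sketch below has no Spark dependency. It works through that arithmetic for the 120 million lines in the question. The class and method names are hypothetical, and the 4 usable cores per node is an assumption, since the thread does not say how the executors are sized.

```java
public class WriteParallelism {

    // Lines each part file receives if lines are spread evenly (ceiling division).
    public static long linesPerPartition(long totalLines, int partitions) {
        return (totalLines + partitions - 1) / partitions;
    }

    // Cores writing at the same time: one task per partition, one core per task (the default).
    public static int busyCores(int partitions, int totalCores) {
        return Math.min(partitions, totalCores);
    }

    // Rounds of tasks ("waves") needed before every partition has been written.
    public static int waves(int partitions, int totalCores) {
        return (partitions + totalCores - 1) / totalCores;
    }

    public static void main(String[] args) {
        long totalLines = 120_000_000L; // from the question
        int totalCores = 10 * 4;        // ASSUMPTION: 10 nodes x 4 usable cores each
        for (int partitions : new int[] {2, 40, 400}) { // hypothetical partition counts
            System.out.printf("%d partitions: %,d lines per file, %d of %d cores busy, %d wave(s)%n",
                partitions,
                linesPerPartition(totalLines, partitions),
                busyCores(partitions, totalCores),
                totalCores,
                waves(partitions, totalCores));
        }
    }
}
```

With only 2 partitions, 2 of the 40 assumed cores do all the work, each task writing 60 million lines. Note also that saveAsTextFile is an action, so the measured 30 minutes includes computing every upstream transformation of the RDD, not only the HDFS write.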

Re: How to improve performance of saveAsTextFile()

Posted by "颜发才 (Yan Facai)" <fa...@gmail.com>.
How about increasing the RDD's number of partitions, or rebalancing the data across them (e.g. with repartition())?
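One way to pick a partition count for this suggestion is sketched below in plain Java, with no Spark dependency. The heuristic is hypothetical and is not a Spark API: it chooses enough partitions that each task writes at most a target number of lines, then rounds up to a multiple of the total core count so that, with evenly sized partitions, no wave leaves cores idle. The 1 million lines per file and the 40 cores (10 nodes x 4 cores) are assumptions for illustration.

```java
public class PartitionChooser {

    /**
     * HYPOTHETICAL heuristic, not part of Spark: enough partitions that each task
     * writes at most targetLinesPerPartition lines, rounded up to a whole multiple
     * of totalCores so every core has work in every wave (for evenly sized partitions).
     */
    public static int choosePartitions(long totalLines, long targetLinesPerPartition, int totalCores) {
        if (targetLinesPerPartition <= 0 || totalCores <= 0) {
            throw new IllegalArgumentException("target and cores must be positive");
        }
        // Ceiling division: partitions needed to respect the per-task line budget.
        long needed = (totalLines + targetLinesPerPartition - 1) / targetLinesPerPartition;
        // Never use fewer partitions than cores, or some cores sit idle.
        needed = Math.max(needed, totalCores);
        // Round up to the next multiple of totalCores so the last wave is full.
        long rounded = ((needed + totalCores - 1) / totalCores) * totalCores;
        return Math.toIntExact(rounded);
    }

    public static void main(String[] args) {
        long totalLines = 120_000_000L; // from the question
        long target = 1_000_000L;       // ASSUMPTION: about 1 million lines per output file
        int totalCores = 40;            // ASSUMPTION: 10 nodes x 4 cores
        System.out.println(choosePartitions(totalLines, target, totalCores)); // prints 120
    }
}
```

The resulting count would then be passed to repartition before writing, e.g. rdd.repartition(n).saveAsTextFile("hdfs://…"). repartition does a full shuffle, which costs time but also evens out skewed partitions; coalesce(n) avoids the shuffle by merging existing partitions, so it can only reduce the partition count and does not rebalance the data.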
