Posted to user@spark.apache.org by Sa...@wellsfargo.com on 2016/07/29 18:21:18 UTC

multiple SPARK_LOCAL_DIRS causing strange behavior in parallelism

Hi all,

I have been experimenting with SPARK_LOCAL_DIRS in spark-env.sh in order to add additional shuffle storage.
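
For reference, the setting looks roughly like this (the paths below are placeholders, not my actual mount points):

    # spark-env.sh -- example paths only
    # SPARK_LOCAL_DIRS takes a comma-separated list of directories
    export SPARK_LOCAL_DIRS=/mnt/disk1/spark-local,/mnt/disk2/spark-local,/mnt/disk3/spark-local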

But since I made this change, I get a "too many open files" error whenever the total executor core count is high. I am also seeing low parallelism: monitoring the running tasks on a big job, most tasks run on the driver host and very few run on the other nodes, even with ANY locality.
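
In case it matters, the per-process open file limit on each worker node can be checked with:

    # check the per-process open file limit (run on each worker node)
    ulimit -n

On Linux this limit can typically be raised in /etc/security/limits.conf, though I am not sure that is the right fix here rather than a symptom of the configuration problem.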

Generally speaking, could I be doing anything wrong with this setting?

On each node I am pointing the setting at several different local physical hard drives to store shuffle data. Reverting the configuration to a single storage folder on each node, everything runs normally.
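
That is, with something like the following (again, a placeholder path), the problem goes away:

    # spark-env.sh -- single local dir, runs normally
    export SPARK_LOCAL_DIRS=/mnt/disk1/spark-local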

Thanks,
Saif