Posted to user@spark.apache.org by SK <sk...@gmail.com> on 2014/09/26 01:20:19 UTC

Shuffle files

Hi,

I am using Spark 1.1.0 on a cluster. My job takes as input 30 files in a
directory (I am using sc.textFile("dir/*") to read them in). I am getting
the following warning:

WARN TaskSetManager: Lost task 99.0 in stage 1.0 (TID 99,
mesos12-dev.sccps.net): java.io.FileNotFoundException:
/tmp/spark-local-20140925215712-0319/12/shuffle_0_99_93138 (Too many open
files)

Basically, I think a lot of shuffle files are being created.

1) The tasks eventually fail and the job just hangs (after taking very long,
more than an hour). If I read these 30 files in a for loop, the same job
completes in a few minutes. However, I then need to specify the file names
explicitly, which is not convenient. I am assuming that sc.textFile("dir/*")
creates one large RDD for all 30 files. Is there a way to make the operation
on this large RDD efficient so as to avoid creating too many shuffle files?
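
(For what it's worth, a rough sketch of the per-file loop that avoids hard-coding
the names by listing the directory with Hadoop's FileSystem API; the directory
name and the per-file work below are only placeholders:)

    import org.apache.hadoop.fs.{FileSystem, Path}

    // List the input files so their names need not be hard-coded.
    val fs = FileSystem.get(sc.hadoopConfiguration)
    val inputs = fs.listStatus(new Path("dir")).map(_.getPath.toString)

    // Process each file as its own small RDD, as in the for-loop workaround.
    inputs.foreach { path =>
      val lines = sc.textFile(path)
      // ... per-file transformations / actions go here ...
    }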


2) Also, I am finding that the shuffle files from my other completed jobs
are not automatically deleted even after days. I thought that sc.stop()
clears the intermediate files. Is there some way to programmatically delete
these temp shuffle files upon job completion?


thanks





--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Shuffle-files-tp15185.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


RE: Shuffle files

Posted by "Shao, Saisai" <sa...@intel.com>.
Hi Song,

From what I know about sort-based shuffle:

Normally the number of files opened in parallel for sort-based shuffle is much smaller than for hash-based shuffle.

In hash-based shuffle, the number of files opened in parallel is C * R (where C is the number of cores used and R is the number of reducers). As you can see, the file count depends on the reducer count, no matter how large the shuffle is.

In sort-based shuffle, each map task writes only one final output file. To achieve this it sorts by partition, which generates some intermediate spill files, but the number of spill files depends on the shuffle size and the memory available for shuffle, not on the number of reducers.

So if you hit "too many open files" with sort-based shuffle, I guess you have a lot of spill files during shuffle write. One possible way to alleviate this is to increase the shuffle memory usage; raising the ulimit is another.
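
For example, something like the following on the application side (only a sketch; the 0.4 is an illustrative value, and spark.shuffle.memoryFraction defaults to 0.2 in Spark 1.1):

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch: give the shuffle more memory so fewer spill files are created.
    val conf = new SparkConf()
      .set("spark.shuffle.manager", "SORT")
      .set("spark.shuffle.memoryFraction", "0.4")  // default is 0.2
    val sc = new SparkContext(conf)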

I guess on YARN you have to do the system configuration manually; Spark cannot set the ulimit for you, and I don't think it's an issue Spark should take care of.

Thanks
Jerry

From: Chen Song [mailto:chen.song.82@gmail.com]
Sent: Tuesday, October 21, 2014 9:10 AM
To: Andrew Ash
Cc: Sunny Khatri; Lisonbee, Todd; user@spark.incubator.apache.org
Subject: Re: Shuffle files

My observation is the opposite. When my job runs with the default spark.shuffle.manager, I don't see this exception; however, when it runs with the SORT-based one, I start seeing this error. How is that possible?

I am running my job on YARN, and I noticed that the YARN process limits (cat /proc/$PID/limits) are not consistent with the system-wide limits (shown by ulimit -a); I don't know how that happened. Is there a way to have the Spark driver propagate this setting (ulimit -n <number>) to the Spark executors before startup?




On Tue, Oct 7, 2014 at 11:53 PM, Andrew Ash <an...@andrewash.com>> wrote:
You will need to restart your Mesos workers to pick up the new limits as well.

On Tue, Oct 7, 2014 at 4:02 PM, Sunny Khatri <su...@gmail.com>> wrote:
@SK:
Make sure ulimit has taken effect as Todd mentioned. You can verify via ulimit -a. Also make sure you have proper kernel parameters set in /etc/sysctl.conf (MacOSX)

On Tue, Oct 7, 2014 at 3:57 PM, Lisonbee, Todd <to...@intel.com>> wrote:

Are you sure the new ulimit has taken effect?

How many cores are you using?  How many reducers?

        "In general if a node in your cluster has C assigned cores and you run
        a job with X reducers then Spark will open C*X files in parallel and
        start writing. Shuffle consolidation will help decrease the total
        number of files created but the number of file handles open at any
        time doesn't change so it won't help the ulimit problem."

Quoted from Patrick at:
http://apache-spark-user-list.1001560.n3.nabble.com/quot-Too-many-open-files-quot-exception-on-reduceByKey-td2462.html

Thanks,

Todd

-----Original Message-----
From: SK [mailto:skrishna.id@gmail.com<ma...@gmail.com>]
Sent: Tuesday, October 7, 2014 2:12 PM
To: user@spark.incubator.apache.org<ma...@spark.incubator.apache.org>
Subject: Re: Shuffle files

- We set ulimit to 500000. But I still get the same "too many open files"
warning.

- I tried setting consolidateFiles to True, but that did not help either.

I am using a Mesos cluster.   Does Mesos have any limit on the number of
open files?

thanks






--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Shuffle-files-tp15185p15869.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org<ma...@spark.apache.org>
For additional commands, e-mail: user-help@spark.apache.org<ma...@spark.apache.org>







--
Chen Song


Re: Shuffle files

Posted by Chen Song <ch...@gmail.com>.
My observation is the opposite. When my job runs with the default
spark.shuffle.manager, I don't see this exception; however, when it runs
with the SORT-based one, I start seeing this error. How is that possible?

I am running my job on YARN, and I noticed that the YARN process limits
(cat /proc/$PID/limits) are not consistent with the system-wide limits (shown
by ulimit -a); I don't know how that happened. Is there a way to have the
Spark driver propagate this setting (ulimit -n <number>) to the Spark
executors before startup?
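
(A rough, Linux-only diagnostic sketch for seeing the limit the executor JVMs
actually run with; it just reads /proc/self/limits from inside a task:)

    // Diagnostic sketch: read the "Max open files" line from /proc/self/limits
    // inside a task, to see the limit each executor JVM actually has.
    val perHost = sc.parallelize(1 to 100, 100).map { _ =>
      val host = java.net.InetAddress.getLocalHost.getHostName
      val nofile = scala.io.Source.fromFile("/proc/self/limits")
        .getLines().filter(_.startsWith("Max open files")).mkString
      (host, nofile)
    }.distinct().collect()

    perHost.foreach(println)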




On Tue, Oct 7, 2014 at 11:53 PM, Andrew Ash <an...@andrewash.com> wrote:

> You will need to restart your Mesos workers to pick up the new limits as
> well.
>
> On Tue, Oct 7, 2014 at 4:02 PM, Sunny Khatri <su...@gmail.com> wrote:
>
>> @SK:
>> Make sure ulimit has taken effect as Todd mentioned. You can verify via
>> ulimit -a. Also make sure you have proper kernel parameters set in
>> /etc/sysctl.conf (MacOSX)
>>
>> On Tue, Oct 7, 2014 at 3:57 PM, Lisonbee, Todd <to...@intel.com>
>> wrote:
>>
>>>
>>> Are you sure the new ulimit has taken effect?
>>>
>>> How many cores are you using?  How many reducers?
>>>
>>>         "In general if a node in your cluster has C assigned cores and
>>> you run
>>>         a job with X reducers then Spark will open C*X files in parallel
>>> and
>>>         start writing. Shuffle consolidation will help decrease the total
>>>         number of files created but the number of file handles open at
>>> any
>>>         time doesn't change so it won't help the ulimit problem."
>>>
>>> Quoted from Patrick at:
>>>
>>> http://apache-spark-user-list.1001560.n3.nabble.com/quot-Too-many-open-files-quot-exception-on-reduceByKey-td2462.html
>>>
>>> Thanks,
>>>
>>> Todd
>>>
>>> -----Original Message-----
>>> From: SK [mailto:skrishna.id@gmail.com]
>>> Sent: Tuesday, October 7, 2014 2:12 PM
>>> To: user@spark.incubator.apache.org
>>> Subject: Re: Shuffle files
>>>
>>> - We set ulimit to 500000. But I still get the same "too many open files"
>>> warning.
>>>
>>> - I tried setting consolidateFiles to True, but that did not help either.
>>>
>>> I am using a Mesos cluster.   Does Mesos have any limit on the number of
>>> open files?
>>>
>>> thanks
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Shuffle-files-tp15185p15869.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>>> For additional commands, e-mail: user-help@spark.apache.org
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>>> For additional commands, e-mail: user-help@spark.apache.org
>>>
>>>
>>
>


-- 
Chen Song

Re: Shuffle files

Posted by Andrew Ash <an...@andrewash.com>.
You will need to restart your Mesos workers to pick up the new limits as
well.

On Tue, Oct 7, 2014 at 4:02 PM, Sunny Khatri <su...@gmail.com> wrote:

> @SK:
> Make sure ulimit has taken effect as Todd mentioned. You can verify via
> ulimit -a. Also make sure you have proper kernel parameters set in
> /etc/sysctl.conf (MacOSX)
>
> On Tue, Oct 7, 2014 at 3:57 PM, Lisonbee, Todd <to...@intel.com>
> wrote:
>
>>
>> Are you sure the new ulimit has taken effect?
>>
>> How many cores are you using?  How many reducers?
>>
>>         "In general if a node in your cluster has C assigned cores and
>> you run
>>         a job with X reducers then Spark will open C*X files in parallel
>> and
>>         start writing. Shuffle consolidation will help decrease the total
>>         number of files created but the number of file handles open at any
>>         time doesn't change so it won't help the ulimit problem."
>>
>> Quoted from Patrick at:
>>
>> http://apache-spark-user-list.1001560.n3.nabble.com/quot-Too-many-open-files-quot-exception-on-reduceByKey-td2462.html
>>
>> Thanks,
>>
>> Todd
>>
>> -----Original Message-----
>> From: SK [mailto:skrishna.id@gmail.com]
>> Sent: Tuesday, October 7, 2014 2:12 PM
>> To: user@spark.incubator.apache.org
>> Subject: Re: Shuffle files
>>
>> - We set ulimit to 500000. But I still get the same "too many open files"
>> warning.
>>
>> - I tried setting consolidateFiles to True, but that did not help either.
>>
>> I am using a Mesos cluster.   Does Mesos have any limit on the number of
>> open files?
>>
>> thanks
>>
>>
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Shuffle-files-tp15185p15869.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>> For additional commands, e-mail: user-help@spark.apache.org
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>> For additional commands, e-mail: user-help@spark.apache.org
>>
>>
>

Re: Shuffle files

Posted by Sunny Khatri <su...@gmail.com>.
@SK:
Make sure ulimit has taken effect as Todd mentioned. You can verify via
ulimit -a. Also make sure you have proper kernel parameters set in
/etc/sysctl.conf (MacOSX)

On Tue, Oct 7, 2014 at 3:57 PM, Lisonbee, Todd <to...@intel.com>
wrote:

>
> Are you sure the new ulimit has taken effect?
>
> How many cores are you using?  How many reducers?
>
>         "In general if a node in your cluster has C assigned cores and you
> run
>         a job with X reducers then Spark will open C*X files in parallel
> and
>         start writing. Shuffle consolidation will help decrease the total
>         number of files created but the number of file handles open at any
>         time doesn't change so it won't help the ulimit problem."
>
> Quoted from Patrick at:
>
> http://apache-spark-user-list.1001560.n3.nabble.com/quot-Too-many-open-files-quot-exception-on-reduceByKey-td2462.html
>
> Thanks,
>
> Todd
>
> -----Original Message-----
> From: SK [mailto:skrishna.id@gmail.com]
> Sent: Tuesday, October 7, 2014 2:12 PM
> To: user@spark.incubator.apache.org
> Subject: Re: Shuffle files
>
> - We set ulimit to 500000. But I still get the same "too many open files"
> warning.
>
> - I tried setting consolidateFiles to True, but that did not help either.
>
> I am using a Mesos cluster.   Does Mesos have any limit on the number of
> open files?
>
> thanks
>
>
>
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Shuffle-files-tp15185p15869.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
>
>

RE: Shuffle files

Posted by "Lisonbee, Todd" <to...@intel.com>.
Are you sure the new ulimit has taken effect?
 
How many cores are you using?  How many reducers?

	"In general if a node in your cluster has C assigned cores and you run 
	a job with X reducers then Spark will open C*X files in parallel and 
	start writing. Shuffle consolidation will help decrease the total 
	number of files created but the number of file handles open at any 
	time doesn't change so it won't help the ulimit problem."

Quoted from Patrick at:
http://apache-spark-user-list.1001560.n3.nabble.com/quot-Too-many-open-files-quot-exception-on-reduceByKey-td2462.html
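
As a purely hypothetical illustration of that formula: a node with 16 cores
running a job with 2,000 reducers would have roughly 16 * 2,000 = 32,000 file
handles open at once, well beyond the common default ulimit of 1,024 or 4,096.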

Thanks,

Todd

-----Original Message-----
From: SK [mailto:skrishna.id@gmail.com] 
Sent: Tuesday, October 7, 2014 2:12 PM
To: user@spark.incubator.apache.org
Subject: Re: Shuffle files

- We set ulimit to 500000. But I still get the same "too many open files"
warning. 

- I tried setting consolidateFiles to True, but that did not help either.

I am using a Mesos cluster.   Does Mesos have any limit on the number of
open files?

thanks






--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Shuffle-files-tp15185p15869.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org




Re: Shuffle files

Posted by SK <sk...@gmail.com>.
- We set ulimit to 500000, but I still get the same "too many open files"
warning.

- I tried setting spark.shuffle.consolidateFiles to true, but that did not help either.
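
(One thing worth double-checking, sketched below with an illustrative call:
whether the flag actually reached the running context, since it must be set
before the SparkContext is created:)

    // Illustrative check: print what the running context actually sees.
    // The flag must be set on the SparkConf before the SparkContext is created.
    println(sc.getConf.getOption("spark.shuffle.consolidateFiles"))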

I am using a Mesos cluster.   Does Mesos have any limit on the number of
open files?

thanks






--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Shuffle-files-tp15185p15869.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: Shuffle files

Posted by Andrew Ash <an...@andrewash.com>.
Hi SK,

For the problem with lots of shuffle files and the "too many open files"
exception there are a few options:

1. The Linux kernel has a limit on the number of files open at once. This
is set with ulimit -n, and can be set permanently in /etc/sysctl.conf or
/etc/sysctl.d/. Try increasing it to a large value, at the bare minimum
the square of your partition count.
2. Try using shuffle consolidation (spark.shuffle.consolidateFiles=true).
This option writes fewer files to disk, so it shouldn't hit the limits
nearly as much.
3. Try using the sort-based shuffle by setting spark.shuffle.manager=SORT.
You should likely hold off on this until
https://issues.apache.org/jira/browse/SPARK-3032 is fixed, hopefully in
1.1.1.
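
(A minimal sketch of setting options 2 and 3 from application code rather than
spark-defaults.conf; option 1 is an OS-level setting that Spark cannot change:)

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch: options 2 and 3 set programmatically, before the context starts.
    val conf = new SparkConf()
      .set("spark.shuffle.consolidateFiles", "true")  // option 2
      .set("spark.shuffle.manager", "SORT")           // option 3 (note the SPARK-3032 caveat)
    val sc = new SparkContext(conf)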

Hope that helps!
Andrew

On Thu, Sep 25, 2014 at 4:20 PM, SK <sk...@gmail.com> wrote:

> Hi,
>
> I am using Spark 1.1.0 on a cluster. My job takes as input 30 files in a
> directory (I am using  sc.textfile("dir/*") ) to read in the files.  I am
> getting the following warning:
>
> WARN TaskSetManager: Lost task 99.0 in stage 1.0 (TID 99,
> mesos12-dev.sccps.net): java.io.FileNotFoundException:
> /tmp/spark-local-20140925215712-0319/12/shuffle_0_99_93138 (Too many open
> files)
>
> basically I think a lot of shuffle files are being created.
>
> 1) The tasks eventually fail and the job just hangs (after taking very
> long,
> more than an hour).  If I read these 30 files in a for loop, the same job
> completes in a few minutes. However, I need to specify the files names,
> which is not convenient. I am assuming that sc.textfile("dir/*") creates a
> large RDD for all the 30 files. Is there a way to make the operation on
> this
> large RDD efficient so as to avoid creating too many shuffle files?
>
>
> 2) Also, I am finding that all the shuffle files for my other completed
> jobs
> are not being automatically deleted even after days. I thought that
> sc.stop() clears the intermediate files.  Is there some way to
> programmatically delete these temp shuffle files upon job completion?
>
>
> thanks
>
>
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Shuffle-files-tp15185.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
>
>