Posted to common-user@hadoop.apache.org by Yuri Pradkin <yu...@isi.edu> on 2008/11/05 00:45:39 UTC

too many open files? Isn't 4K enough???

Hi,

I'm running the current snapshot (-r709609), doing a simple word count using Python
over streaming.  I have a relatively moderate setup of 17 nodes.

I'm getting this exception:

java.io.FileNotFoundException: /usr/local/hadoop/hadoop-hadoop/mapred/local/taskTracker/jobcache/job_200811041109_0003/attempt_200811041109_0003_m_000000_0/output/spill4055.out.index 
(Too many open files)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.(FileInputStream.java:137)
        at org.apache.hadoop.fs.RawLocalFileSystem$TrackingFileInputStream.(RawLocalFileSystem.java:62)
        at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.(RawLocalFileSystem.java:98)
        at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:168)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:359)
        at org.apache.hadoop.mapred.IndexRecord.readIndexFile(IndexRecord.java:47)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.getIndexInformation(MapTask.java:1339)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1237)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:857)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
        at org.apache.hadoop.mapred.Child.main(Child.java:155)

I see that AFTER I've reconfigured the max allowable open files to 4096!

When I monitor the number of open files on a box running Hadoop I see the
number fluctuating around 900 during the map phase.  Then I see it going up through
the roof during the sorting/shuffling phase.  I see a lot of open files named like
"/users/hadoop/hadoop-hadoop/mapred/local/taskTracker/jobcache/job_200811041109_0003/attempt_200811041109_0003_m_000000_1/output/spill2188.out"

What is a poor user to do about this?  Reconfigure Hadoop to allow 32K open files, as somebody suggested
on an HBase forum that I googled up?  Or some other ridiculous number?  If yes, what should it be?
Or is it my config problem, and is there a way to control this?
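For the record, by "reconfigured the max allowable open files" I mean the usual
per-user nofile limit, roughly along the lines of this sketch (assuming the daemons
and tasks run as user "hadoop"; the limit only takes effect for shells or daemons
started after the change):

   # append soft/hard fd limits for the hadoop user
   echo 'hadoop  soft  nofile  4096' >> /etc/security/limits.conf
   echo 'hadoop  hard  nofile  4096' >> /etc/security/limits.conf

   # verify from a fresh login before restarting the TaskTracker/DataNode
   su - hadoop -c 'ulimit -n'    # should print 4096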

Do I need to file a jira about this or is this a problem that people are aware of?  Because right
now it looks to me that Hadoop scalability is broken.  No way 4K descriptors should be insufficient.

Any feedback will be appreciated.

Thanks,

  -Yuri

P.S.  BTW, someone on this list has suggested before that a similar-sounding
problem goes away for a while after restarting Hadoop.  That did not work for me.

Re: too many open files? Isn't 4K enough???

Posted by Jason Venner <ja...@attributor.com>.
We just went from 8k to 64k after some problems.

Karl Anderson wrote:
>
> On 4-Nov-08, at 3:45 PM, Yuri Pradkin wrote:
>
>> Hi,
>>
>> I'm running the current snapshot (-r709609), doing a simple word count
>> using Python over streaming.  I have a relatively moderate setup of 17 nodes.
>>
>> I'm getting this exception:
>>
>> java.io.FileNotFoundException: 
>> /usr/local/hadoop/hadoop-hadoop/mapred/local/taskTracker/jobcache/job_200811041109_0003/attempt_200811041109_0003_m_000000_0/output/spill4055.out.index 
>>
>> (Too many open files)
>
>> [...]
>
>> I see that AFTER I've reconfigured the max allowable open files to 4096!
>
> I am running into a similar issue.  It seems to be affected by the
> number of simultaneous tasks.
>
> Here's one example.  I have a mapper-only streaming job with 512 tasks
> and a total combined output of 93 megs for a successful run.  The tasks
> don't interact with HDFS other than what streaming sets up with stdin
> and stdout.
>
> On a 32 slave cluster with 8 max mappers per node, I see 27 failed
> task attempts and 5 black-listed task trackers by the time the job
> fails, although this will change from run to run.  The failures are
> all file-handle related: "can't run mapper, no such file or directory
> <mapper source>" , "IOException: can't run bash, too many open files",
> and many unknown failures because the admin interface couldn't find
> the log files.  This happens with both hadoop 17.2.1 and 18.1.
>
> On a 128 slave cluster with 2 max mappers per node, same job, same
> number of tasks, I get no failures.  This is running hadoop 18.1.
>
> I'm running on EC2 clusters created with the hadoop-ec2 tools.  The
> 2-mapper nodes are small EC2 instances, and the 8-mapper nodes are
> xlarge EC2 instances.  I upped the nofile limit in
> /etc/security/limits.conf to 131072 for all users on all of my EC2
> images, but it didn't help.  I'm never running more than one job at
> once.
>
> The hadoop-ec2 tools launch clusters with one master which runs the
> namenode and jobtracker, and slaves each running a datanode and
> tasktracker.  It seems that running more than 2 mappers per node isn't
> feasible with this setup, which surprises me because the cluster setup
> suggestions I've read advise using far more.  Would changing the ratio
> of datanodes to tasktrackers have an effect?  Is this done in
> practice?
>
> Are you running more than 2 mappers per node?  Do you see any
> differences in the number of failed tasks when you change the number
> of tasks over the same input set?
>
> In general, I've had to do a lot of fine-tuning of my job parameters
> to balance memory, file handles, and task timeouts.  I'm finding that
> a setup that works with one input set breaks when I try it on an input
> set which is twice the size.  My productivity is not high while I'm
> figuring this out, and I wonder why I don't hear about this more.
> Perhaps this is a streaming issue, and streaming isn't being used very
> much?
>
>
> Karl Anderson
> kra@monkey.org
> http://monkey.org/~kra
>
>
>

Re: too many open files? Isn't 4K enough???

Posted by Yuri Pradkin <yu...@isi.edu>.
We created a Jira for this as well as provided a patch.  Please see

http://issues.apache.org/jira/browse/HADOOP-4614

I hope it'll make it into svn soon (it's been kind of slow lately).
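If you want to try it before it's committed, the usual drill is roughly this
(a sketch only: the svn URL is the trunk of the day, the patch file name is
whatever the attachment on the JIRA is actually called, and you need svn and
ant installed):

   svn checkout http://svn.apache.org/repos/asf/hadoop/core/trunk hadoop-trunk
   cd hadoop-trunk
   patch -p0 < /path/to/HADOOP-4614.patch   # Apache patches apply from the project root
   ant jar                                  # rebuild the core jar, then redeploy it to the nodes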

> Are you able to create a reproducible setup for this?  I haven't been
> able to.

Yes, we did see consistent behavior.

> I'm only able to cause this to happen after a few runs of my own jobs
> first, which do various things and involve several Python libraries
> and downloading from S3.  After I've done this, it looks like any
> streaming job will have tasks die, but if I don't run my jobs first, I
> don't have a problem.  I also can't figure out what's consuming the
> open files; I'm not seeing the large lsof numbers that you were.

This may be a different problem from the one you're seeing, then; in our case the
files would be closed after the job failed.  In any case, you can try the patch and
see if it helps.  There was another "too many files" problem mentioned in the JIRA
discussion thread; you might want to have a look at that too.

  -Yuri



Re: too many open files? Isn't 4K enough???

Posted by Karl Anderson <kr...@monkey.org>.
On 5-Nov-08, at 4:08 PM, Yuri Pradkin wrote:

> I suspect your total open FDs = (#mappers) x (FDs/map)
>
> In my case the second factor was ~5K; so if I ran 8 mappers the total might
> have been as high as 40K!  This is totally insane.
>
> Perhaps playing with GC modes might help...
>
>> In general, I've had to do a lot of fine-tuning of my job parameters
>> to balance memory, file handles, and task timeouts.  I'm finding that
>> a setup that works with one input set breaks when I try it on an input
>> set which is twice the size.  My productivity is not high while I'm
>> figuring this out, and I wonder why I don't hear about this more.
>> Perhaps this is a streaming issue, and streaming isn't being used very
>> much?
>
> I doubt that in my case this is specific to streaming, although streaming
> might exacerbate the problem by opening pipes, etc.  In my case the vast
> majority of open files were for spills during sorting/shuffling, which is
> not restricted to streaming.
>
> This is a scalability issue and I'd really like to hear from developers.
>
>  -Yuri
>
> P.S. It looks like we need to file a jira on this one...

Are you able to create a reproducible setup for this?  I haven't been able to.

I'm only able to cause this to happen after a few runs of my own jobs
first, which do various things and involve several Python libraries
and downloading from S3.  After I've done this, it looks like any
streaming job will have tasks die, but if I don't run my jobs first, I
don't have a problem.  I also can't figure out what's consuming the
open files; I'm not seeing the large lsof numbers that you were.

Obviously, the jobs I'm running beforehand are causing problems for later
jobs, but I haven't isolated what it is yet.


My cluster:
- hadoop 0.18.1
- cluster of 64 EC2 xlarge nodes, created with the hadoop-ec2 tools, edited
   to increase the max open files for root to 131072
- 8 max mappers or reducers per node

After I had some of my jobs die, I tested the cluster with this streaming job:

   hadoop jar /usr/local/hadoop-0.18.1/contrib/streaming/hadoop-0.18.1-streaming.jar \
       -mapper cat -reducer cat -input clusters_0 -output foo \
       -jobconf mapred.output.compress=false \
       -jobconf mapred.map.tasks=256 -jobconf mapred.reduce.tasks=256

I ran this manually a few times, not changing anything other than deleting the
output directory, and never running more than one job at once.
While I ran it, I checked the number of open files on two of the nodes with:

   while true; do lsof | wc -l; sleep 1; done
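(A finer-grained variant I could have used -- just a sketch, assuming the task JVMs
run as user "hadoop" and that pgrep is available -- counts fds per child java
process instead of system-wide:)

   while true; do
       for pid in $(pgrep -u hadoop java); do
           echo "$pid $(ls /proc/$pid/fd 2>/dev/null | wc -l)"
       done
       echo ---
       sleep 1
   done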

Tasks died on each job due to "file not found" or "too many open files" errors.
Each job succeeded eventually.
The job never got more than 120 or so mappers or reducers at once (because the
scheduler couldn't catch up; a real job on this cluster setup was able to get
to 8 tasks per node).
1st run: 31 mappers die, 11 reducers die.
2nd run: 16/12
3rd run: 14/6
4th run: 14/6

I never saw more than 1600 or so open files on the two nodes I was checking.
Tasks were dying on these nodes during this time.

The input directory (clusters_0) contained one 797270-byte, 4096-line ASCII file.

I terminated and re-created my cluster.  This time I just uploaded the input
file and ran the test jobs; I didn't run my own jobs first.
I wasn't able to cause any errors.




Karl Anderson
kra@monkey.org
http://monkey.org/~kra




Re: too many open files? Isn't 4K enough???

Posted by Yuri Pradkin <yu...@isi.edu>.
On Wednesday 05 November 2008 15:27:34 Karl Anderson wrote:
> I am running into a similar issue.  It seems to be affected by the
> number of simultaneous tasks.

For me, while I generally allow up to 4 mappers per node, in this particular
instance I had only one mapper reading from a single gzipped text file (gzip
isn't splittable, so the whole file goes to one map).

It did work when I ran the same code on a smaller file.
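(A possible workaround, sketched below: since gzip isn't splittable, pre-split the
input into several smaller .gz files before putting it into HDFS, so the work is
spread over several maps.  The file names, line count, and HDFS path here are
made up.)

   zcat big-input.gz | split -l 1000000 - part-    # ~1M lines per piece
   gzip part-*
   hadoop fs -put part-*.gz /user/hadoop/input-split/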

> I upped the nofile limit in
> /etc/security/limits.conf to 131072 for all users on all of my EC2
> images, but it didn't help.  I'm never running more than one job at
> once.

After I upped the max number of fds to 16K, the job ran to completion.
I was monitoring the number of open files/processes every 15s (by simply 
running ps and lsof | wc -l) and saw this:
#processes   open_files
...
13   646
13   648
12   2535
13   4860
12   4346
12   3842
12   3324
12   2823
12   2316
12   1852
12   1387
12   936
12   643
12   643
12   643
12   643
12   643
12   643
13   642
12   642
12   4775
12   2738
12   917
12   643
12   642
12   4992
12   4453
12   3943
12   3299
12   2855
12   2437
...

It looks like something (garbage collection?) cleans up fds periodically; the
max I saw was 5007 (though the true peak may well have been higher between the
15-second samples).
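(A rough way to confirm that the descriptors really are spill files, assuming the
default mapred.local.dir layout shown in the exception at the top of the thread --
just a sketch:)

   lsof -u hadoop 2>/dev/null \
     | grep 'mapred/local/taskTracker/jobcache' \
     | grep -c 'spill.*\.out'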

> The hadoop-ec2 tools launch clusters with one master which runs the
> namenode and jobtracker, and slaves each running a datanode and
> tasktracker.  It seems that running more than 2 mappers per node isn't
> feasible with this setup, which surprises me because the cluster setup
> suggestions I've read advise using far more.  Would changing the ratio
> of datanodes to tasktrackers have an effect?  Is this done in
> practice?
>
> Are you running more than 2 mappers per node?  Do you see any
> differences in the number of failed tasks when you change the number
> of tasks over the same input set?

I'm running a hand-carved cluster on a bunch of heterogeneous systems, some
crappy, some good.

I suspect your total open FDs = (#mappers) x (FDs/map)

In my case the second factor was ~5K; so if I ran 8 mappers the total might have
been as high as 40K!  This is totally insane.
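(Plugging numbers into that formula -- a trivial sketch; the per-map figure comes
from my lsof samples and the 2x safety margin is just a guess:)

   mappers_per_node=8
   fds_per_map=5000        # peak seen per map task while sampling with lsof
   margin=2                # arbitrary safety factor
   echo $(( mappers_per_node * fds_per_map * margin ))    # prints 80000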

Perhaps playing with GC modes might help...

> In general, I've had to do a lot of fine-tuning of my job parameters
> to balance memory, file handles, and task timeouts.  I'm finding that
> a setup that works with one input set breaks when I try it on an input
> set which is twice the size.  My productivity is not high while I'm
> figuring this out, and I wonder why I don't hear about this more.
> Perhaps this is a streaming issue, and streaming isn't being used very
> much?

I doubt that in my case this is specific to streaming, although streaming might
exacerbate the problem by opening pipes, etc.  In my case the vast majority of
open files were for spills during sorting/shuffling, which is not restricted
to streaming.
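(One knob that might reduce the number of spill files in the first place -- a sketch
only, not something I've verified: io.sort.mb is the map-side sort buffer, so a
bigger buffer means fewer spill*.out files per map, and mapred.child.java.opts gives
the child JVM the heap to back it.  The values, script names, and input/output paths
below are just placeholders.)

   hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-*-streaming.jar \
       -mapper ./wc_map.py -reducer ./wc_reduce.py \
       -file ./wc_map.py -file ./wc_reduce.py \
       -input <input> -output <output> \
       -jobconf io.sort.mb=200 \
       -jobconf mapred.child.java.opts=-Xmx512m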

This is a scalability issue and I'd really like to hear from developers.

  -Yuri

P.S. It looks like we need to file a jira on this one...

Re: too many open files? Isn't 4K enough???

Posted by Karl Anderson <kr...@monkey.org>.
On 4-Nov-08, at 3:45 PM, Yuri Pradkin wrote:

> Hi,
>
> I'm running the current snapshot (-r709609), doing a simple word count
> using Python over streaming.  I have a relatively moderate setup of 17 nodes.
>
> I'm getting this exception:
>
> java.io.FileNotFoundException: /usr/local/hadoop/hadoop-hadoop/mapred/local/taskTracker/jobcache/job_200811041109_0003/attempt_200811041109_0003_m_000000_0/output/spill4055.out.index
> (Too many open files)

> [...]

> I see that AFTER I've reconfigured the max allowable open files to 4096!

I am running into a similar issue.  It seems to be affected by the
number of simultaneous tasks.

Here's one example.  I have a mapper-only streaming job with 512 tasks
and a total combined output of 93 megs for a successful run.  The tasks
don't interact with HDFS other than what streaming sets up with stdin
and stdout.

On a 32 slave cluster with 8 max mappers per node, I see 27 failed
task attempts and 5 black-listed task trackers by the time the job
fails, although this will change from run to run.  The failures are
all file-handle related: "can't run mapper, no such file or directory
<mapper source>" , "IOException: can't run bash, too many open files",
and many unknown failures because the admin interface couldn't find
the log files.  This happens with both hadoop 17.2.1 and 18.1.

On a 128 slave cluster with 2 max mappers per node, same job, same
number of tasks, I get no failures.  This is running hadoop 18.1.

I'm running on EC2 clusters created with the hadoop-ec2 tools.  The
2-mapper nodes are small EC2 instances, and the 8-mapper nodes are
xlarge EC2 instances.  I upped the nofile limit in
/etc/security/limits.conf to 131072 for all users on all of my EC2
images, but it didn't help.  I'm never running more than one job at
once.

The hadoop-ec2 tools launch clusters with one master which runs the
namenode and jobtracker, and slaves each running a datanode and
tasktracker.  It seems that running more than 2 mappers per node isn't
feasible with this setup, which surprises me because the cluster setup
suggestions I've read advise using far more.  Would changing the ratio
of datanodes to tasktrackers have an effect?  Is this done in
practice?

Are you running more than 2 mappers per node?  Do you see any
differences in the number of failed tasks when you change the number
of tasks over the same input set?

In general, I've had to do a lot of fine-tuning of my job parameters
to balance memory, file handles, and task timeouts.  I'm finding that
a setup that works with one input set breaks when I try it on an input
set which is twice the size.  My productivity is not high while I'm
figuring this out, and I wonder why I don't hear about this more.
Perhaps this is a streaming issue, and streaming isn't being used very
much?


Karl Anderson
kra@monkey.org
http://monkey.org/~kra