Posted to common-user@hadoop.apache.org by Eric Zhang <ez...@yahoo-inc.com> on 2008/09/27 00:09:20 UTC

too many open files error

Hi,
I encountered the following FileNotFoundException, caused by a "too many open files" error, when I tried to run a job. The job had run several times before without problems. The exception confuses me because my code closes all of its files, and even if it didn't, the job only has 10-20 small input/output files. The open-file limit on my box is 1024. Besides, the error seemed to happen even before the task was executed. I am using version 0.17. I'd appreciate it if somebody could shed some light on this issue. BTW, the job ran OK after I restarted Hadoop. Yes, hadoop-site.xml did exist in that directory.

java.lang.RuntimeException: java.io.FileNotFoundException: /home/y/conf/hadoop/hadoop-site.xml (Too many open files)
        at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:901)
        at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:804)
        at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:772)
        at org.apache.hadoop.conf.Configuration.get(Configuration.java:272)
        at org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:414)
        at org.apache.hadoop.mapred.JobConf.getKeepFailedTaskFiles(JobConf.java:306)
        at org.apache.hadoop.mapred.TaskTracker$TaskInProgress.setJobConf(TaskTracker.java:1487)
        at org.apache.hadoop.mapred.TaskTracker.launchTaskForJob(TaskTracker.java:722)
        at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:716)
        at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1274)
        at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:915)
        at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1310)
        at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2251)
Caused by: java.io.FileNotFoundException: /home/y/conf/hadoop/hadoop-site.xml (Too many open files)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:106)
        at java.io.FileInputStream.<init>(FileInputStream.java:66)
        at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:70)
        at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:161)
        at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:653)
        at com.sun.org.apache.xerces.internal.impl.XMLVersionDetector.determineDocVersion(XMLVersionDetector.java:186)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:771)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
        at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:107)
        at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:225)
        at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:283)
        at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:180)
        at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:832)
        ... 12 more


Sometimes it gave me this message:
java.io.IOException: Cannot run program "bash": java.io.IOException: error=24, Too many open files
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:459)
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:149)
        at org.apache.hadoop.util.Shell.run(Shell.java:134)
        at org.apache.hadoop.fs.DF.getAvailable(DF.java:73)
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:296)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:133)
Caused by: java.io.IOException: java.io.IOException: error=24, Too many open files
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
        at java.lang.ProcessImpl.start(ProcessImpl.java:65)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:452)
        ... 6 more
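
One way to check whether a JVM is genuinely running up against its per-process descriptor limit is to ask the JVM itself; both traces above come out of the TaskTracker, so that is the process worth checking. A minimal sketch, assuming a Sun/Oracle JDK on a Unix-like system (the class name FdUsage is just a placeholder, and the logic has to run inside the process being checked):

// Sketch: report how close the current JVM is to its file descriptor limit.
// Assumes a Sun/Oracle JVM on a Unix-like OS; FdUsage is a placeholder name.
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class FdUsage {
    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            long open = unix.getOpenFileDescriptorCount();
            long max = unix.getMaxFileDescriptorCount();
            System.out.println("open file descriptors: " + open + " / " + max);
            if (open > max * 0.9) {
                System.out.println("WARNING: within 10% of the descriptor limit");
            }
        } else {
            System.out.println("Unix descriptor metrics not available on this JVM/OS");
        }
    }
}

If the open count keeps climbing toward the maximum across jobs, descriptors are leaking somewhere even though the job code closes all of its own files.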



-- 
Eric Zhang
408-349-2466
Vespa Content team


Re: too many open files error

Posted by Johannes Zillmann <jz...@101tec.com>.
Having a similar problem.
After upgrading from Hadoop 0.16.4 to 0.17.2.1 we're facing "java.io.IOException: java.io.IOException: Too many open files" after a few jobs.
For example:
Error message from task (reduce) tip_200810020918_0014_r_000031 Error initializing task_200810020918_0014_r_000031_1:
java.io.IOException: java.io.IOException: Too many open files
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
        at java.lang.ProcessImpl.start(ProcessImpl.java:65)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:451)
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:149)
        at org.apache.hadoop.util.Shell.run(Shell.java:134)
        at org.apache.hadoop.fs.DF.getAvailable(DF.java:73)
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:296)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
        at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:646)
        at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1271)
        at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:912)
        at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1307)
        at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2266)


Once a job has failed because of these exceptions, all subsequent jobs fail for the same reason.
After a cluster restart it works fine for a few jobs again...

Johannes

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
101tec GmbH
Halle (Saale), Saxony-Anhalt, Germany
http://www.101tec.com
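
The restart-fixes-it pattern described above is what a descriptor leak in the long-lived TaskTracker daemon would look like, as opposed to a leak in any single job. On Linux, the open-descriptor count of a running process can also be watched from the outside through /proc. A rough sketch, assuming the watcher runs on the TaskTracker node as the same user; WatchFds is a hypothetical helper and the pid has to be supplied by hand:

// Sketch: periodically count the open descriptors of a running process
// (e.g. the TaskTracker) by listing /proc/<pid>/fd. Linux only; requires
// permission to read that directory (same user or root).
import java.io.File;

public class WatchFds {
    public static void main(String[] args) throws InterruptedException {
        String pid = args.length > 0 ? args[0] : "self";  // pass the TaskTracker pid
        File fdDir = new File("/proc/" + pid + "/fd");
        while (true) {
            String[] fds = fdDir.list();   // each entry is one open descriptor
            if (fds == null) {
                System.err.println("cannot read " + fdDir + " (wrong pid, or not Linux?)");
                return;
            }
            System.out.println(System.currentTimeMillis() + "\t" + fds.length + " fds open");
            Thread.sleep(10000);           // sample every 10 seconds
        }
    }
}

Run it as "java WatchFds <tasktracker-pid>" while submitting jobs; a count that climbs with every job and never drops back matches the behaviour above.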


Re: too many open files error

Posted by Karl Anderson <ka...@somethingsimpler.com>.

I had the same errors, including the bash one. Running one particular job would cause all subsequent jobs of any kind to fail, even after all running jobs had completed or failed out. This was confusing because the failing jobs themselves often had no relationship to the cause; they were just in a bad environment.

If you can't successfully run a dummy job (with the identity mapper  
and reducer, or a streaming job with cat) once you start getting  
failures, then you are probably in the same situation.
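
Such a dummy job can be put together from the identity classes that ship with Hadoop. A sketch against the old org.apache.hadoop.mapred API of the 0.17/0.18 era (DummyIdentityJob and the /tmp paths are placeholders; the input/output directories are set through the raw mapred.input.dir / mapred.output.dir keys rather than version-specific convenience setters):

// Sketch of a "canary" job: identity mapper and reducer, text in, text out.
// DummyIdentityJob and the default /tmp paths are placeholders.
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class DummyIdentityJob {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(DummyIdentityJob.class);
        conf.setJobName("fd-canary");

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        conf.setMapperClass(IdentityMapper.class);
        conf.setReducerClass(IdentityReducer.class);
        conf.setOutputKeyClass(LongWritable.class);   // TextInputFormat keys are byte offsets
        conf.setOutputValueClass(Text.class);

        // Placeholder paths; any small existing input directory will do.
        conf.set("mapred.input.dir", args.length > 0 ? args[0] : "/tmp/fd-canary-in");
        conf.set("mapred.output.dir", args.length > 1 ? args[1] : "/tmp/fd-canary-out");

        JobClient.runJob(conf);   // throws if the job fails, e.g. with "Too many open files"
    }
}

If even this job fails with "Too many open files" on a cluster that runs it fine right after a restart, the problem is in the cluster's daemons rather than in any particular job.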

I believe that the problem was caused by increasing the timeout, but I never pinned it down enough to submit a Jira issue. It might have been the XML reader or something else. I was using streaming, hadoop-ec2, and either 0.17.0 or 0.18.0. It would happen just as rapidly after I made an EC2 image with a higher open file limit.

Eventually I figured it out by running each job in my pipeline 5 or so  
times before trying the next one, which let me see which job was  
causing the problem (because it would eventually fail itself, rather  
than hosing a later job).

Karl Anderson
kra@monkey.org
http://monkey.org/~kra