Posted to common-user@hadoop.apache.org by Arv Mistry <ar...@kindsight.net> on 2008/07/30 22:08:49 UTC

File Descriptors not cleaned up

 
I've been trying to track down an issue where, after some time, I get "Too
many files open", i.e.
we're not cleaning up somewhere ...

I'm using "lsof -p <pid>" to track the open files and I find it's adding
3 file descriptors every time I do an
fs.open(<file>), where fs is a FileSystem and <file> is a Path object to a
gzipped file in Hadoop. When I'm done I call
close() on the FSDataInputStream that the open returned, but those 3
file descriptors never get cleaned up.

Of the 3 fds, 2 are 'pipe' and 1 is 'eventpoll' every time.

Is there some other cleanup method I should be calling, other than
close() on the InputStream returned by the open()?

I'm using hadoop-0.17.0 and have also tried hadoop-0.17.1

Cheers Arv
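
For reference, here is a minimal sketch (not from the original mail) of the
open/close pattern being asked about, assuming the hadoop-0.17.x FileSystem
API; the path used below is a placeholder:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReadAndClose {
        public static void main(String[] args) throws IOException {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);   // cached, shared instance
            FSDataInputStream in = null;
            try {
                in = fs.open(new Path("/data/example.dat.gz"));  // placeholder path
                byte[] buf = new byte[4096];
                while (in.read(buf) != -1) {
                    // ... process bytes ...
                }
            } finally {
                if (in != null) {
                    in.close();  // closes this stream's own descriptors
                }
                // fs.close() belongs in application shutdown only, since
                // FileSystem.get() hands back a cached, shared handle.
            }
        }
    }

As the rest of the thread explains, close() on the stream is the right call
for per-file cleanup; the leftover pipe/eventpoll descriptors discussed below
are a small fixed set kept by Hadoop's I/O layer, not a per-open leak.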

RE: java.io.IOException: Cannot allocate memory

Posted by Xavier Stevens <Xa...@fox.com>.
Actually I found the problem was our operations people had enabled
overcommit on memory and restricted it to 50%...lol.  Telling them to
make it 100% fixed the problem.

-Xavier


-----Original Message-----
From: Taeho Kang [mailto:tkang1@gmail.com] 
Sent: Thursday, July 31, 2008 6:16 PM
To: core-user@hadoop.apache.org
Subject: Re: java.io.IOException: Cannot allocate memory

Are you using HadoopStreaming?

If so, then the subprocesses created by a Hadoop Streaming job can take as
much memory as they need. In that case, the system can run out of memory and
other processes (e.g. the TaskTracker) may not be able to run properly, or may
even be killed by the OS.

/Taeho

On Fri, Aug 1, 2008 at 2:24 AM, Xavier Stevens
<Xa...@fox.com>wrote:

> We're currently running jobs on machines with around 16GB of memory 
> with
> 8 map tasks per machine.  We used to run with max heap set to 2048m.
> Since we started using version 0.17.1 we've been getting a lot of 
> these
> errors:
>
> task_200807251330_0042_m_000146_0: Caused by: java.io.IOException:
> java.io.IOException: Cannot allocate memory
> task_200807251330_0042_m_000146_0:      at
> java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
> task_200807251330_0042_m_000146_0:      at
> java.lang.ProcessImpl.start(ProcessImpl.java:65)
> task_200807251330_0042_m_000146_0:      at
> java.lang.ProcessBuilder.start(ProcessBuilder.java:451)
> task_200807251330_0042_m_000146_0:      at
> org.apache.hadoop.util.Shell.runCommand(Shell.java:149)
> task_200807251330_0042_m_000146_0:      at
> org.apache.hadoop.util.Shell.run(Shell.java:134)
> task_200807251330_0042_m_000146_0:      at
> org.apache.hadoop.fs.DF.getAvailable(DF.java:73)
> task_200807251330_0042_m_000146_0:      at
> org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPat
> hF
> orWrite(LocalDirAllocator.java:296)
> task_200807251330_0042_m_000146_0:      at
> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAl
> lo
> cator.java:124)
> task_200807251330_0042_m_000146_0:      at
> org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputF
> il
> e.java:107)
> task_200807251330_0042_m_000146_0:      at
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.
> ja
> va:734)
> task_200807251330_0042_m_000146_0:      at
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1600(MapTask.j
> av
> a:272)
> task_200807251330_0042_m_000146_0:      at
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTa
> sk
> .java:707)
>
> We haven't changed our heapsizes at all.  Has anyone else experienced 
> this?  Is there a way around it other than reducing heap sizes 
> excessively low?  I've tried all the way down to 1024m max heap and I 
> still get this error.
>
>
> -Xavier
>
>


Re: java.io.IOException: Cannot allocate memory

Posted by Taeho Kang <tk...@gmail.com>.
Are you using HadoopStreaming?

If so, then the subprocesses created by a Hadoop Streaming job can take as
much memory as they need. In that case, the system can run out of memory and
other processes (e.g. the TaskTracker) may not be able to run properly, or may
even be killed by the OS.

/Taeho

On Fri, Aug 1, 2008 at 2:24 AM, Xavier Stevens <Xa...@fox.com>wrote:

> We're currently running jobs on machines with around 16GB of memory with
> 8 map tasks per machine.  We used to run with max heap set to 2048m.
> Since we started using version 0.17.1 we've been getting a lot of these
> errors:
>
> task_200807251330_0042_m_000146_0: Caused by: java.io.IOException:
> java.io.IOException: Cannot allocate memory
> task_200807251330_0042_m_000146_0:      at
> java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
> task_200807251330_0042_m_000146_0:      at
> java.lang.ProcessImpl.start(ProcessImpl.java:65)
> task_200807251330_0042_m_000146_0:      at
> java.lang.ProcessBuilder.start(ProcessBuilder.java:451)
> task_200807251330_0042_m_000146_0:      at
> org.apache.hadoop.util.Shell.runCommand(Shell.java:149)
> task_200807251330_0042_m_000146_0:      at
> org.apache.hadoop.util.Shell.run(Shell.java:134)
> task_200807251330_0042_m_000146_0:      at
> org.apache.hadoop.fs.DF.getAvailable(DF.java:73)
> task_200807251330_0042_m_000146_0:      at
> org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathF
> orWrite(LocalDirAllocator.java:296)
> task_200807251330_0042_m_000146_0:      at
> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllo
> cator.java:124)
> task_200807251330_0042_m_000146_0:      at
> org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFil
> e.java:107)
> task_200807251330_0042_m_000146_0:      at
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.ja
> va:734)
> task_200807251330_0042_m_000146_0:      at
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1600(MapTask.jav
> a:272)
> task_200807251330_0042_m_000146_0:      at
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask
> .java:707)
>
> We haven't changed our heapsizes at all.  Has anyone else experienced
> this?  Is there a way around it other than reducing heap sizes
> excessively low?  I've tried all the way down to 1024m max heap and I
> still get this error.
>
>
> -Xavier
>
>

java.io.IOException: Cannot allocate memory

Posted by Xavier Stevens <Xa...@fox.com>.
We're currently running jobs on machines with around 16GB of memory with
8 map tasks per machine.  We used to run with max heap set to 2048m.
Since we started using version 0.17.1 we've been getting a lot of these
errors:

task_200807251330_0042_m_000146_0: Caused by: java.io.IOException:
java.io.IOException: Cannot allocate memory
task_200807251330_0042_m_000146_0:      at
java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
task_200807251330_0042_m_000146_0:      at
java.lang.ProcessImpl.start(ProcessImpl.java:65)
task_200807251330_0042_m_000146_0:      at
java.lang.ProcessBuilder.start(ProcessBuilder.java:451)
task_200807251330_0042_m_000146_0:      at
org.apache.hadoop.util.Shell.runCommand(Shell.java:149)
task_200807251330_0042_m_000146_0:      at
org.apache.hadoop.util.Shell.run(Shell.java:134)
task_200807251330_0042_m_000146_0:      at
org.apache.hadoop.fs.DF.getAvailable(DF.java:73)
task_200807251330_0042_m_000146_0:      at
org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathF
orWrite(LocalDirAllocator.java:296)
task_200807251330_0042_m_000146_0:      at
org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllo
cator.java:124)
task_200807251330_0042_m_000146_0:      at
org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFil
e.java:107)
task_200807251330_0042_m_000146_0:      at
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.ja
va:734)
task_200807251330_0042_m_000146_0:      at
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1600(MapTask.jav
a:272)
task_200807251330_0042_m_000146_0:      at
org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask
.java:707)

We haven't changed our heap sizes at all.  Has anyone else experienced
this?  Is there a way around it other than reducing heap sizes
excessively low?  I've tried all the way down to 1024m max heap and I
still get this error.


-Xavier
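
The stack trace above fails inside ProcessBuilder.start(): Hadoop's DF helper
shells out to the external df command via Shell.runCommand, and forking a
child from a JVM with a large heap can fail with "Cannot allocate memory"
when the kernel's overcommit settings are strict (which is how Xavier's reply
earlier in this thread resolved it). A rough sketch of that call shape, using
only standard JDK classes and a hypothetical directory:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;

    public class DfProbe {
        public static void main(String[] args) throws IOException, InterruptedException {
            // Roughly the shape of what Shell.runCommand does for DF: fork/exec "df -k <dir>".
            // If the fork cannot be satisfied, process creation throws
            // java.io.IOException: Cannot allocate memory.
            ProcessBuilder pb = new ProcessBuilder("df", "-k", "/tmp");  // hypothetical dir
            Process p = pb.start();
            BufferedReader out = new BufferedReader(new InputStreamReader(p.getInputStream()));
            for (String line; (line = out.readLine()) != null; ) {
                System.out.println(line);
            }
            p.waitFor();
            out.close();
        }
    }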


Re: Running mapred job from remote machine to a pseudo-distributed hadoop

Posted by James Moore <ja...@gmail.com>.
On Fri, Aug 1, 2008 at 7:13 AM, Arv Mistry <ar...@kindsight.net> wrote:
>
> I'll try again, can anyone tell me should it be possible to run hadoop
> in a pseudo-distributed mode (i.e. everything on one machine)

That's not quite what pseudo-distributed mode is.  You can run regular
Hadoop jobs on a cluster that consists of one machine; just change the
hostname in your hadoop-site.xml file to the real hostname of your
machine.  If you've got "localhost" in the conf, Hadoop is going to
use LocalJobRunner, and that may be related to your issue.

I may be wrong on this - I haven't spent much time looking at that
code.  Take a look at
./src/java/org/apache/hadoop/mapred/JobClient.java for what gets
kicked off (for 0.17.1 at least).

-- 
James Moore | james@restphone.com
Ruby and Ruby on Rails consulting
blog.restphone.com

Re: Running mapred job from remote machine to a pseudo-distributed hadoop

Posted by Amareshwari Sriramadasu <am...@yahoo-inc.com>.
Arv Mistry wrote:
> I'll try again, can anyone tell me should it be possible to run hadoop
> in a pseudo-distributed mode (i.e. everything on one machine) and then
> submit a mapred job using the ToolRunner from another machine on that
> hadoop configuration?
>
> Cheers Arv
>  
>   
Yes, it is possible. You can start a Hadoop cluster on a single node.
Documentation is available at 
http://hadoop.apache.org/core/docs/current/quickstart.html#PseudoDistributed
Once the cluster is up, you can submit jobs from any client, but the 
client configuration must be aware of the NameNode and JobTracker nodes. 
You can use the generic options *-fs* and *-jt* on the command line for this.

Thanks
Amareshwari
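
A small sketch (not from the original mail) of how a Tool picks up those
generic options when submitted from a remote client; the hostnames shown on
the command line are placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class RemoteSubmitCheck extends Configured implements Tool {
        public int run(String[] args) throws Exception {
            Configuration conf = getConf();
            // ToolRunner's GenericOptionsParser has already applied -fs and -jt,
            // so the client configuration now points at the remote cluster:
            System.out.println("fs.default.name    = " + conf.get("fs.default.name"));
            System.out.println("mapred.job.tracker = " + conf.get("mapred.job.tracker"));
            // ... build a JobConf from conf and submit the real job here ...
            return 0;
        }

        public static void main(String[] args) throws Exception {
            // e.g.  hadoop jar myjob.jar RemoteSubmitCheck \
            //           -fs hdfs://namenode-host:9000 -jt jobtracker-host:9001
            System.exit(ToolRunner.run(new Configuration(), new RemoteSubmitCheck(), args));
        }
    }
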
>
> -----Original Message-----
> From: Arv Mistry [mailto:arv@kindsight.net] 
> Sent: Thursday, July 31, 2008 2:32 PM
> To: core-user@hadoop.apache.org
> Subject: Running mapred job from remote machine to a pseudo-distributed
> hadoop
>
>  
> I have hadoop setup in a pseudo-distributed mode i.e. everything on one
> machine, And I'm trying to submit a hadoop mapred job from another
> machine to that hadoop setup.
>
> At the point that I run the mapred job I get the following error. Any
> ideas as to what I'm doing wrong?
> Is this possible in a pseudo-distributed mode?
>
> Cheers Arv
>
>  INFO   | jvm 1    | 2008/07/31 14:01:00 | 2008-07-31 14:01:00,547 ERROR
> [HadoopJobTool] java.io.IOException:
> /tmp/hadoop-root/mapred/system/job_200807310809_0006/job.xml: No such
> file or directory
> INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
> org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:215)
> INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
> org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:149)
> INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
> org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1155)
> INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
> org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1136)
> INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
> org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:175)
> INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
> org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:1755)
> INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.jav
> a:39)
> INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessor
> Impl.java:25)
> INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
> java.lang.reflect.Method.invoke(Method.java:597)
> INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
> org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
> INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
> org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)
> INFO   | jvm 1    | 2008/07/31 14:01:00 |
> INFO   | jvm 1    | 2008/07/31 14:01:00 |
> org.apache.hadoop.ipc.RemoteException: java.io.IOException:
> /tmp/hadoop-root/mapred/system/job_200807310809_0006/job.xml: No such
> file or directory
> INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
> org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:215)
> INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
> org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:149)
> INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
> org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1155)
> INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
> org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1136)
> INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
> org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:175)
> INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
> org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:1755)
> INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.jav
> a:39)
> INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessor
> Impl.java:25)
> INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
> java.lang.reflect.Method.invoke(Method.java:597)
> INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
> org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
> INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
> org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)
> INFO   | jvm 1    | 2008/07/31 14:01:00 |
> INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
> org.apache.hadoop.ipc.Client.call(Client.java:557)
> INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
> org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:212)
> INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
> $Proxy5.submitJob(Unknown Source)
> INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.jav
> a:39)
> INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessor
> Impl.java:25)
> INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
> java.lang.reflect.Method.invoke(Method.java:597)
> INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvo
> cationHandler.java:82)
> INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocation
> Handler.java:59)
> INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
> $Proxy5.submitJob(Unknown Source)
> INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
> org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:758)
> INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
> org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:973)
> INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
> com.rialto.profiler.profiler.clickstream.hadoop.HadoopJobTool.run(Hadoop
> JobTool.java:129)
> INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
> org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
> com.rialto.profiler.profiler.clickstream.hadoop.HadoopJobTool.launchJob(
> HadoopJobTool.java:142)
> INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
> com.rialto.profiler.profiler.clickstream.RawStreamGenerator.run(RawStrea
> mGenerator.java:138)
>   


RE: Running mapred job from remote machine to a pseudo-distributed hadoop

Posted by Arv Mistry <ar...@kindsight.net>.
I'll try again: can anyone tell me whether it should be possible to run Hadoop
in pseudo-distributed mode (i.e. everything on one machine) and then
submit a mapred job using ToolRunner from another machine against that
Hadoop configuration?

Cheers Arv
 


-----Original Message-----
From: Arv Mistry [mailto:arv@kindsight.net] 
Sent: Thursday, July 31, 2008 2:32 PM
To: core-user@hadoop.apache.org
Subject: Running mapred job from remote machine to a pseudo-distributed
hadoop

 
I have Hadoop set up in pseudo-distributed mode (i.e. everything on one
machine), and I'm trying to submit a Hadoop mapred job from another
machine to that Hadoop setup.

At the point that I run the mapred job I get the following error. Any
ideas as to what I'm doing wrong?
Is this possible in a pseudo-distributed mode?

Cheers Arv

 INFO   | jvm 1    | 2008/07/31 14:01:00 | 2008-07-31 14:01:00,547 ERROR
[HadoopJobTool] java.io.IOException:
/tmp/hadoop-root/mapred/system/job_200807310809_0006/job.xml: No such
file or directory
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:215)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:149)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1155)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1136)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:175)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:1755)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.jav
a:39)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessor
Impl.java:25)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
java.lang.reflect.Method.invoke(Method.java:597)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)
INFO   | jvm 1    | 2008/07/31 14:01:00 |
INFO   | jvm 1    | 2008/07/31 14:01:00 |
org.apache.hadoop.ipc.RemoteException: java.io.IOException:
/tmp/hadoop-root/mapred/system/job_200807310809_0006/job.xml: No such
file or directory
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:215)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:149)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1155)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1136)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:175)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:1755)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.jav
a:39)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessor
Impl.java:25)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
java.lang.reflect.Method.invoke(Method.java:597)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)
INFO   | jvm 1    | 2008/07/31 14:01:00 |
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
org.apache.hadoop.ipc.Client.call(Client.java:557)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:212)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
$Proxy5.submitJob(Unknown Source)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.jav
a:39)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessor
Impl.java:25)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
java.lang.reflect.Method.invoke(Method.java:597)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvo
cationHandler.java:82)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocation
Handler.java:59)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
$Proxy5.submitJob(Unknown Source)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:758)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:973)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
com.rialto.profiler.profiler.clickstream.hadoop.HadoopJobTool.run(Hadoop
JobTool.java:129)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
com.rialto.profiler.profiler.clickstream.hadoop.HadoopJobTool.launchJob(
HadoopJobTool.java:142)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
com.rialto.profiler.profiler.clickstream.RawStreamGenerator.run(RawStrea
mGenerator.java:138)

Running mapred job from remote machine to a pseudo-distributed hadoop

Posted by Arv Mistry <ar...@kindsight.net>.
 
I have Hadoop set up in pseudo-distributed mode (i.e. everything on one
machine), and I'm trying to submit a Hadoop mapred job from another
machine to that Hadoop setup.

At the point that I run the mapred job I get the following error. Any
ideas as to what I'm doing wrong?
Is this possible in a pseudo-distributed mode?

Cheers Arv

 INFO   | jvm 1    | 2008/07/31 14:01:00 | 2008-07-31 14:01:00,547 ERROR
[HadoopJobTool] java.io.IOException:
/tmp/hadoop-root/mapred/system/job_200807310809_0006/job.xml: No such
file or directory
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:215)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:149)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1155)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1136)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:175)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:1755)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.jav
a:39)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessor
Impl.java:25)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
java.lang.reflect.Method.invoke(Method.java:597)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)
INFO   | jvm 1    | 2008/07/31 14:01:00 |
INFO   | jvm 1    | 2008/07/31 14:01:00 |
org.apache.hadoop.ipc.RemoteException: java.io.IOException:
/tmp/hadoop-root/mapred/system/job_200807310809_0006/job.xml: No such
file or directory
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:215)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:149)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1155)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1136)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:175)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:1755)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.jav
a:39)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessor
Impl.java:25)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
java.lang.reflect.Method.invoke(Method.java:597)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)
INFO   | jvm 1    | 2008/07/31 14:01:00 |
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
org.apache.hadoop.ipc.Client.call(Client.java:557)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:212)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
$Proxy5.submitJob(Unknown Source)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.jav
a:39)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessor
Impl.java:25)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
java.lang.reflect.Method.invoke(Method.java:597)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvo
cationHandler.java:82)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocation
Handler.java:59)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
$Proxy5.submitJob(Unknown Source)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:758)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:973)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
com.rialto.profiler.profiler.clickstream.hadoop.HadoopJobTool.run(Hadoop
JobTool.java:129)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
com.rialto.profiler.profiler.clickstream.hadoop.HadoopJobTool.launchJob(
HadoopJobTool.java:142)
INFO   | jvm 1    | 2008/07/31 14:01:00 |       at
com.rialto.profiler.profiler.clickstream.RawStreamGenerator.run(RawStrea
mGenerator.java:138)

Re: File Descriptors not cleaned up

Posted by Raghu Angadi <ra...@yahoo-inc.com>.
Arv Mistry wrote:
>  
> Raghu,
> 
> This is a real scenario for our application which nightly does basically
> what the test stub is doing and what happens there now is after several
> weeks the system stops processing requests with an exception "Too many
> files open".

Before providing a fix, I just want to make sure whether Hadoop's I/O is 
leaking fds. Could you confirm how many fds you see that are pipes or 
eventpoll when you see the exception? Do you see something like 600 or 
so pipe fds and 300 or so 'eventpoll' fds? If yes, then we need to look 
into it more, possibly by improving the test program to show that.

Thanks,
Raghu.

> The default ulimit is 1024 files (I think this is per process) so once
> my process exceeds this it will throw an exception and I am forced to
> restart.
> 
> Why does this not happen when I'm in a single-threaded mode?
> 
> Will you be able to provide a fix?
> 
> Cheers Arv

Re: File Descriptors not cleaned up

Posted by Raghu Angadi <ra...@yahoo-inc.com>.
Jason Venner wrote:
> We have just realized one reason for the '/no live node contains block/' 
> error from /DFSClient/ is an indication that the /DFSClient/ was unable 
> to open a connection due to insufficient available file descriptors.
> 
> FsShell is particularly bad about consuming descriptors and leaving the 
> containing objects for the Garbage Collector to reclaim the descriptors.
> 
> We will submit a patch in a few days.

please do.

We have learned more since I last replied on July 31st. Hadoop itself does not 
have any finalizers that depend on GC to close fds. If GC is affecting the 
number of fds, you are likely a victim of HADOOP-4346.

thanks,
Raghu.

Re: File Descriptors not cleaned up

Posted by Jason Venner <ja...@attributor.com>.
We have just realized that one reason for the '/no live node contains block/' 
error from /DFSClient/ is that the /DFSClient/ was unable to open a 
connection due to insufficient available file descriptors.

FsShell is particularly bad about consuming descriptors and leaving the 
containing objects for the garbage collector, which only then reclaims the 
descriptors.

We will submit a patch in a few days.

Raghu Angadi wrote:
> Arv Mistry wrote:
>>  
>> Raghu,
>>
>> In the test program I see 3 fd's used when the fs.open() is called. Two
>> of these are pipe and 1 eventpoll.
>> These 3 are never cleaned up and stay around. I track this by running it
>> in the debug mode and put a break point and use
>> Lsof -p <pid> to see the fd's. I do a diff of the output before the open
>> and after the open.
>
> It important to know _exactly_ where "before" and "after" break points 
> are in your example to answer accurately. In your example, I don't see 
> why extra thread matters. May be if you give me a runnable or close to 
> runnable example, I will know.
>
> But that does *not* mean there is an fd leak.
>
> For e.g., extend your example  like this : After the first thread 
> exists, repeat the same thing again. Do you see 6 more extra fds? You 
> wouldn't, or you shouldn't rather.
>
> If you want to further explore.. now sleep for 15 seconds in the main 
> thread after the second thread exits. Then invoke TestThread.run() in 
> the main thread (instead of using a seperate thread). Check lsof after 
> run() returns. What do you see?
>
> If you do these experiments and still think there is a leak, please 
> file a Jira.. file a jira even if you don't do the experiments :).
>
> IMHO, I still don't see any suspicious behavior.. may be 'lsof' when 
> your app sees 'too many open files' exception will clear this up us.
>
> Hope this helps.
> Raghu.
>
>> What I don't understand is why this doesn't get cleaned up when done in
>> a separate thread but does when its done in a single thread.
>>
>> This is a problem in the real system because I run out of fd's and am no
>> longer able to open any more files after a few weeks.
>> This forces me to do a system restart to flush things out.
>>
>> Cheers Arv
>>
>

Re: File Descriptors not cleaned up

Posted by Raghu Angadi <ra...@yahoo-inc.com>.
Arv Mistry wrote:
>  
> Raghu,
> 
> In the test program I see 3 fd's used when the fs.open() is called. Two
> of these are pipe and 1 eventpoll.
> These 3 are never cleaned up and stay around. I track this by running it
> in the debug mode and put a break point and use
> Lsof -p <pid> to see the fd's. I do a diff of the output before the open
> and after the open.

It is important to know _exactly_ where the "before" and "after" break points 
are in your example to answer accurately. In your example, I don't see 
why the extra thread matters. Maybe if you give me a runnable, or close to 
runnable, example, I will know.

But that does *not* mean there is an fd leak.

For example, extend your example like this: after the first thread 
exits, repeat the same thing again. Do you see 6 more extra fds? You 
wouldn't, or rather you shouldn't.

If you want to explore further: now sleep for 15 seconds in the main 
thread after the second thread exits. Then invoke TestThread.run() in 
the main thread (instead of using a separate thread). Check lsof after 
run() returns. What do you see?

If you do these experiments and still think there is a leak, please file 
a Jira.. file a Jira even if you don't do the experiments :).

IMHO, I still don't see any suspicious behavior.. maybe looking at 'lsof' when 
your app sees the 'too many open files' exception will clear this up for us.

Hope this helps.
Raghu.
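
For the curious, a sketch of the experiments described above, written as an
alternative main() for the TestFsHadoop harness that appears later in this
thread (it assumes that class's initHdfsReader(), getFileSystem(), fs field
and TestThread inner class):

    public static void main(String[] args) throws Exception {
        initHdfsReader();
        fs = getFileSystem();

        // First pass: open/read/close in a worker thread, then check
        // `lsof -p <pid>`; the 3 pipe/eventpoll fds show up here.
        Thread first = new TestFsHadoop().new TestThread();
        first.start();
        first.join();

        // Second pass: repeat. The count should stay at +3, not grow to +6.
        Thread second = new TestFsHadoop().new TestThread();
        second.start();
        second.join();

        // Third pass: wait, then do the same work in the main thread,
        // and check lsof again after run() returns.
        Thread.sleep(15000);
        new TestFsHadoop().new TestThread().run();
    }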

> What I don't understand is why this doesn't get cleaned up when done in
> a separate thread but does when its done in a single thread.
> 
> This is a problem in the real system because I run out of fd's and am no
> longer able to open any more files after a few weeks.
> This forces me to do a system restart to flush things out.
> 
> Cheers Arv
> 


RE: File Descriptors not cleaned up

Posted by Arv Mistry <ar...@kindsight.net>.
 
Raghu,

In the test program I see 3 fds used when fs.open() is called. Two
of these are pipes and 1 is eventpoll.
These 3 are never cleaned up and stay around. I track this by running it
in debug mode, setting a break point, and using
lsof -p <pid> to see the fds. I do a diff of the output before the open
and after the open.

What I don't understand is why this doesn't get cleaned up when done in
a separate thread but does when it's done in a single thread.

This is a problem in the real system because I run out of fds and am no
longer able to open any more files after a few weeks.
This forces me to do a system restart to flush things out.

Cheers Arv
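
An alternative to diffing lsof output from outside the process (just a sketch,
Linux-only, not from the original mail): the process can count its own
descriptors by listing /proc/self/fd at the points of interest.

    import java.io.File;

    public class FdCount {
        /** Number of fds currently open in this process, or -1 if /proc is unavailable. */
        public static int openFds() {
            String[] entries = new File("/proc/self/fd").list();
            return (entries == null) ? -1 : entries.length;
        }

        public static void main(String[] args) {
            System.out.println("open fds: " + openFds());
        }
    }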

-----Original Message-----
From: Raghu Angadi [mailto:rangadi@yahoo-inc.com] 
Sent: Thursday, July 31, 2008 2:33 PM
To: core-user@hadoop.apache.org
Subject: Re: File Descriptors not cleaned up


Also, could you respond to the earlier questions regarding your test program
(slightly corrected):

"What do you see in your test program and how is it different from what
you expect? In addition, why is that a problem?"

Raghu.

Arv Mistry wrote:
>  
> Raghu,
> 
> This is a real scenario for our application which nightly does 
> basically what the test stub is doing and what happens there now is 
> after several weeks the system stops processing requests with an 
> exception "Too many files open".
> 
> The default ulimit is 1024 files (I think this is per process) so once

> my process exceeds this it will throw an exception and I am forced to 
> restart.
> 
> Why does this not happen when I'm in a single-threaded mode?
> 
> Will you be able to provide a fix?
> 
> Cheers Arv
> 
> -----Original Message-----
> From: Raghu Angadi [mailto:rangadi@yahoo-inc.com]
> Sent: Thursday, July 31, 2008 1:10 PM
> To: core-user@hadoop.apache.org
> Subject: Re: File Descriptors not cleaned up
> 
> 
> I might have missed something earlier and might be asking some thing 
> you already answered. Hope its ok :
> 
> What do expect to see in you test program and how is it different from

> what you expect? In addition, why is that a problem?
> 
> Hadoop implementation *does leave* the 3 fds you mentioned. We could 
> get rid of it.. but we don't (yet). We could have another clean up 
> thread or scheduled some thing with JVM timer.
> 
> I can write a test that leaves hundreds or thousands of fds open. We 
> just thought that is case is not practical.
> 
> It is still not clear to me if you are saying your code is leaving 
> lots and lots of fds open or if you are just wondering about the 3 you

> mentioned earlier.
> 
> Please check out a few comments starting at
> https://issues.apache.org/jira/browse/HADOOP-2346?focusedCommentId=125
> 66
> 250#action_12566250
> for more background on design choices made.
> 
> Raghu.
> 
> Arv Mistry wrote:
>> I guess the attachment got stripped, so here it is inline ...
>>
>>  public class TestFsHadoop {
>>
>> 	public static Configuration conf       = null;
>> 	public static FileSystem fs = null;
>>
>> 		
>> 	/**
>> 	 * @param args
>> 	 */
>> 	public static void main(String[] args) {
>> 		
>>     	try {
>>     		    		
>>     		initHdfsReader();
>>     		
>>     		// Just have one handle to the FS. This only closed when
> the
>> application is shutdown.
>>     		fs = getFileSystem();
>>     					
>>     		// Spawn a new thread to do the work
>>     		TestThread t = new TestFsHadoop().new TestThread();
>>     		t.start();			
>>     		
>>     	} catch (Exception e) {
>>     		e.printStackTrace();
>>     	}
>> 	}
>>
>> 	static public FileSystem getFileSystem () {
>> 		
>> 		FileSystem fs = null;
>> 		
>> 		try {
>> 			
>> 			fs = FileSystem.get(conf);
>> 			
>> 			fs.setWorkingDirectory(new
>> Path("/opt/hadoop/data"));
>> 			
>> 		} catch (Exception e) {
>> 			e.printStackTrace();
>> 		}
>> 		
>> 		System.out.println ("Returning FS " + fs);
>> 		
>> 		return (fs);
>> 	}	
>> 	
>> 	
>> 	/**
>> 	 * 
>> 	 */
>> 	static private void initHdfsReader () {
>> 	    
>> 	    try {	 
>> 	    	
>> 	    	conf = new Configuration();
>> 	    	
>> 	    	String url =
>> "File:///opt/profilecluster/hadoop-0.17.0/conf/hadoop-default.xml";
>>
>> 	    	Path path = new Path (url);
>> 	    	
>> 	    	conf.addResource(path);
>> 	    	
>> 	    	url =
>> "File:///opt/profilecluster/hadoop-0.17.0/conf/hadoop-site.xml";
>>
>> 	    	path = new Path (url);
>> 	    	
>> 	    	conf.addResource(path);
>> 				    
>> 		
>> 	    } catch (Exception e) {
>> 	    	e.printStackTrace();
>> 	    }
>> 	}
>> 	
>> 	public class TestThread extends Thread {
>> 		
>> 		/**
>> 		 *
>> 		 */
>> 		public void run() {
>> 					
>> 			try {  		
>>
>> 				// hard-coded to open a file for this
>> test harness				
>> 				 Path p = new
>> Path("/opt/hadoop/data/clickstream/cs_1_20080729_1_of_5.dat.gz");
>> 				 
>> 				FSDataInputStream fis = fs.open(p);
>>
>> 				
>> 				byte[] in = new byte[5];
>> 				
>> 				int bytesRead = 0;
>>
>> 				while (((bytesRead = fis.read(in)) !=
>> -1) && (bytesRead > 0)) {
>> 					// ... Do Stuff ... 
>> 				}
>> 				
>> 				fis.close();
>> 				
>> 			} catch (Exception e) {			
>> 				e.printStackTrace();
>> 			}
>> 		}
>> 	}
>> 	
>> }
>>
>> -----Original Message-----
>> From: Arv Mistry [mailto:arv@kindsight.net]
>> Sent: Thursday, July 31, 2008 9:30 AM
>> To: core-user@hadoop.apache.org
>> Subject: RE: File Descriptors not cleaned up
>>
>>  
>> I've simplified the code into a simple test harness with just hadoop 
>> (see attached file)
>>
>> I found that I can only reproduce this problem when I am doing the
>> fs.open() in a different thread. Even though in that same thread I am

>> doing a close().
>>
>> Cheers Arv
>>
>> -----Original Message-----
>> From: Raghu Angadi [mailto:rangadi@yahoo-inc.com]
>> Sent: Wednesday, July 30, 2008 7:36 PM
>> To: core-user@hadoop.apache.org
>> Subject: Re: File Descriptors not cleaned up
>>
>> Arv Mistry wrote:
>>>  
>>> Thanks for responding Raghu,
>>>
>>> This code is run every hour, where I open a file ( a different file 
>>> each
>>> time) and write it across the network to another location. So if 
>>> everytime it adds an additional 3 fd's then after some time I'm 
>>> going
> 
>>> to run out of fd's
>> It should not add 3 fds every time. If you do see a practical case 
>> where you trace running out of fds to these three fds, please let us
> know.
>> Raghu.
>>

Re: File Descriptors not cleaned up

Posted by Raghu Angadi <ra...@yahoo-inc.com>.
Also, could you respond to the earlier questions regarding your test program 
(slightly corrected):

"What do you see in your test program and how is it different from what 
you expect? In addition, why is that a problem?"

Raghu.

Arv Mistry wrote:
>  
> Raghu,
> 
> This is a real scenario for our application which nightly does basically
> what the test stub is doing and what happens there now is after several
> weeks the system stops processing requests with an exception "Too many
> files open".
> 
> The default ulimit is 1024 files (I think this is per process) so once
> my process exceeds this it will throw an exception and I am forced to
> restart.
> 
> Why does this not happen when I'm in a single-threaded mode?
> 
> Will you be able to provide a fix?
> 
> Cheers Arv
> 
> -----Original Message-----
> From: Raghu Angadi [mailto:rangadi@yahoo-inc.com] 
> Sent: Thursday, July 31, 2008 1:10 PM
> To: core-user@hadoop.apache.org
> Subject: Re: File Descriptors not cleaned up
> 
> 
> I might have missed something earlier and might be asking some thing you
> already answered. Hope its ok :
> 
> What do expect to see in you test program and how is it different from
> what you expect? In addition, why is that a problem?
> 
> Hadoop implementation *does leave* the 3 fds you mentioned. We could get
> rid of it.. but we don't (yet). We could have another clean up thread or
> scheduled some thing with JVM timer.
> 
> I can write a test that leaves hundreds or thousands of fds open. We
> just thought that is case is not practical.
> 
> It is still not clear to me if you are saying your code is leaving lots
> and lots of fds open or if you are just wondering about the 3 you
> mentioned earlier.
> 
> Please check out a few comments starting at
> https://issues.apache.org/jira/browse/HADOOP-2346?focusedCommentId=12566
> 250#action_12566250
> for more background on design choices made.
> 
> Raghu.
> 
> Arv Mistry wrote:
>> I guess the attachment got stripped, so here it is inline ...
>>
>>  public class TestFsHadoop {
>>
>> 	public static Configuration conf       = null;
>> 	public static FileSystem fs = null;
>>
>> 		
>> 	/**
>> 	 * @param args
>> 	 */
>> 	public static void main(String[] args) {
>> 		
>>     	try {
>>     		    		
>>     		initHdfsReader();
>>     		
>>     		// Just have one handle to the FS. This only closed when
> the 
>> application is shutdown.
>>     		fs = getFileSystem();
>>     					
>>     		// Spawn a new thread to do the work
>>     		TestThread t = new TestFsHadoop().new TestThread();
>>     		t.start();			
>>     		
>>     	} catch (Exception e) {
>>     		e.printStackTrace();
>>     	}
>> 	}
>>
>> 	static public FileSystem getFileSystem () {
>> 		
>> 		FileSystem fs = null;
>> 		
>> 		try {
>> 			
>> 			fs = FileSystem.get(conf);
>> 			
>> 			fs.setWorkingDirectory(new
>> Path("/opt/hadoop/data"));
>> 			
>> 		} catch (Exception e) {
>> 			e.printStackTrace();
>> 		}
>> 		
>> 		System.out.println ("Returning FS " + fs);
>> 		
>> 		return (fs);
>> 	}	
>> 	
>> 	
>> 	/**
>> 	 * 
>> 	 */
>> 	static private void initHdfsReader () {
>> 	    
>> 	    try {	 
>> 	    	
>> 	    	conf = new Configuration();
>> 	    	
>> 	    	String url =
>> "File:///opt/profilecluster/hadoop-0.17.0/conf/hadoop-default.xml";
>>
>> 	    	Path path = new Path (url);
>> 	    	
>> 	    	conf.addResource(path);
>> 	    	
>> 	    	url =
>> "File:///opt/profilecluster/hadoop-0.17.0/conf/hadoop-site.xml";
>>
>> 	    	path = new Path (url);
>> 	    	
>> 	    	conf.addResource(path);
>> 				    
>> 		
>> 	    } catch (Exception e) {
>> 	    	e.printStackTrace();
>> 	    }
>> 	}
>> 	
>> 	public class TestThread extends Thread {
>> 		
>> 		/**
>> 		 *
>> 		 */
>> 		public void run() {
>> 					
>> 			try {  		
>>
>> 				// hard-coded to open a file for this
>> test harness				
>> 				 Path p = new
>> Path("/opt/hadoop/data/clickstream/cs_1_20080729_1_of_5.dat.gz");
>> 				 
>> 				FSDataInputStream fis = fs.open(p);
>>
>> 				
>> 				byte[] in = new byte[5];
>> 				
>> 				int bytesRead = 0;
>>
>> 				while (((bytesRead = fis.read(in)) !=
>> -1) && (bytesRead > 0)) {
>> 					// ... Do Stuff ... 
>> 				}
>> 				
>> 				fis.close();
>> 				
>> 			} catch (Exception e) {			
>> 				e.printStackTrace();
>> 			}
>> 		}
>> 	}
>> 	
>> }
>>
>> -----Original Message-----
>> From: Arv Mistry [mailto:arv@kindsight.net]
>> Sent: Thursday, July 31, 2008 9:30 AM
>> To: core-user@hadoop.apache.org
>> Subject: RE: File Descriptors not cleaned up
>>
>>  
>> I've simplified the code into a simple test harness with just hadoop 
>> (see attached file)
>>
>> I found that I can only reproduce this problem when I am doing the
>> fs.open() in a different thread. Even though in that same thread I am 
>> doing a close().
>>
>> Cheers Arv
>>
>> -----Original Message-----
>> From: Raghu Angadi [mailto:rangadi@yahoo-inc.com]
>> Sent: Wednesday, July 30, 2008 7:36 PM
>> To: core-user@hadoop.apache.org
>> Subject: Re: File Descriptors not cleaned up
>>
>> Arv Mistry wrote:
>>>  
>>> Thanks for responding Raghu,
>>>
>>> This code is run every hour, where I open a file ( a different file 
>>> each
>>> time) and write it across the network to another location. So if 
>>> everytime it adds an additional 3 fd's then after some time I'm going
> 
>>> to run out of fd's
>> It should not add 3 fds every time. If you do see a practical case 
>> where you trace running out of fds to these three fds, please let us
> know.
>> Raghu.
>>

RE: File Descriptors not cleaned up

Posted by Arv Mistry <ar...@kindsight.net>.
 
Raghu,

This is a real scenario for our application, which nightly does basically
what the test stub is doing; what happens there now is that after several
weeks the system stops processing requests with a "Too many
files open" exception.

The default ulimit is 1024 files (I think this is per process), so once
my process exceeds this it will throw an exception and I am forced to
restart.

Why does this not happen when I'm in single-threaded mode?

Will you be able to provide a fix?

Cheers Arv

-----Original Message-----
From: Raghu Angadi [mailto:rangadi@yahoo-inc.com] 
Sent: Thursday, July 31, 2008 1:10 PM
To: core-user@hadoop.apache.org
Subject: Re: File Descriptors not cleaned up


I might have missed something earlier and might be asking something you
already answered. Hope that's OK:

What do you expect to see in your test program, and how is it different from
what you expect? In addition, why is that a problem?

The Hadoop implementation *does leave* the 3 fds you mentioned. We could get
rid of them.. but we don't (yet). We could have another cleanup thread or
schedule something with a JVM timer.

I can write a test that leaves hundreds or thousands of fds open. We
just thought that case is not practical.

It is still not clear to me whether you are saying your code is leaving lots
and lots of fds open or whether you are just wondering about the 3 you
mentioned earlier.

Please check out a few comments starting at
https://issues.apache.org/jira/browse/HADOOP-2346?focusedCommentId=12566
250#action_12566250
for more background on design choices made.

Raghu.

Arv Mistry wrote:
> I guess the attachment got stripped, so here it is inline ...
> 
>  public class TestFsHadoop {
> 
> 	public static Configuration conf       = null;
> 	public static FileSystem fs = null;
> 
> 		
> 	/**
> 	 * @param args
> 	 */
> 	public static void main(String[] args) {
> 		
>     	try {
>     		    		
>     		initHdfsReader();
>     		
>     		// Just have one handle to the FS. This only closed when
the 
> application is shutdown.
>     		fs = getFileSystem();
>     					
>     		// Spawn a new thread to do the work
>     		TestThread t = new TestFsHadoop().new TestThread();
>     		t.start();			
>     		
>     	} catch (Exception e) {
>     		e.printStackTrace();
>     	}
> 	}
> 
> 	static public FileSystem getFileSystem () {
> 		
> 		FileSystem fs = null;
> 		
> 		try {
> 			
> 			fs = FileSystem.get(conf);
> 			
> 			fs.setWorkingDirectory(new
> Path("/opt/hadoop/data"));
> 			
> 		} catch (Exception e) {
> 			e.printStackTrace();
> 		}
> 		
> 		System.out.println ("Returning FS " + fs);
> 		
> 		return (fs);
> 	}	
> 	
> 	
> 	/**
> 	 * 
> 	 */
> 	static private void initHdfsReader () {
> 	    
> 	    try {	 
> 	    	
> 	    	conf = new Configuration();
> 	    	
> 	    	String url =
> "File:///opt/profilecluster/hadoop-0.17.0/conf/hadoop-default.xml";
> 
> 	    	Path path = new Path (url);
> 	    	
> 	    	conf.addResource(path);
> 	    	
> 	    	url =
> "File:///opt/profilecluster/hadoop-0.17.0/conf/hadoop-site.xml";
> 
> 	    	path = new Path (url);
> 	    	
> 	    	conf.addResource(path);
> 				    
> 		
> 	    } catch (Exception e) {
> 	    	e.printStackTrace();
> 	    }
> 	}
> 	
> 	public class TestThread extends Thread {
> 		
> 		/**
> 		 *
> 		 */
> 		public void run() {
> 					
> 			try {  		
> 
> 				// hard-coded to open a file for this
> test harness				
> 				 Path p = new
> Path("/opt/hadoop/data/clickstream/cs_1_20080729_1_of_5.dat.gz");
> 				 
> 				FSDataInputStream fis = fs.open(p);
> 
> 				
> 				byte[] in = new byte[5];
> 				
> 				int bytesRead = 0;
> 
> 				while (((bytesRead = fis.read(in)) !=
> -1) && (bytesRead > 0)) {
> 					// ... Do Stuff ... 
> 				}
> 				
> 				fis.close();
> 				
> 			} catch (Exception e) {			
> 				e.printStackTrace();
> 			}
> 		}
> 	}
> 	
> }
> 
> -----Original Message-----
> From: Arv Mistry [mailto:arv@kindsight.net]
> Sent: Thursday, July 31, 2008 9:30 AM
> To: core-user@hadoop.apache.org
> Subject: RE: File Descriptors not cleaned up
> 
>  
> I've simplified the code into a simple test harness with just hadoop 
> (see attached file)
> 
> I found that I can only reproduce this problem when I am doing the
> fs.open() in a different thread. Even though in that same thread I am 
> doing a close().
> 
> Cheers Arv
> 
> -----Original Message-----
> From: Raghu Angadi [mailto:rangadi@yahoo-inc.com]
> Sent: Wednesday, July 30, 2008 7:36 PM
> To: core-user@hadoop.apache.org
> Subject: Re: File Descriptors not cleaned up
> 
> Arv Mistry wrote:
>>  
>> Thanks for responding Raghu,
>>
>> This code is run every hour, where I open a file ( a different file 
>> each
>> time) and write it across the network to another location. So if 
>> everytime it adds an additional 3 fd's then after some time I'm going

>> to run out of fd's
> 
> It should not add 3 fds every time. If you do see a practical case 
> where you trace running out of fds to these three fds, please let us
know.
> 
> Raghu.
> 
>> Cheers Arv
>>
>>
>> -----Original Message-----
>> From: Raghu Angadi [mailto:rangadi@yahoo-inc.com]
>> Sent: Wednesday, July 30, 2008 4:33 PM
>> To: core-user@hadoop.apache.org
>> Subject: Re: File Descriptors not cleaned up
>>
>> Arv Mistry wrote:
>>>  
>>> I've been trying to track down an issue where after some time I get 
>>> "Too many files open " i.e.
>>> we're not cleaning up somewhere ...
>>>
>>> I'm using "lsof -p <pid>" to track the open files and I find it's 
>>> adding
>>> 3 file descriptors everytime I do a
>>> fs.open(<file>) where fs is FileSystem and <file> is a Path object 
>>> to
> 
>>> a gzipped file in hadoop. When I'm done I call
>>> Close() on the FSDataInputStream that the open returned. But those 3

>>> file descriptors never get cleaned up.
>>>
>>> The 3 fd's; 2 are 'pipe' and 1 'eventpoll' everytime.
>> Thats ok. Hadoop I/O leaves this set. What makes you think this set 
>> of
> 
>> 3 fds is causing "Too many file open"? I doubt it. Do you see many 
>> sets of these fds being left open?
>>
>>> Is there some other cleanup method I should be calling, other than 
>>> on
> 
>>> the InputStream after the open()?
>> This is no API to clean up these last set of fds currently.
>>
>> Raghu.
>>
>>> I'm using hadoop-0.17.0 and have also tried hadoop-0.17.1
>>>
>>> Cheers Arv
> 


Re: File Descriptors not cleaned up

Posted by Raghu Angadi <ra...@yahoo-inc.com>.
I might have missed something earlier and might be asking something you 
already answered. Hope that's OK:

What do you expect to see in your test program, and how is it different from 
what you expect? In addition, why is that a problem?

The Hadoop implementation *does leave* the 3 fds you mentioned. We could get 
rid of them.. but we don't (yet). We could have another cleanup thread or 
schedule something with a JVM timer.

I can write a test that leaves hundreds or thousands of fds open. We 
just thought that case is not practical.

It is still not clear to me whether you are saying your code is leaving lots 
and lots of fds open or whether you are just wondering about the 3 you 
mentioned earlier.

Please check out a few comments starting at 
https://issues.apache.org/jira/browse/HADOOP-2346?focusedCommentId=12566250#action_12566250
for more background on design choices made.

Raghu.
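
For background (not from the original mail): on Linux, a java.nio Selector is
backed by an epoll descriptor plus a wakeup pipe, which is the usual source of
one 'eventpoll' and two 'pipe' entries in lsof; Hadoop's socket I/O keeps such
a selector around, which matches the fixed set of 3 fds described above. A
minimal sketch that shows the same descriptors appearing and disappearing:

    import java.io.IOException;
    import java.nio.channels.Selector;

    public class SelectorFds {
        public static void main(String[] args) throws IOException, InterruptedException {
            Selector sel = Selector.open();  // run `lsof -p <pid>` now: expect 1 eventpoll + 2 pipe fds
            Thread.sleep(30000);             // time window to inspect the process with lsof
            sel.close();                     // the three descriptors are released
        }
    }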

Arv Mistry wrote:
> I guess the attachment got stripped, so here it is inline ...
> 
>  public class TestFsHadoop {
> 
> 	public static Configuration conf       = null;
> 	public static FileSystem fs = null;
> 
> 		
> 	/**
> 	 * @param args
> 	 */
> 	public static void main(String[] args) {
> 		
>     	try {
>     		    		
>     		initHdfsReader();
>     		
>     		// Just have one handle to the FS. This only closed when
> the application is shutdown.
>     		fs = getFileSystem();
>     					
>     		// Spawn a new thread to do the work
>     		TestThread t = new TestFsHadoop().new TestThread();
>     		t.start();			
>     		
>     	} catch (Exception e) {
>     		e.printStackTrace();
>     	}
> 	}
> 
> 	static public FileSystem getFileSystem () {
> 		
> 		FileSystem fs = null;
> 		
> 		try {
> 			
> 			fs = FileSystem.get(conf);
> 			
> 			fs.setWorkingDirectory(new
> Path("/opt/hadoop/data"));
> 			
> 		} catch (Exception e) {
> 			e.printStackTrace();
> 		}
> 		
> 		System.out.println ("Returning FS " + fs);
> 		
> 		return (fs);
> 	}	
> 	
> 	
> 	/**
> 	 * 
> 	 */
> 	static private void initHdfsReader () {
> 	    
> 	    try {	 
> 	    	
> 	    	conf = new Configuration();
> 	    	
> 	    	String url =
> "File:///opt/profilecluster/hadoop-0.17.0/conf/hadoop-default.xml";
> 
> 	    	Path path = new Path (url);
> 	    	
> 	    	conf.addResource(path);
> 	    	
> 	    	url =
> "File:///opt/profilecluster/hadoop-0.17.0/conf/hadoop-site.xml";
> 
> 	    	path = new Path (url);
> 	    	
> 	    	conf.addResource(path);
> 				    
> 		
> 	    } catch (Exception e) {
> 	    	e.printStackTrace();
> 	    }
> 	}
> 	
> 	public class TestThread extends Thread {
> 		
> 		/**
> 		 *
> 		 */
> 		public void run() {
> 					
> 			try {  		
> 
> 				// hard-coded to open a file for this
> test harness				
> 				 Path p = new
> Path("/opt/hadoop/data/clickstream/cs_1_20080729_1_of_5.dat.gz");
> 				 
> 				FSDataInputStream fis = fs.open(p);
> 
> 				
> 				byte[] in = new byte[5];
> 				
> 				int bytesRead = 0;
> 
> 				while (((bytesRead = fis.read(in)) !=
> -1) && (bytesRead > 0)) {
> 					// ... Do Stuff ... 
> 				}
> 				
> 				fis.close();  
> 				
> 			} catch (Exception e) {			
> 				e.printStackTrace();
> 			}
> 		}
> 	}
> 	
> }
> 
> -----Original Message-----
> From: Arv Mistry [mailto:arv@kindsight.net] 
> Sent: Thursday, July 31, 2008 9:30 AM
> To: core-user@hadoop.apache.org
> Subject: RE: File Descriptors not cleaned up
> 
>  
> I've simplified the code into a simple test harness with just hadoop
> (see attached file)
> 
> I found that I can only reproduce this problem when I am doing the
> fs.open() in a different thread. Even though in that same thread I am
> doing a close().
> 
> Cheers Arv
> 
> -----Original Message-----
> From: Raghu Angadi [mailto:rangadi@yahoo-inc.com]
> Sent: Wednesday, July 30, 2008 7:36 PM
> To: core-user@hadoop.apache.org
> Subject: Re: File Descriptors not cleaned up
> 
> Arv Mistry wrote:
>>  
>> Thanks for responding Raghu,
>>
>> This code is run every hour, where I open a file ( a different file 
>> each
>> time) and write it across the network to another location. So if 
>> everytime it adds an additional 3 fd's then after some time I'm going 
>> to run out of fd's
> 
> It should not add 3 fds every time. If you do see a practical case where
> you trace running out of fds to these three fds, please let us know.
> 
> Raghu.
> 
>> Cheers Arv
>>
>>
>> -----Original Message-----
>> From: Raghu Angadi [mailto:rangadi@yahoo-inc.com]
>> Sent: Wednesday, July 30, 2008 4:33 PM
>> To: core-user@hadoop.apache.org
>> Subject: Re: File Descriptors not cleaned up
>>
>> Arv Mistry wrote:
>>>  
>>> I've been trying to track down an issue where after some time I get 
>>> "Too many files open " i.e.
>>> we're not cleaning up somewhere ...
>>>
>>> I'm using "lsof -p <pid>" to track the open files and I find it's 
>>> adding
>>> 3 file descriptors everytime I do a
>>> fs.open(<file>) where fs is FileSystem and <file> is a Path object to
> 
>>> a gzipped file in hadoop. When I'm done I call
>>> Close() on the FSDataInputStream that the open returned. But those 3 
>>> file descriptors never get cleaned up.
>>>
>>> The 3 fd's; 2 are 'pipe' and 1 'eventpoll' everytime.
>> Thats ok. Hadoop I/O leaves this set. What makes you think this set of
> 
>> 3 fds is causing "Too many file open"? I doubt it. Do you see many 
>> sets of these fds being left open?
>>
>>> Is there some other cleanup method I should be calling, other than on
> 
>>> the InputStream after the open()?
>> This is no API to clean up these last set of fds currently.
>>
>> Raghu.
>>
>>> I'm using hadoop-0.17.0 and have also tried hadoop-0.17.1
>>>
>>> Cheers Arv
> 


RE: File Descriptors not cleaned up

Posted by Arv Mistry <ar...@kindsight.net>.
I guess the attachment got stripped, so here it is inline ...

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FSDataInputStream;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;

 public class TestFsHadoop {

	public static Configuration conf = null;
	public static FileSystem fs = null;

	/**
	 * @param args
	 */
	public static void main(String[] args) {

		try {
			// Load the Hadoop configuration from the local config files.
			initHdfsReader();

			// Keep a single handle to the FileSystem. It is only closed
			// when the application is shut down.
			fs = getFileSystem();

			// Spawn a new thread to do the work.
			TestThread t = new TestFsHadoop().new TestThread();
			t.start();

		} catch (Exception e) {
			e.printStackTrace();
		}
	}

	static public FileSystem getFileSystem() {

		FileSystem fs = null;

		try {
			fs = FileSystem.get(conf);
			fs.setWorkingDirectory(new Path("/opt/hadoop/data"));
		} catch (Exception e) {
			e.printStackTrace();
		}

		System.out.println("Returning FS " + fs);

		return fs;
	}

	/**
	 * Builds the Configuration from the cluster's config files.
	 */
	static private void initHdfsReader() {

		try {
			conf = new Configuration();

			String url = "file:///opt/profilecluster/hadoop-0.17.0/conf/hadoop-default.xml";
			Path path = new Path(url);
			conf.addResource(path);

			url = "file:///opt/profilecluster/hadoop-0.17.0/conf/hadoop-site.xml";
			path = new Path(url);
			conf.addResource(path);

		} catch (Exception e) {
			e.printStackTrace();
		}
	}

	public class TestThread extends Thread {

		/**
		 * Opens the test file, reads it to the end, and closes the stream.
		 */
		public void run() {

			try {
				// Hard-coded gzipped file in HDFS for this test harness.
				Path p = new Path("/opt/hadoop/data/clickstream/cs_1_20080729_1_of_5.dat.gz");

				FSDataInputStream fis = fs.open(p);

				byte[] in = new byte[5];
				int bytesRead = 0;

				// Read until end of file.
				while ((bytesRead = fis.read(in)) > 0) {
					// ... Do Stuff ...
				}

				fis.close();

			} catch (Exception e) {
				e.printStackTrace();
			}
		}
	}
 }
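
One quick way to check whether descriptors really accumulate across these
open()/close() cycles is to count the entries under /proc/self/fd before and
after each cycle. The helper below is a minimal sketch for that purpose only;
the FdCounter class is not part of the posted harness, and it assumes a Linux
/proc filesystem.

 import java.io.File;

 /**
  * Illustrative helper (not from the original post): counts this process's
  * open file descriptors by listing /proc/self/fd. Linux only. Note that
  * listing the directory itself briefly uses one extra fd.
  */
 public class FdCounter {

	public static int openFdCount() {
		String[] entries = new File("/proc/self/fd").list();
		return (entries == null) ? -1 : entries.length;
	}

	public static void main(String[] args) {
		System.out.println("Open fds: " + openFdCount());
	}
 }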

-----Original Message-----
From: Arv Mistry [mailto:arv@kindsight.net] 
Sent: Thursday, July 31, 2008 9:30 AM
To: core-user@hadoop.apache.org
Subject: RE: File Descriptors not cleaned up

 
I've simplified the code into a simple test harness with just hadoop
(see attached file)

I found that I can only reproduce this problem when I am doing the
fs.open() in a different thread. Even though in that same thread I am
doing a close().

Cheers Arv

-----Original Message-----
From: Raghu Angadi [mailto:rangadi@yahoo-inc.com]
Sent: Wednesday, July 30, 2008 7:36 PM
To: core-user@hadoop.apache.org
Subject: Re: File Descriptors not cleaned up

Arv Mistry wrote:
>  
> Thanks for responding Raghu,
> 
> This code is run every hour, where I open a file ( a different file 
> each
> time) and write it across the network to another location. So if 
> everytime it adds an additional 3 fd's then after some time I'm going 
> to run out of fd's

It should not add 3 fds every time. If you do see a practical case where
you trace running out of fds to these three fds, please let us know.

Raghu.

> Cheers Arv
> 
> 
> -----Original Message-----
> From: Raghu Angadi [mailto:rangadi@yahoo-inc.com]
> Sent: Wednesday, July 30, 2008 4:33 PM
> To: core-user@hadoop.apache.org
> Subject: Re: File Descriptors not cleaned up
> 
> Arv Mistry wrote:
>>  
>> I've been trying to track down an issue where after some time I get 
>> "Too many files open " i.e.
>> we're not cleaning up somewhere ...
>>
>> I'm using "lsof -p <pid>" to track the open files and I find it's 
>> adding
>> 3 file descriptors everytime I do a
>> fs.open(<file>) where fs is FileSystem and <file> is a Path object to

>> a gzipped file in hadoop. When I'm done I call
>> Close() on the FSDataInputStream that the open returned. But those 3 
>> file descriptors never get cleaned up.
>>
>> The 3 fd's; 2 are 'pipe' and 1 'eventpoll' everytime.
> 
> Thats ok. Hadoop I/O leaves this set. What makes you think this set of

> 3 fds is causing "Too many file open"? I doubt it. Do you see many 
> sets of these fds being left open?
> 
>> Is there some other cleanup method I should be calling, other than on

>> the InputStream after the open()?
> 
> This is no API to clean up these last set of fds currently.
> 
> Raghu.
> 
>> I'm using hadoop-0.17.0 and have also tried hadoop-0.17.1
>>
>> Cheers Arv
> 


RE: File Descriptors not cleaned up

Posted by Arv Mistry <ar...@kindsight.net>.
 
I've simplified the code into a simple test harness with just Hadoop
(see attached file).

I found that I can only reproduce this problem when I am doing the
fs.open() in a different thread, even though in that same thread I am
doing a close().

Cheers Arv
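
A related defensive tweak: wrapping the read loop in try/finally guarantees
the stream is closed even if read() throws. The fragment below is only a
sketch of that pattern; the SafeRead class name and buffer size are
illustrative, not part of the original harness.

 import java.io.IOException;

 import org.apache.hadoop.fs.FSDataInputStream;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;

 /**
  * Illustrative only: reads a file to EOF and always closes the stream,
  * even if read() throws.
  */
 public class SafeRead {

	public static void readAndClose(FileSystem fs, Path p) throws IOException {
		FSDataInputStream fis = fs.open(p);
		try {
			byte[] buf = new byte[4096];
			while (fis.read(buf) > 0) {
				// ... process the bytes just read ...
			}
		} finally {
			fis.close();
		}
	}
 }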

-----Original Message-----
From: Raghu Angadi [mailto:rangadi@yahoo-inc.com] 
Sent: Wednesday, July 30, 2008 7:36 PM
To: core-user@hadoop.apache.org
Subject: Re: File Descriptors not cleaned up

Arv Mistry wrote:
>  
> Thanks for responding Raghu,
> 
> This code is run every hour, where I open a file ( a different file 
> each
> time) and write it across the network to another location. So if 
> everytime it adds an additional 3 fd's then after some time I'm going 
> to run out of fd's

It should not add 3 fds every time. If you do see a practical case where
you trace running out of fds to these three fds, please let us know.

Raghu.

> Cheers Arv
> 
> 
> -----Original Message-----
> From: Raghu Angadi [mailto:rangadi@yahoo-inc.com]
> Sent: Wednesday, July 30, 2008 4:33 PM
> To: core-user@hadoop.apache.org
> Subject: Re: File Descriptors not cleaned up
> 
> Arv Mistry wrote:
>>  
>> I've been trying to track down an issue where after some time I get 
>> "Too many files open " i.e.
>> we're not cleaning up somewhere ...
>>
>> I'm using "lsof -p <pid>" to track the open files and I find it's 
>> adding
>> 3 file descriptors everytime I do a
>> fs.open(<file>) where fs is FileSystem and <file> is a Path object to

>> a gzipped file in hadoop. When I'm done I call
>> Close() on the FSDataInputStream that the open returned. But those 3 
>> file descriptors never get cleaned up.
>>
>> The 3 fd's; 2 are 'pipe' and 1 'eventpoll' everytime.
> 
> Thats ok. Hadoop I/O leaves this set. What makes you think this set of

> 3 fds is causing "Too many file open"? I doubt it. Do you see many 
> sets of these fds being left open?
> 
>> Is there some other cleanup method I should be calling, other than on

>> the InputStream after the open()?
> 
> This is no API to clean up these last set of fds currently.
> 
> Raghu.
> 
>> I'm using hadoop-0.17.0 and have also tried hadoop-0.17.1
>>
>> Cheers Arv
> 


Re: File Descriptors not cleaned up

Posted by Raghu Angadi <ra...@yahoo-inc.com>.
Arv Mistry wrote:
>  
> Thanks for responding Raghu,
> 
> This code is run every hour, where I open a file ( a different file each
> time) and write it across the network to another location. So if
> everytime it adds an additional 3 fd's then after some time I'm going to
> run out of fd's

It should not add 3 fds every time. If you do see a practical case where 
you trace running out of fds to these three fds, please let us know.

Raghu.

> Cheers Arv
> 
> 
> -----Original Message-----
> From: Raghu Angadi [mailto:rangadi@yahoo-inc.com] 
> Sent: Wednesday, July 30, 2008 4:33 PM
> To: core-user@hadoop.apache.org
> Subject: Re: File Descriptors not cleaned up
> 
> Arv Mistry wrote:
>>  
>> I've been trying to track down an issue where after some time I get 
>> "Too many files open " i.e.
>> we're not cleaning up somewhere ...
>>
>> I'm using "lsof -p <pid>" to track the open files and I find it's 
>> adding
>> 3 file descriptors everytime I do a
>> fs.open(<file>) where fs is FileSystem and <file> is a Path object to 
>> a gzipped file in hadoop. When I'm done I call
>> Close() on the FSDataInputStream that the open returned. But those 3 
>> file descriptors never get cleaned up.
>>
>> The 3 fd's; 2 are 'pipe' and 1 'eventpoll' everytime.
> 
> Thats ok. Hadoop I/O leaves this set. What makes you think this set of 3
> fds is causing "Too many file open"? I doubt it. Do you see many sets of
> these fds being left open?
> 
>> Is there some other cleanup method I should be calling, other than on 
>> the InputStream after the open()?
> 
> This is no API to clean up these last set of fds currently.
> 
> Raghu.
> 
>> I'm using hadoop-0.17.0 and have also tried hadoop-0.17.1
>>
>> Cheers Arv
> 


RE: File Descriptors not cleaned up

Posted by Arv Mistry <ar...@kindsight.net>.
 
Thanks for responding, Raghu.

This code is run every hour: I open a file (a different file each time)
and write it across the network to another location. So if it adds an
additional 3 fd's every time, then after some time I'm going to run out
of fd's.

Cheers Arv
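
For reference, an hourly job of that shape can be structured so that one
FileSystem handle is obtained once and reused, while the per-run input and
output streams are always closed in finally blocks. The sketch below is
illustrative only, not the actual production code: the paths are hypothetical
and it copies to a local file rather than across the network for simplicity.

 import java.io.FileOutputStream;
 import java.io.OutputStream;
 import java.util.concurrent.Executors;
 import java.util.concurrent.ScheduledExecutorService;
 import java.util.concurrent.TimeUnit;

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FSDataInputStream;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;

 /**
  * Illustrative sketch only, not the actual production job: a single
  * FileSystem handle is obtained once and reused, and the per-run streams
  * are always closed in finally blocks. Paths are hypothetical.
  */
 public class HourlyCopyJob {

	public static void main(String[] args) throws Exception {
		final Configuration conf = new Configuration();
		final FileSystem fs = FileSystem.get(conf); // reused across runs

		ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
		scheduler.scheduleAtFixedRate(new Runnable() {
			public void run() {
				Path src = new Path("/opt/hadoop/data/clickstream/current.dat.gz");
				String dst = "/tmp/clickstream-copy.dat.gz";
				try {
					FSDataInputStream in = fs.open(src);
					try {
						OutputStream out = new FileOutputStream(dst);
						try {
							byte[] buf = new byte[64 * 1024];
							int n;
							while ((n = in.read(buf)) > 0) {
								out.write(buf, 0, n);
							}
						} finally {
							out.close();
						}
					} finally {
						in.close();
					}
				} catch (Exception e) {
					e.printStackTrace();
				}
			}
		}, 0, 1, TimeUnit.HOURS);
	}
 }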


-----Original Message-----
From: Raghu Angadi [mailto:rangadi@yahoo-inc.com] 
Sent: Wednesday, July 30, 2008 4:33 PM
To: core-user@hadoop.apache.org
Subject: Re: File Descriptors not cleaned up

Arv Mistry wrote:
>  
> I've been trying to track down an issue where after some time I get 
> "Too many files open " i.e.
> we're not cleaning up somewhere ...
> 
> I'm using "lsof -p <pid>" to track the open files and I find it's 
> adding
> 3 file descriptors everytime I do a
> fs.open(<file>) where fs is FileSystem and <file> is a Path object to 
> a gzipped file in hadoop. When I'm done I call
> Close() on the FSDataInputStream that the open returned. But those 3 
> file descriptors never get cleaned up.
> 
> The 3 fd's; 2 are 'pipe' and 1 'eventpoll' everytime.

Thats ok. Hadoop I/O leaves this set. What makes you think this set of 3
fds is causing "Too many file open"? I doubt it. Do you see many sets of
these fds being left open?

> Is there some other cleanup method I should be calling, other than on 
> the InputStream after the open()?

This is no API to clean up these last set of fds currently.

Raghu.

> I'm using hadoop-0.17.0 and have also tried hadoop-0.17.1
> 
> Cheers Arv


Re: File Descriptors not cleaned up

Posted by Raghu Angadi <ra...@yahoo-inc.com>.
Arv Mistry wrote:
>  
> I've been trying to track down an issue where after some time I get "Too
> many files open " i.e.
> we're not cleaning up somewhere ...
> 
> I'm using "lsof -p <pid>" to track the open files and I find it's adding
> 3 file descriptors everytime I do a
> fs.open(<file>) where fs is FileSystem and <file> is a Path object to a
> gzipped file in hadoop. When I'm done I call
> Close() on the FSDataInputStream that the open returned. But those 3
> file descriptors never get cleaned up.
> 
> The 3 fd's; 2 are 'pipe' and 1 'eventpoll' everytime.

That's OK. Hadoop I/O leaves this set open. What makes you think this set of 3
fds is causing "Too many files open"? I doubt it. Do you see many sets of
these fds being left open?

> Is there some other cleanup method I should be calling, other than on
> the InputStream after the open()?

There is no API to clean up this last set of fds currently.

Raghu.
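
To answer that question quantitatively, the symlink targets under
/proc/<pid>/fd can be tallied by type, which makes repeated pipe/eventpoll
sets easy to spot. The sketch below is illustrative only (not something the
thread's participants posted); it needs Java 7+ and a Linux /proc filesystem.

 import java.io.File;
 import java.nio.file.Files;
 import java.util.Map;
 import java.util.TreeMap;

 /**
  * Illustrative only (Java 7+, Linux): tallies a process's open fds by the
  * target of each /proc/<pid>/fd symlink, e.g. "pipe:", "socket:",
  * "anon_inode:[eventpoll]", or a regular file path.
  */
 public class FdHistogram {

	public static void main(String[] args) throws Exception {
		String pid = (args.length > 0) ? args[0] : "self";
		Map<String, Integer> counts = new TreeMap<String, Integer>();

		File[] fds = new File("/proc/" + pid + "/fd").listFiles();
		if (fds == null) {
			System.err.println("Cannot read /proc/" + pid + "/fd");
			return;
		}

		for (File fd : fds) {
			String target;
			try {
				target = Files.readSymbolicLink(fd.toPath()).toString();
			} catch (Exception e) {
				continue; // fd closed between listing and readlink
			}
			// Collapse per-instance ids so "pipe:[123]" and "pipe:[456]" count together.
			String key = target.startsWith("pipe:") ? "pipe:"
					: target.startsWith("socket:") ? "socket:"
					: target;
			Integer old = counts.get(key);
			counts.put(key, (old == null) ? 1 : old + 1);
		}

		System.out.println(counts);
	}
 }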

> I'm using hadoop-0.17.0 and have also tried hadoop-0.17.1
> 
> Cheers Arv