Posted to common-user@hadoop.apache.org by Jason Venner <ja...@attributor.com> on 2008/11/10 16:09:00 UTC

Re: File Descriptors not cleaned up

We have just realized that one cause of the 'no live node contains block' 
error from DFSClient is that the DFSClient was unable to open a connection 
due to insufficient available file descriptors.

FsShell is particularly bad about consuming descriptors: it leaves the 
objects that hold them for the garbage collector to reclaim, so the 
descriptors stay open until a GC happens to run.

We will submit a patch in a few days.
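
For illustration, here is a minimal, hypothetical sketch of the pattern being 
described (invented class and method names, not FsShell's actual code and not 
the promised patch): a stream that is opened and never closed keeps its 
descriptors until the garbage collector eventually finalizes it, while an 
explicit close() releases them right away.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FdLeakSketch {

  // Leaky pattern: the stream is never closed, so the socket and any
  // pipe/eventpoll descriptors behind it stay open until the object
  // happens to be finalized by the garbage collector.
  static void leaky(FileSystem fs, Path p) throws Exception {
    FSDataInputStream in = fs.open(p);
    in.read();
    // ... and the stream is never closed.
  }

  // Correct pattern: close in a finally block so the descriptors are
  // released as soon as we are done, independent of GC.
  static void safe(FileSystem fs, Path p) throws Exception {
    FSDataInputStream in = fs.open(p);
    try {
      in.read();
    } finally {
      in.close();
    }
  }

  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    leaky(fs, new Path(args[0]));   // descriptors linger
    safe(fs, new Path(args[0]));    // descriptors released promptly
  }
}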

Raghu Angadi wrote:
> Arv Mistry wrote:
>>  
>> Raghu,
>>
>> In the test program I see 3 fds used when fs.open() is called: two of
>> them are pipes and one is an eventpoll. These 3 are never cleaned up
>> and stay around. I track this by running the program in debug mode,
>> setting a breakpoint, and using lsof -p <pid> to see the fds; I diff
>> the output from before the open against the output from after it.
>
> It is important to know _exactly_ where the "before" and "after" 
> breakpoints are in your example to answer accurately. In your example, 
> I don't see why the extra thread matters. Maybe if you give me a 
> runnable, or close to runnable, example I will know.
>
> But that does *not* mean there is an fd leak.
>
> For example, extend your example like this: after the first thread 
> exits, repeat the same thing again. Do you see 6 more extra fds? You 
> wouldn't, or rather you shouldn't.
>
> If you want to explore further: now sleep for 15 seconds in the main 
> thread after the second thread exits. Then invoke TestThread.run() in 
> the main thread (instead of using a separate thread). Check lsof after 
> run() returns. What do you see?
>
> If you do these experiments and still think there is a leak, please 
> file a Jira... in fact, file a Jira even if you don't do the 
> experiments :).
>
> IMHO, I still don't see any suspicious behavior. Maybe running 'lsof' 
> when your app sees the 'too many open files' exception will clear this 
> up for us.
>
> Hope this helps.
> Raghu.
>
>> What I don't understand is why this doesn't get cleaned up when done 
>> in a separate thread but does when it's done in a single thread.
>>
>> This is a problem in the real system because I run out of fds and am 
>> no longer able to open any more files after a few weeks. This forces 
>> me to do a system restart to flush things out.
>>
>> Cheers Arv
>>
>
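
For illustration, the fd-counting experiment discussed in the quoted exchange 
above could be reproduced with something along these lines. This is a 
hypothetical sketch, not the original test program; it counts descriptors via 
the Linux /proc filesystem, which is roughly what diffing 'lsof -p <pid>' 
output shows.

import java.io.File;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FdCountTest {

  // Count this process's open descriptors by listing /proc/self/fd
  // (Linux only); comparable to counting lines of "lsof -p <pid>".
  static int openFds() {
    return new File("/proc/self/fd").list().length;
  }

  static void openAndRead(FileSystem fs, Path p) throws Exception {
    FSDataInputStream in = fs.open(p);
    try {
      in.read();
    } finally {
      in.close();
    }
  }

  public static void main(String[] args) throws Exception {
    final FileSystem fs = FileSystem.get(new Configuration());
    final Path p = new Path(args[0]);

    System.out.println("fds before: " + openFds());

    // Do the open/read in a separate thread, roughly as in the test
    // program being discussed.
    Thread t = new Thread() {
      public void run() {
        try {
          openAndRead(fs, p);
        } catch (Exception e) {
          e.printStackTrace();
        }
      }
    };
    t.start();
    t.join();
    System.out.println("fds after thread run: " + openFds());

    // Raghu's suggested follow-up: wait a while, then do the same work
    // in the main thread and compare the counts again.
    Thread.sleep(15000);
    openAndRead(fs, p);
    System.out.println("fds after main-thread run: " + openFds());
  }
}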

Re: File Descriptors not cleaned up

Posted by Raghu Angadi <ra...@yahoo-inc.com>.
Jason Venner wrote:
> We have just realized that one cause of the 'no live node contains block' 
> error from DFSClient is that the DFSClient was unable to open a connection 
> due to insufficient available file descriptors.
> 
> FsShell is particularly bad about consuming descriptors: it leaves the 
> objects that hold them for the garbage collector to reclaim, so the 
> descriptors stay open until a GC happens to run.
> 
> We will submit a patch in a few days.

please do.

We know more now than when I last replied on July 31st. Hadoop itself 
does not have any finalizers that depend on GC to close fds. If GC is 
affecting the number of open fds, you are likely a victim of HADOOP-4346.

thanks,
Raghu.
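
For illustration, a quick hypothetical check of the "fds tied to GC" point 
above: if the open-descriptor count drops noticeably only after a forced 
garbage collection, then something is indeed leaving descriptors for GC or 
finalization to clean up, which is the symptom described for HADOOP-4346. 
This sketch reuses the /proc/self/fd counting trick and is Linux only; note 
that System.gc() is only a hint to the JVM.

import java.io.File;

public class GcFdCheck {

  // Count this process's open descriptors (Linux only).
  static int openFds() {
    return new File("/proc/self/fd").list().length;
  }

  public static void main(String[] args) throws Exception {
    // ... run the HDFS workload suspected of leaking descriptors here ...

    System.out.println("fds before GC: " + openFds());

    // Request a collection and finalization pass; both are hints, but
    // in practice they are usually enough to reclaim unreachable objects.
    System.gc();
    System.runFinalization();
    Thread.sleep(1000);

    System.out.println("fds after GC:  " + openFds());
  }
}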