Posted to common-user@hadoop.apache.org by Jason Venner <ja...@attributor.com> on 2008/11/10 16:09:00 UTC
Re: File Descriptors not cleaned up
We have just realized that one cause of the '/no live node contains block/'
error from /DFSClient/ is that the /DFSClient/ was unable to open a
connection because no file descriptors were available.
FsShell is particularly bad about consuming descriptors and leaving it to
the Garbage Collector to reclaim them from the containing objects.
We will submit a patch in a few days.
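[Editor's note: the general fix for this class of leak is to close streams
eagerly in a finally block rather than leaving them for the garbage
collector. A minimal sketch of the pattern, using java.io.FileInputStream
as a stand-in (Hadoop's FSDataInputStream has the same close() contract);
the class and method names here are illustrative, not from the actual patch:]

```java
import java.io.FileInputStream;
import java.io.IOException;

public class EagerClose {
    // Read the first byte of a file, releasing the descriptor
    // deterministically instead of whenever the GC gets around to it.
    public static int firstByte(String path) throws IOException {
        FileInputStream in = new FileInputStream(path);
        try {
            return in.read();
        } finally {
            in.close(); // the fd is freed here, on every exit path
        }
    }
}
```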
Raghu Angadi wrote:
> Arv Mistry wrote:
>>
>> Raghu,
>>
>> In the test program I see 3 fd's used when the fs.open() is called: two
>> of these are pipes and one is an eventpoll descriptor.
>> These 3 are never cleaned up and stay around. I track this by running
>> the program under a debugger, setting a break point, and using
>> lsof -p <pid> to see the fd's. I diff the lsof output from before the
>> open against the output after the open.
>
> It is important to know _exactly_ where the "before" and "after" break
> points are in your example to answer accurately. In your example, I don't
> see why the extra thread matters. Maybe if you give me a runnable (or
> close to runnable) example, I will know.
>
> But that does *not* mean there is an fd leak.
>
> For example, extend your example like this: after the first thread
> exits, repeat the same thing again. Do you see 6 more extra fds? You
> wouldn't, or rather you shouldn't.
>
> If you want to explore further: now sleep for 15 seconds in the main
> thread after the second thread exits. Then invoke TestThread.run() in
> the main thread (instead of using a separate thread). Check lsof after
> run() returns. What do you see?
>
> If you do these experiments and still think there is a leak, please
> file a Jira.. file a Jira even if you don't do the experiments :).
>
> IMHO, I still don't see any suspicious behavior.. maybe running 'lsof'
> when your app sees the 'too many open files' exception will clear this
> up for us.
>
> Hope this helps.
> Raghu.
>
>> What I don't understand is why this doesn't get cleaned up when done in
>> a separate thread but does when it's done in a single thread.
>>
>> This is a problem in the real system because I run out of fd's and am no
>> longer able to open any more files after a few weeks.
>> This forces me to do a system restart to flush things out.
>>
>> Cheers Arv
>>
>
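[Editor's note: the before/after lsof diff Arv describes can also be done
from inside the JVM on Linux by listing /proc/self/fd. A hedged sketch of
that idea (Linux-only, and the exact counts can shift if the JVM opens
files of its own between snapshots); the class name is illustrative:]

```java
import java.io.File;
import java.io.FileInputStream;

public class FdSnapshot {
    // Number of fds currently open in this process (Linux-only;
    // returns -1 if /proc is unavailable).
    public static int openFds() {
        String[] fds = new File("/proc/self/fd").list();
        return fds == null ? -1 : fds.length;
    }

    public static void main(String[] args) throws Exception {
        int before = openFds();
        FileInputStream in = new FileInputStream("/proc/self/stat");
        int during = openFds();  // should be before + 1
        in.close();
        int after = openFds();   // should be back to before
        System.out.println(before + " -> " + during + " -> " + after);
    }
}
```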
Re: File Descriptors not cleaned up
Posted by Raghu Angadi <ra...@yahoo-inc.com>.
Jason Venner wrote:
> We have just realized that one cause of the '/no live node contains block/'
> error from /DFSClient/ is that the /DFSClient/ was unable to open a
> connection because no file descriptors were available.
>
> FsShell is particularly bad about consuming descriptors and leaving it to
> the Garbage Collector to reclaim them from the containing objects.
>
> We will submit a patch in a few days.
please do.
We have learned more since I last replied on July 31st. Hadoop itself does
not have any finalizers that depend on GC to close fds. If GC is affecting
the number of fds, you are likely a victim of HADOOP-4346.
thanks,
Raghu.