Posted to dev@kafka.apache.org by "Henry Cai (JIRA)" <ji...@apache.org> on 2016/06/26 07:08:33 UTC

[jira] [Commented] (KAFKA-3904) File descriptor leaking (Too many open files) for long running stream process

    [ https://issues.apache.org/jira/browse/KAFKA-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15350018#comment-15350018 ] 

Henry Cai commented on KAFKA-3904:
----------------------------------

I also have a fix, which caches one FileChannel per lock file instead of opening a new one on every call:

-        FileChannel channel = new RandomAccessFile(lockFile, "rw").getChannel();
+        FileChannel channel = null;
+        synchronized (channels) {
+            channel = channels.get(lockFile);
+            if (channel == null) {
+                channel = new RandomAccessFile(lockFile, "rw").getChannel();
+                channels.put(lockFile, channel);
+                log.info("Creating new channel: {} for file: {}", channel, loc
+            }
+        }
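
The idea behind the patch can be sketched in a self-contained form. This is only a minimal sketch, not the actual Kafka Streams code: the class name LockChannelCache, the methods channelFor and closeAll, and the isOpen check are assumptions added for illustration. It shows the "reuse" option (one cached FileChannel per lock file) together with an explicit way to release the descriptors.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not a class from the Kafka code base.
final class LockChannelCache {
    // One cached channel per lock file, guarded by the map's monitor.
    private static final Map<File, FileChannel> channels = new HashMap<>();

    // Returns the cached channel for lockFile, opening one only if needed.
    static FileChannel channelFor(File lockFile) throws IOException {
        synchronized (channels) {
            FileChannel channel = channels.get(lockFile);
            // Assumption beyond the patch: reopen if the cached channel was closed.
            if (channel == null || !channel.isOpen()) {
                channel = new RandomAccessFile(lockFile, "rw").getChannel();
                channels.put(lockFile, channel);
            }
            return channel;
        }
    }

    // Closes every cached channel, which also releases its file descriptor.
    static void closeAll() throws IOException {
        synchronized (channels) {
            for (FileChannel channel : channels.values()) {
                channel.close();
            }
            channels.clear();
        }
    }

    public static void main(String[] args) throws IOException {
        File lockFile = File.createTempFile("state", ".lock");
        lockFile.deleteOnExit();
        FileChannel first = channelFor(lockFile);
        FileChannel second = channelFor(lockFile);
        System.out.println("same channel reused: " + (first == second));
        closeAll();
    }
}
```

Note that the map is keyed by File, whose equality is based on the pathname, so two different spellings of the same path would still open two channels unless the path is normalized first.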
 

> File descriptor leaking (Too many open files) for long running stream process
> -----------------------------------------------------------------------------
>
>                 Key: KAFKA-3904
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3904
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>            Reporter: Henry Cai
>            Assignee: Henry Cai
>              Labels: api, newbie
>
> I noticed that when my application ran for a long time (> 1 day), I would get a 'Too many open files' error.
> I used 'lsof' to list all the file descriptors used by the process. There were over 32K, and most of them belong to .lock files; e.g. the same lock file shows up about 2700 times.
> I looked at the code, and I think the problem is in:
>     File lockFile = new File(stateDir, ProcessorStateManager.LOCK_FILE_NAME);
>     FileChannel channel = new RandomAccessFile(lockFile, "rw").getChannel();
> Each time new RandomAccessFile is called, a new fd is created, so we should probably either close or reuse this RandomAccessFile object.
> lsof result:
> java    14799 hcai *740u   REG                9,0        0 2415928585 /mnt/stream/join/rocksdb/ads-demo-30/0_16/.lock
> java    14799 hcai *743u   REG                9,0        0 2415928585 /mnt/stream/join/rocksdb/ads-demo-30/0_16/.lock
> java    14799 hcai *746u   REG                9,0        0 2415928585 /mnt/stream/join/rocksdb/ads-demo-30/0_16/.lock
> java    14799 hcai *755u   REG                9,0        0 2415928585 /mnt/stream/join/rocksdb/ads-demo-30/0_16/.lock
> hcai@teststream02001:~$ lsof -p 14799 | grep lock | grep 0_16  | wc
>    2709   24381  319662


