Posted to issues@storm.apache.org by "Tibor Kiss (JIRA)" <ji...@apache.org> on 2017/02/12 09:51:41 UTC

[jira] [Created] (STORM-2355) Storm-HDFS: inotify support

Tibor Kiss created STORM-2355:
---------------------------------

             Summary: Storm-HDFS: inotify support
                 Key: STORM-2355
                 URL: https://issues.apache.org/jira/browse/STORM-2355
             Project: Apache Storm
          Issue Type: New Feature
          Components: storm-hdfs
            Reporter: Tibor Kiss
            Assignee: Tibor Kiss
             Fix For: 2.0.0, 1.1.0


This is a proposal to implement inotify-based monitoring of the watch directory in the Storm-HDFS spout.

Motivation:
Storm-HDFS currently polls the input directory using Hadoop's {{FileSystem.listFiles}}. This operation is expensive because it returns the block locations and the full stat information of every file inside the watch directory, yet Storm-HDFS uses only the Path of a single element from the returned list, which is wasteful.

Proposed improvement:
Provide a way to monitor the input directory through HDFS's inotify API.
To stay backward compatible with the poll-based solution, I propose a new class ({{HdfsDirectoryMonitor}}) that implements both the inotify-based and the poll-based solution behind an iterator. The user can enable inotify-based monitoring through a configuration parameter.
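A minimal sketch of how {{HdfsDirectoryMonitor}} could hide both strategies behind one {{Iterator<String>}}, assuming file paths are handed out as strings. To keep it self-contained it has no Hadoop dependency: the two {{Supplier}} arguments are hypothetical stand-ins (a real implementation would presumably wrap {{FileSystem.listFiles}} for polling and the stream returned by {{HdfsAdmin.getInotifyEventStream()}} for inotify), and the {{useInotify}} flag stands in for the configuration parameter, whose name this proposal does not fix. The subclass names are illustrative only.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;
import java.util.function.Supplier;

/**
 * Hypothetical sketch: hands out paths of new files in the watch directory one
 * at a time, refilling an internal buffer from the selected strategy.
 */
abstract class HdfsDirectoryMonitor implements Iterator<String> {
    private final Deque<String> pending = new ArrayDeque<>();

    /** Returns the next batch of new file paths; may be empty. */
    protected abstract List<String> fetchBatch();

    /** useInotify is an assumed stand-in for the proposed configuration parameter. */
    static HdfsDirectoryMonitor create(boolean useInotify,
                                       Supplier<List<String>> directoryLister,
                                       Supplier<List<String>> createdFileEvents) {
        return useInotify
                ? new InotifyDirectoryMonitor(createdFileEvents)
                : new PollingDirectoryMonitor(directoryLister);
    }

    @Override
    public boolean hasNext() {
        if (pending.isEmpty()) {
            pending.addAll(fetchBatch()); // refill only once the buffer is drained
        }
        return !pending.isEmpty();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException("no new files in the watch directory");
        }
        return pending.poll();
    }
}

/** Poll-based strategy; the lister stands in for FileSystem.listFiles. */
class PollingDirectoryMonitor extends HdfsDirectoryMonitor {
    private final Supplier<List<String>> directoryLister;
    private final Set<String> alreadySeen = new HashSet<>();

    PollingDirectoryMonitor(Supplier<List<String>> directoryLister) {
        this.directoryLister = directoryLister;
    }

    @Override
    protected List<String> fetchBatch() {
        List<String> fresh = new ArrayList<>();
        for (String path : directoryLister.get()) {
            if (alreadySeen.add(path)) { // every listing repeats old files, so skip them
                fresh.add(path);
            }
        }
        return fresh;
    }
}

/** inotify-based strategy; the supplier stands in for draining HDFS inotify events. */
class InotifyDirectoryMonitor extends HdfsDirectoryMonitor {
    private final Supplier<List<String>> createdFileEvents;

    InotifyDirectoryMonitor(Supplier<List<String>> createdFileEvents) {
        this.createdFileEvents = createdFileEvents;
    }

    @Override
    protected List<String> fetchBatch() {
        return createdFileEvents.get(); // each event is delivered once, so no dedup
    }
}
```

The polling strategy has to remember which paths it already reported because every listing returns the whole directory, while the inotify strategy can forward each batch as-is. A real inotify implementation would also have to decide which event types mean "a new file is ready" (for example, waiting for the file to be closed); the sketch leaves that decision to the event source.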

Caveat:
HDFS inotify is currently available only to the HDFS superuser, but there is ongoing discussion in the Hadoop community about extending its support to regular users. See: HDFS-8940

Testing related changes:
The {{TestHdfsSpout}} test case should be parameterized to cover both the poll-based and the inotify-based solution.

Further work:
If the design is accepted, the poll-based solution could easily be improved through {{HdfsDirectoryMonitor}} to use all the items returned from the watch directory listing (as the inotify-based solution does). Such an improvement would reduce the number of calls made to {{FileSystem.listFiles}}.
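The saving described above can be illustrated with a hypothetical in-memory directory that counts listing calls ({{FakeWatchDir}} and {{ListingCallComparison}} are illustrative names, not Storm-HDFS classes). It contrasts the current pattern, which lists the directory and uses only the first entry, with a buffered pattern that lists again only after every previously returned entry has been consumed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical stand-in for the watch directory that counts listing calls. */
class FakeWatchDir {
    private final List<String> files = new ArrayList<>();
    int listCalls = 0;

    FakeWatchDir(int fileCount) {
        for (int i = 0; i < fileCount; i++) {
            files.add("/watch/file-" + i);
        }
    }

    /** Stand-in for FileSystem.listFiles: returns every file currently present. */
    List<String> list() {
        listCalls++;
        return new ArrayList<>(files);
    }

    /** Simulates the spout moving a processed file out of the watch directory. */
    void remove(String path) {
        files.remove(path);
    }
}

class ListingCallComparison {
    /** Current pattern: list, process only the first entry, list again. */
    static int firstEntryOnly(FakeWatchDir dir) {
        while (true) {
            List<String> listing = dir.list();
            if (listing.isEmpty()) {
                return dir.listCalls;
            }
            dir.remove(listing.get(0)); // the rest of the listing is thrown away
        }
    }

    /** Buffered pattern: consume the whole listing before listing again. */
    static int bufferedListing(FakeWatchDir dir) {
        Deque<String> pending = new ArrayDeque<>();
        while (true) {
            if (pending.isEmpty()) {
                pending.addAll(dir.list());
                if (pending.isEmpty()) {
                    return dir.listCalls;
                }
            }
            dir.remove(pending.poll());
        }
    }
}
```

For a directory holding 5 files, {{firstEntryOnly}} makes 6 listing calls (one per file plus a final empty listing), while {{bufferedListing}} makes 2 (one to read the batch, one to find the directory empty). The sketch ignores concurrency: when several spout instances share the directory, a buffered path may already have been claimed by another instance, so a real implementation would presumably still need to skip entries it fails to lock.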



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)