Posted to issues@storm.apache.org by "Tibor Kiss (JIRA)" <ji...@apache.org> on 2017/02/12 09:51:41 UTC
[jira] [Created] (STORM-2355) Storm-HDFS: inotify support
Tibor Kiss created STORM-2355:
---------------------------------
Summary: Storm-HDFS: inotify support
Key: STORM-2355
URL: https://issues.apache.org/jira/browse/STORM-2355
Project: Apache Storm
Issue Type: New Feature
Components: storm-hdfs
Reporter: Tibor Kiss
Assignee: Tibor Kiss
Fix For: 2.0.0, 1.1.0
This is a proposal to implement inotify-based watch directory monitoring in the Storm-HDFS spout.
Motivation:
Storm-HDFS currently polls the input directory using Hadoop's {{FileSystem.listFiles}}. This operation is expensive because it returns the block locations and all stat information for every file inside the watch directory. Storm-HDFS then uses the Path of only one element of the returned list, so most of that work is wasted.
Proposed improvement:
Provide a way to monitor the input directory through HDFS's inotify API.
To remain backward compatible with the poll-based solution, I propose a new class ({{HdfsDirectoryMonitor}}) which implements both the inotify-based and the poll-based solution behind an iterator. The user can enable inotify-based monitoring through a configuration parameter.
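As a rough illustration of the iterator idea, here is a minimal sketch using only the Java standard library. Every name in it ({{DirectoryMonitorSketch}}, {{PollingMonitor}}, {{EventMonitor}}, {{create}}, the {{useInotify}} flag) is hypothetical and not taken from Storm-HDFS. A {{Supplier}} stands in for the {{FileSystem.listFiles}} call and a {{Queue}} stands in for the HDFS inotify event stream, so this shows only the shape of the design, not the actual implementation.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Queue;
import java.util.function.Supplier;

/** Hypothetical sketch; none of these names come from Storm-HDFS. */
final class DirectoryMonitorSketch {

    /** Common face of both strategies: an iterator over paths waiting to be processed. */
    interface DirectoryMonitor extends Iterator<String> {}

    /** Poll-based strategy: lists the directory again on every request. */
    static final class PollingMonitor implements DirectoryMonitor {
        private final Supplier<List<String>> lister; // assumption: stands in for FileSystem.listFiles

        PollingMonitor(Supplier<List<String>> lister) {
            this.lister = lister;
        }

        @Override
        public boolean hasNext() {
            return !lister.get().isEmpty();
        }

        @Override
        public String next() {
            List<String> listing = lister.get();
            if (listing.isEmpty()) {
                throw new NoSuchElementException();
            }
            // Only the first path of the listing is used; the caller is expected
            // to move that file out of the directory before asking again.
            return listing.get(0);
        }
    }

    /** Event-based strategy: hands out paths delivered by a notification stream. */
    static final class EventMonitor implements DirectoryMonitor {
        private final Queue<String> createdPaths; // assumption: stands in for the inotify event stream

        EventMonitor(Queue<String> createdPaths) {
            this.createdPaths = createdPaths;
        }

        @Override
        public boolean hasNext() {
            return !createdPaths.isEmpty();
        }

        @Override
        public String next() {
            return createdPaths.remove(); // throws NoSuchElementException when empty
        }
    }

    /** The boolean plays the role of the proposed configuration parameter. */
    static DirectoryMonitor create(boolean useInotify,
                                   Supplier<List<String>> lister,
                                   Queue<String> createdPaths) {
        return useInotify ? new EventMonitor(createdPaths) : new PollingMonitor(lister);
    }

    public static void main(String[] args) {
        Queue<String> events = new ArrayDeque<>(Arrays.asList("/in/a", "/in/b"));
        DirectoryMonitor monitor = create(true, () -> Collections.<String>emptyList(), events);
        while (monitor.hasNext()) {
            System.out.println(monitor.next());
        }
    }
}
```

Because the spout would only see the iterator, switching between the two strategies would not require changes to the code that consumes the paths.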
Caveat:
HDFS inotify is currently available only to the HDFS superuser, but there is an ongoing discussion in the Hadoop community about extending support to regular users. See: HDFS-8940
Testing related changes:
The {{TestHdfsSpout}} test case should be parameterized to cover both the poll-based and the inotify-based solution.
Further work:
If the design is accepted, the poll-based solution could easily be improved through {{HdfsDirectoryMonitor}} to properly use all the items returned from the work directory (similar to the inotify-based solution). Such an improvement would reduce the number of calls made to {{FileSystem.listFiles}}.
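A minimal sketch of that follow-up idea, again using only the Java standard library: the iterator buffers a whole listing and lists again only once the buffer is drained. The class name {{BufferedPollSketch}} and its {{listCalls}} counter are hypothetical, a {{Supplier}} stands in for {{FileSystem.listFiles}}, and the sketch assumes each consumed path has left the directory before the next listing happens.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Queue;
import java.util.function.Supplier;

/** Hypothetical sketch: a poll-based iterator that consumes a whole listing before listing again. */
final class BufferedPollSketch implements Iterator<String> {
    private final Supplier<List<String>> lister; // assumption: stands in for FileSystem.listFiles
    private final Queue<String> buffer = new ArrayDeque<>();
    private int listCalls = 0; // counts how often the directory was listed

    BufferedPollSketch(Supplier<List<String>> lister) {
        this.lister = lister;
    }

    @Override
    public boolean hasNext() {
        if (buffer.isEmpty()) {
            // Refill only when every previously returned path has been handed out.
            buffer.addAll(lister.get());
            listCalls++;
        }
        return !buffer.isEmpty();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return buffer.remove();
    }

    int listCalls() {
        return listCalls;
    }

    public static void main(String[] args) {
        List<String> dir = new ArrayList<>(Arrays.asList("/in/a", "/in/b", "/in/c"));
        // Each listing is a snapshot of the simulated directory.
        BufferedPollSketch monitor = new BufferedPollSketch(() -> new ArrayList<>(dir));
        while (monitor.hasNext()) {
            String path = monitor.next();
            dir.remove(path); // simulates moving the file out of the watched directory
            System.out.println("processing " + path);
        }
        System.out.println("listings performed: " + monitor.listCalls());
    }
}
```

In this simulation three files are consumed with two listings (one that returns them and one that finds the directory empty), whereas listing once per file would need at least one listing for every file.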
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)