Posted to dev@storm.apache.org by "Raghav Kumar Gautam (JIRA)" <ji...@apache.org> on 2016/06/16 22:09:05 UTC

[jira] [Updated] (STORM-1910) One topology can't use hdfs spout to read from two locations

     [ https://issues.apache.org/jira/browse/STORM-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghav Kumar Gautam updated STORM-1910:
---------------------------------------
    Description: 
The HDFS URI is passed in through the topology config:
{code}
    conf.put(Configs.HDFS_URI, hdfsUri);
{code}
I see two problems with this approach:
1. If someone wants to use two different HDFS URIs in the same or in different spouts, that does not seem feasible, because the URI is read from the single topology-level config (a sketch of the clash follows the code quoted below).
https://github.com/apache/storm/blob/d17b3b9c3cbc89d854bfb436d213d11cfd4545ec/examples/storm-starter/src/jvm/storm/starter/HdfsSpoutTopology.java#L117-L117
https://github.com/apache/storm/blob/d17b3b9c3cbc89d854bfb436d213d11cfd4545ec/external/storm-hdfs/src/main/java/org/apache/storm/hdfs/spout/HdfsSpout.java#L331-L331
{code}
    if ( !conf.containsKey(Configs.SOURCE_DIR) ) {
      LOG.error(Configs.SOURCE_DIR + " setting is required");
      throw new RuntimeException(Configs.SOURCE_DIR + " setting is required");
    }
    this.sourceDirPath = new Path( conf.get(Configs.SOURCE_DIR).toString() );
{code}
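For illustration, a minimal sketch of how the clash shows up under the current config-driven wiring (the topology class name, spout ids and URIs below are made up, and the other required Configs.* keys and reader/output-field setup are omitted):
{code}
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.hdfs.spout.Configs;
import org.apache.storm.hdfs.spout.HdfsSpout;

public class TwoHdfsSourcesTopology {
  public static void main(String[] args) throws Exception {
    Config conf = new Config();
    // Both spouts read Configs.HDFS_URI from this one topology-level map in open(),
    // so the second put() silently overwrites the value the first spout needs.
    conf.put(Configs.HDFS_URI, "hdfs://cluster-a:8020");  // intended for spout "sourceA"
    conf.put(Configs.HDFS_URI, "hdfs://cluster-b:8020");  // intended for spout "sourceB", wins for both

    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("sourceA", new HdfsSpout(), 1);  // open() will see cluster-b's URI
    builder.setSpout("sourceB", new HdfsSpout(), 1);
    StormSubmitter.submitTopology("two-hdfs-sources", conf, builder.createTopology());
  }
}
{code}
Because the URI lives in the shared topology conf, there is no way to say "this URI for this spout instance"; the same limitation applies to Configs.SOURCE_DIR read in open() above.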
2. It does not fail fast, i.e. at the time of topology submission. We could fail fast if the HDFS path is invalid or the credentials/permissions are not ok; currently such errors can only be detected at runtime by looking at the worker logs (a possible submit-time check is sketched below).
https://github.com/apache/storm/blob/d17b3b9c3cbc89d854bfb436d213d11cfd4545ec/external/storm-hdfs/src/main/java/org/apache/storm/hdfs/spout/HdfsSpout.java#L297-L297
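One possible shape of such a check is sketched below. It is not existing HdfsSpout code, just an illustration of validating the source directory with the Hadoop FileSystem client from the submitting process (the class and method names are made up):
{code}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class HdfsSourceCheck {
  /** Fails with an exception at submission time if the dir is missing or unreadable. */
  public static void ensureReadable(String hdfsUri, String sourceDir) throws Exception {
    FileSystem fs = FileSystem.get(new URI(hdfsUri), new Configuration());
    Path dir = new Path(sourceDir);
    if (!fs.exists(dir)) {
      throw new IllegalArgumentException(sourceDir + " does not exist on " + hdfsUri);
    }
    fs.listStatus(dir);  // surfaces an AccessControlException if permissions are not ok
  }
}
{code}
Called from the topology's main() right before StormSubmitter.submitTopology(), a bad URI or missing read permission would then fail the submission instead of showing up only in the worker logs.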

  was:
The HDFS URI is passed in through the topology config:
{code}
    conf.put(Configs.HDFS_URI, hdfsUri);
{code}
I see two problems with this approach:
1. If someone wants to use two different HDFS URIs in the same or in different spouts, that does not seem feasible.
https://github.com/apache/storm/blob/d17b3b9c3cbc89d854bfb436d213d11cfd4545ec/examples/storm-starter/src/jvm/storm/starter/HdfsSpoutTopology.java#L117-L117
https://github.com/apache/storm/blob/d17b3b9c3cbc89d854bfb436d213d11cfd4545ec/external/storm-hdfs/src/main/java/org/apache/storm/hdfs/spout/HdfsSpout.java#L331-L331
{code}
    if ( !conf.containsKey(Configs.SOURCE_DIR) ) {
      LOG.error(Configs.SOURCE_DIR + " setting is required");
      throw new RuntimeException(Configs.SOURCE_DIR + " setting is required");
    }
    this.sourceDirPath = new Path( conf.get(Configs.SOURCE_DIR).toString() );
{code}
2. It does not fail fast, i.e. at the time of topology submission. We could fail fast if the HDFS path is invalid or the credentials/permissions are not ok; currently such errors can only be detected at runtime by looking at the worker logs.
https://github.com/hortonworks/storm/blob/d17b3b9c3cbc89d854bfb436d213d11cfd4545ec/external/storm-hdfs/src/main/java/org/apache/storm/hdfs/spout/HdfsSpout.java#L297-L297


> One topology can't use hdfs spout to read from two locations
> ------------------------------------------------------------
>
>                 Key: STORM-1910
>                 URL: https://issues.apache.org/jira/browse/STORM-1910
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-hdfs
>    Affects Versions: 1.0.1
>            Reporter: Raghav Kumar Gautam
>             Fix For: 1.1.0
>
>
> The HDFS URI is passed in through the topology config:
> {code}
>     conf.put(Configs.HDFS_URI, hdfsUri);
> {code}
> I see two problems with this approach:
> 1. If someone wants to use two different HDFS URIs in the same or in different spouts, that does not seem feasible.
> https://github.com/apache/storm/blob/d17b3b9c3cbc89d854bfb436d213d11cfd4545ec/examples/storm-starter/src/jvm/storm/starter/HdfsSpoutTopology.java#L117-L117
> https://github.com/apache/storm/blob/d17b3b9c3cbc89d854bfb436d213d11cfd4545ec/external/storm-hdfs/src/main/java/org/apache/storm/hdfs/spout/HdfsSpout.java#L331-L331
> {code}
>     if ( !conf.containsKey(Configs.SOURCE_DIR) ) {
>       LOG.error(Configs.SOURCE_DIR + " setting is required");
>       throw new RuntimeException(Configs.SOURCE_DIR + " setting is required");
>     }
>     this.sourceDirPath = new Path( conf.get(Configs.SOURCE_DIR).toString() );
> {code}
> 2. It does not fail fast, i.e. at the time of topology submission. We could fail fast if the HDFS path is invalid or the credentials/permissions are not ok; currently such errors can only be detected at runtime by looking at the worker logs.
> https://github.com/apache/storm/blob/d17b3b9c3cbc89d854bfb436d213d11cfd4545ec/external/storm-hdfs/src/main/java/org/apache/storm/hdfs/spout/HdfsSpout.java#L297-L297



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)