You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@storm.apache.org by mo-getter <gi...@git.apache.org> on 2016/11/03 04:16:46 UTC

[GitHub] storm pull request #1760: Add topology stream-awareness to storm-redis

GitHub user mo-getter opened a pull request:

    https://github.com/apache/storm/pull/1760

    Add topology stream-awareness to storm-redis

    Allows users to control the streams to which the provided Bolts emit, via a StreamMapper interface, rather than emitting only to the "default" stream.
    
    The existing constructors for the Bolts use a DefaultStreamMapper so that there are no breaking changes in behavior. Although, in the future, it might make more sense to use the InputSourceStreamMapper by default, which emits new tuples to the same stream the input tuple came in on (especially for the RedisFilterBolt).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mo-getter/storm master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/1760.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1760
    
----
commit 28e31f3050cc837164005b22c7b66aacecf2cd7b
Author: Alan Smith <al...@knoesis.org>
Date:   2016-11-03T03:56:08Z

    Add topology stream-awareness to storm-redis.
    
    Allows users to control the streams to which the provided Bolts emit,
    via a StreamMapper interface, rather than emitting only to the default
    stream.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request #1760: Add topology stream-awareness to storm-redis

Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on a diff in the pull request:

    https://github.com/apache/storm/pull/1760#discussion_r86603766
  
    --- Diff: external/storm-redis/src/main/java/org/apache/storm/redis/bolt/RedisFilterBolt.java ---
    @@ -45,18 +48,41 @@
      */
     public class RedisFilterBolt extends AbstractRedisBolt {
         private final RedisFilterMapper filterMapper;
    +    private final StreamMapper streamMapper;
         private final RedisDataTypeDescription.RedisDataType dataType;
         private final String additionalKey;
     
         /**
    -     * Constructor for single Redis environment (JedisPool)
    +     * Constructor for single Redis environment (JedisPool).
    +     * Tuples will be emitted to Storm's default streamId.
          * @param config configuration for initializing JedisPool
          * @param filterMapper mapper containing which datatype, query key that Bolt uses
          */
         public RedisFilterBolt(JedisPoolConfig config, RedisFilterMapper filterMapper) {
    --- End diff --
    
    Do you want to target this for master?  Or do you also want this in 1.x?  If it is just master we can play some games with a default method implementation in the FilterMapper interface.
    
    If you want it in 1.x I would suggest that we leave FilterMapper untouched and create a LookupMapper that also has the same, or similar methods to FilterMapper, but is not a FilterMapper.  Then you can have a wrapper class that is a LookupMapper, but takes a FilterMapper.  The code could then wrap any FilterMapper passed in, and just use the LookupMapper interface.
    
    I prefer the first one with the default methods because it reduce the number of classes and interfaces but is also binary compatible.  If we are not on java 8 like 1.x then we cannot use default methods.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request #1760: Add topology stream-awareness to storm-redis

Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on a diff in the pull request:

    https://github.com/apache/storm/pull/1760#discussion_r86378199
  
    --- Diff: external/storm-redis/src/main/java/org/apache/storm/redis/bolt/RedisFilterBolt.java ---
    @@ -45,18 +48,41 @@
      */
     public class RedisFilterBolt extends AbstractRedisBolt {
         private final RedisFilterMapper filterMapper;
    +    private final StreamMapper streamMapper;
         private final RedisDataTypeDescription.RedisDataType dataType;
         private final String additionalKey;
     
         /**
    -     * Constructor for single Redis environment (JedisPool)
    +     * Constructor for single Redis environment (JedisPool).
    +     * Tuples will be emitted to Storm's default streamId.
          * @param config configuration for initializing JedisPool
          * @param filterMapper mapper containing which datatype, query key that Bolt uses
          */
         public RedisFilterBolt(JedisPoolConfig config, RedisFilterMapper filterMapper) {
    --- End diff --
    
    This now creates a lot of coupling between the filter mapper and the stream mapper.  Simply because the Filter Mapper is the one that declares the output fields.
    
    ```
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            filterMapper.declareOutputFields(declarer);
        }
    ```
    
    So either we need to embrace the coupling and have StreamMapper also be a FilterMapper.  (which would require some documentation) or we find a way to fake out FilterMapper and have it declare multiple outputs for what the StreamMapper wants. 
    
    I prefer the first one, because it seems like it would be more flexible.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request #1760: Add topology stream-awareness to storm-redis

Posted by mo-getter <gi...@git.apache.org>.
Github user mo-getter commented on a diff in the pull request:

    https://github.com/apache/storm/pull/1760#discussion_r86495472
  
    --- Diff: external/storm-redis/src/main/java/org/apache/storm/redis/bolt/RedisFilterBolt.java ---
    @@ -45,18 +48,41 @@
      */
     public class RedisFilterBolt extends AbstractRedisBolt {
         private final RedisFilterMapper filterMapper;
    +    private final StreamMapper streamMapper;
         private final RedisDataTypeDescription.RedisDataType dataType;
         private final String additionalKey;
     
         /**
    -     * Constructor for single Redis environment (JedisPool)
    +     * Constructor for single Redis environment (JedisPool).
    +     * Tuples will be emitted to Storm's default streamId.
          * @param config configuration for initializing JedisPool
          * @param filterMapper mapper containing which datatype, query key that Bolt uses
          */
         public RedisFilterBolt(JedisPoolConfig config, RedisFilterMapper filterMapper) {
    --- End diff --
    
    Thanks for taking a look at this. I see your point, and I agree it makes the most sense for both to be done in the same interface. TL;DR is at the bottom ;-)
    
    The same problem also applies to RedisLookupMapper, which also defines `declareOutputFields` separately. I just came across [STORM-1953](https://issues.apache.org/jira/browse/STORM-1953).
    
    About your first option, would it make more sense for a FilterMapper to be a StreamMapper, rather than a StreamMapper be a FilterMapper? I'm afraid that making StreamMapper a FilterMapper would introduce too much ambiguity in the lookup and filter bolts (if the constructors accepted both), since we'd have to rely on only the docs to define which object's `declareOutputFields` would be called. It also would make STORM-1953 worse. Or did you mean the bolts only accept FilterMapper, and have something like this in `execute`:
    ```
    if (filterMapper instanceof StreamMapper) {
        String streamId = ((StreamMapper) filterMapper).getStreamId(input, value);
        collector.emit(streamId, input, value);
    } else {
        collector.emit(input, value);
    }
    ```
    Either way, if you combine them, one downside is that the provided convenience StreamMapper implementations would have to be sacrificed. Making them abstract probably wouldn't be worth it for something like just specifying the stream.
    
    In case you want to see what having FilterMapper and LookupMapper also extend StreamMapper looks like, I implemented that in a branch [here](https://github.com/apache/storm/compare/master...mo-getter:stream-mapper). The flexibility to dynamically choose a stream is there, but the problem is that the trident-related classes also use LookupMapper, and have no need to declare a streamId, yet users will have to implement this method in their LookupMappers. Just returning null is one [not so good] option here, and is also an option when using LookupMapper for bolts (in which case, the existing behavior of emitting to the default stream is maintained).
    
    **TL;DR:** I can't think of a great solution for what you mentioned, while maintaining user-friendliness of the API, without totally redoing the Mapper interfaces, i.e. [STORM-1953](https://issues.apache.org/jira/browse/STORM-1953). On the other hand, the above commit does maintain full backward compatibility and is probably most convenient for users!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request #1760: Add topology stream-awareness to storm-redis

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/storm/pull/1760


---