You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@storm.apache.org by srdo <gi...@git.apache.org> on 2017/11/19 15:43:49 UTC

[GitHub] storm pull request #2428: STORM-2826: Set key/value deserializer fields when...

GitHub user srdo opened a pull request:

    https://github.com/apache/storm/pull/2428

     STORM-2826: Set key/value deserializer fields when using the convenience builder methods in KafkaSpoutConfig

    Builds on STORM-2825.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/srdo/storm STORM-2826

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/2428.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2428
    
----
commit 299e92f39187d65bd9f9a900710a57429a84dec2
Author: Stig Rohde Døssing <sr...@apache.org>
Date:   2017-11-19T09:36:37Z

    STORM-2825: Fix ClassCastException when storm-kafka-client uses consumer config with String-type 'enable.auto.commit'

commit 8e082347bb25216bd18382ce52a516276bcc14d0
Author: Stig Rohde Døssing <sr...@apache.org>
Date:   2017-11-19T15:42:33Z

    STORM-2826: Set key/value deserializer fields when using the convenience builder methods in KafkaSpoutConfig

----


---

[GitHub] storm pull request #2428: STORM-2826: Set key/value deserializer fields when...

Posted by srdo <gi...@git.apache.org>.
Github user srdo commented on a diff in the pull request:

    https://github.com/apache/storm/pull/2428#discussion_r153035015
  
    --- Diff: external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java ---
    @@ -320,20 +320,17 @@ private Builder(Builder<?, ?> builder, SerializableDeserializer<K> keyDes, Class
                 // when they change the key/value types.
                 this.translator = (RecordTranslator<K, V>) builder.translator;
                 this.retryService = builder.retryService;
    -            
    -            if (keyDesClazz != null) {
    -                this.kafkaProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, keyDesClazz);
    -            }
    +
                 if (keyDes != null) {
                     this.kafkaProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, keyDes.getClass());
    -            }
    -            if (valueDesClazz != null) {
    -                this.kafkaProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, valueDesClazz);
    -            }
    -            if (valueDes != null) {
    +            } else if (keyDesClazz != null) {
    --- End diff --
    
    One of the changes in https://github.com/apache/storm/pull/2215 is to make users set most Kafka properties through setProp instead of duplicating configuration parameters in KafkaSpoutConfig. These properties include key and value deserializers.
    
    In order to provide backward compatibility that PR tries to ensure that if the keyDes/keyDesClazz field is set, then the corresponding kafkaProps property is also set. The spout doesn't use the fields anymore, it only reads from the kafkaProps map. The setKey/setValue methods were also deprecated, and users were directed to use setProp instead. Additionally the convenience builder methods were changed so they didn't set the keyDes/keyDesClazz fields, but just set properties in kafkaProps instead. 
    
    The issue Alexandre hit during testing was that he was doing something like `kafkaSpoutConfig.getKeyDeserializer().getClass()` on a KafkaSpoutConfig built from one of the convenience builders. Since keyDes/keyDesClazz aren't being set by the default builders anymore, this causes an NPE.
    
    I think we still want to be able to provide the simplified KafkaSpoutConfig API to 1.x, and we still want to encourage users to use setProp instead of setKey/setValue. In order to fix the NPE we also have to set keyDes/keyDesClazz in the convenience builder methods again.
    
    Without this change the code behaves in a very confusing way when mixing use of setKey/setValue and setProp. The added unit tests demonstrate cases where the current code would act counterintuitively. Try pasting the current code into the modified Builder constructor, and you'll see test failures.
    
    The modified constructor is used when setKey/setValue is called. With the current code, calling e.g. setKey will actually overwrite both the key and value properties in kafkaProps if keyDesClazz and valueDesClazz were set in the old Builder. 
    
    For example, if you did `KafkaSpoutConfig.builder(topic).setProp(MyValueDeserializer.class).setKey(MyKeyDeserializer.class)`, you end up with a KafkaSpoutConfig where the value deserializer is `StringDeserializer`. This is because the `builder` method now sets both keyDesClazz and valueDesClazz by default. When setKey calls the copy constructor, it overwrites both the key and value deserializer properties in kafkaProps with the values of the key/valueDesClazz fields. The updated code makes sure that if you call setKey, then only the key deserializer property will be set. 
    
    The change isn't necessary for the constructor further up, because it isn't a copy constructor. The problem with the copy constructor is that it takes in a kafkaProps that already might contain settings for key/value deserializers and overwrites them. The unmodified constructor has an empty kafkaProps, so there's no issue.


---

[GitHub] storm pull request #2428: STORM-2826: Set key/value deserializer fields when...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on a diff in the pull request:

    https://github.com/apache/storm/pull/2428#discussion_r153035388
  
    --- Diff: external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java ---
    @@ -320,20 +320,17 @@ private Builder(Builder<?, ?> builder, SerializableDeserializer<K> keyDes, Class
                 // when they change the key/value types.
                 this.translator = (RecordTranslator<K, V>) builder.translator;
                 this.retryService = builder.retryService;
    -            
    -            if (keyDesClazz != null) {
    -                this.kafkaProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, keyDesClazz);
    -            }
    +
                 if (keyDes != null) {
                     this.kafkaProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, keyDes.getClass());
    -            }
    -            if (valueDesClazz != null) {
    -                this.kafkaProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, valueDesClazz);
    -            }
    -            if (valueDes != null) {
    +            } else if (keyDesClazz != null) {
    --- End diff --
    
    OK. That doesn't looks like easy to understand, but I understand that's unavoidable for backward compatibility.


---

[GitHub] storm issue #2428: STORM-2826: Set key/value deserializer fields when using ...

Posted by srdo <gi...@git.apache.org>.
Github user srdo commented on the issue:

    https://github.com/apache/storm/pull/2428
  
    Squashed to one commit. Not entirely sure if this is okay, since authorship is lost for your changes, but maybe it's not a big deal?


---

[GitHub] storm pull request #2428: STORM-2826: Set key/value deserializer fields when...

Posted by hmcl <gi...@git.apache.org>.
Github user hmcl commented on a diff in the pull request:

    https://github.com/apache/storm/pull/2428#discussion_r155909342
  
    --- Diff: external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java ---
    @@ -359,7 +356,7 @@ private Builder(Builder<?, ?> builder, SerializableDeserializer<K> keyDes, Class
              */
             @Deprecated
             public <NK> Builder<NK, V> setKey(Class<? extends Deserializer<NK>> clazz) {
    --- End diff --
    
    @srdo do you know why this method returns a new builder object? I can't figure a reason for it to so. I suspect that the only reason for that to happen is because the fields of the builder class are final (e.g keyDesClassClazz), and to make the generics work. There is no benefit in having fields inside the builder class to be final. The code snippet bellow also fixes the generics problem. Any reason not to get rid of the builder (with copy constructor) class completely and make this method like this:
    
    ```java
    public Builder<K,V> setKey(Class<? extends Deserializer<K>> clazz) {
                this.keyDesClazz = clazz;
                if (keyDesClazz != null) {
                    this.kafkaProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, keyDesClazz);
                }
                return this;
            }
    ```
    
    We should do something similar to the other 3 methods. In my opinion has become a bit confusing, and I believe this is one of the last few opportunities we have to make it better. Please let me know your thoughts. Thanks.


---

[GitHub] storm pull request #2428: STORM-2826: Set key/value deserializer fields when...

Posted by srdo <gi...@git.apache.org>.
Github user srdo commented on a diff in the pull request:

    https://github.com/apache/storm/pull/2428#discussion_r151916643
  
    --- Diff: external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java ---
    @@ -292,17 +292,21 @@ private Builder(String bootstrapServers, SerializableDeserializer<K> keyDes, Cla
                 this.subscription = subscription;
                 this.translator = new DefaultRecordTranslator<>();
                 
    -            if (keyDesClazz != null) {
    -                this.kafkaProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, keyDesClazz);
    -            }
    -            if (keyDes != null) {
    -                this.kafkaProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, keyDes.getClass());
    -            }
    -            if (valueDesClazz != null) {
    -                this.kafkaProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, valueDesClazz);
    +            if (!this.kafkaProps.containsKey(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG)) {
    --- End diff --
    
    Yes, I'll explain. In https://github.com/apache/storm/pull/2155 I changed the KafkaSpoutConfig API a bunch to try to avoid having custom methods for properties that users should just set via kafkaProps. Part of the change removes keyDes and keyDesClass, and tells users to set the corresponding properties in kafkaProps instead.
    
    When the changes were backported, I deprecated all constructors or methods referring to those fields, and switched the KafkaSpoutConfig.builder convenience methods to use a constructor that just sets the right properties in kafkaProps. It turns out this is a breaking change for users that build a KafkaSpoutConfig and use `getKeyDeserializer` or `getValueDeserializer` for anything, because they are now null when using the convenience builders, where they defaulted to StringDeserializers before.
    
    In order to retain backwards compatibility, the builders have to set the key/value deserializer fields to StringDeserializer again. I still want to get rid of the fields though, so to allow users to switch to using kafkaProps instead, we'll only use the fields if the corresponding properties in kafkaProps are not set. If we set the properties based on the fields unconditionally, we would overwrite the deserializer settings for users that set the properties in kafkaProps.
    
    > What happens if this if statement is false
    If the expression is false, the field settings are ignored. This is the behavior I think we want, since it means the user must have set the right property in kafkaProps. The consequence of this is a mismatch between what is in kafkaProps, and what is set in the key deserializer field. I suppose we could overwrite the key deserializer field with whatever is in kafkaProps to resolve the conflict?
    
     >The two ifs bellow, on lines 296 and 299, I think they can possibly be both true and with different values, if you are dealing with subtypes. If so, what happens in that case?
    We would use the deserializer from the keyDes field. I think this kind of ambiguity is pretty nasty, but it's consistent with previous behavior. 


---

[GitHub] storm issue #2428: STORM-2826: Set key/value deserializer fields when using ...

Posted by srdo <gi...@git.apache.org>.
Github user srdo commented on the issue:

    https://github.com/apache/storm/pull/2428
  
    @hmcl Thanks, merged your PR and squashed to two commits. The only change since your PR is a whitespace change, and removing an unnecessary else (there was a throw in the if block)


---

[GitHub] storm pull request #2428: STORM-2826: Set key/value deserializer fields when...

Posted by srdo <gi...@git.apache.org>.
Github user srdo commented on a diff in the pull request:

    https://github.com/apache/storm/pull/2428#discussion_r155916418
  
    --- Diff: external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java ---
    @@ -359,7 +356,7 @@ private Builder(Builder<?, ?> builder, SerializableDeserializer<K> keyDes, Class
              */
             @Deprecated
             public <NK> Builder<NK, V> setKey(Class<? extends Deserializer<NK>> clazz) {
    --- End diff --
    
    Yes, the reasons you mention are the reasons these methods return new builders.
    
    I agree that there is no reason for the fields to be final.
    
    The snippet breaks the ability to change the key/value deserializer types. The existing code allows you to do e.g. 
    ```
    @Test
        public void test() {
           
            KafkaSpoutConfig<String, byte[]> conf = 
                //Use default <String, String> Builder
                KafkaSpoutConfig.builder("localhost:1234", "topic") 
                //Change to byte array value deserializer
                .setValue(ByteArrayDeserializer.class)
                .build();
        }
    ```
    which now fails to compile because the new type bound on setKey/setValue prevents changing from `Deserializer<String>` to `Deserializer<byte[]>`. 
    
    You're right that the existing API is somewhat confusing, but it's getting removed in 2.0.0 and is deprecated here. I'd rather not add breaking changes to it if we can avoid it, because if we were going to break the API in a minor version I think we should just have removed the deprecated methods entirely. 


---

[GitHub] storm issue #2428: STORM-2826: Set key/value deserializer fields when using ...

Posted by hmcl <gi...@git.apache.org>.
Github user hmcl commented on the issue:

    https://github.com/apache/storm/pull/2428
  
    @HeartSaVioR @srdo I will finalize my review by Monday PST


---

[GitHub] storm pull request #2428: STORM-2826: Set key/value deserializer fields when...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/storm/pull/2428


---

[GitHub] storm issue #2428: STORM-2826: Set key/value deserializer fields when using ...

Posted by hmcl <gi...@git.apache.org>.
Github user hmcl commented on the issue:

    https://github.com/apache/storm/pull/2428
  
    @srdo aiming to getting this PR merged more quickly I created a [PR](https://github.com/srdo/storm/pull/1) with a suggested fix off your branch. If you agree with the fix, can you please incorporate it, squash the commits, and push it again here. I will then review it right away. Thanks.


---

[GitHub] storm issue #2428: STORM-2826: Set key/value deserializer fields when using ...

Posted by hmcl <gi...@git.apache.org>.
Github user hmcl commented on the issue:

    https://github.com/apache/storm/pull/2428
  
    +1. Can you please squash. Thanks.


---

[GitHub] storm pull request #2428: STORM-2826: Set key/value deserializer fields when...

Posted by hmcl <gi...@git.apache.org>.
Github user hmcl commented on a diff in the pull request:

    https://github.com/apache/storm/pull/2428#discussion_r151886343
  
    --- Diff: external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java ---
    @@ -292,17 +292,21 @@ private Builder(String bootstrapServers, SerializableDeserializer<K> keyDes, Cla
                 this.subscription = subscription;
                 this.translator = new DefaultRecordTranslator<>();
                 
    -            if (keyDesClazz != null) {
    -                this.kafkaProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, keyDesClazz);
    -            }
    -            if (keyDes != null) {
    -                this.kafkaProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, keyDes.getClass());
    -            }
    -            if (valueDesClazz != null) {
    -                this.kafkaProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, valueDesClazz);
    +            if (!this.kafkaProps.containsKey(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG)) {
    --- End diff --
    
    @srdo can you please clarify what you are trying to do? What happens if this if statement is false? Won't it cause kafkaProps to keep whatever value they have set and the fields keyClassDeserializer something else? What are the implications of that ?
    
    The two ifs bellow, on lines 296 and 299, I think they can possibly be both true and with different values, if you are dealing with subtypes. If so, what happens in that case?


---

[GitHub] storm pull request #2428: STORM-2826: Set key/value deserializer fields when...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on a diff in the pull request:

    https://github.com/apache/storm/pull/2428#discussion_r153033776
  
    --- Diff: external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java ---
    @@ -320,20 +320,17 @@ private Builder(Builder<?, ?> builder, SerializableDeserializer<K> keyDes, Class
                 // when they change the key/value types.
                 this.translator = (RecordTranslator<K, V>) builder.translator;
                 this.retryService = builder.retryService;
    -            
    -            if (keyDesClazz != null) {
    -                this.kafkaProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, keyDesClazz);
    -            }
    +
                 if (keyDes != null) {
                     this.kafkaProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, keyDes.getClass());
    -            }
    -            if (valueDesClazz != null) {
    -                this.kafkaProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, valueDesClazz);
    -            }
    -            if (valueDes != null) {
    +            } else if (keyDesClazz != null) {
    --- End diff --
    
    
    
    I'm not sure why this change is necessary, and if change is necessary, why we don't change above constructor as well?



---

[GitHub] storm issue #2428: STORM-2826: Set key/value deserializer fields when using ...

Posted by hmcl <gi...@git.apache.org>.
Github user hmcl commented on the issue:

    https://github.com/apache/storm/pull/2428
  
    @srdo don't worry about the authorship. It was just a code review for which I created a PR to make it easier to discuss.


---

[GitHub] storm issue #2428: STORM-2826: Set key/value deserializer fields when using ...

Posted by hmcl <gi...@git.apache.org>.
Github user hmcl commented on the issue:

    https://github.com/apache/storm/pull/2428
  
    @srdo although some of these methods have been deprecated for 2.0, customers that are currently in a 1.x.y version will likely use this version for a few years. We will have to maintain this codebase for quite a long time, and therefore I am in favor of making at least the code a bit more readable. I had quite a hard time to understand what the existing code is doing. I have another suggestion, which I also shared with you on a [PR](https://github.com/srdo/storm/pull/1). 
    
    I will leave it up to you which one to pick and I am +1 after that such that we can move forward. Thanks.



---

[GitHub] storm issue #2428: STORM-2826: Set key/value deserializer fields when using ...

Posted by arunmahadevan <gi...@git.apache.org>.
Github user arunmahadevan commented on the issue:

    https://github.com/apache/storm/pull/2428
  
    @HeartSaVioR can we look at merging this? @hmcl if you have any further comments you can put it here asap.


---

[GitHub] storm issue #2428: STORM-2826: Set key/value deserializer fields when using ...

Posted by hmcl <gi...@git.apache.org>.
Github user hmcl commented on the issue:

    https://github.com/apache/storm/pull/2428
  
    @arunmahadevan I am looking into this now. Thanks.


---

[GitHub] storm issue #2428: STORM-2826: Set key/value deserializer fields when using ...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/storm/pull/2428
  
    +1
    @hmcl It would be great if you could finish the review, sure, in several days after Thanksgiving week. 


---