You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@storm.apache.org by dujiashu <gi...@git.apache.org> on 2018/01/03 06:00:16 UTC

[GitHub] storm pull request #2492: Update TridentKafkaEmitter.java

GitHub user dujiashu opened a pull request:

    https://github.com/apache/storm/pull/2492

    Update TridentKafkaEmitter.java

    fix fetchmessages offsetoutofrange bug.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dujiashu/storm master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/2492.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2492
    
----
commit 919abfa26eec1d1b6b51e937b081c5c444814f70
Author: dujiashu <92...@...>
Date:   2018-01-03T05:58:39Z

    Update TridentKafkaEmitter.java
    
    fix fetchmessages offsetoutofrange bug

----


---

[GitHub] storm issue #2492: [STORM-2883] Fix storm-kafka trident API bug

Posted by dujiashu <gi...@git.apache.org>.
Github user dujiashu commented on the issue:

    https://github.com/apache/storm/pull/2492
  
    @srdo Thanks so much. I always think opaquespout is only used to resolve kafka disk broken when not set replication....   


---

[GitHub] storm issue #2492: Update TridentKafkaEmitter.java

Posted by dujiashu <gi...@git.apache.org>.
Github user dujiashu commented on the issue:

    https://github.com/apache/storm/pull/2492
  
    how can I pass the checkstyle? Thanks for help.


---

[GitHub] storm issue #2492: [STORM-2883] Fix storm-kafka trident API bug

Posted by dujiashu <gi...@git.apache.org>.
Github user dujiashu commented on the issue:

    https://github.com/apache/storm/pull/2492
  
    @srdo Thanks for you help.
    https://issues.apache.org/jira/browse/STORM-2883


---

[GitHub] storm issue #2492: [STORM-2883] Fix storm-kafka trident API bug

Posted by srdo <gi...@git.apache.org>.
Github user srdo commented on the issue:

    https://github.com/apache/storm/pull/2492
  
    @dujiashu I meant could you close it on your end? I think only the author can close PRs at this time.


---

[GitHub] storm issue #2492: [STORM-2883] Fix storm-kafka trident API bug

Posted by dujiashu <gi...@git.apache.org>.
Github user dujiashu commented on the issue:

    https://github.com/apache/storm/pull/2492
  
    When tridentAPI, although set config.usestarttimeifoffsetoutofrange true, it would throw offsetoutofrange exception yet.
    This issue fix this bug.


---

[GitHub] storm issue #2492: [STORM-2883] Fix storm-kafka trident API bug

Posted by srdo <gi...@git.apache.org>.
Github user srdo commented on the issue:

    https://github.com/apache/storm/pull/2492
  
    @dujiashu Happy to help.
    
    If you agree that we shouldn't make this change, would you mind closing this PR? Thanks.


---

[GitHub] storm issue #2492: [STORM-2883] Fix storm-kafka trident API bug

Posted by dujiashu <gi...@git.apache.org>.
Github user dujiashu commented on the issue:

    https://github.com/apache/storm/pull/2492
  
    @srdo Sure, this PR should be closed.  


---

[GitHub] storm issue #2492: [STORM-2883] Fix storm-kafka trident API bug

Posted by srdo <gi...@git.apache.org>.
Github user srdo commented on the issue:

    https://github.com/apache/storm/pull/2492
  
    This breaks the semantics of transactional spouts. Please see http://storm.apache.org/releases/1.1.1/Trident-spouts.html and the sections about transactional and opaque spouts here http://storm.apache.org/releases/1.1.1/Trident-state.html. The latter link describes exactly you case in the example for why transactional spouts should not always be used.
    
    > You might be wondering – why wouldn't you just always use a transactional spout? They're simple and easy to understand. One reason you might not use one is because they're not necessarily very fault-tolerant. For example, the way TransactionalTridentKafkaSpout works is the batch for a txid will contain tuples from all the Kafka partitions for a topic. Once a batch has been emitted, any time that batch is re-emitted in the future the exact same set of tuples must be emitted to meet the semantics of transactional spouts. Now suppose a batch is emitted from TransactionalTridentKafkaSpout, the batch fails to process, and at the same time one of the Kafka nodes goes down. You're now incapable of replaying the same batch as you did before (since the node is down and some partitions for the topic are not unavailable), and processing will halt. 
    
    Basically transactional non-opaque spouts guarantee that a given transaction id always corresponds to the same set of tuples. So for example if txid = 1 contains tuples [A, B, C], then if txid = 1 is reemitted, it has to still contain exactly [A, B, C]. This change will break that guarantee, because if e.g. A, B have been deleted from Kafka, we'll hit the new code and txid = 1 might now contain [C, E, F] instead. 
    
    If you need to be able to handle offsets being deleted from Kafka (which is what you're trying to do with config.usestarttimeifoffsetoutofrange), you need to instead use an Opaque spout (https://github.com/apache/storm/blob/master/external/storm-kafka/src/jvm/org/apache/storm/kafka/trident/OpaqueTridentKafkaSpout.java).


---

[GitHub] storm pull request #2492: [STORM-2883] Fix storm-kafka trident API bug

Posted by dujiashu <gi...@git.apache.org>.
Github user dujiashu closed the pull request at:

    https://github.com/apache/storm/pull/2492


---

[GitHub] storm issue #2492: Update TridentKafkaEmitter.java

Posted by srdo <gi...@git.apache.org>.
Github user srdo commented on the issue:

    https://github.com/apache/storm/pull/2492
  
    @dujiashu Take a look in storm-kafka/target/checkstyle-violations.xml. 
    
    Please also raise an issue for this at https://issues.apache.org/jira, then rename this PR to contain the issue number (see the other active PRs for example). It helps us track bugfixes and lets us generate release notes. Thanks.


---

[GitHub] storm issue #2492: [STORM-2883] Fix storm-kafka trident API bug

Posted by dujiashu <gi...@git.apache.org>.
Github user dujiashu commented on the issue:

    https://github.com/apache/storm/pull/2492
  
    @srdo OK. This my first use the github with make PR. I understatnd it now.


---