You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@storm.apache.org by jianbzhou <gi...@git.apache.org> on 2017/01/03 13:26:18 UTC

[GitHub] storm issue #1131: STORM-822: Kafka Spout New Consumer API

Github user jianbzhou commented on the issue:

    https://github.com/apache/storm/pull/1131
  
    @hmcl and all, we have communicated via email for a while and going forward let's talk in this thread so everyone is in same page.
    Base on the spout from the community(written by you), we have several fixes and it worked quite stable in our production for about 6 months.
    
    We want to share the latest spout to you and could you please kindly help review and merge to the community version if any fix is reasonable? we want to avoid diverging too much from the community version.
    
    Below are our major fixes:
    1.	For failed message, in next tuple method, originally the spout seek back to the non-continuous offset, so the failed message will be polled again for retry, say we seek back to message 10 for retry, now if kafka log file was purged, earliest offset is 1000, it means we will seek to 10 but reset to 1000 as per the reset policy, and we cannot poll the message 10, so spout not work.
    Our fix is: we manually catch the out of range exception, commit the offset to earliest offset first, then seek to the earliest offset
    
    2.	Currently the way to find next committed offset is very complex, under some edge cases \u2013 a), if no message acked back because bolt has some issue or cannot catch up with the spout emit; b) seek back is happened frequently and it is much faster than the message be acked back
    We give each message a status \u2013 None, emit, acked, failed(if failed number is bigger than the maximum retry, set to acked)
    3.	One of our use cases need ordering in partition level, so after seek back for retry, we re-emit all the follow messages again no matter they have emitted or not, if possible, maybe you can give an option here to configure it \u2013 either re-emit all the message from the failed one, or just emit the failed one, same as current version.
    4.	We record the message count for acked, failed, emitted, just for statistics.
    
    Could you please kindly help review and let us know if you can merge it into the community version? Any comments/concern pls feel free to let us know. Btw, I just send the latest code to you via email. 



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---