You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@storm.apache.org by srdo <gi...@git.apache.org> on 2016/08/02 17:23:35 UTC

[GitHub] storm pull request #1605: STORM-2014: Put logic around dropping messages int...

GitHub user srdo opened a pull request:

    https://github.com/apache/storm/pull/1605

    STORM-2014: Put logic around dropping messages into RetryService, rem\u2026

    \u2026ove maxRetry setting from new KafkaSpout
    
    https://issues.apache.org/jira/browse/STORM-2014
    
    This PR removes maxRetry from the KafkaSpout and changes the RetryService interface slightly so the schedule method can communicate back to the spout that the message should be dropped. Retry logic belongs to the RetryService interface, and it's nice for users if they can easily plug in their own handling of messages that will be dropped (custom logging for example).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/srdo/storm STORM-2014

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/1605.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1605
    
----
commit 432f932a70b0bd21f06cd6f36386404eab194e0a
Author: Stig Rohde D�ssing <st...@gmail.com>
Date:   2016-08-02T17:17:02Z

    STORM-2014: Put logic around dropping messages into RetryService, remove maxRetry setting from new KafkaSpout

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request #1605: STORM-2014: Put logic around dropping messages int...

Posted by hmcl <gi...@git.apache.org>.
Github user hmcl commented on a diff in the pull request:

    https://github.com/apache/storm/pull/1605#discussion_r79570505
  
    --- Diff: external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutRetryExponentialBackoff.java ---
    @@ -144,7 +145,12 @@ public String toString() {
         /**
          * The time stamp of the next retry is scheduled according to the exponential backoff formula ( geometric progression):
          * nextRetry = failCount == 1 ? currentTime + initialDelay : currentTime + delayPeriod^(failCount-1) where failCount = 1, 2, 3, ...
    -     * nextRetry = Min(nextRetry, currentTime + maxDelay)
    +     * nextRetry = Min(nextRetry, currentTime + maxDelay).
    +     * 
    +     * While retrying a record, no new records are committed until the previous polled records have been acked. This guarantees at once delivery of
    --- End diff --
    
    This comment should probably not go here, but rather somewhere in the Spout code. (retry service knows nothing about exactly once or at most once semantics)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #1605: STORM-2014: Put logic around dropping messages into Retry...

Posted by srdo <gi...@git.apache.org>.
Github user srdo commented on the issue:

    https://github.com/apache/storm/pull/1605
  
    @hmcl ping. Any input on this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #1605: STORM-2014: Put logic around dropping messages into Retry...

Posted by hmcl <gi...@git.apache.org>.
Github user hmcl commented on the issue:

    https://github.com/apache/storm/pull/1605
  
    I am +1 overall. Please format the commit message to be easy to read and squash the two commits. This is a simple change that should have only one commit. We can merge after that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #1605: STORM-2014: Put logic around dropping messages into Retry...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/storm/pull/1605
  
    +1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request #1605: STORM-2014: Put logic around dropping messages int...

Posted by hmcl <gi...@git.apache.org>.
Github user hmcl commented on a diff in the pull request:

    https://github.com/apache/storm/pull/1605#discussion_r80004187
  
    --- Diff: external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpout.java ---
    @@ -330,11 +328,9 @@ public void ack(Object messageId) {
         public void fail(Object messageId) {
             final KafkaSpoutMessageId msgId = (KafkaSpoutMessageId) messageId;
             emitted.remove(msgId);
    -        if (msgId.numFails() < maxRetries) {
    -            msgId.incrementNumFails();
    -            retryService.schedule(msgId);
    -        } else { // limit to max number of retries
    -            LOG.debug("Reached maximum number of retries. Message [{}] being marked as acked.", msgId);
    +        msgId.incrementNumFails();
    +        if (!retryService.schedule(msgId)) {
    +            LOG.debug("Retry service indicated message should not be retried. Message [{}] being marked as acked.", msgId);
    --- End diff --
    
    @srdo I am not sure I completely follow your reasoning. The most meaningful piece of information that we want to show to the user here is that if the max number of retries has been reached, no more retries happen. If the user reads a message saying that the retry service marked it not to be retried, he still does not know the cause.
    
    Looking at the code, the `boolean schedule(msgId)` only returns false only if the max number of retries has been reached.
    
    Although the retry service log messages hints ant the max retries cap being reached, the user will have to have both logs enabled to have both spouts. I favor the message as it was before, as it is straight to the point and provides good insight on what is happening.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request #1605: STORM-2014: Put logic around dropping messages int...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/storm/pull/1605


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #1605: STORM-2014: Put logic around dropping messages into Retry...

Posted by harshach <gi...@git.apache.org>.
Github user harshach commented on the issue:

    https://github.com/apache/storm/pull/1605
  
    +1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request #1605: STORM-2014: Put logic around dropping messages int...

Posted by hmcl <gi...@git.apache.org>.
Github user hmcl commented on a diff in the pull request:

    https://github.com/apache/storm/pull/1605#discussion_r79572783
  
    --- Diff: external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutRetryService.java ---
    @@ -29,14 +29,18 @@
      */
     public interface KafkaSpoutRetryService extends Serializable {
         /**
    -     * Schedules this {@link KafkaSpoutMessageId} if not yet scheduled, or updates retry time if it has already been scheduled.
    +     * Schedules this {@link KafkaSpoutMessageId} if not yet scheduled, or
    +     * updates retry time if it has already been scheduled. May also indicate
    --- End diff --
    
    It may also indicate...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request #1605: STORM-2014: Put logic around dropping messages int...

Posted by hmcl <gi...@git.apache.org>.
Github user hmcl commented on a diff in the pull request:

    https://github.com/apache/storm/pull/1605#discussion_r79571602
  
    --- Diff: external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpout.java ---
    @@ -330,11 +328,9 @@ public void ack(Object messageId) {
         public void fail(Object messageId) {
             final KafkaSpoutMessageId msgId = (KafkaSpoutMessageId) messageId;
             emitted.remove(msgId);
    -        if (msgId.numFails() < maxRetries) {
    -            msgId.incrementNumFails();
    -            retryService.schedule(msgId);
    -        } else { // limit to max number of retries
    -            LOG.debug("Reached maximum number of retries. Message [{}] being marked as acked.", msgId);
    +        msgId.incrementNumFails();
    +        if (!retryService.schedule(msgId)) {
    +            LOG.debug("Retry service indicated message should not be retried. Message [{}] being marked as acked.", msgId);
    --- End diff --
    
    I would leave the message as it was before. "Reached maximum number of retries. Message [{}] being marked as acked."
    
    It adds overhead and it is irrelevant to be mentioning the retry service here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request #1605: STORM-2014: Put logic around dropping messages int...

Posted by srdo <gi...@git.apache.org>.
Github user srdo commented on a diff in the pull request:

    https://github.com/apache/storm/pull/1605#discussion_r80023848
  
    --- Diff: external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpout.java ---
    @@ -330,11 +328,9 @@ public void ack(Object messageId) {
         public void fail(Object messageId) {
             final KafkaSpoutMessageId msgId = (KafkaSpoutMessageId) messageId;
             emitted.remove(msgId);
    -        if (msgId.numFails() < maxRetries) {
    -            msgId.incrementNumFails();
    -            retryService.schedule(msgId);
    -        } else { // limit to max number of retries
    -            LOG.debug("Reached maximum number of retries. Message [{}] being marked as acked.", msgId);
    +        msgId.incrementNumFails();
    +        if (!retryService.schedule(msgId)) {
    +            LOG.debug("Retry service indicated message should not be retried. Message [{}] being marked as acked.", msgId);
    --- End diff --
    
    @hmcl I was referring to RetryService being an interface users can override, which doesn't mention max retries in the schedule javadoc, and this part of the code technically not "knowing about" max retries, because users may supply a retry service that doesn't schedule tuples for some other reason. I think you're right though, it was probably clearer before.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #1605: STORM-2014: Put logic around dropping messages into Retry...

Posted by priyank5485 <gi...@git.apache.org>.
Github user priyank5485 commented on the issue:

    https://github.com/apache/storm/pull/1605
  
    +1 (non-binding). I had talked to @hmcl briefly about this. We should have his review as well


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #1605: STORM-2014: Put logic around dropping messages into Retry...

Posted by srdo <gi...@git.apache.org>.
Github user srdo commented on the issue:

    https://github.com/apache/storm/pull/1605
  
    Squashed. I hope this commit message is clearer.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---