You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@samza.apache.org by "Shanthoosh Venkataraman (JIRA)" <ji...@apache.org> on 2018/01/25 21:58:01 UTC

[jira] [Created] (SAMZA-1572) Add fixed retries on failure in KafkaCheckpointManager

Shanthoosh Venkataraman created SAMZA-1572:
----------------------------------------------

             Summary: Add fixed retries on failure in KafkaCheckpointManager
                 Key: SAMZA-1572
                 URL: https://issues.apache.org/jira/browse/SAMZA-1572
             Project: Samza
          Issue Type: Bug
            Reporter: Shanthoosh Venkataraman
            Assignee: Shanthoosh Venkataraman


KafkaCheckpointManager.writeCheckpoint currently goes into a infinite loop when an irrecoverable failure happens, this indefinitely blocks the commit phase (there by preventing processing). This exception is revealed only during the shutdown of the job making shutdown block indefinitely since the markers for shutdown are ignored by runloop which is blocked on commit phase.
{code:java}
2018/01/22 19:18:10.503 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Flush failed. One or more batches of messages were not sent. Retrying. 2018/01/22 19:18:10.604 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio 2018/01/22 19:18:10.804 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio 2018/01/22 19:18:11.204 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio 2018/01/22 19:18:12.005 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio 2018/01/22 19:18:13.605 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio 2018/01/22 19:18:16.805 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio 2018/01/22 19:18:23.205 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio 2018/01/22 19:18:33.206 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio 2018/01/22 19:18:43.206 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio 2018/01/22 19:18:53.206 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio 2018/01/22 19:19:03.207 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio 2018/01/22 19:19:13.207 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exceptio 2018/01/22 19:19:23.207 WARN [KafkaCheckpointManager]  [] Failed to write checkpoint log partition entry org.apache.samza.checkpoint.kafka.KafkaCheckpointLogKey@8148c5bb: org.apache.samza.system.SystemProducerException: Producer was unable to recover from previous exception.. Retrying.
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)