Posted to issues@openwhisk.apache.org by GitBox <gi...@apache.org> on 2018/10/18 14:25:03 UTC
[GitHub] dubee opened a new issue #4076: Invoker in unrecoverable down state

dubee opened a new issue #4076: Invoker in unrecoverable down state
URL: https://github.com/apache/incubator-openwhisk/issues/4076
 
 
   When an invoker fails to create or recreate its Kafka producer, it requires manual intervention to recover. The logs below suggest that a network interruption caused the health producer to disconnect unexpectedly. The invoker then tried to recreate the producer, but recreation failed, leaving the invoker in a `down` state until someone reloads it.
   
   I created this issue at least for documentation purposes. I'm not sure there is an actionable item here, unless we make producer recreation retry indefinitely on failure.
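
For discussion, here is a rough sketch of what retrying producer recreation indefinitely could look like. It is not taken from the OpenWhisk code base: `RetryingFactory`, `createWithRetry`, and the delay parameters are hypothetical names, and the factory stands in for whatever constructs the Kafka producer. It relies only on the fact that `KafkaException` is a `RuntimeException`, and it uses capped exponential backoff so that a long outage does not turn into a tight loop.

```java
import java.util.function.Supplier;

/** Hypothetical helper (not part of OpenWhisk): retries a factory until it succeeds. */
final class RetryingFactory {
    private RetryingFactory() {}

    /**
     * Calls factory.get() until it returns without throwing. Sleeps between
     * attempts, doubling the delay each time up to maxDelayMs.
     */
    static <T> T createWithRetry(Supplier<T> factory, long initialDelayMs, long maxDelayMs) {
        long delay = initialDelayMs;
        while (true) {
            try {
                return factory.get();
            } catch (RuntimeException e) {
                // A failed producer construction (KafkaException) would land here.
                System.err.println("creating producer failed, retrying in " + delay + " ms: " + e);
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException ie) {
                    // Preserve the interrupt and stop retrying so shutdown is not blocked.
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting to retry", ie);
                }
                delay = Math.min(delay * 2, maxDelayMs);
            }
        }
    }
}
```

In the invoker, the factory would wrap the producer construction, for example `createWithRetry(() -> new KafkaProducer<>(props), 1000, 60000)`, where `props` and the delays are placeholders. One trade-off to keep in mind: retrying forever also hides permanent failures such as a bad configuration, so the logging on each attempt matters.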
   
   Invoker error logs:
   ```
   [2018-10-18T12:44:45.641Z] [ERROR] [#tid_sid_unknown] [KafkaProducerConnector] sending message on topic 'health' failed: The server disconnected before a response was received.
   [2018-10-18T12:45:20.808Z] [ERROR] [#tid_sid_unknown] [KafkaProducerConnector] sending message on topic 'health' failed: Expiring 1 record(s) for health-0: 35055 ms has passed since batch creation plus linger time
   [2018-10-18T12:45:20.856Z] [ERROR] [#tid_sid_unknown] [KafkaProducerConnector] creating producer failed: org.apache.kafka.common.KafkaException: Failed to construct kafka producer
   [2018-10-18T12:45:20.857Z] [ERROR] [#tid_sid_unknown] [Invoker] failed to ping the controller: org.apache.kafka.common.KafkaException: Failed to construct kafka producer
   [2018-10-18T12:45:33.379Z] [ERROR] [#tid_sid_unknown] [KafkaConsumerConnector] org.apache.kafka.common.errors.TimeoutException: Timeout of 60000ms expired before successfully committing the current consumed offsets: retrying 3 more times
   [2018-10-18T12:45:33.949Z] [ERROR] [Consumer clientId=consumer-1, groupId=invoker33] Offset commit failed on partition invoker33-0 at offset 603: The coordinator is not aware of this member.
   [2018-10-18T12:45:33.949Z] [ERROR] [Consumer clientId=consumer-1, groupId=invoker33] Offset commit failed on partition invoker33-0 at offset 603: The coordinator is not aware of this member.
   [2018-10-18T12:45:33.951Z] [ERROR] [#tid_sid_dispatcher] [MessageFeed] failed to commit activation consumer offset: org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
   ```
   
   Related code line:
   https://github.com/apache/incubator-openwhisk/blob/c33b30a960bd23fe84cb75b75fcd2c1bc7447eac/common/scala/src/main/scala/whisk/connector/kafka/KafkaProducerConnector.scala#L109
   
