Posted to commits@kafka.apache.org by jg...@apache.org on 2019/06/08 00:08:00 UTC

[kafka] branch 2.0 updated (42fd2c3 -> 1e7ef95)

This is an automated email from the ASF dual-hosted git repository.

jgus pushed a change to branch 2.0
in repository https://gitbox.apache.org/repos/asf/kafka.git.


    from 42fd2c3  KAFKA-8404: Add HttpHeader to RestClient HTTP Request and Connector REST API (#6791)
     new c717ff1  MINOR: Lower producer throughput in flaky upgrade system test
     new 1e7ef95  MINOR: Fix race condition on shutdown of verifiable producer

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 tests/kafkatest/services/verifiable_producer.py | 6 +++++-
 tests/kafkatest/tests/core/upgrade_test.py      | 2 +-
 2 files changed, 6 insertions(+), 2 deletions(-)


[kafka] 02/02: MINOR: Fix race condition on shutdown of verifiable producer

Posted by jg...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

jgus pushed a commit to branch 2.0
in repository https://gitbox.apache.org/repos/asf/kafka.git

commit 1e7ef95b81579b91fb6e9a4a90680fcd08ec9822
Author: Jason Gustafson <ja...@confluent.io>
AuthorDate: Fri Jun 7 16:56:21 2019 -0700

    MINOR: Fix race condition on shutdown of verifiable producer
    
    We've seen `ReplicaVerificationToolTest.test_replica_lags` fail occasionally due to errors such as the following:
    ```
    RemoteCommandError: ubuntuworker7: Command 'kill -15 2896' returned non-zero exit status 1. Remote error message: bash: line 0: kill: (2896) - No such process
    ```
    The problem seems to be a shutdown race condition when using `max_messages` with the producer: the process may already have exited on its own, which causes the kill signal to fail.
    
    Author: Jason Gustafson <ja...@confluent.io>
    
    Reviewers: Gwen Shapira
    
    Closes #6906 from hachikuji/fix-failing-replicat-verification-test
---
 tests/kafkatest/services/verifiable_producer.py | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tests/kafkatest/services/verifiable_producer.py b/tests/kafkatest/services/verifiable_producer.py
index 5c7152d..fdca54f 100644
--- a/tests/kafkatest/services/verifiable_producer.py
+++ b/tests/kafkatest/services/verifiable_producer.py
@@ -254,7 +254,11 @@ class VerifiableProducer(KafkaPathResolverMixin, VerifiableClientMixin, Backgrou
             return True
 
     def stop_node(self, node):
-        self.kill_node(node, clean_shutdown=True, allow_fail=False)
+        # There is a race condition on shutdown if using `max_messages` since the
+        # VerifiableProducer will shutdown automatically when all messages have been
+        # written. In this case, the process will be gone and the signal will fail.
+        allow_fail = self.max_messages > 0
+        self.kill_node(node, clean_shutdown=True, allow_fail=allow_fail)
 
         stopped = self.wait_node(node, timeout_sec=self.stop_timeout_sec)
         assert stopped, "Node %s: did not stop within the specified timeout of %s seconds" % \
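
A minimal standalone sketch of the same pattern, assuming a raw `os.kill` call rather than ducktape's `kill_node`; the helper name `terminate_if_running` is illustrative and not part of kafkatest:

```
import errno
import os
import signal

def terminate_if_running(pid, expect_self_exit=False):
    # Send SIGTERM to `pid`. If the process was expected to exit on its own
    # (e.g. a producer started with max_messages), a missing process is the
    # benign race described above rather than a real failure.
    try:
        os.kill(pid, signal.SIGTERM)
    except OSError as e:
        if expect_self_exit and e.errno == errno.ESRCH:
            return  # process already gone; nothing to do
        raise
```

The patch expresses the same idea through ducktape's `allow_fail` flag: a failed signal is tolerated only when the producer was configured with `max_messages` and may therefore have already stopped by itself.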


[kafka] 01/02: MINOR: Lower producer throughput in flaky upgrade system test

Posted by jg...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

jgus pushed a commit to branch 2.0
in repository https://gitbox.apache.org/repos/asf/kafka.git

commit c717ff11f86b698c3e22ffa05a6df684787ee7a6
Author: Jason Gustafson <ja...@confluent.io>
AuthorDate: Fri Jun 7 16:53:50 2019 -0700

    MINOR: Lower producer throughput in flaky upgrade system test
    
    We see the upgrade test failing from time to time. I looked into it and found that the root cause is that the test throughput can be too high for the 0.9 producer to make progress. Eventually it reaches a point where it has a huge backlog of timed-out requests in the accumulator, all of which have to be expired. We see a long run of messages like this in the output:
    
    ```
    {"exception":"class org.apache.kafka.common.errors.TimeoutException","time_ms":1559907386132,"name":"producer_send_error","topic":"test_topic","message":"Batch Expired","class":"class org.apache.kafka.tools.VerifiableProducer","value":"335160","key":null}
    {"exception":"class org.apache.kafka.common.errors.TimeoutException","time_ms":1559907386132,"name":"producer_send_error","topic":"test_topic","message":"Batch Expired","class":"class org.apache.kafka.tools.VerifiableProducer","value":"335163","key":null}
    {"exception":"class org.apache.kafka.common.errors.TimeoutException","time_ms":1559907386133,"name":"producer_send_error","topic":"test_topic","message":"Batch Expired","class":"class org.apache.kafka.tools.VerifiableProducer","value":"335166","key":null}
    {"exception":"class org.apache.kafka.common.errors.TimeoutException","time_ms":1559907386133,"name":"producer_send_error","topic":"test_topic","message":"Batch Expired","class":"class org.apache.kafka.tools.VerifiableProducer","value":"335169","key":null}
    ```
    This can continue for a long time (I have observed up to 1 min) and prevents the producer from successfully writing any new data. While it is busy expiring the batches, no data is getting delivered to the consumer, which causes it to eventually raise a timeout.
    ```
    kafka.consumer.ConsumerTimeoutException
    at kafka.consumer.NewShinyConsumer.receive(BaseConsumer.scala:50)
    at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:109)
    at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:69)
    at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:47)
    at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)
    ```
    The fix here is to reduce the throughput; this seems reasonable since the purpose of the test is to verify the upgrade, which does not demand heavy load. Note that I investigated several failing instances of this test going back to 1.0 and saw a similar pattern, so this does not appear to be a regression.
    
    Author: Jason Gustafson <ja...@confluent.io>
    
    Reviewers: Gwen Shapira
    
    Closes #6907 from hachikuji/lower-throughput-for-upgrade-test
---
 tests/kafkatest/tests/core/upgrade_test.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/kafkatest/tests/core/upgrade_test.py b/tests/kafkatest/tests/core/upgrade_test.py
index c9236ee..bfbb5b3 100644
--- a/tests/kafkatest/tests/core/upgrade_test.py
+++ b/tests/kafkatest/tests/core/upgrade_test.py
@@ -36,7 +36,7 @@ class TestUpgrade(ProduceConsumeValidateTest):
         self.zk.start()
 
         # Producer and consumer
-        self.producer_throughput = 10000
+        self.producer_throughput = 1000
         self.num_producers = 1
         self.num_consumers = 1
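
A rough sketch, assuming a simple sleep-based limiter, of what a records-per-second cap like `producer_throughput` amounts to; `SimpleThrottle` is an illustrative name and not the kafkatest implementation:

```
import time

class SimpleThrottle(object):
    # Caps the send rate at roughly `records_per_sec` by sleeping between
    # records. Lowering the target from 10000 to 1000 gives the 0.9 producer
    # headroom to deliver batches before they expire in the accumulator.
    def __init__(self, records_per_sec):
        self.interval = 1.0 / records_per_sec
        self.next_send = time.time()

    def wait(self):
        now = time.time()
        if now < self.next_send:
            time.sleep(self.next_send - now)
        self.next_send = max(self.next_send, now) + self.interval

throttle = SimpleThrottle(records_per_sec=1000)
# before each send: throttle.wait(); producer.send(record)
```

The one-line change above applies the same idea at the test level: with a lower target throughput the producer keeps up, so no backlog of expired batches builds up and the console consumer keeps receiving data within its timeout.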