Posted to jira@kafka.apache.org by "John Roesler (Jira)" <ji...@apache.org> on 2020/07/20 20:37:00 UTC

[jira] [Comment Edited] (KAFKA-10274) Transaction system test uses inconsistent timeouts

    [ https://issues.apache.org/jira/browse/KAFKA-10274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17161519#comment-17161519 ] 

John Roesler edited comment on KAFKA-10274 at 7/20/20, 8:36 PM:
----------------------------------------------------------------

Hi [~hachikuji],

I'm trying to get a green system test build for the 2.5.1 release, and this test seems to be failing quite a bit in the last few days.

I see that you already fixed the test back in May in https://issues.apache.org/jira/browse/KAFKA-9802 for 2.5.1, and that you theorized that https://issues.apache.org/jira/browse/KAFKA-10235 may have re-introduced the test failure.

It doesn't look like KAFKA-10235 was backported to 2.5. Maybe it should have been, but then again, your last comment makes me think that we still need your current fix on top of it.

What do you think we should do here? Backport KAFKA-10235 and then the PR for this ticket once it's merged?

Thanks,

-John

PS, the results I looked at:

http://confluent-kafka-2-5-system-test-results.s3-us-west-2.amazonaws.com/2020-07-18--001.1595065230--confluentinc--2.5--21e17cd14/report.html

http://confluent-kafka-2-5-system-test-results.s3-us-west-2.amazonaws.com/2020-07-19--001.1595151548--confluentinc--2.5--21e17cd14/report.html

http://confluent-kafka-2-5-system-test-results.s3-us-west-2.amazonaws.com/2020-07-20--001.1595238538--confluentinc--2.5--21e17cd14/report.html

from https://jenkins.confluent.io/job/system-test-kafka/job/2.5/


was (Author: vvcephei):
Hi [~hachikuji],

I'm trying to get a green system test build for the 2.5.1 release, and this test seems to be failing quite a bit in the last few days.

I see that you already fixed the test back in May in https://issues.apache.org/jira/browse/KAFKA-9802 for 2.5.1, and that you theorized that https://issues.apache.org/jira/browse/KAFKA-10235 may have re-introduced the test failure.

It doesn't look like KAFKA-10235 was backported to 2.5. Maybe it should have been, but then again, your last comment makes me think that we still need another fix on top of it.

What do you think we should do here?

Thanks,

-John

PS, the results I looked at:

http://confluent-kafka-2-5-system-test-results.s3-us-west-2.amazonaws.com/2020-07-18--001.1595065230--confluentinc--2.5--21e17cd14/report.html

http://confluent-kafka-2-5-system-test-results.s3-us-west-2.amazonaws.com/2020-07-19--001.1595151548--confluentinc--2.5--21e17cd14/report.html

http://confluent-kafka-2-5-system-test-results.s3-us-west-2.amazonaws.com/2020-07-20--001.1595238538--confluentinc--2.5--21e17cd14/report.html

from https://jenkins.confluent.io/job/system-test-kafka/job/2.5/

> Transaction system test uses inconsistent timeouts
> --------------------------------------------------
>
>                 Key: KAFKA-10274
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10274
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Jason Gustafson
>            Assignee: Jason Gustafson
>            Priority: Major
>
> We've seen some failures in the transaction system test with errors like the following:
> {code}
> copier-1 : Message copier didn't make enough progress in 30s. Current progress: 0
> {code}
> Looking at the consumer logs, we see the following messages repeating over and over:
> {code}
> [2020-07-14 06:50:21,466] DEBUG [Consumer clientId=consumer-transactions-test-consumer-group-1, groupId=transactions-test-consumer-group] Fetching committed offsets for partitions: [input-topic-1] (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2020-07-14 06:50:21,468] DEBUG [Consumer clientId=consumer-transactions-test-consumer-group-1, groupId=transactions-test-consumer-group] Failed to fetch offset for partition input-topic-1: There are unstable offsets that need to be cleared. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> {code}
> I think the problem is that the test implicitly depends on the transaction timeout, which is configured to 40s, even though the test expects progress within 30s.
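
The mismatch described above can be illustrated with a small sketch. It is not the actual system test code: the function names and the 20s margin are hypothetical, and only the 40s and 30s values come from the description. The idea, as far as the description suggests, is that a consumer blocked on unstable offsets may have to wait up to the transaction timeout, so a progress wait shorter than that timeout can expire first.

```python
# Values from the issue description; everything else here is a hypothetical sketch.
TRANSACTION_TIMEOUT_MS = 40_000   # transaction timeout configured by the test
PROGRESS_TIMEOUT_SEC = 30         # how long the test waits for copier progress


def progress_timeout_is_consistent(progress_timeout_sec, transaction_timeout_ms, margin_sec=0):
    """Return True if the progress wait is at least the transaction timeout plus a margin."""
    return progress_timeout_sec * 1000 >= transaction_timeout_ms + margin_sec * 1000


def derive_progress_timeout_sec(transaction_timeout_ms, margin_sec=20):
    """Derive the progress wait from the transaction timeout instead of hard-coding it.

    The timeout is rounded up to whole seconds; margin_sec is an arbitrary
    illustrative buffer, not a value taken from the Kafka test suite.
    """
    whole_seconds = -(-transaction_timeout_ms // 1000)  # ceiling division
    return whole_seconds + margin_sec


if __name__ == "__main__":
    # The configuration described in the issue is inconsistent: 30s < 40s.
    print(progress_timeout_is_consistent(PROGRESS_TIMEOUT_SEC, TRANSACTION_TIMEOUT_MS))
    # Deriving the wait from the transaction timeout keeps the two values in step.
    print(derive_progress_timeout_sec(TRANSACTION_TIMEOUT_MS))
```

Deriving one value from the other is only one possible way to keep the two timeouts consistent; the fix in the PR for this ticket may take a different approach.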


