Posted to jira@kafka.apache.org by "Manikumar (JIRA)" <ji...@apache.org> on 2018/07/21 07:02:00 UTC
[jira] [Resolved] (KAFKA-1970) Several tests are not stable

     [ https://issues.apache.org/jira/browse/KAFKA-1970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manikumar resolved KAFKA-1970.
------------------------------
    Resolution: Auto Closed

Closing inactive issue

> Several tests are not stable
> ----------------------------
>
>                 Key: KAFKA-1970
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1970
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8.1
>         Environment: Several:
> - RHEL7.1/x86_64  <=== My reference
> - RHEL7.0/x86_64
> - Ubuntu /x86_64
> - RHEL7.1/PPC64LE
> - RHEL7.0/PPC64BE
> - OpenJDK 1.7
> - IBM JVM 1.7
>            Reporter: Tony Reix
>            Priority: Major
>
> I'm porting Kafka 0.8.1 to RHEL 7.1/PPC64LE.
> Since the tests looked unstable, I ran them on several environments to get a broader view.
> I'm using:
>  -  ./gradlew build -x signArchives
> or:
>  -  ./gradlew test -x signArchives
> Results seem to show:
>   - Tests are unstable everywhere (very few on your Ubuntu/x86_64 test env)
>   - IBM JVM shows some more issues than OpenJDK
>   - Sometimes, tests are not launched, for no apparent reason.
>         But not on my reference environment (RHEL7.1/x86_64/OpenJDK)
> - OpenJDK :                          Test run results:
>   - dorado-vm2 - RHEL7.0/x86_64 :
>                                 - 238 tests completed, 82 failed
>                                 - 238 tests completed, 94 failed
>                                 - BUILD SUCCESSFUL
>   - dorado-vm3 - Ubuntu /x86_64 :
>                                 - BUILD SUCCESSFUL
>   - soe01x     - RHEL7.1/x86_64 :
>                                 - 238 tests completed, 4 failed  x 2 times
>   - soe07-vm1  - RHEL7.1/PPC64LE:
>                                 - BUILD SUCCESSFUL
>                                 - 238 tests completed, 2 failed
>                                 - 238 tests completed, 3 failed
> - IBM JVM :                          Test run results:
>   - dorado-vm2 - RHEL7.0/x86_64 :
>                                 - 1 failed + Tests Blocked
>                                 - BUILD SUCCESSFUL
>   - soe01x     - RHEL7.1/x86_64 :
>                                 - 238 tests completed, 6 failed
>                                 - 238 tests completed, 4 failed  x 3 times
>                                 - 238 tests completed, 5 failed
>   - soe07-vm1  - RHEL7.1/PPC64LE:
>                                 - 238 tests completed, 1 failed
>                                 - BUILD SUCCESSFUL
>   - laurel6    - RHEL7.0/PPC64BE:
>                                 - 238 tests completed, 1 failed
>                                 - BUILD SUCCESSFUL
> =========================================================
> I think that these tests are unstable:
> kafka.server.LogRecoveryTest      > testHWCheckpointNoFailuresMultipleLogSegments
> kafka.server.LogRecoveryTest      > testHWCheckpointWithFailuresMultipleLogSegments
> kafka.admin.DeleteTopicTest       > testAutoCreateAfterDeleteTopic
> kafka.admin.DeleteTopicTest       > testPreferredReplicaElectionDuringDeleteTopic
> kafka.server.RequestPurgatoryTest > testRequestExpiry
> =========================================================
> These tests fail often (always on my reference environment, RHEL7.1/x86_64/OpenJDK, but not on Ubuntu):
> kafka.server.LogOffsetTest        > testEmptyLogsGetOffsets
> kafka.server.LogOffsetTest        > testGetOffsetsBeforeLatestTime
> kafka.server.LogOffsetTest        > testGetOffsetsBeforeEarliestTime
> kafka.server.LogOffsetTest        > testGetOffsetsBeforeNow 
> =========================================================
> As an example of random failures, on my reference environment (RHEL7.1/x86_64/OpenJDK), the test:
>   kafka.server.LogRecoveryTest > testHWCheckpointNoFailuresMultipleLogSegments
> failed 2 times out of 12.
> =========================================================
> On Ubuntu/x86_64/OpenJDK , out of 3 runs of :
>    ./gradlew test -x signArchives
> I've got:
>  - 3 Full success
>  - 1 launch that did NOT run the tests
> =========================================================
> Still on x86_64/OpenJDK , I'm surprised to always have 4 failures with RHEL 7.1 and none on Ubuntu.
> Is there some issue with RHEL 7.1 and/or Java?
> =========================================================
> On RHEL 7.1 / PPC64LE / IBM JVM, I see wide instability.
> I've run the test suite 12 times.
> Counting the "FAILED" tests in each result file with:
>  for i in 1 2 10 11 12 13 14 15 16 17 18 19; do N=$(grep -c FAILED gradlew.build.IBMJVM.res$i); echo "$i:$N"; done
> gave:
> 1:3
> 2:0
> 10:0
> 11:3
> 12:4
> 13:49
> 14:3
> 15:4
> 16:0
> 17:0
> 18:0
> 19:0
> Doing the same for "PASSED" tests, I've got:
> 1:372
> 2:238
> 10:0
> 11:372
> 12:236
> 13:191
> 14:237
> 15:236
> 16:238
> 17:0
> 18:0
> 19:0
> Showing:
>  - 4 launches did NOT run the tests
>  - 2 launches were SUCCESSFUL
>  - for the others, there were 3, 4 or 49 FAILED tests
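The classification above can be derived from the two counts per run. A minimal sketch, assuming each test outcome appears on its own line containing PASSED or FAILED, as the grep loop implies; the function name classify_run is made up for illustration:

```shell
#!/bin/sh
# classify_run FILE (hypothetical helper): print "not run", "successful",
# or "N failed", based on how many lines of FILE contain PASSED or FAILED.
classify_run() {
    if [ ! -r "$1" ]; then
        echo "missing"
        return 1
    fi
    # grep -c prints the number of matching lines (0 if none), but exits
    # non-zero when nothing matches, hence the "|| true".
    failed=$(grep -c FAILED "$1" || true)
    passed=$(grep -c PASSED "$1" || true)
    if [ "$failed" -eq 0 ] && [ "$passed" -eq 0 ]; then
        echo "not run"
    elif [ "$failed" -eq 0 ]; then
        echo "successful"
    else
        echo "$failed failed"
    fi
}

# Classify every result file given on the command line.
for f in "$@"; do
    echo "$f: $(classify_run "$f")"
done
```

Saved under any name and called with the gradlew.build.IBMJVM.res* files as arguments, it prints one line per run. A run with neither PASSED nor FAILED lines is reported as "not run", which is how the launches that skipped the tests show up in the counts.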
> =========================================================
> Conclusions: I think that it would be useful for the Kafka project:
> - to run tests with IBM JVM in addition to OpenJDK.
> - to run tests on a Linux distribution other than Ubuntu: RHEL.
> - to check (by running it many times) that the following test is stable in your standard test environment:
>    kafka.server.LogRecoveryTest > testHWCheckpointNoFailuresMultipleLogSegments
> On my side, there are other causes of instability in my specific environments (PPC64) that I have to study.
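
One way to check stability by running a test many times, as suggested in the report, is to repeat the same command and count the non-zero exits. A minimal sketch; the helper name repeat_run is hypothetical, and the command to repeat is supplied by the caller rather than assumed here:

```shell
#!/bin/sh
# repeat_run COUNT COMMAND [ARGS...] (hypothetical helper)
# Runs COMMAND COUNT times and prints "FAILURES/COUNT failed".
repeat_run() {
    count=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$count" ]; do
        # Discard the command's output; only its exit status is used.
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures/$count failed"
}
```

For example, `repeat_run 12 ./gradlew test -x signArchives` would rerun the full suite 12 times using the command from the report. Narrowing it to a single test depends on the build's own test-selection options.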


