Posted to dev@flume.apache.org by "Tony Reix (JIRA)" <ji...@apache.org> on 2015/03/03 15:19:04 UTC

[jira] [Commented] (FLUME-2625) There are several unstable tests within FLUME

    [ https://issues.apache.org/jira/browse/FLUME-2625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345103#comment-14345103 ] 

Tony Reix commented on FLUME-2625:
----------------------------------

Hi Hari, As I said in another JIRA, I have no specific Flume skills. I'm mainly involved in testing on PPC64, hence my warning to the community about this instability. For a non-expert in Flume, it is a pain to track down tests that fail on IBM JVM/PPC64 (much more often than on OpenJDK, or always, due to internal differences such as a faster JVM or a GC that works differently) but do not fail (or fail only about once in 30 or more runs) on OpenJDK/x86_64. Warning the Flume people was meant to help. However, I have no useful skills for fixing this, and I'm now involved in other testing work.

> There are several unstable tests within FLUME
> ---------------------------------------------
>
>                 Key: FLUME-2625
>                 URL: https://issues.apache.org/jira/browse/FLUME-2625
>             Project: Flume
>          Issue Type: Bug
>          Components: Test
>    Affects Versions: v1.5.0.1
>         Environment: RHEL 7.1 / x86_64 / Open JDK 1.7
>            Reporter: Tony Reix
>
> Hi,
> I'm working on porting FLUME to a RHEL 7.1 / PPC64LE / IBM JVM 1.7 environment.
> As an example, I've found that the test .source.TestSyslogUdpSource fails, but not always, only 7 times out of 10 tries. Testing on RHEL 7.1 / x86_64 / IBM JVM, I've also had random failures.
> Running the same .source.TestSyslogUdpSource test in RHEL 7.1 / x86_64 / Open JDK 1.7 environment, I've found that this test fails only once out of 30 tries: it is an "unstable" test.
> In order to find which test issues are specific to the PPC64 or IBM JVM environment, I've run all the FLUME tests 10 times in the RHEL 7.1 / x86_64 / Open JDK 1.7 environment, which I call my "reference" environment.
> Then, using a tool that compares all the results, I've found that there are 16 tests that are "unstable" in my "reference" (x86_64/OpenJDK) environment.
> By "unstable", I mean to say that the results vary, though the environment is exactly the same.
> These tests are:
> .api.TestLoadBalancingRpcClient
> .api.TestThriftRpcClient
> .channel.file.TestFileChannelRestart
> .channel.TestSpillableMemoryChannel
> .instrumentation.http.TestHTTPMetricsServer
> .sink.TestAvroSink
> .sink.TestThriftSink
> .source.avroLegacy.TestLegacyAvroSource
> .source.http.TestHTTPSource
> .source.TestAvroSource
> .source.TestExecSource
> .source.TestMultiportSyslogTCPSource
> .source.TestSyslogTcpSource
> .source.TestSyslogUdpSource
> .source.TestThriftSource
> .source.thriftLegacy.TestThriftLegacySource
> Regarding the ".source.TestSyslogUdpSource" test, my analysis is that the test code is not reliable, since the test checks that some data is correct without first checking that all the "messages" have arrived (sometimes a message has not arrived in time, and a reference is NULL).
> After I added "sleep(1000)" to the test with the IBM JVM, the test failed only 3 times out of 10.
> So, I think that several FLUME tests are coded in a way that is not 100% reliable. Or it could also be that some core code of FLUME is not 100% reliable.
> I mean to say that some code may have been written based on the specific behaviour of the OpenJDK Java Virtual Machine, which was used for testing. A change in the order in which threads are launched, or in the time needed to send messages in the JVM/OS, may lead to issues that are not correctly handled by the code (mainly test code, but maybe core code too). And it seems that, although it is perfectly correct, the IBM JVM does not behave the same way as OpenJDK.
> So, this is a pain. Mainly in my PPC64LE/IBMJVM environment.
> I think that these 16 tests must be analysed and improved.
> Also, running the tests with both OpenJDK AND the IBM JVM in your development and test/Jenkins environments would help to expose these random issues.
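
The analysis of .source.TestSyslogUdpSource in the description suggests that a fixed sleep only narrows the race: the test should wait until the expected number of messages has arrived, up to a deadline, before checking their content. Below is a minimal sketch of that idea in plain Java. It is not Flume code: the class AwaitMessages, the method awaitMessages, and the use of a BlockingQueue as a stand-in for the channel are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Flume: waits for messages instead of sleeping.
public class AwaitMessages {

    // Collects up to `expected` messages, giving up once `timeoutMillis` has elapsed.
    // Returns whatever arrived in time, so the caller can assert on the count first.
    public static <T> List<T> awaitMessages(BlockingQueue<T> queue, int expected,
                                            long timeoutMillis) throws InterruptedException {
        List<T> received = new ArrayList<T>();
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (received.size() < expected) {
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                break; // deadline passed
            }
            T message = queue.poll(remaining, TimeUnit.NANOSECONDS);
            if (message == null) {
                break; // nothing arrived before the deadline
            }
            received.add(message);
        }
        return received;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the channel that the source under test writes to.
        final BlockingQueue<String> channel = new LinkedBlockingQueue<String>();

        // Simulated sender that delivers messages with some delay.
        Thread sender = new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    for (int i = 0; i < 3; i++) {
                        Thread.sleep(50);
                        channel.put("msg-" + i);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        sender.start();

        List<String> got = awaitMessages(channel, 3, 5000);
        sender.join();

        // Check that everything arrived BEFORE looking at the content.
        if (got.size() != 3) {
            throw new AssertionError("expected 3 messages, got " + got.size());
        }
        System.out.println("received " + got);
    }
}
```

With this shape, a slow JVM or OS only makes the test take longer, up to the timeout, rather than producing a NULL reference when a message is late.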

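The description defines an "unstable" test as one whose result varies across runs in an identical environment. The tool used for the comparison is not named in the report, so the following is only a sketch of how such a comparison could work, assuming each run is available as a map from test name to pass/fail. The class UnstableTests, the method findUnstable, and the sample data in main are hypothetical and are not the reporter's results.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch, not the reporter's tool.
public class UnstableTests {

    // Returns the names of tests that both passed and failed across the given runs.
    public static SortedSet<String> findUnstable(List<Map<String, Boolean>> runs) {
        Map<String, Set<Boolean>> outcomes = new HashMap<String, Set<Boolean>>();
        for (Map<String, Boolean> run : runs) {
            for (Map.Entry<String, Boolean> entry : run.entrySet()) {
                Set<Boolean> seen = outcomes.get(entry.getKey());
                if (seen == null) {
                    seen = new HashSet<Boolean>();
                    outcomes.put(entry.getKey(), seen);
                }
                seen.add(entry.getValue());
            }
        }
        SortedSet<String> unstable = new TreeSet<String>();
        for (Map.Entry<String, Set<Boolean>> entry : outcomes.entrySet()) {
            if (entry.getValue().size() > 1) { // saw both true and false
                unstable.add(entry.getKey());
            }
        }
        return unstable;
    }

    // Builds one run with made-up results for two test names from the report.
    private static Map<String, Boolean> run(boolean udpPassed, boolean avroPassed) {
        Map<String, Boolean> results = new HashMap<String, Boolean>();
        results.put(".source.TestSyslogUdpSource", udpPassed);
        results.put(".sink.TestAvroSink", avroPassed);
        return results;
    }

    public static void main(String[] args) {
        // Made-up data for illustration only.
        List<Map<String, Boolean>> runs = new ArrayList<Map<String, Boolean>>();
        runs.add(run(true, true));
        runs.add(run(false, true));
        runs.add(run(true, true));
        System.out.println(findUnstable(runs));
    }
}
```

A test that fails in every run is not flagged by this check; it is consistently broken rather than unstable, which matches the definition given in the description.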


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)