Posted to dev@flume.apache.org by Tristan Stevens <tr...@apache.org> on 2022/01/24 07:18:27 UTC

Flume failing travis builds

Hi all,
It seems that for some reason the Travis builds are failing again. One of them has been failing since the Log4j and SLF4J bump (odd) and the other since the Kafka upgrade.

Anybody got some cycles to investigate whether these are just flaky tests and/or whether there’s something more sinister in there?

Thanks
Tristan


Re: Flume failing travis builds

Posted by Ralph Goers <ra...@dslextreme.com>.
I will take a look at them since I am actively working on Flume. However, I can say that I have low confidence in those builds. Log4j has been having problems with its builds. Frequently it seems the problem is that a module builds successfully but the overall build fails because the tooling can’t read the files produced by Surefire.

Ralph

> On Jan 24, 2022, at 12:18 AM, Tristan Stevens <tr...@apache.org> wrote:
> 
> Hi all,
> It seems that for some reason the Travis builds are failing again. One of them has been failing since the Log4j and SLF4J bump (odd) and the other since the Kafka upgrade.
> 
> Anybody got some cycles to investigate whether these are just flaky tests and/or whether there’s something more sinister in there?
> 
> Thanks
> Tristan
> 


Re: Flume failing travis builds

Posted by Ralph Goers <ra...@dslextreme.com>.
I moved the CI build to GitHub Actions and was able to get the build for both macOS and Linux to complete successfully.

Now I will take a look at what other dependencies need upgrading.

Ralph

> On Jan 27, 2022, at 10:24 PM, Ralph Goers <ra...@dslextreme.com> wrote:
> 
> I am going to admit I am becoming very annoyed with these Travis builds. 
> 
> For one, I have looked at the build history and as far as I can tell none of them have 
> ever worked. Several of them have check marks on them but when you look at the job 
> log you will see they failed.
> 
> Next, it appears that the build is running using the command 
> 
> ./mvnw test --quiet --fail-fast --threads 2.0C
> 
> When I run that locally (without the --quiet) the build also fails, but differently than how 
> Travis does.  I see the output below. You will notice that stuff isn’t running in the order listed by the reactor. 
> I suspect that this may be caused by running the build in parallel. In fact, when the command started I saw
> 
> [WARNING] The following plugins are not marked @threadSafe in Flume NG SDK:
> [INFO] --- maven-remote-resources-plugin:1.7.0:process (process-resource-bundles) @ flume-ng-clients ---
> [WARNING] com.thoughtworks.paranamer:paranamer-maven-plugin:2.8
> [WARNING] Enable debug to see more precisely which goals are not marked @threadSafe.
> [WARNING] *****************************************************************
> 
> I then reran it locally without the --threads option and it completed successfully so I am starting to think 
> some of the weirdness is due to running the build in parallel. I have committed a change to remove that 
> option and enabled debug logging to the Travis log so I can have more information if/when it fails.
> 
> 
> [ERROR] Failures: 
> [ERROR]   TestKafkaSink.testStaticTopic:214->checkMessageArrived:195 No message matches static-topic-test
> [INFO] 
> [ERROR] Tests run: 18, Failures: 1, Errors: 0, Skipped: 0
> [INFO] 
> [INFO] ------------------------------------------------------------------------
> [INFO] Reactor Summary:
> [INFO] 
> [INFO] Build Support ...................................... SUCCESS [  0.555 s]
> [INFO] Apache Flume 1.10.0-SNAPSHOT ....................... SUCCESS [  0.751 s]
> [INFO] Flume NG SDK ....................................... SUCCESS [07:24 min]
> [INFO] Flume NG Hadoop Credential Store Config Filter ..... SUCCESS [  0.058 s]
> [INFO] Flume NG Config Filters API ........................ SUCCESS [  0.302 s]
> [INFO] Flume NG Configuration ............................. SUCCESS [  5.684 s]
> [INFO] Flume Auth ......................................... SUCCESS [  6.836 s]
> [INFO] Flume NG Core ...................................... SUCCESS [13:39 min]
> [INFO] Flume NG Sinks ..................................... SUCCESS [  0.036 s]
> [INFO] Flume NG HDFS Sink ................................. SKIPPED
> [INFO] Flume NG IRC Sink .................................. SUCCESS [ 12.079 s]
> [INFO] Flume NG Channels .................................. SUCCESS [  0.060 s]
> [INFO] Flume NG JDBC channel .............................. SUCCESS [ 53.409 s]
> [INFO] Flume NG file-based channel ........................ SKIPPED
> [INFO] Flume NG Spillable Memory channel .................. SKIPPED
> [INFO] Flume NG Node ...................................... SKIPPED
> [INFO] Flume NG Embedded Agent ............................ SKIPPED
> [INFO] Flume NG HBase Sink ................................ SKIPPED
> [INFO] Flume NG HBase2 Sink ............................... SUCCESS [01:48 min]
> [INFO] Flume NG ElasticSearch Sink ........................ SUCCESS [01:28 min]
> [INFO] Flume NG Morphline Solr Sink ....................... SUCCESS [ 43.347 s]
> [INFO] Flume Shared Utils ................................. SUCCESS [  0.056 s]
> [INFO] Flume Shared Kafka ................................. SUCCESS [  1.046 s]
> [INFO] Flume Shared Kafka Test Utils ...................... SUCCESS [ 11.672 s]
> [INFO] Flume Kafka Sink ................................... FAILURE [02:11 min]
> [INFO] Flume HTTP/S Sink .................................. SUCCESS [ 31.688 s]
> [INFO] Flume NG Hive Sink ................................. SUCCESS [01:13 min]
> [INFO] Flume Sources ...................................... SUCCESS [  0.035 s]
> [INFO] Flume Scribe Source ................................ SUCCESS [ 18.953 s]
> [INFO] Flume JMS Source ................................... SUCCESS [01:06 min]
> [INFO] Flume Twitter Source ............................... SUCCESS [ 11.377 s]
> [INFO] Flume Kafka Source ................................. SKIPPED
> [INFO] Flume Taildir Source ............................... SUCCESS [ 29.437 s]
> [INFO] flume-kafka-channel ................................ SKIPPED
> [INFO] Flume legacy Sources ............................... SUCCESS [  0.061 s]
> [INFO] Flume legacy Avro source ........................... SUCCESS [ 18.226 s]
> [INFO] Flume legacy Thrift Source ......................... SUCCESS [ 13.142 s]
> [INFO] Flume NG Environment Variable Config Filter ........ SUCCESS [  1.492 s]
> [INFO] flume-ng-hadoop-credential-store-config-filter ..... SUCCESS [  3.651 s]
> [INFO] Flume NG External Process Config Filter ............ SUCCESS [  1.507 s]
> [INFO] Flume NG Clients ................................... SUCCESS [  0.058 s]
> [INFO] Flume NG Log4j Appender ............................ SKIPPED
> [INFO] Flume NG Tools ..................................... SKIPPED
> [INFO] Flume NG distribution .............................. SKIPPED
> [INFO] Flume NG Integration Tests 1.10.0-SNAPSHOT ......... SUCCESS [ 12.521 s]
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD FAILURE
> [INFO] ------------------------------------------------------------------------
> [INFO] Total time: 23:36 min (Wall Clock)
> [INFO] Finished at: 2022-01-27T15:55:00-07:00
> 
> 
> Ralph
> 


Re: Flume failing travis builds

Posted by Ralph Goers <ra...@dslextreme.com>.
I am going to admit I am becoming very annoyed with these Travis builds. 

For one, I have looked at the build history and as far as I can tell none of them have 
ever worked. Several of them have check marks on them but when you look at the job 
log you will see they failed.

Next, it appears that the build is running using the command 

./mvnw test --quiet --fail-fast --threads 2.0C

When I run that locally (without the --quiet) the build also fails, but differently than how 
Travis does.  I see the output below. You will notice that stuff isn’t running in the order listed by the reactor. 
I suspect that this may be caused by running the build in parallel. In fact, when the command started I saw

[WARNING] The following plugins are not marked @threadSafe in Flume NG SDK:
[INFO] --- maven-remote-resources-plugin:1.7.0:process (process-resource-bundles) @ flume-ng-clients ---
[WARNING] com.thoughtworks.paranamer:paranamer-maven-plugin:2.8
[WARNING] Enable debug to see more precisely which goals are not marked @threadSafe.
[WARNING] *****************************************************************

I then reran it locally without the --threads option and it completed successfully so I am starting to think 
some of the weirdness is due to running the build in parallel. I have committed a change to remove that 
option and enabled debug logging to the Travis log so I can have more information if/when it fails.


[ERROR] Failures: 
[ERROR]   TestKafkaSink.testStaticTopic:214->checkMessageArrived:195 No message matches static-topic-test
[INFO] 
[ERROR] Tests run: 18, Failures: 1, Errors: 0, Skipped: 0
[INFO] 
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] Build Support ...................................... SUCCESS [  0.555 s]
[INFO] Apache Flume 1.10.0-SNAPSHOT ....................... SUCCESS [  0.751 s]
[INFO] Flume NG SDK ....................................... SUCCESS [07:24 min]
[INFO] Flume NG Hadoop Credential Store Config Filter ..... SUCCESS [  0.058 s]
[INFO] Flume NG Config Filters API ........................ SUCCESS [  0.302 s]
[INFO] Flume NG Configuration ............................. SUCCESS [  5.684 s]
[INFO] Flume Auth ......................................... SUCCESS [  6.836 s]
[INFO] Flume NG Core ...................................... SUCCESS [13:39 min]
[INFO] Flume NG Sinks ..................................... SUCCESS [  0.036 s]
[INFO] Flume NG HDFS Sink ................................. SKIPPED
[INFO] Flume NG IRC Sink .................................. SUCCESS [ 12.079 s]
[INFO] Flume NG Channels .................................. SUCCESS [  0.060 s]
[INFO] Flume NG JDBC channel .............................. SUCCESS [ 53.409 s]
[INFO] Flume NG file-based channel ........................ SKIPPED
[INFO] Flume NG Spillable Memory channel .................. SKIPPED
[INFO] Flume NG Node ...................................... SKIPPED
[INFO] Flume NG Embedded Agent ............................ SKIPPED
[INFO] Flume NG HBase Sink ................................ SKIPPED
[INFO] Flume NG HBase2 Sink ............................... SUCCESS [01:48 min]
[INFO] Flume NG ElasticSearch Sink ........................ SUCCESS [01:28 min]
[INFO] Flume NG Morphline Solr Sink ....................... SUCCESS [ 43.347 s]
[INFO] Flume Shared Utils ................................. SUCCESS [  0.056 s]
[INFO] Flume Shared Kafka ................................. SUCCESS [  1.046 s]
[INFO] Flume Shared Kafka Test Utils ...................... SUCCESS [ 11.672 s]
[INFO] Flume Kafka Sink ................................... FAILURE [02:11 min]
[INFO] Flume HTTP/S Sink .................................. SUCCESS [ 31.688 s]
[INFO] Flume NG Hive Sink ................................. SUCCESS [01:13 min]
[INFO] Flume Sources ...................................... SUCCESS [  0.035 s]
[INFO] Flume Scribe Source ................................ SUCCESS [ 18.953 s]
[INFO] Flume JMS Source ................................... SUCCESS [01:06 min]
[INFO] Flume Twitter Source ............................... SUCCESS [ 11.377 s]
[INFO] Flume Kafka Source ................................. SKIPPED
[INFO] Flume Taildir Source ............................... SUCCESS [ 29.437 s]
[INFO] flume-kafka-channel ................................ SKIPPED
[INFO] Flume legacy Sources ............................... SUCCESS [  0.061 s]
[INFO] Flume legacy Avro source ........................... SUCCESS [ 18.226 s]
[INFO] Flume legacy Thrift Source ......................... SUCCESS [ 13.142 s]
[INFO] Flume NG Environment Variable Config Filter ........ SUCCESS [  1.492 s]
[INFO] flume-ng-hadoop-credential-store-config-filter ..... SUCCESS [  3.651 s]
[INFO] Flume NG External Process Config Filter ............ SUCCESS [  1.507 s]
[INFO] Flume NG Clients ................................... SUCCESS [  0.058 s]
[INFO] Flume NG Log4j Appender ............................ SKIPPED
[INFO] Flume NG Tools ..................................... SKIPPED
[INFO] Flume NG distribution .............................. SKIPPED
[INFO] Flume NG Integration Tests 1.10.0-SNAPSHOT ......... SUCCESS [ 12.521 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 23:36 min (Wall Clock)
[INFO] Finished at: 2022-01-27T15:55:00-07:00


Ralph

> On Jan 26, 2022, at 5:04 AM, Apache <ra...@dslextreme.com> wrote:
> 
> I just noticed that the Travis builds do not rebuild the whole project. That is definitely the reason for one of the build failures and probably both. As it stands these builds are worthless.
> 
> Ralph
> 


Re: Flume failing travis builds

Posted by Apache <ra...@dslextreme.com>.
I just noticed that the Travis builds do not rebuild the whole project. That is definitely the reason for one of the build failures and probably both. As it stands these builds are worthless.

Ralph

> On Jan 26, 2022, at 1:24 AM, Ralph Goers <ra...@dslextreme.com> wrote:
> 
> The last change to ExecSource was in June 2018. 
> 
> I am not sure that what Travis is picking up is valid.  The problem I am having 
> with TestKafkaSink is that I have modified the validation method such that every 
> assertion error should generate a custom message. Yet none do. And, in fact, 
> in the code path it should be following it should never perform an assertEquals 
> where expected and actual should have a value of 10.
> 
> The second set of test failures are all because the test expects the local hostname 
> to be either “localhost” or “127.0.0.1”. Instead, it is getting a value of ip6-localhost. 
> This is despite my having configured the surefire plugin with 
> java.net.preferIPv4Stack=true.
> 
> There are two more errors in TestReliableSpoolingEventReader. For this it looks 
> like events somehow occur in a different sequence on the Travis server than they 
> do on my Mac and my Linux VM. Note that this is also a class that was not touched.
> 
> Ralph
> 


Re: Flume failing travis builds

Posted by Ralph Goers <ra...@dslextreme.com>.
The last change to ExecSource was in June 2018. 

I am not sure that what Travis is picking up is valid.  The problem I am having 
with TestKafkaSink is that I have modified the validation method such that every 
assertion error should generate a custom message. Yet none do. And, in fact, 
in the code path it should be following it should never perform an assertEquals 
where expected and actual should have a value of 10.

The second set of test failures are all because the test expects the local hostname 
to be either “localhost” or “127.0.0.1”. Instead, it is getting a value of ip6-localhost. 
This is despite my having configured the surefire plugin with 
java.net.preferIPv4Stack=true.
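
One way to check whether that setting actually reaches the JVM running the tests is to log it from inside a test. The following is only a sketch (the class name Ipv4StackProbe is made up for illustration and is not part of Flume); it reports the system property as the running JVM sees it, together with the loopback address the JDK returns:

```java
import java.net.InetAddress;

// Hypothetical diagnostic for illustration; not part of Flume.
final class Ipv4StackProbe {

    // Describes the property value visible to this JVM and its loopback address.
    static String describe() {
        // A null value here means the property was not set in this JVM.
        String flag = System.getProperty("java.net.preferIPv4Stack");
        InetAddress loopback = InetAddress.getLoopbackAddress();
        return "java.net.preferIPv4Stack=" + flag
                + ", loopback=" + loopback.getHostAddress();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If the value printed from within a test is null, the setting did not reach the test JVM.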

There are two more errors in TestReliableSpoolingEventReader. For this it looks 
like events somehow occur in a different sequence on the Travis server than they 
do on my Mac and my Linux VM. Note that this is also a class that was not touched.

Ralph

> On Jan 25, 2022, at 3:03 PM, Tristan Stevens <tr...@apache.org> wrote:
> 
> Thanks Ralph. Seems the unit tests are picking up valid problems, which is
> reassuring. Curious about ExecSource, although I've got a feeling that it did
> change since the last release?
> 
> Tristan
> 
> 
> 
> 
> 


Re: Flume failing travis builds

Posted by Tristan Stevens <tr...@apache.org>.
Thanks Ralph. Seems the unit tests are picking up valid problems, which is
reassuring. Curious about ExecSource, although I've got a feeling that it did
change since the last release?

Tristan





On Tue, 25 Jan 2022, 19:03 Ralph Goers, <ra...@dslextreme.com> wrote:

> There seem to be two builds running and both fail but fail in different
> places.
>
> The first build seems to be failing in a way it shouldn’t. The test is for
> not specifying any Kafka partitions.
> The behavior of how Kafka handles this changed in version 2.4 so it should
> only be checking to see if it
> received all the events, but it appears it is somehow in the logic to
> check that all the partitions have an
> equal number of events. I’ve added more info into the assert message to
> help diagnose this.
>
> The second build is failing in changes I just made to upgrade Netty &
> Avro. It appears to be failing
> checking the local host name. I will have to add some info to the error to
> determine what it is getting for a
> hostname.
>
> I then ran the build in an Ubuntu VM on my MacBook and it got an error in
> TestExecSource (which hasn’t
> been changed). It seems it is calling process.waitFor() and getting a
> returned value of 1. I changed the
> test to call waitFor before calling destroy and it passed. It then failed
> in TestFileChannelRestart giving me
> IOExceptions saying the checkpoint hadn’t completed and the checkpoint
> interval should be increased.
> I added logic to retry in this situation but there is a unit test that
> tries to force that error so I had to have
> it bypass the fix in that case.
>
> I committed those changes and will look at the results of the next Travis
> build to see what additional info
> it can provide.
>
> Ralph
>
>

Re: Flume failing travis builds

Posted by Ralph Goers <ra...@dslextreme.com>.
There seem to be two builds running and both fail but fail in different places. 

The first build seems to be failing in a way it shouldn’t. The test is for not specifying any Kafka partitions. 
The behavior of how Kafka handles this changed in version 2.4 so it should only be checking to see if it 
received all the events, but it appears it is somehow in the logic to check that all the partitions have an 
equal number of events. I’ve added more info into the assert message to help diagnose this.
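
To make the intended check concrete, here is a minimal sketch, assuming the test has already collected per-partition event counts into a map. The class and method names are hypothetical and this is not the actual Flume test code. When no partition is specified, only the total is asserted, not an even spread, and the per-partition counts go into the failure message:

```java
import java.util.Map;

// Hypothetical helper for illustration; not the actual Flume Kafka test code.
final class PartitionCountCheck {

    // Passes when the events across all partitions add up to the expected
    // total, regardless of how they are distributed between partitions.
    static void assertAllEventsReceived(Map<Integer, Integer> eventsPerPartition,
                                        int expectedTotal) {
        int total = 0;
        for (int count : eventsPerPartition.values()) {
            total += count;
        }
        if (total != expectedTotal) {
            // Include the breakdown so a CI log shows what was actually seen.
            throw new AssertionError("Expected " + expectedTotal
                    + " events in total but received " + total
                    + "; per-partition counts: " + eventsPerPartition);
        }
    }

    public static void main(String[] args) {
        // An uneven spread is acceptable as long as nothing is lost.
        assertAllEventsReceived(Map.of(0, 50, 1, 0, 2, 0), 50);
        System.out.println("all events received");
    }
}
```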

The second build is failing in changes I just made to upgrade Netty & Avro. It appears to be failing 
checking the local host name. I will have to add some info to the error to determine what it is getting for a 
hostname.
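
As a rough sketch of that kind of diagnostic (the class and method names are hypothetical, and this is not the actual Flume test code), the check below puts the value it was given into the failure message, so the CI log shows what the machine reported. The two accepted values are the ones reported elsewhere in this thread:

```java
import java.util.Set;

// Hypothetical helper for illustration; not taken from the Flume test suite.
final class HostNameCheck {

    // The two values the tests are described in this thread as expecting.
    private static final Set<String> EXPECTED = Set.of("localhost", "127.0.0.1");

    // Fails with a message that names the unexpected value.
    static void assertLocalHostName(String actual) {
        if (!EXPECTED.contains(actual)) {
            throw new AssertionError("Unexpected local host name '" + actual
                    + "'; expected localhost or 127.0.0.1");
        }
    }

    public static void main(String[] args) {
        assertLocalHostName("localhost"); // passes silently
        try {
            assertLocalHostName("ip6-localhost");
        } catch (AssertionError e) {
            System.out.println(e.getMessage());
        }
    }
}
```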

I then ran the build in an Ubuntu VM on my MacBook and it got an error in TestExecSource (which hasn’t 
been changed). It seems it is calling process.waitFor() and getting a returned value of 1. I changed the 
test to call waitFor before calling destroy and it passed. It then failed in TestFileChannelRestart giving me 
IOExceptions saying the checkpoint hadn’t completed and the checkpoint interval should be increased. 
I added logic to retry in this situation but there is a unit test that tries to force that error so I had to have
it bypass the fix in that case.
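
The waitFor/destroy ordering mentioned above can be sketched roughly as follows. This is an illustration only, with a made-up class name, and not the actual TestExecSource change. It waits for the child process to exit by itself and only then calls destroy(), so the value returned is the command's own exit code:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Paths;

// Hypothetical illustration; not the actual TestExecSource code.
final class WaitBeforeDestroy {

    // Starts a command, waits for it to exit by itself, then calls destroy().
    static int runAndWait(String... command) {
        try {
            Process process = new ProcessBuilder(command)
                    .redirectErrorStream(true)
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD) // Java 9+
                    .start();
            try {
                // Waiting first means the result is the command's own exit code.
                return process.waitFor();
            } finally {
                // Safe afterwards: no action is taken if the process has exited.
                process.destroy();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        // Uses the JVM's own launcher so the example needs no other program.
        String javaBin = Paths.get(System.getProperty("java.home"), "bin", "java").toString();
        System.out.println("exit code: " + runAndWait(javaBin, "-version"));
    }
}
```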

I committed those changes and will look at the results of the next Travis build to see what additional info 
it can provide.

Ralph


> On Jan 24, 2022, at 12:18 AM, Tristan Stevens <tr...@apache.org> wrote:
> 
> Hi all,
> It seems that for some reason the Travis builds are failing again. One of them has been failing since the Log4j and SLF4J bump (odd) and the other since the Kafka upgrade.
> 
> Anybody got some cycles to investigate whether these are just flaky tests and/or whether there’s something more sinister in there?
> 
> Thanks
> Tristan
>