Posted to dev@couchdb.apache.org by Joan Touzet <jo...@atypical.net> on 2019/05/02 20:41:33 UTC

Sad state of Jenkins CI - please help fix our eunit tests!

Hi everyone,

Lately, our Jenkins CI runs on master (after merges) have been failing a
lot:

    https://s.apache.org/yuwY

Just in the last run (#537), we have failures in eunit tests for
couch_mrview, mem3 and ddoc_cache that need active investigation. [1]

Arguably, no one is actively monitoring this and fixing the tests
because Jenkins does not (yet) gate commits from landing on master.

This will change in the not-too-distant future. Travis CI has been
getting slower and slower of late, and with the ownership/leadership
change at Travis (the company) there's some trepidation in the community
at large about its long-term survivability as well.

IBM has graciously committed to a targeted hardware donation for build
machines for our CI needs, to help us get runs done faster and in a
controlled environment. I'll be working with them once that machine
arrives to set up the new CI environment and ensure it does what we all
expect. If anyone has any input on what that should look like, do reply
to this email and let me know.

Fixing Jenkins will also fix our broken snapshot package builds, which
very soon *will include ARM64 support* that the community has long been
asking for. Until we have regular greens on the board for ARM64, I'm not
willing to greenlight public packages or a Docker container for this
platform. (Same goes for other platforms.)


In short: *PLEASE HELP FIX THE FAILING TESTS*. If you want a bug filed per
failing test, I can do that; just let me know.

-Joan "all green all the time?" Touzet


[1]: The failure in 537 on CentOS 7 comes from my recent image rebuild,
and CentOS's/EPEL's very recent decision to drop the python3 alias in
favour of version-specific ones (python3.4, python3.6). I'll add a
workaround for this via a /usr/local symlink in the image today.
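
For illustration only, here is a minimal sketch of the kind of symlink
workaround footnote [1] describes. It is not the actual change made to
the CI image: the function name link_python3, the candidate order and
the choice of Python for the sketch are all assumptions.

```python
import os
import shutil
from pathlib import Path
from typing import Optional, Sequence


def link_python3(link_dir: str,
                 candidates: Sequence[str] = ("python3.6", "python3.4"),
                 search_path: Optional[str] = None) -> Optional[Path]:
    """Create link_dir/python3 pointing at the first candidate found.

    Returns the link path, or None if no candidate interpreter exists.
    An existing python3 entry in link_dir is left untouched.
    """
    link = Path(link_dir) / "python3"
    if link.exists() or link.is_symlink():
        return link
    for name in candidates:
        # shutil.which searches search_path (or $PATH when it is None)
        target = shutil.which(name, path=search_path)
        if target:
            link.parent.mkdir(parents=True, exist_ok=True)
            os.symlink(target, link)
            return link
    return None
```

In an image build this might be called as link_python3("/usr/local/bin"),
which needs write access to that directory.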


Re: Sad state of Jenkins CI - please help fix our eunit tests!

Posted by Joan Touzet <wo...@apache.org>.
Hi Garren,

Spoke too soon - we have a single failure in the elixir test suite you 
can help with. See below:

https://builds.apache.org/blue/organizations/jenkins/CouchDB/detail/jenkins-arm-anywhere/8/pipeline/60

[2019-05-03T22:49:38.382Z] ReduceTest
[2019-05-03T22:49:38.382Z]   * test More complex array key view row testing (185.5ms)
[2019-05-03T22:49:38.382Z]
[2019-05-03T22:49:38.382Z]   1) test More complex array key view row testing (ReduceTest)
[2019-05-03T22:49:38.382Z]      test/reduce_test.exs:70
[2019-05-03T22:49:38.382Z]      Assertion with == failed
[2019-05-03T22:49:38.382Z]      code:  assert Couch.get("/#{db_name}").body()["doc_count"] == total_docs
[2019-05-03T22:49:38.382Z]      left:  11
[2019-05-03T22:49:38.382Z]      right: 12
[2019-05-03T22:49:38.382Z]      stacktrace:
[2019-05-03T22:49:38.382Z]        test/reduce_test.exs:99: anonymous fn/4 in ReduceTest."test More complex array key view row testing"/1
[2019-05-03T22:49:38.382Z]        (elixir) lib/enum.ex:2941: Enum.reduce_range_inc/4
[2019-05-03T22:49:38.382Z]        test/reduce_test.exs:80: anonymous fn/4 in ReduceTest."test More complex array key view row testing"/1
[2019-05-03T22:49:38.382Z]        (elixir) lib/enum.ex:2941: Enum.reduce_range_inc/4
[2019-05-03T22:49:38.383Z]        test/reduce_test.exs:79: (test)
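
The trace shows the database reporting 11 documents where the test
expected 12; the thread does not establish why. If the cause turns out
to be timing (an assumption, not a diagnosis), one common way to make
such an assertion less brittle is to poll for the expected value before
asserting. The sketch below shows that idea in Python rather than in
the Elixir of the test suite; wait_until_equal is a hypothetical helper,
not part of the CouchDB test code.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until_equal(fetch: Callable[[], T], expected: T,
                     timeout: float = 5.0, interval: float = 0.1) -> T:
    """Call fetch() until it returns expected or timeout seconds elapse.

    Returns the last value fetched, so the caller can still assert on it
    and get a meaningful failure message.
    """
    deadline = time.monotonic() + timeout
    value = fetch()
    while value != expected and time.monotonic() < deadline:
        time.sleep(interval)
        value = fetch()
    return value
```

A test would then assert on the returned value, so a count that never
converges still fails and the last observed number shows up in the
failure.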


On 2019-05-03 2:38 p.m., Joan Touzet wrote:
> Oops, my replies went to each of you personally.
> 
> Thanks to both Garren and Peng Hui for their offers!
> 
> Jenkins fails on EUnit, which means it doesn't get to the Elixir tests, 
> so we don't know if they're failing. The EUnit tests need fixing with 
> priority.
> 
> -Joan
> 
> On 2019-05-03 5:53 a.m., Peng Hui Jiang wrote:
>> Hi Joan,
>>
>> Me too. I can work on some of them this coming Sunday. One of them, in
>> couch_mrview, is due to a timeout.
>>
>>     couch_mrview_purge_docs_fabric_tests:106: test_purge_hook_before_compaction...*timed out*
>>
>>
>> Best regards,
>> Peng Hui
>>
>> From: Garren Smith <ga...@apache.org>
>> To: dev@couchdb.apache.org
>> Date: 03/05/2019 04:32 PM
>> Subject: Re: Sad state of Jenkins CI - please help fix our eunit tests!
>>
>> ------------------------------------------------------------------------
>>
>>
>>
>> Hi Joan,
>>
>> I will be able to help later next week. If you could let me know of any
>> failing elixir tests I can start there.
>>
>> Cheers
>> Garren
>>

Re: Sad state of Jenkins CI - please help fix our eunit tests!

Posted by Joan Touzet <wo...@apache.org>.
Oops, my replies went to each of you personally.

Thanks to both Garren and Peng Hui for their offers!

Jenkins fails on EUnit, which means it doesn't get to the Elixir tests, 
so we don't know if they're failing. The EUnit tests need fixing with 
priority.

-Joan

On 2019-05-03 5:53 a.m., Peng Hui Jiang wrote:
> Hi Joan,
> 
> Me too. I can work on some of them this coming Sunday. One of them, in
> couch_mrview, is due to a timeout.
> 
>     couch_mrview_purge_docs_fabric_tests:106: test_purge_hook_before_compaction...*timed out*
> 
> 
> Best regards,
> Peng Hui
> 
> From: Garren Smith <ga...@apache.org>
> To: dev@couchdb.apache.org
> Date: 03/05/2019 04:32 PM
> Subject: Re: Sad state of Jenkins CI - please help fix our eunit tests!
> 
> ------------------------------------------------------------------------
> 
> 
> 
> Hi Joan,
> 
> I will be able to help later next week. If you could let me know of any
> failing elixir tests I can start there.
> 
> Cheers
> Garren
> 

Re: Sad state of Jenkins CI - please help fix our eunit tests!

Posted by Peng Hui Jiang <ji...@cn.ibm.com>.
Hi Joan,

Me too. I can work on some of them this coming Sunday. One of them, in
couch_mrview, is due to a timeout.

    couch_mrview_purge_docs_fabric_tests:106: test_purge_hook_before_compaction...*timed out*


Best regards,
Peng Hui



From:	Garren Smith <ga...@apache.org>
To:	dev@couchdb.apache.org
Date:	03/05/2019 04:32 PM
Subject:	Re: Sad state of Jenkins CI - please help fix our eunit tests!



Hi Joan,

I will be able to help later next week. If you could let me know of any
failing elixir tests I can start there.

Cheers
Garren



Re: Sad state of Jenkins CI - please help fix our eunit tests!

Posted by Garren Smith <ga...@apache.org>.
Hi Joan,

I will be able to help later next week. If you could let me know of any
failing elixir tests I can start there.

Cheers
Garren


Re: Sad state of Jenkins CI - please help fix our eunit tests!

Posted by Joan Touzet <wo...@apache.org>.
Bumping this thread because:

1. Travis CI is becoming increasingly unusable and is blocking valid
   PRs from merging.
2. Removing Travis and not putting any gate on merging PRs is, to me,
   an unacceptable medium-to-long-term solution.
3. We're very close to having our own hardware to run Jenkins CI runs
   instead of Travis.
4. New hardware alone won't fix the problem of PRs being blocked, because
   these test cases are still failing >80% of the time in Jenkins as
   well as Travis.
5. It'll get worse when we enable cross-platform builds; see my other
   thread.

Fix the tests, or everyone loses.

-Joan "I asked nicely, now I'm telling you" Touzet





Re: Sad state of Jenkins CI - please help fix our eunit tests!

Posted by Joan Touzet <wo...@apache.org>.
Thanks for the offer, Serge, but unless you're an IBM employee, you
won't be allowed to help!

Because IBM is providing the hardware, they are also taking
responsibility for provisioning the Jenkins workers on that box. This
was a condition of them providing the machine: that they would retain
full control outside of Jenkins, and wouldn't be providing credentials
to others. They already have their own automated provisioning
infrastructure (which, coincidentally, is Chef-based) and team to manage
it. IBMers, speak up if I'm wrong here.

Our CouchDB build process for Jenkins runs in Docker containers as
defined over here:

    https://github.com/apache/couchdb-ci

and is driven by the top-level Jenkinsfile in the CouchDB main repo.
This is how we avoid having to manage CouchDB dependencies on the
Jenkins hardware, and it makes the job of provisioning Jenkins workers a
**LOT** easier. (They just need to install Docker, Java, Jenkins, qemu,
and a couple of other support packages, plus deploy the Jenkins secret
that allows each worker to register with the ASF's build master. Simples.)
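
To illustrate why this keeps the workers simple, here is a rough sketch
of a containerized build step. It is not taken from the couchdb-ci repo
or from the Jenkins configuration: the image name, the mount point and
the build command are all hypothetical, and the real pipeline may look
quite different.

```python
import shlex
from typing import List


def docker_build_command(image: str, workspace: str, build_cmd: str) -> List[str]:
    """Return argv for running build_cmd inside image with workspace mounted."""
    return [
        "docker", "run", "--rm",
        "-v", f"{workspace}:/usr/src/couchdb",   # hypothetical mount point
        "-w", "/usr/src/couchdb",
        image,
        "/bin/sh", "-c", build_cmd,
    ]


if __name__ == "__main__":
    argv = docker_build_command(
        image="example/couchdb-build:latest",    # hypothetical image name
        workspace="/home/jenkins/workspace/couchdb",
        build_cmd="./configure && make check",   # assumed build command
    )
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(argv))
```

All build dependencies live in the image, so the worker itself only
needs a working Docker installation to run the step.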

If you have suggestions for improvement on the CouchDB CI repo, or want
to help with expanding it to support other configurations, pull requests
are always welcome. :D

Cheers,
Joan "moar CI" Touzet

On 2019-05-09 15:04, salsa-dev@tut.by wrote:
> I can help set up automated provisioning of the system using Chef.
> 
> Serge
> 


Re: Sad state of Jenkins CI - please help fix our eunit tests!

Posted by sa...@tut.by.
I can help set up automated provisioning of the system using Chef.

Serge
