Posted to dev@couchdb.apache.org by Joan Touzet <wo...@apache.org> on 2017/03/24 22:59:21 UTC

Re: (Updated-2) Stabilizing our automated builds - help needed!

Another update to my original email.

TL;DR: 3 more sporadic failures in the test suite have shown up, and the
build issue on CentOS 6 has been ironed out. Help still needed!

I decided to move to JIRA tickets for everything, but don't let this
fool you: these test cases need some tender loving care now, not
whenever you can get around to it. Please, if you can help, do!

Again, if we don't get these ironed out, our CI workflow will continue
to be less than useful. If builds fail, for instance, we can't
auto-generate packages for testing. And if manual intervention is
required to restart builds that fail for validation, the workflow is
slowed down - each build and test run takes 15-20 minutes, and in ASF
Jenkins we can't run all of our builds in parallel due to limited
resources.

My last resort is to simply disable unreliable test cases, or move them
out of the default build-and-test workflow. I really don't want to have
to do this...but it remains an option.

-Joan

# Open Issues

COUCHDB-3344: EUnit: compaction_daemon_tests timing out
COUCHDB-3343: JS: show_documents failure
COUCHDB-3342: JS: design_docs expected '"ok"', got '"ko"'
COUCHDB-3341: EUnit: config listener unknown failure
COUCHDB-3340: EUnit: syslog timeout
COUCHDB-3339: EUnit: couch_mrview_red_views_tests throw:error


# Recently Closed Issues

1. test/javascript/tests/reduce_builtin.js                        
    Error: {exit_status,139}

  Exit code 139 indicates a SIGSEGV (128 + signal 11), which may have
  been related to a libc bug on or around 2017-03-21. Going to ignore
  this one unless it pops up again.

2. Documentation build failure on CentOS 6

  I've tweaked the couchdb user in the Docker CentOS 6 image to see
  /usr/local/bin/python ahead of /usr/bin/python, and symlinked
  /usr/bin/python34 to /usr/local/bin/python. This allows the build to
  succeed, though 2 "Python 2-isms" that sneaked into our run scripts
  needed to be fixed first. You can see the small tweaks necessary here:

https://gitbox.apache.org/repos/asf?p=couchdb.git;a=commitdiff;h=5bfedc49b8b1c8a68d39e81a31b3f522d30292e7

3. npm-related build failure

  All Docker images have been upgraded to node 6 / npm 3, which has
  eliminated the problem.

Re: (Updated-7) Stabilizing our automated builds - help still needed!

Posted by Joan Touzet <wo...@apache.org>.
Hi everyone,

Now that the initial Debian/Ubuntu packaging is done, my attention turns
to our test suite full time. I won't automate building of packages until
we have a reliable test suite.
 
# TL;DR
* 3 new failures: 1 JS, 2 EUnit.
* Total issues open: *18*
* Recurrences of 3 previously reported bugs.
* Lack of help on these issues continues to impact our ability to
  automatically roll nightly builds/packages. :(

# New Issues
COUCHDB-3402: JS: dev/run timing out starting up nodes
  I am fairly certain this is because the ASF Jenkins workers are
  heavily loaded, and we need to be more generous with dev/run's
  timeouts. However, I'd like to know for sure, and we have issues
  with Travis runs as well where the answers are buried in logfiles
  we don't have access to after the run fails. So there will be some
  groundwork done here to start saving logs out for us to analyze after
  the fact for both Travis (using after_failure) and Jenkins (using
  the build script) to a common destination (possibly itself
  Couch-flavoured.)

COUCHDB-3401: EUnit: should_not_remember_docs_in_index_after_backup_restore 500 error
  We're failing an assert on an http status code that should be 200 but
  instead returned 500. Logs would be nice. One sighting of this in the
  wild so far.

COUCHDB-3396: EUnit: failed_to_start_child
  Multiple failures in chttpd_db_doc_size_tests across multiple platforms.

An unrelated issue, COUCHDB-3398, has been filed to add a new EUnit
test to the test suite.

# Continuing Open Issues (qty. 15)
COUCHDB-3384: EUnit: couch_replicator_compact_tests failure
COUCHDB-3383: EUnit: couchdb_file_compression_tests timed out
COUCHDB-3382: EUnit: couchdb_auth_tests assertEqual failure
COUCHDB-3360: Broken couch_mrview test modules
COUCHDB-3356: JS: jsonp test fail
COUCHDB-3354: JS: replicator_db_compact_rep_db failed
COUCHDB-3352: JS: couchjs SIGSEGVs
COUCHDB-3348: EUnit: global_changes_tests context setup failed
COUCHDB-3346: JS: reduce.js "JSON is not a function"
COUCHDB-3345: JS: stats.js silent failure
COUCHDB-3344: EUnit: compaction_daemon_tests timing out
COUCHDB-3343: JS: show_documents failure
COUCHDB-3342: JS: design_docs expected '"ok"', got '"ko"'
COUCHDB-3341: EUnit: config listener unknown failure
COUCHDB-3339: EUnit: couch_mrview_red_views_tests throw:error
 
# Full list of test case issues in JIRA
 
    https://issues.apache.org/jira/issues/?filter=12340503

-Joan

Re: (Updated-6) Stabilizing our automated builds - help still needed!

Posted by Joan Touzet <wo...@apache.org>.
Hi everyone,

Bit late on my updates to this, been a busy last 3 weeks.

# TL;DR
* 6 new failures: 2 JS, 4 EUnit.
* Recurrences of 4 previously reported bugs.
* Cloudant has offered to look at a few of the EUnit failures.
* Lack of help on these issues is now directly impacting our ability to
  automatically roll nightly builds/packages. :(

# New Issues
COUCHDB-3354: JS: replicator_db_compact_rep_db failed
  Not much to go on other than that the JS test returns 'fail'.

COUCHDB-3356: JS: jsonp test fail
  Similar to 3354, no additional data other than 'fail'.

COUCHDB-3360: Broken couch_mrview test modules
  This isn't so much a test suite failure as a set of badly
  written and integrated tests. The tests are named wrong; when
  named correctly, the tests fail. More investigation is required.

COUCHDB-3382: EUnit: couchdb_auth_tests assertEqual failure
  Big stacktrace here, pointing to a failed match. The test
  expected "rocko" but received an undefined value.

COUCHDB-3383: EUnit: couchdb_file_compression_tests timed out
  Simple as the subject says...no additional data. Bump timeout value?

COUCHDB-3384: EUnit: couch_replicator_compact_tests failure
  Assertion failure here, "Failed to pause source database writer."
  I continue to wonder if we need longer timeouts for underresourced
  Travis/Jenkins Docker test running instances.
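
If longer timeouts turn out to be the answer, one option is to poll for
readiness against a deadline that can be scaled up on slow CI workers.
A minimal Python sketch under that assumption; wait_until, the scale
parameter, and the simulated node are all hypothetical and not taken
from dev/run or the EUnit suites:

```python
import time

def wait_until(check, timeout=10.0, interval=0.5, scale=1.0):
    """Poll check() until it returns True or timeout * scale seconds pass.

    scale is a hypothetical knob that loaded CI workers could raise.
    Returns True on success and False on timeout.
    """
    deadline = time.monotonic() + timeout * scale
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)

# Simulated node that reports ready on the third poll.
polls = {"count": 0}

def node_is_up():
    polls["count"] += 1
    return polls["count"] >= 3

print(wait_until(node_is_up, timeout=1.0, interval=0.01, scale=3.0))
```

Keeping the base timeout and the scale factor separate means the
defaults can stay tight for local runs while CI raises only the scale.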

# Continuing Open Issues
COUCHDB-3352: JS: couchjs SIGSEGVs
COUCHDB-3348: EUnit: global_changes_tests context setup failed
COUCHDB-3346: JS: reduce.js "JSON is not a function"
COUCHDB-3345: JS: stats.js silent failure
COUCHDB-3344: EUnit: compaction_daemon_tests timing out
COUCHDB-3343: JS: show_documents failure
COUCHDB-3342: JS: design_docs expected '"ok"', got '"ko"'
COUCHDB-3341: EUnit: config listener unknown failure
COUCHDB-3339: EUnit: couch_mrview_red_views_tests throw:error
 
# Full list of test case issues in JIRA
 
    https://issues.apache.org/jira/issues/?filter=12340503

Re: (Updated-5) Stabilizing our automated builds - help needed!

Posted by Joan Touzet <wo...@apache.org>.
Hi everyone,

Time for the weekend update, now that the monorepo merge has landed.

# TL;DR
1 new ticket filed for a recurring issue. No help has materialized
in getting rid of currently failing tests :(  Please help!

# New Issues
COUCHDB-3352: JS: couchjs SIGSEGVs
  Seen this one before but didn't file a ticket on it. We're getting
  an Error: {exit_status,139} which suggests couchjs is SIGSEGVing.
  Currently seeing this on reduce_builtin.js and am seeing it across
  multiple Linux distributions (CentOS 7, Ubuntu 14.04).
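
For anyone decoding these by hand: assuming the usual Unix convention
that an exit status above 128 means "killed by signal (status - 128)",
139 maps to signal 11. A minimal Python sketch (the helper name
describe_exit_status is made up for illustration and is not part of the
CouchDB test harness):

```python
import signal

def describe_exit_status(status):
    """Explain a shell-style exit status such as the 139 reported above."""
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            name = "unknown signal"
        return "killed by signal {} ({})".format(signum, name)
    return "exited normally with code {}".format(status)

# On Linux, signal 11 is SIGSEGV.
print(describe_exit_status(139))
```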

# Open Issues
COUCHDB-3348: EUnit: global_changes_tests context setup failed
COUCHDB-3346: JS: reduce.js "JSON is not a function"
COUCHDB-3345: JS: stats.js silent failure
COUCHDB-3344: EUnit: compaction_daemon_tests timing out
COUCHDB-3343: JS: show_documents failure
COUCHDB-3342: JS: design_docs expected '"ok"', got '"ko"'
COUCHDB-3341: EUnit: config listener unknown failure
COUCHDB-3339: EUnit: couch_mrview_red_views_tests throw:error

# Full list of test case issues in JIRA

  https://issues.apache.org/jira/issues/?filter=12340503

-Joan

Re: (Updated-4) Stabilizing our automated builds - help needed!

Posted by Joan Touzet <wo...@apache.org>.
Welcome to today's update. This may be the last one for a couple of
days, pending more activity in fixing failures.


# TL;DR
We had our first all-platform Jenkins pass on Monday! All 12 (well, 10;
2 combinations are skipped) test platforms succeeded.

That said, we have 1 new test failure & 1 test harness issue resolved.

# New Issues

COUCHDB-3348: EUnit: global_changes_tests context setup failed
  Response to checking the global_changes endpoint coming back as
  {badmatch, false}. Not Good. 1 sporadic failure here.

# Open Issues

COUCHDB-3346: JS: reduce.js "JSON is not a function"
COUCHDB-3345: JS: stats.js silent failure
COUCHDB-3344: EUnit: compaction_daemon_tests timing out
COUCHDB-3343: JS: show_documents failure
COUCHDB-3342: JS: design_docs expected '"ok"', got '"ko"'
COUCHDB-3341: EUnit: config listener unknown failure
COUCHDB-3339: EUnit: couch_mrview_red_views_tests throw:error

# Resolved Issues

COUCHDB-3347: dev/run failure on Jenkins CI CentOS 6 (Python 3)

  Two "smart" apostrophes in rel/overlay/etc/default.ini triggered
  this. It didn't turn up on local runs because the interactive shell
  has a different locale set than the non-interactive shell inside
  Docker, and the latter was trying to decode it to ASCII (and failing).
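
  To illustrate the failure mode: a "smart" apostrophe (U+2019) is
  three bytes in UTF-8, none of which are valid ASCII, so decoding
  succeeds under a UTF-8 locale and fails under an ASCII one. A minimal
  Python 3 sketch; the sample line and the try_decode helper are
  hypothetical, not the actual default.ini contents or dev/run code:

```python
# Hypothetical ini comment containing a "smart" apostrophe (U+2019).
raw = "; don\u2019t edit this value\n".encode("utf-8")

def try_decode(data, encoding):
    """Decode bytes, returning the text or a short failure description."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return "decode failed at byte {}: {}".format(exc.start, exc.reason)

print(repr(try_decode(raw, "utf-8")))  # decodes cleanly
print(try_decode(raw, "ascii"))        # fails on the first non-ASCII byte
```

  In Python 3, passing an explicit encoding when opening a file, e.g.
  open(path, encoding="utf-8"), avoids depending on the shell's locale.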



----- Original Message -----
> From: "Joan Touzet" <wo...@apache.org>
> To: dev@couchdb.apache.org
> Sent: Monday, March 27, 2017 4:26:46 PM
> Subject: Re: (Updated-3) Stabilizing our automated builds - help needed!

Re: (Updated-3) Stabilizing our automated builds - help needed!

Posted by Joan Touzet <wo...@apache.org>.
Monday update: we have 2 new test failures, 1 test harness failure,
and 1 resolved issue (Thanks Jay Doane!)

# New Issues

COUCHDB-3345: JS: stats.js silent failure
  The test simply shows 'fail' with no tracebacks or further info.
  Silent failures are troubling.

COUCHDB-3346: JS: reduce.js "JSON is not a function"
  This is especially odd.

COUCHDB-3347: dev/run failure on Jenkins CI CentOS 6 (Python 3)
  Looks like another Python 3 incompatibility, though this one
  is sporadic. More investigation is required.


# Open Issues

COUCHDB-3344: EUnit: compaction_daemon_tests timing out
COUCHDB-3343: JS: show_documents failure
COUCHDB-3342: JS: design_docs expected '"ok"', got '"ko"'
COUCHDB-3341: EUnit: config listener unknown failure
COUCHDB-3339: EUnit: couch_mrview_red_views_tests throw:error


# Recently resolved issues

COUCHDB-3340: EUnit: syslog timeout

  This has a PR that was just merged that blames a slow getaddrbyhost()
  call to find the syslog host.
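
  One way to check whether name resolution is the slow step on a given
  worker is to time the lookup directly. A minimal Python sketch; it
  uses Python's socket.getaddrinfo as a stand-in for whatever resolver
  call the syslog writer makes, and timed_lookup and the host/port shown
  are illustrative assumptions:

```python
import socket
import time

def timed_lookup(host, port, resolver=socket.getaddrinfo):
    """Resolve host:port and return (elapsed_seconds, results)."""
    start = time.monotonic()
    results = resolver(host, port)
    return time.monotonic() - start, results

# A numeric address needs no DNS query, so this should return quickly.
# Port 514 is the conventional syslog port.
elapsed, results = timed_lookup("127.0.0.1", 514)
print("resolved {} entries in {:.3f}s".format(len(results), elapsed))
```

  Running the same call with the configured syslog hostname on a CI
  worker would show whether the lookup alone can eat the test's timeout.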