Posted to dev@couchdb.apache.org by Joan Touzet <wo...@apache.org> on 2017/07/21 15:53:00 UTC

2.1 Bug Review IRC meeting minutes

Hey everybody!

Here's the logfile from today's bug meeting. In short: we are
in very good shape for release candidates in the next week or
so.

Where possible, updates have been made to the tickets themselves,
so consider this log more for posterity than anything else.

Ticket count: 11
* 6 test suite issues (1 of which is a dup)
* 1 minor feature issue (Fauxton, PR is up)
* 1 Windows issue (non-blocking)
* 3 release-related chore/documentation tickets

-Joan

11:07 <+Wohali> let's get started
11:08 <+Wohali> #703 is brand new and I've never seen it before. Because it's a PR from a repo we don't control, we don't have the logfiles.
11:09 <+Wohali> however this is the config:set timeout that vatamane was talking about in IRC yesterday
11:09 <+davisp> For #703 let's add a dump of the _stats endpoint when we fail
11:09 <+davisp> Right, it's likely that it was file_server2 or error_logger that got backed up
11:09 <+davisp> Will make a note of that on the ticket
11:09 <+Wohali> probably file_server2, given the ,true at the end, right?
11:10 <+Wohali> ok moving on
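
A minimal sketch of what a _stats dump on test failure could look like, assuming the test_request helper and the /_node/_local/_stats path; the helper name, endpoint path, and process list are illustrative, not the actual change:

    %% Illustrative only: on a config:set timeout, dump the node's _stats and
    %% the message queue lengths of the processes suspected of backing up.
    dump_debug_info(BaseUrl) ->
        {ok, _Code, _Headers, Body} =
            test_request:get(BaseUrl ++ "/_node/_local/_stats"),
        io:format("_stats dump: ~s~n", [Body]),
        [io:format("~p message_queue_len: ~p~n",
                   [Name, process_info(whereis(Name), message_queue_len)])
         || Name <- [file_server_2, error_logger]].
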
11:10 <+Wohali> #701. couchdb_1283 go poopie again
11:10 <+Wohali> 14:35 <+davisp> vatamane: Yeah, looks like we should be able to do a meck:expect on the compaction function, then do some message passing and then just make the last call meck:passthrough(Args) which carries on with the original implementation.
11:10 <+Wohali> I think that's relevant?
11:11 <+rnewson> "PR from a repo we don't control"?
11:11 <+Wohali> rnewson: travis has a private key in the .travis.yml file that is hooked to the apache/couchdb repo
11:11 <+Wohali> PRs from other repos (like cloudant/) can't use the private creds so the envvar doesn't get set
11:11 <+rnewson> oh
11:11 <+davisp> And likely that wouldn't have helped much here
11:12 <+davisp> Assuming my theory is anywhere near correct
11:12 <+davisp> At least, that's the only time I've ever seen it error out in production
11:12 <+Wohali> same thing affects #701
11:12 <+davisp> Ticket updated
11:13 <+davisp> I'll work on 701 today. I know what the issue is there
11:13 <+Wohali> ok
11:13 <+Wohali> assigning paul
11:13 <+davisp> The compaction process finishes before we get a chance to suspend it
11:13 <+davisp> Hence badarg when trying to suspend a dead process
11:13 <+Wohali> moving on
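
A rough sketch of the meck approach davisp describes above, letting the test hold the compaction process until it has had a chance to suspend it; the intercepted module, callback, and message names are guesses for illustration, not the actual fix:

    %% Illustrative only: intercept the compaction entry point with meck so the
    %% test controls when compaction actually runs, then fall through to the
    %% real implementation with meck:passthrough/1.
    Test = self(),
    ok = meck:new(couch_db_updater, [passthrough]),
    meck:expect(couch_db_updater, handle_cast, fun
        (start_compact = Msg, State) ->
            Test ! {compaction_started, self()},
            receive resume_compaction -> ok end,  %% test decides when to resume
            meck:passthrough([Msg, State]);
        (Msg, State) ->
            meck:passthrough([Msg, State])
    end).
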
11:13 <+Wohali> #593 is mine, i'm waiting on my Fauxton PR to be reviewed and +1'd. garren gets back Monday. michellep hasn't been available
11:14 <+Wohali> the erlang code has landed already
11:14 <+Wohali> #695
11:14 <+Wohali> haven't seen this one anywhere useful yet
11:14 <+Wohali> i.e. no couch.logs to review
11:15 <+davisp> vatamane: Is that the one you duplicated locally and added longer timeouts for?
11:15 <+Wohali> vatamane is on vacation today and isn't here
11:15 <+Wohali> 15:07 < vatamane> I won't be around on Friday (vacation)
11:15 <+davisp> Oh right
11:15 <+davisp> Am looking at PRs to see if he just forgot to mention the ticket
11:15 <+Wohali> garren is on vacation today as well, and chewbranca is out for another couple of weeks
11:16 <+Wohali> ok
11:16 <+davisp> He is?
11:16 <+Wohali> 23:38 < chewbranca> davisp: alright, I've wrapped up my review on the ddoc_cache PR. And on that note, I'm officially on vacation for the next few weeks, so you might have trouble getting me to do a third round of review ;-)
11:16 <+davisp> I think he's back unless he's leaving again
11:17 <+davisp> Ah, he got back the other day. But it's like 8am his time, so I wouldn't expect to see him around
11:17 <+Wohali> ah that was Jun 30 so yeah
11:17 < jaydoane> I can try to repro 695 using a slow docker container
11:18 <+Wohali> ok #574 is one that really troubles me
11:18 <+Wohali> it's been repeating a whole lot for over a month and no action
11:18 <+davisp> Wohali: jaydoane: Ahh, the one Nick reproduced was 633, by setting the IO bandwidth to 5KiB
11:18 <+Wohali> yeah 633 is closed already
11:18 <+Wohali> unless it recurs
11:18 <+davisp> jaydoane: But you might try a similar thing and see if it's similar
11:18 <+davisp> Right
11:18 <+davisp> the command he used should be in the backlog here. I'll try and find it
11:19 < ASFBot> joant@atypical.net master * 42f26d5 (NOTICE) https://gitbox.apache.org/repos/asf?p=couchdb.git;h=42f26d5 :
11:19 < ASFBot> >> Explicitly mention Facebook "BSD+Patents" license in NOTICE per LEGAL-303
11:19 < jaydoane> I was unable to repro 574 using extremely low disk IO
11:19 <+davisp> here it is: VBoxManage bandwidthctl ${VM} set Limit --limit 5KB
11:19 <+Wohali> is my analysis of 574 valid?
11:20 <+Wohali> it doesn't look to me like a disk IO issue
11:20 <+Wohali> the failure is in couch_att somewhere
11:21 <+davisp> Haven't read it all but I agree it doesn't appear to be IO related
11:21 <+davisp> Seems like a race during process teardown: something dies because of that too-large error and it cascades badly
11:21 < jaydoane> actually found this in my logs, but it's not the same stack trace as the one in the ticket https://www.irccloud.com/pastebin/XKbSWEHH/
11:22 <+Wohali> for me this is my #1 issue for us to look at before release since it's actually affecting replication
11:22 <+Wohali> the rest feel like mainly badly written test cases that need help
11:23 <+Wohali> jaydoane rnewson given davisp's limited cycles could either of you look at this one?
11:23 <+davisp> I'll try and find time but might not be till later today or Monday if no one else gets to it
11:24 <+rnewson> my cycles are pretty limited too tbh
11:24 <+Wohali> ok
11:24 <+davisp> Though for anyone not familiar with the MP parsing code, that's a deep dark cave of despair. Feel free to get to it before me
11:24 < jaydoane> I spent the better part of yesterday trying to repro 574, but got nothing so far -- not sure if slowing the tests will help, but I can keep trying (maybe soaking)
11:24 <+rnewson> for bugs that I can go "aha, I know what that is" I can turn out a fix
11:24 <+rnewson> a deep dive into MP parsing is the worst
11:25 <+Wohali> :(
11:25 <+Wohali> ok
11:25 <+Wohali> #674
11:25 <+davisp> I'm inclined to bump 674. I've added a log message for it but I don't believe it's failed since I added it
11:25 <+Wohali> paul merged more debug logging a week ago and i don't think we've seen it since
11:26 <+davisp> Where bump == not block the release
11:26 <+Wohali> ok
11:26 <+Wohali> unless it recurs, I'm in favour
11:26 <+davisp> Yap, hopefully if it recurs that log message will lead us to the fix
11:26 <+davisp> And/or at least make us comfortable deleting the dumb assertion
11:27 <+Wohali> done
11:27 <+Wohali> #673
11:27 <+Wohali> this appears to be an issue with how JS is doing the server reset
11:27 <+Wohali> we shouldn't be returning control to the test script prior to the node actually being up
11:27 <+Wohali> but, somehow, we are
11:27 <+Wohali> I'll take this one
11:28 <+Wohali> I recently reworked the restart() logic a bit to wait longer, to try and fix some of the stats tests (which I ultimately disabled)
11:28 <+Wohali> so it's possible that something I did is causing issues? i dunno.
11:28 <+davisp> Oooh
11:28 <+Wohali> #669
11:28 <+davisp> I think I see it
11:28 <+Wohali> oh?
11:28 <+davisp> We're checking the local port and not the clustered port in ensure_all_nodes_alive
11:28 <+Wohali> ahh
11:29 <+davisp> and couch_httpd comes up before chttpd
11:29 <+Wohali> local port comes up first, right
11:29 <+davisp> Adding a note and a link.
11:29 <+Wohali> thanks, that's an easy fix
11:29 <+Wohali> weird that sometimes we're too fast, and other times too slow :)
11:29 <+Wohali> I think #669 and #673 are dupes
11:30 <+davisp> Me too and noted as such
11:30 <+Wohali> your belt and suspenders style is dashing!
11:30 <+davisp> on 673
11:30 <+Wohali> saw, and thx
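
A sketch of the check being discussed — poll the clustered (chttpd) port rather than the node-local (couch_httpd) one before handing control back to the tests — written here in Erlang for illustration; the actual ensure_all_nodes_alive lives in the test harness, and the URL, status check, and retry policy are placeholders:

    %% Illustrative only: don't declare the node alive until chttpd answers,
    %% since the node-local couch_httpd port comes up first.
    wait_for_clustered_port(_Url, 0) ->
        erlang:error(clustered_port_never_came_up);
    wait_for_clustered_port(Url, Retries) ->
        case test_request:get(Url) of
            {ok, 200, _Headers, _Body} -> ok;
            _ -> timer:sleep(500), wait_for_clustered_port(Url, Retries - 1)
        end.

e.g. wait_for_clustered_port("http://127.0.0.1:15984/", 20) against a dev cluster.
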
11:31 <+Wohali> #683 is a chore, someone has to go and read git log and write up something human consumable.
11:31 <+Wohali> if no one else volunteers I can take it...
11:31 <+Wohali> since the motion passed I will be nuking and recreating the 2.1.x branch
11:31 <+davisp> +1
11:31 <+Wohali> once we get all of this stuff closed out
11:32 -!- m-i [~m_i@2a01cb0803d61900e12665bddce157a9.ipv6.abo.wanadoo.fr] has quit [Remote host closed the connection]
11:32 <+davisp> 'Cause that means I can merge the new ddoc_cache soon
11:32 <+davisp> Yap
11:32 <+Wohali> yeah :) soon
11:32 <+Wohali> need to branch the other repos, too
11:32 <+Wohali> and it'd be nice to get tags on the other repos if we're pointing to stuff
11:32 <+Wohali> i'll add a ticket for that so I don't forget
11:33 <+davisp> +1
11:33 <+Wohali> done, #704
11:33 <+Wohali> #642, good news if you didn't see it: https://repo-nightly.couchdb.org/
11:34 <+Wohali> jenkins issues are almost all completely worked out so we'll have a top level master/ and 2.1.x/ tree soon
11:34 <+davisp> Saw it. As far as I'm concerned you should just merge that when you're comfortable. I dunno anyone else that knows Jenkins well enough to have an opinion
11:34 <+Wohali> appreciate it, I can't actually test it on other branches unless the file actually exists, so I'll avoid the RTC model for this one thing
11:35 <+Wohali> but again we have the very latest packages for each branch (we don't keep back versions), plus the latest 10 source code tarballs
11:35 <+Wohali> for dev@ consumption only, in line with the ASF requirements on this stuff
11:35 <+davisp> yep
11:35 <+davisp> Ahh, ok fair enough
11:35 <+Wohali> we also got real bintray deb/rpm repos for actual releases, which is great
11:35 <+Wohali> and they're working on getting us access to docker for apache/couchdb as our image namespace
11:35 <+davisp> Cool
11:36 <+Wohali> only thing left for me to follow up on is snaps, and I have the credentials I need
11:36 <+Wohali> we won't be pushing latest snaps, just released builds
11:36 <+davisp> You're saying a lot of words that I know in other contexts...
11:36 <+Wohali> basically, lots of semi-authorized real package/container goodness.
11:37 <+Wohali> it's been a long haul.
11:37 <+davisp> That bit I know :D
11:37 <+Wohali> and #698 we just got a report that the fixed Windows package I made doesn't work on Windows Server 2016 :/
11:37 <+Wohali> so, i just downloaded that OS so I can test on it
11:37 <+Wohali> it'll get low priority
11:37 <+Wohali> and can be fixed after release
11:38 <+Wohali> any questions?
11:38 <+Wohali> and, anyone mind if I email the summary of this meeting to dev@ ?
11:38 < jaydoane> good idea