Posted to dev@aurora.apache.org by ASF IRC Bot <as...@urd.zones.apache.org> on 2016/02/01 20:33:20 UTC

Summary of IRC Meeting in #aurora

Summary of IRC Meeting in #aurora at Mon Feb  1 19:06:57 2016:

Attendees: mkhutornenko, adeshmukh, zmanji, benley, jcohen

- Preface
- Deprecation cycles
  - Action: jcohen to follow up w/ dev thread re: changing deprecation policy.
- AURORA-1603
- Rollback testing
  - Action: jcohen to email dev@ w.r.t. rollback testing.


IRC log follows:

## Preface ##
[Mon Feb  1 19:07:49 2016] <jcohen>: Ok, let’s start w/ roll call. As always, everyone is encouraged to participate!
[Mon Feb  1 19:07:53 2016] <jcohen>: here :)
[Mon Feb  1 19:07:57 2016] <benley>: Here
[Mon Feb  1 19:07:59 2016] <adeshmukh>: here
[Mon Feb  1 19:08:43 2016] <mkhutornenko>: here
[Mon Feb  1 19:08:54 2016] <zmanji>: here
[Mon Feb  1 19:10:25 2016] <jcohen>: Ok, first things first…
## Deprecation cycles ##
[Mon Feb  1 19:11:08 2016] <jcohen>: As we increase the cadence of releases, our policy of killing deprecated fields after one release cycle becomes more burdensome.
[Mon Feb  1 19:11:54 2016] <jcohen>: Given that we’re trying to at least keep up with Mesos’s release cycle, which is now timed, it seems like this will be a continuing problem for us, since we can expect releases fairly regularly.
[Mon Feb  1 19:12:21 2016] <jcohen>: Curious what people think about moving from a release-based deprecation to a timed deprecation
[Mon Feb  1 19:12:28 2016] <benley>: I'd be in favor.
[Mon Feb  1 19:12:50 2016] <jcohen>: (i.e. instead of deprecated in release X, removed in release X + 1, it would be removed N days after the release in which it was deprecated)
[Mon Feb  1 19:13:14 2016] <zmanji>: I'm also in favor of time based because I like the frequent releases but some of the deprecations are pretty difficult to do
[Mon Feb  1 19:13:26 2016] <benley>: Or perhaps "2 releases, or at least NN days"
[Mon Feb  1 19:14:20 2016] <mkhutornenko>: +1 to a timed approach. I think Mesos follows the same practice
[Mon Feb  1 19:14:23 2016] <jcohen>: Yeah, I want to ensure we keep a balance between giving operators enough time to adopt changes to deprecated fields versus us having to keep them around for too long.
[Mon Feb  1 19:15:19 2016] <jcohen>: It seems all are in favor. Given the absence of wfarner, jsirois, would it make sense to continue this discussion on the dev list where we can come up with a final, revised policy?
[Mon Feb  1 19:16:07 2016] <jcohen>: #action jcohen to follow up w/ dev thread re: changing deprecation policy.
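
The two rules floated above (removal N days after the deprecating release, and "2 releases, or at least NN days") can be sketched as a small eligibility check. This is only an illustration, not an agreed policy: the Deprecation record, the field name, the ordinal release numbers and the 90-day window are all hypothetical, and the log does not settle how the release count and the day count combine. The sketch assumes the conservative reading, where both minimums must be met.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Deprecation:
    """Hypothetical record of when a field was marked deprecated."""
    field_name: str
    deprecated_in_release: int  # ordinal release number (assumption)
    released_on: date           # date of the release that deprecated it


def removable(dep: Deprecation, current_release: int, today: date,
              min_days: int = 90, min_releases: int = 0) -> bool:
    """Return True once the field may be removed.

    min_releases=0 gives the purely timed rule; min_releases=2 gives one
    reading of the combined rule, where both conditions must hold.
    """
    days_elapsed = (today - dep.released_on).days
    releases_elapsed = current_release - dep.deprecated_in_release
    return days_elapsed >= min_days and releases_elapsed >= min_releases


if __name__ == "__main__":
    dep = Deprecation("example_field", 10, date(2016, 2, 1))
    # One release later and 29 days on: too early under a 90-day window.
    print(removable(dep, current_release=11, today=date(2016, 3, 1)))
```

Under the release-based policy the same field would be removable as soon as release 11 ships, however soon that is, which is the burden on operators that the timed approach is meant to ease.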
## AURORA-1603 ##
[Mon Feb  1 19:16:35 2016] <jcohen>: https://issues.apache.org/jira/browse/AURORA-1603
[Mon Feb  1 19:16:40 2016] <jcohen>: AURORA-1603
[Mon Feb  1 19:16:55 2016] <jcohen>: mkhutornenko: you want to walk through what happened here?
[Mon Feb  1 19:17:55 2016] <mkhutornenko>: The details of the root cause are too intricate to follow along here but I can give a brief overview of what happened
[Mon Feb  1 19:18:39 2016] <mkhutornenko>: we tried to deploy a build from master to one of our clusters and immediately noticed an issue with duplicate instances showing up on the job page: https://issues.apache.org/jira/browse/AURORA-1604
[Mon Feb  1 19:19:10 2016] <mkhutornenko>: we immediately attempted to roll back to a previous known-good version, but the scheduler was unable to restart
[Mon Feb  1 19:19:47 2016] <mkhutornenko>: we found a stack trace (listed in https://issues.apache.org/jira/browse/AURORA-1603) and had to restore the scheduler from backup
[Mon Feb  1 19:20:17 2016] <mkhutornenko>: that exposed a few other issues: our recovery instructions had not been updated to reflect recent changes
[Mon Feb  1 19:20:35 2016] <mkhutornenko>: https://issues.apache.org/jira/browse/AURORA-1605
[Mon Feb  1 19:21:06 2016] <mkhutornenko>: all in all, we were able to recover but it took us a few hours to reconcile this problem
[Mon Feb  1 19:22:44 2016] <jcohen>: Thanks Maxim. This dovetails nicely with my next topic…
## Rollback testing ##
[Mon Feb  1 19:23:14 2016] <mkhutornenko>: btw, master is not in a working state currently, so I wouldn’t recommend deploying from it
[Mon Feb  1 19:23:33 2016] <jcohen>: Do folks think it would be beneficial to come up with some sort of test suite that ensures it’s possible to roll back between commits?
[Mon Feb  1 19:23:53 2016] <jcohen>: I don’t know how many people deploy from master as opposed to from releases
[Mon Feb  1 19:24:10 2016] <jcohen>: Obviously it’s not a problem that comes up frequently, but it can lead to serious issues when it does arise
[Mon Feb  1 19:24:32 2016] <mkhutornenko>: I think build-to-build rollback verification is important and would benefit overall quality
[Mon Feb  1 19:25:16 2016] <jcohen>: Our jenkins job does not currently run e2e tests unfortunately
[Mon Feb  1 19:25:47 2016] <jcohen>: if it did, it seems like the easiest thing to do would be to run e2e tests, then git checkout HEAD^ and try to rebuild/restart the scheduler
[Mon Feb  1 19:26:35 2016] <mkhutornenko>: we are planning to alter our internal deploy sequence to verify the build-to-build upgrade/rollback cycle in a test cluster, but it would be nice to have a solution everyone could benefit from
[Mon Feb  1 19:27:32 2016] <jcohen>: It might be worth reviving AURORA-476
[Mon Feb  1 19:27:36 2016] <jcohen>: AURORA-476
[Mon Feb  1 19:28:24 2016] <jcohen>: Again, I’ll redirect this to the dev list for further discussion.
[Mon Feb  1 19:28:33 2016] <mkhutornenko>: +1
[Mon Feb  1 19:28:39 2016] <jcohen>: #action jcohen to email dev@ w.r.t. rollback testing.
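
The rollback check described above (run e2e tests, then git checkout HEAD^ and try to rebuild/restart the scheduler) could be sequenced roughly as follows. This is a rough sketch only: the build, e2e and restart commands are placeholders supplied by the caller, since the log does not name them, and rollback_check, shell_runner and the step labels are hypothetical names. Only the two git invocations are real commands.

```python
import subprocess
from typing import Callable, List, Optional

Runner = Callable[[List[str]], int]

CHECKOUT_PREVIOUS = ["git", "checkout", "HEAD^"]
RESTORE_CHECKOUT = ["git", "checkout", "-"]  # back to where we started


def shell_runner(cmd: List[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(cmd).returncode


def rollback_check(build_cmd: List[str], e2e_cmd: List[str],
                   restart_cmd: List[str],
                   run: Runner = shell_runner) -> Optional[str]:
    """Return the label of the first failing step, or None if all pass."""
    steps = [
        ("build HEAD", build_cmd),
        ("e2e tests on HEAD", e2e_cmd),
        ("checkout previous commit", CHECKOUT_PREVIOUS),
        ("rebuild previous commit", build_cmd),
        ("restart scheduler on previous commit", restart_cmd),
    ]
    checked_out = False
    try:
        for label, cmd in steps:
            if run(cmd) != 0:
                return label
            if cmd == CHECKOUT_PREVIOUS:
                checked_out = True
        return None
    finally:
        # Leave the working tree on the original commit either way.
        if checked_out:
            run(RESTORE_CHECKOUT)
```

Passing the runner in as a parameter lets the sequence be exercised without a real cluster. A failure at the last step is the case reported in AURORA-1603, where the scheduler could not restart after the rollback. The sketch assumes the restart step runs against storage state written by the newer build; otherwise it would not exercise a rollback at all.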
[Mon Feb  1 19:29:04 2016] <jcohen>: That’s all I’ve got on my list, anyone else have any topics?
[Mon Feb  1 19:30:54 2016] <jcohen>: Ok folks, that’ll do it then. Have a good week everyone!
[Mon Feb  1 19:32:53 2016] <jcohen>: ASFBot: meeting end
[Mon Feb  1 19:33:05 2016] <zmanji>: ASFBot: meeting end


Meeting ended at Mon Feb  1 19:33:05 2016