You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Adar Dembo (Code Review)" <ge...@cloudera.org> on 2017/09/21 01:48:06 UTC

[kudu-CR] periodic: prevent runaway callback loops

Hello Todd Lipcon, Alexey Serbin,

I'd like you to do a code review.  Please visit

    http://gerrit.cloudera.org:8080/8116

to review the following change.

Change subject: periodic: prevent runaway callback loops
......................................................................

periodic: prevent runaway callback loops

One of the key semantics of PeriodicTimer is that Stop() doesn't wait for
an outstanding scheduled callback to finish. It returns immediately and
when the scheduled callback runs, it'll see that Stop() was called and exit.

However, if Start() is called after Stop() but before the callback runs,
both "callback loops" will remain alive. The existing locking will ensure
that their periods converge and that the functor is only invoked once per
period, but it's still an unnecessary resource drain.

This patch addresses this by assigning a "generation" to each callback loop.
When a callback runs, it'll exit if its generation is older than the timer's
current generation. That means an additional N scheduled callbacks for N
Stop/Start call pairs, but after one period all but one of the callbacks
will detect the generational change and exit.

All of this wouldn't be necessary if Stop() could cancel the underlying
callback, but Messenger::ScheduleOnReactor() doesn't support cancelation,
and it'd be far more work to add that than to employ this workaround.

Change-Id: I37bfd547db2a6c58d15d228979c88b9b871f72c5
---
M src/kudu/rpc/periodic-test.cc
M src/kudu/rpc/periodic.cc
M src/kudu/rpc/periodic.h
M src/kudu/rpc/reactor.cc
M src/kudu/rpc/reactor.h
5 files changed, 118 insertions(+), 23 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/16/8116/1
-- 
To view, visit http://gerrit.cloudera.org:8080/8116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I37bfd547db2a6c58d15d228979c88b9b871f72c5
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] periodic: prevent runaway callback loops

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/8116 )

Change subject: periodic: prevent runaway callback loops
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8116/1/src/kudu/rpc/reactor.cc
File src/kudu/rpc/reactor.cc:

http://gerrit.cloudera.org:8080/#/c/8116/1/src/kudu/rpc/reactor.cc@101
PS1, Line 101: METRIC_DEFINE_counter(server, reactor_delayed_tasks_completed,
is this actually useful in a production context or just for tests? if the latter, I think a simple variable is probably better (and can just reach in through friend classes or somesuch to get access to it). I hesitate to pollute public metrics namespaces with implementation details like this.



-- 
To view, visit http://gerrit.cloudera.org:8080/8116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I37bfd547db2a6c58d15d228979c88b9b871f72c5
Gerrit-Change-Number: 8116
Gerrit-PatchSet: 1
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 21 Sep 2017 04:50:20 +0000
Gerrit-HasComments: Yes

[kudu-CR] periodic: prevent runaway callback loops

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8116

to look at the new patch set (#2).

Change subject: periodic: prevent runaway callback loops
......................................................................

periodic: prevent runaway callback loops

One of the key semantics of PeriodicTimer is that Stop() doesn't wait for
an outstanding scheduled callback to finish. It returns immediately and
when the scheduled callback runs, it'll see that Stop() was called and exit.

However, if Start() is called after Stop() but before the callback runs,
both "callback loops" will remain alive. The existing locking will ensure
that their periods converge and that the functor is only invoked once per
period, but it's still an unnecessary resource drain.

This patch addresses this by assigning a "generation" to each callback loop.
When a callback runs, it'll exit if its generation is older than the timer's
current generation. That means an additional N scheduled callbacks for N
Stop/Start call pairs, but after one period all but one of the callbacks
will detect the generational change and exit.

All of this wouldn't be necessary if Stop() could cancel the underlying
callback, but Messenger::ScheduleOnReactor() doesn't support cancelation,
and it'd be far more work to add that than to employ this workaround.

While making this change I decided to replace the various atomic primitives
with basic ones protected by the existing spinlock; I found it easier to
grok the critical sections this way.

Change-Id: I37bfd547db2a6c58d15d228979c88b9b871f72c5
---
M src/kudu/rpc/periodic-test.cc
M src/kudu/rpc/periodic.cc
M src/kudu/rpc/periodic.h
M src/kudu/rpc/reactor.cc
M src/kudu/rpc/reactor.h
5 files changed, 153 insertions(+), 36 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/16/8116/2
-- 
To view, visit http://gerrit.cloudera.org:8080/8116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I37bfd547db2a6c58d15d228979c88b9b871f72c5
Gerrit-Change-Number: 8116
Gerrit-PatchSet: 2
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] periodic: prevent runaway callback loops

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8116

to look at the new patch set (#3).

Change subject: periodic: prevent runaway callback loops
......................................................................

periodic: prevent runaway callback loops

One of the key semantics of PeriodicTimer is that Stop() doesn't wait for
an outstanding scheduled callback to finish. It returns immediately and
when the scheduled callback runs, it'll see that Stop() was called and exit.

However, if Start() is called after Stop() but before the callback runs,
both "callback loops" will remain alive. The existing locking will ensure
that their periods converge and that the functor is only invoked once per
period, but it's still an unnecessary resource drain.

This patch addresses this by assigning a "generation" to each callback loop.
When a callback runs, it'll exit if its generation is older than the timer's
current generation. That means an additional N scheduled callbacks for N
Stop/Start call pairs, but after one period all but one of the callbacks
will detect the generational change and exit.

All of this wouldn't be necessary if Stop() could cancel the underlying
callback, but Messenger::ScheduleOnReactor() doesn't support cancelation,
and it'd be far more work to add that than to employ this workaround.

While making this change I decided to replace the various atomic primitives
with basic ones protected by the existing spinlock; I found it easier to
grok the critical sections this way.

Change-Id: I37bfd547db2a6c58d15d228979c88b9b871f72c5
---
M src/kudu/rpc/periodic-test.cc
M src/kudu/rpc/periodic.cc
M src/kudu/rpc/periodic.h
M src/kudu/rpc/reactor.cc
M src/kudu/rpc/reactor.h
5 files changed, 154 insertions(+), 37 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/16/8116/3
-- 
To view, visit http://gerrit.cloudera.org:8080/8116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I37bfd547db2a6c58d15d228979c88b9b871f72c5
Gerrit-Change-Number: 8116
Gerrit-PatchSet: 3
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] periodic: prevent runaway callback loops

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/8116 )

Change subject: periodic: prevent runaway callback loops
......................................................................


Patch Set 5: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/8116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I37bfd547db2a6c58d15d228979c88b9b871f72c5
Gerrit-Change-Number: 8116
Gerrit-PatchSet: 5
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 29 Sep 2017 18:46:35 +0000
Gerrit-HasComments: No

[kudu-CR] periodic: prevent runaway callback loops

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/8116 )

Change subject: periodic: prevent runaway callback loops
......................................................................


Patch Set 5: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8116/1/src/kudu/rpc/periodic-test.cc
File src/kudu/rpc/periodic-test.cc:

http://gerrit.cloudera.org:8080/#/c/8116/1/src/kudu/rpc/periodic-test.cc@193
PS1, Line 193: 
> In a sense it's exactly two tasks per period: the first one runs "timer->St
Thank you for the clarification.



-- 
To view, visit http://gerrit.cloudera.org:8080/8116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I37bfd547db2a6c58d15d228979c88b9b871f72c5
Gerrit-Change-Number: 8116
Gerrit-PatchSet: 5
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Mon, 25 Sep 2017 23:46:27 +0000
Gerrit-HasComments: Yes

[kudu-CR] periodic: prevent runaway callback loops

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8116 )

Change subject: periodic: prevent runaway callback loops
......................................................................

periodic: prevent runaway callback loops

One of the key semantics of PeriodicTimer is that Stop() doesn't wait for
an outstanding scheduled callback to finish. It returns immediately and
when the scheduled callback runs, it'll see that Stop() was called and exit.

However, if Start() is called after Stop() but before the callback runs,
both "callback loops" will remain alive. The existing locking will ensure
that their periods converge and that the functor is only invoked once per
period, but it's still an unnecessary resource drain.

This patch addresses this by assigning a "generation" to each callback loop.
When a callback runs, it'll exit if its generation is older than the timer's
current generation. That means an additional N scheduled callbacks for N
Stop/Start call pairs, but after one period all but one of the callbacks
will detect the generational change and exit.

All of this wouldn't be necessary if Stop() could cancel the underlying
callback, but Messenger::ScheduleOnReactor() doesn't support cancelation,
and it'd be far more work to add that than to employ this workaround.

While making this change I decided to replace the various atomic primitives
with basic ones protected by the existing spinlock; I found it easier to
grok the critical sections this way.

Change-Id: I37bfd547db2a6c58d15d228979c88b9b871f72c5
Reviewed-on: http://gerrit.cloudera.org:8080/8116
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin <as...@cloudera.com>
Reviewed-by: Todd Lipcon <to...@apache.org>
---
M src/kudu/rpc/periodic-test.cc
M src/kudu/rpc/periodic.cc
M src/kudu/rpc/periodic.h
M src/kudu/rpc/reactor.cc
4 files changed, 143 insertions(+), 38 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Alexey Serbin: Looks good to me, but someone else must approve
  Todd Lipcon: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/8116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I37bfd547db2a6c58d15d228979c88b9b871f72c5
Gerrit-Change-Number: 8116
Gerrit-PatchSet: 6
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] periodic: prevent runaway callback loops

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8116

to look at the new patch set (#4).

Change subject: periodic: prevent runaway callback loops
......................................................................

periodic: prevent runaway callback loops

One of the key semantics of PeriodicTimer is that Stop() doesn't wait for
an outstanding scheduled callback to finish. It returns immediately and
when the scheduled callback runs, it'll see that Stop() was called and exit.

However, if Start() is called after Stop() but before the callback runs,
both "callback loops" will remain alive. The existing locking will ensure
that their periods converge and that the functor is only invoked once per
period, but it's still an unnecessary resource drain.

This patch addresses this by assigning a "generation" to each callback loop.
When a callback runs, it'll exit if its generation is older than the timer's
current generation. That means an additional N scheduled callbacks for N
Stop/Start call pairs, but after one period all but one of the callbacks
will detect the generational change and exit.

All of this wouldn't be necessary if Stop() could cancel the underlying
callback, but Messenger::ScheduleOnReactor() doesn't support cancelation,
and it'd be far more work to add that than to employ this workaround.

While making this change I decided to replace the various atomic primitives
with basic ones protected by the existing spinlock; I found it easier to
grok the critical sections this way.

Change-Id: I37bfd547db2a6c58d15d228979c88b9b871f72c5
---
M src/kudu/rpc/periodic-test.cc
M src/kudu/rpc/periodic.cc
M src/kudu/rpc/periodic.h
M src/kudu/rpc/reactor.cc
4 files changed, 143 insertions(+), 38 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/16/8116/4
-- 
To view, visit http://gerrit.cloudera.org:8080/8116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I37bfd547db2a6c58d15d228979c88b9b871f72c5
Gerrit-Change-Number: 8116
Gerrit-PatchSet: 4
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] periodic: prevent runaway callback loops

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/8116 )

Change subject: periodic: prevent runaway callback loops
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8116/1/src/kudu/rpc/periodic-test.cc
File src/kudu/rpc/periodic-test.cc:

http://gerrit.cloudera.org:8080/#/c/8116/1/src/kudu/rpc/periodic-test.cc@193
PS1, Line 193: kPeriods * 2
Just a question to make sure I understand this correctly.

I'm not sure I understand how it's twice of kPeriods.  Do I understand correctly that on average it's about one task per period?  If so, it's more kPeriods + X, where X is a small number, like 1 or 2, no?



-- 
To view, visit http://gerrit.cloudera.org:8080/8116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I37bfd547db2a6c58d15d228979c88b9b871f72c5
Gerrit-Change-Number: 8116
Gerrit-PatchSet: 1
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 21 Sep 2017 19:21:37 +0000
Gerrit-HasComments: Yes

[kudu-CR] periodic: prevent runaway callback loops

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/8116 )

Change subject: periodic: prevent runaway callback loops
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8116/1/src/kudu/rpc/periodic-test.cc
File src/kudu/rpc/periodic-test.cc:

http://gerrit.cloudera.org:8080/#/c/8116/1/src/kudu/rpc/periodic-test.cc@193
PS1, Line 193: kPeriods * 2
> Just a question to make sure I understand this correctly.
In a sense it's exactly two tasks per period: the first one runs "timer->Stop();timer->Start()" and reschedules itself, and the second one just notices that the current generation has changed and exits.

But, because "timer->Stop();timer->Start()" resets the period, we don't end up running a full 10 periods, it's more like 8 or 9. Hence ASSERT_LE and not ASSERT_EQ.

I suppose it's possible for the number of tasks to _exceed_ kPeriods*2 if SleepFor() ends up sleeping for a few extra periods, but I didn't see that happen when I looped the test 1000 times in TSAN mode with stress threads.


http://gerrit.cloudera.org:8080/#/c/8116/1/src/kudu/rpc/reactor.cc
File src/kudu/rpc/reactor.cc:

http://gerrit.cloudera.org:8080/#/c/8116/1/src/kudu/rpc/reactor.cc@101
PS1, Line 101: METRIC_DEFINE_counter(server, reactor_delayed_tasks_completed,
> is this actually useful in a production context or just for tests? if the l
Just for tests, though I thought it was a neat curiosity of sorts.

I'll do something equivalent in PeriodicTimer instead.



-- 
To view, visit http://gerrit.cloudera.org:8080/8116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I37bfd547db2a6c58d15d228979c88b9b871f72c5
Gerrit-Change-Number: 8116
Gerrit-PatchSet: 1
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 22 Sep 2017 21:15:06 +0000
Gerrit-HasComments: Yes