You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Michael Ho (Code Review)" <ge...@cloudera.org> on 2018/02/25 00:46:49 UTC

[Impala-ASF-CR] IMPALA-6512: Maintenace thread period should respect FLAGS datastream sender timeout ms

Michael Ho has uploaded this change for review. ( http://gerrit.cloudera.org:8080/9447


Change subject: IMPALA-6512: Maintenace thread period should respect FLAGS_datastream_sender_timeout_ms
......................................................................

IMPALA-6512: Maintenace thread period should respect FLAGS_datastream_sender_timeout_ms

Previously, the maintenance thread in KrpcDataStreamMgr will wake up
once every 10s to check for early senders which timed out. However,
FLAGS_datastream_sender_timeout_ms can be set to a value smaller than
10s. In which case, we may not notice the timed-out senders until much
later. This change fixes the problem by changing the wakeup period of
the maintenance thread to be min of FLAGS_datastream_sender_timeout_ms/2
and 10000 milliseconds. Also, this change addresses a TODO in the code by
moving the check for closed receivers in the 'closed_stream_cache_' from
the handler for EOS RPC to the maintenance thread's loop.

Testing done: test_exchange_large_delay.py which failed previously
when KRPC is enabled. Core debug builds with KRPC enabled.

Change-Id: I804cef7cc991007ec44375f8eac804aa2df46bd7
---
M be/src/runtime/data-stream-test.cc
M be/src/runtime/krpc-data-stream-mgr.cc
2 files changed, 25 insertions(+), 23 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/9447/1
-- 
To view, visit http://gerrit.cloudera.org:8080/9447
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I804cef7cc991007ec44375f8eac804aa2df46bd7
Gerrit-Change-Number: 9447
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho <kw...@cloudera.com>

[Impala-ASF-CR] IMPALA-6512: Maintenace thread period should respect FLAGS datastream sender timeout ms

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9447 )

Change subject: IMPALA-6512: Maintenace thread period should respect FLAGS_datastream_sender_timeout_ms
......................................................................


Patch Set 2: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/9447
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I804cef7cc991007ec44375f8eac804aa2df46bd7
Gerrit-Change-Number: 9447
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Comment-Date: Tue, 27 Feb 2018 22:29:30 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6512: Maintenace thread period should respect FLAGS datastream sender timeout ms

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/9447 )

Change subject: IMPALA-6512: Maintenace thread period should respect FLAGS_datastream_sender_timeout_ms
......................................................................


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/9447/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9447/1//COMMIT_MSG@7
PS1, Line 7: IMPALA-6512: Maintenace thread period should respect FLAGS_datastream_sender_timeout_ms
is that the right JIRA? Wasn't that one addressed with Lars' error message change?


http://gerrit.cloudera.org:8080/#/c/9447/1/be/src/runtime/krpc-data-stream-mgr.cc
File be/src/runtime/krpc-data-stream-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/9447/1/be/src/runtime/krpc-data-stream-mgr.cc@367
PS1, Line 367: DCHECK_GE(STREAM_EXPIRATION_TIME_MS, 10000);
what does this mean? what's the reason for not using STREAM_EXPIRATION_TIME_MS directly below?


http://gerrit.cloudera.org:8080/#/c/9447/1/be/src/runtime/krpc-data-stream-mgr.cc@369
PS1, Line 369: (1
is 1ms a good lower bound, or should we make it 10? Though, I guess it only matters when --datastream_sender_timeout_ms is set to something really really low.


http://gerrit.cloudera.org:8080/#/c/9447/1/be/src/runtime/krpc-data-stream-mgr.cc@375
PS1, Line 375: int64_t now = MonotonicMillis();
maybe hoist that up, and use the same 'now' value for the entire iteration.



-- 
To view, visit http://gerrit.cloudera.org:8080/9447
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I804cef7cc991007ec44375f8eac804aa2df46bd7
Gerrit-Change-Number: 9447
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Comment-Date: Mon, 26 Feb 2018 18:45:39 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6512: Maintenace thread period should respect FLAGS datastream sender timeout ms

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9447 )

Change subject: IMPALA-6512: Maintenace thread period should respect FLAGS_datastream_sender_timeout_ms
......................................................................

IMPALA-6512: Maintenace thread period should respect FLAGS_datastream_sender_timeout_ms

Previously, the maintenance thread in KrpcDataStreamMgr will wake up
once every 10s to check for early senders which timed out. However,
FLAGS_datastream_sender_timeout_ms can be set to a value smaller than
10s. In which case, we may not notice the timed-out senders until much
later. This change fixes the problem by changing the wakeup period of
the maintenance thread to be min of FLAGS_datastream_sender_timeout_ms/2
and 10000 milliseconds. Also, this change addresses a TODO in the code by
moving the check for closed receivers in the 'closed_stream_cache_' from
the handler for EOS RPC to the maintenance thread's loop.

Testing done: test_exchange_large_delay.py which failed previously
when KRPC is enabled. Core debug builds with KRPC enabled.

Change-Id: I804cef7cc991007ec44375f8eac804aa2df46bd7
Reviewed-on: http://gerrit.cloudera.org:8080/9447
Reviewed-by: Dan Hecht <dh...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M be/src/runtime/data-stream-test.cc
M be/src/runtime/krpc-data-stream-mgr.cc
2 files changed, 25 insertions(+), 24 deletions(-)

Approvals:
  Dan Hecht: Looks good to me, approved
  Impala Public Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/9447
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I804cef7cc991007ec44375f8eac804aa2df46bd7
Gerrit-Change-Number: 9447
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>

[Impala-ASF-CR] IMPALA-6512: Maintenace thread period should respect FLAGS datastream sender timeout ms

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Hello Dan Hecht, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9447

to look at the new patch set (#2).

Change subject: IMPALA-6512: Maintenace thread period should respect FLAGS_datastream_sender_timeout_ms
......................................................................

IMPALA-6512: Maintenace thread period should respect FLAGS_datastream_sender_timeout_ms

Previously, the maintenance thread in KrpcDataStreamMgr will wake up
once every 10s to check for early senders which timed out. However,
FLAGS_datastream_sender_timeout_ms can be set to a value smaller than
10s. In which case, we may not notice the timed-out senders until much
later. This change fixes the problem by changing the wakeup period of
the maintenance thread to be min of FLAGS_datastream_sender_timeout_ms/2
and 10000 milliseconds. Also, this change addresses a TODO in the code by
moving the check for closed receivers in the 'closed_stream_cache_' from
the handler for EOS RPC to the maintenance thread's loop.

Testing done: test_exchange_large_delay.py which failed previously
when KRPC is enabled. Core debug builds with KRPC enabled.

Change-Id: I804cef7cc991007ec44375f8eac804aa2df46bd7
---
M be/src/runtime/data-stream-test.cc
M be/src/runtime/krpc-data-stream-mgr.cc
2 files changed, 25 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/9447/2
-- 
To view, visit http://gerrit.cloudera.org:8080/9447
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I804cef7cc991007ec44375f8eac804aa2df46bd7
Gerrit-Change-Number: 9447
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>

[Impala-ASF-CR] IMPALA-6512: Maintenace thread period should respect FLAGS datastream sender timeout ms

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/9447 )

Change subject: IMPALA-6512: Maintenace thread period should respect FLAGS_datastream_sender_timeout_ms
......................................................................


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/9447/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9447/1//COMMIT_MSG@7
PS1, Line 7: IMPALA-6512: Maintenace thread period should respect FLAGS_datastream_sender_timeout_ms
> is that the right JIRA? Wasn't that one addressed with Lars' error message 
Yes, I originally thought that it was just the error message not matching. However, for some reasons, we weren't catching this other issue. May be the custom cluster test didn't properly enable KRPC before Lars' recent changes to add a flag to enable KRPC during testing.


http://gerrit.cloudera.org:8080/#/c/9447/1/be/src/runtime/krpc-data-stream-mgr.cc
File be/src/runtime/krpc-data-stream-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/9447/1/be/src/runtime/krpc-data-stream-mgr.cc@367
PS1, Line 367: DCHECK_GE(STREAM_EXPIRATION_TIME_MS, 10000);
> what does this mean? what's the reason for not using STREAM_EXPIRATION_TIME
Not sure if it's entirely desirable to use the min(FLAGS_datastream_sender_timeout_ms, STREAM_EXPIRATION_TIME_MS) / 2 as the period of the maintenance thread. While this provides the guarantee that the thread is late by at most half of the timeout period, this also means the amount of lateness is directly proportional to the timeout period. This seems quite inappropriate if the sender sets a timeout to a large value (e.g. 10 mins) and it only gets notified 14~15 mins later. Having the upper bound of 10 seconds seem to make the behavior more deterministic.

I removed this DCHECK as STREAM_EXPIRATION_TIME_MS is an internal timeout and there is no serious need to have time guarantee on how late we remove expired entries as long as garbage collection happens periodically.


http://gerrit.cloudera.org:8080/#/c/9447/1/be/src/runtime/krpc-data-stream-mgr.cc@369
PS1, Line 369: (1
> is 1ms a good lower bound, or should we make it 10? Though, I guess it only
FLAGS_datastream_sender_timeout_ms has millisecond granularity so waking up once every millisecond should work even for the min possible value it's set to.

Yes, it only matters if FLAGS_datastream_sender_timeout_ms is set to very small value which I believe is uncommon.


http://gerrit.cloudera.org:8080/#/c/9447/1/be/src/runtime/krpc-data-stream-mgr.cc@375
PS1, Line 375: int64_t now = MonotonicMillis();
> maybe hoist that up, and use the same 'now' value for the entire iteration.
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/9447
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I804cef7cc991007ec44375f8eac804aa2df46bd7
Gerrit-Change-Number: 9447
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Comment-Date: Tue, 27 Feb 2018 07:58:35 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6512: Maintenace thread period should respect FLAGS datastream sender timeout ms

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9447 )

Change subject: IMPALA-6512: Maintenace thread period should respect FLAGS_datastream_sender_timeout_ms
......................................................................


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2025/


-- 
To view, visit http://gerrit.cloudera.org:8080/9447
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I804cef7cc991007ec44375f8eac804aa2df46bd7
Gerrit-Change-Number: 9447
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Comment-Date: Tue, 27 Feb 2018 18:47:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6512: Maintenace thread period should respect FLAGS datastream sender timeout ms

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/9447 )

Change subject: IMPALA-6512: Maintenace thread period should respect FLAGS_datastream_sender_timeout_ms
......................................................................


Patch Set 2: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/9447
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I804cef7cc991007ec44375f8eac804aa2df46bd7
Gerrit-Change-Number: 9447
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Comment-Date: Tue, 27 Feb 2018 16:14:57 +0000
Gerrit-HasComments: No