You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Joe McDonnell (Code Review)" <ge...@cloudera.org> on 2019/04/09 15:53:16 UTC

[Impala-ASF-CR] IMPALA-8322: Add periodic dirty check of done in ThreadTokenAvailableCb

Joe McDonnell has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12968


Change subject: IMPALA-8322: Add periodic dirty check of done_ in ThreadTokenAvailableCb
......................................................................

IMPALA-8322: Add periodic dirty check of done_ in ThreadTokenAvailableCb

When HdfsScanNode is cancelled or hits and error, SetDoneInternal() holds
HdfsScanNode::lock_ while it runs RequestContext::Cancel(), which can
wait on IO threads to complete outstanding IOs. This can cause a cascade
of blocked threads that causes Prepare() to take a significant time and
cause datastream sender timeouts.

The specific scenario seen has this set of threads:
Thread 1: A DiskIoMgr thread is blocked on IO in hdfsOpenFile() or
  hdfsRead(), holding HdfsFileReader::lock_.
Thread 2: An HDFS scanner thread is blocked in
  HdfsScanNode::SetDoneInternal() -> RequestContext::Cancel()
  -> ScanRange::CancelInternal(), waiting on HdfsFileReader::lock_.
  It is holding HdfsScanNode::lock_.
Thread 3: A thread in ThreadResourceMgr::DestroyPool() -> (a few layers)
  -> HdfsScanNode::ThreadTokenAvailableCb() is blocked waiting on
  HdfsScanNode::lock_ while holding ThreadResourceMgr::lock_.
Thread 4: A thread in FragmentInstanceState::Prepare()
  -> RuntimeState::Init() -> ThreadResourceMgr::CreatePool() is blocked
  waiting on ThreadResourceMgr::lock_.

When Prepare() takes a significant time, datastream senders will time out
waiting for the datastream receivers to start up. This causes failed
queries. S3 has higher latencies for IO and does not have file handle
caching, so S3 is more susceptible to this issue than other platforms.

This changes HdfsScanNode::ThreadTokenAvailableCb() to periodically do a
dirty check of HdfsScanNode::done_ when waiting to acquire the lock. This
avoids the blocking experienced by Thread 3 in the example above.

Testing:
 - Ran tests on normal HDFS and repeatedly on S3

Change-Id: I4881a3e5bfda64e8d60af95ad13b450cf7f8c130
---
M be/src/common/names.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
3 files changed, 28 insertions(+), 16 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/12968/1
-- 
To view, visit http://gerrit.cloudera.org:8080/12968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I4881a3e5bfda64e8d60af95ad13b450cf7f8c130
Gerrit-Change-Number: 12968
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8322: Add periodic dirty check of done in ThreadTokenAvailableCb

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12968 )

Change subject: IMPALA-8322: Add periodic dirty check of done_ in ThreadTokenAvailableCb
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12968/1/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/12968/1/be/src/exec/hdfs-scan-node.cc@561
PS1, Line 561:   // TODO: Cancelling the RequestContext will wait on in-flight IO. We should
> RequestContext::Cancel() calls ScanRange::CancelInternal() - I might miss i
Ah I see what you mean and why it's misleading.

So when I wrote that comment I was trying to say that "in flight reads have finished" is *not* a post-condition of CancelInternal(). But I can see that that comment could be taken to imply some kind of non-blocking behaviour, and it *can* get blocked on an in-flight I/O because of file_reader_->lock().

So yeah, it could be clarified for sure.



-- 
To view, visit http://gerrit.cloudera.org:8080/12968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4881a3e5bfda64e8d60af95ad13b450cf7f8c130
Gerrit-Change-Number: 12968
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 09 Apr 2019 22:35:12 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8322: Add periodic dirty check of done in ThreadTokenAvailableCb

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12968 )

Change subject: IMPALA-8322: Add periodic dirty check of done_ in ThreadTokenAvailableCb
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/12968/1/be/src/exec/hdfs-scan-node.h
File be/src/exec/hdfs-scan-node.h:

http://gerrit.cloudera.org:8080/#/c/12968/1/be/src/exec/hdfs-scan-node.h@128
PS1, Line 128:   bool done_ = false;
Can we switch this to an atomic? It'll be easier to reason about the safeness of the "dirty" read. I don't see a real downside to the change.


http://gerrit.cloudera.org:8080/#/c/12968/1/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/12968/1/be/src/exec/hdfs-scan-node.cc@561
PS1, Line 561:   // TODO: Cancelling the RequestContext will wait on in-flight IO. We should
> The comment on ScanRange::CancelInternal() claims that it doesn't wait for 
I think the comment is actually correct - the waiting happens in ScanRange::Cancel(). Unless I missed something.



-- 
To view, visit http://gerrit.cloudera.org:8080/12968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4881a3e5bfda64e8d60af95ad13b450cf7f8c130
Gerrit-Change-Number: 12968
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 09 Apr 2019 21:36:36 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8322: Add periodic dirty check of done in ThreadTokenAvailableCb

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12968 )

Change subject: IMPALA-8322: Add periodic dirty check of done_ in ThreadTokenAvailableCb
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/2747/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/12968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4881a3e5bfda64e8d60af95ad13b450cf7f8c130
Gerrit-Change-Number: 12968
Gerrit-PatchSet: 4
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 Apr 2019 00:49:25 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8322: Add periodic dirty check of done in ThreadTokenAvailableCb

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12968 )

Change subject: IMPALA-8322: Add periodic dirty check of done_ in ThreadTokenAvailableCb
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/2746/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/12968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4881a3e5bfda64e8d60af95ad13b450cf7f8c130
Gerrit-Change-Number: 12968
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 Apr 2019 00:40:53 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8322: Add periodic dirty check of done in ThreadTokenAvailableCb

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Hello Michael Ho, Lars Volker, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12968

to look at the new patch set (#5).

Change subject: IMPALA-8322: Add periodic dirty check of done_ in ThreadTokenAvailableCb
......................................................................

IMPALA-8322: Add periodic dirty check of done_ in ThreadTokenAvailableCb

When HdfsScanNode is cancelled or hits an error, SetDoneInternal() holds
HdfsScanNode::lock_ while it runs RequestContext::Cancel(), which can
wait on IO threads to complete outstanding IOs. This can cause a cascade
of blocked threads that causes Prepare() to take a significant time and
cause datastream sender timeouts.

The specific scenario seen has this set of threads:
Thread 1: A DiskIoMgr thread is blocked on IO in hdfsOpenFile() or
  hdfsRead(), holding HdfsFileReader::lock_.
Thread 2: An HDFS scanner thread is blocked in
  HdfsScanNode::SetDoneInternal() -> RequestContext::Cancel()
  -> ScanRange::CancelInternal(), waiting on HdfsFileReader::lock_.
  It is holding HdfsScanNode::lock_.
Thread 3: A thread in ThreadResourceMgr::DestroyPool() -> (a few layers)
  -> HdfsScanNode::ThreadTokenAvailableCb() is blocked waiting on
  HdfsScanNode::lock_ while holding ThreadResourceMgr::lock_.
Thread 4: A thread in FragmentInstanceState::Prepare()
  -> RuntimeState::Init() -> ThreadResourceMgr::CreatePool() is blocked
  waiting on ThreadResourceMgr::lock_.

When Prepare() takes a significant time, datastream senders will time out
waiting for the datastream receivers to start up. This causes failed
queries. S3 has higher latencies for IO and does not have file handle
caching, so S3 is more susceptible to this issue than other platforms.

This changes HdfsScanNode::ThreadTokenAvailableCb() to periodically do a
dirty check of HdfsScanNode::done_ when waiting to acquire the lock. This
avoids the blocking experienced by Thread 3 in the example above.

Testing:
 - Ran tests on normal HDFS and repeatedly on S3

Change-Id: I4881a3e5bfda64e8d60af95ad13b450cf7f8c130
---
M be/src/common/names.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/runtime/io/request-ranges.h
4 files changed, 51 insertions(+), 31 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/12968/5
-- 
To view, visit http://gerrit.cloudera.org:8080/12968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4881a3e5bfda64e8d60af95ad13b450cf7f8c130
Gerrit-Change-Number: 12968
Gerrit-PatchSet: 5
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-8322: Add periodic dirty check of done in ThreadTokenAvailableCb

Posted by "Lars Volker (Code Review)" <ge...@cloudera.org>.
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/12968 )

Change subject: IMPALA-8322: Add periodic dirty check of done_ in ThreadTokenAvailableCb
......................................................................


Patch Set 5: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/12968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4881a3e5bfda64e8d60af95ad13b450cf7f8c130
Gerrit-Change-Number: 12968
Gerrit-PatchSet: 5
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Sat, 13 Apr 2019 20:27:00 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8322: Add periodic dirty check of done in ThreadTokenAvailableCb

Posted by "Lars Volker (Code Review)" <ge...@cloudera.org>.
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/12968 )

Change subject: IMPALA-8322: Add periodic dirty check of done_ in ThreadTokenAvailableCb
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12968/1/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/12968/1/be/src/exec/hdfs-scan-node.cc@561
PS1, Line 561:   // TODO: Cancelling the RequestContext will wait on in-flight IO. We should
The comment on ScanRange::CancelInternal() claims that it doesn't wait for in-flight IO. Does it need to be updated?



-- 
To view, visit http://gerrit.cloudera.org:8080/12968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4881a3e5bfda64e8d60af95ad13b450cf7f8c130
Gerrit-Change-Number: 12968
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 09 Apr 2019 16:39:35 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8322: Add periodic dirty check of done in ThreadTokenAvailableCb

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/12968 )

Change subject: IMPALA-8322: Add periodic dirty check of done_ in ThreadTokenAvailableCb
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/12968/1/be/src/exec/hdfs-scan-node.h
File be/src/exec/hdfs-scan-node.h:

http://gerrit.cloudera.org:8080/#/c/12968/1/be/src/exec/hdfs-scan-node.h@128
PS1, Line 128:   bool done_ = false;
> Can we switch this to an atomic? It'll be easier to reason about the safene
That makes sense. I switched this to an AtomicBool.


http://gerrit.cloudera.org:8080/#/c/12968/1/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/12968/1/be/src/exec/hdfs-scan-node.cc@561
PS1, Line 561:   // TODO: Cancelling the RequestContext will wait on in-flight IO. We should
> Ah I see what you mean and why it's misleading.
I updated that comment in request-ranges.h. Let me know if there is a way it could be clearer.



-- 
To view, visit http://gerrit.cloudera.org:8080/12968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4881a3e5bfda64e8d60af95ad13b450cf7f8c130
Gerrit-Change-Number: 12968
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 09 Apr 2019 23:52:41 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8322: Add periodic dirty check of done in ThreadTokenAvailableCb

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/12968 )

Change subject: IMPALA-8322: Add periodic dirty check of done_ in ThreadTokenAvailableCb
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12968/3/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/12968/3/be/src/exec/hdfs-scan-node.cc@295
PS3, Line 295: we do a dirty
             :     // check of done_ 
> This comment looks stale now
Changed this from done_ to done() (same for the comment below about "periodic check of done_")



-- 
To view, visit http://gerrit.cloudera.org:8080/12968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4881a3e5bfda64e8d60af95ad13b450cf7f8c130
Gerrit-Change-Number: 12968
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 Apr 2019 00:12:41 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8322: Add periodic dirty check of done in ThreadTokenAvailableCb

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12968 )

Change subject: IMPALA-8322: Add periodic dirty check of done_ in ThreadTokenAvailableCb
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/2700/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/12968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4881a3e5bfda64e8d60af95ad13b450cf7f8c130
Gerrit-Change-Number: 12968
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 09 Apr 2019 16:30:56 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8322: Add periodic dirty check of done in ThreadTokenAvailableCb

Posted by "Lars Volker (Code Review)" <ge...@cloudera.org>.
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/12968 )

Change subject: IMPALA-8322: Add periodic dirty check of done_ in ThreadTokenAvailableCb
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12968/3/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/12968/3/be/src/exec/hdfs-scan-node.cc@295
PS3, Line 295: we do a dirty
             :     // check of done_ 
This comment looks stale now



-- 
To view, visit http://gerrit.cloudera.org:8080/12968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4881a3e5bfda64e8d60af95ad13b450cf7f8c130
Gerrit-Change-Number: 12968
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 Apr 2019 00:04:46 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8322: Add periodic dirty check of done in ThreadTokenAvailableCb

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12968 )

Change subject: IMPALA-8322: Add periodic dirty check of done_ in ThreadTokenAvailableCb
......................................................................


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4019/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/12968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4881a3e5bfda64e8d60af95ad13b450cf7f8c130
Gerrit-Change-Number: 12968
Gerrit-PatchSet: 6
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Sun, 14 Apr 2019 02:50:55 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8322: Add periodic dirty check of done in ThreadTokenAvailableCb

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12968 )

Change subject: IMPALA-8322: Add periodic dirty check of done_ in ThreadTokenAvailableCb
......................................................................


Patch Set 6: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/12968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4881a3e5bfda64e8d60af95ad13b450cf7f8c130
Gerrit-Change-Number: 12968
Gerrit-PatchSet: 6
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Sun, 14 Apr 2019 08:11:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8322: Add periodic dirty check of done in ThreadTokenAvailableCb

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Hello Michael Ho, Lars Volker, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12968

to look at the new patch set (#3).

Change subject: IMPALA-8322: Add periodic dirty check of done_ in ThreadTokenAvailableCb
......................................................................

IMPALA-8322: Add periodic dirty check of done_ in ThreadTokenAvailableCb

When HdfsScanNode is cancelled or hits an error, SetDoneInternal() holds
HdfsScanNode::lock_ while it runs RequestContext::Cancel(), which can
wait on IO threads to complete outstanding IOs. This can cause a cascade
of blocked threads that causes Prepare() to take a significant time and
cause datastream sender timeouts.

The specific scenario seen has this set of threads:
Thread 1: A DiskIoMgr thread is blocked on IO in hdfsOpenFile() or
  hdfsRead(), holding HdfsFileReader::lock_.
Thread 2: An HDFS scanner thread is blocked in
  HdfsScanNode::SetDoneInternal() -> RequestContext::Cancel()
  -> ScanRange::CancelInternal(), waiting on HdfsFileReader::lock_.
  It is holding HdfsScanNode::lock_.
Thread 3: A thread in ThreadResourceMgr::DestroyPool() -> (a few layers)
  -> HdfsScanNode::ThreadTokenAvailableCb() is blocked waiting on
  HdfsScanNode::lock_ while holding ThreadResourceMgr::lock_.
Thread 4: A thread in FragmentInstanceState::Prepare()
  -> RuntimeState::Init() -> ThreadResourceMgr::CreatePool() is blocked
  waiting on ThreadResourceMgr::lock_.

When Prepare() takes a significant time, datastream senders will time out
waiting for the datastream receivers to start up. This causes failed
queries. S3 has higher latencies for IO and does not have file handle
caching, so S3 is more susceptible to this issue than other platforms.

This changes HdfsScanNode::ThreadTokenAvailableCb() to periodically do a
dirty check of HdfsScanNode::done_ when waiting to acquire the lock. This
avoids the blocking experienced by Thread 3 in the example above.

Testing:
 - Ran tests on normal HDFS and repeatedly on S3

Change-Id: I4881a3e5bfda64e8d60af95ad13b450cf7f8c130
---
M be/src/common/names.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/runtime/io/request-ranges.h
4 files changed, 51 insertions(+), 31 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/12968/3
-- 
To view, visit http://gerrit.cloudera.org:8080/12968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4881a3e5bfda64e8d60af95ad13b450cf7f8c130
Gerrit-Change-Number: 12968
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-8322: Add periodic dirty check of done in ThreadTokenAvailableCb

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12968 )

Change subject: IMPALA-8322: Add periodic dirty check of done_ in ThreadTokenAvailableCb
......................................................................

IMPALA-8322: Add periodic dirty check of done_ in ThreadTokenAvailableCb

When HdfsScanNode is cancelled or hits an error, SetDoneInternal() holds
HdfsScanNode::lock_ while it runs RequestContext::Cancel(), which can
wait on IO threads to complete outstanding IOs. This can cause a cascade
of blocked threads that causes Prepare() to take a significant time and
cause datastream sender timeouts.

The specific scenario seen has this set of threads:
Thread 1: A DiskIoMgr thread is blocked on IO in hdfsOpenFile() or
  hdfsRead(), holding HdfsFileReader::lock_.
Thread 2: An HDFS scanner thread is blocked in
  HdfsScanNode::SetDoneInternal() -> RequestContext::Cancel()
  -> ScanRange::CancelInternal(), waiting on HdfsFileReader::lock_.
  It is holding HdfsScanNode::lock_.
Thread 3: A thread in ThreadResourceMgr::DestroyPool() -> (a few layers)
  -> HdfsScanNode::ThreadTokenAvailableCb() is blocked waiting on
  HdfsScanNode::lock_ while holding ThreadResourceMgr::lock_.
Thread 4: A thread in FragmentInstanceState::Prepare()
  -> RuntimeState::Init() -> ThreadResourceMgr::CreatePool() is blocked
  waiting on ThreadResourceMgr::lock_.

When Prepare() takes a significant time, datastream senders will time out
waiting for the datastream receivers to start up. This causes failed
queries. S3 has higher latencies for IO and does not have file handle
caching, so S3 is more susceptible to this issue than other platforms.

This changes HdfsScanNode::ThreadTokenAvailableCb() to periodically do a
dirty check of HdfsScanNode::done_ when waiting to acquire the lock. This
avoids the blocking experienced by Thread 3 in the example above.

Testing:
 - Ran tests on normal HDFS and repeatedly on S3

Change-Id: I4881a3e5bfda64e8d60af95ad13b450cf7f8c130
Reviewed-on: http://gerrit.cloudera.org:8080/12968
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/common/names.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/runtime/io/request-ranges.h
4 files changed, 51 insertions(+), 31 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/12968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I4881a3e5bfda64e8d60af95ad13b450cf7f8c130
Gerrit-Change-Number: 12968
Gerrit-PatchSet: 7
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-8322: Add periodic dirty check of done in ThreadTokenAvailableCb

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12968 )

Change subject: IMPALA-8322: Add periodic dirty check of done_ in ThreadTokenAvailableCb
......................................................................


Patch Set 5: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/12968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4881a3e5bfda64e8d60af95ad13b450cf7f8c130
Gerrit-Change-Number: 12968
Gerrit-PatchSet: 5
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Sat, 13 Apr 2019 02:28:36 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8322: Add periodic dirty check of done in ThreadTokenAvailableCb

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Hello Michael Ho, Lars Volker, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12968

to look at the new patch set (#4).

Change subject: IMPALA-8322: Add periodic dirty check of done_ in ThreadTokenAvailableCb
......................................................................

IMPALA-8322: Add periodic dirty check of done_ in ThreadTokenAvailableCb

When HdfsScanNode is cancelled or hits an error, SetDoneInternal() holds
HdfsScanNode::lock_ while it runs RequestContext::Cancel(), which can
wait on IO threads to complete outstanding IOs. This can cause a cascade
of blocked threads that causes Prepare() to take a significant time and
cause datastream sender timeouts.

The specific scenario seen has this set of threads:
Thread 1: A DiskIoMgr thread is blocked on IO in hdfsOpenFile() or
  hdfsRead(), holding HdfsFileReader::lock_.
Thread 2: An HDFS scanner thread is blocked in
  HdfsScanNode::SetDoneInternal() -> RequestContext::Cancel()
  -> ScanRange::CancelInternal(), waiting on HdfsFileReader::lock_.
  It is holding HdfsScanNode::lock_.
Thread 3: A thread in ThreadResourceMgr::DestroyPool() -> (a few layers)
  -> HdfsScanNode::ThreadTokenAvailableCb() is blocked waiting on
  HdfsScanNode::lock_ while holding ThreadResourceMgr::lock_.
Thread 4: A thread in FragmentInstanceState::Prepare()
  -> RuntimeState::Init() -> ThreadResourceMgr::CreatePool() is blocked
  waiting on ThreadResourceMgr::lock_.

When Prepare() takes a significant time, datastream senders will time out
waiting for the datastream receivers to start up. This causes failed
queries. S3 has higher latencies for IO and does not have file handle
caching, so S3 is more susceptible to this issue than other platforms.

This changes HdfsScanNode::ThreadTokenAvailableCb() to periodically do a
dirty check of HdfsScanNode::done_ when waiting to acquire the lock. This
avoids the blocking experienced by Thread 3 in the example above.

Testing:
 - Ran tests on normal HDFS and repeatedly on S3

Change-Id: I4881a3e5bfda64e8d60af95ad13b450cf7f8c130
---
M be/src/common/names.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/runtime/io/request-ranges.h
4 files changed, 51 insertions(+), 31 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/12968/4
-- 
To view, visit http://gerrit.cloudera.org:8080/12968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4881a3e5bfda64e8d60af95ad13b450cf7f8c130
Gerrit-Change-Number: 12968
Gerrit-PatchSet: 4
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-8322: Add periodic dirty check of done in ThreadTokenAvailableCb

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12968 )

Change subject: IMPALA-8322: Add periodic dirty check of done_ in ThreadTokenAvailableCb
......................................................................


Patch Set 2: Code-Review+1

(4 comments)

I'll leave it to Lars or other reviewers to do the +2.

http://gerrit.cloudera.org:8080/#/c/12968/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12968/2//COMMIT_MSG@9
PS2, Line 9: and
nit: an ?


http://gerrit.cloudera.org:8080/#/c/12968/2//COMMIT_MSG@11
PS2, Line 11: cascade
            : of blocked threads
Makes me wonder if there are actually other similar problematic sequence in the HdfsScanNode code ?!


http://gerrit.cloudera.org:8080/#/c/12968/2/be/src/exec/hdfs-scan-node.h
File be/src/exec/hdfs-scan-node.h:

http://gerrit.cloudera.org:8080/#/c/12968/2/be/src/exec/hdfs-scan-node.h@117
PS2, Line 117:   /// together, this lock must be taken first.
Probably worth documenting briefly why this is a "timed_mutex" instead of a plain "mutex".


http://gerrit.cloudera.org:8080/#/c/12968/2/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/12968/2/be/src/exec/hdfs-scan-node.cc@292
PS2, Line 292: The lock can be held for significant periods of time if the scan
             :     // node is being cancelled.
Would it be clearer to say another thread could be holding this lock for an extended period of time. When I first read it, I kind of interpret it as this lock is held by _this_ function for a long time ?

Also, may also help to call out that this function may be called under some other locks so that's why the remedies are being done here to break out of the loop earlier if the scan node is being cancelled.



-- 
To view, visit http://gerrit.cloudera.org:8080/12968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4881a3e5bfda64e8d60af95ad13b450cf7f8c130
Gerrit-Change-Number: 12968
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 10 Apr 2019 22:25:04 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8322: Add periodic dirty check of done in ThreadTokenAvailableCb

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Hello Michael Ho, Lars Volker, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12968

to look at the new patch set (#2).

Change subject: IMPALA-8322: Add periodic dirty check of done_ in ThreadTokenAvailableCb
......................................................................

IMPALA-8322: Add periodic dirty check of done_ in ThreadTokenAvailableCb

When HdfsScanNode is cancelled or hits and error, SetDoneInternal() holds
HdfsScanNode::lock_ while it runs RequestContext::Cancel(), which can
wait on IO threads to complete outstanding IOs. This can cause a cascade
of blocked threads that causes Prepare() to take a significant time and
cause datastream sender timeouts.

The specific scenario seen has this set of threads:
Thread 1: A DiskIoMgr thread is blocked on IO in hdfsOpenFile() or
  hdfsRead(), holding HdfsFileReader::lock_.
Thread 2: An HDFS scanner thread is blocked in
  HdfsScanNode::SetDoneInternal() -> RequestContext::Cancel()
  -> ScanRange::CancelInternal(), waiting on HdfsFileReader::lock_.
  It is holding HdfsScanNode::lock_.
Thread 3: A thread in ThreadResourceMgr::DestroyPool() -> (a few layers)
  -> HdfsScanNode::ThreadTokenAvailableCb() is blocked waiting on
  HdfsScanNode::lock_ while holding ThreadResourceMgr::lock_.
Thread 4: A thread in FragmentInstanceState::Prepare()
  -> RuntimeState::Init() -> ThreadResourceMgr::CreatePool() is blocked
  waiting on ThreadResourceMgr::lock_.

When Prepare() takes a significant time, datastream senders will time out
waiting for the datastream receivers to start up. This causes failed
queries. S3 has higher latencies for IO and does not have file handle
caching, so S3 is more susceptible to this issue than other platforms.

This changes HdfsScanNode::ThreadTokenAvailableCb() to periodically do a
dirty check of HdfsScanNode::done_ when waiting to acquire the lock. This
avoids the blocking experienced by Thread 3 in the example above.

Testing:
 - Ran tests on normal HDFS and repeatedly on S3

Change-Id: I4881a3e5bfda64e8d60af95ad13b450cf7f8c130
---
M be/src/common/names.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/runtime/io/request-ranges.h
4 files changed, 45 insertions(+), 30 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/12968/2
-- 
To view, visit http://gerrit.cloudera.org:8080/12968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4881a3e5bfda64e8d60af95ad13b450cf7f8c130
Gerrit-Change-Number: 12968
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-8322: Add periodic dirty check of done in ThreadTokenAvailableCb

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12968 )

Change subject: IMPALA-8322: Add periodic dirty check of done_ in ThreadTokenAvailableCb
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/2716/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/12968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4881a3e5bfda64e8d60af95ad13b450cf7f8c130
Gerrit-Change-Number: 12968
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 10 Apr 2019 00:34:10 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8322: Add periodic dirty check of done in ThreadTokenAvailableCb

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/12968 )

Change subject: IMPALA-8322: Add periodic dirty check of done_ in ThreadTokenAvailableCb
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12968/3/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/12968/3/be/src/exec/hdfs-scan-node.cc@295
PS3, Line 295: we do a dirty
             :     // check of done_ 
> I meant, is the read still "dirty", now that done is an atomic? (sry for no
Oh, good point, removed the "dirty"



-- 
To view, visit http://gerrit.cloudera.org:8080/12968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4881a3e5bfda64e8d60af95ad13b450cf7f8c130
Gerrit-Change-Number: 12968
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 Apr 2019 23:57:34 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8322: Add periodic dirty check of done in ThreadTokenAvailableCb

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12968 )

Change subject: IMPALA-8322: Add periodic dirty check of done_ in ThreadTokenAvailableCb
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/2766/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/12968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4881a3e5bfda64e8d60af95ad13b450cf7f8c130
Gerrit-Change-Number: 12968
Gerrit-PatchSet: 5
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Sat, 13 Apr 2019 00:40:14 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8322: Add periodic dirty check of done in ThreadTokenAvailableCb

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12968 )

Change subject: IMPALA-8322: Add periodic dirty check of done_ in ThreadTokenAvailableCb
......................................................................


Patch Set 2: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/12968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4881a3e5bfda64e8d60af95ad13b450cf7f8c130
Gerrit-Change-Number: 12968
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 10 Apr 2019 16:57:07 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8322: Add periodic dirty check of done in ThreadTokenAvailableCb

Posted by "Lars Volker (Code Review)" <ge...@cloudera.org>.
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/12968 )

Change subject: IMPALA-8322: Add periodic dirty check of done_ in ThreadTokenAvailableCb
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12968/1/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/12968/1/be/src/exec/hdfs-scan-node.cc@561
PS1, Line 561:   // TODO: Cancelling the RequestContext will wait on in-flight IO. We should
> I think the comment is actually correct - the waiting happens in ScanRange:
RequestContext::Cancel() calls ScanRange::CancelInternal() - I might miss it, but I don't see where we call ScanRange::Cancel(), too.



-- 
To view, visit http://gerrit.cloudera.org:8080/12968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4881a3e5bfda64e8d60af95ad13b450cf7f8c130
Gerrit-Change-Number: 12968
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 09 Apr 2019 21:59:19 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8322: Add periodic dirty check of done in ThreadTokenAvailableCb

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/12968 )

Change subject: IMPALA-8322: Add periodic dirty check of done_ in ThreadTokenAvailableCb
......................................................................


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/12968/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12968/2//COMMIT_MSG@9
PS2, Line 9: and
> nit: an ?
Fixed


http://gerrit.cloudera.org:8080/#/c/12968/2//COMMIT_MSG@11
PS2, Line 11: cascade
            : of blocked threads
> Makes me wonder if there are actually other similar problematic sequence in
I'm thinking about whether there are things to do to detect these types of problems. The problem with this case is that it really required multiple different conditions (slow IO, lots of cancels) to hit it.


http://gerrit.cloudera.org:8080/#/c/12968/2/be/src/exec/hdfs-scan-node.h
File be/src/exec/hdfs-scan-node.h:

http://gerrit.cloudera.org:8080/#/c/12968/2/be/src/exec/hdfs-scan-node.h@117
PS2, Line 117:   /// together, this lock must be taken first.
> Probably worth documenting briefly why this is a "timed_mutex" instead of a
Added a couple sentences about why it is a timed_mutex and where we use a timeout.


http://gerrit.cloudera.org:8080/#/c/12968/2/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/12968/2/be/src/exec/hdfs-scan-node.cc@292
PS2, Line 292: The lock can be held for significant periods of time if the scan
             :     // node is being cancelled.
> Would it be clearer to say another thread could be holding this lock for an
Good point, fixed up the comment to clarify that this would be blocked on another thread and that it matters because we can hold locks that block other threads.



-- 
To view, visit http://gerrit.cloudera.org:8080/12968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4881a3e5bfda64e8d60af95ad13b450cf7f8c130
Gerrit-Change-Number: 12968
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 11 Apr 2019 23:56:54 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8322: Add periodic dirty check of done in ThreadTokenAvailableCb

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12968 )

Change subject: IMPALA-8322: Add periodic dirty check of done_ in ThreadTokenAvailableCb
......................................................................


Patch Set 6: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/12968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4881a3e5bfda64e8d60af95ad13b450cf7f8c130
Gerrit-Change-Number: 12968
Gerrit-PatchSet: 6
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Sun, 14 Apr 2019 02:50:54 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8322: Add periodic dirty check of done in ThreadTokenAvailableCb

Posted by "Lars Volker (Code Review)" <ge...@cloudera.org>.
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/12968 )

Change subject: IMPALA-8322: Add periodic dirty check of done_ in ThreadTokenAvailableCb
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12968/3/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/12968/3/be/src/exec/hdfs-scan-node.cc@295
PS3, Line 295: we do a dirty
             :     // check of done()
> Changed this from done_ to done() (same for the comment below about "period
I meant, is the read still "dirty", now that done is an atomic? (sry for not being more clear)



-- 
To view, visit http://gerrit.cloudera.org:8080/12968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4881a3e5bfda64e8d60af95ad13b450cf7f8c130
Gerrit-Change-Number: 12968
Gerrit-PatchSet: 4
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 Apr 2019 23:38:22 +0000
Gerrit-HasComments: Yes