You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Fang-Yu Rao (Code Review)" <ge...@cloudera.org> on 2020/01/03 16:39:06 UTC

[Impala-ASF-CR] IMPALA-9154: Make runtime filter propagation asynchronous

Fang-Yu Rao has uploaded this change for review. ( http://gerrit.cloudera.org:8080/14975


Change subject: IMPALA-9154: Make runtime filter propagation asynchronous
......................................................................

IMPALA-9154: Make runtime filter propagation asynchronous

This patch fixes a bug introduced by IMPALA-7984 that ports the
functions implementing the aggregation and propagation of runtime
filters from Thrift RPC to KRPC.

Specifically, in IMPALA-7984, the propagation of an aggregated
runtime filter was implemented using the synchronous KRPC. Hence, when
there is a very limited number of KRPC threads for Impala's data stream
service, e.g., 1, there will be a deadlock if the node running the
Coordinator is trying to propagate the aggregated filter to the same
node running the Coordinator since there is no available thread to
receive the aggregated filter.

This patch makes the propagation of an aggregated runtime filter
asynchronous to address the issue described above. To prevent the
memory consumed by the aggregated filter from being reclaimed when the
aggregated filter is still referenced by some inflight KRPC's, we add an
additional field in the class Coordinator::FilterState to keep track of
the number of inflight KRPC's for the propagation of this aggregated
filter to make sure that we will reclaim the memory only when all the
associated KRPC's have completed. Moreover, when ReleaseExecResources()
is invoked by the Coordinator to release all the resources associated
with query execution, including the memory consumed by the aggregated
runtime filters, we make sure the consumed memory by the aggregated
filters is released only when the inflight KRPC's associated with each
aggregated filter have finished.

Testing:
- Passed primitive_many_fragments.test with the database tpch30 in an
  Impala minicluster started with the parameter
  --impalad_args=--datastream_service_num_svc_threads=1.
- Passed the exhaustive tests in the DEBUG build.
- Passed the core tests in the ASAN build.

Change-Id: Ifb6726d349be701f3a0602b2ad5a934082f188a0
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/runtime-filter-bank.cc
6 files changed, 147 insertions(+), 45 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/14975/1
-- 
To view, visit http://gerrit.cloudera.org:8080/14975
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ifb6726d349be701f3a0602b2ad5a934082f188a0
Gerrit-Change-Number: 14975
Gerrit-PatchSet: 1
Gerrit-Owner: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-9154: Make runtime filter propagation asynchronous

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/14975 )

Change subject: IMPALA-9154: Make runtime filter propagation asynchronous
......................................................................


Patch Set 7: Code-Review+2

Thanks! I'll submit this for merging once the Jenkins server is working again


-- 
To view, visit http://gerrit.cloudera.org:8080/14975
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb6726d349be701f3a0602b2ad5a934082f188a0
Gerrit-Change-Number: 14975
Gerrit-PatchSet: 7
Gerrit-Owner: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 16 Jan 2020 22:52:21 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9154: Make runtime filter propagation asynchronous

Posted by "Fang-Yu Rao (Code Review)" <ge...@cloudera.org>.
Fang-Yu Rao has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/14975 )

Change subject: IMPALA-9154: Make runtime filter propagation asynchronous
......................................................................

IMPALA-9154: Make runtime filter propagation asynchronous

This patch fixes a bug introduced by IMPALA-7984 that ports the
functions implementing the aggregation and propagation of runtime
filters from Thrift RPC to KRPC.

Specifically, in IMPALA-7984, the propagation of an aggregated
runtime filter was implemented using the synchronous KRPC. Hence, when
there is a very limited number of KRPC threads for Impala's data stream
service, e.g., 1, there will be a deadlock if the node running the
Coordinator is trying to propagate the aggregated filter to the same
node running the Coordinator since there is no available thread to
receive the aggregated filter.

This patch makes the propagation of an aggregated runtime filter
asynchronous to address the issue described above. To prevent the
memory consumed by the aggregated filter from being reclaimed when the
aggregated filter is still referenced by some inflight KRPC's, we add an
additional field in the class Coordinator::FilterState to keep track of
the number of inflight KRPC's for the propagation of this aggregated
filter to make sure that we will reclaim the memory only when all the
associated KRPC's have completed. Moreover, when ReleaseExecResources()
is invoked by the Coordinator to release all the resources associated
with query execution, including the memory consumed by the aggregated
runtime filters, we make sure the consumed memory by the aggregated
filters is released only when the inflight KRPC's associated with each
aggregated filter have finished.

Testing:
- Passed primitive_many_fragments.test with the database tpch30 in an
  Impala minicluster started with the parameter
  --impalad_args=--datastream_service_num_svc_threads=1.
- Passed the exhaustive tests in the DEBUG build.

Change-Id: Ifb6726d349be701f3a0602b2ad5a934082f188a0
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
5 files changed, 150 insertions(+), 71 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/14975/2
-- 
To view, visit http://gerrit.cloudera.org:8080/14975
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifb6726d349be701f3a0602b2ad5a934082f188a0
Gerrit-Change-Number: 14975
Gerrit-PatchSet: 2
Gerrit-Owner: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-9154: Make runtime filter propagation asynchronous

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/14975 )

Change subject: IMPALA-9154: Make runtime filter propagation asynchronous
......................................................................


Patch Set 6:

(2 comments)

Patch looks good, thanks for your work on this! Just two more tiny things, and I think we can go ahead and submit this

http://gerrit.cloudera.org:8080/#/c/14975/6/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/14975/6/be/src/runtime/coordinator.cc@1142
PS6, Line 1142:   DCHECK(it != filter_routing_table_->id_to_filter.end());
This will need to be moved above the 'if' or it can never be hit.

Or, another thing we do sometimes to make it clearer would be to just do a DCHECK(false) inside the 'if'.

Either way, it'd be good to also have a brief comment saying that this shouldn't be possible.


http://gerrit.cloudera.org:8080/#/c/14975/6/be/src/runtime/coordinator.cc@1192
PS6, Line 1192: (
nit: sort of weird to have a '(' by itself like this, maybe move it to the next line, or just remove the parenthesis entirely



-- 
To view, visit http://gerrit.cloudera.org:8080/14975
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb6726d349be701f3a0602b2ad5a934082f188a0
Gerrit-Change-Number: 14975
Gerrit-PatchSet: 6
Gerrit-Owner: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 16 Jan 2020 19:32:04 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9154: Make runtime filter propagation asynchronous

Posted by "Fang-Yu Rao (Code Review)" <ge...@cloudera.org>.
Fang-Yu Rao has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/14975 )

Change subject: IMPALA-9154: Make runtime filter propagation asynchronous
......................................................................

IMPALA-9154: Make runtime filter propagation asynchronous

This patch fixes a bug introduced by IMPALA-7984 that ports the
functions implementing the aggregation and propagation of runtime
filters from Thrift RPC to KRPC.

Specifically, in IMPALA-7984, the propagation of an aggregated
runtime filter was implemented using the synchronous KRPC. Hence, when
there is a very limited number of KRPC threads for Impala's data stream
service, e.g., 1, there will be a deadlock if the node running the
Coordinator is trying to propagate the aggregated filter to the same
node running the Coordinator since there is no available thread to
receive the aggregated filter.

This patch makes the propagation of an aggregated runtime filter
asynchronous to address the issue described above. To prevent the
memory consumed by the aggregated filter from being reclaimed when the
aggregated filter is still referenced by some inflight KRPC's, we add an
additional field in the class Coordinator::FilterState to keep track of
the number of inflight KRPC's for the propagation of this aggregated
filter to make sure that we will reclaim the memory only when all the
associated KRPC's have completed. Moreover, when ReleaseExecResources()
is invoked by the Coordinator to release all the resources associated
with query execution, including the memory consumed by the aggregated
runtime filters, we make sure the consumed memory by the aggregated
filters is released only when the inflight KRPC's associated with each
aggregated filter have finished.

Testing:
- Passed primitive_many_fragments.test with the database tpch30 in an
  Impala minicluster started with the parameter
  --impalad_args=--datastream_service_num_svc_threads=1.
- Passed the exhaustive tests in the DEBUG build.
- Passed the core tests in the ASAN build.

Change-Id: Ifb6726d349be701f3a0602b2ad5a934082f188a0
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
5 files changed, 158 insertions(+), 75 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/14975/4
-- 
To view, visit http://gerrit.cloudera.org:8080/14975
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifb6726d349be701f3a0602b2ad5a934082f188a0
Gerrit-Change-Number: 14975
Gerrit-PatchSet: 4
Gerrit-Owner: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-9154: Make runtime filter propagation asynchronous

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14975 )

Change subject: IMPALA-9154: Make runtime filter propagation asynchronous
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5431/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14975
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb6726d349be701f3a0602b2ad5a934082f188a0
Gerrit-Change-Number: 14975
Gerrit-PatchSet: 6
Gerrit-Owner: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 14 Jan 2020 23:30:06 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9154: Make runtime filter propagation asynchronous

Posted by "Fang-Yu Rao (Code Review)" <ge...@cloudera.org>.
Fang-Yu Rao has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/14975 )

Change subject: IMPALA-9154: Make runtime filter propagation asynchronous
......................................................................

IMPALA-9154: Make runtime filter propagation asynchronous

This patch fixes a bug introduced by IMPALA-7984 that ports the
functions implementing the aggregation and propagation of runtime
filters from Thrift RPC to KRPC.

Specifically, in IMPALA-7984, the propagation of an aggregated
runtime filter was implemented using the synchronous KRPC. Hence, when
there is a very limited number of KRPC threads for Impala's data stream
service, e.g., 1, there will be a deadlock if the node running the
Coordinator is trying to propagate the aggregated filter to the same
node running the Coordinator since there is no available thread to
receive the aggregated filter.

This patch makes the propagation of an aggregated runtime filter
asynchronous to address the issue described above. To prevent the
memory consumed by the aggregated filter from being reclaimed when the
aggregated filter is still referenced by some inflight KRPC's, we add an
additional field in the class Coordinator::FilterState to keep track of
the number of inflight KRPC's for the propagation of this aggregated
filter to make sure that we will reclaim the memory only when all the
associated KRPC's have completed. Moreover, when ReleaseExecResources()
is invoked by the Coordinator to release all the resources associated
with query execution, including the memory consumed by the aggregated
runtime filters, we make sure the consumed memory by the aggregated
filters is released only when the inflight KRPC's associated with each
aggregated filter have finished.

Testing:
- Passed primitive_many_fragments.test with the database tpch30 in an
  Impala minicluster started with the parameter
  --impalad_args=--datastream_service_num_svc_threads=1.
- Passed the exhaustive tests in the DEBUG build.
- Passed the core tests in the ASAN build.

Change-Id: Ifb6726d349be701f3a0602b2ad5a934082f188a0
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
5 files changed, 179 insertions(+), 75 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/14975/3
-- 
To view, visit http://gerrit.cloudera.org:8080/14975
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifb6726d349be701f3a0602b2ad5a934082f188a0
Gerrit-Change-Number: 14975
Gerrit-PatchSet: 3
Gerrit-Owner: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-9154: Make runtime filter propagation asynchronous

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/14975 )

Change subject: IMPALA-9154: Make runtime filter propagation asynchronous
......................................................................


Patch Set 2:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/14975/2/be/src/runtime/coordinator-backend-state.h
File be/src/runtime/coordinator-backend-state.h:

http://gerrit.cloudera.org:8080/#/c/14975/2/be/src/runtime/coordinator-backend-state.h@21
PS2, Line 21: #include <condition_variable>
I don't think you need this here


http://gerrit.cloudera.org:8080/#/c/14975/2/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/14975/2/be/src/runtime/coordinator-backend-state.cc@593
PS2, Line 593:     if (state->num_inflight_rpcs() == 0) {
Its a little subtle why this is correct - since we hold FilterState::lock_ while we issue all of the PublishFilter rpcs, for us to be able to get the lock again here on line 591 means that all PublishFilters that will be issued for this FilterState must have already been issued, this filter has been disabled, and if we reach 0 we know that there won't be any more PublishFilters issued.

Can you add a DCHECK(state->disabled()) and a brief comment mentioning this?


http://gerrit.cloudera.org:8080/#/c/14975/2/be/src/runtime/coordinator-filter-state.h
File be/src/runtime/coordinator-filter-state.h:

http://gerrit.cloudera.org:8080/#/c/14975/2/be/src/runtime/coordinator-filter-state.h@58
PS2, Line 58: /// Once a filter is disabled, subsequent updates for that filter are ignored.
We should document the thread safety of this class now, eg. if you take my advice from some of my other comments it will be something like "This class is not thread safe. Callers must always take 'lock()' themselves when calling any FilterState functions if thread safety is needed"


http://gerrit.cloudera.org:8080/#/c/14975/2/be/src/runtime/coordinator-filter-state.h@92
PS2, Line 92: get_lock()
just 'lock()'


http://gerrit.cloudera.org:8080/#/c/14975/2/be/src/runtime/coordinator-filter-state.h@161
PS2, Line 161: update_filter_done_cv_
publish_filter_done_cv_


http://gerrit.cloudera.org:8080/#/c/14975/2/be/src/runtime/coordinator-filter-state.h@178
PS2, Line 178:   SpinLock update_lock;
This isn't used anywhere anymore, right?


http://gerrit.cloudera.org:8080/#/c/14975/2/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:

http://gerrit.cloudera.org:8080/#/c/14975/2/be/src/runtime/coordinator.h@100
PS2, Line 100: /// Lock ordering: (lower-numbered acquired before higher-numbered)
Please update this comment to reflect the new protocol - what's listed here as filter_lock_ was replaced by FilterRoutingTable::lock in a previous patch so you can go ahead and update that too, and filter_update_lock_ has been replaced with FilterState::lock_ (and of course check that we are in fact following the ordering shown here for those locks)


http://gerrit.cloudera.org:8080/#/c/14975/2/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/14975/2/be/src/runtime/coordinator.cc@1050
PS2, Line 1050:   for (auto& filter : filter_routing_table_->id_to_filter) {
              :     filter.second.WaitForPublishFilter();
              :   }
              : 
              :   for (auto& filter : filter_routing_table_->id_to_filter) {
              :     FilterState* state = &filter.second;
              :     state->DisableAndRelease(filter_mem_tracker_);
              :   }
> I have noticed that this patch failed the core tests in the ASAN build due 
Yes - you definitely have to do WaitForPublishFilter() and DisableAndRelease() atomically with respect to FilterState::lock_ somehow, since otherwise after caling WaitForPublishFilter() you could get another call to UpdateFilter() which will attempt to make PublishFilter rpcs even though we think all the PublishFilter rpcs for this FilterState are done.

Possibly the most straight forward way of doing that (if you follow my other comments) is to not have WaitForPublishFilter() take FilterState::lock_, but instead take FilterState::get_lock() ourselves here in the 'for' loop and hold it while calling both WaitForPublishFilter() and DisableAndRelease()


http://gerrit.cloudera.org:8080/#/c/14975/2/be/src/runtime/coordinator.cc@1315
PS2, Line 1315:   unique_lock<SpinLock> l(lock_);
I think that this is the only place where you take FilterState::lock_ from within a FilterState function instead of using FilterState::get_lock(). Of course, usually that sort of thing is better encapsulation, but there isn't really a good way to avoid having a FilterState::get_lock() function, eg. because its needed in Coordinator::UpdateFilter, so I think it might be better to just be consistent and always require that callers take FilterState::get_lock() themselves before calling any FilterState functions.



-- 
To view, visit http://gerrit.cloudera.org:8080/14975
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb6726d349be701f3a0602b2ad5a934082f188a0
Gerrit-Change-Number: 14975
Gerrit-PatchSet: 2
Gerrit-Owner: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 10 Jan 2020 20:09:56 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9154: Make runtime filter propagation asynchronous

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14975 )

Change subject: IMPALA-9154: Make runtime filter propagation asynchronous
......................................................................


Patch Set 8: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/14975
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb6726d349be701f3a0602b2ad5a934082f188a0
Gerrit-Change-Number: 14975
Gerrit-PatchSet: 8
Gerrit-Owner: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 20 Jan 2020 23:14:43 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9154: Make runtime filter propagation asynchronous

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14975 )

Change subject: IMPALA-9154: Make runtime filter propagation asynchronous
......................................................................


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5436/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/14975
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb6726d349be701f3a0602b2ad5a934082f188a0
Gerrit-Change-Number: 14975
Gerrit-PatchSet: 7
Gerrit-Owner: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 17 Jan 2020 18:18:28 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9154: Make runtime filter propagation asynchronous

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/14975 )

Change subject: IMPALA-9154: Make runtime filter propagation asynchronous
......................................................................

IMPALA-9154: Make runtime filter propagation asynchronous

This patch fixes a bug introduced by IMPALA-7984 that ports the
functions implementing the aggregation and propagation of runtime
filters from Thrift RPC to KRPC.

Specifically, in IMPALA-7984, the propagation of an aggregated
runtime filter was implemented using the synchronous KRPC. Hence, when
there is a very limited number of KRPC threads for Impala's data stream
service, e.g., 1, there will be a deadlock if the node running the
Coordinator is trying to propagate the aggregated filter to the same
node running the Coordinator since there is no available thread to
receive the aggregated filter.

This patch makes the propagation of an aggregated runtime filter
asynchronous to address the issue described above. To prevent the
memory consumed by the aggregated filter from being reclaimed when the
aggregated filter is still referenced by some inflight KRPC's, we add an
additional field in the class Coordinator::FilterState to keep track of
the number of inflight KRPC's for the propagation of this aggregated
filter to make sure that we will reclaim the memory only when all the
associated KRPC's have completed. Moreover, when ReleaseExecResources()
is invoked by the Coordinator to release all the resources associated
with query execution, including the memory consumed by the aggregated
runtime filters, we make sure the consumed memory by the aggregated
filters is released only when the inflight KRPC's associated with each
aggregated filter have finished.

Testing:
- Passed primitive_many_fragments.test with the database tpch30 in an
  Impala minicluster started with the parameter
  --impalad_args=--datastream_service_num_svc_threads=1.
- Passed the exhaustive tests in the DEBUG build.
- Passed the core tests in the ASAN build.

Change-Id: Ifb6726d349be701f3a0602b2ad5a934082f188a0
Reviewed-on: http://gerrit.cloudera.org:8080/14975
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
5 files changed, 150 insertions(+), 75 deletions(-)

Approvals:
  Tim Armstrong: Looks good to me, approved
  Impala Public Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/14975
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ifb6726d349be701f3a0602b2ad5a934082f188a0
Gerrit-Change-Number: 14975
Gerrit-PatchSet: 9
Gerrit-Owner: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-9154: Make runtime filter propagation asynchronous

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14975 )

Change subject: IMPALA-9154: Make runtime filter propagation asynchronous
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5417/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14975
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb6726d349be701f3a0602b2ad5a934082f188a0
Gerrit-Change-Number: 14975
Gerrit-PatchSet: 3
Gerrit-Owner: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 13 Jan 2020 19:51:21 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9154: Make runtime filter propagation asynchronous

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14975 )

Change subject: IMPALA-9154: Make runtime filter propagation asynchronous
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5361/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14975
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb6726d349be701f3a0602b2ad5a934082f188a0
Gerrit-Change-Number: 14975
Gerrit-PatchSet: 1
Gerrit-Owner: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 03 Jan 2020 17:08:53 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9154: Make runtime filter propagation asynchronous

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14975 )

Change subject: IMPALA-9154: Make runtime filter propagation asynchronous
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5404/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14975
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb6726d349be701f3a0602b2ad5a934082f188a0
Gerrit-Change-Number: 14975
Gerrit-PatchSet: 2
Gerrit-Owner: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 10 Jan 2020 18:15:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9154: Make runtime filter propagation asynchronous

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14975 )

Change subject: IMPALA-9154: Make runtime filter propagation asynchronous
......................................................................


Patch Set 7: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/14975
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb6726d349be701f3a0602b2ad5a934082f188a0
Gerrit-Change-Number: 14975
Gerrit-PatchSet: 7
Gerrit-Owner: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Sat, 18 Jan 2020 01:29:06 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9154: Make runtime filter propagation asynchronous

Posted by "Fang-Yu Rao (Code Review)" <ge...@cloudera.org>.
Fang-Yu Rao has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/14975 )

Change subject: IMPALA-9154: Make runtime filter propagation asynchronous
......................................................................

IMPALA-9154: Make runtime filter propagation asynchronous

This patch fixes a bug introduced by IMPALA-7984 that ports the
functions implementing the aggregation and propagation of runtime
filters from Thrift RPC to KRPC.

Specifically, in IMPALA-7984, the propagation of an aggregated
runtime filter was implemented using the synchronous KRPC. Hence, when
there is a very limited number of KRPC threads for Impala's data stream
service, e.g., 1, there will be a deadlock if the node running the
Coordinator is trying to propagate the aggregated filter to the same
node running the Coordinator since there is no available thread to
receive the aggregated filter.

This patch makes the propagation of an aggregated runtime filter
asynchronous to address the issue described above. To prevent the
memory consumed by the aggregated filter from being reclaimed when the
aggregated filter is still referenced by some inflight KRPC's, we add an
additional field in the class Coordinator::FilterState to keep track of
the number of inflight KRPC's for the propagation of this aggregated
filter to make sure that we will reclaim the memory only when all the
associated KRPC's have completed. Moreover, when ReleaseExecResources()
is invoked by the Coordinator to release all the resources associated
with query execution, including the memory consumed by the aggregated
runtime filters, we make sure the consumed memory by the aggregated
filters is released only when the inflight KRPC's associated with each
aggregated filter have finished.

Testing:
- Passed primitive_many_fragments.test with the database tpch30 in an
  Impala minicluster started with the parameter
  --impalad_args=--datastream_service_num_svc_threads=1.
- Passed the exhaustive tests in the DEBUG build.
- Passed the core tests in the ASAN build.

Change-Id: Ifb6726d349be701f3a0602b2ad5a934082f188a0
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
5 files changed, 148 insertions(+), 75 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/14975/6
-- 
To view, visit http://gerrit.cloudera.org:8080/14975
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifb6726d349be701f3a0602b2ad5a934082f188a0
Gerrit-Change-Number: 14975
Gerrit-PatchSet: 6
Gerrit-Owner: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-9154: Make runtime filter propagation asynchronous

Posted by "Fang-Yu Rao (Code Review)" <ge...@cloudera.org>.
Fang-Yu Rao has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/14975 )

Change subject: IMPALA-9154: Make runtime filter propagation asynchronous
......................................................................

IMPALA-9154: Make runtime filter propagation asynchronous

This patch fixes a bug introduced by IMPALA-7984 that ports the
functions implementing the aggregation and propagation of runtime
filters from Thrift RPC to KRPC.

Specifically, in IMPALA-7984, the propagation of an aggregated
runtime filter was implemented using the synchronous KRPC. Hence, when
there is a very limited number of KRPC threads for Impala's data stream
service, e.g., 1, there will be a deadlock if the node running the
Coordinator is trying to propagate the aggregated filter to the same
node running the Coordinator since there is no available thread to
receive the aggregated filter.

This patch makes the propagation of an aggregated runtime filter
asynchronous to address the issue described above. To prevent the
memory consumed by the aggregated filter from being reclaimed when the
aggregated filter is still referenced by some inflight KRPC's, we add an
additional field in the class Coordinator::FilterState to keep track of
the number of inflight KRPC's for the propagation of this aggregated
filter to make sure that we will reclaim the memory only when all the
associated KRPC's have completed. Moreover, when ReleaseExecResources()
is invoked by the Coordinator to release all the resources associated
with query execution, including the memory consumed by the aggregated
runtime filters, we make sure the consumed memory by the aggregated
filters is released only when the inflight KRPC's associated with each
aggregated filter have finished.

Testing:
- Passed primitive_many_fragments.test with the database tpch30 in an
  Impala minicluster started with the parameter
  --impalad_args=--datastream_service_num_svc_threads=1.
- Passed the exhaustive tests in the DEBUG build.
- Passed the core tests in the ASAN build.

Change-Id: Ifb6726d349be701f3a0602b2ad5a934082f188a0
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
5 files changed, 150 insertions(+), 75 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/14975/7
-- 
To view, visit http://gerrit.cloudera.org:8080/14975
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifb6726d349be701f3a0602b2ad5a934082f188a0
Gerrit-Change-Number: 14975
Gerrit-PatchSet: 7
Gerrit-Owner: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-9154: Make runtime filter propagation asynchronous

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14975 )

Change subject: IMPALA-9154: Make runtime filter propagation asynchronous
......................................................................


Patch Set 8:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5444/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/14975
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb6726d349be701f3a0602b2ad5a934082f188a0
Gerrit-Change-Number: 14975
Gerrit-PatchSet: 8
Gerrit-Owner: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 20 Jan 2020 18:24:16 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9154: Make runtime filter propagation asynchronous

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14975 )

Change subject: IMPALA-9154: Make runtime filter propagation asynchronous
......................................................................


Patch Set 8: Code-Review+2

This didn't get submitted cause the parent patch was a draft and not +2ed. I'm gonna rerun.


-- 
To view, visit http://gerrit.cloudera.org:8080/14975
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb6726d349be701f3a0602b2ad5a934082f188a0
Gerrit-Change-Number: 14975
Gerrit-PatchSet: 8
Gerrit-Owner: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 20 Jan 2020 18:24:06 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9154: Make runtime filter propagation asynchronous

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/14975 )

Change subject: IMPALA-9154: Make runtime filter propagation asynchronous
......................................................................


Patch Set 4:

(10 comments)

Patch is looking very good, most of my feedback this time is related to your comments. At a high level - remember to try to keep comments concise, and to approach them not from a perspective of "why I did it this way for this specific patch" but from a perspective of "what would someone coming along later reading this code need to know to understand how it works"

http://gerrit.cloudera.org:8080/#/c/14975/4/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/14975/4/be/src/runtime/coordinator-backend-state.cc@582
PS4, Line 582: e.g., request dropped due to
             :   // backpressure,
unnecessary


http://gerrit.cloudera.org:8080/#/c/14975/4/be/src/runtime/coordinator-backend-state.cc@596
PS4, Line 596: // Recall that the thread that called Coordinator::BackendState::PublishFilter()
             :       // associated with this FilterState acquired the corresponding FilterState::lock_
             :       // while issuing the PublishFilter()'s related to this FilterState. Thus, when
             :       // 'num_flight_rpcs_' of this FilterState reaches 0, this FilterState should
             :       // already been disabled, i.e., no more PublishFilter() related to this FilterState
             :       // will be issued.
This can be made more concise, eg:
Since we disable the filter once complete and hold FilterState::lock_ while issuing all PublishFilter() rpcs, at this point there can't be any more PublishFilter() rpcs issued.


http://gerrit.cloudera.org:8080/#/c/14975/4/be/src/runtime/coordinator-filter-state.h
File be/src/runtime/coordinator-filter-state.h:

http://gerrit.cloudera.org:8080/#/c/14975/4/be/src/runtime/coordinator-filter-state.h@108
PS4, Line 108: since the memory could
             :   /// still be referenced by some inflight KRPC's.
             :   /// Refer to Coordinator::UpdateFilter() for further detail.
Unnecessary, can be removed


http://gerrit.cloudera.org:8080/#/c/14975/4/be/src/runtime/coordinator-filter-state.h@111
PS4, Line 111: DisableButNotRelease
I think its fine to just call this Disable()


http://gerrit.cloudera.org:8080/#/c/14975/4/be/src/runtime/coordinator-filter-state.h@118
PS4, Line 118: /// Before calling ReleaseExecResources() to disable all the associated runtime filters
             :   /// and release the consumed memory, we need to make sure there is no inflight KRPC
             :   /// propagating the aggregated filter.
This comment is a little weird, cause its focused on how the function is used at one particular call site (which yes, happens to be the only call site for now, but it might not always be).

Let's rewrite it to just be about what the function actually does, eg:
Waits until any inflight PublishFilter rpcs have completed.


http://gerrit.cloudera.org:8080/#/c/14975/4/be/src/runtime/coordinator-filter-state.h@159
PS4, Line 159: KRPC's involved in the propagation of the
             :   /// associated runtime filter.
This is a little vague, since there are multiple runtime filter related rpcs. Lets change it to something like "PublishFilter rpcs"


http://gerrit.cloudera.org:8080/#/c/14975/4/be/src/runtime/coordinator-filter-state.h@161
PS4, Line 161: num_inflight_rpcs_
Similarly, I know its verbose, but probably better to call this 'num_inflight_publish_filter_rpcs_'


http://gerrit.cloudera.org:8080/#/c/14975/4/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/14975/4/be/src/runtime/coordinator.cc@1054
PS4, Line 1054: l.release();
Shouldn't be needed - unique_lock will automatically release the lock when it goes out of scope (which happens at the end of each loop iteration)


http://gerrit.cloudera.org:8080/#/c/14975/4/be/src/runtime/coordinator.cc@1139
PS4, Line 1139:   DCHECK(it != filter_routing_table_->id_to_filter.end());
So I know that in a previous iteration of this patch, I told you to change a similar place from an 'if' to a DCHECK, but I think this case is a little different than that one.

The reason is that this is directly taking input from an rpc without any previous checking for the validity of that input. Of course, its extremely unlikely that something would happen like the rpc parameters getting corrupted without the rpc layer noticing, but its still the sort of situation where its better to be defensive, eg. in case there's a bug in Impala where we get something about how we generate/send the rpcs wrong, esp. because the DCHECK won't be there in production deployments so its pretty unclear what would even happen if the input was wrong, and it could potentially be something worse than just crashing the impalad, like corrupting on-disk data.

By contrast, if you look at the place where I made the comment in the earlier version of the patch, it was in BackendState::PublishFilter, which logically comes after this 'if' and therefore its basically impossible to imagine a DCHECK for this there ever being hit.


http://gerrit.cloudera.org:8080/#/c/14975/4/be/src/runtime/coordinator.cc@1189
PS4, Line 1189: // Note that UpdteFilter() could still be invoked even when 'pending_count_' reaches
              :     // 0. For instance, when the corresponding join is a broadcast join where
              :     // 'pending_count_' was initialized to 1 since each local filter is the same.
              :     // In this regard, we need to disable 'state' to prevent another KRPC from calling
              :     // ApplyUpdate() again to update the aggregated filter since this would be a
              :     // redundant operation.
This can be made more concise, eg:
Filter is complete, so we disable it so future UpdateFilter rpcs are ignored (eg. if it was a broadcast join).



-- 
To view, visit http://gerrit.cloudera.org:8080/14975
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb6726d349be701f3a0602b2ad5a934082f188a0
Gerrit-Change-Number: 14975
Gerrit-PatchSet: 4
Gerrit-Owner: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 14 Jan 2020 19:13:04 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9154: Make runtime filter propagation asynchronous

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14975 )

Change subject: IMPALA-9154: Make runtime filter propagation asynchronous
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5418/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14975
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb6726d349be701f3a0602b2ad5a934082f188a0
Gerrit-Change-Number: 14975
Gerrit-PatchSet: 4
Gerrit-Owner: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 13 Jan 2020 20:24:38 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9154: Make runtime filter propagation asynchronous

Posted by "Fang-Yu Rao (Code Review)" <ge...@cloudera.org>.
Fang-Yu Rao has posted comments on this change. ( http://gerrit.cloudera.org:8080/14975 )

Change subject: IMPALA-9154: Make runtime filter propagation asynchronous
......................................................................


Patch Set 2:

(1 comment)

> Patch Set 1:
> 
> (9 comments)
> 
> You've got several formatting issues. Instead of me pointing all of them out, please run clang-format

Hi all, I have some thoughts on the locking protocol when Coordinator::ReleaseExecResources() is called. Please also let me know if you have other ideas. Thanks!

http://gerrit.cloudera.org:8080/#/c/14975/2/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/14975/2/be/src/runtime/coordinator.cc@1050
PS2, Line 1050:   for (auto& filter : filter_routing_table_->id_to_filter) {
              :     filter.second.WaitForPublishFilter();
              :   }
              : 
              :   for (auto& filter : filter_routing_table_->id_to_filter) {
              :     FilterState* state = &filter.second;
              :     state->DisableAndRelease(filter_mem_tracker_);
              :   }
I have noticed that this patch failed the core tests in the ASAN build due to some heap-use-after-free error.

After some initial thoughts, I think this two for-loops should not be separated. Instead, inside WaitForPublishFilter(), after acquiring the lock for a FilterState 'state', we should immediately call state->DisableAndRelease() to prevent the related memory from being accessed later in UpdateFilter() or ApplyUpdate().

This scenario above seems possible when there are more than one instances of FilterState in 'filter_routing_table_' because it is possible that 'num_inflight_rpcs_' for some FilterState was 0 when its WaitForPublishFilter() was called but later on when we are checking the next FilterState, there could be another thread that updates 'num_inflight_rpcs_' of the previous FilterState and tries to access some memory associated with the previous FilterState.

I do not have a concrete proof about my theory above since there is not enough information in the log. But I will first try to combine these two for-loops in a proper way and run the core tests in the ASAN build again.



-- 
To view, visit http://gerrit.cloudera.org:8080/14975
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb6726d349be701f3a0602b2ad5a934082f188a0
Gerrit-Change-Number: 14975
Gerrit-PatchSet: 2
Gerrit-Owner: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 10 Jan 2020 18:06:15 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9154: Make runtime filter propagation asynchronous

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/14975 )

Change subject: IMPALA-9154: Make runtime filter propagation asynchronous
......................................................................


Patch Set 1:

(9 comments)

You've got several formatting issues. Instead of me pointing all of them out, please run clang-format

http://gerrit.cloudera.org:8080/#/c/14975/1/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/14975/1/be/src/runtime/coordinator-backend-state.cc@573
PS1, Line 573:   if (it == filter_routing_table.get()->id_to_filter.end()) {
This should be possible since filter_routing_table_ will be initialized with all filter ids, right? I think you can leave off the 'if' and just have a DCHECK


http://gerrit.cloudera.org:8080/#/c/14975/1/be/src/runtime/coordinator-filter-state.h
File be/src/runtime/coordinator-filter-state.h:

http://gerrit.cloudera.org:8080/#/c/14975/1/be/src/runtime/coordinator-filter-state.h@91
PS1, Line 91: int*
Its bad encapsulation to return a pointer to a class variable like this. Better to use some sort of setter function. Maybe something like "IncrementNumInflightRpcs()" which you pass either 1 or -1 into as appropriate, or possibly "set_num_inflight_rpcs()"


http://gerrit.cloudera.org:8080/#/c/14975/1/be/src/runtime/coordinator-filter-state.h@100
PS1, Line 100: Disable
I think it would be more natural to call this one DisableAndRelease() and the other one just Disable().


http://gerrit.cloudera.org:8080/#/c/14975/1/be/src/runtime/coordinator-filter-state.h@142
PS1, Line 142: Initialized to -1 to indicate that the associated
             :   /// runtime filter has never been involved in the propagation.
I don't think that you rely on this being the case anywhere? Should be fine to just initialize it to 0, which will make the logic around incrementing it simpler.


http://gerrit.cloudera.org:8080/#/c/14975/1/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:

http://gerrit.cloudera.org:8080/#/c/14975/1/be/src/runtime/coordinator.h@41
PS1, Line 41: #include <condition_variable>
move this up with the other "#include <...>" in the first group


http://gerrit.cloudera.org:8080/#/c/14975/1/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/14975/1/be/src/runtime/coordinator.cc@1144
PS1, Line 1144:     lock_guard<SpinLock> l(filter_routing_table_->update_lock);
So I think we'll need to think about the locking protocol here a little more. 'update_lock' is a single global lock for all filters in a query, and this patch increases contention on it because it 1) increases the size of the critical section here by holding it while all of the calls to BackendState::PublishFilter() execute 2) is taken in PublishFilterCompleteCb() once for each filter for each backend that receives it.

I think that instead, we should make per-filter locks, eg. add a 'lock_' to FilterState (there's a TODO in coordinator-filter-state.h for this). That lock would then be used to protect access to pending_count_, num_inflight_rpcs_, etc.

So, you can remove this line and take FilterState::lock_ after we get the FilterState from the map on line 1156 or so. Accessing the map should be fine because we've already taken filter_routing_table_->lock above, and filter_routing_table_ itself can't get concurrently modified with UpdateFilter() since its setup in InitFilterRoutingTable(), which can't run concurrently with UpdateFilter(), and then the map itself never changes.

One thing that's maybe a little tricky is how to rewrite your hasInflightKRPCsForFilterPropagation() logic to deal with this. The way you're doing it now - where we have a single condition variable that gets signaled each time a filter is done and then you check to see if all filters are done and wait again if not, is a little unusual.

My suggestion would be to also make it per-filter, eg. add a FilterState::update_filters_done_cv_ and then a FilterState::WaitForPublishFilter() and then in ReleaseExecResources() you can just iterate over all the FilterStates and call WaitForPublishFilter() on them.

You'll of course want to check all of my reasoning here.


http://gerrit.cloudera.org:8080/#/c/14975/1/be/src/runtime/coordinator.cc@1217
PS1, Line 1217: !IsExecuting() && rpc_params.has_bloom_filter()
I think this is wrong - as written, if !IsExecuting() is true but its a min-max filter, we won't return and will still try to publish the filter.


http://gerrit.cloudera.org:8080/#/c/14975/1/be/src/runtime/coordinator.cc@1306
PS1, Line 1306:   if (is_bloom_filter()) {
You can save some code duplication by calling DisableButNotRelease() here first and then doing the release.


http://gerrit.cloudera.org:8080/#/c/14975/1/be/src/runtime/runtime-filter-bank.cc
File be/src/runtime/runtime-filter-bank.cc:

http://gerrit.cloudera.org:8080/#/c/14975/1/be/src/runtime/runtime-filter-bank.cc@197
PS1, Line 197: 
?



-- 
To view, visit http://gerrit.cloudera.org:8080/14975
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb6726d349be701f3a0602b2ad5a934082f188a0
Gerrit-Change-Number: 14975
Gerrit-PatchSet: 1
Gerrit-Owner: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Jan 2020 21:44:36 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9154: Make runtime filter propagation asynchronous

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/14975 )

Change subject: IMPALA-9154: Make runtime filter propagation asynchronous
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14975/4/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/14975/4/be/src/runtime/coordinator.cc@1139
PS4, Line 1139:   DCHECK(it != filter_routing_table_->id_to_filter.end());
> So I know that in a previous iteration of this patch, I told you to change 
Also, fwiw, another option is to have both the 'if' and the DCHECK - that way we're likely to notice if this is hit in a test run, since the DCHECK will cause a crash, and we're more likely to actually investigate and figure out if something went wrong, but we're still being defensive in production deployments.



-- 
To view, visit http://gerrit.cloudera.org:8080/14975
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb6726d349be701f3a0602b2ad5a934082f188a0
Gerrit-Change-Number: 14975
Gerrit-PatchSet: 4
Gerrit-Owner: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 14 Jan 2020 19:22:57 +0000
Gerrit-HasComments: Yes