You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Riza Suminto (Code Review)" <ge...@cloudera.org> on 2022/01/06 17:46:59 UTC

[Impala-ASF-CR] IMPALA-11068: Add tuning flag to reduce scanner thread launch.

Riza Suminto has uploaded this change for review. ( http://gerrit.cloudera.org:8080/18126


Change subject: IMPALA-11068: Add tuning flag to reduce scanner thread launch.
......................................................................

IMPALA-11068: Add tuning flag to reduce scanner thread launch.

Under heavy decompression workload, Impala running with scanner thread
parallelism (MT_DOP=0) can still hit OOM error due to launching too many
threads too soon. We have logic in ScannerMemLimiter to limit the number
of scanner threads by calculating the thread's memory requirement and
estimating the memory growth rate of all threads. However, it does not
prevent a scanner node from quickly launching many threads and
immediately reaching the memtracker's spare capacity. Even after
ScannerMemLimiter rejects a new thread launch, some existing threads
might continue increasing their non-reserved memory for decompression
work until finally the memory limit is exceeded.

IMPALA-7096 adds 'hdfs_scanner_thread_max_estimated_bytes' flag as a
heuristic to count for non-reserved memory growth. Increasing this flag
value can help reduce thread count, but might severely regress other
queries that do not have heavy decompression characteristics. Similarly
with lowering the NUM_SCANNER_THREADS query option.

This patch adds one more flag as an alternative to mitigate OOM called
'hdfs_scanner_thread_headroom_count'. This flag intent to lower the
threshold to reject a new scanner thread launch. For example, setting
hdfs_scanner_thread_headroom_count=1 means ScannerMemLimiter will reject
a new thread from launching if the memtracker does not have spare
capacity to fit at least 2 threads.

Testing:
- Add custom cluster test
  TestScanMemLimitCustom.test_hdfs_scanner_thread_mem_scaling_headroom

Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
---
M be/src/runtime/scanner-mem-limiter.cc
A testdata/workloads/functional-query/queries/QueryTest/hdfs-scanner-thread-mem-scaling-headroom.test
M tests/custom_cluster/test_mem_reservations.py
3 files changed, 85 insertions(+), 5 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/18126/1
-- 
To view, visit http://gerrit.cloudera.org:8080/18126
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>

[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
......................................................................


Patch Set 7: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18126/1/be/src/runtime/scanner-mem-limiter.cc
File be/src/runtime/scanner-mem-limiter.cc:

http://gerrit.cloudera.org:8080/#/c/18126/1/be/src/runtime/scanner-mem-limiter.cc@90
PS1, Line 90:     if (node == element.first) found_scan = scan.get();
> After some internal discussion, we come up with conclusion that query optio
Thanks for adding the query option!



-- 
To view, visit http://gerrit.cloudera.org:8080/18126
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Sat, 07 Oct 2023 14:16:12 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
......................................................................


Patch Set 6:

(1 comment)

Just one comment, otherwise +1.

http://gerrit.cloudera.org:8080/#/c/18126/6/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/18126/6/be/src/service/query-options.cc@1146
PS6, Line 1146:         query_options->__set_mem_limit_coordinators(mem_spec_val.value);
A break is missing from the end of the block here.



-- 
To view, visit http://gerrit.cloudera.org:8080/18126
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Sat, 07 Oct 2023 00:30:12 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Hello Kurt Deschler, Daniel Becker, Joe McDonnell, Csaba Ringhofer, Michael Smith, Wenzhe Zhou, Bikramjeet Vig, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18126

to look at the new patch set (#7).

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
......................................................................

IMPALA-11068: Add query option to reduce scanner thread launch.

Under heavy decompression workload, Impala running with scanner thread
parallelism (MT_DOP=0) can still hit OOM error due to launching too many
threads too soon. We have logic in ScannerMemLimiter to limit the number
of scanner threads by calculating the thread's memory requirement and
estimating the memory growth rate of all threads. However, it does not
prevent a scanner node from quickly launching many threads and
immediately reaching the memtracker's spare capacity. Even after
ScannerMemLimiter rejects a new thread launch, some existing threads
might continue increasing their non-reserved memory for decompression
work until the memory limit exceeded.

IMPALA-7096 adds hdfs_scanner_thread_max_estimated_bytes flag as a
heuristic to count for non-reserved memory growth. Increasing this flag
value can help reduce thread count, but might severely regress other
queries that do not have heavy decompression characteristics. Similarly
with lowering the NUM_SCANNER_THREADS query option.

This patch adds one more query option as an alternative to mitigate OOM
called HDFS_SCANNER_NON_RESERVED_BYTES. This option is intended to offer
the same control as hdfs_scanner_thread_max_estimated_bytes, but as a
query option such that tuning can be done at per query granularity. If
this query option not set, set to 0, or negative value, backend will
revert to use the value of hdfs_scanner_thread_max_estimated_bytes flag.

Testing:
- Add test case in query-options-test.cc and
  TestScanMemLimit::test_hdfs_scanner_thread_mem_scaling.

Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
---
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M testdata/workloads/functional-query/queries/QueryTest/hdfs-scanner-thread-mem-scaling.test
8 files changed, 89 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/18126/7
-- 
To view, visit http://gerrit.cloudera.org:8080/18126
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/14001/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18126
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Thu, 14 Sep 2023 16:10:03 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11068: Add tuning flag to reduce scanner thread launch.

Posted by "Kurt Deschler (Code Review)" <ge...@cloudera.org>.
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add tuning flag to reduce scanner thread launch.
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18126/1/be/src/runtime/scanner-mem-limiter.cc
File be/src/runtime/scanner-mem-limiter.cc:

http://gerrit.cloudera.org:8080/#/c/18126/1/be/src/runtime/scanner-mem-limiter.cc@90
PS1, Line 90:       addtl_consumption += static_cast<int64_t>(consumption * 0.5);
> Alternatively, we can add flag to replace 0.5 in this estimate, and allow u
Would it be possible to use a conservative estimate for decompression memory to more directly fix the estimation?



-- 
To view, visit http://gerrit.cloudera.org:8080/18126
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Tue, 16 May 2023 14:36:40 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11068: Add tuning flag to reduce scanner thread launch.

Posted by "Kurt Deschler (Code Review)" <ge...@cloudera.org>.
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add tuning flag to reduce scanner thread launch.
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18126/1/be/src/runtime/scanner-mem-limiter.cc
File be/src/runtime/scanner-mem-limiter.cc:

http://gerrit.cloudera.org:8080/#/c/18126/1/be/src/runtime/scanner-mem-limiter.cc@28
PS1, Line 28:     "Number of thread headroom where we decide to stop launching new thread. "
This kind of indirect limit can be hard to understand and tune. Is there something we can do more direct to bound memory better such as reduce row batch sizes?



-- 
To view, visit http://gerrit.cloudera.org:8080/18126
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Mon, 15 May 2023 20:28:55 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Hello Kurt Deschler, Daniel Becker, Joe McDonnell, Csaba Ringhofer, Bikramjeet Vig, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18126

to look at the new patch set (#5).

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
......................................................................

IMPALA-11068: Add query option to reduce scanner thread launch.

Under heavy decompression workload, Impala running with scanner thread
parallelism (MT_DOP=0) can still hit OOM error due to launching too many
threads too soon. We have logic in ScannerMemLimiter to limit the number
of scanner threads by calculating the thread's memory requirement and
estimating the memory growth rate of all threads. However, it does not
prevent a scanner node from quickly launching many threads and
immediately reaching the memtracker's spare capacity. Even after
ScannerMemLimiter rejects a new thread launch, some existing threads
might continue increasing their non-reserved memory for decompression
work until the memory limit exceeded.

IMPALA-7096 adds hdfs_scanner_thread_max_estimated_bytes flag as a
heuristic to count for non-reserved memory growth. Increasing this flag
value can help reduce thread count, but might severely regress other
queries that do not have heavy decompression characteristics. Similarly
with lowering the NUM_SCANNER_THREADS query option.

This patch adds one more query option as an alternative to mitigate OOM
called HDFS_SCANNER_NON_RESERVED_BYTES. This option is intended to offer
the same control as hdfs_scanner_thread_max_estimated_bytes, but as a
query option such that tuning can be done at per query granulality. If
this query option not set, set to 0, or negative value, backend will
revert to use the value of hdfs_scanner_thread_max_estimated_bytes flag.

Testing:
- Add test case in query-options-test.cc and
  TestScanMemLimit::test_hdfs_scanner_thread_mem_scaling.

Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
---
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M testdata/workloads/functional-query/queries/QueryTest/hdfs-scanner-thread-mem-scaling.test
8 files changed, 88 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/18126/5
-- 
To view, visit http://gerrit.cloudera.org:8080/18126
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>

[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
......................................................................


Patch Set 3:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/18126/3/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/18126/3/be/src/exec/hdfs-scan-node.cc@222
PS3, Line 222: whith
Nit: with


http://gerrit.cloudera.org:8080/#/c/18126/3/be/src/exec/hdfs-scan-node.cc@222
PS3, Line 222: precedent
Nit: precedence


http://gerrit.cloudera.org:8080/#/c/18126/3/be/src/exec/hdfs-scan-node.cc@222
PS3, Line 222: takes
Nit: taking


http://gerrit.cloudera.org:8080/#/c/18126/3/be/src/exec/hdfs-scan-node.cc@223
PS3, Line 223: is set
"is set to a positive value"?


http://gerrit.cloudera.org:8080/#/c/18126/3/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/18126/3/common/thrift/ImpalaService.thrift@849
PS3, Line 849: not set
"is not set to a positive value"?



-- 
To view, visit http://gerrit.cloudera.org:8080/18126
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Thu, 14 Sep 2023 15:32:50 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
......................................................................


Patch Set 4: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/18126
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Wed, 27 Sep 2023 16:00:47 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
......................................................................


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18126/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18126/5//COMMIT_MSG@29
PS5, Line 29: query option such that tuning can be done at per query granularity. If
> nit: granularity
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/18126
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Mon, 02 Oct 2023 22:31:46 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11068: Add tuning flag to reduce scanner thread launch.

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add tuning flag to reduce scanner thread launch.
......................................................................


Patch Set 1:

> Patch Set 1:
> 
> Should we look into making the constant COMPRESSED_TEXT_COMPRESSION_RATIO tunable in hdfs-scan-node.cc instead?

COMPRESSED_TEXT_COMPRESSION_RATIO is only used when reading file with text format (THdfsFileFormat::TEXT).
For non-text format, it will use hdfs_scanner_thread_max_estimated_bytes which default to 32MB.

Customer hit OOM over parquet file format. They have try increase hdfs_scanner_thread_max_estimated_bytes higher, but that leads to performance regression for other queries.


-- 
To view, visit http://gerrit.cloudera.org:8080/18126
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Sat, 08 Jan 2022 03:48:39 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11068: Add tuning flag to reduce scanner thread launch.

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add tuning flag to reduce scanner thread launch.
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18126/1/be/src/runtime/scanner-mem-limiter.cc
File be/src/runtime/scanner-mem-limiter.cc:

http://gerrit.cloudera.org:8080/#/c/18126/1/be/src/runtime/scanner-mem-limiter.cc@90
PS1, Line 90:       addtl_consumption += static_cast<int64_t>(consumption * 0.5);
> Should definitely update the ScannerMemLimiter claim if  actual mem usage d
>What if we add query option / flag to double >est_non_reserved_bytes for the first N threads where N = >ScanNode::ScannerThreadState::max_num_scanner_threads_?

I still think that we should have a query option set the amount of megabytes to add to the estimate - this would allow an easy workaround for the users. Changing the default logic is risky as it could lead to regressions, even if it is "more right" than the current logic.

The first N scanner threads could often provide a good estimate, but it is also possible that some files checked later have much better compression for some reason and lead to this error. For a long term solution I think that we could do a two step solution, so first only check the header to read the decompressed size and then assign files to scanner threads when we know the sizes. The sizes could be cached to reduce the performance hit.



-- 
To view, visit http://gerrit.cloudera.org:8080/18126
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Thu, 25 May 2023 09:23:49 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11068: Add tuning flag to reduce scanner thread launch.

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add tuning flag to reduce scanner thread launch.
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18126/1/be/src/runtime/scanner-mem-limiter.cc
File be/src/runtime/scanner-mem-limiter.cc:

http://gerrit.cloudera.org:8080/#/c/18126/1/be/src/runtime/scanner-mem-limiter.cc@90
PS1, Line 90:       addtl_consumption += static_cast<int64_t>(consumption * 0.5);
> I agree that making this tunable via query option is better. It will result
After some internal discussion, we come up with conclusion that query option is be the best way to tune this. I'll implement it in next patch set.



-- 
To view, visit http://gerrit.cloudera.org:8080/18126
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Mon, 10 Jul 2023 17:07:29 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11068: Add tuning flag to reduce scanner thread launch.

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add tuning flag to reduce scanner thread launch.
......................................................................


Patch Set 1:

(2 comments)

Only high level comments, didn't check the details of implementation.

http://gerrit.cloudera.org:8080/#/c/18126/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18126/1//COMMIT_MSG@7
PS1, Line 7: flag
I think that it would be more user friendly if a query option could be used to tune behavior. I expect certain tables to be "notorious mem eaters" and hit this issue, so it would be nice if the users could fine tune their workload to solve this without restarting the cluster.

For example hdfs_scanner_thread_max_estimated_bytes seems like a good candidate - a user would see an error like "can't allocate X megabytes" and then increase the query option with ~X megabytes for the given query. This seems more straightforward to me than using options that affect the number of threads.


http://gerrit.cloudera.org:8080/#/c/18126/1/be/src/runtime/scanner-mem-limiter.cc
File be/src/runtime/scanner-mem-limiter.cc:

http://gerrit.cloudera.org:8080/#/c/18126/1/be/src/runtime/scanner-mem-limiter.cc@90
PS1, Line 90:       addtl_consumption += static_cast<int64_t>(consumption * 0.5);
> I meant conservative estimate where the decision to add a thread is made. T
fyi the estimate for decompressed text file buffer sizes come from here:
https://github.com/apache/impala/blob/cf28a4c5292fdb3504d1fe11183c78243ed148a4/be/src/exec/hdfs-scan-node.cc#L67

Ideally we could get better estimates from stats - Hive seems to collect stats about compressed vs decompressed sizes (table props total_size vs rawDataSize).



-- 
To view, visit http://gerrit.cloudera.org:8080/18126
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Wed, 17 May 2023 08:51:53 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11068: Add tuning flag to reduce scanner thread launch.

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add tuning flag to reduce scanner thread launch.
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18126/1/be/src/runtime/scanner-mem-limiter.cc
File be/src/runtime/scanner-mem-limiter.cc:

http://gerrit.cloudera.org:8080/#/c/18126/1/be/src/runtime/scanner-mem-limiter.cc@90
PS1, Line 90:       addtl_consumption += static_cast<int64_t>(consumption * 0.5);
> Would it be possible to use a conservative estimate for decompression memor
Actually, increasing the fraction here might not work.
This code execute if current memory consumption among all running threads is already exceed estimate, which may or may not true when it is scheduling the last scanner thread.



-- 
To view, visit http://gerrit.cloudera.org:8080/18126
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Tue, 16 May 2023 18:04:43 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
......................................................................


Patch Set 4:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/18126/3/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/18126/3/be/src/exec/hdfs-scan-node.cc@222
PS3, Line 222: aking
> Nit: taking
Done


http://gerrit.cloudera.org:8080/#/c/18126/3/be/src/exec/hdfs-scan-node.cc@222
PS3, Line 222: with 
> Nit: with
Done


http://gerrit.cloudera.org:8080/#/c/18126/3/be/src/exec/hdfs-scan-node.cc@222
PS3, Line 222: 
> Nit: precedence
Done


http://gerrit.cloudera.org:8080/#/c/18126/3/be/src/exec/hdfs-scan-node.cc@223
PS3, Line 223: ery op
> "is set to a positive value"?
Done


http://gerrit.cloudera.org:8080/#/c/18126/3/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/18126/3/common/thrift/ImpalaService.thrift@849
PS3, Line 849: is not 
> "is not set to a positive value"?
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/18126
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Thu, 14 Sep 2023 15:43:49 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13974/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18126
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Mon, 11 Sep 2023 22:15:43 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
......................................................................


Patch Set 7:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/14151/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18126
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Sat, 07 Oct 2023 01:04:40 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
......................................................................


Patch Set 7: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/18126
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Sun, 08 Oct 2023 23:38:52 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
......................................................................


Patch Set 5:

Patch set 4 is a rebase to resolve merge conflict.


-- 
To view, visit http://gerrit.cloudera.org:8080/18126
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Mon, 02 Oct 2023 16:57:39 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11068: Add tuning flag to reduce scanner thread launch.

Posted by "Kurt Deschler (Code Review)" <ge...@cloudera.org>.
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add tuning flag to reduce scanner thread launch.
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18126/1/be/src/runtime/scanner-mem-limiter.cc
File be/src/runtime/scanner-mem-limiter.cc:

http://gerrit.cloudera.org:8080/#/c/18126/1/be/src/runtime/scanner-mem-limiter.cc@90
PS1, Line 90:       addtl_consumption += static_cast<int64_t>(consumption * 0.5);
> I look around FE code. Impala does not seem to collect totalSize nor rawDat
Should definitely update the ScannerMemLimiter claim if  actual mem usage differs from est. Could use a more conservative estimate with that. Seems like a small downside to limit scanners some when memory limits are set low. Could also delay and retry claim if the MemLimiter would reject the thread.



-- 
To view, visit http://gerrit.cloudera.org:8080/18126
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Wed, 24 May 2023 01:49:44 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11068: Add tuning flag to reduce scanner thread launch.

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add tuning flag to reduce scanner thread launch.
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18126/1/be/src/runtime/scanner-mem-limiter.cc
File be/src/runtime/scanner-mem-limiter.cc:

http://gerrit.cloudera.org:8080/#/c/18126/1/be/src/runtime/scanner-mem-limiter.cc@90
PS1, Line 90:       addtl_consumption += static_cast<int64_t>(consumption * 0.5);
> >What if we add query option / flag to double >est_non_reserved_bytes for t
I agree that making this tunable via query option is better. It will results in more consistency across all threads too.

However, customer also argue that manually setting and unsetting such query option will be too cumbersome.
https://issues.apache.org/jira/browse/IMPALA-12161



-- 
To view, visit http://gerrit.cloudera.org:8080/18126
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Thu, 25 May 2023 17:25:56 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11068: Add tuning flag to reduce scanner thread launch.

Posted by "Kurt Deschler (Code Review)" <ge...@cloudera.org>.
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add tuning flag to reduce scanner thread launch.
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18126/1/be/src/runtime/scanner-mem-limiter.cc
File be/src/runtime/scanner-mem-limiter.cc:

http://gerrit.cloudera.org:8080/#/c/18126/1/be/src/runtime/scanner-mem-limiter.cc@90
PS1, Line 90:       addtl_consumption += static_cast<int64_t>(consumption * 0.5);
> Actually, increasing the fraction here might not work.
I meant conservative estimate where the decision to add a thread is made. That should also track the pending thread count when computing memory estimates.



-- 
To view, visit http://gerrit.cloudera.org:8080/18126
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Tue, 16 May 2023 21:49:52 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13991/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18126
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Tue, 12 Sep 2023 18:08:22 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
......................................................................

IMPALA-11068: Add query option to reduce scanner thread launch.

Under heavy decompression workload, Impala running with scanner thread
parallelism (MT_DOP=0) can still hit OOM error due to launching too many
threads too soon. We have logic in ScannerMemLimiter to limit the number
of scanner threads by calculating the thread's memory requirement and
estimating the memory growth rate of all threads. However, it does not
prevent a scanner node from quickly launching many threads and
immediately reaching the memtracker's spare capacity. Even after
ScannerMemLimiter rejects a new thread launch, some existing threads
might continue increasing their non-reserved memory for decompression
work until the memory limit exceeded.

IMPALA-7096 adds hdfs_scanner_thread_max_estimated_bytes flag as a
heuristic to count for non-reserved memory growth. Increasing this flag
value can help reduce thread count, but might severely regress other
queries that do not have heavy decompression characteristics. Similarly
with lowering the NUM_SCANNER_THREADS query option.

This patch adds one more query option as an alternative to mitigate OOM
called HDFS_SCANNER_NON_RESERVED_BYTES. This option is intended to offer
the same control as hdfs_scanner_thread_max_estimated_bytes, but as a
query option such that tuning can be done at per query granularity. If
this query option not set, set to 0, or negative value, backend will
revert to use the value of hdfs_scanner_thread_max_estimated_bytes flag.

Testing:
- Add test case in query-options-test.cc and
  TestScanMemLimit::test_hdfs_scanner_thread_mem_scaling.

Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Reviewed-on: http://gerrit.cloudera.org:8080/18126
Reviewed-by: Csaba Ringhofer <cs...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M testdata/workloads/functional-query/queries/QueryTest/hdfs-scanner-thread-mem-scaling.test
8 files changed, 89 insertions(+), 6 deletions(-)

Approvals:
  Csaba Ringhofer: Looks good to me, approved
  Impala Public Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/18126
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 8
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-11068: Add tuning flag to reduce scanner thread launch.

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add tuning flag to reduce scanner thread launch.
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18126/1/be/src/runtime/scanner-mem-limiter.cc
File be/src/runtime/scanner-mem-limiter.cc:

http://gerrit.cloudera.org:8080/#/c/18126/1/be/src/runtime/scanner-mem-limiter.cc@90
PS1, Line 90:       addtl_consumption += static_cast<int64_t>(consumption * 0.5);
Alternatively, we can add flag to replace 0.5 in this estimate, and allow user to set it with value > 0.5.
This too will slow down thread ramp up decision, I think.



-- 
To view, visit http://gerrit.cloudera.org:8080/18126
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Mon, 15 May 2023 20:43:41 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Hello Kurt Deschler, Joe McDonnell, Csaba Ringhofer, Bikramjeet Vig, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18126

to look at the new patch set (#2).

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
......................................................................

IMPALA-11068: Add query option to reduce scanner thread launch.

Under heavy decompression workload, Impala running with scanner thread
parallelism (MT_DOP=0) can still hit OOM error due to launching too many
threads too soon. We have logic in ScannerMemLimiter to limit the number
of scanner threads by calculating the thread's memory requirement and
estimating the memory growth rate of all threads. However, it does not
prevent a scanner node from quickly launching many threads and
immediately reaching the memtracker's spare capacity. Even after
ScannerMemLimiter rejects a new thread launch, some existing threads
might continue increasing their non-reserved memory for decompression
work until finally the memory limit is exceeded.

IMPALA-7096 adds hdfs_scanner_thread_max_estimated_bytes flag as a
heuristic to count for non-reserved memory growth. Increasing this flag
value can help reduce thread count, but might severely regress other
queries that do not have heavy decompression characteristics. Similarly
with lowering the NUM_SCANNER_THREADS query option.

This patch adds one more query option as an alternative to mitigate OOM
called HDFS_SCANNER_NON_RESERVED_BYTES. This flag intent to offer the
same control as hdfs_scanner_thread_max_estimated_bytes, but as a query
option such that tuning can be done at per query granulality. If this
query option is not set, revert to use the value of
hdfs_scanner_thread_max_estimated_bytes flag.

Testing:
- Add test case in
  TestScanMemLimit::test_hdfs_scanner_thread_mem_scaling.

Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
---
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M testdata/workloads/functional-query/queries/QueryTest/hdfs-scanner-thread-mem-scaling.test
8 files changed, 86 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/18126/2
-- 
To view, visit http://gerrit.cloudera.org:8080/18126
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>

[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Hello Kurt Deschler, Daniel Becker, Joe McDonnell, Csaba Ringhofer, Michael Smith, Bikramjeet Vig, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18126

to look at the new patch set (#6).

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
......................................................................

IMPALA-11068: Add query option to reduce scanner thread launch.

Under heavy decompression workload, Impala running with scanner thread
parallelism (MT_DOP=0) can still hit OOM error due to launching too many
threads too soon. We have logic in ScannerMemLimiter to limit the number
of scanner threads by calculating the thread's memory requirement and
estimating the memory growth rate of all threads. However, it does not
prevent a scanner node from quickly launching many threads and
immediately reaching the memtracker's spare capacity. Even after
ScannerMemLimiter rejects a new thread launch, some existing threads
might continue increasing their non-reserved memory for decompression
work until the memory limit exceeded.

IMPALA-7096 adds hdfs_scanner_thread_max_estimated_bytes flag as a
heuristic to count for non-reserved memory growth. Increasing this flag
value can help reduce thread count, but might severely regress other
queries that do not have heavy decompression characteristics. Similarly
with lowering the NUM_SCANNER_THREADS query option.

This patch adds one more query option as an alternative to mitigate OOM
called HDFS_SCANNER_NON_RESERVED_BYTES. This option is intended to offer
the same control as hdfs_scanner_thread_max_estimated_bytes, but as a
query option such that tuning can be done at per query granularity. If
this query option not set, set to 0, or negative value, backend will
revert to use the value of hdfs_scanner_thread_max_estimated_bytes flag.

Testing:
- Add test case in query-options-test.cc and
  TestScanMemLimit::test_hdfs_scanner_thread_mem_scaling.

Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
---
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M testdata/workloads/functional-query/queries/QueryTest/hdfs-scanner-thread-mem-scaling.test
8 files changed, 88 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/18126/6
-- 
To view, visit http://gerrit.cloudera.org:8080/18126
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>

[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18126/6/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/18126/6/be/src/service/query-options.cc@1146
PS6, Line 1146:         query_options->__set_mem_limit_coordinators(mem_spec_val.value);
> A break is missing from the end of the block here.
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/18126
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Sat, 07 Oct 2023 00:35:38 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
......................................................................


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9796/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/18126
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Sun, 08 Oct 2023 19:01:34 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11068: Add tuning flag to reduce scanner thread launch.

Posted by "Bikramjeet Vig (Code Review)" <ge...@cloudera.org>.
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add tuning flag to reduce scanner thread launch.
......................................................................


Patch Set 1:

Should we look into making the constant COMPRESSED_TEXT_COMPRESSION_RATIO tunable in hdfs-scan-node.cc instead?


-- 
To view, visit http://gerrit.cloudera.org:8080/18126
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Sat, 08 Jan 2022 03:26:27 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11068: Add tuning flag to reduce scanner thread launch.

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add tuning flag to reduce scanner thread launch.
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18126/1/be/src/runtime/scanner-mem-limiter.cc
File be/src/runtime/scanner-mem-limiter.cc:

http://gerrit.cloudera.org:8080/#/c/18126/1/be/src/runtime/scanner-mem-limiter.cc@90
PS1, Line 90:       addtl_consumption += static_cast<int64_t>(consumption * 0.5);
> fyi the estimate for decompressed text file buffer sizes come from here:
I look around FE code. Impala does not seem to collect totalSize nor rawDataSize in its COMPUTE STATS query.

What if we add query option / flag to double est_non_reserved_bytes for the first N threads where N = ScanNode::ScannerThreadState::max_num_scanner_threads_?
https://github.com/apache/impala/blob/cf28a4c5292fdb3504d1fe11183c78243ed148a4/be/src/exec/hdfs-scan-node.cc#L223
That will only affect the memory accounting in ScannerMemLimiter, not the ScanNode's mem_tracker.
We can add memoization in ScannerMemLimiter such that, by the time there are concurrent scanner threads running (and maybe few completed), ScannerMemLimiter can calculate a better lower bound estimate of est_non_reserved_bytes for next scanner threads.



-- 
To view, visit http://gerrit.cloudera.org:8080/18126
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Mon, 22 May 2023 22:21:24 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Hello Kurt Deschler, Daniel Becker, Joe McDonnell, Csaba Ringhofer, Bikramjeet Vig, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18126

to look at the new patch set (#4).

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
......................................................................

IMPALA-11068: Add query option to reduce scanner thread launch.

Under heavy decompression workload, Impala running with scanner thread
parallelism (MT_DOP=0) can still hit OOM error due to launching too many
threads too soon. We have logic in ScannerMemLimiter to limit the number
of scanner threads by calculating the thread's memory requirement and
estimating the memory growth rate of all threads. However, it does not
prevent a scanner node from quickly launching many threads and
immediately reaching the memtracker's spare capacity. Even after
ScannerMemLimiter rejects a new thread launch, some existing threads
might continue increasing their non-reserved memory for decompression
work until the memory limit exceeded.

IMPALA-7096 adds hdfs_scanner_thread_max_estimated_bytes flag as a
heuristic to count for non-reserved memory growth. Increasing this flag
value can help reduce thread count, but might severely regress other
queries that do not have heavy decompression characteristics. Similarly
with lowering the NUM_SCANNER_THREADS query option.

This patch adds one more query option as an alternative to mitigate OOM
called HDFS_SCANNER_NON_RESERVED_BYTES. This option is intended to offer
the same control as hdfs_scanner_thread_max_estimated_bytes, but as a
query option such that tuning can be done at per query granulality. If
this query option not set, set to 0, or negative value, backend will
revert to use the value of hdfs_scanner_thread_max_estimated_bytes flag.

Testing:
- Add test case in query-options-test.cc and
  TestScanMemLimit::test_hdfs_scanner_thread_mem_scaling.

Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
---
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M testdata/workloads/functional-query/queries/QueryTest/hdfs-scanner-thread-mem-scaling.test
8 files changed, 87 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/18126/4
-- 
To view, visit http://gerrit.cloudera.org:8080/18126
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>

[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

Posted by "Michael Smith (Code Review)" <ge...@cloudera.org>.
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
......................................................................


Patch Set 5: Code-Review+1

(1 comment)

This seems like what IMPALA-7096 should have been originally. Reasonable enough.

http://gerrit.cloudera.org:8080/#/c/18126/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18126/5//COMMIT_MSG@29
PS5, Line 29: query option such that tuning can be done at per query granulality. If
nit: granularity



-- 
To view, visit http://gerrit.cloudera.org:8080/18126
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Mon, 02 Oct 2023 22:24:11 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

Posted by "Michael Smith (Code Review)" <ge...@cloudera.org>.
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
......................................................................


Patch Set 6: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/18126
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Mon, 02 Oct 2023 22:32:59 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/14126/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18126
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Mon, 02 Oct 2023 17:25:00 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
......................................................................


Patch Set 2:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/18126/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18126/2//COMMIT_MSG@27
PS2, Line 27: intent
Nit: could be "is intended to offer"


http://gerrit.cloudera.org:8080/#/c/18126/2//COMMIT_MSG@30
PS2, Line 30: is not set
Mention that we also fall back to the flag if the query option is set no zero or a negative value.


http://gerrit.cloudera.org:8080/#/c/18126/2/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/18126/2/be/src/exec/hdfs-scan-node.cc@220
PS2, Line 220: from
             :   // either of hdfs_scanner_thread_max_estimated_bytes flag
Nit: it would be better like this:
"from either the hdfs_scanner_thread_max_estimated_bytes flag ..."


http://gerrit.cloudera.org:8080/#/c/18126/2/be/src/exec/hdfs-scan-node.cc@221
PS2, Line 221: or
             :   // hdfs_scanner_non_reserved_bytes option
Nit: it would be better like this:
"or the HDFS_SCANNER_NON_RESERVED_BYTES query option."

We could mention that the query option takes precedence over the flag if it is set.


http://gerrit.cloudera.org:8080/#/c/18126/2/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/18126/2/common/thrift/ImpalaService.thrift@847
PS2, Line 847: by planner
Nit: by the planner.


http://gerrit.cloudera.org:8080/#/c/18126/2/common/thrift/ImpalaService.thrift@849
PS2, Line 849: thread
Nit: threads.


http://gerrit.cloudera.org:8080/#/c/18126/2/common/thrift/Query.thrift
File common/thrift/Query.thrift:

http://gerrit.cloudera.org:8080/#/c/18126/2/common/thrift/Query.thrift@665
PS2, Line 665:   166: optional i64 hdfs_scanner_non_reserved_bytes = 33554432 // 32MB
Shouldn't the default be unset, i.e. for example -1? That way the original behaviour would be conserved, i.e. the 'hdfs_scanner_thread_max_estimated_bytes' flag would be effective unless this option is set.

If the default value stays this, we should mention it also in the commit message.

My reason for having this option unset by default is that a user may set the 'hdfs_scanner_thread_max_estimated_bytes' flag but it won't have any effect and it could be difficult for the user to find out why.



-- 
To view, visit http://gerrit.cloudera.org:8080/18126
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Tue, 12 Sep 2023 09:03:48 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
......................................................................


Patch Set 6: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/18126
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 05 Oct 2023 01:10:36 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Hello Kurt Deschler, Daniel Becker, Joe McDonnell, Csaba Ringhofer, Bikramjeet Vig, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18126

to look at the new patch set (#3).

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
......................................................................

IMPALA-11068: Add query option to reduce scanner thread launch.

Under heavy decompression workload, Impala running with scanner thread
parallelism (MT_DOP=0) can still hit OOM error due to launching too many
threads too soon. We have logic in ScannerMemLimiter to limit the number
of scanner threads by calculating the thread's memory requirement and
estimating the memory growth rate of all threads. However, it does not
prevent a scanner node from quickly launching many threads and
immediately reaching the memtracker's spare capacity. Even after
ScannerMemLimiter rejects a new thread launch, some existing threads
might continue increasing their non-reserved memory for decompression
work until the memory limit exceeded.

IMPALA-7096 adds hdfs_scanner_thread_max_estimated_bytes flag as a
heuristic to count for non-reserved memory growth. Increasing this flag
value can help reduce thread count, but might severely regress other
queries that do not have heavy decompression characteristics. Similarly
with lowering the NUM_SCANNER_THREADS query option.

This patch adds one more query option as an alternative to mitigate OOM
called HDFS_SCANNER_NON_RESERVED_BYTES. This option is intended to offer
the same control as hdfs_scanner_thread_max_estimated_bytes, but as a
query option such that tuning can be done at per query granulality. If
this query option not set, set to 0, or negative value, backend will
revert to use the value of hdfs_scanner_thread_max_estimated_bytes flag.

Testing:
- Add test case in query-options-test.cc and
  TestScanMemLimit::test_hdfs_scanner_thread_mem_scaling.

Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
---
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M testdata/workloads/functional-query/queries/QueryTest/hdfs-scanner-thread-mem-scaling.test
8 files changed, 87 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/18126/3
-- 
To view, visit http://gerrit.cloudera.org:8080/18126
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>

[Impala-ASF-CR] IMPALA-11068: Add query option to reduce scanner thread launch.

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/18126 )

Change subject: IMPALA-11068: Add query option to reduce scanner thread launch.
......................................................................


Patch Set 3:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/18126/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18126/2//COMMIT_MSG@27
PS2, Line 27: n is i
> Nit: could be "is intended to offer"
Done


http://gerrit.cloudera.org:8080/#/c/18126/2//COMMIT_MSG@30
PS2, Line 30: tion not s
> Mention that we also fall back to the flag if the query option is set no ze
Done


http://gerrit.cloudera.org:8080/#/c/18126/2/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/18126/2/be/src/exec/hdfs-scan-node.cc@220
PS2, Line 220: from
             :   // either of the HDFS_SCANNER_NON_RESERVED_BYTES query op
> Nit: it would be better like this:
Done. Rephrased.


http://gerrit.cloudera.org:8080/#/c/18126/2/be/src/exec/hdfs-scan-node.cc@221
PS2, Line 221: ion or the
             :   // hdfs_scanner_thread_max_estimated_byte
> Nit: it would be better like this:
Done. Rephrased.


http://gerrit.cloudera.org:8080/#/c/18126/2/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/18126/2/common/thrift/ImpalaService.thrift@847
PS2, Line 847: by the pla
> Nit: by the planner.
Done


http://gerrit.cloudera.org:8080/#/c/18126/2/common/thrift/ImpalaService.thrift@849
PS2, Line 849: al sca
> Nit: threads.
Done


http://gerrit.cloudera.org:8080/#/c/18126/2/common/thrift/Query.thrift
File common/thrift/Query.thrift:

http://gerrit.cloudera.org:8080/#/c/18126/2/common/thrift/Query.thrift@665
PS2, Line 665:   166: optional i64 hdfs_scanner_non_reserved_bytes = -1
> Shouldn't the default be unset, i.e. for example -1? That way the original 
Agree. Done.



-- 
To view, visit http://gerrit.cloudera.org:8080/18126
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I03cadf1230eed00d69f2890c82476c6861e37466
Gerrit-Change-Number: 18126
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Tue, 12 Sep 2023 17:42:48 +0000
Gerrit-HasComments: Yes