You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Riza Suminto (Code Review)" <ge...@cloudera.org> on 2023/05/12 22:13:05 UTC

[Impala-ASF-CR] IMPALA-12120: Limit output writer parallelism based on write volume

Riza Suminto has uploaded this change for review. ( http://gerrit.cloudera.org:8080/19880


Change subject: IMPALA-12120: Limit output writer parallelism based on write volume
......................................................................

IMPALA-12120: Limit output writer parallelism based on write volume

The new processing cost-based planner changes (IMPALA-11604,
IMPALA-12091) will impact output writer parallelism for insert queries,
with the potential for more small files if the processing cost-based
planning results in too many writer fragments. It can further exacerbate
a problem introduced by MT_DOP (see IMPALA-8125).

The MAX_FS_WRITERS query option can help mitigate this. But even without
the MAX_FS_WRITERS set, the default output writer parallelism should
avoid creating excessive writer parallelism for partitioned and
unpartitioned inserts.

This patch implements such a limit when using the cost-based planner. It
limits the number of writer fragments such that each writer fragment
writes at least 128MB of rows. It always includes an exchange node, so
there will be no collocation of a scanner and a table writer in a single
fragment, thus simplifying the estimation. This patch also allows
CTAS (a kind of DDL query) to be eligible for auto-scaling.

Testing:
- Add test cases in test_query_cpu_count_divisor_default
- Add test_processing_cost_writer_limit in test_insert.py
- Pass test_insert.py::TestInsertHdfsWriterLimit
- Pass test_executor_groups.py

Change-Id: I289c6ffcd6d7b225179cc9fb2f926390325a27e0
---
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M tests/custom_cluster/test_executor_groups.py
M tests/query_test/test_insert.py
5 files changed, 137 insertions(+), 31 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/19880/1
-- 
To view, visit http://gerrit.cloudera.org:8080/19880
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I289c6ffcd6d7b225179cc9fb2f926390325a27e0
Gerrit-Change-Number: 19880
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>

[Impala-ASF-CR] IMPALA-12120: Limit output writer parallelism based on write volume

Posted by "Kurt Deschler (Code Review)" <ge...@cloudera.org>.
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/19880 )

Change subject: IMPALA-12120: Limit output writer parallelism based on write volume
......................................................................


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19880/5/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/19880/5/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@225
PS5, Line 225:     boolean hasHdfsScanORUnion = !hdfsScanNodes.isEmpty() || !unionNodes.isEmpty();
> This code block initialize hdfsScanNodes, unionNodes, and hasHdfsScanORUnio
Sorry I missed that the variables are referenced with from different blocks below. Looks fine.



-- 
To view, visit http://gerrit.cloudera.org:8080/19880
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I289c6ffcd6d7b225179cc9fb2f926390325a27e0
Gerrit-Change-Number: 19880
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 25 May 2023 18:29:27 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12120: Limit output writer parallelism based on write volume

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19880 )

Change subject: IMPALA-12120: Limit output writer parallelism based on write volume
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13092/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19880
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I289c6ffcd6d7b225179cc9fb2f926390325a27e0
Gerrit-Change-Number: 19880
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 22 May 2023 20:50:54 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12120: Limit output writer parallelism based on write volume

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Hello Kurt Deschler, David Rorke, Wenzhe Zhou, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19880

to look at the new patch set (#6).

Change subject: IMPALA-12120: Limit output writer parallelism based on write volume
......................................................................

IMPALA-12120: Limit output writer parallelism based on write volume

The new processing cost-based planner changes (IMPALA-11604,
IMPALA-12091) will impact output writer parallelism for insert queries,
with the potential for more small files if the processing cost-based
planning results in too many writer fragments. It can further exacerbate
a problem introduced by MT_DOP (see IMPALA-8125).

The MAX_FS_WRITERS query option can help mitigate this. But even without
the MAX_FS_WRITERS set, the default output writer parallelism should
avoid creating excessive writer parallelism for partitioned and
unpartitioned inserts.

This patch implements such a limit when using the cost-based planner. It
limits the number of writer fragments such that each writer fragment
writes at least 256MB of rows. This patch also allows CTAS (a kind of
DDL query) to be eligible for auto-scaling.

This patch also remove comments about NUM_SCANNER_THREADS added by
IMPALA-12029, since it does not applies anymore after IMPALA-12091.

Testing:
- Add test cases in test_query_cpu_count_divisor_default
- Add test_processing_cost_writer_limit in test_insert.py
- Pass test_insert.py::TestInsertHdfsWriterLimit
- Pass test_executor_groups.py

Change-Id: I289c6ffcd6d7b225179cc9fb2f926390325a27e0
---
M be/src/scheduling/scheduler.cc
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M tests/custom_cluster/test_executor_groups.py
M tests/query_test/test_insert.py
9 files changed, 286 insertions(+), 64 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/19880/6
-- 
To view, visit http://gerrit.cloudera.org:8080/19880
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I289c6ffcd6d7b225179cc9fb2f926390325a27e0
Gerrit-Change-Number: 19880
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-12120: Limit output writer parallelism based on write volume

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Hello Kurt Deschler, David Rorke, Wenzhe Zhou, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19880

to look at the new patch set (#3).

Change subject: IMPALA-12120: Limit output writer parallelism based on write volume
......................................................................

IMPALA-12120: Limit output writer parallelism based on write volume

The new processing cost-based planner changes (IMPALA-11604,
IMPALA-12091) will impact output writer parallelism for insert queries,
with the potential for more small files if the processing cost-based
planning results in too many writer fragments. It can further exacerbate
a problem introduced by MT_DOP (see IMPALA-8125).

The MAX_FS_WRITERS query option can help mitigate this. But even without
the MAX_FS_WRITERS set, the default output writer parallelism should
avoid creating excessive writer parallelism for partitioned and
unpartitioned inserts.

This patch implements such a limit when using the cost-based planner. It
limits the number of writer fragments such that each writer fragment
writes at least 256MB of rows. It always includes an exchange node, so
there will be no collocation of a scanner and a table writer in a single
fragment, thus simplifying the estimation. This patch also allows
CTAS (a kind of DDL query) to be eligible for auto-scaling.

Testing:
- Add test cases in test_query_cpu_count_divisor_default
- Add test_processing_cost_writer_limit in test_insert.py
- Pass test_insert.py::TestInsertHdfsWriterLimit
- Pass test_executor_groups.py

Change-Id: I289c6ffcd6d7b225179cc9fb2f926390325a27e0
---
M be/src/scheduling/scheduler.cc
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M tests/custom_cluster/test_executor_groups.py
M tests/query_test/test_insert.py
7 files changed, 222 insertions(+), 46 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/19880/3
-- 
To view, visit http://gerrit.cloudera.org:8080/19880
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I289c6ffcd6d7b225179cc9fb2f926390325a27e0
Gerrit-Change-Number: 19880
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-12120: Limit output writer parallelism based on write volume

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19880 )

Change subject: IMPALA-12120: Limit output writer parallelism based on write volume
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13026/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19880
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I289c6ffcd6d7b225179cc9fb2f926390325a27e0
Gerrit-Change-Number: 19880
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 May 2023 22:33:29 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12120: Limit output writer parallelism based on write volume

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Hello Kurt Deschler, David Rorke, Wenzhe Zhou, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19880

to look at the new patch set (#4).

Change subject: IMPALA-12120: Limit output writer parallelism based on write volume
......................................................................

IMPALA-12120: Limit output writer parallelism based on write volume

The new processing cost-based planner changes (IMPALA-11604,
IMPALA-12091) will impact output writer parallelism for insert queries,
with the potential for more small files if the processing cost-based
planning results in too many writer fragments. It can further exacerbate
a problem introduced by MT_DOP (see IMPALA-8125).

The MAX_FS_WRITERS query option can help mitigate this. But even without
the MAX_FS_WRITERS set, the default output writer parallelism should
avoid creating excessive writer parallelism for partitioned and
unpartitioned inserts.

This patch implements such a limit when using the cost-based planner. It
limits the number of writer fragments such that each writer fragment
writes at least 256MB of rows. This patch also allows CTAS (a kind of
DDL query) to be eligible for auto-scaling.

Testing:
- Add test cases in test_query_cpu_count_divisor_default
- Add test_processing_cost_writer_limit in test_insert.py
- Pass test_insert.py::TestInsertHdfsWriterLimit
- Pass test_executor_groups.py

Change-Id: I289c6ffcd6d7b225179cc9fb2f926390325a27e0
---
M be/src/scheduling/scheduler.cc
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M tests/custom_cluster/test_executor_groups.py
M tests/query_test/test_insert.py
8 files changed, 292 insertions(+), 59 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/19880/4
-- 
To view, visit http://gerrit.cloudera.org:8080/19880
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I289c6ffcd6d7b225179cc9fb2f926390325a27e0
Gerrit-Change-Number: 19880
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-12120: Limit output writer parallelism based on write volume

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19880 )

Change subject: IMPALA-12120: Limit output writer parallelism based on write volume
......................................................................


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9354/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/19880
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I289c6ffcd6d7b225179cc9fb2f926390325a27e0
Gerrit-Change-Number: 19880
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 25 May 2023 18:47:06 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12120: Limit output writer parallelism based on write volume

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19880 )

Change subject: IMPALA-12120: Limit output writer parallelism based on write volume
......................................................................


Patch Set 7: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/19880
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I289c6ffcd6d7b225179cc9fb2f926390325a27e0
Gerrit-Change-Number: 19880
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 25 May 2023 18:47:05 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12120: Limit output writer parallelism based on write volume

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/19880 )

Change subject: IMPALA-12120: Limit output writer parallelism based on write volume
......................................................................


Patch Set 5:

Patch set 5 is a rebase of patch set 4 to pick up latest GBN.


-- 
To view, visit http://gerrit.cloudera.org:8080/19880
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I289c6ffcd6d7b225179cc9fb2f926390325a27e0
Gerrit-Change-Number: 19880
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 23 May 2023 16:41:05 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12120: Limit output writer parallelism based on write volume

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19880 )

Change subject: IMPALA-12120: Limit output writer parallelism based on write volume
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13067/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19880
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I289c6ffcd6d7b225179cc9fb2f926390325a27e0
Gerrit-Change-Number: 19880
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 17 May 2023 20:11:59 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12120: Limit output writer parallelism based on write volume

Posted by "Kurt Deschler (Code Review)" <ge...@cloudera.org>.
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/19880 )

Change subject: IMPALA-12120: Limit output writer parallelism based on write volume
......................................................................


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/19880/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/19880/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@259
PS3, Line 259:         costBasedMaxWriter =
Seems strange to have min override max here.


http://gerrit.cloudera.org:8080/#/c/19880/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@284
PS3, Line 284:       } else if (!enforce_hdfs_writer_limit || hdfsScanORUnionNodes.size() == 0
isComputeCost should also override hdfsScanORUnionNodes check so we don't have different writer sizing for different source table types.

We should try to identify when we can still skip adding the exchange with compute cost enabled. As long as effective parallelism is computed across the fragments and the producer has fewer instances, it should be reasonably safe to proceed.


http://gerrit.cloudera.org:8080/#/c/19880/3/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
File fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java:

http://gerrit.cloudera.org:8080/#/c/19880/3/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@72
PS3, Line 72:   public static final int MIN_WRITE_BYTES = 256 * 1024 * 1024;
The default HDFS block size is actually 128GB. Checking impala generated files with hdfs dfs -stat '%o' seems to confirm this.



-- 
To view, visit http://gerrit.cloudera.org:8080/19880
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I289c6ffcd6d7b225179cc9fb2f926390325a27e0
Gerrit-Change-Number: 19880
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 18 May 2023 11:57:25 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12120: Limit output writer parallelism based on write volume

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/19880 )

Change subject: IMPALA-12120: Limit output writer parallelism based on write volume
......................................................................


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19880/5/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/19880/5/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@186
PS5, Line 186: part
> I see we have other use of <p> and <b> tags in FE code comments. Just not v
Done. Removed the html formatting tags.



-- 
To view, visit http://gerrit.cloudera.org:8080/19880
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I289c6ffcd6d7b225179cc9fb2f926390325a27e0
Gerrit-Change-Number: 19880
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 24 May 2023 16:54:27 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12120: Limit output writer parallelism based on write volume

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/19880 )

Change subject: IMPALA-12120: Limit output writer parallelism based on write volume
......................................................................


Patch Set 2:

(1 comment)

Patch set 2 tweak the estimate a bit to not exceed maximum parallelism per node.

In worst case where cardinality or average row size of output is unknown, number of writer instances will be fixed at 1 per executor node.

http://gerrit.cloudera.org:8080/#/c/19880/1/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/19880/1/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@236
PS1, Line 236: 
> nit: 128MB
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/19880
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I289c6ffcd6d7b225179cc9fb2f926390325a27e0
Gerrit-Change-Number: 19880
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Sat, 13 May 2023 00:17:48 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12120: Limit output writer parallelism based on write volume

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/19880 )

Change subject: IMPALA-12120: Limit output writer parallelism based on write volume
......................................................................


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19880/5/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/19880/5/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@225
PS5, Line 225:     boolean hasHdfsScanORUnion = !hdfsScanNodes.isEmpty() || !unionNodes.isEmpty();
> nit: this block could be moved lower.
This code block initialize hdfsScanNodes, unionNodes, and hasHdfsScanORUnion.

Here, hdfsScanNodes is referred in line 271 (inside isComputeCost branch).
hasHdfsScanORUnion is referred in line 299 (in exchange skipping branch).

This is the lowest place to put the code block.



-- 
To view, visit http://gerrit.cloudera.org:8080/19880
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I289c6ffcd6d7b225179cc9fb2f926390325a27e0
Gerrit-Change-Number: 19880
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 25 May 2023 16:56:06 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12120: Limit output writer parallelism based on write volume

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/19880 )

Change subject: IMPALA-12120: Limit output writer parallelism based on write volume
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19880/5/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/19880/5/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@282
PS5, Line 282:           maxScanThread = Math.max(
> This seems to assume that scans are not run concurrently. Is that a correct
No assumption of concurrency here.
Note that if isComputeCost == true, planner will produce MT scan node where 1 scan fragment instance is 1 thread, and all run concurrently at the same time. maxScanThread here map to number of scan fragment instances.



-- 
To view, visit http://gerrit.cloudera.org:8080/19880
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I289c6ffcd6d7b225179cc9fb2f926390325a27e0
Gerrit-Change-Number: 19880
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 23 May 2023 22:28:26 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12120: Limit output writer parallelism based on write volume

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/19880 )

Change subject: IMPALA-12120: Limit output writer parallelism based on write volume
......................................................................


Patch Set 6: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/19880
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I289c6ffcd6d7b225179cc9fb2f926390325a27e0
Gerrit-Change-Number: 19880
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 24 May 2023 17:23:27 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12120: Limit output writer parallelism based on write volume

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/19880 )

Change subject: IMPALA-12120: Limit output writer parallelism based on write volume
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19880/5/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/19880/5/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@282
PS5, Line 282:           maxScanThread = Math.max(
> If they run concurrently, I would think a sum would be appropriate here rat
Let me correct myself.

In most cases, one fragment only have 1 SCAN node max.
However, here, we loop over hdfsScanNodes to anticipate case where this is a UNION fragment with multiple SCAN node, such as this:

00:UNION
|  pass-through-operands: all
|
|--02:SCAN HDFS [tpcds_parquet.store_sales, RANDOM]
|     HDFS partitions=1824/1824 files=1824 size=199.46MB
|
01:SCAN HDFS [tpcds_parquet.store_sales, RANDOM]
   HDFS partitions=1824/1824 files=1824 size=199.46MB

In this case, we have SCAN01 and SCAN02 colocated in single fragment. However, they can not run concurrently because the UNION node will GetNext all rows from SCAN01 before starts pulling rows from SCAN02.



-- 
To view, visit http://gerrit.cloudera.org:8080/19880
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I289c6ffcd6d7b225179cc9fb2f926390325a27e0
Gerrit-Change-Number: 19880
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 24 May 2023 16:36:59 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12120: Limit output writer parallelism based on write volume

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/19880 )

Change subject: IMPALA-12120: Limit output writer parallelism based on write volume
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19880/5/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/19880/5/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@186
PS5, Line 186: <ul>
> Do we need these <p>, <ul>, <li> in comments?
This is a valid way to format javadoc and make it show up nice in IDE. But I'm OK too with putting them back into single paragraph.



-- 
To view, visit http://gerrit.cloudera.org:8080/19880
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I289c6ffcd6d7b225179cc9fb2f926390325a27e0
Gerrit-Change-Number: 19880
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 24 May 2023 05:44:33 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12120: Limit output writer parallelism based on write volume

Posted by "Kurt Deschler (Code Review)" <ge...@cloudera.org>.
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/19880 )

Change subject: IMPALA-12120: Limit output writer parallelism based on write volume
......................................................................


Patch Set 6: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19880/5/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/19880/5/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@225
PS5, Line 225:     boolean hasHdfsScanORUnion = !hdfsScanNodes.isEmpty() || !unionNodes.isEmpty();
nit: this block could be moved lower.



-- 
To view, visit http://gerrit.cloudera.org:8080/19880
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I289c6ffcd6d7b225179cc9fb2f926390325a27e0
Gerrit-Change-Number: 19880
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 25 May 2023 16:19:59 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12120: Limit output writer parallelism based on write volume

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/19880 )

Change subject: IMPALA-12120: Limit output writer parallelism based on write volume
......................................................................


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/19880/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/19880/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@251
PS3, Line 251: getNumDistinctValues
this can return -1 if NDV is unknown - maybe use max(1, Expr.getNumDistinctValues(partitionExprs))?


http://gerrit.cloudera.org:8080/#/c/19880/3/tests/custom_cluster/test_executor_groups.py
File tests/custom_cluster/test_executor_groups.py:

http://gerrit.cloudera.org:8080/#/c/19880/3/tests/custom_cluster/test_executor_groups.py@1057
PS3, Line 1057: create table 
They shouldn't act differently, but it would be nice to add some tests for INSERTs and INSERT OVERWRITEs too.


http://gerrit.cloudera.org:8080/#/c/19880/3/tests/custom_cluster/test_executor_groups.py@1216
PS3, Line 1216:       result.runtime_profile
nit: +2 indentation



-- 
To view, visit http://gerrit.cloudera.org:8080/19880
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I289c6ffcd6d7b225179cc9fb2f926390325a27e0
Gerrit-Change-Number: 19880
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 18 May 2023 15:05:40 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12120: Limit output writer parallelism based on write volume

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/19880 )

Change subject: IMPALA-12120: Limit output writer parallelism based on write volume
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19880/5/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/19880/5/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@186
PS5, Line 186: <ul>
Do we need these <p>, <ul>, <li> in comments?



-- 
To view, visit http://gerrit.cloudera.org:8080/19880
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I289c6ffcd6d7b225179cc9fb2f926390325a27e0
Gerrit-Change-Number: 19880
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 24 May 2023 05:37:01 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12120: Limit output writer parallelism based on write volume

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/19880 )

Change subject: IMPALA-12120: Limit output writer parallelism based on write volume
......................................................................

IMPALA-12120: Limit output writer parallelism based on write volume

The new processing cost-based planner changes (IMPALA-11604,
IMPALA-12091) will impact output writer parallelism for insert queries,
with the potential for more small files if the processing cost-based
planning results in too many writer fragments. It can further exacerbate
a problem introduced by MT_DOP (see IMPALA-8125).

The MAX_FS_WRITERS query option can help mitigate this. But even without
the MAX_FS_WRITERS set, the default output writer parallelism should
avoid creating excessive writer parallelism for partitioned and
unpartitioned inserts.

This patch implements such a limit when using the cost-based planner. It
limits the number of writer fragments such that each writer fragment
writes at least 256MB of rows. This patch also allows CTAS (a kind of
DDL query) to be eligible for auto-scaling.

This patch also remove comments about NUM_SCANNER_THREADS added by
IMPALA-12029, since it does not applies anymore after IMPALA-12091.

Testing:
- Add test cases in test_query_cpu_count_divisor_default
- Add test_processing_cost_writer_limit in test_insert.py
- Pass test_insert.py::TestInsertHdfsWriterLimit
- Pass test_executor_groups.py

Change-Id: I289c6ffcd6d7b225179cc9fb2f926390325a27e0
Reviewed-on: http://gerrit.cloudera.org:8080/19880
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/scheduling/scheduler.cc
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M tests/custom_cluster/test_executor_groups.py
M tests/query_test/test_insert.py
9 files changed, 286 insertions(+), 64 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/19880
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I289c6ffcd6d7b225179cc9fb2f926390325a27e0
Gerrit-Change-Number: 19880
Gerrit-PatchSet: 8
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-12120: Limit output writer parallelism based on write volume

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/19880 )

Change subject: IMPALA-12120: Limit output writer parallelism based on write volume
......................................................................


Patch Set 4:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/19880/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/19880/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@251
PS3, Line 251: agment.getNumNodes()
> this can return -1 if NDV is unknown - maybe use max(1, Expr.getNumDistinct
Done. Also changed this to use getNumDistinctValues defined in DistributedPlanner.java


http://gerrit.cloudera.org:8080/#/c/19880/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@259
PS3, Line 259:       if (maxHdfsWriters > 0) {
> Seems strange to have min override max here.
Done. Changed the order.


http://gerrit.cloudera.org:8080/#/c/19880/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@284
PS3, Line 284:         }
> isComputeCost should also override hdfsScanORUnionNodes check so we don't h
isComputeCost is removed from this branch clause.
Instead, the logic above it will change expectedNumInputInstance as needed if isComputeCost is true.


http://gerrit.cloudera.org:8080/#/c/19880/3/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
File fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java:

http://gerrit.cloudera.org:8080/#/c/19880/3/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@72
PS3, Line 72:   public static final int MIN_WRITE_BYTES = 256 * 1024 * 1024;
> The default HDFS block size is actually 128GB. Checking impala generated fi
I double check with our internal TPC-DS 10TB database written by Impala, at store_sales. Impala indeed write 256MB files instead of 128MB. This is how I check:

show files in store_sales partition (ss_sold_date_sk is null);


http://gerrit.cloudera.org:8080/#/c/19880/3/tests/custom_cluster/test_executor_groups.py
File tests/custom_cluster/test_executor_groups.py:

http://gerrit.cloudera.org:8080/#/c/19880/3/tests/custom_cluster/test_executor_groups.py@1057
PS3, Line 1057: COST_MIN_THRE
> They shouldn't act differently, but it would be nice to add some tests for 
Done


http://gerrit.cloudera.org:8080/#/c/19880/3/tests/custom_cluster/test_executor_groups.py@1216
PS3, Line 1216:   @pytest.mark.execute_serially
> nit: +2 indentation
Done. Also move it above, just before test_query_cpu_count_divisor_default.



-- 
To view, visit http://gerrit.cloudera.org:8080/19880
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I289c6ffcd6d7b225179cc9fb2f926390325a27e0
Gerrit-Change-Number: 19880
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 22 May 2023 20:34:55 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12120: Limit output writer parallelism based on write volume

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/19880 )

Change subject: IMPALA-12120: Limit output writer parallelism based on write volume
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/19880/2/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/19880/2/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@238
PS2, Line 238:  maxInstances = IntMath.saturatedMultiply(
             :             inputFragment.getNumNodes(), analyzer.getMaxParallelismPerNode(
can we reuse the value calculated in line 232?


http://gerrit.cloudera.org:8080/#/c/19880/2/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
File fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java:

http://gerrit.cloudera.org:8080/#/c/19880/2/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@72
PS2, Line 72: 128 * 1024 * 1024
In be/src/exec/parquet/hdfs-parquet-table-writer.h,  HDFS_BLOCK_SIZE is defined as 256 MB


http://gerrit.cloudera.org:8080/#/c/19880/2/tests/query_test/test_insert.py
File tests/query_test/test_insert.py:

http://gerrit.cloudera.org:8080/#/c/19880/2/tests/query_test/test_insert.py@463
PS2, Line 463: max_fs_writers <= 0
Does this mean we don't need assert if max_fs_writers <= 0?



-- 
To view, visit http://gerrit.cloudera.org:8080/19880
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I289c6ffcd6d7b225179cc9fb2f926390325a27e0
Gerrit-Change-Number: 19880
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 16 May 2023 21:31:36 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12120: Limit output writer parallelism based on write volume

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/19880 )

Change subject: IMPALA-12120: Limit output writer parallelism based on write volume
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19880/1/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/19880/1/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@236
PS1, Line 236: 256MB
nit: 128MB



-- 
To view, visit http://gerrit.cloudera.org:8080/19880
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I289c6ffcd6d7b225179cc9fb2f926390325a27e0
Gerrit-Change-Number: 19880
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 May 2023 22:18:21 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12120: Limit output writer parallelism based on write volume

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19880 )

Change subject: IMPALA-12120: Limit output writer parallelism based on write volume
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13110/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19880
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I289c6ffcd6d7b225179cc9fb2f926390325a27e0
Gerrit-Change-Number: 19880
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 24 May 2023 17:15:23 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12120: Limit output writer parallelism based on write volume

Posted by "Kurt Deschler (Code Review)" <ge...@cloudera.org>.
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/19880 )

Change subject: IMPALA-12120: Limit output writer parallelism based on write volume
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19880/2/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
File fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java:

http://gerrit.cloudera.org:8080/#/c/19880/2/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@72
PS2, Line 72: 128 * 1024 * 1024
> In be/src/exec/parquet/hdfs-parquet-table-writer.h,  HDFS_BLOCK_SIZE is def
Should this be configurable?



-- 
To view, visit http://gerrit.cloudera.org:8080/19880
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I289c6ffcd6d7b225179cc9fb2f926390325a27e0
Gerrit-Change-Number: 19880
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 17 May 2023 01:38:09 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12120: Limit output writer parallelism based on write volume

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/19880 )

Change subject: IMPALA-12120: Limit output writer parallelism based on write volume
......................................................................


Patch Set 3:

(3 comments)

Patch set 3 update the code to have little more parallelism in case of partitioned insert if resulting partition count is higher than what costBasedMaxWriter.

http://gerrit.cloudera.org:8080/#/c/19880/2/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/19880/2/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@238
PS2, Line 238: known.
             :       int costBasedMaxWriter = IntMath.saturatedMultiply(
> can we reuse the value calculated in line 232?
The initialization value is intended just for case when stats is not available.


http://gerrit.cloudera.org:8080/#/c/19880/2/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
File fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java:

http://gerrit.cloudera.org:8080/#/c/19880/2/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@72
PS2, Line 72: 256 * 1024 * 1024
> Should this be configurable?
Thanks for showing that! Changed to 256MB.
I don't see reason to go lower than HDFS block size. To go higher, user can control it by setting low MAX_FS_WRITERS option, with tradeoff of slower insert.

Patch set 3 also update the code to try increase parallelism in case of partitioned insert.


http://gerrit.cloudera.org:8080/#/c/19880/2/tests/query_test/test_insert.py
File tests/query_test/test_insert.py:

http://gerrit.cloudera.org:8080/#/c/19880/2/tests/query_test/test_insert.py@463
PS2, Line 463:                    
> Does this mean we don't need assert if max_fs_writers <= 0?
Yes. Rewrote this for clarity.



-- 
To view, visit http://gerrit.cloudera.org:8080/19880
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I289c6ffcd6d7b225179cc9fb2f926390325a27e0
Gerrit-Change-Number: 19880
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 17 May 2023 19:58:16 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12120: Limit output writer parallelism based on write volume

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Hello Kurt Deschler, David Rorke, Wenzhe Zhou, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19880

to look at the new patch set (#2).

Change subject: IMPALA-12120: Limit output writer parallelism based on write volume
......................................................................

IMPALA-12120: Limit output writer parallelism based on write volume

The new processing cost-based planner changes (IMPALA-11604,
IMPALA-12091) will impact output writer parallelism for insert queries,
with the potential for more small files if the processing cost-based
planning results in too many writer fragments. It can further exacerbate
a problem introduced by MT_DOP (see IMPALA-8125).

The MAX_FS_WRITERS query option can help mitigate this. But even without
the MAX_FS_WRITERS set, the default output writer parallelism should
avoid creating excessive writer parallelism for partitioned and
unpartitioned inserts.

This patch implements such a limit when using the cost-based planner. It
limits the number of writer fragments such that each writer fragment
writes at least 128MB of rows. It always includes an exchange node, so
there will be no collocation of a scanner and a table writer in a single
fragment, thus simplifying the estimation. This patch also allows
CTAS (a kind of DDL query) to be eligible for auto-scaling.

Testing:
- Add test cases in test_query_cpu_count_divisor_default
- Add test_processing_cost_writer_limit in test_insert.py
- Pass test_insert.py::TestInsertHdfsWriterLimit
- Pass test_executor_groups.py

Change-Id: I289c6ffcd6d7b225179cc9fb2f926390325a27e0
---
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M tests/custom_cluster/test_executor_groups.py
M tests/query_test/test_insert.py
6 files changed, 146 insertions(+), 31 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/19880/2
-- 
To view, visit http://gerrit.cloudera.org:8080/19880
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I289c6ffcd6d7b225179cc9fb2f926390325a27e0
Gerrit-Change-Number: 19880
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-12120: Limit output writer parallelism based on write volume

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19880 )

Change subject: IMPALA-12120: Limit output writer parallelism based on write volume
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13028/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19880
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I289c6ffcd6d7b225179cc9fb2f926390325a27e0
Gerrit-Change-Number: 19880
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Sat, 13 May 2023 00:31:39 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12120: Limit output writer parallelism based on write volume

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/19880 )

Change subject: IMPALA-12120: Limit output writer parallelism based on write volume
......................................................................


Patch Set 6: Code-Review+2

carry +1 from Kurt


-- 
To view, visit http://gerrit.cloudera.org:8080/19880
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I289c6ffcd6d7b225179cc9fb2f926390325a27e0
Gerrit-Change-Number: 19880
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 25 May 2023 18:33:57 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12120: Limit output writer parallelism based on write volume

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19880 )

Change subject: IMPALA-12120: Limit output writer parallelism based on write volume
......................................................................


Patch Set 7: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/19880
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I289c6ffcd6d7b225179cc9fb2f926390325a27e0
Gerrit-Change-Number: 19880
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 26 May 2023 00:08:01 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-12120: Limit output writer parallelism based on write volume

Posted by "Kurt Deschler (Code Review)" <ge...@cloudera.org>.
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/19880 )

Change subject: IMPALA-12120: Limit output writer parallelism based on write volume
......................................................................


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/19880/5/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/19880/5/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@186
PS5, Line 186: <ul>
> This is a valid way to format javadoc and make it show up nice in IDE. But 
I see we have other use of <p> and <b> tags in FE code comments. Just not very consistent use.


http://gerrit.cloudera.org:8080/#/c/19880/5/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@282
PS5, Line 282:           maxScanThread = Math.max(
> No assumption of concurrency here.
If they run concurrently, I would think a sum would be appropriate here rather than max.



-- 
To view, visit http://gerrit.cloudera.org:8080/19880
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I289c6ffcd6d7b225179cc9fb2f926390325a27e0
Gerrit-Change-Number: 19880
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 24 May 2023 15:00:41 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12120: Limit output writer parallelism based on write volume

Posted by "Kurt Deschler (Code Review)" <ge...@cloudera.org>.
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/19880 )

Change subject: IMPALA-12120: Limit output writer parallelism based on write volume
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19880/5/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/19880/5/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@282
PS5, Line 282:           maxScanThread = Math.max(
This seems to assume that scans are not run concurrently. Is that a correct assumption?



-- 
To view, visit http://gerrit.cloudera.org:8080/19880
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I289c6ffcd6d7b225179cc9fb2f926390325a27e0
Gerrit-Change-Number: 19880
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 23 May 2023 22:15:01 +0000
Gerrit-HasComments: Yes