Posted to issues-all@impala.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2023/06/12 02:25:00 UTC
[jira] [Updated] (IMPALA-12204) Redundant codegen info of HashJoinBuilder inside a subplan
[ https://issues.apache.org/jira/browse/IMPALA-12204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Quanlong Huang updated IMPALA-12204:
------------------------------------
Description:
In the query profile, the info strings of a hash join builder contain an ExecOption with content like "Build Side Codegen Enabled, Hash Table Construction Codegen Enabled". When a HashJoin node sits inside a SUBPLAN node, this string can be repeated many times, because the SUBPLAN node opens its right child many times. This can blow up the profile size.
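The effect can be illustrated with a small toy model. This is only a sketch under stated assumptions: ToyProfile, ToyJoinBuilder and exec_option_after are hypothetical names made up for illustration and are not Impala's classes, and the dedupe flag shows one possible guard, not necessarily the fix that will be adopted for this issue.

```python
class ToyProfile:
    """Toy stand-in for a profile node (hypothetical, not Impala's class)."""

    def __init__(self):
        self.info_strings = {}

    def append_info_string(self, key, value):
        # Append semantics: a new value is joined to the existing one with ", ".
        existing = self.info_strings.get(key)
        self.info_strings[key] = value if existing is None else existing + ", " + value


class ToyJoinBuilder:
    """Toy builder that reports its codegen status every time it is opened."""

    OPTIONS = "Build Side Codegen Enabled, Hash Table Construction Codegen Enabled"

    def __init__(self, dedupe):
        self.profile = ToyProfile()
        self.dedupe = dedupe      # assumption: one possible guard against repeats
        self._reported = False

    def open(self):
        # With the guard on, the status is reported only on the first open.
        if self.dedupe and self._reported:
            return
        self.profile.append_info_string("ExecOption", self.OPTIONS)
        self._reported = True


def exec_option_after(num_opens, dedupe):
    """Return the ExecOption string after opening the builder num_opens times."""
    builder = ToyJoinBuilder(dedupe)
    for _ in range(num_opens):
        builder.open()
    return builder.profile.info_strings["ExecOption"]
```

Without the guard, the string grows linearly with the number of opens. With the guard, it stays at a single copy no matter how often the builder is re-opened.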
I can reproduce this with the following query:
{code:sql}
select count(*) from
tpch_nested_parquet.customer c1,
tpch_nested_parquet.customer c2,
(select x.* from c1.c_orders x, c2.c_orders y
where x.o_orderkey = y.o_orderkey) v
where c1.c_custkey = c2.c_custkey;{code}
In the query plan, there is a HASH JOIN node inside a SUBPLAN node:
{noformat}
08:SUBPLAN
| row-size=56B cardinality=1.50M
|
|--06:NESTED LOOP JOIN [CROSS JOIN]
| | row-size=56B cardinality=10
| |
| |--02:SINGULAR ROW SRC
| | row-size=40B cardinality=1
| |
| 05:HASH JOIN [INNER JOIN]
| | hash predicates: x.o_orderkey = y.o_orderkey
| | row-size=16B cardinality=10
| |
| |--04:UNNEST [c2.c_orders y]
| | row-size=0B cardinality=10
| |
| 03:UNNEST [c1.c_orders x]
| row-size=0B cardinality=10
{noformat}
The query profile contains extremely long strings:
{noformat}
Hash Join Builder (join_node_id=5):
ExecOption: Build Side Codegen Enabled, Hash Table Construction Codegen Enabled, Build Side Codegen Enabled, Hash Table Construction Codegen Enabled,...
{noformat}
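To gauge how often each option is repeated in a text profile, a line like the one above can be tallied with a few lines of script. This is a rough sketch: summarize_exec_option is a hypothetical helper, and it assumes the line has the "ExecOption: a, b, ..." shape shown in the excerpt.

```python
from collections import Counter


def summarize_exec_option(line):
    """Count how many times each option appears in one ExecOption line."""
    key, _, value = line.partition(":")
    if key.strip() != "ExecOption":
        raise ValueError("not an ExecOption line")
    # Options are comma-separated; drop empty pieces left by a trailing comma.
    options = [opt.strip() for opt in value.split(",") if opt.strip()]
    return Counter(options)
```

A count well above 1 for the same option on a single builder points to the repeated appends described in this issue.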
was:
In the query profile, the info strings of a hash join builder contain an ExecOption with content like "Build Side Codegen Enabled, Hash Table Construction Codegen Enabled". When a HashJoin node sits inside a SUBPLAN node, this string can be repeated many times, because the SUBPLAN node opens and closes its right child many times. This can blow up the profile size.
I can reproduce this with the following query:
{code:sql}
select count(*) from
tpch_nested_parquet.customer c1,
tpch_nested_parquet.customer c2,
(select x.* from c1.c_orders x, c2.c_orders y
where x.o_orderkey = y.o_orderkey) v
where c1.c_custkey = c2.c_custkey;{code}
In the query plan, there is a HASH JOIN node inside a SUBPLAN node:
{noformat}
08:SUBPLAN
| row-size=56B cardinality=1.50M
|
|--06:NESTED LOOP JOIN [CROSS JOIN]
| | row-size=56B cardinality=10
| |
| |--02:SINGULAR ROW SRC
| | row-size=40B cardinality=1
| |
| 05:HASH JOIN [INNER JOIN]
| | hash predicates: x.o_orderkey = y.o_orderkey
| | row-size=16B cardinality=10
| |
| |--04:UNNEST [c2.c_orders y]
| | row-size=0B cardinality=10
| |
| 03:UNNEST [c1.c_orders x]
| row-size=0B cardinality=10
{noformat}
The query profile contains extremely long strings:
{noformat}
Hash Join Builder (join_node_id=5):
ExecOption: Build Side Codegen Enabled, Hash Table Construction Codegen Enabled, Build Side Codegen Enabled, Hash Table Construction Codegen Enabled,...
{noformat}
> Redundant codegen info of HashJoinBuilder inside a subplan
> ----------------------------------------------------------
>
> Key: IMPALA-12204
> URL: https://issues.apache.org/jira/browse/IMPALA-12204
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Critical