Posted to issues-all@impala.apache.org by "Manish Maheshwari (Jira)" <ji...@apache.org> on 2022/11/29 22:20:00 UTC

[jira] [Updated] (IMPALA-4857) Handle large # of duplicate keys on build side of a spilling hash join

     [ https://issues.apache.org/jira/browse/IMPALA-4857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manish Maheshwari updated IMPALA-4857:
--------------------------------------
    Labels: 2023Q1 resource-management  (was: resource-management)

> Handle large # of duplicate keys on build side of a spilling hash join
> ----------------------------------------------------------------------
>
>                 Key: IMPALA-4857
>                 URL: https://issues.apache.org/jira/browse/IMPALA-4857
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 2.9.0
>            Reporter: Tim Armstrong
>            Priority: Minor
>              Labels: 2023Q1, resource-management
>
> Currently, the hash join implementation relies on recursively repartitioning the build side until each partition fits entirely in memory. This works well in many cases, but it can fail when a large number of rows share the same key and those rows do not fit in the available memory.
> This results in an error like: "Cannot perform hash join at node with id 6. Repartitioning did not reduce the size of a spilled partition. Repartitioning level 6. Number of rows 275352"
> A special case of this is a null-aware anti join with many NULLs on the build side.
> This error often occurs because of a suboptimal query or plan with many duplicate values on one side of the join. Changing the join operator so that it can spill in these cases would let many such queries run to completion, but very slowly (since it would need to do a quadratic pairwise comparison of both sides of the join).

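A minimal Python sketch of why recursive repartitioning cannot shrink a partition dominated by a single key. It is not Impala's implementation: the function `repartition`, the 16-way fan-out `NUM_PARTITIONS` and the depth limit `MAX_LEVEL` are hypothetical, and the error text is only modeled on the message quoted in the issue. Seeding the hash with the level spreads distinct keys differently at each level, but identical keys always land in the same child partition, so a partition made of one key never gets smaller.

```python
# Hypothetical sketch of recursive hash repartitioning of a join build side.
# Not Impala's code: names and constants are illustrative assumptions.
NUM_PARTITIONS = 16
MAX_LEVEL = 6  # hypothetical cap on repartitioning depth


def partition_of(key, level):
    # Mix the level into the hash so each level splits rows differently.
    return hash((key, level)) % NUM_PARTITIONS


def repartition(rows, capacity, level=0):
    """Split `rows` (a list of integer join keys) until every partition holds
    at most `capacity` rows; return the list of in-memory partitions."""
    if len(rows) <= capacity:
        return [rows]
    if level >= MAX_LEVEL:
        raise RuntimeError(f"Repartitioning level {level} exceeded the limit")
    parts = [[] for _ in range(NUM_PARTITIONS)]
    for key in rows:
        parts[partition_of(key, level)].append(key)
    result = []
    for part in parts:
        if len(part) == len(rows):
            # Every row hashed to the same child: no progress is possible,
            # which is what happens when all rows share one key.
            raise RuntimeError(
                "Repartitioning did not reduce the size of a spilled partition. "
                f"Repartitioning level {level + 1}. Number of rows {len(part)}")
        if part:
            result.extend(repartition(part, capacity, level + 1))
    return result
```

With 1000 distinct keys and a capacity of 250 rows the split succeeds; with 1000 copies of one key plus 100 other keys and a capacity of 100 rows it raises, mirroring the failure described in the issue.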

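For the null-aware anti join case, the sketch below spells out the SQL semantics of `p NOT IN (...)` under three-valued logic, with `None` standing for NULL. It is a naive reference evaluation, not Impala's algorithm; the function name and the `other_conjunct` parameter (standing in for any extra join predicate) are hypothetical. It illustrates why NULL build keys are awkward: a NULL key cannot be matched by hashing, and when an extra predicate is present, whether a NULL build row disqualifies a probe row depends on that predicate, so in the worst case each NULL build row is checked against every probe row.

```python
from typing import Callable, List, Optional, Sequence


def null_aware_anti_join(
    probe: Sequence[Optional[int]],
    build: Sequence[Optional[int]],
    other_conjunct: Callable[[Optional[int], Optional[int]], bool] = lambda p, b: True,
) -> List[Optional[int]]:
    """Hypothetical reference evaluation: return the probe keys p for which
    `p NOT IN (b for b in build if other_conjunct(p, b))` is TRUE.
    None stands for SQL NULL. Runs in O(len(probe) * len(build)) time."""
    result = []
    for p in probe:
        # Only build rows that pass the extra (possibly correlated) predicate count.
        candidates = [b for b in build if other_conjunct(p, b)]
        if not candidates:
            result.append(p)  # NOT IN over an empty set is TRUE, even for a NULL p
            continue
        if p is None or any(b is None for b in candidates):
            continue  # a comparison with NULL is UNKNOWN, so the row is filtered out
        if p not in candidates:
            result.append(p)  # no match and no NULLs: TRUE
    return result
```

For example, a single NULL on the build side empties the result of `null_aware_anti_join([1, 2, 3], [2, None])`, while an extra predicate that excludes the NULL row for some probe rows lets those rows through again.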

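The quadratic fallback mentioned at the end of the issue can be sketched as a block nested-loop join over a build partition that repartitioning could not split. This is an illustrative technique, not necessarily what Impala would implement; `block_nested_loop_join`, `probe_scan` and `block_size` are hypothetical names. Each block of build rows that fits in memory costs one full pass over the probe side, so a partition of B rows processed in blocks of M rows needs ceil(B / M) probe passes and B * P key comparisons for a probe side of P rows.

```python
from typing import Callable, Iterable, Iterator, List, Tuple


def block_nested_loop_join(
    build: List[Tuple[int, str]],
    probe_scan: Callable[[], Iterable[Tuple[int, str]]],
    block_size: int,
) -> Iterator[Tuple[str, str]]:
    """Hypothetical fallback for a build partition that cannot be split further.
    Holds `block_size` build rows in memory at a time and calls `probe_scan`
    to rescan the whole probe side once per block."""
    for start in range(0, len(build), block_size):
        block = build[start:start + block_size]  # the part assumed to fit in memory
        for p_key, p_val in probe_scan():        # one full probe pass per block
            for b_key, b_val in block:           # pairwise key comparison
                if b_key == p_key:
                    yield (b_val, p_val)
```

For example, 5 build rows that all share one key, processed in blocks of 2, need 3 probe passes; the number of passes grows linearly with the size of the duplicate-key partition, which is why such queries would complete only slowly.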
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
