Posted to issues@impala.apache.org by "Aman Sinha (Jira)" <ji...@apache.org> on 2020/12/23 03:39:00 UTC

[jira] [Closed] (IMPALA-1073) Consider extending the hints for insert

     [ https://issues.apache.org/jira/browse/IMPALA-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aman Sinha closed IMPALA-1073.
------------------------------
    Resolution: Fixed

A clustering hint was added for INSERT, and clustering is now enabled by default for HDFS tables:

{noformat}
/* +CLUSTERED */ and /* +NOCLUSTERED */ Hints
/* +CLUSTERED */ sorts data by the partition columns before inserting to ensure that only
one partition is written at a time per node. Use this hint to reduce the number of files kept
open and the number of buffers kept in memory simultaneously. This technique is primarily
useful for inserts into Parquet tables, where the large block size requires substantial memory
to buffer data for multiple output files at once. This hint is available in CDH 5.10 / Impala 2.8
or higher.
Starting in CDH 6.0 / Impala 3.0, /* +CLUSTERED */ is the default behavior for HDFS tables.
{noformat}
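
For illustration, a minimal sketch of how the hints quoted above might be written in an INSERT statement. The table names (sales_staging, sales_parquet) and their columns are hypothetical and do not come from this issue; the hint is placed just before the SELECT keyword, which is one of the positions Impala accepts for INSERT hints.

{noformat}
-- Hypothetical tables: sales_staging is the source; sales_parquet is a
-- Parquet table partitioned by (year, month).

-- Sort rows by the partition columns so that each node writes one
-- partition at a time (Impala 2.8 or higher).
INSERT INTO sales_parquet PARTITION (year, month) /* +CLUSTERED */
SELECT id, amount, year, month FROM sales_staging;

-- On Impala 3.0 or higher, where clustering is the default for HDFS
-- tables, request the unsorted behavior explicitly.
INSERT INTO sales_parquet PARTITION (year, month) /* +NOCLUSTERED */
SELECT id, amount, year, month FROM sales_staging;
{noformat}

Whether the unsorted variant is preferable likely depends on how many partitions each node writes and how much memory is available for buffering open files.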

As such, I am marking this fixed.

> Consider extending the hints for insert
> ---------------------------------------
>
>                 Key: IMPALA-1073
>                 URL: https://issues.apache.org/jira/browse/IMPALA-1073
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Frontend
>    Affects Versions: Impala 1.3.1
>            Reporter: Nong Li
>            Priority: Minor
>
> We currently have the SHUFFLE/NOSHUFFLE hints for insert, but they seem to be insufficient and still lead to some pain. We should consider adding the RANDOM and UNPARTITIONED strategies as well, to give better control over the resulting number of files and file sizes. Whether these are added as hints, query options, or improvements to the plan is open.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)