You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "karthik (Jira)" <ji...@apache.org> on 2021/01/17 14:08:00 UTC

[jira] [Commented] (IMPALA-10431) Consider partitioning instead of sorting before inserts

    [ https://issues.apache.org/jira/browse/IMPALA-10431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266786#comment-17266786 ] 

karthik commented on IMPALA-10431:
----------------------------------

[~karthikkp] test

> Consider partitioning instead of sorting before inserts
> -------------------------------------------------------
>
>                 Key: IMPALA-10431
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10431
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend, Frontend
>            Reporter: Csaba Ringhofer
>            Priority: Minor
>
> Currently before partitioned inserts Impala can add a sort node to sort by the partitioning columns - the benefit of this is that the insert will only have to write one file at a time, which reduces memory requirements if the number of partitions is large. The drawback is the extra O(n*log n) CPU cost and the potential spilling.
> As the order we write the partitions in doesn't matter, partitioning the rows in a spilling  hash table would give the same benefits but reduce the O(n*log) n CPU cost to O(n). It may be even possible to avoid spilling in some cases - e.g. if only a single partition is written, and the partitioner passes through the first partition it sees to the file sink.
> The drawback is that unlike sort, we don't have  a partitioner node in Impala yet, but the algorithm is very similar to what's happening in grouping aggregation or hash join builders, so probably this could be done with medium amount of code.
> Note that we can also sort by other columns than partitioning columns during insert to make min-max stat filtering efficient - in this case we should probably stick to the sorting before insert.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org