Posted to issues-all@impala.apache.org by "Gabor Kaszab (Jira)" <ji...@apache.org> on 2022/01/13 12:50:00 UTC

[jira] [Resolved] (IMPALA-10919) Fine tune row batch handling to support arrays with millions of items

     [ https://issues.apache.org/jira/browse/IMPALA-10919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gabor Kaszab resolved IMPALA-10919.
-----------------------------------
    Resolution: Won't Fix

> Fine tune row batch handling to support arrays with millions of items
> ---------------------------------------------------------------------
>
>                 Key: IMPALA-10919
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10919
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Gabor Kaszab
>            Priority: Major
>              Labels: complextype
>
> Currently, row batch serialization enforces an upper limit of int32::max() on the serialized row batch size:
> https://github.com/apache/impala/blob/b67c0906f596ca336d0ea0e8cbc618a20ac0e563/be/src/runtime/row-batch.cc#L348
> As a result, if we query an array of millions of items in the select list, it won't fit into a serialized row batch even if it stores integers. With the current implementation, and given that the default number of rows in a row batch is 1024, the limit allows approximately 560k integers in each row (560k * 4 bytes * 1024 rows = ~int32::max()).
> There is a workaround: reducing the number of rows in a row batch with the BATCH_SIZE query option. That seems like overkill, though, because it reduces the batch size for all the nodes in the query, while after the arrays have been unnested it is most probably safe to use the default batch size.
> Just an idea, but it would be nice to dynamically adjust the row batch size when big arrays are queried so that they fit in a serialized row batch. Once the unnesting has happened (and there are no longer arrays in the row batch), the default size could be used again.
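
The arithmetic in the description can be checked with a short back-of-envelope sketch. It is not Impala code: the function names are hypothetical, and it assumes 4-byte integers, arrays of equal length in every row, and no per-row or per-array overhead in the serialized batch. Under those assumptions the per-row bound comes out at roughly half a million integers, the same order of magnitude as the figure quoted in the issue. The second function estimates the largest BATCH_SIZE that would still fit for a given array length, which is the kind of value a dynamic adjustment would have to pick.

```python
# Back-of-envelope model of the serialized row batch size limit.
# Assumptions (not taken from Impala's source): payload is only the array
# items, every row holds an array of the same length, no extra overhead.

INT32_MAX = 2**31 - 1        # size limit cited in the issue
DEFAULT_BATCH_SIZE = 1024    # default rows per row batch, per the issue


def max_items_per_row(batch_size: int = DEFAULT_BATCH_SIZE,
                      item_bytes: int = 4) -> int:
    """Largest array length per row that keeps the whole batch within INT32_MAX."""
    if batch_size <= 0 or item_bytes <= 0:
        raise ValueError("batch_size and item_bytes must be positive")
    return INT32_MAX // (batch_size * item_bytes)


def max_batch_size(items_per_row: int,
                   item_bytes: int = 4,
                   default: int = DEFAULT_BATCH_SIZE) -> int:
    """Largest row count, capped at the default, that fits within INT32_MAX."""
    if items_per_row <= 0 or item_bytes <= 0:
        raise ValueError("items_per_row and item_bytes must be positive")
    row_bytes = items_per_row * item_bytes
    if row_bytes > INT32_MAX:
        raise ValueError("a single row already exceeds the limit")
    return min(default, INT32_MAX // row_bytes)


if __name__ == "__main__":
    print(max_items_per_row())        # items per row at the default batch size
    print(max_batch_size(2_000_000))  # rows per batch for 2M-item int arrays
```

In this simplified model an array of 2,000,000 integers per row would need a batch size of at most 268 rows, while rows with small arrays could keep the default of 1024.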



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
