You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Gabor Kaszab (Jira)" <ji...@apache.org> on 2021/09/17 07:42:00 UTC

[jira] [Created] (IMPALA-10919) Fine tune row batch handling to support arrays with millions of items

Gabor Kaszab created IMPALA-10919:
-------------------------------------

             Summary: Fine tune row batch handling to support arrays with millions of items
                 Key: IMPALA-10919
                 URL: https://issues.apache.org/jira/browse/IMPALA-10919
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend
            Reporter: Gabor Kaszab


Currently, during row batch serialization there is an upper limit for the row batch size of int32::max():
https://github.com/apache/impala/blob/b67c0906f596ca336d0ea0e8cbc618a20ac0e563/be/src/runtime/row-batch.cc#L348

As a result if we query an array of millions of items in the select list it won't fit into a serialized row batch even if it stores integers. With the current implementation, taking into account that the default number of rows in a row batch is 1024 then the limit now allow approximately 560k integers in each row. (560k * 4byte * 1024 rows = ~int32::max() )

There is a workaround to reduce the number of rows in a row batch with the BATCH_SIZE query option but that seems an overkill as it reduces the batch size for all the nodes in the query but most probably after the arrays are being unnested it is safe to use the default batch size.

Just an idea but it would be nice to dynamically adjust the row batch size when big arrays are queried so that they can fit in a serialized row bacth, but when the unnesting happened (and there are no longer arrays in the row batch) the default size could be used.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org