Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2023/05/16 10:07:00 UTC
[jira] [Commented] (IMPALA-12142) Default fetch_size of 10240 is suboptimal

    [ https://issues.apache.org/jira/browse/IMPALA-12142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17723081#comment-17723081 ] 

ASF subversion and git services commented on IMPALA-12142:
----------------------------------------------------------

Commit 3dfebca9b16d7cf4ced40f7efac5d05ac5fe51d9 in impala's branch refs/heads/master from Csaba Ringhofer
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=3dfebca9b ]

IMPALA-12138: Optimize HS2 result vector allocations

Before this patch, the reservation sizes were based on the number of
rows in each RowBatch. Because batch_size has a lower default than
fetch_size (1024 vs 10240), one fetch is served by multiple row
batches, so the result vectors were reserved in more than one step.

This patch changes the logic to:
- reserve during the first fetch the old way
- reserve fetch_size in subsequent fetches
This means that queries with small result sets should not regress,
while in large ones only the first and the last fetches will be
suboptimal.
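
The effect of the two rules above can be illustrated with a small
counting model. This is only a sketch, not Impala's implementation: the
function name, the tuple it returns, and the assumption that each row
batch triggers exactly one reservation step in the old scheme are all
hypothetical simplifications.

```python
def count_reserve_steps(total_rows, fetch_size, batch_size, patched):
    """Toy model of result-vector reservations (hypothetical, not Impala code).

    Returns (reserve_steps, over_reserved_rows).
    Old way: one reservation step per row batch that serves a fetch.
    Patched: the first fetch uses the old way; every later fetch
    reserves fetch_size rows in a single step.
    """
    steps = 0
    over_reserved = 0
    remaining = total_rows
    first_fetch = True
    while remaining > 0:
        rows = min(fetch_size, remaining)
        if patched and not first_fetch:
            steps += 1
            # A final, partially filled fetch reserves more than it needs.
            over_reserved += fetch_size - rows
        else:
            # Ceiling division: number of row batches needed for this fetch.
            steps += -(-rows // batch_size)
        remaining -= rows
        first_fetch = False
    return steps, over_reserved


# Defaults from the commit message: batch_size=1024, fetch_size=10240.
old = count_reserve_steps(25000, 10240, 1024, patched=False)
new = count_reserve_steps(25000, 10240, 1024, patched=True)
small = count_reserve_steps(500, 10240, 1024, patched=True)
```

In this model a 25000-row result needs 25 reservation steps the old way
and 12 with the patch, at the cost of over-reserving in the last fetch,
while a 500-row result behaves identically in both schemes.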

Also noticed that the current default fetch_size=10240 in impala-shell
is not optimal for RowMaterializationTimer, probably because it is
not a power of 2 and leads to overallocation.
Created IMPALA-12142 for the potential default fetch_size change.
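
The power-of-2 guess above can be made concrete with a little
arithmetic. The sketch below assumes, as the hypothesis does, that a
buffer's capacity is rounded up to the next power of 2; whether the HS2
result vectors actually behave this way is not confirmed here, and the
helper names are hypothetical.

```python
def next_power_of_two(n):
    """Smallest power of 2 that is >= n (for n >= 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1 << (n - 1).bit_length()


def unused_capacity(requested_rows):
    """Rows of capacity left unused if capacity is rounded up to a power of 2.

    The rounding behaviour is an assumption taken from the hypothesis
    in the text, not a verified property of the allocator.
    """
    return next_power_of_two(requested_rows) - requested_rows


for fetch_size in (10240, 8192):
    capacity = next_power_of_two(fetch_size)
    print(fetch_size, capacity, unused_capacity(fetch_size))
```

Under that assumption, fetch_size=10240 would be rounded up to 16384,
leaving 6144 rows of capacity unused, while fetch_size=8192 is already
a power of 2 and fits exactly.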

Tested with select * from tpch_parquet.lineitem;
RowMaterializationTimer decreased by around 10-20%:
fetch_size=10240: 3.6s -> 3.2s
fetch_size=8192: 2.8s -> 2.6s

Change-Id: I7b0e6a0a8fd028e3c0e4f1f4e272a50d2bfb59ba
Reviewed-on: http://gerrit.cloudera.org:8080/19879
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Default fetch_size of 10240 is suboptimal
> -----------------------------------------
>
>                 Key: IMPALA-12142
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12142
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Clients
>            Reporter: Csaba Ringhofer
>            Priority: Major
>              Labels: performance
>
> While working on IMPALA-12138 it turned out that the default settings of batch_size=1024 and fetch_size=10240 are not ideal for the coordinator (RowMaterializationTimer). My guess for the cause is that HS2 result vectors are rounded up to power-of-2 sizes, leading to extra allocations and copying.
> query: select * from tpch_parquet.lineitem
> RowMaterializationTimer (before and after IMPALA-12138)
> fetch_size=10240: 3.6s -> 3.2s
> fetch_size=8192: 2.8s -> 2.6s


