Posted to issues-all@impala.apache.org by "Wenzhe Zhou (Jira)" <ji...@apache.org> on 2023/11/14 04:54:00 UTC

[jira] [Resolved] (IMPALA-12377) Improve count star performance for external data source

     [ https://issues.apache.org/jira/browse/IMPALA-12377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenzhe Zhou resolved IMPALA-12377.
----------------------------------
    Fix Version/s: Impala 4.4.0
       Resolution: Fixed

> Improve count star performance for external data source
> -------------------------------------------------------
>
>                 Key: IMPALA-12377
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12377
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend, Frontend
>            Reporter: Wenzhe Zhou
>            Assignee: Wenzhe Zhou
>            Priority: Major
>             Fix For: Impala 4.4.0
>
>
> The code that handles count(*) queries in the backend function DataSourceScanNode::GetNext() is not efficient. Even when no column data is returned from the external data source, it still tries to materialize rows and adds them to the RowBatch one by one, up to the row count. It also calls GetNextInputBatch() multiple times (count / batch_size), and each GetNextInputBatch() call invokes a JNI function.
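
The cost pattern described above can be illustrated with a small, self-contained C++ sketch. This is not Impala code and not the actual fix for IMPALA-12377, which this message does not describe. FakeSource, FetchBatch(), FetchCount(), CountRowByRow() and CountDirect() are hypothetical names invented for illustration. The fetch_calls counter stands in for JNI round trips, and rows_touched stands in for per-row materialization work.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an external data source; NOT Impala's real API.
struct FakeSource {
  int64_t remaining;    // rows the source still has to report
  int64_t batch_size;   // maximum rows returned per fetch
  int64_t fetch_calls;  // number of simulated JNI round trips so far

  FakeSource(int64_t rows, int64_t batch)
      : remaining(rows), batch_size(batch), fetch_calls(0) {}

  // Returns the size of the next batch (0 once the source is exhausted).
  int64_t FetchBatch() {
    ++fetch_calls;
    int64_t n = std::min(remaining, batch_size);
    remaining -= n;
    return n;
  }

  // Assumed fast path: the source reports its whole row count in one call.
  int64_t FetchCount() {
    ++fetch_calls;
    int64_t n = remaining;
    remaining = 0;
    return n;
  }
};

// Pattern described in the issue: one fetch per batch, plus per-row work
// even though no column data is needed to answer count(*).
int64_t CountRowByRow(FakeSource& src, int64_t* rows_touched) {
  int64_t total = 0;
  while (src.remaining > 0) {
    int64_t n = src.FetchBatch();
    for (int64_t i = 0; i < n; ++i) {
      ++*rows_touched;  // stands in for materializing an empty row
      ++total;
    }
  }
  return total;
}

// One possible alternative (an assumption, not the committed change):
// take the count directly, with a single fetch and no per-row loop.
int64_t CountDirect(FakeSource& src) {
  return src.FetchCount();
}
```

With 10,000 rows and a batch size of 1,024, the row-by-row version makes 10 fetch calls and touches every row, while the direct version makes a single call and touches none.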



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org