You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Qizhu Chan (Jira)" <ji...@apache.org> on 2022/11/04 06:15:00 UTC

[jira] [Commented] (IMPALA-11701) Slow query problem about querying iceberg table by impala

    [ https://issues.apache.org/jira/browse/IMPALA-11701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17628693#comment-17628693 ] 

Qizhu Chan commented on IMPALA-11701:
-------------------------------------

Oh, I found an issue[IMPALA-11171|https://issues.apache.org/jira/browse/IMPALA-11171] that seems to be related to my issue ?

> Slow query problem about querying iceberg table by impala
> ---------------------------------------------------------
>
>                 Key: IMPALA-11701
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11701
>             Project: IMPALA
>          Issue Type: Question
>            Reporter: Qizhu Chan
>            Priority: Major
>              Labels: impala-iceberg
>         Attachments: image-2022-11-03-17-37-14-712.png, profile_cf446a1ab3a5e852_1b1005de00000000.txt
>
>
> I use impala to query iceberg table, but the query efficiency is not ideal, compared with querying the hive format table of the same data, the time-consuming increase is dozens of times.
> The sql statement used is a very simple statistical query, be like :
> select count(*)  from `db_name`.tbl_name where datekey='20221001' and event='xxx'
> ('datekey' and 'event' are the partition fields)
> My personal feeling is that impala might fetch iceberg's metadata stats and return results very quickly, but it doesn't.
> The catalog of iceberg table is of the hadoop type, and Impala can access it by creating an external table in hive. By the way,  iceberg table will perform snapshot expiration and data compaction on a daily basis, so there should be no small file problems.
> I found this warning using the explain statement:
> {code:java}
> | WARNING: The following tables are missing relevant table and/or column statistics. |
> | iceberg.gamebox_event_iceberg
> {code}
> Query: SHOW TABLE STATS `iceberg`.gamebox_event_iceberg
> +-------+--------+--------+--------------+-------------------+---------+-------------------+-----------------------------------------------------------------+
> | #Rows | #Files | Size   | Bytes Cached | Cache Replication | Format  | Incremental stats | Location                                                        |
> +-------+--------+--------+--------------+-------------------+---------+-------------------+-----------------------------------------------------------------+
> | 0     | 590509 | 1.91TB | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs:///hive/warehouse/iceberg/gamebox_event_iceberg |
> +-------+--------+--------+--------------+-------------------+---------+-------------------+-----------------------------------------------------------------+
> It seems like Impala is not syncing iceberg's table and column statistics. I'm not sure if this has anything to do with slow queries.
> As shown in the screenshot, i think the query time is mainly on planning and execution backends , but I don't know what is the reason for these two time consuming.
> Attachment is the complete profile for this query.
> How do I speed up the query? Can someone help with my question?plz.....
>  !image-2022-11-03-17-37-14-712.png! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org