Posted to issues@hive.apache.org by "Stamatis Zampetakis (Jira)" <ji...@apache.org> on 2021/09/28 08:30:00 UTC

[jira] [Commented] (HIVE-25557) Hive 3.1.2 with Tez is slow to clount data in parquet format

    [ https://issues.apache.org/jira/browse/HIVE-25557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17421247#comment-17421247 ] 

Stamatis Zampetakis commented on HIVE-25557:
--------------------------------------------

I am not sure I understand whether the problem is in Tez, in Parquet, or in the combination of the two. Is the COUNT query fast with MR and Parquet? Is the COUNT query fast with Tez and another format, e.g., ORC?

Please also include the plans ({{EXPLAIN}}) for the queries you are testing.
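The comparison asked for above could be run with HiveQL along the following lines. This is only a sketch. The table names {{t_parquet}} and {{t_orc}} are hypothetical, and both tables are assumed to hold the same rows, stored as Parquet and ORC respectively. {{hive.execution.engine}} is the Hive setting that selects the execution engine.

```sql
-- Hypothetical tables: t_parquet (STORED AS PARQUET) and t_orc (STORED AS ORC)
-- are assumed to contain identical data.

-- 1. Parquet on MapReduce
SET hive.execution.engine=mr;
EXPLAIN SELECT COUNT(*) FROM t_parquet;
SELECT COUNT(*) FROM t_parquet;

-- 2. Parquet on Tez
SET hive.execution.engine=tez;
EXPLAIN SELECT COUNT(*) FROM t_parquet;
SELECT COUNT(*) FROM t_parquet;

-- 3. ORC on Tez (engine is still tez from step 2)
EXPLAIN SELECT COUNT(*) FROM t_orc;
SELECT COUNT(*) FROM t_orc;
```

Compare the run times with the {{EXPLAIN}} output of each case. Together they should show whether the slowdown follows the engine, the file format, or only the Tez + Parquet combination.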

> Hive 3.1.2 with Tez is slow to clount data in parquet format
> ------------------------------------------------------------
>
>                 Key: HIVE-25557
>                 URL: https://issues.apache.org/jira/browse/HIVE-25557
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 3.1.2
>         Environment: Tez *0.10.1*
>            Reporter: katty he
>            Priority: Major
>
> Recently, I tested a query like {{select count(*) from table}} in Hive 3.1.2 with Tez, on a table stored in Parquet format. Normally, when counting, the query engine can read metadata instead of reading the full data. In my case, however, Tez cannot get the count from metadata alone. It reads the data, so the query is slow. Counting 2 billion rows takes Tez 500 seconds, and it spends 60 seconds on initialization. Is that a problem?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)