Posted to issues-all@impala.apache.org by "Tim Armstrong (Jira)" <ji...@apache.org> on 2020/06/19 15:28:00 UTC

[jira] [Updated] (IMPALA-2017) Lazy materialization of Parquet columns during query

     [ https://issues.apache.org/jira/browse/IMPALA-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong updated IMPALA-2017:
----------------------------------
    Summary: Lazy materialization of Parquet columns during query  (was: Lazy materialization of columns during query)

> Lazy materialization of Parquet columns during query
> ----------------------------------------------------
>
>                 Key: IMPALA-2017
>                 URL: https://issues.apache.org/jira/browse/IMPALA-2017
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 1.4, Impala 2.0, Impala 2.1, Impala 2.2
>            Reporter: Lou Bershad
>            Priority: Minor
>              Labels: parquet, performance
>
> When I run a query that returns a single row from a 4-billion-row table, it takes ~30 seconds with 'select * ...' but only ~3 seconds with 'select field1, field2 ...'.  This is repeatable.
> Given these times, it appears that the 'select *' query materializes every field of every row, whether or not the row matches the predicate.
> Materializing columns lazily, only for the rows that actually need them, could improve performance.
>  
> These four queries were run back to back.  The actual returned data is elided (sorry).  The table has 35 fields.
> {noformat}
> 0: jdbc:hive2://atl1c1r2data09.vldb-bo.secure> select * from events where event_id=1416403791; 
> <elided>
> 1 row selected (33.777 seconds)
> 0: jdbc:hive2://atl1c1r2data09.vldb-bo.secure> select event_id, client_id from events where event_id=1416403791;
> +-------------+------------+--+
> | event_id | client_id |
> +-------------+------------+--+
> | 1416403791 | <elided> |
> +-------------+------------+--+
> 1 row selected (3.363 seconds)
> 0: jdbc:hive2://atl1c1r2data09.vldb-bo.secure> select * from events where event_id=1416403791; 
> <elided>
> 1 row selected (33.138 seconds)
> 0: jdbc:hive2://atl1c1r2data09.vldb-bo.secure> select event_id, client_id from events where event_id=1416403791;
> +-------------+------------+--+
> | event_id | client_id |
> +-------------+------------+--+
> | 1416403791 | <elided> |
> +-------------+------------+--+
> 1 row selected (3.074 seconds)
> 0: jdbc:hive2://atl1c1r2data09.vldb-bo.secure>
> {noformat}
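
To illustrate the request, here is a minimal sketch of eager versus lazy materialization over a column-oriented table. It is not Impala's scanner code: the in-memory dict of lists is a stand-in for Parquet column chunks, and the names eager_scan, lazy_scan, make_table, and cells_read are hypothetical, introduced only for this example. The cell counter is a rough proxy for decoding work.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a columnar file: column name -> list of values.
Columns = Dict[str, List[object]]
Row = Dict[str, object]


def eager_scan(columns: Columns, filter_col: str,
               predicate: Callable[[object], bool]) -> Tuple[List[Row], int]:
    """Materialize every column of every row, then apply the filter."""
    names = list(columns)
    cells_read = 0
    out: List[Row] = []
    for i in range(len(columns[filter_col])):
        row = {name: columns[name][i] for name in names}
        cells_read += len(names)  # every column is touched for every row
        if predicate(row[filter_col]):
            out.append(row)
    return out, cells_read


def lazy_scan(columns: Columns, filter_col: str,
              predicate: Callable[[object], bool]) -> Tuple[List[Row], int]:
    """Read the filter column first; materialize other columns only for matches."""
    filter_values = columns[filter_col]
    cells_read = len(filter_values)  # the filter column is read in full
    matches = [i for i, v in enumerate(filter_values) if predicate(v)]
    others = [name for name in columns if name != filter_col]
    out: List[Row] = []
    for i in matches:
        row: Row = {filter_col: filter_values[i]}
        for name in others:
            row[name] = columns[name][i]
            cells_read += 1  # remaining columns are touched only for matching rows
        out.append(row)
    return out, cells_read


def make_table(num_rows: int, num_cols: int) -> Columns:
    """Build a toy table with an integer event_id column plus filler columns."""
    table: Columns = {"event_id": list(range(num_rows))}
    for c in range(1, num_cols):
        table[f"col{c}"] = [f"v{c}_{i}" for i in range(num_rows)]
    return table


if __name__ == "__main__":
    table = make_table(num_rows=1000, num_cols=35)
    wanted = lambda v: v == 42
    eager_rows, eager_cells = eager_scan(table, "event_id", wanted)
    lazy_rows, lazy_cells = lazy_scan(table, "event_id", wanted)
    print(len(eager_rows), eager_cells)
    print(len(lazy_rows), lazy_cells)
```

Both functions return the same row; only the number of cells touched differs. A real Parquet reader decodes values page by page, often with compression and encoding applied per page, so the achievable savings depend on how many pages can be skipped entirely rather than on a per-cell count like the one used here.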
