Posted to issues-all@impala.apache.org by "Tim Armstrong (Jira)" <ji...@apache.org> on 2020/06/19 15:28:00 UTC
[jira] [Updated] (IMPALA-2017) Lazy materialization of Parquet columns during query
[ https://issues.apache.org/jira/browse/IMPALA-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Armstrong updated IMPALA-2017:
----------------------------------
Summary: Lazy materialization of Parquet columns during query (was: Lazy materialization of columns during query)
> Lazy materialization of Parquet columns during query
> ----------------------------------------------------
>
> Key: IMPALA-2017
> URL: https://issues.apache.org/jira/browse/IMPALA-2017
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Affects Versions: Impala 1.4, Impala 2.0, Impala 2.1, Impala 2.2
> Reporter: Lou Bershad
> Priority: Minor
> Labels: parquet, performance
>
> When I run a query over a 4 billion row table that returns a single row, it takes ~30 seconds if I do 'select * ...'. It takes only ~3 seconds if I do 'select field1, field2 ...'. This is repeatable.
> Given these times, it would seem that the 'select *' query is materializing all the fields of every row, whether or not the row matches the predicate.
> Lazy materialization of columns when they are needed could improve performance.
>
> These four queries were run back to back. The actual returned data is elided (sorry). The table has 35 fields.
> {noformat}
> 0: jdbc:hive2://atl1c1r2data09.vldb-bo.secure> select * from events where event_id=1416403791;
> <elided>
> 1 row selected (33.777 seconds)
> 0: jdbc:hive2://atl1c1r2data09.vldb-bo.secure> select event_id, client_id from events where event_id=1416403791;
> +-------------+------------+--+
> | event_id | client_id |
> +-------------+------------+--+
> | 1416403791 | <elided> |
> +-------------+------------+--+
> 1 row selected (3.363 seconds)
> 0: jdbc:hive2://atl1c1r2data09.vldb-bo.secure> select * from events where event_id=1416403791;
> <elided>
> 1 row selected (33.138 seconds)
> 0: jdbc:hive2://atl1c1r2data09.vldb-bo.secure> select event_id, client_id from events where event_id=1416403791;
> +-------------+------------+--+
> | event_id | client_id |
> +-------------+------------+--+
> | 1416403791 | <elided> |
> +-------------+------------+--+
> 1 row selected (3.074 seconds)
> 0: jdbc:hive2://atl1c1r2data09.vldb-bo.secure>
> {noformat}
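
To make the request concrete, here is a minimal sketch of the difference between eager and lazy column materialization. It is a toy model in Python, not Impala's Parquet scanner: the table is an in-memory dict of column lists, and eager_scan, lazy_scan, and the cells_read counter are hypothetical names introduced only for illustration. In a real scanner the cost avoided would be decompressing and decoding column data, which this sketch approximates by counting cell reads.

```python
from typing import Callable, Dict, List, Tuple

# Toy columnar table: column name -> list of values (one entry per row).
Table = Dict[str, List[object]]
Row = Dict[str, object]


def eager_scan(table: Table, filter_col: str,
               pred: Callable[[object], bool]) -> Tuple[List[Row], int]:
    """Materialize every column of every row, then apply the predicate."""
    num_rows = len(table[filter_col])
    cells_read = 0
    out: List[Row] = []
    for i in range(num_rows):
        row: Row = {}
        for col, values in table.items():
            row[col] = values[i]  # read happens even if the row is discarded
            cells_read += 1
        if pred(row[filter_col]):
            out.append(row)
    return out, cells_read


def lazy_scan(table: Table, filter_col: str,
              pred: Callable[[object], bool]) -> Tuple[List[Row], int]:
    """Scan only the predicate column; read other columns for matches only."""
    cells_read = 0
    matching: List[int] = []
    for i, value in enumerate(table[filter_col]):
        cells_read += 1
        if pred(value):
            matching.append(i)

    out: List[Row] = []
    for i in matching:
        row: Row = {filter_col: table[filter_col][i]}
        for col, values in table.items():
            if col != filter_col:
                row[col] = values[i]  # deferred read, matching rows only
                cells_read += 1
        out.append(row)
    return out, cells_read
```

For a query shaped like the ones above (a wide table, an equality predicate on one column, one matching row), the lazy variant reads roughly one column's worth of cells plus one row, while the eager variant reads every cell in the table. Both return the same rows.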
--
This message was sent by Atlassian Jira
(v8.3.4#803005)