Posted to issues@hive.apache.org by "Ashish Sharma (JIRA)" <ji...@apache.org> on 2018/06/13 09:19:00 UTC
[jira] [Updated] (HIVE-19103) Nested structure Projection Push Down in Hive with ORC
[ https://issues.apache.org/jira/browse/HIVE-19103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ashish Sharma updated HIVE-19103:
---------------------------------
Status: Open (was: Patch Available)
> Nested structure Projection Push Down in Hive with ORC
> ------------------------------------------------------
>
> Key: HIVE-19103
> URL: https://issues.apache.org/jira/browse/HIVE-19103
> Project: Hive
> Issue Type: Improvement
> Components: Hive, ORC
> Reporter: Ashish Sharma
> Assignee: Ashish Sharma
> Priority: Critical
> Labels: pull-request-available
>
> Read only the required columns when the schema contains nested structures.
> Example -
> *Current state* -
> Schema - struct<a:int, b:bigint,c:struct<d:int,e:struct<f:int>,g:string>>
> Query - select c.e.f from t where c.e.f > 10;
> Current state - the entire c struct is read from the file and then filtered, because the reader refers to "hive.io.file.readcolumn.ids", which causes all child columns of c to be selected for reading from the file.
> Conf -
> _hive.io.file.readcolumn.ids = "2"
> hive.io.file.readNestedColumn.paths = "c.e.f"_
> Result -
> boolean[ ] include = [true,false,false,true,true,true,true,true]
> *Expected state* -
> Schema - struct<a:int, b:bigint,c:struct<d:int,e:struct<f:int>,g:string>>
> Query - select c.e.f from t where c.e.f > 10;
> Expected state - instead of reading the entire c struct from the file, read only the f column by referring to "hive.io.file.readNestedColumn.paths".
> Conf -
> _hive.io.file.readcolumn.ids = "2"
> hive.io.file.readNestedColumn.paths = "c.e.f"_
> Result -
> boolean[ ] include = [true,false,false,true,false,true,true,false]
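
The two include arrays above can be reproduced with a small standalone sketch. It is not the Hive or ORC implementation and uses no Hive or ORC classes. It assumes that column ids are assigned in pre-order (0 = root struct, 1 = a, 2 = b, 3 = c, 4 = d, 5 = e, 6 = f, 7 = g), which matches the arrays in the description. The class NestedIncludeSketch, the Node type and all method names are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: hypothetical names, no Hive/ORC dependencies.
final class NestedIncludeSketch {

    /** Hypothetical schema node; ids are assigned in pre-order (assumption). */
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        int id;

        Node(String name, Node... kids) {
            this.name = name;
            children.addAll(Arrays.asList(kids));
        }
    }

    /** Assigns pre-order ids starting at nextId and returns the next free id. */
    static int assignIds(Node node, int nextId) {
        node.id = nextId++;
        for (Node child : node.children) {
            nextId = assignIds(child, nextId);
        }
        return nextId;
    }

    /** Marks the node and all of its descendants as included. */
    static void includeSubtree(Node node, boolean[] include) {
        include[node.id] = true;
        for (Node child : node.children) {
            includeSubtree(child, include);
        }
    }

    /** "Current state": each selected top-level column pulls in its whole subtree. */
    static boolean[] includeByColumnIds(Node root, int size, int... columnIds) {
        boolean[] include = new boolean[size];
        include[root.id] = true;
        for (int columnId : columnIds) {
            includeSubtree(root.children.get(columnId), include);
        }
        return include;
    }

    /** "Expected state": only the nodes on each path, plus the subtree under the path's last element. */
    static boolean[] includeByNestedPaths(Node root, int size, String... paths) {
        boolean[] include = new boolean[size];
        include[root.id] = true;
        for (String path : paths) {
            Node current = root;
            for (String part : path.split("\\.")) {
                Node next = null;
                for (Node child : current.children) {
                    if (child.name.equals(part)) {
                        next = child;
                        break;
                    }
                }
                if (next == null) {
                    throw new IllegalArgumentException("Unknown path: " + path);
                }
                include[next.id] = true;
                current = next;
            }
            includeSubtree(current, include);
        }
        return include;
    }

    /** struct<a:int, b:bigint, c:struct<d:int, e:struct<f:int>, g:string>> from the description. */
    static Node exampleSchema() {
        return new Node("root",
                new Node("a"),
                new Node("b"),
                new Node("c",
                        new Node("d"),
                        new Node("e", new Node("f")),
                        new Node("g")));
    }

    public static void main(String[] args) {
        Node root = exampleSchema();
        int size = assignIds(root, 0);
        // hive.io.file.readcolumn.ids = "2"
        System.out.println(Arrays.toString(includeByColumnIds(root, size, 2)));
        // hive.io.file.readNestedColumn.paths = "c.e.f"
        System.out.println(Arrays.toString(includeByNestedPaths(root, size, "c.e.f")));
    }
}
```

With the schema from the description, the first call marks ids 0 and 3 through 7, and the second marks only ids 0, 3, 5 and 6, that is, the root plus the path c, e, f.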
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)