Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/06/16 16:55:02 UTC

[jira] [Work logged] (HIVE-12898) Hive should support ORC block skipping on nested fields

     [ https://issues.apache.org/jira/browse/HIVE-12898?focusedWorklogId=446697&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446697 ]

ASF GitHub Bot logged work on HIVE-12898:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 16/Jun/20 16:54
            Start Date: 16/Jun/20 16:54
    Worklog Time Spent: 10m 
      Work Description: github-actions[bot] commented on pull request #346:
URL: https://github.com/apache/hive/pull/346#issuecomment-644886598


   This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the dev@hive.apache.org list if the patch is in need of reviews.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 446697)
    Remaining Estimate: 0h
            Time Spent: 10m

> Hive should support ORC block skipping on nested fields
> -------------------------------------------------------
>
>                 Key: HIVE-12898
>                 URL: https://issues.apache.org/jira/browse/HIVE-12898
>             Project: Hive
>          Issue Type: Improvement
>          Components: ORC
>    Affects Versions: 0.14.0, 1.2.1
>            Reporter: Michael Haeusler
>            Assignee: Ashish Sharma
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hive supports predicate pushdown (block skipping) for ORC tables only on top-level fields. Hive should also support block skipping on nested fields (within structs).
> Example top-level: the following query selects 0 rows using a predicate on the top-level column foo. The summary shows 0 INPUT_RECORDS, so the block was skipped:
> {code:sql}
> SET hive.tez.exec.print.summary=true;
> CREATE TABLE t_toplevel STORED AS ORC AS SELECT 23 AS foo;
> SELECT * FROM t_toplevel WHERE foo=42 ORDER BY foo;
> [...]
> VERTICES         TOTAL_TASKS  FAILED_ATTEMPTS KILLED_TASKS DURATION_SECONDS    CPU_TIME_MILLIS     GC_TIME_MILLIS  INPUT_RECORDS   OUTPUT_RECORDS
> Map 1                      1                0            0             1.22              2,640                102              0                0
> {code}
> Example nested: the following query also selects 0 rows, but uses a predicate on the nested column foo.bar. Here the summary shows 1 INPUT_RECORDS, so the block was read instead of skipped:
> {code:sql}
> SET hive.tez.exec.print.summary=true;
> CREATE TABLE t_nested STORED AS ORC AS SELECT NAMED_STRUCT('bar', 23) AS foo;
> SELECT * FROM t_nested WHERE foo.bar=42 ORDER BY foo;
> [...]
> VERTICES         TOTAL_TASKS  FAILED_ATTEMPTS KILLED_TASKS DURATION_SECONDS    CPU_TIME_MILLIS     GC_TIME_MILLIS  INPUT_RECORDS   OUTPUT_RECORDS
> Map 1                      1                0            0             3.66              5,210                 68              1                0
> {code}
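
The difference between the two examples above can be illustrated with a small Python sketch of min/max block skipping. This is not Hive or ORC code: the ColumnStats class, the can_skip function, the resolve_nested flag, and the dotted column-path keys are all hypothetical names chosen for illustration. It assumes only that per-block min/max statistics are available for the nested field as well as for the top-level column.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnStats:
    """Hypothetical per-block min/max statistics for one column."""
    minimum: int
    maximum: int


def can_skip(stats_by_path, column_path, value, resolve_nested):
    """Return True if an equality predicate cannot match any row in the block.

    resolve_nested=False mimics the reported behaviour: a predicate on a
    nested path such as "foo.bar" is not pushed down, so the block is read.
    """
    if "." in column_path and not resolve_nested:
        return False
    stats = stats_by_path.get(column_path)
    if stats is None:
        # Without statistics the block must be read.
        return False
    # The value lies outside [minimum, maximum], so no row can match.
    return value < stats.minimum or value > stats.maximum


# One block per table, each holding the single value 23 from the examples.
toplevel_block = {"foo": ColumnStats(23, 23)}
nested_block = {"foo.bar": ColumnStats(23, 23)}

skip_toplevel = can_skip(toplevel_block, "foo", 42, resolve_nested=False)
skip_nested_today = can_skip(nested_block, "foo.bar", 42, resolve_nested=False)
skip_nested_wanted = can_skip(nested_block, "foo.bar", 42, resolve_nested=True)
```

In this sketch skip_toplevel and skip_nested_wanted are True, while skip_nested_today is False, which corresponds to the 0 versus 1 INPUT_RECORDS seen in the two summaries.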



--
This message was sent by Atlassian Jira
(v8.3.4#803005)