Posted to dev@hive.apache.org by "Prasanth J (JIRA)" <ji...@apache.org> on 2013/11/01 20:58:18 UTC

[jira] [Commented] (HIVE-5632) Eliminate splits based on SARGs using stripe statistics in ORC

    [ https://issues.apache.org/jira/browse/HIVE-5632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811604#comment-13811604 ] 

Prasanth J commented on HIVE-5632:
----------------------------------

[~ehans] Sorry about my earlier comment about in-memory skips. For every 10,000 rows, or the configured number of rows (orc.row.index.stride), ORC creates disk ranges (byte ranges) that need to be read. Only the disk ranges that satisfy the min/max conditions will be read.
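
To illustrate the idea, here is a rough sketch in Python of selecting byte ranges by min/max statistics. It is not the ORC reader code: the RowGroup class, its fields, and the ranges_to_read function are hypothetical names made up for this example, and the merging of adjacent ranges is an assumption added for illustration. It also assumes the index entries are sorted by offset and that the predicate is a simple closed interval lo <= x <= hi.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """Hypothetical index entry for one stride of rows (not an ORC class)."""
    offset: int      # byte offset where the row group starts
    length: int      # number of bytes the row group occupies
    min_value: int   # smallest column value in the row group
    max_value: int   # largest column value in the row group


def ranges_to_read(row_groups, lo, hi):
    """Return (start, end) byte ranges for the row groups whose
    [min_value, max_value] interval overlaps the predicate lo <= x <= hi.
    Assumes row_groups is sorted by offset."""
    ranges = []
    for rg in row_groups:
        if rg.max_value < lo or rg.min_value > hi:
            continue  # no row in this group can match, so skip its bytes
        start, end = rg.offset, rg.offset + rg.length
        if ranges and ranges[-1][1] == start:
            # Assumption for this sketch: merge byte-adjacent ranges
            ranges[-1] = (ranges[-1][0], end)
        else:
            ranges.append((start, end))
    return ranges


groups = [
    RowGroup(offset=0, length=100, min_value=1, max_value=50),
    RowGroup(offset=100, length=100, min_value=51, max_value=120),
    RowGroup(offset=200, length=100, min_value=121, max_value=300),
    RowGroup(offset=300, length=100, min_value=301, max_value=400),
]

# Predicate: 100 <= x <= 150. Only the second and third groups can match.
selected = ranges_to_read(groups, 100, 150)
print(selected)
```

In this sketch the first and last row groups are never read, because their min/max intervals cannot contain a matching value.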

> Eliminate splits based on SARGs using stripe statistics in ORC
> --------------------------------------------------------------
>
>                 Key: HIVE-5632
>                 URL: https://issues.apache.org/jira/browse/HIVE-5632
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 0.13.0
>            Reporter: Prasanth J
>            Assignee: Prasanth J
>              Labels: orcfile
>         Attachments: HIVE-5632.1.patch.txt, HIVE-5632.2.patch.txt, HIVE-5632.3.patch.txt, orc_split_elim.orc
>
>
> HIVE-5562 provides stripe level statistics in ORC. Stripe level statistics combined with predicate pushdown in ORC (HIVE-4246) can be used to eliminate the stripes (and thereby the splits) that don't satisfy the predicate condition. This can greatly reduce unnecessary reads.



--
This message was sent by Atlassian JIRA
(v6.1#6144)