Posted to dev@hive.apache.org by "Owen O'Malley (JIRA)" <ji...@apache.org> on 2013/11/04 18:17:22 UTC

[jira] [Commented] (HIVE-5562) Provide stripe level column statistics in ORC

    [ https://issues.apache.org/jira/browse/HIVE-5562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813025#comment-13813025 ] 

Owen O'Malley commented on HIVE-5562:
-------------------------------------

You can't change the method names in Reader without breaking API compatibility. Please change them back.
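A common way to rename a method without breaking existing callers is to add the new name and keep the old one as a deprecated delegate. The sketch below uses a hypothetical `StatsReader` interface with made-up method names (`getStripeCount`, `getNumberOfStripes`). It does not reproduce ORC's actual `Reader` interface or the methods the patch renamed.

```java
// Hypothetical interface; the method names are illustrative only.
interface StatsReader {
    /** Original name: code already compiled against it keeps working. */
    @Deprecated
    int getStripeCount();

    /** New, preferred name added alongside the old one. */
    int getNumberOfStripes();
}

// The implementation keeps both names; the old one simply delegates.
class SimpleStatsReader implements StatsReader {
    private final int stripes;

    SimpleStatsReader(int stripes) {
        this.stripes = stripes;
    }

    @Override
    @Deprecated
    public int getStripeCount() {
        return getNumberOfStripes();
    }

    @Override
    public int getNumberOfStripes() {
        return stripes;
    }
}
```

This keeps existing callers compiling. Third-party classes that implement the interface would still have to add the new method, so adding methods to an interface is not free either.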

At some point, we should move the Footer's user metadata and file statistics to the metadata section. That would require a new version (0.13) of the ORC format, which I'm not sure we want to sign up for yet.

It would be better if the reader read the stripe statistics only when they are required, rather than eagerly when the file is opened.
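Deferring the read amounts to a memoized lazy load: decode the statistics on first request and cache them. A minimal sketch, assuming hypothetical `StripeStats` and `LazyStatsReader` classes and a `Supplier` that stands in for reading and decoding the file's metadata section; none of these are ORC's real classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for one stripe's statistics (min/max of a single column).
final class StripeStats {
    final long min;
    final long max;

    StripeStats(long min, long max) {
        this.min = min;
        this.max = max;
    }
}

// Hypothetical reader that does not touch stripe statistics when constructed.
final class LazyStatsReader {
    private final Supplier<List<StripeStats>> loader; // stands in for reading the metadata section
    private List<StripeStats> stripeStats;            // null until first requested
    private int loadCount = 0;                         // how many times the loader actually ran

    LazyStatsReader(Supplier<List<StripeStats>> loader) {
        this.loader = loader;
    }

    /** Runs the loader on the first call only, then returns the cached list. */
    synchronized List<StripeStats> getStripeStatistics() {
        if (stripeStats == null) {
            stripeStats = Collections.unmodifiableList(new ArrayList<>(loader.get()));
            loadCount++;
        }
        return stripeStats;
    }

    synchronized int loadCount() {
        return loadCount;
    }
}
```

Opening the file (constructing the reader) costs nothing extra. A query that never consults stripe statistics never pays for decoding them.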

Other than that, it looks good.

> Provide stripe level column statistics in ORC
> ---------------------------------------------
>
>                 Key: HIVE-5562
>                 URL: https://issues.apache.org/jira/browse/HIVE-5562
>             Project: Hive
>          Issue Type: New Feature
>          Components: File Formats
>    Affects Versions: 0.13.0
>            Reporter: Prasanth J
>            Assignee: Prasanth J
>              Labels: orcfile
>             Fix For: 0.13.0
>
>         Attachments: HIVE-5562.1.patch.txt
>
>
> ORC maintains two levels of column statistics: index statistics (for every row group) and file-level column statistics (for the entire file). It would be useful to also have stripe-level column statistics, which sit between the index and file statistics. The motivation is that the current input split computation is based on stripe boundaries. If stripe-level statistics are available and a stripe cannot satisfy a predicate, then that entire stripe (and hence its split) can be eliminated during split computation.
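The elimination step can be sketched as a min/max range check per stripe. This assumes a single integer column, an equality predicate `col = value`, and hypothetical `StripeRange`/`StripePruner` classes. ORC's real statistics types and Hive's split-generation code are more general (other types, other operators, missing statistics).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-stripe statistics for one integer column.
final class StripeRange {
    final long offset; // byte offset of the stripe in the file
    final long min;    // smallest column value in the stripe
    final long max;    // largest column value in the stripe

    StripeRange(long offset, long min, long max) {
        this.offset = offset;
        this.min = min;
        this.max = max;
    }
}

final class StripePruner {
    /**
     * Returns the offsets of stripes that might contain a row with col = value.
     * A stripe whose [min, max] range excludes the value cannot match, so no
     * split is generated for it.
     */
    static List<Long> stripesToRead(List<StripeRange> stripes, long value) {
        List<Long> keep = new ArrayList<>();
        for (StripeRange s : stripes) {
            if (value >= s.min && value <= s.max) {
                keep.add(s.offset);
            }
        }
        return keep;
    }
}
```

For example, with three stripes covering the value ranges [0, 99], [100, 199] and [200, 299], the predicate `col = 150` keeps only the second stripe. A real implementation must stay conservative and keep any stripe whose statistics are absent.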



--
This message was sent by Atlassian JIRA
(v6.1#6144)