Posted to dev@hive.apache.org by "Prasanth J (JIRA)" <ji...@apache.org> on 2014/11/03 08:27:34 UTC

[jira] [Commented] (HIVE-8521) Document the ORC format

    [ https://issues.apache.org/jira/browse/HIVE-8521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194323#comment-14194323 ] 

Prasanth J commented on HIVE-8521:
----------------------------------

[~owen.omalley] I took a pass over the document. It mostly looks good. A few things:
1) Section 4.4: "Runs start with an initial byte of 0x00 to 0xf7". Shouldn't it be 0x7f?
2) Section 4.5.1: "encoded if they type is signed" should be "the type"
3) Section 4.5.2: DEAD BEEF hex code :)
4) Section 4.5.3: I think we should revert the percentile back to 95. Since we only have 5 bits for the patch length, we cannot encode lengths >32, which could happen if we use the 90th percentile (512 * 0.1 ≈ 51 elements can be patched).
5) Section 5: The default stripe size is now 64MB. Do we need to mention that in this section?
6) Section 5.1: "DICTIONARY_DATA", "DIRECT_V2", "DICTIONARY_V2" have a stray "\" before the _
7) Section 5.2.7: "definition was change" should be "changed"
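
To make the arithmetic in point 4 concrete, here is a rough sketch in Python. The helper names (max_patched, fits) are made up for illustration and are not from the ORC code base. It also assumes the 5-bit field stores the patch count directly, so the largest count it can hold is 31; the spec or implementation may differ on that detail.

```python
# Illustrative only: names are hypothetical, constants come from point 4.
MAX_RUN_LENGTH = 512     # values per run, as used in point 4
PATCH_LENGTH_BITS = 5    # width of the patch length field


def max_patched(percentile: int, run_length: int = MAX_RUN_LENGTH) -> int:
    """Worst-case count of values above the given percentile (rounded down)."""
    return run_length * (100 - percentile) // 100


def fits(percentile: int) -> bool:
    """True if the worst-case patch count fits in the patch length field.

    Assumption: the field stores the count directly, so the maximum
    representable count is 2**PATCH_LENGTH_BITS - 1.
    """
    return max_patched(percentile) <= 2 ** PATCH_LENGTH_BITS - 1


if __name__ == "__main__":
    for p in (90, 95):
        print(p, max_patched(p), fits(p))
```

Under these assumptions the 90th percentile can leave up to 51 values to patch, which does not fit, while the 95th leaves at most 25, which does.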

> Document the ORC format
> -----------------------
>
>                 Key: HIVE-8521
>                 URL: https://issues.apache.org/jira/browse/HIVE-8521
>             Project: Hive
>          Issue Type: Bug
>          Components: Documentation, File Formats
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: orc-spec.pdf
>
>
> It is past time that we document the ORC file format. I've started and should have a first pass this week.


