Posted to issues@orc.apache.org by "wgtmac (via GitHub)" <gi...@apache.org> on 2023/04/23 15:24:55 UTC

[GitHub] [orc] wgtmac commented on issue #1475: [C++] Order of row index streams does not match the order of streams in the file footer

wgtmac commented on issue #1475:
URL: https://github.com/apache/orc/issues/1475#issuecomment-1519092569

   Thanks for reporting the issue! @vuule
   
   The order of data streams is **NOT FIXED**, meaning that:
   - In a direct-encoded string column, the `DATA stream` can be placed **BEFORE** or **AFTER** the `LENGTH stream`. The same flexibility applies to the `PRESENT stream`.
   - Data streams of different columns can even be interleaved.
   
   However, the order of positions in an index stream is **FIXED**. For a direct-encoded string column, its `INDEX stream` always stores positions in this order: `PRESENT stream` (if it exists), `DATA stream`, and `LENGTH stream`.
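   
   As a rough illustration of that rule, a reader could derive the position order from the stream kinds rather than from the footer order. This is only a sketch: `StreamKind` and `indexPositionOrder` are hypothetical names made up for the example, not the ORC C++ library's API, and it only covers the direct-encoded string case described above.
   
   ```cpp
   #include <algorithm>
   #include <vector>
   
   // Hypothetical stand-in for the stream kinds of a direct-encoded string
   // column; this is NOT the ORC library's own enum.
   enum class StreamKind { Present, Data, Length };
   
   // Returns the stream kinds in the order their positions appear in a row
   // index entry: PRESENT (only if the column has one), then DATA, then LENGTH.
   // `footerOrder` is the order in which the streams are listed in the footer,
   // which may be arbitrary and is deliberately ignored for ordering.
   std::vector<StreamKind> indexPositionOrder(
       const std::vector<StreamKind>& footerOrder) {
     const StreamKind fixedOrder[] = {StreamKind::Present, StreamKind::Data,
                                      StreamKind::Length};
     std::vector<StreamKind> result;
     for (StreamKind kind : fixedOrder) {
       // Keep a kind only if the column actually has that stream.
       if (std::find(footerOrder.begin(), footerOrder.end(), kind) !=
           footerOrder.end()) {
         result.push_back(kind);
       }
     }
     return result;
   }
   ```
   
   With this approach, a footer that lists `LENGTH` before `DATA` still yields positions in `DATA`, `LENGTH` order, and a column without a `PRESENT stream` simply starts at `DATA`.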
   
   I checked the [specs](https://orc.apache.org/specification/ORCv1/) and they do not state this clearly. This would be a good time to document it as well. @deshanxiao 
   
   

