Posted to issues@orc.apache.org by "wgtmac (via GitHub)" <gi...@apache.org> on 2023/04/23 15:24:55 UTC
[GitHub] [orc] wgtmac commented on issue #1475: [C++] Order of row index streams does not match the order of streams in the file footer
wgtmac commented on issue #1475:
URL: https://github.com/apache/orc/issues/1475#issuecomment-1519092569
Thanks for reporting the issue, @vuule!
The order of data streams is **NOT FIXED**, meaning that:
- In a direct-encoded string column, the `DATA stream` can be placed **BEFORE** or **AFTER** the `LENGTH stream`. The same flexibility applies to the `PRESENT stream`.
- Data streams of different columns can even be interleaved.
However, the order of positions within an index stream is **FIXED**. So for a direct-encoded string column, its `INDEX stream` always records positions in this order: `PRESENT stream` (if it exists), then `DATA stream`, then `LENGTH stream`.
I checked the [spec](https://orc.apache.org/specification/ORCv1/), and it does not state this clearly. This would be a good time to document it as well. @deshanxiao