Posted to issues@orc.apache.org by "wgtmac (via GitHub)" <gi...@apache.org> on 2023/05/11 02:17:39 UTC

[GitHub] [orc] wgtmac commented on a diff in pull request #1465: ORC-1409: [Docs] Add stream order description in ORC spec.

wgtmac commented on code in PR #1465:
URL: https://github.com/apache/orc/pull/1465#discussion_r1190568745


##########
site/specification/ORCv1.md:
##########
@@ -886,6 +886,17 @@ uses three streams PRESENT, DATA, and LENGTH, which stores the length
 of each value. The details of each type will be presented in the
 following subsections.
 
+There are a few points to note about the order of streams:
+
+* For a specific column type, the order of streams is **not fixed**.
+* Index and data streams cannot be **interleaved**.
+* The order of streams from different columns is **not fixed** as well.

Review Comment:
   ```suggestion
   There is a general order for index and data streams:
   * Index streams are always placed together at the beginning of the stripe.
   * Data streams are placed together after the index streams (if any).
   * Within the index streams or the data streams, unencrypted streams should be placed first, followed by the encrypted streams grouped by encryption variant.
   
   There is no fixed order within the unencrypted group or within each encryption variant group, in either the index or the data streams:
   * Different stream kinds of the same column can be placed in any order.
   * Streams from different columns can also be placed in any order.
   
   To get the precise information (i.e., stream kind, column id, and location) of a stream within a stripe, rely on the streams field in the StripeFooter described below, which is the single source of truth.
   ```
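
To illustrate the suggestion above, here is a minimal Python sketch of how a reader might derive stream locations from the ordered streams list instead of assuming any per-column order. It is not the ORC reader implementation: the `Stream` class and both helper functions are hypothetical, the set of index stream kinds is an assumption based on the stream kind names in the spec, stream offsets are assumed to follow from list order and lengths, and encryption variants are ignored.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumption: these kinds make up the index area; every other kind is data.
INDEX_KINDS = {"ROW_INDEX", "BLOOM_FILTER", "BLOOM_FILTER_UTF8"}


@dataclass(frozen=True)
class Stream:
    """Hypothetical stand-in for one entry of the StripeFooter streams field."""
    kind: str
    column: int
    length: int


def locate_streams(stripe_offset: int, streams: List[Stream]) -> List[Tuple[Stream, int]]:
    """Pair each stream with its start offset.

    Assumes streams are stored back to back in list order, so each offset is
    the stripe offset plus the lengths of all preceding streams.
    """
    located = []
    offset = stripe_offset
    for stream in streams:
        located.append((stream, offset))
        offset += stream.length
    return located


def index_before_data(streams: List[Stream]) -> bool:
    """Return True if no index stream appears after a data stream."""
    seen_data = False
    for stream in streams:
        if stream.kind in INDEX_KINDS:
            if seen_data:
                return False
        else:
            seen_data = True
    return True


# Columns and kinds are deliberately mixed inside each area.
example = [
    Stream("ROW_INDEX", 2, 12),
    Stream("ROW_INDEX", 1, 10),
    Stream("DATA", 2, 100),
    Stream("PRESENT", 1, 5),
    Stream("DATA", 1, 80),
]
offsets = [offset for _, offset in locate_streams(1000, example)]
```

With a stripe starting at byte 1000, the offsets come out as 1000, 1012, 1022, 1122 and 1127, and `index_before_data(example)` holds even though columns 1 and 2 are interleaved within each area. The only ordering the sketch relies on is index before data; everything else is read from the list.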



##########
site/specification/ORCv1.md:
##########
@@ -886,6 +886,17 @@ uses three streams PRESENT, DATA, and LENGTH, which stores the length
 of each value. The details of each type will be presented in the
 following subsections.
 
+There are a few points to note about the order of streams:
+
+* For a specific column type, the order of streams is **not fixed**.
+* Index and data streams cannot be **interleaved**.
+* The order of streams from different columns is **not fixed** as well.

Review Comment:
   Also, I would prefer to move them below line 907.



##########
site/specification/ORCv1.md:
##########
@@ -886,6 +886,17 @@ uses three streams PRESENT, DATA, and LENGTH, which stores the length
 of each value. The details of each type will be presented in the
 following subsections.
 
+There are a few points to note about the order of streams:
+
+* For a specific column type, the order of streams is **not fixed**.
+* Index and data streams cannot be **interleaved**.
+* The order of streams from different columns is **not fixed** as well.

Review Comment:
   These statements look a little vague to me. I have rewritten them to separate what has a fixed order from what is flexible. WDYT?


