Posted to dev@parquet.apache.org by GitBox <gi...@apache.org> on 2023/01/16 10:30:25 UTC

[GitHub] [parquet-format] pitrou opened a new pull request, #189: PARQUET-2231: [Format] Allow DELTA_BYTE_ARRAY for FIXED_LEN_BYTE_ARRAY

pitrou opened a new pull request, #189:
URL: https://github.com/apache/parquet-format/pull/189

   DELTA_BYTE_ARRAY has been supported for FIXED_LEN_BYTE_ARRAY by parquet-mr since 2015 (see PARQUET-152). Update the spec accordingly.
   
   Also improve wording and markup, and add an example.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [parquet-format] pitrou commented on pull request #189: PARQUET-2231: [Format] Allow DELTA_BYTE_ARRAY for FIXED_LEN_BYTE_ARRAY

Posted by GitBox <gi...@apache.org>.
pitrou commented on PR #189:
URL: https://github.com/apache/parquet-format/pull/189#issuecomment-1383831257

   @emkornfield @gszadovszky @rdblue 



[GitHub] [parquet-format] pitrou merged pull request #189: PARQUET-2231: [Format] Allow DELTA_BYTE_ARRAY for FIXED_LEN_BYTE_ARRAY

Posted by GitBox <gi...@apache.org>.
pitrou merged PR #189:
URL: https://github.com/apache/parquet-format/pull/189



[GitHub] [parquet-format] pitrou commented on pull request #189: PARQUET-2231: [Format] Allow DELTA_BYTE_ARRAY for FIXED_LEN_BYTE_ARRAY

Posted by GitBox <gi...@apache.org>.
pitrou commented on PR #189:
URL: https://github.com/apache/parquet-format/pull/189#issuecomment-1383840418

   Also cc @rok



[GitHub] [parquet-format] pitrou commented on a diff in pull request #189: PARQUET-2231: [Format] Allow DELTA_BYTE_ARRAY for FIXED_LEN_BYTE_ARRAY

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #189:
URL: https://github.com/apache/parquet-format/pull/189#discussion_r1081911430


##########
Encodings.md:
##########
@@ -299,9 +302,18 @@ For a longer description, see https://en.wikipedia.org/wiki/Incremental_encoding
 This is stored as a sequence of delta-encoded prefix lengths (DELTA_BINARY_PACKED), followed by
 the suffixes encoded as delta length byte arrays (DELTA_LENGTH_BYTE_ARRAY).
 
+For example, if the data was "axis", "axle", "babble", "babyhood":
+
+The encoded data would be comprised of the following segments:

Review Comment:
   ```suggestion
   For example, if the data was "axis", "axle", "babble", "babyhood"
   
   then the encoded data would be comprised of the following segments:
   ```
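To make the "axis", "axle", "babble", "babyhood" example concrete, here is a minimal Python sketch of the prefix/suffix split that DELTA_BYTE_ARRAY performs, plus its inverse. It is illustrative only: `split_prefixes` and `join_prefixes` are hypothetical helper names (not parquet-mr or Arrow APIs), and the sketch stops at the two logical streams. It does not implement the DELTA_BINARY_PACKED encoding of the prefix lengths or the DELTA_LENGTH_BYTE_ARRAY encoding of the suffixes.

```python
from typing import List, Sequence, Tuple


def split_prefixes(values: Sequence[bytes]) -> Tuple[List[int], List[bytes]]:
    """Incremental (front) coding: for each value, record the length of the
    prefix it shares with the previous value, plus the remaining suffix."""
    prefix_lengths: List[int] = []
    suffixes: List[bytes] = []
    previous = b""
    for value in values:
        # Count the leading bytes shared with the previous value.
        shared = 0
        limit = min(len(previous), len(value))
        while shared < limit and previous[shared] == value[shared]:
            shared += 1
        prefix_lengths.append(shared)
        suffixes.append(value[shared:])
        previous = value
    return prefix_lengths, suffixes


def join_prefixes(prefix_lengths: Sequence[int], suffixes: Sequence[bytes]) -> List[bytes]:
    """Inverse of split_prefixes: rebuild each value from the previous one."""
    if len(prefix_lengths) != len(suffixes):
        raise ValueError("prefix and suffix streams have different lengths")
    values: List[bytes] = []
    previous = b""
    for shared, suffix in zip(prefix_lengths, suffixes):
        if shared > len(previous):
            raise ValueError("prefix length exceeds the previous value")
        previous = previous[:shared] + suffix
        values.append(previous)
    return values


values = [b"axis", b"axle", b"babble", b"babyhood"]
prefix_lengths, suffixes = split_prefixes(values)
print(prefix_lengths)              # [0, 2, 0, 3]
print(suffixes)                    # [b'axis', b'le', b'babble', b'yhood']
print([len(s) for s in suffixes])  # [4, 2, 6, 5]
assert join_prefixes(prefix_lengths, suffixes) == values
```

Because the split only compares raw bytes, the same logic applies unchanged to FIXED_LEN_BYTE_ARRAY values, which is the case this PR adds to the spec; every reconstructed value then simply has the column's fixed length.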




[GitHub] [parquet-format] pitrou commented on pull request #189: PARQUET-2231: [Format] Allow DELTA_BYTE_ARRAY for FIXED_LEN_BYTE_ARRAY

Posted by GitBox <gi...@apache.org>.
pitrou commented on PR #189:
URL: https://github.com/apache/parquet-format/pull/189#issuecomment-1383830870

   @wjones127 Could you help review the wording?



[GitHub] [parquet-format] wjones127 commented on a diff in pull request #189: PARQUET-2231: [Format] Allow DELTA_BYTE_ARRAY for FIXED_LEN_BYTE_ARRAY

Posted by GitBox <gi...@apache.org>.
wjones127 commented on code in PR #189:
URL: https://github.com/apache/parquet-format/pull/189#discussion_r1081899568


##########
Encodings.md:
##########
@@ -280,16 +280,19 @@ concatenated back to back. The expected savings is from the cost of encoding the
 and possibly better compression in the data (it is no longer interleaved with the lengths).
 
 The data stream looks like:
-
+```
 <Delta Encoded Lengths> <Byte Array Data>
+```
 
-For example, if the data was "Hello", "World", "Foobar", "ABCDEF":
+For example, if the data was "Hello", "World", "Foobar", "ABCDEF"
 
-The encoded data would be DeltaEncoding(5, 5, 6, 6) "HelloWorldFoobarABCDEF"
+The encoded data would be comprised of the following segments:

Review Comment:
   ```suggestion
   then the encoded data would be comprised of the following segments:
   ```
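For the DELTA_LENGTH_BYTE_ARRAY example ("Hello", "World", "Foobar", "ABCDEF"), a minimal Python sketch of the two segments follows. The helper names are hypothetical, chosen for illustration. `raw_deltas` only shows the consecutive differences that DELTA_BINARY_PACKED starts from; the real encoding also stores the first value, subtracts a per-block minimum delta and bit-packs the result, none of which is shown here.

```python
from typing import List, Sequence, Tuple


def encode_segments(values: Sequence[bytes]) -> Tuple[List[int], bytes]:
    """Split values into the two logical segments of DELTA_LENGTH_BYTE_ARRAY:
    the value lengths and the value bytes concatenated back to back."""
    lengths = [len(v) for v in values]
    return lengths, b"".join(values)


def raw_deltas(lengths: Sequence[int]) -> List[int]:
    """Consecutive differences of the lengths (before the min-delta
    subtraction and bit packing that DELTA_BINARY_PACKED applies)."""
    return [b - a for a, b in zip(lengths, lengths[1:])]


def decode_segments(lengths: Sequence[int], data: bytes) -> List[bytes]:
    """Recover the original values by slicing the data with the lengths."""
    values: List[bytes] = []
    offset = 0
    for n in lengths:
        values.append(data[offset:offset + n])
        offset += n
    if offset != len(data):
        raise ValueError("lengths do not cover the data segment exactly")
    return values


values = [b"Hello", b"World", b"Foobar", b"ABCDEF"]
lengths, data = encode_segments(values)
print(lengths)              # [5, 5, 6, 6]
print(raw_deltas(lengths))  # [0, 1, 0]
print(data)                 # b'HelloWorldFoobarABCDEF'
assert decode_segments(lengths, data) == values
```

Keeping the lengths separate from the bytes is what lets the length stream compress well when values have similar sizes, as in this example where the deltas are all 0 or 1.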


