Posted to github@arrow.apache.org by "westonpace (via GitHub)" <gi...@apache.org> on 2023/05/24 20:01:43 UTC

[GitHub] [arrow] westonpace commented on a diff in pull request #35628: GH-35627: [C++][Format][Integration] Add string view to the arrow format

westonpace commented on code in PR #35628:
URL: https://github.com/apache/arrow/pull/35628#discussion_r1204688260


##########
docs/source/format/Columnar.rst:
##########
@@ -350,6 +352,38 @@ will be represented as follows: ::
     |----------------|----------------------|
     | joemark        | unspecified          |
 
+Variable-size Binary View Layout
+--------------------------------
+
+Each value in this layout consists of 0 or more bytes. These characters'
+locations are indicated using a **views** buffer, which may point to one
+of potentially several **data** buffers or may contain the characters
+inline.
+
+The views buffer contains `length` view structures with the following layout:
+
+::
+
+    * Short strings, length <= 12
+      | Bytes 0-3  | Bytes 4-15                            |
+      |------------|---------------------------------------|
+      | length     | data (padded with 0)                  |
+
+    * Long strings, length > 12
+      | Bytes 0-3  | Bytes 4-7  | Bytes 8-11 | Bytes 12-15 |
+      |------------|------------|------------|-------------|
+      | length     | prefix     | buf. index | offset      |

Review Comment:
   What does the `prefix` field contain?



##########
docs/source/format/Columnar.rst:
##########
@@ -350,6 +352,38 @@ will be represented as follows: ::
     |----------------|----------------------|
     | joemark        | unspecified          |
 
+Variable-size Binary View Layout
+--------------------------------
+
+Each value in this layout consists of 0 or more bytes. These characters'
+locations are indicated using a **views** buffer, which may point to one
+of potentially several **data** buffers or may contain the characters
+inline.
+
+The views buffer contains `length` view structures with the following layout:
+
+::
+
+    * Short strings, length <= 12
+      | Bytes 0-3  | Bytes 4-15                            |
+      |------------|---------------------------------------|
+      | length     | data (padded with 0)                  |
+
+    * Long strings, length > 12
+      | Bytes 0-3  | Bytes 4-7  | Bytes 8-11 | Bytes 12-15 |
+      |------------|------------|------------|-------------|
+      | length     | prefix     | buf. index | offset      |
+
+For the long string case, the buffer index indicates which character buffer
+stores the characters and the offset indicates where in that buffer the
+characters begin. All integers (length, buffer index, and offset) are unsigned.

Review Comment:
   Why unsigned?



##########
docs/source/format/Columnar.rst:
##########
@@ -350,6 +352,38 @@ will be represented as follows: ::
     |----------------|----------------------|
     | joemark        | unspecified          |
 
+Variable-size Binary View Layout
+--------------------------------
+
+Each value in this layout consists of 0 or more bytes. These characters'
+locations are indicated using a **views** buffer, which may point to one
+of potentially several **data** buffers or may contain the characters
+inline.
+
+The views buffer contains `length` view structures with the following layout:
+
+::
+
+    * Short strings, length <= 12
+      | Bytes 0-3  | Bytes 4-15                            |
+      |------------|---------------------------------------|
+      | length     | data (padded with 0)                  |
+
+    * Long strings, length > 12
+      | Bytes 0-3  | Bytes 4-7  | Bytes 8-11 | Bytes 12-15 |
+      |------------|------------|------------|-------------|
+      | length     | prefix     | buf. index | offset      |
+
+For the long string case, the buffer index indicates which character buffer

Review Comment:
   Is the buffer index 0 the validity buffer (and thus not suitable to use) or the first buffer past the views buffer?
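
   For reference while discussing these questions, here is a minimal Python sketch of how a 16-byte view could be encoded and decoded under the layout quoted above. It is not the PR's implementation. It rests on several assumptions that the quoted text does not settle: integers are little-endian, `prefix` holds the first four bytes of the value, and buffer index 0 refers to the first data buffer after the views buffer (which is exactly the open question). The names `encode_view` and `decode_view` are hypothetical.

```python
import struct

INLINE_MAX = 12  # values up to this length are stored inline, per the quoted table


def encode_view(value, buffer_index=0, offset=0):
    """Build one 16-byte view for `value` (hypothetical helper).

    For long values, `buffer_index` and `offset` say where the caller
    has already stored the bytes; this function does not copy them.
    """
    length = len(value)
    if length <= INLINE_MAX:
        # Bytes 0-3: length, bytes 4-15: data padded with zeros.
        return struct.pack("<I", length) + value.ljust(INLINE_MAX, b"\x00")
    # Bytes 0-3: length, 4-7: prefix, 8-11: buffer index, 12-15: offset.
    # ASSUMPTION: prefix is the first four bytes of the value.
    return struct.pack("<I4sII", length, value[:4], buffer_index, offset)


def decode_view(view, data_buffers):
    """Return the bytes that a 16-byte view refers to (hypothetical helper).

    ASSUMPTION: data_buffers[0] is the first data buffer, so the validity
    and views buffers are not counted by the buffer index.
    """
    (length,) = struct.unpack_from("<I", view, 0)
    if length <= INLINE_MAX:
        return view[4:4 + length]
    prefix, buffer_index, offset = struct.unpack_from("<4sII", view, 4)
    value = data_buffers[buffer_index][offset:offset + length]
    if value[:4] != prefix:
        raise ValueError("prefix does not match the referenced data")
    return value
```

   Under these assumptions a short value round-trips without touching any data buffer, while a long value needs only the buffer that its index names. If index 0 instead meant the validity buffer, the lookup in `decode_view` would have to subtract the number of non-data buffers first.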


