Posted to commits@arrow.apache.org by we...@apache.org on 2019/01/09 22:16:47 UTC
[arrow] branch master updated: ARROW-3997: [Documentation] Clarify dictionary index type
This is an automated email from the ASF dual-hosted git repository.
wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git
The following commit(s) were added to refs/heads/master by this push:
new 6b496f7 ARROW-3997: [Documentation] Clarify dictionary index type
6b496f7 is described below
commit 6b496f7c1929a0a371fe708ae653228a9e722150
Author: Antoine Pitrou <an...@python.org>
AuthorDate: Wed Jan 9 16:16:40 2019 -0600
ARROW-3997: [Documentation] Clarify dictionary index type
Mandate signed integers for dictionary index types, without constraining integer width.
Author: Antoine Pitrou <an...@python.org>
Closes #3355 from pitrou/ARROW-3997-dictionary-encoding-doc and squashes the following commits:
4e05e2642 <Antoine Pitrou> ARROW-3997: Clarify dictionary index type
---
docs/source/format/Layout.rst | 31 ++++++++++++++++---------------
1 file changed, 16 insertions(+), 15 deletions(-)
diff --git a/docs/source/format/Layout.rst b/docs/source/format/Layout.rst
index 69cbf06..f3e5290 100644
--- a/docs/source/format/Layout.rst
+++ b/docs/source/format/Layout.rst
@@ -614,13 +614,13 @@ Dictionary encoding
-------------------
When a field is dictionary encoded, the values are represented by an array of
-Int32 representing the index of the value in the dictionary. The Dictionary is
-received as one or more DictionaryBatches with the id referenced by a
-dictionary attribute defined in the metadata (Message.fbs) in the Field
-table. The dictionary has the same layout as the type of the field would
-dictate. Each entry in the dictionary can be accessed by its index in the
-DictionaryBatches. When a Schema references a Dictionary id, it must send at
-least one DictionaryBatch for this id.
+signed integers representing the index of the value in the dictionary.
+The Dictionary is received as one or more DictionaryBatches with the id
+referenced by a dictionary attribute defined in the metadata (Message.fbs)
+in the Field table. The dictionary has the same layout as the type of the
+field would dictate. Each entry in the dictionary can be accessed by its
+index in the DictionaryBatches. When a Schema references a Dictionary id,
+it must send at least one DictionaryBatch for this id.
As an example, you could have the following data: ::
@@ -640,16 +640,17 @@ As an example, you could have the following data: ::
In dictionary-encoded form, this could appear as: ::
data List<String> (dictionary-encoded, dictionary id i)
- indices: [0, 0, 0, 1, 1, 1, 0]
+ type: Int32
+ values:
+ [0, 0, 0, 1, 1, 1, 0]
dictionary i
-
- type: List<String>
-
- [
- ['a', 'b'],
- ['c', 'd', 'e'],
- ]
+ type: List<String>
+ values:
+ [
+ ['a', 'b'],
+ ['c', 'd', 'e'],
+ ]
References
----------
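
Illustration (not part of the commit): the dictionary-encoded layout in the patch above can be sketched in plain Python. This is a minimal sketch, not Arrow's implementation. The helper names `dictionary_encode` and `dictionary_decode` are hypothetical, and the input data is inferred from the indices and dictionary shown in the example. Python integers stand in for the signed index type, so the sketch only checks that indices are non-negative and in range.

```python
from typing import List, Sequence, Tuple


def dictionary_encode(values: Sequence[Sequence[str]]) -> Tuple[List[int], List[List[str]]]:
    """Replace each value by the index of its first occurrence in a dictionary.

    Hypothetical helper for illustration only.
    """
    dictionary: List[List[str]] = []
    positions = {}  # maps a hashable form of each value to its dictionary index
    indices: List[int] = []
    for value in values:
        key = tuple(value)  # lists are not hashable, so use a tuple as the key
        if key not in positions:
            positions[key] = len(dictionary)
            dictionary.append(list(value))
        indices.append(positions[key])
    return indices, dictionary


def dictionary_decode(indices: Sequence[int], dictionary: Sequence[Sequence[str]]) -> List[List[str]]:
    """Look up every index in the dictionary to rebuild the original values."""
    decoded = []
    for index in indices:
        # Indices are signed integers, but a valid index is never negative.
        if not 0 <= index < len(dictionary):
            raise IndexError(f"dictionary index out of range: {index}")
        decoded.append(list(dictionary[index]))
    return decoded


# Data inferred from the indices and dictionary in the documentation example.
data = [
    ["a", "b"], ["a", "b"], ["a", "b"],
    ["c", "d", "e"], ["c", "d", "e"], ["c", "d", "e"],
    ["a", "b"],
]
indices, dictionary = dictionary_encode(data)
```

With this input, `indices` and `dictionary` match the `values` shown for `data` and `dictionary i` in the patch, and `dictionary_decode(indices, dictionary)` returns the original list.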