Posted to commits@arrow.apache.org by we...@apache.org on 2019/01/09 22:16:47 UTC

[arrow] branch master updated: ARROW-3997: [Documentation] Clarify dictionary index type

This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
     new 6b496f7  ARROW-3997: [Documentation] Clarify dictionary index type
6b496f7 is described below

commit 6b496f7c1929a0a371fe708ae653228a9e722150
Author: Antoine Pitrou <an...@python.org>
AuthorDate: Wed Jan 9 16:16:40 2019 -0600

    ARROW-3997: [Documentation] Clarify dictionary index type
    
    Mandate signed integers for dictionary index types, without constraining integer width.
    
    Author: Antoine Pitrou <an...@python.org>
    
    Closes #3355 from pitrou/ARROW-3997-dictionary-encoding-doc and squashes the following commits:
    
    4e05e2642 <Antoine Pitrou> ARROW-3997:  Clarify dictionary index type
---
 docs/source/format/Layout.rst | 31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/docs/source/format/Layout.rst b/docs/source/format/Layout.rst
index 69cbf06..f3e5290 100644
--- a/docs/source/format/Layout.rst
+++ b/docs/source/format/Layout.rst
@@ -614,13 +614,13 @@ Dictionary encoding
 -------------------
 
 When a field is dictionary encoded, the values are represented by an array of
-Int32 representing the index of the value in the dictionary.  The Dictionary is
-received as one or more DictionaryBatches with the id referenced by a
-dictionary attribute defined in the metadata (Message.fbs) in the Field
-table.  The dictionary has the same layout as the type of the field would
-dictate. Each entry in the dictionary can be accessed by its index in the
-DictionaryBatches.  When a Schema references a Dictionary id, it must send at
-least one DictionaryBatch for this id.
+signed integers representing the index of the value in the dictionary.
+The Dictionary is received as one or more DictionaryBatches with the id
+referenced by a dictionary attribute defined in the metadata (Message.fbs)
+in the Field table.  The dictionary has the same layout as the type of the
+field would dictate. Each entry in the dictionary can be accessed by its
+index in the DictionaryBatches.  When a Schema references a Dictionary id,
+it must send at least one DictionaryBatch for this id.
 
 As an example, you could have the following data: ::
 
@@ -640,16 +640,17 @@ As an example, you could have the following data: ::
 In dictionary-encoded form, this could appear as: ::
 
     data List<String> (dictionary-encoded, dictionary id i)
-    indices: [0, 0, 0, 1, 1, 1, 0]
+       type: Int32
+       values:
+       [0, 0, 0, 1, 1, 1, 0]
 
     dictionary i
-
-    type: List<String>
-
-    [
-     ['a', 'b'],
-     ['c', 'd', 'e'],
-    ]
+       type: List<String>
+       values:
+       [
+        ['a', 'b'],
+        ['c', 'd', 'e'],
+       ]
 
 References
 ----------