Posted to commits@parquet.apache.org by bl...@apache.org on 2017/10/06 23:57:24 UTC

parquet-format git commit: PARQUET-322 Document ENUM as a logical type.

Repository: parquet-format
Updated Branches:
  refs/heads/master e127c3f7f -> f59258a05


PARQUET-322 Document ENUM as a logical type.

Author: Jakub Kukul <ja...@mbr-targeting.com>

Closes #54 from jkukul/master and squashes the following commits:

a2490b2 [Jakub Kukul] PARQUET-322 Document ENUM as a logical type.


Project: http://git-wip-us.apache.org/repos/asf/parquet-format/repo
Commit: http://git-wip-us.apache.org/repos/asf/parquet-format/commit/f59258a0
Tree: http://git-wip-us.apache.org/repos/asf/parquet-format/tree/f59258a0
Diff: http://git-wip-us.apache.org/repos/asf/parquet-format/diff/f59258a0

Branch: refs/heads/master
Commit: f59258a0519fb4ed8fa25a88593a2d034ce909c6
Parents: e127c3f
Author: Jakub Kukul <ja...@mbr-targeting.com>
Authored: Fri Oct 6 16:57:21 2017 -0700
Committer: Ryan Blue <bl...@apache.org>
Committed: Fri Oct 6 16:57:21 2017 -0700

----------------------------------------------------------------------
 LogicalTypes.md | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/parquet-format/blob/f59258a0/LogicalTypes.md
----------------------------------------------------------------------
diff --git a/LogicalTypes.md b/LogicalTypes.md
index 6e5c9db..c50b96b 100644
--- a/LogicalTypes.md
+++ b/LogicalTypes.md
@@ -32,13 +32,24 @@ This file contains the specification for all logical types.
 The parquet format's `ConvertedType` stores the type annotation. The annotation
 may require additional metadata fields, as well as rules for those fields.
 
-### UTF8 (Strings)
+## String Types
+
+### UTF8
 
 `UTF8` may only be used to annotate the binary primitive type and indicates
 that the byte array should be interpreted as a UTF-8 encoded character string.
 
 The sort order used for `UTF8` strings is unsigned byte-wise comparison.
 
+### ENUM
+
+`ENUM` annotates the binary primitive type and indicates that the value
+was converted from an enumerated type in another data model (e.g. Thrift, Avro, Protobuf).
+Applications using a data model lacking a native enum type should interpret an
+`ENUM`-annotated field as a UTF-8 encoded string.
+
+The sort order used for `ENUM`s is `UNSIGNED` byte-wise comparison.
+
 ## Numeric Types
 
 ### Signed Integers
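
Reviewer note: the sort order this patch documents for `ENUM` (and `UTF8`) can be illustrated with a minimal sketch. It is not part of the commit and not taken from any Parquet implementation; the helper names `compare_unsigned_bytewise` and `decode_enum` and the sample values are hypothetical. It assumes each `ENUM` value arrives as raw bytes and that a reader without a native enum type decodes it as UTF-8, as the added text describes.

```python
def compare_unsigned_bytewise(a: bytes, b: bytes) -> int:
    """Return -1, 0, or 1, comparing byte by byte with each byte read as 0..255."""
    for x, y in zip(a, b):
        if x != y:
            return -1 if x < y else 1
    # All shared positions are equal, so the shorter value sorts first.
    return (len(a) > len(b)) - (len(a) < len(b))


def decode_enum(value: bytes) -> str:
    """Fallback for data models without a native enum: treat the bytes as UTF-8."""
    return value.decode("utf-8")


# Hypothetical enum values; the last one starts with byte 0xC3 (UTF-8 for "É").
values = [b"SPADES", b"HEARTS", b"\xc3\x89T\xc3\x89", b"CLUBS"]

# Python compares bytes objects as unsigned integers, so sorted() already
# follows the documented order.
ordered = sorted(values)
decoded = [decode_enum(v) for v in ordered]
```

The non-ASCII value sorts last here because 0xC3 is read as 195. A comparison that treats bytes as signed (for example a naive loop over Java `byte` values) would read it as -61 and sort it first, which is the case the explicit unsigned wording guards against.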