Posted to commits@parquet.apache.org by bl...@apache.org on 2017/10/06 23:57:24 UTC
parquet-format git commit: PARQUET-322 Document ENUM as a logical type.
Repository: parquet-format
Updated Branches:
refs/heads/master e127c3f7f -> f59258a05
PARQUET-322 Document ENUM as a logical type.
Author: Jakub Kukul <ja...@mbr-targeting.com>
Closes #54 from jkukul/master and squashes the following commits:
a2490b2 [Jakub Kukul] PARQUET-322 Document ENUM as a logical type.
Project: http://git-wip-us.apache.org/repos/asf/parquet-format/repo
Commit: http://git-wip-us.apache.org/repos/asf/parquet-format/commit/f59258a0
Tree: http://git-wip-us.apache.org/repos/asf/parquet-format/tree/f59258a0
Diff: http://git-wip-us.apache.org/repos/asf/parquet-format/diff/f59258a0
Branch: refs/heads/master
Commit: f59258a0519fb4ed8fa25a88593a2d034ce909c6
Parents: e127c3f
Author: Jakub Kukul <ja...@mbr-targeting.com>
Authored: Fri Oct 6 16:57:21 2017 -0700
Committer: Ryan Blue <bl...@apache.org>
Committed: Fri Oct 6 16:57:21 2017 -0700
----------------------------------------------------------------------
LogicalTypes.md | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/parquet-format/blob/f59258a0/LogicalTypes.md
----------------------------------------------------------------------
diff --git a/LogicalTypes.md b/LogicalTypes.md
index 6e5c9db..c50b96b 100644
--- a/LogicalTypes.md
+++ b/LogicalTypes.md
@@ -32,13 +32,24 @@ This file contains the specification for all logical types.
The parquet format's `ConvertedType` stores the type annotation. The annotation
may require additional metadata fields, as well as rules for those fields.
-### UTF8 (Strings)
+## String Types
+
+### UTF8
`UTF8` may only be used to annotate the binary primitive type and indicates
that the byte array should be interpreted as a UTF-8 encoded character string.
The sort order used for `UTF8` strings is unsigned byte-wise comparison.
+### ENUM
+
+`ENUM` annotates the binary primitive type and indicates that the value
+was converted from an enumerated type in another data model (e.g. Thrift, Avro, Protobuf).
+Applications using a data model lacking a native enum type should interpret `ENUM`
+annotated field as a UTF-8 encoded string.
+
+The sort order used for `ENUM`s is `UNSIGNED` byte-wise comparison.
+
## Numeric Types
### Signed Integers
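
Note on the sort order documented in the diff above: both `UTF8` and `ENUM` values are ordered by unsigned byte-wise comparison. The following is a minimal Python sketch of what that means in practice, not code from parquet-format or any Parquet library. The helper names (`unsigned_key`, `signed_key`, `decode_enum`) and the sample values are hypothetical and chosen only for illustration. It contrasts unsigned ordering with the signed-byte ordering that a naive comparison can produce in languages whose byte type is signed, and shows an `ENUM` value being read as a UTF-8 string by an application with no native enum type.

```python
def unsigned_key(raw: bytes) -> tuple:
    # Python bytes iterate as ints in 0..255, so this is the unsigned order.
    return tuple(raw)


def signed_key(raw: bytes) -> tuple:
    # Reinterpret each byte as a signed 8-bit value (-128..127) to show
    # the ordering that the specification does NOT ask for.
    return tuple(b - 256 if b > 127 else b for b in raw)


def decode_enum(raw: bytes) -> str:
    # A data model without a native enum type reads ENUM as a UTF-8 string.
    return raw.decode("utf-8")


# Hypothetical enum symbols; the second one starts with a non-ASCII character
# whose UTF-8 encoding begins with byte 0xC3.
values = [b"RED", "\u00c9TOILE".encode("utf-8"), b"BLUE"]

unsigned_sorted = [decode_enum(v) for v in sorted(values, key=unsigned_key)]
signed_sorted = [decode_enum(v) for v in sorted(values, key=signed_key)]

print(unsigned_sorted)
print(signed_sorted)
```

Under the unsigned order, the value beginning with byte 0xC3 sorts after the ASCII values; under the signed order it sorts first, because 0xC3 reads as a negative number. Comparing Python `bytes` objects directly already uses the unsigned order, so the explicit key is only there to make the contrast visible.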