Posted to commits@parquet.apache.org by bl...@apache.org on 2014/08/07 20:26:30 UTC

git commit: PARQUET-12: Add specs for new logical types.

Repository: incubator-parquet-format
Updated Branches:
  refs/heads/master b3e928e48 -> 5b24637f7


PARQUET-12: Add specs for new logical types.

This adds the new logical types from #3 to the LogicalTypes.md specification.

Author: Ryan Blue <rb...@cloudera.com>

Closes #5 from rdblue/PARQUET-12-add-new-type-docs and squashes the following commits:

be414fe [Ryan Blue] PARQUET-12: Add specs for new logical types.


Project: http://git-wip-us.apache.org/repos/asf/incubator-parquet-format/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-parquet-format/commit/5b24637f
Tree: http://git-wip-us.apache.org/repos/asf/incubator-parquet-format/tree/5b24637f
Diff: http://git-wip-us.apache.org/repos/asf/incubator-parquet-format/diff/5b24637f

Branch: refs/heads/master
Commit: 5b24637f7df6d9d34f1ed5991d4e0b3aed165b3a
Parents: b3e928e
Author: Ryan Blue <rb...@cloudera.com>
Authored: Thu Aug 7 11:26:14 2014 -0700
Committer: Ryan Blue <rb...@cloudera.com>
Committed: Thu Aug 7 11:26:14 2014 -0700

----------------------------------------------------------------------
 LogicalTypes.md | 84 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 84 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-parquet-format/blob/5b24637f/LogicalTypes.md
----------------------------------------------------------------------
diff --git a/LogicalTypes.md b/LogicalTypes.md
index 96775af..b685813 100644
--- a/LogicalTypes.md
+++ b/LogicalTypes.md
@@ -18,6 +18,39 @@ may require additional metadata fields, as well as rules for those fields.
 `UTF8` may only be used to annotate the binary primitive type and indicates
 that the byte array should be interpreted as a UTF-8 encoded character string.
 
+## Numeric Types
+
+### Signed Integers
+
+`INT_8`, `INT_16`, `INT_32`, and `INT_64` annotations can be used to specify
+the maximum number of bits in the stored value.  Implementations may use these
+annotations to produce smaller in-memory representations when reading data.
+
+If a stored value is larger than the maximum allowed by the annotation, the
+behavior is not defined and can be determined by the implementation.
+Implementations must not write values that are larger than the annotation
+allows.
+
+`INT_8`, `INT_16`, and `INT_32` must annotate an `int32` primitive type and
+`INT_64` must annotate an `int64` primitive type. `INT_32` and `INT_64` are
+implied by the `int32` and `int64` primitive types if no other annotation is
+present and should be considered optional.
+
+### Unsigned Integers
+
+`UINT_8`, `UINT_16`, `UINT_32`, and `UINT_64` annotations can be used to
+specify unsigned integer types, along with a maximum number of bits in the
+stored value. Implementations may use these annotations to produce smaller
+in-memory representations when reading data.
+
+If a stored value is larger than the maximum allowed by the annotation, the
+behavior is not defined and can be determined by the implementation.
+Implementations must not write values that are larger than the annotation
+allows.
+
+`UINT_8`, `UINT_16`, and `UINT_32` must annotate an `int32` primitive type and
+`UINT_64` must annotate an `int64` primitive type.
+
 ### DECIMAL
 
 `DECIMAL` annotation represents arbitrary-precision signed decimal numbers of
@@ -45,3 +78,54 @@ integer. A precision too large for the underlying type (see below) is an error.
 
 A `SchemaElement` with the `DECIMAL` `ConvertedType` must also have both
 `scale` and `precision` fields set, even if scale is 0 by default.
+
+## Date/Time Types
+
+### DATE
+
+`DATE` is used for a logical date type, without a time of day. It must
+annotate an `int32` that stores the number of days from the Unix epoch, 1
+January 1970.
+
+### TIME_MILLIS
+
+`TIME_MILLIS` is used for a logical time type, without a date. It must annotate
+an `int32` that stores the number of milliseconds after midnight.
+
+### TIMESTAMP_MILLIS
+
+`TIMESTAMP_MILLIS` is used for a combined logical date and time type. It must
+annotate an `int64` that stores the number of milliseconds from the Unix epoch,
+00:00:00.000 on 1 January 1970, UTC.
+
+### INTERVAL
+
+`INTERVAL` is used for an interval of time. It must annotate a
+`fixed_len_byte_array` of length 12. This array stores three little-endian
+unsigned integers that represent durations at different granularities of time.
+The first stores a number in months, the second stores a number in days, and
+the third stores a number in milliseconds. This representation is independent
+of any particular timezone or date.
+
+Each component in this representation is independent of the others. For
+example, there is no requirement that a large number of days should be
+expressed as a mix of months and days because there is not a constant
+conversion from days to months.
+
+## Embedded Types
+
+### JSON
+
+`JSON` is used for an embedded JSON document. It must annotate a `binary`
+primitive type. The `binary` data is interpreted as a UTF-8 encoded character
+string of valid JSON as defined by the [JSON specification][json-spec].
+
+[json-spec]: http://json.org/
+
+### BSON
+
+`BSON` is used for an embedded BSON document. It must annotate a `binary`
+primitive type. The `binary` data is interpreted as an encoded BSON document as
+defined by the [BSON specification][bson-spec].
+
+[bson-spec]: http://bsonspec.org/spec.html
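
The byte layouts described in the Date/Time section of the diff above can be
illustrated with a minimal Python sketch. It is not part of the commit and is
not taken from any Parquet implementation. All function names are
hypothetical, and the 32-bit width of each INTERVAL component is an inference
from "three unsigned integers in a 12-byte array", not something the text
states outright.

```python
import struct
from datetime import date, datetime, timedelta, timezone

# Assumption: each INTERVAL component is 32 bits wide (12 bytes / 3 integers).
# "<III" means three little-endian unsigned 32-bit integers.
_INTERVAL = struct.Struct("<III")

_EPOCH_DATE = date(1970, 1, 1)
_EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


def encode_interval(months, days, millis):
    """Pack (months, days, milliseconds) into a 12-byte INTERVAL value.

    The components are stored independently; days are never folded
    into months, matching the note in the spec text.
    """
    return _INTERVAL.pack(months, days, millis)


def decode_interval(raw):
    """Unpack a 12-byte INTERVAL value into (months, days, milliseconds)."""
    if len(raw) != _INTERVAL.size:
        raise ValueError("INTERVAL must be exactly 12 bytes")
    return _INTERVAL.unpack(raw)


def date_to_days(d):
    """DATE: number of days from the Unix epoch, 1 January 1970."""
    return (d - _EPOCH_DATE).days


def days_to_date(n):
    """Inverse of date_to_days."""
    return _EPOCH_DATE + timedelta(days=n)


def timestamp_millis_to_datetime(ms):
    """TIMESTAMP_MILLIS: milliseconds from the Unix epoch, in UTC."""
    return _EPOCH_UTC + timedelta(milliseconds=ms)
```

For example, `encode_interval(0, 45, 0)` keeps 45 days as 45 days instead of
converting part of it to months, and `decode_interval` returns the same three
components unchanged.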