Posted to commits@parquet.apache.org by ap...@apache.org on 2021/03/31 15:51:42 UTC

[parquet-format] branch master updated: PARQUET-2011: Use "unit" for timestamp parameter, not "precision" (#161)

This is an automated email from the ASF dual-hosted git repository.

apitrou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/parquet-format.git


The following commit(s) were added to refs/heads/master by this push:
     new 6eb4af7  PARQUET-2011: Use "unit" for timestamp parameter, not "precision" (#161)
6eb4af7 is described below

commit 6eb4af7dd225e13a1a4ea8b5d89dd4fc277cc6a7
Author: tanuja5 <19...@gmail.com>
AuthorDate: Wed Mar 31 21:21:31 2021 +0530

    PARQUET-2011: Use "unit" for timestamp parameter, not "precision" (#161)
    
    The written spec shouldn't diverge from the Thrift definitions.
---
 LogicalTypes.md | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/LogicalTypes.md b/LogicalTypes.md
index c605857..2904eaa 100644
--- a/LogicalTypes.md
+++ b/LogicalTypes.md
@@ -256,17 +256,17 @@ The sort order used for `DATE` is signed.
 
 `TIME` is used for a logical time type without a date with millisecond or microsecond precision.
 The type has two type parameters: UTC adjustment (`true` or `false`)
-and precision (`MILLIS` or `MICROS`, `NANOS`).
+and unit (`MILLIS` or `MICROS`, `NANOS`).
 
-`TIME` with precision `MILLIS` is used for millisecond precision.
+`TIME` with unit `MILLIS` is used for millisecond precision.
 It must annotate an `int32` that stores the number of
 milliseconds after midnight.
 
-`TIME` with precision `MICROS` is used for microsecond precision.
+`TIME` with unit `MICROS` is used for microsecond precision.
 It must annotate an `int64` that stores the number of
 microseconds after midnight.
 
-`TIME` with precision `NANOS` is used for nanosecond precision.
+`TIME` with unit `NANOS` is used for nanosecond precision.
 It must annotate an `int64` that stores the number of
 nanoseconds after midnight.
 
@@ -341,26 +341,26 @@ time-line and such interpertations are allowed on purpose.
 
 The `TIMESTAMP` type has two type parameters:
 - `isAdjustedToUTC` must be either `true` or `false`.
-- `precision` must be one of `MILLIS`, `MICROS` or `NANOS`. This list is subject
-  to potential expansion in the future. Upon reading, unknown `precision`-s must
+- `unit` must be one of `MILLIS`, `MICROS` or `NANOS`. This list is subject
+  to potential expansion in the future. Upon reading, unknown `unit`-s must
   be handled as unsupported features (rather than as errors in the data files).
 
 #### Instant semantics (timestamps normalized to UTC)
 
 A `TIMESTAMP` with `isAdjustedToUTC=true` is defined as the number of
-milliseconds, microseconds or nanoseconds (depending on the `precision`
+milliseconds, microseconds or nanoseconds (depending on the `unit`
 parameter being `MILLIS`, `MICROS` or `NANOS`, respectively) elapsed since the
 Unix epoch, 1970-01-01 00:00:00 UTC. Each such value unambiguously identifies a
 single instant on the time-line.
 
-For example, in a `TIMESTAMP(isAdjustedToUTC=true, precision=MILLIS)`, the
+For example, in a `TIMESTAMP(isAdjustedToUTC=true, unit=MILLIS)`, the
 number 172800000 corresponds to 1970-01-03 00:00:00 UTC, because it is equal to
 2 * 24 * 60 * 60 * 1000, therefore it is exactly two days from the reference
 point, the Unix epoch. In Java, this calculation can be achieved by calling
 `Instant.ofEpochMilli(172800000)`.
 
 As a slightly more complicated example, if one wants to store 1970-01-03
-00:00:00 (UTC+01:00) as a `TIMESTAMP(isAdjustedToUTC=true, precision=MILLIS)`,
+00:00:00 (UTC+01:00) as a `TIMESTAMP(isAdjustedToUTC=true, unit=MILLIS)`,
 first the time zone offset has to be dealt with. By normalizing the timestamp to
 UTC, we calculate what time in UTC corresponds to the same instant: 1970-01-02
 23:00:00 UTC. This is 1 day and 23 hours after the epoch, therefore it can be
@@ -397,7 +397,7 @@ local timestamp. We define the reference local timestamp to be 1970-01-01
 00:00:00 (note the lack of UTC at the end, as this is not an instant). This way
 the encoding of local timestamp values becomes very similar to the encoding of
 instant values. For example, in a `TIMESTAMP(isAdjustedToUTC=false,
-precision=MILLIS)`, the number 172800000 corresponds to 1970-01-03 00:00:00
+unit=MILLIS)`, the number 172800000 corresponds to 1970-01-03 00:00:00
 (note the lack of UTC at the end), because it is exactly two days from the
 reference point (172800000 = 2 * 24 * 60 * 60 * 1000).
 
@@ -440,10 +440,10 @@ second and subsecond values can be encoded into an `int64`. Most notably:
   - minute = 61
   - month = 13
   - day = 29, month = 2, year = any non-leap year
-- Due to the range of the `int64` type, timestamps using the `NANOS` precision
+- Due to the range of the `int64` type, timestamps using the `NANOS` unit
   can only represent values between 1677-09-21 00:12:43 and 2262-04-11 23:47:16.
   Values outside of this range can not be represented with the `NANOS`
-  precision. (Other precisions have similar limits but those are outside of the
+  unit. (Other precisions have similar limits but those are outside of the
   domain for practical everyday usage.)
 
 The sort order used for `TIMESTAMP` is signed.
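The rename from `precision` to `unit` changes only the parameter's name. The decoding rules in the diff stay the same. The following is a rough sketch of those rules in Java, chosen to match the spec's own `Instant.ofEpochMilli` example. It decodes stored values for each unit with `java.time`.

The `Unit` enum and the `toInstant`, `toLocalDateTime` and `toLocalTime` helpers are hypothetical illustrations. They are not part of parquet-format or parquet-mr. The sketch also does not check physical types, for example that a `TIME` with unit `MILLIS` fits an `int32`.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

final class ParquetTimeUnits {

    // Hypothetical stand-in for the MILLIS / MICROS / NANOS units named in the spec;
    // this is not the Thrift-generated parquet-format class.
    enum Unit {
        MILLIS(1_000_000L),
        MICROS(1_000L),
        NANOS(1L);

        final long nanosPerTick; // length of one stored tick in nanoseconds

        Unit(long nanosPerTick) {
            this.nanosPerTick = nanosPerTick;
        }

        long ticksPerSecond() {
            return 1_000_000_000L / nanosPerTick;
        }
    }

    // TIMESTAMP(isAdjustedToUTC=true, unit): ticks elapsed since 1970-01-01 00:00:00 UTC.
    static Instant toInstant(long value, Unit unit) {
        long perSecond = unit.ticksPerSecond();
        // floorDiv/floorMod keep values before the epoch (negative) correct and avoid
        // multiplying the whole value up to nanoseconds, which could overflow a long.
        long seconds = Math.floorDiv(value, perSecond);
        long nanos = Math.floorMod(value, perSecond) * unit.nanosPerTick;
        return Instant.ofEpochSecond(seconds, nanos);
    }

    // TIMESTAMP(isAdjustedToUTC=false, unit): ticks since the local reference 1970-01-01 00:00:00.
    static LocalDateTime toLocalDateTime(long value, Unit unit) {
        long perSecond = unit.ticksPerSecond();
        long seconds = Math.floorDiv(value, perSecond);
        int nanos = (int) (Math.floorMod(value, perSecond) * unit.nanosPerTick);
        // ZoneOffset.UTC is only used for the arithmetic; the result carries no zone.
        return LocalDateTime.ofEpochSecond(seconds, nanos, ZoneOffset.UTC);
    }

    // TIME(unit): ticks after midnight (int32 for MILLIS, int64 for MICROS and NANOS).
    static LocalTime toLocalTime(long value, Unit unit) {
        return LocalTime.ofNanoOfDay(value * unit.nanosPerTick);
    }

    public static void main(String[] args) {
        // Instant semantics: two days after the Unix epoch.
        System.out.println(toInstant(172800000L, Unit.MILLIS)); // 1970-01-03T00:00:00Z

        // Normalizing 1970-01-03 00:00:00 (UTC+01:00) to UTC before storing it.
        long stored = OffsetDateTime.of(1970, 1, 3, 0, 0, 0, 0, ZoneOffset.ofHours(1))
                .toInstant()
                .toEpochMilli();
        System.out.println(stored); // 169200000, i.e. 1970-01-02T23:00:00Z

        // Local semantics: the same number, but no time zone attached.
        System.out.println(toLocalDateTime(172800000L, Unit.MILLIS)); // 1970-01-03T00:00

        // TIME with unit MILLIS: 12:34:56.789 after midnight.
        System.out.println(toLocalTime(45_296_789L, Unit.MILLIS)); // 12:34:56.789

        // The int64 range bounds NANOS timestamps.
        System.out.println(toInstant(Long.MIN_VALUE, Unit.NANOS)); // 1677-09-21T00:12:43.145224192Z
        System.out.println(toInstant(Long.MAX_VALUE, Unit.NANOS)); // 2262-04-11T23:47:16.854775807Z
    }
}
```

The sketch uses `Math.floorDiv` and `Math.floorMod` rather than `/` and `%` on purpose. Stored values before the epoch are negative, and truncating division would round them toward the epoch instead of toward earlier instants.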