Posted to commits@arrow.apache.org by tu...@apache.org on 2024/01/22 11:21:27 UTC

(arrow-rs) branch master updated: Enhance Date64 type documentation (#5323)

This is an automated email from the ASF dual-hosted git repository.

tustvold pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
     new b594d9063a Enhance Date64 type documentation (#5323)
b594d9063a is described below

commit b594d9063a55c503ae67cec2809fe3d2fa472bfa
Author: Jeffrey Vo <je...@gmail.com>
AuthorDate: Mon Jan 22 22:21:21 2024 +1100

    Enhance Date64 type documentation (#5323)
    
    * Enhance Date64 type documentation
    
    * Update arrow-schema/src/datatype.rs
    
    Co-authored-by: Raphael Taylor-Davies <17...@users.noreply.github.com>
    
    * Update arrow-schema/src/datatype.rs
    
    Co-authored-by: Raphael Taylor-Davies <17...@users.noreply.github.com>
    
    * Update arrow-schema/src/datatype.rs
    
    Co-authored-by: Raphael Taylor-Davies <17...@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: Raphael Taylor-Davies <17...@users.noreply.github.com>
---
 arrow-schema/src/datatype.rs | 25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/arrow-schema/src/datatype.rs b/arrow-schema/src/datatype.rs
index 6276a99a47..a5bd66b50c 100644
--- a/arrow-schema/src/datatype.rs
+++ b/arrow-schema/src/datatype.rs
@@ -145,10 +145,31 @@ pub enum DataType {
     /// ```
     Timestamp(TimeUnit, Option<Arc<str>>),
     /// A signed 32-bit date representing the elapsed time since UNIX epoch (1970-01-01)
-    /// in days (32 bits).
+    /// in days.
     Date32,
     /// A signed 64-bit date representing the elapsed time since UNIX epoch (1970-01-01)
-    /// in milliseconds (64 bits). Values are evenly divisible by 86400000.
+    /// in milliseconds.
+    ///
+    /// According to the specification (see [Schema.fbs]), this should be treated as the number of
+    /// days, in milliseconds, since the UNIX epoch. Therefore, values must be evenly divisible by
+    /// `86_400_000` (the number of milliseconds in a standard day).
+    ///
+    /// This is for compatibility with other languages' native libraries,
+    /// such as Java, which historically lacked a dedicated date type
+    /// and only supported timestamps.
+    ///
+    /// Practically, validation that values of this type are evenly divisible by `86_400_000` is not enforced
+    /// by this library for performance and usability reasons. Date64 values will be treated similarly to the
+    /// `Timestamp(TimeUnit::Millisecond, None)` type, in that its values will be printed showing the time of
+    /// day if the value does not represent an exact day, and arithmetic can be done at the millisecond
+    /// granularity to change the time represented.
+    ///
+    /// Users should prefer using Date32 to cleanly represent the number of days, or one of the Timestamp
+    /// variants to include time as part of the representation, depending on their use case.
+    ///
+    /// For more details, see [#5288](https://github.com/apache/arrow-rs/issues/5288).
+    ///
+    /// [Schema.fbs]: https://github.com/apache/arrow/blob/main/format/Schema.fbs
     Date64,
     /// A signed 32-bit time representing the elapsed time since midnight in the unit of `TimeUnit`.
     /// Must be either seconds or milliseconds.
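
To illustrate the divisibility rule described in the new Date64 documentation, here is a minimal sketch using only the Rust standard library. The helper names (`is_whole_day`, `truncate_to_day`, `date64_to_date32`, `date32_to_date64`) are hypothetical and are not part of the arrow-rs API; they only show the arithmetic implied by the doc comment, assuming Date64 values are plain `i64` milliseconds and Date32 values are plain `i32` days since the UNIX epoch.

```rust
/// Number of milliseconds in a standard day, as stated in the doc comment.
const MILLIS_PER_DAY: i64 = 86_400_000;

/// Hypothetical helper: true if a Date64 value represents an exact day,
/// i.e. it is evenly divisible by 86_400_000.
fn is_whole_day(date64_ms: i64) -> bool {
    date64_ms % MILLIS_PER_DAY == 0
}

/// Hypothetical helper: drop any time-of-day component, rounding toward
/// the earlier day (also for values before the epoch).
fn truncate_to_day(date64_ms: i64) -> i64 {
    date64_ms.div_euclid(MILLIS_PER_DAY) * MILLIS_PER_DAY
}

/// Hypothetical helper: convert Date64 milliseconds to Date32 days.
/// Returns None if the day count does not fit in an i32.
fn date64_to_date32(date64_ms: i64) -> Option<i32> {
    i32::try_from(date64_ms.div_euclid(MILLIS_PER_DAY)).ok()
}

/// Hypothetical helper: convert Date32 days to Date64 milliseconds.
/// Every i32 day count fits in an i64 millisecond count, so this cannot overflow.
fn date32_to_date64(days: i32) -> i64 {
    i64::from(days) * MILLIS_PER_DAY
}
```

Because the library does not enforce the divisibility rule, a caller who needs spec-conforming Date64 values could apply a check like `is_whole_day` at the boundary, or convert to Date32 as the documentation recommends.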