You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/08/10 23:07:22 UTC

[GitHub] [arrow] mr-smidge commented on a change in pull request #7654: ARROW-8581: [C#] Accept and return DateTime from DateXXArray

mr-smidge commented on a change in pull request #7654:
URL: https://github.com/apache/arrow/pull/7654#discussion_r468233789



##########
File path: csharp/src/Apache.Arrow/Arrays/Date64Array.cs
##########
@@ -15,56 +15,103 @@
 
 using Apache.Arrow.Types;
 using System;
-using System.Collections.Generic;
-using Apache.Arrow.Memory;
 
 namespace Apache.Arrow
 {
+    /// <summary>
+    /// The <see cref="Date64Array"/> class holds an array of dates in the <c>Date64</c> format, where each date is
+    /// stored as the number of milliseconds since the dawn of (UNIX) time, excluding leap seconds, in multiples of
+    /// 86400000.
+    /// </summary>
     public class Date64Array: PrimitiveArray<long>
     {
+        private const long MillisecondsPerDay = 86400000;
+
         public Date64Array(
             ArrowBuffer valueBuffer, ArrowBuffer nullBitmapBuffer,
             int length, int nullCount, int offset)
             : this(new ArrayData(Date64Type.Default, length, nullCount, offset,
                 new[] { nullBitmapBuffer, valueBuffer }))
         { }
 
-        public class Builder : PrimitiveArrayBuilder<DateTimeOffset, long, Date64Array, Builder>
+        /// <summary>
+        /// The <see cref="Builder"/> class can be used to fluently build <see cref="Date64Array"/> objects.
+        /// </summary>
+        public class Builder : DateArrayBuilder<long, Date64Array, Builder>
         {
-            internal class DateBuilder: PrimitiveArrayBuilder<long, Date64Array, DateBuilder>
+            private class DateBuilder: PrimitiveArrayBuilder<long, Date64Array, DateBuilder>
             {
                 protected override Date64Array Build(
                     ArrowBuffer valueBuffer, ArrowBuffer nullBitmapBuffer,
                     int length, int nullCount, int offset) =>
                     new Date64Array(valueBuffer, nullBitmapBuffer, length, nullCount, offset);
-            } 
+            }
 
+            /// <summary>
+            /// Construct a new instance of the <see cref="Builder"/> class.
+            /// </summary>
             public Builder() : base(new DateBuilder()) { }
 
-            protected override long ConvertTo(DateTimeOffset value)
+            protected override long Convert(DateTime dateTime)
+            {
+                var dateTimeOffset = new DateTimeOffset(
+                    DateTime.SpecifyKind(dateTime.Date, DateTimeKind.Unspecified),
+                    TimeSpan.Zero);
+                return dateTimeOffset.ToUnixTimeMilliseconds();
+            }
+
+            protected override long Convert(DateTimeOffset dateTimeOffset)
             {
-                return value.ToUnixTimeMilliseconds();
+                // The internal value stored for a DateTimeOffset can be thought of as the number of milliseconds,
+                // in multiples of 86400000, that have passed since the UNIX epoch.  It is not the same as what would
+                // result from encoding the date from the DateTimeOffset.Date property.
+                long millis = dateTimeOffset.ToUnixTimeMilliseconds();
+                long days = millis / MillisecondsPerDay;
+                return (millis < 0 ? days - 1 : days) * MillisecondsPerDay;

Review comment:
       From what I can tell, the Date64 builder (neither in this PR nor in master) has never had an `Append(long)` method.  One could be added, but that's probably beyond the scope of this PR.  I can see the rationale for not doing a multiple-of-86400000 check on the equivalent method in the C++/Java APIs: the user knows they are building Date64 values, and if they are sophisticated enough to be providing the raw underlying 64-bit type, it seems reasonable to trust them to get it right.
   
   However, if the user is providing a conventional Date-like type, like `DateTime` or `DateTimeOffset`, my view is that the API ought to make it easy to do the right thing and difficult to do the wrong thing:
   
   * I believe we're definitely in agreement on what the Right Thing (TM) is for `DateTime`: the code looks at the year/month/day on it and considers that to be the date.  No ambiguity here.
   * We've already found conflicting viewpoints on how to interpret `DateTimeOffset` as a date, based on whether it's treated as an instant in time (take `DateTimeOffset.UtcDateTime.Date`) or a "zoned" date and time (take `DateTimeOffset.Date`).
   
   Regardless of how a date is interpreted, if the Arrow API offers a method accepting `DateTimeOffset` and converting into `Date64`, I feel that it should _never_ violate the spec when doing so:
   
   > Milliseconds (64 bits) indicating UNIX time elapsed since the epoch (no leap seconds), where the values are evenly divisible by 86400000
   
   Keeping the code as `value.ToUnixTimeMilliseconds()` makes it very easy for a user to violate the spec by accident, which is not good.
   
   Having spent a fair bit of time discusing this, I am tempted to _drop_ (or mark obsolete) all builder methods accepting `DateTimeOffset` and only accept `DateTime`, so there is no chance of misinterpretation: it would be up to the caller in possession of a `DateTimeOffset` to call `.Date` or `.UtcDateTime` as per their requirements - similar to something you advocated in an earlier comment in fact!  I'd be keen to hear your thoughts on this...




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org