You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/07/07 09:39:14 UTC

[GitHub] [arrow] mr-smidge opened a new pull request #7654: ARROW-8581: [C#] Accept and return DateTime from DateXXArray

mr-smidge opened a new pull request #7654:
URL: https://github.com/apache/arrow/pull/7654


   This PR introduces a _breaking change_ to the public API for `Date32Array` and `Date64Array` by changing the accepted and returned data type from `Sysetm.DateTimeOffset` to `System.DateTime`.
   
   The rationale for making this change is explained in detail on the [JIRA ticket](https://issues.apache.org/jira/browse/ARROW-8581), but briefly: making this change avoids a class of bugs that manifest based on the user's timezone, and where it is very easy to unknowingly fall into a trap.
   
   In addition, the new unit tests bring `Date32Array` and `Date64Array` up to 100% test coverage.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mr-smidge commented on a change in pull request #7654: ARROW-8581: [C#] Accept and return DateTime from DateXXArray

Posted by GitBox <gi...@apache.org>.
mr-smidge commented on a change in pull request #7654:
URL: https://github.com/apache/arrow/pull/7654#discussion_r453673591



##########
File path: csharp/src/Apache.Arrow/Arrays/DateArrayBuilder.cs
##########
@@ -0,0 +1,179 @@
+//------------------------------------------------------------------------------
+// <copyright file="DateArrayBuilder.cs" company="Jetstone Asset Management LLP">
+// Copyright (C) Jetstone Asset Management LLP.  All rights reserved.
+// Unauthorised copying of this file, via any medium, is strictly prohibited.
+// Proprietary and confidential
+// </copyright>
+// <author>Adam Szmigin</author>
+//------------------------------------------------------------------------------
+
+using System;
+using System.Collections.Generic;
+using System.Linq;
+
+namespace Apache.Arrow
+{
+    /// <summary>
+    /// The <see cref="DateArrayBuilder{TUnderlyingType,TArray,TBuilder}"/> class is an abstract array builder that can
+    /// accept dates in the form of <see cref="DateTime"/> or <see cref="DateTimeOffset"/> and convert to some
+    /// underlying date representation.
+    /// </summary>
+    public abstract class DateArrayBuilder<TUnderlyingType, TArray, TBuilder> :

Review comment:
       This new class is a bit like splitting `PrimitiveArrayBuilder<TFrom, TTo, ...>` into two: one class whose interface is to do with accepting values of differing types for conversion, and another (`DelegatingArrayBuilder`) which simply delegates to an 'inner' array builder for methods like `Resize()` that don't care about the type of value.
   
   For clarity:  `PrimitiveArrayBuilder<TFrom, TTo, ...>` would no longer work as a base class for `DateXXArray.Builder` because the builder now accepts two different kinds of type for writing to the array.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mr-smidge commented on a change in pull request #7654: ARROW-8581: [C#] Accept and return DateTime from DateXXArray

Posted by GitBox <gi...@apache.org>.
mr-smidge commented on a change in pull request #7654:
URL: https://github.com/apache/arrow/pull/7654#discussion_r453675969



##########
File path: csharp/src/Apache.Arrow/Arrays/DelegatingArrayBuilder.cs
##########
@@ -0,0 +1,96 @@
+//------------------------------------------------------------------------------
+// <copyright file="DelegatingArrayBuilder.cs" company="Jetstone Asset Management LLP">
+// Copyright (C) Jetstone Asset Management LLP.  All rights reserved.
+// Unauthorised copying of this file, via any medium, is strictly prohibited.
+// Proprietary and confidential
+// </copyright>
+// <author>Adam Szmigin</author>
+//------------------------------------------------------------------------------
+
+using System;
+using Apache.Arrow.Memory;
+
+namespace Apache.Arrow
+{
+    /// <summary>
+    /// The <see cref="DelegatingArrayBuilder{T,TArray,TBuilder}"/> class can be used as the base for any array builder
+    /// that needs to delegate most of its functionality to an inner array builder.
+    /// </summary>
+    /// <remarks>
+    /// The typical use case is when an array builder may accept a number of different types as input, but which are
+    /// all internally converted to a single type for assembly into an array.
+    /// </remarks>
+    /// <typeparam name="T">Type of item accepted by inner array builder.</typeparam>
+    /// <typeparam name="TArray">Type of array produced by this (and the inner) builder.</typeparam>
+    /// <typeparam name="TBuilder">Type of builder (see Curiously-Recurring Template Pattern).</typeparam>
+    public abstract class DelegatingArrayBuilder<T, TArray, TBuilder> : IArrowArrayBuilder<TArray, TBuilder>

Review comment:
       Perhaps  `PrimitiveArrayBuilder<TFrom, TTo, ...>`  could now use this as a base class?
   
   The issue is that the only remaining user of  `PrimitiveArrayBuilder<TFrom, TTo, ...>`  is now `TimestampArray`, and that class has a number of problems that mean it may need to be refactored to derive from a different base class (in short: it needs to understand IANA/Olson timezone identifiers, and NodaTime is the obvious library to do that, but if we include NodaTime then the class might as well accept `NodaTime.Instant`, etc).  This made me think that  `PrimitiveArrayBuilder<TFrom, TTo, ...>` should probably be marked obsolete.
   
   Opinions welcomed!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mr-smidge commented on a change in pull request #7654: ARROW-8581: [C#] Accept and return DateTime from DateXXArray

Posted by GitBox <gi...@apache.org>.
mr-smidge commented on a change in pull request #7654:
URL: https://github.com/apache/arrow/pull/7654#discussion_r456557817



##########
File path: csharp/test/Apache.Arrow.Tests/Date32ArrayTests.cs
##########
@@ -20,28 +20,46 @@ namespace Apache.Arrow.Tests
 {
     public class Date32ArrayTests
     {
-        public class Set
+        public class GetDate

Review comment:
       Done.

##########
File path: csharp/test/Apache.Arrow.Tests/Date32ArrayTests.cs
##########
@@ -20,28 +20,46 @@ namespace Apache.Arrow.Tests
 {
     public class Date32ArrayTests
     {
-        public class Set
+        public class GetDate
         {
             [Fact]
-            public void SetAndGet()
+            public void SetAndGetNull()
             {
-                var now = DateTimeOffset.UtcNow;
+                // Arrange
+                var array = new Date32Array.Builder()
+                    .AppendNull()
+                    .Build();
+
+                // Act
+                var actual = array.GetDateTime(0);

Review comment:
       Added.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kou commented on pull request #7654: ARROW-8581: [C#] Accept and return DateTime from DateXXArray

Posted by GitBox <gi...@apache.org>.
kou commented on pull request #7654:
URL: https://github.com/apache/arrow/pull/7654#issuecomment-661361050


   I think that a member of the "arrow-committers" team can use "Re-run jobs". (I re-ran jobs.)
   
   Others can use `git commit --amend && git push --force ...` or ask a committer to use "Re-run jobs".


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] eerhardt commented on pull request #7654: ARROW-8581: [C#] Accept and return DateTime from DateXXArray

Posted by GitBox <gi...@apache.org>.
eerhardt commented on pull request #7654:
URL: https://github.com/apache/arrow/pull/7654#issuecomment-656832270


   Can we just introduce new `Append(DateTime value)` and `GetDateAsDateTime(int index)` APIs instead of making a breaking change?
   
   In general, DateTimeOffset is the preferred type to use when talking about dates and times. The reasoning is because exactly what you point out in the JIRA issue - DateTime.Kind is very confusing and can cause a lot of problems. So we generally recommend users to use DateTimeOffset unless they have a strong reason to use DateTime.
   
   So I'd suggest to keep the existing code (the builder is based on DateTimeOffset), but to also add new `Append(DateTime value)` overloads to solve the issue you logged.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mr-smidge commented on a change in pull request #7654: ARROW-8581: [C#] Accept and return DateTime from DateXXArray

Posted by GitBox <gi...@apache.org>.
mr-smidge commented on a change in pull request #7654:
URL: https://github.com/apache/arrow/pull/7654#discussion_r468233789



##########
File path: csharp/src/Apache.Arrow/Arrays/Date64Array.cs
##########
@@ -15,56 +15,103 @@
 
 using Apache.Arrow.Types;
 using System;
-using System.Collections.Generic;
-using Apache.Arrow.Memory;
 
 namespace Apache.Arrow
 {
+    /// <summary>
+    /// The <see cref="Date64Array"/> class holds an array of dates in the <c>Date64</c> format, where each date is
+    /// stored as the number of milliseconds since the dawn of (UNIX) time, excluding leap seconds, in multiples of
+    /// 86400000.
+    /// </summary>
     public class Date64Array: PrimitiveArray<long>
     {
+        private const long MillisecondsPerDay = 86400000;
+
         public Date64Array(
             ArrowBuffer valueBuffer, ArrowBuffer nullBitmapBuffer,
             int length, int nullCount, int offset)
             : this(new ArrayData(Date64Type.Default, length, nullCount, offset,
                 new[] { nullBitmapBuffer, valueBuffer }))
         { }
 
-        public class Builder : PrimitiveArrayBuilder<DateTimeOffset, long, Date64Array, Builder>
+        /// <summary>
+        /// The <see cref="Builder"/> class can be used to fluently build <see cref="Date64Array"/> objects.
+        /// </summary>
+        public class Builder : DateArrayBuilder<long, Date64Array, Builder>
         {
-            internal class DateBuilder: PrimitiveArrayBuilder<long, Date64Array, DateBuilder>
+            private class DateBuilder: PrimitiveArrayBuilder<long, Date64Array, DateBuilder>
             {
                 protected override Date64Array Build(
                     ArrowBuffer valueBuffer, ArrowBuffer nullBitmapBuffer,
                     int length, int nullCount, int offset) =>
                     new Date64Array(valueBuffer, nullBitmapBuffer, length, nullCount, offset);
-            } 
+            }
 
+            /// <summary>
+            /// Construct a new instance of the <see cref="Builder"/> class.
+            /// </summary>
             public Builder() : base(new DateBuilder()) { }
 
-            protected override long ConvertTo(DateTimeOffset value)
+            protected override long Convert(DateTime dateTime)
+            {
+                var dateTimeOffset = new DateTimeOffset(
+                    DateTime.SpecifyKind(dateTime.Date, DateTimeKind.Unspecified),
+                    TimeSpan.Zero);
+                return dateTimeOffset.ToUnixTimeMilliseconds();
+            }
+
+            protected override long Convert(DateTimeOffset dateTimeOffset)
             {
-                return value.ToUnixTimeMilliseconds();
+                // The internal value stored for a DateTimeOffset can be thought of as the number of milliseconds,
+                // in multiples of 86400000, that have passed since the UNIX epoch.  It is not the same as what would
+                // result from encoding the date from the DateTimeOffset.Date property.
+                long millis = dateTimeOffset.ToUnixTimeMilliseconds();
+                long days = millis / MillisecondsPerDay;
+                return (millis < 0 ? days - 1 : days) * MillisecondsPerDay;

Review comment:
       From what I can tell, the Date64 builder (neither in this PR nor in master) has never had an `Append(long)` method.  One could be added, but that's probably beyond the scope of this PR.  I can see the rationale for not doing a multiple-of-86400000 check on the equivalent method in the C++/Java APIs: the user knows they are building Date64 values, and if they are sophisticated enough to be providing the raw underlying 64-bit type, it seems reasonable to trust them to get it right.
   
   However, if the user is providing a conventional Date-like type, like `DateTime` or `DateTimeOffset`, my view is that the API ought to make it easy to do the right thing and difficult to do the wrong thing:
   
   * I believe we're definitely in agreement on what the Right Thing (TM) is for `DateTime`: the code looks at the year/month/day on it and considers that to be the date.  No ambiguity here.
   * We've already found conflicting viewpoints on how to interpret `DateTimeOffset` as a date, based on whether it's treated as an instant in time (take `DateTimeOffset.UtcDateTime.Date`) or a "zoned" date and time (take `DateTimeOffset.Date`).
   
   Regardless of how a date is interpreted, if the Arrow API offers a method accepting `DateTimeOffset` and converting into `Date64`, I feel that it should _never_ violate the spec when doing so:
   
   > Milliseconds (64 bits) indicating UNIX time elapsed since the epoch (no leap seconds), where the values are evenly divisible by 86400000
   
   Keeping the code as `value.ToUnixTimeMilliseconds()` makes it very easy for a user to violate the spec by accident, which is not good.
   
   Having spent a fair bit of time discussing this, I am tempted to _drop_ (or mark obsolete) all builder methods accepting `DateTimeOffset` and only accept `DateTime`, so there is no chance of misinterpretation: it would be up to the caller in possession of a `DateTimeOffset` to call `.Date` or `.UtcDateTime` as per their requirements - similar to something you advocated in an earlier comment in fact!  I'd be keen to hear your thoughts on this...




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mr-smidge edited a comment on pull request #7654: ARROW-8581: [C#] Accept and return DateTime from DateXXArray

Posted by GitBox <gi...@apache.org>.
mr-smidge edited a comment on pull request #7654:
URL: https://github.com/apache/arrow/pull/7654#issuecomment-657438635


   > In general, DateTimeOffset is the preferred type to use when talking about dates and times. The reasoning is because exactly what you point out in the JIRA issue - DateTime.Kind is very confusing and can cause a lot of problems.
   
   I suspect this is a matter of codebase/team preference.  By contrast, my organisation previously recommended `DateTime` for dates.  I say "previously" because now we would use NodaTime's `LocalDate` instead :-).
   
   Nonetheless, in the absence of a proper type just for dates, I can see arguments for either, and I'll modify the API accordingly to accept both in a non-breaking manner.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] eerhardt commented on a change in pull request #7654: ARROW-8581: [C#] Accept and return DateTime from DateXXArray

Posted by GitBox <gi...@apache.org>.
eerhardt commented on a change in pull request #7654:
URL: https://github.com/apache/arrow/pull/7654#discussion_r456091623



##########
File path: csharp/src/Apache.Arrow/Arrays/Date64Array.cs
##########
@@ -15,56 +15,94 @@
 
 using Apache.Arrow.Types;
 using System;
-using System.Collections.Generic;
-using Apache.Arrow.Memory;
 
 namespace Apache.Arrow
 {
+    /// <summary>
+    /// The <see cref="Date64Array"/> class holds an array of dates in the <c>Date64</c> format, where each date is
+    /// stored as the number of milliseconds since the dawn of (UNIX) time, excluding leap seconds.
+    /// </summary>
     public class Date64Array: PrimitiveArray<long>
     {
+        private static readonly DateTime _epoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Unspecified);
+
         public Date64Array(
             ArrowBuffer valueBuffer, ArrowBuffer nullBitmapBuffer,
             int length, int nullCount, int offset)
             : this(new ArrayData(Date64Type.Default, length, nullCount, offset,
                 new[] { nullBitmapBuffer, valueBuffer }))
         { }
 
-        public class Builder : PrimitiveArrayBuilder<DateTimeOffset, long, Date64Array, Builder>
+        /// <summary>
+        /// The <see cref="Builder"/> class can be used to fluently build <see cref="Date64Array"/> objects.
+        /// </summary>
+        public class Builder : DateArrayBuilder<long, Date64Array, Builder>
         {
-            internal class DateBuilder: PrimitiveArrayBuilder<long, Date64Array, DateBuilder>
+            private class DateBuilder: PrimitiveArrayBuilder<long, Date64Array, DateBuilder>
             {
                 protected override Date64Array Build(
                     ArrowBuffer valueBuffer, ArrowBuffer nullBitmapBuffer,
                     int length, int nullCount, int offset) =>
                     new Date64Array(valueBuffer, nullBitmapBuffer, length, nullCount, offset);
-            } 
+            }
 
+            /// <summary>
+            /// Construct a new instance of the <see cref="Builder"/> class.
+            /// </summary>
             public Builder() : base(new DateBuilder()) { }
 
-            protected override long ConvertTo(DateTimeOffset value)
+            protected override long Convert(DateTime dateTime)
+            {
+                return (long)(dateTime - _epoch).TotalMilliseconds;
+            }
+
+            protected override long Convert(DateTimeOffset dateTimeOffset)
             {
-                return value.ToUnixTimeMilliseconds();
+                return (long)(dateTimeOffset.Date - _epoch).TotalMilliseconds;

Review comment:
       I believe this should stay `value.ToUnixTimeMilliseconds()`. Same for Date32 above.

##########
File path: csharp/test/Apache.Arrow.Tests/Date32ArrayTests.cs
##########
@@ -20,28 +20,46 @@ namespace Apache.Arrow.Tests
 {
     public class Date32ArrayTests
     {
-        public class Set
+        public class GetDate

Review comment:
       The `GetDate` method is now obsolete. So this should probably be updated to the new method name.

##########
File path: csharp/src/Apache.Arrow/Arrays/DelegatingArrayBuilder.cs
##########
@@ -0,0 +1,96 @@
+//------------------------------------------------------------------------------
+// <copyright file="DelegatingArrayBuilder.cs" company="Jetstone Asset Management LLP">
+// Copyright (C) Jetstone Asset Management LLP.  All rights reserved.
+// Unauthorised copying of this file, via any medium, is strictly prohibited.
+// Proprietary and confidential
+// </copyright>
+// <author>Adam Szmigin</author>
+//------------------------------------------------------------------------------
+
+using System;
+using Apache.Arrow.Memory;
+
+namespace Apache.Arrow
+{
+    /// <summary>
+    /// The <see cref="DelegatingArrayBuilder{T,TArray,TBuilder}"/> class can be used as the base for any array builder
+    /// that needs to delegate most of its functionality to an inner array builder.
+    /// </summary>
+    /// <remarks>
+    /// The typical use case is when an array builder may accept a number of different types as input, but which are
+    /// all internally converted to a single type for assembly into an array.
+    /// </remarks>
+    /// <typeparam name="T">Type of item accepted by inner array builder.</typeparam>
+    /// <typeparam name="TArray">Type of array produced by this (and the inner) builder.</typeparam>
+    /// <typeparam name="TBuilder">Type of builder (see Curiously-Recurring Template Pattern).</typeparam>
+    public abstract class DelegatingArrayBuilder<T, TArray, TBuilder> : IArrowArrayBuilder<TArray, TBuilder>

Review comment:
       Do we have to have this extra layer of abstraction? If we don't think TimestampArray could use this base class, then maybe let's wait with this extra base class. We can always refactor it out of `DateArrayBuilder` later, when it provides value.

##########
File path: csharp/test/Apache.Arrow.Tests/Date32ArrayTests.cs
##########
@@ -20,28 +20,46 @@ namespace Apache.Arrow.Tests
 {
     public class Date32ArrayTests
     {
-        public class Set
+        public class GetDate
         {
             [Fact]
-            public void SetAndGet()
+            public void SetAndGetNull()
             {
-                var now = DateTimeOffset.UtcNow;
+                // Arrange
+                var array = new Date32Array.Builder()
+                    .AppendNull()
+                    .Build();
+
+                // Act
+                var actual = array.GetDateTime(0);

Review comment:
       Can we also get tests for `GetDateTimeOffset`?

##########
File path: csharp/src/Apache.Arrow/Arrays/Date64Array.cs
##########
@@ -15,56 +15,94 @@
 
 using Apache.Arrow.Types;
 using System;
-using System.Collections.Generic;
-using Apache.Arrow.Memory;
 
 namespace Apache.Arrow
 {
+    /// <summary>
+    /// The <see cref="Date64Array"/> class holds an array of dates in the <c>Date64</c> format, where each date is
+    /// stored as the number of milliseconds since the dawn of (UNIX) time, excluding leap seconds.
+    /// </summary>
     public class Date64Array: PrimitiveArray<long>
     {
+        private static readonly DateTime _epoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Unspecified);
+
         public Date64Array(
             ArrowBuffer valueBuffer, ArrowBuffer nullBitmapBuffer,
             int length, int nullCount, int offset)
             : this(new ArrayData(Date64Type.Default, length, nullCount, offset,
                 new[] { nullBitmapBuffer, valueBuffer }))
         { }
 
-        public class Builder : PrimitiveArrayBuilder<DateTimeOffset, long, Date64Array, Builder>
+        /// <summary>
+        /// The <see cref="Builder"/> class can be used to fluently build <see cref="Date64Array"/> objects.
+        /// </summary>
+        public class Builder : DateArrayBuilder<long, Date64Array, Builder>
         {
-            internal class DateBuilder: PrimitiveArrayBuilder<long, Date64Array, DateBuilder>
+            private class DateBuilder: PrimitiveArrayBuilder<long, Date64Array, DateBuilder>
             {
                 protected override Date64Array Build(
                     ArrowBuffer valueBuffer, ArrowBuffer nullBitmapBuffer,
                     int length, int nullCount, int offset) =>
                     new Date64Array(valueBuffer, nullBitmapBuffer, length, nullCount, offset);
-            } 
+            }
 
+            /// <summary>
+            /// Construct a new instance of the <see cref="Builder"/> class.
+            /// </summary>
             public Builder() : base(new DateBuilder()) { }
 
-            protected override long ConvertTo(DateTimeOffset value)
+            protected override long Convert(DateTime dateTime)
+            {
+                return (long)(dateTime - _epoch).TotalMilliseconds;
+            }
+
+            protected override long Convert(DateTimeOffset dateTimeOffset)
             {
-                return value.ToUnixTimeMilliseconds();
+                return (long)(dateTimeOffset.Date - _epoch).TotalMilliseconds;
             }
         }
 
-        public Date64Array(ArrayData data) 
+        public Date64Array(ArrayData data)
             : base(data)
         {
             data.EnsureDataType(ArrowTypeId.Date64);
         }
 
         public override void Accept(IArrowArrayVisitor visitor) => Accept(this, visitor);
 
-        public DateTimeOffset? GetDate(int index)
+        [Obsolete("Use `GetDateTimeOffset()` instead")]
+        public DateTimeOffset? GetDate(int index) => GetDateTimeOffset(index);
+
+        /// <summary>
+        /// Get the date at the specified index in the form of a <see cref="DateTime"/> object.
+        /// </summary>
+        /// <remarks>
+        /// The <see cref="DateTime.Kind"/> property of the returned object is set to
+        /// <see cref="DateTimeKind.Unspecified"/>.
+        /// </remarks>
+        /// <param name="index">Index at which to get the date.</param>
+        /// <returns>Returns a <see cref="DateTime"/> object, or <c>null</c> if there is no object at that index.
+        /// </returns>
+        public DateTime? GetDateTime(int index)
         {
             long? value = GetValue(index);
+            return value.HasValue
+                ? _epoch.AddMilliseconds(value.Value)
+                : default(DateTime?);
+        }
 
-            if (!value.HasValue)
-            {
-                return default;
-            }
-
-            return DateTimeOffset.FromUnixTimeMilliseconds(value.Value);
+        /// <summary>
+        /// Get the date at the specified index in the form of a <see cref="DateTimeOffset"/> object.
+        /// </summary>
+        /// <param name="index">Index at which to get the date.</param>
+        /// <returns>Returns a <see cref="DateTimeOffset"/> object, or <c>null</c> if there is no object at that index.
+        /// </returns>
+        public DateTimeOffset? GetDateTimeOffset(int index)
+        {
+            long? value = GetValue(index);
+            return value.HasValue
+                ? new DateTimeOffset(_epoch.AddMilliseconds(value.Value), TimeSpan.Zero)

Review comment:
       I believe this should stay `return DateTimeOffset.FromUnixTimeMilliseconds(value.Value);`. Same for Date32 above.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mr-smidge commented on pull request #7654: ARROW-8581: [C#] Accept and return DateTime from DateXXArray

Posted by GitBox <gi...@apache.org>.
mr-smidge commented on pull request #7654:
URL: https://github.com/apache/arrow/pull/7654#issuecomment-660971998


   @eerhardt , is there a way to re-trigger build checks?  The `C# / AMD64 MacOS 10.15 C# 2.2.103 (pull_request)` check failed with an unhelpful "This check failed" error message - I assume this is transient?
   
   Thanks!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mr-smidge commented on a change in pull request #7654: ARROW-8581: [C#] Accept and return DateTime from DateXXArray

Posted by GitBox <gi...@apache.org>.
mr-smidge commented on a change in pull request #7654:
URL: https://github.com/apache/arrow/pull/7654#discussion_r458452083



##########
File path: csharp/src/Apache.Arrow/Arrays/Date64Array.cs
##########
@@ -15,56 +15,94 @@
 
 using Apache.Arrow.Types;
 using System;
-using System.Collections.Generic;
-using Apache.Arrow.Memory;
 
 namespace Apache.Arrow
 {
+    /// <summary>
+    /// The <see cref="Date64Array"/> class holds an array of dates in the <c>Date64</c> format, where each date is
+    /// stored as the number of milliseconds since the dawn of (UNIX) time, excluding leap seconds.
+    /// </summary>
     public class Date64Array: PrimitiveArray<long>
     {
+        private static readonly DateTime _epoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Unspecified);
+
         public Date64Array(
             ArrowBuffer valueBuffer, ArrowBuffer nullBitmapBuffer,
             int length, int nullCount, int offset)
             : this(new ArrayData(Date64Type.Default, length, nullCount, offset,
                 new[] { nullBitmapBuffer, valueBuffer }))
         { }
 
-        public class Builder : PrimitiveArrayBuilder<DateTimeOffset, long, Date64Array, Builder>
+        /// <summary>
+        /// The <see cref="Builder"/> class can be used to fluently build <see cref="Date64Array"/> objects.
+        /// </summary>
+        public class Builder : DateArrayBuilder<long, Date64Array, Builder>
         {
-            internal class DateBuilder: PrimitiveArrayBuilder<long, Date64Array, DateBuilder>
+            private class DateBuilder: PrimitiveArrayBuilder<long, Date64Array, DateBuilder>
             {
                 protected override Date64Array Build(
                     ArrowBuffer valueBuffer, ArrowBuffer nullBitmapBuffer,
                     int length, int nullCount, int offset) =>
                     new Date64Array(valueBuffer, nullBitmapBuffer, length, nullCount, offset);
-            } 
+            }
 
+            /// <summary>
+            /// Construct a new instance of the <see cref="Builder"/> class.
+            /// </summary>
             public Builder() : base(new DateBuilder()) { }
 
-            protected override long ConvertTo(DateTimeOffset value)
+            protected override long Convert(DateTime dateTime)
+            {
+                return (long)(dateTime - _epoch).TotalMilliseconds;
+            }
+
+            protected override long Convert(DateTimeOffset dateTimeOffset)
             {
-                return value.ToUnixTimeMilliseconds();
+                return (long)(dateTimeOffset.Date - _epoch).TotalMilliseconds;

Review comment:
       > ... at this very moment in time, the exact same number of days have passed since the Unix Epoch for me and for you, even though we are in different time zones and our local calendars may be on different days
   
   Ok, I agree with this, because the UNIX Epoch is a point in time rather than a date, and a "day" in this context must therefore be a block of 24 hours rather than a box on a calendar.
   
   Alternatively, if someone said "number of days passed since 1st Jan 1970", then a "day" must be a box on a calendar, because no time element has been mentioned: only dates.  This day count could indeed be different for us in different time zones at the very same moment in time.
   
   I'm happy with the idea that the `DateXX` array builders can accept a `DateTimeOffset` and interpret it as the number of 24-hour blocks since the UNIX Epoch.  I'll ensure this is fully documented, as I don't find this the intuitive interpretation and I'm sure I'm not alone here.
   
   Thank you for clarifying :smile:.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #7654: ARROW-8581: [C#] Accept and return DateTime from DateXXArray

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #7654:
URL: https://github.com/apache/arrow/pull/7654#issuecomment-654734988


   https://issues.apache.org/jira/browse/ARROW-8581


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] eerhardt closed pull request #7654: ARROW-8581: [C#] Accept and return DateTime from DateXXArray

Posted by GitBox <gi...@apache.org>.
eerhardt closed pull request #7654:
URL: https://github.com/apache/arrow/pull/7654


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] eerhardt commented on a change in pull request #7654: ARROW-8581: [C#] Accept and return DateTime from DateXXArray

Posted by GitBox <gi...@apache.org>.
eerhardt commented on a change in pull request #7654:
URL: https://github.com/apache/arrow/pull/7654#discussion_r468143702



##########
File path: csharp/src/Apache.Arrow/Arrays/Date64Array.cs
##########
@@ -15,56 +15,103 @@
 
 using Apache.Arrow.Types;
 using System;
-using System.Collections.Generic;
-using Apache.Arrow.Memory;
 
 namespace Apache.Arrow
 {
+    /// <summary>
+    /// The <see cref="Date64Array"/> class holds an array of dates in the <c>Date64</c> format, where each date is
+    /// stored as the number of milliseconds since the dawn of (UNIX) time, excluding leap seconds, in multiples of
+    /// 86400000.
+    /// </summary>
     public class Date64Array: PrimitiveArray<long>
     {
+        private const long MillisecondsPerDay = 86400000;
+
         public Date64Array(
             ArrowBuffer valueBuffer, ArrowBuffer nullBitmapBuffer,
             int length, int nullCount, int offset)
             : this(new ArrayData(Date64Type.Default, length, nullCount, offset,
                 new[] { nullBitmapBuffer, valueBuffer }))
         { }
 
-        public class Builder : PrimitiveArrayBuilder<DateTimeOffset, long, Date64Array, Builder>
+        /// <summary>
+        /// The <see cref="Builder"/> class can be used to fluently build <see cref="Date64Array"/> objects.
+        /// </summary>
+        public class Builder : DateArrayBuilder<long, Date64Array, Builder>
         {
-            internal class DateBuilder: PrimitiveArrayBuilder<long, Date64Array, DateBuilder>
+            private class DateBuilder: PrimitiveArrayBuilder<long, Date64Array, DateBuilder>
             {
                 protected override Date64Array Build(
                     ArrowBuffer valueBuffer, ArrowBuffer nullBitmapBuffer,
                     int length, int nullCount, int offset) =>
                     new Date64Array(valueBuffer, nullBitmapBuffer, length, nullCount, offset);
-            } 
+            }
 
+            /// <summary>
+            /// Construct a new instance of the <see cref="Builder"/> class.
+            /// </summary>
             public Builder() : base(new DateBuilder()) { }
 
-            protected override long ConvertTo(DateTimeOffset value)
+            protected override long Convert(DateTime dateTime)
+            {
+                var dateTimeOffset = new DateTimeOffset(
+                    DateTime.SpecifyKind(dateTime.Date, DateTimeKind.Unspecified),
+                    TimeSpan.Zero);
+                return dateTimeOffset.ToUnixTimeMilliseconds();
+            }
+
+            protected override long Convert(DateTimeOffset dateTimeOffset)
             {
-                return value.ToUnixTimeMilliseconds();
+                // The internal value stored for a DateTimeOffset can be thought of as the number of milliseconds,
+                // in multiples of 86400000, that have passed since the UNIX epoch.  It is not the same as what would
+                // result from encoding the date from the DateTimeOffset.Date property.
+                long millis = dateTimeOffset.ToUnixTimeMilliseconds();
+                long days = millis / MillisecondsPerDay;
+                return (millis < 0 ? days - 1 : days) * MillisecondsPerDay;

Review comment:
       I'm wondering if we really want this behavior, or if we should just store the `value.ToUnixTimeMilliseconds()` as we were doing before. The code calling this can convert the input to be whole days if it wants.
   
   The reason I think this is because I don't see anything preventing someone from calling `builder.Append(long value)` with a value that isn't a multiple of `86400000`. Looking at Java and C++, I don't see this from being stopped either.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mr-smidge commented on pull request #7654: ARROW-8581: [C#] Accept and return DateTime from DateXXArray

Posted by GitBox <gi...@apache.org>.
mr-smidge commented on pull request #7654:
URL: https://github.com/apache/arrow/pull/7654#issuecomment-657438635


   > In general, DateTimeOffset is the preferred type to use when talking about dates and times. The reasoning is because exactly what you point out in the JIRA issue - DateTime.Kind is very confusing and can cause a lot of problems.
   
   I suspect this is a matter of codebase/team preference.  By contract, my organisation previously recommended `DateTime` for dates.  I say "previously" because now we would use NodaTime's `LocalDate` instead :-).
   
   Nonetheless, in the absence of a proper type just for dates, I can see arguments for either, and I'll modify the API accordingly to accept both in a non-breaking manner.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] eerhardt commented on pull request #7654: ARROW-8581: [C#] Accept and return DateTime from DateXXArray

Posted by GitBox <gi...@apache.org>.
eerhardt commented on pull request #7654:
URL: https://github.com/apache/arrow/pull/7654#issuecomment-661158367


   > is there a way to re-trigger build checks?
   
   In other repos I've used the "Ru-run jobs" button on the "Checks" tab:
   
   ![image](https://user-images.githubusercontent.com/8291187/87962072-41633e80-ca7c-11ea-967c-08ed9a9a62eb.png)
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] eerhardt commented on a change in pull request #7654: ARROW-8581: [C#] Accept and return DateTime from DateXXArray

Posted by GitBox <gi...@apache.org>.
eerhardt commented on a change in pull request #7654:
URL: https://github.com/apache/arrow/pull/7654#discussion_r457569636



##########
File path: csharp/src/Apache.Arrow/Arrays/DelegatingArrayBuilder.cs
##########
@@ -0,0 +1,96 @@
+//------------------------------------------------------------------------------
+// <copyright file="DelegatingArrayBuilder.cs" company="Jetstone Asset Management LLP">
+// Copyright (C) Jetstone Asset Management LLP.  All rights reserved.
+// Unauthorised copying of this file, via any medium, is strictly prohibited.
+// Proprietary and confidential
+// </copyright>
+// <author>Adam Szmigin</author>
+//------------------------------------------------------------------------------
+
+using System;
+using Apache.Arrow.Memory;
+
+namespace Apache.Arrow
+{
+    /// <summary>
+    /// The <see cref="DelegatingArrayBuilder{T,TArray,TBuilder}"/> class can be used as the base for any array builder
+    /// that needs to delegate most of its functionality to an inner array builder.
+    /// </summary>
+    /// <remarks>
+    /// The typical use case is when an array builder may accept a number of different types as input, but which are
+    /// all internally converted to a single type for assembly into an array.
+    /// </remarks>
+    /// <typeparam name="T">Type of item accepted by inner array builder.</typeparam>
+    /// <typeparam name="TArray">Type of array produced by this (and the inner) builder.</typeparam>
+    /// <typeparam name="TBuilder">Type of builder (see Curiously-Recurring Template Pattern).</typeparam>
+    public abstract class DelegatingArrayBuilder<T, TArray, TBuilder> : IArrowArrayBuilder<TArray, TBuilder>

Review comment:
       > and probably NodaTime.Instant if we use NodaTime for the TZDB
   
   Just a FYI, see this [previous discussion](https://github.com/apache/arrow/pull/5965#discussion_r356233161) where we discussed NodaTime. In short, I don't believe the Arrow library should have a dependency on NodaTime. But instead, the Arrow API should allow for a consumer to use NodaTime and Arrow together.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] eerhardt commented on pull request #7654: ARROW-8581: [C#] Accept and return DateTime from DateXXArray

Posted by GitBox <gi...@apache.org>.
eerhardt commented on pull request #7654:
URL: https://github.com/apache/arrow/pull/7654#issuecomment-661363849


   > I think that a member of the "arrow-committers" team can use "Re-run jobs". (I re-ran jobs.)
   
   Ah, I didn't know committers were special that way. Thanks for re-running, and thanks for the info.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mr-smidge commented on a change in pull request #7654: ARROW-8581: [C#] Accept and return DateTime from DateXXArray

Posted by GitBox <gi...@apache.org>.
mr-smidge commented on a change in pull request #7654:
URL: https://github.com/apache/arrow/pull/7654#discussion_r456557580



##########
File path: csharp/src/Apache.Arrow/Arrays/Date64Array.cs
##########
@@ -15,56 +15,94 @@
 
 using Apache.Arrow.Types;
 using System;
-using System.Collections.Generic;
-using Apache.Arrow.Memory;
 
 namespace Apache.Arrow
 {
+    /// <summary>
+    /// The <see cref="Date64Array"/> class holds an array of dates in the <c>Date64</c> format, where each date is
+    /// stored as the number of milliseconds since the dawn of (UNIX) time, excluding leap seconds.
+    /// </summary>
     public class Date64Array: PrimitiveArray<long>
     {
+        private static readonly DateTime _epoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Unspecified);
+
         public Date64Array(
             ArrowBuffer valueBuffer, ArrowBuffer nullBitmapBuffer,
             int length, int nullCount, int offset)
             : this(new ArrayData(Date64Type.Default, length, nullCount, offset,
                 new[] { nullBitmapBuffer, valueBuffer }))
         { }
 
-        public class Builder : PrimitiveArrayBuilder<DateTimeOffset, long, Date64Array, Builder>
+        /// <summary>
+        /// The <see cref="Builder"/> class can be used to fluently build <see cref="Date64Array"/> objects.
+        /// </summary>
+        public class Builder : DateArrayBuilder<long, Date64Array, Builder>
         {
-            internal class DateBuilder: PrimitiveArrayBuilder<long, Date64Array, DateBuilder>
+            private class DateBuilder: PrimitiveArrayBuilder<long, Date64Array, DateBuilder>
             {
                 protected override Date64Array Build(
                     ArrowBuffer valueBuffer, ArrowBuffer nullBitmapBuffer,
                     int length, int nullCount, int offset) =>
                     new Date64Array(valueBuffer, nullBitmapBuffer, length, nullCount, offset);
-            } 
+            }
 
+            /// <summary>
+            /// Construct a new instance of the <see cref="Builder"/> class.
+            /// </summary>
             public Builder() : base(new DateBuilder()) { }
 
-            protected override long ConvertTo(DateTimeOffset value)
+            protected override long Convert(DateTime dateTime)
+            {
+                return (long)(dateTime - _epoch).TotalMilliseconds;
+            }
+
+            protected override long Convert(DateTimeOffset dateTimeOffset)
             {
-                return value.ToUnixTimeMilliseconds();
+                return (long)(dateTimeOffset.Date - _epoch).TotalMilliseconds;

Review comment:
       I've restored this to use `.ToUnixTimeMilliseconds()` (and with the conversion in the other direction).
   
   However, I'm not sure whether it is such a good idea to make the same change in `Date32Array`.  My personal opinion is that `(dateTime - _epoch).TotalDays` is more readable as a day count than having to make a trip into milliseconds and then convert again.  Involving milliseconds may also distract the reader into momentarily thinking about leap seconds, when they are not relevant for a day count.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mr-smidge commented on a change in pull request #7654: ARROW-8581: [C#] Accept and return DateTime from DateXXArray

Posted by GitBox <gi...@apache.org>.
mr-smidge commented on a change in pull request #7654:
URL: https://github.com/apache/arrow/pull/7654#discussion_r463888181



##########
File path: csharp/test/Apache.Arrow.Tests/TestDateAndTimeData.cs
##########
@@ -0,0 +1,83 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements. See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+using System;
+using System.Collections.Generic;
+using System.Linq;
+
+namespace Apache.Arrow.Tests
+{
+    /// <summary>
+    /// The <see cref="TestDateAndTimeData"/> class holds example dates and times useful for testing.
+    /// </summary>
+    internal static class TestDateAndTimeData
+    {
+        private static readonly DateTime _earliestDate = new DateTime(1, 1, 1);
+        private static readonly DateTime _latestDate = new DateTime(9999, 12, 31);
+
+        private static readonly DateTime[] _exampleDates =
+        {
+            _earliestDate, new DateTime(1969, 12, 31), new DateTime(1970, 1, 1), new DateTime(1970, 1, 2),
+            new DateTime(1972, 6, 30), new DateTime(2015, 6, 30), new DateTime(2016, 12, 31), new DateTime(2020, 2, 29),
+            new DateTime(2020, 7, 1), _latestDate,
+        };
+
+        private static readonly TimeSpan[] _exampleTimes =
+        {
+            new TimeSpan(0, 0, 1), new TimeSpan(12, 0, 0), new TimeSpan(23, 59, 59),
+        };
+
+        private static readonly DateTimeKind[] _exampleKinds =
+        {
+            DateTimeKind.Local, DateTimeKind.Unspecified, DateTimeKind.Utc,
+        };
+
+        private static readonly TimeSpan[] _exampleOffsets =
+        {
+            TimeSpan.FromHours(-2),
+            TimeSpan.Zero,
+            TimeSpan.FromHours(2),
+        };
+
+        /// <summary>
+        /// Gets a collection of example dates (i.e. with a zero time component), of all different kinds.
+        /// </summary>
+        public static IEnumerable<DateTime> ExampleDates =>

Review comment:
       These properties provide a useful set of test data that acts over the cartesian product of dates/times/kinds/offsets, so that testing all possible combinations is straightforward.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] eerhardt commented on a change in pull request #7654: ARROW-8581: [C#] Accept and return DateTime from DateXXArray

Posted by GitBox <gi...@apache.org>.
eerhardt commented on a change in pull request #7654:
URL: https://github.com/apache/arrow/pull/7654#discussion_r469456287



##########
File path: csharp/src/Apache.Arrow/Arrays/Date64Array.cs
##########
@@ -15,56 +15,103 @@
 
 using Apache.Arrow.Types;
 using System;
-using System.Collections.Generic;
-using Apache.Arrow.Memory;
 
 namespace Apache.Arrow
 {
+    /// <summary>
+    /// The <see cref="Date64Array"/> class holds an array of dates in the <c>Date64</c> format, where each date is
+    /// stored as the number of milliseconds since the dawn of (UNIX) time, excluding leap seconds, in multiples of
+    /// 86400000.
+    /// </summary>
     public class Date64Array: PrimitiveArray<long>
     {
+        private const long MillisecondsPerDay = 86400000;
+
         public Date64Array(
             ArrowBuffer valueBuffer, ArrowBuffer nullBitmapBuffer,
             int length, int nullCount, int offset)
             : this(new ArrayData(Date64Type.Default, length, nullCount, offset,
                 new[] { nullBitmapBuffer, valueBuffer }))
         { }
 
-        public class Builder : PrimitiveArrayBuilder<DateTimeOffset, long, Date64Array, Builder>
+        /// <summary>
+        /// The <see cref="Builder"/> class can be used to fluently build <see cref="Date64Array"/> objects.
+        /// </summary>
+        public class Builder : DateArrayBuilder<long, Date64Array, Builder>
         {
-            internal class DateBuilder: PrimitiveArrayBuilder<long, Date64Array, DateBuilder>
+            private class DateBuilder: PrimitiveArrayBuilder<long, Date64Array, DateBuilder>
             {
                 protected override Date64Array Build(
                     ArrowBuffer valueBuffer, ArrowBuffer nullBitmapBuffer,
                     int length, int nullCount, int offset) =>
                     new Date64Array(valueBuffer, nullBitmapBuffer, length, nullCount, offset);
-            } 
+            }
 
+            /// <summary>
+            /// Construct a new instance of the <see cref="Builder"/> class.
+            /// </summary>
             public Builder() : base(new DateBuilder()) { }
 
-            protected override long ConvertTo(DateTimeOffset value)
+            protected override long Convert(DateTime dateTime)
+            {
+                var dateTimeOffset = new DateTimeOffset(
+                    DateTime.SpecifyKind(dateTime.Date, DateTimeKind.Unspecified),
+                    TimeSpan.Zero);
+                return dateTimeOffset.ToUnixTimeMilliseconds();
+            }
+
+            protected override long Convert(DateTimeOffset dateTimeOffset)
             {
-                return value.ToUnixTimeMilliseconds();
+                // The internal value stored for a DateTimeOffset can be thought of as the number of milliseconds,
+                // in multiples of 86400000, that have passed since the UNIX epoch.  It is not the same as what would
+                // result from encoding the date from the DateTimeOffset.Date property.
+                long millis = dateTimeOffset.ToUnixTimeMilliseconds();
+                long days = millis / MillisecondsPerDay;
+                return (millis < 0 ? days - 1 : days) * MillisecondsPerDay;

Review comment:
       > From what I can tell, the Date64 builder (neither in this PR nor in master) has never had an Append(long) method.
   
   You're right - sorry I got confused between the different languages' implementations.
   
   I started a conversation on the dev list about this topic, and @wesm had a good idea that we could consider here:
   
   > I think we should validate optionally in ValidateFull in C++. I think to validate unconditionally would be too computationally expensive
   
   https://issues.apache.org/jira/browse/ARROW-9705
   
   I think you've convinced me that the default/right/safe behavior is to ensure the spec is enforced, so ensuring appending a `DateTimeOffset` to a `Date64Array` makes the underlying `long` value divisible by `86400000` is the right thing to do. If necessary, in the future we could add a `long` overload which doesn't do validation in the off chance that someone really did want non-zero time in their `Date64Array`.
   
   Thanks for working through this with me.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mr-smidge commented on a change in pull request #7654: ARROW-8581: [C#] Accept and return DateTime from DateXXArray

Posted by GitBox <gi...@apache.org>.
mr-smidge commented on a change in pull request #7654:
URL: https://github.com/apache/arrow/pull/7654#discussion_r457739938



##########
File path: csharp/src/Apache.Arrow/Arrays/Date64Array.cs
##########
@@ -15,56 +15,94 @@
 
 using Apache.Arrow.Types;
 using System;
-using System.Collections.Generic;
-using Apache.Arrow.Memory;
 
 namespace Apache.Arrow
 {
+    /// <summary>
+    /// The <see cref="Date64Array"/> class holds an array of dates in the <c>Date64</c> format, where each date is
+    /// stored as the number of milliseconds since the dawn of (UNIX) time, excluding leap seconds.
+    /// </summary>
     public class Date64Array: PrimitiveArray<long>
     {
+        private static readonly DateTime _epoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Unspecified);
+
         public Date64Array(
             ArrowBuffer valueBuffer, ArrowBuffer nullBitmapBuffer,
             int length, int nullCount, int offset)
             : this(new ArrayData(Date64Type.Default, length, nullCount, offset,
                 new[] { nullBitmapBuffer, valueBuffer }))
         { }
 
-        public class Builder : PrimitiveArrayBuilder<DateTimeOffset, long, Date64Array, Builder>
+        /// <summary>
+        /// The <see cref="Builder"/> class can be used to fluently build <see cref="Date64Array"/> objects.
+        /// </summary>
+        public class Builder : DateArrayBuilder<long, Date64Array, Builder>
         {
-            internal class DateBuilder: PrimitiveArrayBuilder<long, Date64Array, DateBuilder>
+            private class DateBuilder: PrimitiveArrayBuilder<long, Date64Array, DateBuilder>
             {
                 protected override Date64Array Build(
                     ArrowBuffer valueBuffer, ArrowBuffer nullBitmapBuffer,
                     int length, int nullCount, int offset) =>
                     new Date64Array(valueBuffer, nullBitmapBuffer, length, nullCount, offset);
-            } 
+            }
 
+            /// <summary>
+            /// Construct a new instance of the <see cref="Builder"/> class.
+            /// </summary>
             public Builder() : base(new DateBuilder()) { }
 
-            protected override long ConvertTo(DateTimeOffset value)
+            protected override long Convert(DateTime dateTime)
+            {
+                return (long)(dateTime - _epoch).TotalMilliseconds;
+            }
+
+            protected override long Convert(DateTimeOffset dateTimeOffset)
             {
-                return value.ToUnixTimeMilliseconds();
+                return (long)(dateTimeOffset.Date - _epoch).TotalMilliseconds;

Review comment:
       > The Unix Epoch is defined as 00:00:00 UTC Thursday, 1 January 1970.
   
   Apologies, I've been too colloquial with use of the word epoch when talking about dates.  You are right that the UNIX epoch is a point in time, not a date.
   
   However, `Date32` and `Date64` are measures of _dates_.  As the standard C# library does not have a dedicated type for dates, we have to infer a date from either `DateTime` or `DateTimeOffset`.  Dates and instants in time aren't implicitly convertible in the natural sense because of timezones: there has to be some logic in the conversion.
   
   >  The logic needs to take into account the Offset portion of the DateTimeOffset.
   > I can have the same moment in time represented with 2 different DateTimeOffsets
   
   Absolutely right - the same point in time can lie within two different calendar dates depending on one's timezone.  At the time I'm writing this it's just gone midnight in the UK, so I think it's 21st July right now.  But if you're on CDT, it's 20th July for you at the very same instant.
   
   For clarity: I think the most natural way to infer a meaningful date from `DateTime`/`DateTimeOffset` is to call the `.Date` property, i.e. I would expect the time and offset to be _ignored_ when considering the date.
   
   > I can have the same moment in time represented with 2 different DateTimeOffsets - one with a negative Offset, and one with a positive Offset. If used the proposed Date32Array.Builder, I would get 2 different days. This is not correct.
   
   I disagree with the statement that this is not correct, but I want to understand your viewpoint better...
   
   From looking at your test case, do you mean to suggest that you believe a conversion to UTC first and then taking the date from the UTC date/time is the most natural way to infer a date from a `DateTimeOffset`?  This is what happens by doing `(int)(dto.ToUnixTimeMilliseconds() / MillisecondsPerDay)`, but it's an interpretation of dates from times that I've never seen in any other project!
   
   If you take the "simplest approach" and call `.Date` (or its equivalent in NodaTime), time and offset are always ignored:
   
   ```csharp
   // Using System types
   var westernHemisphereTime = new DateTimeOffset(2020, 7, 20, 20, 0, 0, TimeSpan.FromHours(-5));
   var easternHemisphereTime = new DateTimeOffset(2020, 7, 21, 10, 0, 0, TimeSpan.FromHours(9));
   Assert.NotEqual(westernHemisphereTime.Date == easternHemisphereTime.Date);
   Assert.Equal(westernHemisphereTime.UtcTicks == easternHemisphereTime.UtcTicks);
   
   // Using NodaTime types
   var westernHemisphereZtd = new LocalDateTime(2020, 7, 20, 20, 0, 0).InZoneStrictly(DateTimeZoneProviders.Tzdb["America/Chicago"]);
   var easternHemisphereZtd = new LocalDateTime(2020, 7, 21, 10, 0, 0).InZoneStrictly(DateTimeZoneProviders.Tzdb["Asia/Tokyo"]);
   Assert.NotEqual(westernHemisphereZtd.Date == easternHemisphereZtd.Date);
   Assert.Equal(westernHemisphereZtd.ToInstant() == easternHemisphereZtd.ToInstant());
   ```
   
   Separately, I've realised that `Date64Array` had an existing bug that I would not be fixing if I were to retain the previous code: the previous code would not truncate non-date parts and hence could store values that were not a multiple of 86400000, which is a violation of the spec.
   
   I think the best thing to do is to put this PR back into draft mode, implement a _full set_ of unit tests for both `Date32` and `Date64` (and their respective builders), and hence ensure we have full clarity on expected behaviour.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mr-smidge commented on a change in pull request #7654: ARROW-8581: [C#] Accept and return DateTime from DateXXArray

Posted by GitBox <gi...@apache.org>.
mr-smidge commented on a change in pull request #7654:
URL: https://github.com/apache/arrow/pull/7654#discussion_r463886892



##########
File path: csharp/src/Apache.Arrow/Arrays/Date64Array.cs
##########
@@ -15,56 +15,103 @@
 
 using Apache.Arrow.Types;
 using System;
-using System.Collections.Generic;
-using Apache.Arrow.Memory;
 
 namespace Apache.Arrow
 {
+    /// <summary>
+    /// The <see cref="Date64Array"/> class holds an array of dates in the <c>Date64</c> format, where each date is
+    /// stored as the number of milliseconds since the dawn of (UNIX) time, excluding leap seconds, in multiples of
+    /// 86400000.
+    /// </summary>
     public class Date64Array: PrimitiveArray<long>
     {
+        private const long MillisecondsPerDay = 86400000;
+
         public Date64Array(
             ArrowBuffer valueBuffer, ArrowBuffer nullBitmapBuffer,
             int length, int nullCount, int offset)
             : this(new ArrayData(Date64Type.Default, length, nullCount, offset,
                 new[] { nullBitmapBuffer, valueBuffer }))
         { }
 
-        public class Builder : PrimitiveArrayBuilder<DateTimeOffset, long, Date64Array, Builder>
+        /// <summary>
+        /// The <see cref="Builder"/> class can be used to fluently build <see cref="Date64Array"/> objects.
+        /// </summary>
+        public class Builder : DateArrayBuilder<long, Date64Array, Builder>
         {
-            internal class DateBuilder: PrimitiveArrayBuilder<long, Date64Array, DateBuilder>
+            private class DateBuilder: PrimitiveArrayBuilder<long, Date64Array, DateBuilder>
             {
                 protected override Date64Array Build(
                     ArrowBuffer valueBuffer, ArrowBuffer nullBitmapBuffer,
                     int length, int nullCount, int offset) =>
                     new Date64Array(valueBuffer, nullBitmapBuffer, length, nullCount, offset);
-            } 
+            }
 
+            /// <summary>
+            /// Construct a new instance of the <see cref="Builder"/> class.
+            /// </summary>
             public Builder() : base(new DateBuilder()) { }
 
-            protected override long ConvertTo(DateTimeOffset value)
+            protected override long Convert(DateTime dateTime)
+            {
+                var dateTimeOffset = new DateTimeOffset(
+                    DateTime.SpecifyKind(dateTime.Date, DateTimeKind.Unspecified),
+                    TimeSpan.Zero);
+                return dateTimeOffset.ToUnixTimeMilliseconds();
+            }
+
+            protected override long Convert(DateTimeOffset dateTimeOffset)
             {
-                return value.ToUnixTimeMilliseconds();
+                // The internal value stored for a DateTimeOffset can be thought of as the number of milliseconds,
+                // in multiples of 86400000, that have passed since the UNIX epoch.  It is not the same as what would
+                // result from encoding the date from the DateTimeOffset.Date property.
+                long millis = dateTimeOffset.ToUnixTimeMilliseconds();
+                long days = millis / MillisecondsPerDay;
+                return (millis < 0 ? days - 1 : days) * MillisecondsPerDay;

Review comment:
       An alternative to this code might be:
   
   ```c#
   // Similar to the code in Date32Array.
   return (long)(dateTimeOffset.UtcDateTime.Date - _epochDate).TotalDays * MillisecondsPerDay;
   ```
   
   Or even:
   
   ```c#
   // No maths, but requires making another DateTimeOffset only to throw it away again.
   return new DateTimeOffset(dateTimeOffset.UtcDateTime.Date, TimeSpan.Zero).ToUnixMilliseconds();
   ```
   
   @eerhardt , please let me know if you have a preference between any of these.

##########
File path: csharp/src/Apache.Arrow/Arrays/Date64Array.cs
##########
@@ -15,56 +15,94 @@
 
 using Apache.Arrow.Types;
 using System;
-using System.Collections.Generic;
-using Apache.Arrow.Memory;
 
 namespace Apache.Arrow
 {
+    /// <summary>
+    /// The <see cref="Date64Array"/> class holds an array of dates in the <c>Date64</c> format, where each date is
+    /// stored as the number of milliseconds since the dawn of (UNIX) time, excluding leap seconds.
+    /// </summary>
     public class Date64Array: PrimitiveArray<long>
     {
+        private static readonly DateTime _epoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Unspecified);
+
         public Date64Array(
             ArrowBuffer valueBuffer, ArrowBuffer nullBitmapBuffer,
             int length, int nullCount, int offset)
             : this(new ArrayData(Date64Type.Default, length, nullCount, offset,
                 new[] { nullBitmapBuffer, valueBuffer }))
         { }
 
-        public class Builder : PrimitiveArrayBuilder<DateTimeOffset, long, Date64Array, Builder>
+        /// <summary>
+        /// The <see cref="Builder"/> class can be used to fluently build <see cref="Date64Array"/> objects.
+        /// </summary>
+        public class Builder : DateArrayBuilder<long, Date64Array, Builder>
         {
-            internal class DateBuilder: PrimitiveArrayBuilder<long, Date64Array, DateBuilder>
+            private class DateBuilder: PrimitiveArrayBuilder<long, Date64Array, DateBuilder>
             {
                 protected override Date64Array Build(
                     ArrowBuffer valueBuffer, ArrowBuffer nullBitmapBuffer,
                     int length, int nullCount, int offset) =>
                     new Date64Array(valueBuffer, nullBitmapBuffer, length, nullCount, offset);
-            } 
+            }
 
+            /// <summary>
+            /// Construct a new instance of the <see cref="Builder"/> class.
+            /// </summary>
             public Builder() : base(new DateBuilder()) { }
 
-            protected override long ConvertTo(DateTimeOffset value)
+            protected override long Convert(DateTime dateTime)
+            {
+                return (long)(dateTime - _epoch).TotalMilliseconds;
+            }
+
+            protected override long Convert(DateTimeOffset dateTimeOffset)
             {
-                return value.ToUnixTimeMilliseconds();
+                return (long)(dateTimeOffset.Date - _epoch).TotalMilliseconds;

Review comment:
       I've now updated the PR so that treatment of `DateTimeOffset` is as discussed above: it is converted to UTC first and then the days are counted, so that the input is treated as a point in time rather than a calendar date/time.  The docstrings all explain this behaviour (hopefully clearly.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] eerhardt commented on a change in pull request #7654: ARROW-8581: [C#] Accept and return DateTime from DateXXArray

Posted by GitBox <gi...@apache.org>.
eerhardt commented on a change in pull request #7654:
URL: https://github.com/apache/arrow/pull/7654#discussion_r458353882



##########
File path: csharp/src/Apache.Arrow/Arrays/Date64Array.cs
##########
@@ -15,56 +15,94 @@
 
 using Apache.Arrow.Types;
 using System;
-using System.Collections.Generic;
-using Apache.Arrow.Memory;
 
 namespace Apache.Arrow
 {
+    /// <summary>
+    /// The <see cref="Date64Array"/> class holds an array of dates in the <c>Date64</c> format, where each date is
+    /// stored as the number of milliseconds since the dawn of (UNIX) time, excluding leap seconds.
+    /// </summary>
     public class Date64Array: PrimitiveArray<long>
     {
+        private static readonly DateTime _epoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Unspecified);
+
         public Date64Array(
             ArrowBuffer valueBuffer, ArrowBuffer nullBitmapBuffer,
             int length, int nullCount, int offset)
             : this(new ArrayData(Date64Type.Default, length, nullCount, offset,
                 new[] { nullBitmapBuffer, valueBuffer }))
         { }
 
-        public class Builder : PrimitiveArrayBuilder<DateTimeOffset, long, Date64Array, Builder>
+        /// <summary>
+        /// The <see cref="Builder"/> class can be used to fluently build <see cref="Date64Array"/> objects.
+        /// </summary>
+        public class Builder : DateArrayBuilder<long, Date64Array, Builder>
         {
-            internal class DateBuilder: PrimitiveArrayBuilder<long, Date64Array, DateBuilder>
+            private class DateBuilder: PrimitiveArrayBuilder<long, Date64Array, DateBuilder>
             {
                 protected override Date64Array Build(
                     ArrowBuffer valueBuffer, ArrowBuffer nullBitmapBuffer,
                     int length, int nullCount, int offset) =>
                     new Date64Array(valueBuffer, nullBitmapBuffer, length, nullCount, offset);
-            } 
+            }
 
+            /// <summary>
+            /// Construct a new instance of the <see cref="Builder"/> class.
+            /// </summary>
             public Builder() : base(new DateBuilder()) { }
 
-            protected override long ConvertTo(DateTimeOffset value)
+            protected override long Convert(DateTime dateTime)
+            {
+                return (long)(dateTime - _epoch).TotalMilliseconds;
+            }
+
+            protected override long Convert(DateTimeOffset dateTimeOffset)
             {
-                return value.ToUnixTimeMilliseconds();
+                return (long)(dateTimeOffset.Date - _epoch).TotalMilliseconds;

Review comment:
       > I disagree with the statement that this is not correct, but I want to understand your viewpoint better...
   
   > From looking at your test case, do you mean to suggest that you believe a conversion to UTC first and then taking the date from the UTC date/time is the most natural way to infer a date from a DateTimeOffset?
   
   What I'm saying is that at this very moment in time, the exact same number of days have passed since the Unix Epoch for me and for you, even though we are in different time zones and our local calendars may be on different days. If there was a program that did:
   
   ```C#
   var array = new Date32Array.Builder()
                       .Append(DateTimeOffset.Now)
                       .Build();
   ```
   
   And it ran on both of our machines at the same moment in time, using the proposed changes we could end up with 2 different values in the `array`. That is what is not correct, in my opinion. Since we are taking in a `DateTimeOffset`, we need to respect the `Offset` part of that structure. It is a significant piece of information.
   
   The Unix Epoch represents a moment in time. `DateTimeOffset` represents a moment in time. Taking 2 `DateTimeOffset` instances that are equivalent (for example calling Equals on the above `westernHemisphereTime` and `easternHemisphereTime` returns `true`) should result in the exact same number of days since the Unix Epoch.
   
   Now if a user of this library wants to strip the Offset and set it to `Zero`, then that is up to them. But this library shouldn't be making assumptions that the user doesn't want the Offset taken into account.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mr-smidge commented on a change in pull request #7654: ARROW-8581: [C#] Accept and return DateTime from DateXXArray

Posted by GitBox <gi...@apache.org>.
mr-smidge commented on a change in pull request #7654:
URL: https://github.com/apache/arrow/pull/7654#discussion_r456557697



##########
File path: csharp/src/Apache.Arrow/Arrays/Date64Array.cs
##########
@@ -15,56 +15,94 @@
 
 using Apache.Arrow.Types;
 using System;
-using System.Collections.Generic;
-using Apache.Arrow.Memory;
 
 namespace Apache.Arrow
 {
+    /// <summary>
+    /// The <see cref="Date64Array"/> class holds an array of dates in the <c>Date64</c> format, where each date is
+    /// stored as the number of milliseconds since the dawn of (UNIX) time, excluding leap seconds.
+    /// </summary>
     public class Date64Array: PrimitiveArray<long>
     {
+        private static readonly DateTime _epoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Unspecified);
+
         public Date64Array(
             ArrowBuffer valueBuffer, ArrowBuffer nullBitmapBuffer,
             int length, int nullCount, int offset)
             : this(new ArrayData(Date64Type.Default, length, nullCount, offset,
                 new[] { nullBitmapBuffer, valueBuffer }))
         { }
 
-        public class Builder : PrimitiveArrayBuilder<DateTimeOffset, long, Date64Array, Builder>
+        /// <summary>
+        /// The <see cref="Builder"/> class can be used to fluently build <see cref="Date64Array"/> objects.
+        /// </summary>
+        public class Builder : DateArrayBuilder<long, Date64Array, Builder>
         {
-            internal class DateBuilder: PrimitiveArrayBuilder<long, Date64Array, DateBuilder>
+            private class DateBuilder: PrimitiveArrayBuilder<long, Date64Array, DateBuilder>
             {
                 protected override Date64Array Build(
                     ArrowBuffer valueBuffer, ArrowBuffer nullBitmapBuffer,
                     int length, int nullCount, int offset) =>
                     new Date64Array(valueBuffer, nullBitmapBuffer, length, nullCount, offset);
-            } 
+            }
 
+            /// <summary>
+            /// Construct a new instance of the <see cref="Builder"/> class.
+            /// </summary>
             public Builder() : base(new DateBuilder()) { }
 
-            protected override long ConvertTo(DateTimeOffset value)
+            protected override long Convert(DateTime dateTime)
+            {
+                return (long)(dateTime - _epoch).TotalMilliseconds;
+            }
+
+            protected override long Convert(DateTimeOffset dateTimeOffset)
             {
-                return value.ToUnixTimeMilliseconds();
+                return (long)(dateTimeOffset.Date - _epoch).TotalMilliseconds;
             }
         }
 
-        public Date64Array(ArrayData data) 
+        public Date64Array(ArrayData data)
             : base(data)
         {
             data.EnsureDataType(ArrowTypeId.Date64);
         }
 
         public override void Accept(IArrowArrayVisitor visitor) => Accept(this, visitor);
 
-        public DateTimeOffset? GetDate(int index)
+        [Obsolete("Use `GetDateTimeOffset()` instead")]
+        public DateTimeOffset? GetDate(int index) => GetDateTimeOffset(index);
+
+        /// <summary>
+        /// Get the date at the specified index in the form of a <see cref="DateTime"/> object.
+        /// </summary>
+        /// <remarks>
+        /// The <see cref="DateTime.Kind"/> property of the returned object is set to
+        /// <see cref="DateTimeKind.Unspecified"/>.
+        /// </remarks>
+        /// <param name="index">Index at which to get the date.</param>
+        /// <returns>Returns a <see cref="DateTime"/> object, or <c>null</c> if there is no object at that index.
+        /// </returns>
+        public DateTime? GetDateTime(int index)
         {
             long? value = GetValue(index);
+            return value.HasValue
+                ? _epoch.AddMilliseconds(value.Value)
+                : default(DateTime?);
+        }
 
-            if (!value.HasValue)
-            {
-                return default;
-            }
-
-            return DateTimeOffset.FromUnixTimeMilliseconds(value.Value);
+        /// <summary>
+        /// Get the date at the specified index in the form of a <see cref="DateTimeOffset"/> object.
+        /// </summary>
+        /// <param name="index">Index at which to get the date.</param>
+        /// <returns>Returns a <see cref="DateTimeOffset"/> object, or <c>null</c> if there is no object at that index.
+        /// </returns>
+        public DateTimeOffset? GetDateTimeOffset(int index)
+        {
+            long? value = GetValue(index);
+            return value.HasValue
+                ? new DateTimeOffset(_epoch.AddMilliseconds(value.Value), TimeSpan.Zero)

Review comment:
       Restored to use `.FromUnixTimeMilliseconds()`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mr-smidge commented on a change in pull request #7654: ARROW-8581: [C#] Accept and return DateTime from DateXXArray

Posted by GitBox <gi...@apache.org>.
mr-smidge commented on a change in pull request #7654:
URL: https://github.com/apache/arrow/pull/7654#discussion_r453670993



##########
File path: csharp/src/Apache.Arrow/Arrays/Date32Array.cs
##########
@@ -55,16 +70,39 @@ public Date32Array(ArrayData data)
 
         public override void Accept(IArrowArrayVisitor visitor) => Accept(this, visitor);
 
-        public DateTimeOffset? GetDate(int index)
+        [Obsolete("Use `GetDateTimeOffset()` instead")]
+        public DateTimeOffset? GetDate(int index) => GetDateTimeOffset(index);
+
+        /// <summary>
+        /// Get the date at the specified index in the form of a <see cref="DateTime"/> object.
+        /// </summary>
+        /// <remarks>
+        /// The <see cref="DateTime.Kind"/> property of the returned object is set to
+        /// <see cref="DateTimeKind.Unspecified"/>.
+        /// </remarks>
+        /// <param name="index">Index at which to get the date.</param>
+        /// <returns>Returns a <see cref="DateTime"/> object, or <c>null</c> if there is no object at that index.
+        /// </returns>
+        public DateTime? GetDateTime(int index)

Review comment:
       In the absence of being able to overload return types, this class (and `Date64Array`) follows the naming convention of [System.Convert](https://docs.microsoft.com/en-us/dotnet/api/system.convert?view=netcore-3.1), and mentions the return type in the method name.
   
   As such, the original `GetDate()` method is marked as _obsolete_.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] eerhardt commented on a change in pull request #7654: ARROW-8581: [C#] Accept and return DateTime from DateXXArray

Posted by GitBox <gi...@apache.org>.
eerhardt commented on a change in pull request #7654:
URL: https://github.com/apache/arrow/pull/7654#discussion_r457560950



##########
File path: csharp/src/Apache.Arrow/Arrays/Date64Array.cs
##########
@@ -15,56 +15,94 @@
 
 using Apache.Arrow.Types;
 using System;
-using System.Collections.Generic;
-using Apache.Arrow.Memory;
 
 namespace Apache.Arrow
 {
+    /// <summary>
+    /// The <see cref="Date64Array"/> class holds an array of dates in the <c>Date64</c> format, where each date is
+    /// stored as the number of milliseconds since the dawn of (UNIX) time, excluding leap seconds.
+    /// </summary>
     public class Date64Array: PrimitiveArray<long>
     {
+        private static readonly DateTime _epoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Unspecified);
+
         public Date64Array(
             ArrowBuffer valueBuffer, ArrowBuffer nullBitmapBuffer,
             int length, int nullCount, int offset)
             : this(new ArrayData(Date64Type.Default, length, nullCount, offset,
                 new[] { nullBitmapBuffer, valueBuffer }))
         { }
 
-        public class Builder : PrimitiveArrayBuilder<DateTimeOffset, long, Date64Array, Builder>
+        /// <summary>
+        /// The <see cref="Builder"/> class can be used to fluently build <see cref="Date64Array"/> objects.
+        /// </summary>
+        public class Builder : DateArrayBuilder<long, Date64Array, Builder>
         {
-            internal class DateBuilder: PrimitiveArrayBuilder<long, Date64Array, DateBuilder>
+            private class DateBuilder: PrimitiveArrayBuilder<long, Date64Array, DateBuilder>
             {
                 protected override Date64Array Build(
                     ArrowBuffer valueBuffer, ArrowBuffer nullBitmapBuffer,
                     int length, int nullCount, int offset) =>
                     new Date64Array(valueBuffer, nullBitmapBuffer, length, nullCount, offset);
-            } 
+            }
 
+            /// <summary>
+            /// Construct a new instance of the <see cref="Builder"/> class.
+            /// </summary>
             public Builder() : base(new DateBuilder()) { }
 
-            protected override long ConvertTo(DateTimeOffset value)
+            protected override long Convert(DateTime dateTime)
+            {
+                return (long)(dateTime - _epoch).TotalMilliseconds;
+            }
+
+            protected override long Convert(DateTimeOffset dateTimeOffset)
             {
-                return value.ToUnixTimeMilliseconds();
+                return (long)(dateTimeOffset.Date - _epoch).TotalMilliseconds;

Review comment:
       > However, I'm not sure whether it is such a good idea to make the same change in Date32Array. My personal opinion is that (dateTime - _epoch).TotalDays is more readable as a day count than having to make a trip into milliseconds and then convert again.
   
   This isn't about read-ability, but instead about correctness. The logic needs to take into account the Offset portion of the `DateTimeOffset`.
   
   The Unix Epoch is defined as `00:00:00 UTC Thursday, 1 January 1970`. At that specific time, half of the world was on the day `1 January 1970`, and the other half of the world was still on `31 December 1969`. I can have the same **moment in time** represented with 2 different DateTimeOffsets - one with a negative Offset, and one with a positive Offset. If used the proposed `Date32Array.Builder`, I would get 2 different days. This is not correct.
   
   For example, the following test should pass, but it doesn't:
   
   ```C#
               [Fact]
               public void RespectsOffset()
               {
                   // Arrange
                   var westernHemisphereTime = new DateTimeOffset(2020, 7, 20, 20, 0, 0, TimeSpan.FromHours(-5));
                   var easternHemisphereTime = new DateTimeOffset(2020, 7, 21, 10, 0, 0, TimeSpan.FromHours(9));
                   var array = new Date32Array.Builder()
                       .Append(westernHemisphereTime)
                       .Append(easternHemisphereTime)
                       .Build();
   
                   // Act
                   var actualFirst = array.GetDate(0);
                   var actualSecond = array.GetDate(1);
   
                   // Assert
                   Assert.NotNull(actualFirst);
                   Assert.NotNull(actualSecond);
                   Assert.Equal(actualFirst, actualSecond);
                   Assert.Equal(new DateTimeOffset(2020, 7, 21, 0, 0, 0, TimeSpan.Zero), actualFirst);
               }
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mr-smidge commented on a change in pull request #7654: ARROW-8581: [C#] Accept and return DateTime from DateXXArray

Posted by GitBox <gi...@apache.org>.
mr-smidge commented on a change in pull request #7654:
URL: https://github.com/apache/arrow/pull/7654#discussion_r468233789



##########
File path: csharp/src/Apache.Arrow/Arrays/Date64Array.cs
##########
@@ -15,56 +15,103 @@
 
 using Apache.Arrow.Types;
 using System;
-using System.Collections.Generic;
-using Apache.Arrow.Memory;
 
 namespace Apache.Arrow
 {
+    /// <summary>
+    /// The <see cref="Date64Array"/> class holds an array of dates in the <c>Date64</c> format, where each date is
+    /// stored as the number of milliseconds since the dawn of (UNIX) time, excluding leap seconds, in multiples of
+    /// 86400000.
+    /// </summary>
     public class Date64Array: PrimitiveArray<long>
     {
+        private const long MillisecondsPerDay = 86400000;
+
         public Date64Array(
             ArrowBuffer valueBuffer, ArrowBuffer nullBitmapBuffer,
             int length, int nullCount, int offset)
             : this(new ArrayData(Date64Type.Default, length, nullCount, offset,
                 new[] { nullBitmapBuffer, valueBuffer }))
         { }
 
-        public class Builder : PrimitiveArrayBuilder<DateTimeOffset, long, Date64Array, Builder>
+        /// <summary>
+        /// The <see cref="Builder"/> class can be used to fluently build <see cref="Date64Array"/> objects.
+        /// </summary>
+        public class Builder : DateArrayBuilder<long, Date64Array, Builder>
         {
-            internal class DateBuilder: PrimitiveArrayBuilder<long, Date64Array, DateBuilder>
+            private class DateBuilder: PrimitiveArrayBuilder<long, Date64Array, DateBuilder>
             {
                 protected override Date64Array Build(
                     ArrowBuffer valueBuffer, ArrowBuffer nullBitmapBuffer,
                     int length, int nullCount, int offset) =>
                     new Date64Array(valueBuffer, nullBitmapBuffer, length, nullCount, offset);
-            } 
+            }
 
+            /// <summary>
+            /// Construct a new instance of the <see cref="Builder"/> class.
+            /// </summary>
             public Builder() : base(new DateBuilder()) { }
 
-            protected override long ConvertTo(DateTimeOffset value)
+            protected override long Convert(DateTime dateTime)
+            {
+                var dateTimeOffset = new DateTimeOffset(
+                    DateTime.SpecifyKind(dateTime.Date, DateTimeKind.Unspecified),
+                    TimeSpan.Zero);
+                return dateTimeOffset.ToUnixTimeMilliseconds();
+            }
+
+            protected override long Convert(DateTimeOffset dateTimeOffset)
             {
-                return value.ToUnixTimeMilliseconds();
+                // The internal value stored for a DateTimeOffset can be thought of as the number of milliseconds,
+                // in multiples of 86400000, that have passed since the UNIX epoch.  It is not the same as what would
+                // result from encoding the date from the DateTimeOffset.Date property.
+                long millis = dateTimeOffset.ToUnixTimeMilliseconds();
+                long days = millis / MillisecondsPerDay;
+                return (millis < 0 ? days - 1 : days) * MillisecondsPerDay;

Review comment:
       From what I can tell, the Date64 builder (neither in this PR nor in master) has never had an `Append(long)` method.  One could be added, but that's probably beyond the scope of this PR.  I can see the rationale for not doing a multiple-of-86400000 check on the equivalent method in the C++/Java APIs: the user knows they are building Date64 values, and if they are sophisticated enough to be providing the raw underlying 64-bit type, it seems reasonable to trust them to get it right.
   
   However, if the user is providing a conventional Date-like type, like `DateTime` or `DateTimeOffset`, my view is that the API ought to make it easy to do the right thing and difficult to do the wrong thing:
   
   * I believe we're definitely in agreement on what the Right Thing (TM) is for `DateTime`: the code looks at the year/month/day on it and considers that to be the date.  No ambiguity here.
   * We've already found conflicting viewpoints on how to interpret `DateTimeOffset` as a date, based on whether it's treated as an instant in time (take `DateTimeOffset.UtcDateTime.Date`) or a "zoned" date and time (take `DateTimeOffset.Date`).
   
   Regardless of how a date is interpreted, if the Arrow API offers a method accepting `DateTimeOffset` and converting into `Date64`, I feel that it should _never_ violate the spec when doing so:
   
   > Milliseconds (64 bits) indicating UNIX time elapsed since the epoch (no leap seconds), where the values are evenly divisible by 86400000
   
   Keeping the code as `value.ToUnixTimeMilliseconds()` makes it very easy for a user to violate the spec by accident, which is not good.
   
   Having spent a fair bit of time discusing this, I am tempted to _drop_ (or mark obsolete) all builder methods accepting `DateTimeOffset` and only accept `DateTime`, so there is no chance of misinterpretation: it would be up to the caller in possession of a `DateTimeOffset` to call `.Date` or `.UtcDateTime` as per their requirements - similar to something you advocated in an earlier comment in fact!  I'd be keen to hear your thoughts on this...




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mr-smidge commented on a change in pull request #7654: ARROW-8581: [C#] Accept and return DateTime from DateXXArray

Posted by GitBox <gi...@apache.org>.
mr-smidge commented on a change in pull request #7654:
URL: https://github.com/apache/arrow/pull/7654#discussion_r457739938



##########
File path: csharp/src/Apache.Arrow/Arrays/Date64Array.cs
##########
@@ -15,56 +15,94 @@
 
 using Apache.Arrow.Types;
 using System;
-using System.Collections.Generic;
-using Apache.Arrow.Memory;
 
 namespace Apache.Arrow
 {
+    /// <summary>
+    /// The <see cref="Date64Array"/> class holds an array of dates in the <c>Date64</c> format, where each date is
+    /// stored as the number of milliseconds since the dawn of (UNIX) time, excluding leap seconds.
+    /// </summary>
     public class Date64Array: PrimitiveArray<long>
     {
+        private static readonly DateTime _epoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Unspecified);
+
         public Date64Array(
             ArrowBuffer valueBuffer, ArrowBuffer nullBitmapBuffer,
             int length, int nullCount, int offset)
             : this(new ArrayData(Date64Type.Default, length, nullCount, offset,
                 new[] { nullBitmapBuffer, valueBuffer }))
         { }
 
-        public class Builder : PrimitiveArrayBuilder<DateTimeOffset, long, Date64Array, Builder>
+        /// <summary>
+        /// The <see cref="Builder"/> class can be used to fluently build <see cref="Date64Array"/> objects.
+        /// </summary>
+        public class Builder : DateArrayBuilder<long, Date64Array, Builder>
         {
-            internal class DateBuilder: PrimitiveArrayBuilder<long, Date64Array, DateBuilder>
+            private class DateBuilder: PrimitiveArrayBuilder<long, Date64Array, DateBuilder>
             {
                 protected override Date64Array Build(
                     ArrowBuffer valueBuffer, ArrowBuffer nullBitmapBuffer,
                     int length, int nullCount, int offset) =>
                     new Date64Array(valueBuffer, nullBitmapBuffer, length, nullCount, offset);
-            } 
+            }
 
+            /// <summary>
+            /// Construct a new instance of the <see cref="Builder"/> class.
+            /// </summary>
             public Builder() : base(new DateBuilder()) { }
 
-            protected override long ConvertTo(DateTimeOffset value)
+            protected override long Convert(DateTime dateTime)
+            {
+                return (long)(dateTime - _epoch).TotalMilliseconds;
+            }
+
+            protected override long Convert(DateTimeOffset dateTimeOffset)
             {
-                return value.ToUnixTimeMilliseconds();
+                return (long)(dateTimeOffset.Date - _epoch).TotalMilliseconds;

Review comment:
       > The Unix Epoch is defined as 00:00:00 UTC Thursday, 1 January 1970.
   
   Apologies, I've been too colloquial with use of the word epoch when talking about dates.  You are right that the UNIX epoch is a point in time, not a date.
   
   However, `Date32` and `Date64` are measures of _dates_.  As the standard C# library does not have a dedicated type for dates, we have to infer a date from either `DateTime` or `DateTimeOffset`.  Dates and instants in time aren't implicitly convertible in the natural sense because of timezones: there has to be some logic in the conversion.
   
   >  The logic needs to take into account the Offset portion of the DateTimeOffset.
   > I can have the same moment in time represented with 2 different DateTimeOffsets
   
   Absolutely right - the same point in time can lie within two different calendar dates depending on one's timezone.  At the time I'm writing this it's just gone midnight in the UK, so I think it's 21st July right now.  But if you're on CDT, it's 20th July for you at the very same instant.
   
   For clarity: I think the most natural way to infer a meaningful date from `DateTime`/`DateTimeOffset` is to call the `.Date` property, i.e. I would expect the time and offset to be _ignored_ when considering the date.
   
   > I can have the same moment in time represented with 2 different DateTimeOffsets - one with a negative Offset, and one with a positive Offset. If used the proposed Date32Array.Builder, I would get 2 different days. This is not correct.
   
   I disagree with the statement that this is not correct, but I want to understand your viewpoint better...
   
   From looking at your test case, do you mean to suggest that you believe a conversion to UTC first and then taking the date from the UTC date/time is the most natural way to infer a date from a `DateTimeOffset`?  This is what happens by doing `(int)(dto.ToUnixTimeMilliseconds() / MillisecondsPerDay)`, but it's an interpretation of dates from times that I've never seen in any other project!
   
   If you take the "simplest approach" and call `.Date` (or its equivalent in NodaTime), time and offset are always ignored:
   
   ```csharp
   // Using System types
   var westernHemisphereTime = new DateTimeOffset(2020, 7, 20, 20, 0, 0, TimeSpan.FromHours(-5));
   var easternHemisphereTime = new DateTimeOffset(2020, 7, 21, 10, 0, 0, TimeSpan.FromHours(9));
   Assert.NotEqual(westernHemisphereTime.Date == easternHemisphereTime.Date);
   Assert.Equal(westernHemisphereTime.UtcTicks == easternHemisphereTime.UtcTicks);
   
   // Using NodaTime types
   var westernHemisphereZtd = new LocalDateTime(2020, 7, 20, 20, 0, 0).InZoneStrictly(DateTimeZoneProviders.Tzdb["America/Chicago"]);
   var easternHemisphereZtd = new LocalDateTime(2020, 7, 21, 10, 0, 0).InZoneStrictly(DateTimeZoneProviders.Tzdb["Asia/Tokyo"]);
   Assert.NotEqual(westernHemisphereZtd.Date == easternHemisphereZtd.Date);
   Assert.Equal(westernHemisphereZtd.ToInstant() == easternHemisphereZtd.ToInstant());
   ```
   
   Separately, I've realised that `Date64Array` had an existing bug that I would not be fixing if I were to retain the previous code: the previous code would not truncate non-date parts and hence could store values that were not a multiple of 8640000, which is a violation of the spec.
   
   I think the best thing to do is to put this PR back into draft mode, implement a _full set_ of unit tests for both `Date32` and `Date64` (and their respective builders), and hence ensure we have full clarity on expected behaviour.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mr-smidge commented on a change in pull request #7654: ARROW-8581: [C#] Accept and return DateTime from DateXXArray

Posted by GitBox <gi...@apache.org>.
mr-smidge commented on a change in pull request #7654:
URL: https://github.com/apache/arrow/pull/7654#discussion_r456563263



##########
File path: csharp/src/Apache.Arrow/Arrays/DelegatingArrayBuilder.cs
##########
@@ -0,0 +1,96 @@
+//------------------------------------------------------------------------------
+// <copyright file="DelegatingArrayBuilder.cs" company="Jetstone Asset Management LLP">
+// Copyright (C) Jetstone Asset Management LLP.  All rights reserved.
+// Unauthorised copying of this file, via any medium, is strictly prohibited.
+// Proprietary and confidential
+// </copyright>
+// <author>Adam Szmigin</author>
+//------------------------------------------------------------------------------
+
+using System;
+using Apache.Arrow.Memory;
+
+namespace Apache.Arrow
+{
+    /// <summary>
+    /// The <see cref="DelegatingArrayBuilder{T,TArray,TBuilder}"/> class can be used as the base for any array builder
+    /// that needs to delegate most of its functionality to an inner array builder.
+    /// </summary>
+    /// <remarks>
+    /// The typical use case is when an array builder may accept a number of different types as input, but which are
+    /// all internally converted to a single type for assembly into an array.
+    /// </remarks>
+    /// <typeparam name="T">Type of item accepted by inner array builder.</typeparam>
+    /// <typeparam name="TArray">Type of array produced by this (and the inner) builder.</typeparam>
+    /// <typeparam name="TBuilder">Type of builder (see Curiously-Recurring Template Pattern).</typeparam>
+    public abstract class DelegatingArrayBuilder<T, TArray, TBuilder> : IArrowArrayBuilder<TArray, TBuilder>

Review comment:
       If I were to refactor `TimestampArray` today, I would use `DelegatingArrayBuilder` as the builder's base class, adding methods to accept `DateTimeOffset` (and probably `NodaTime.Instant` if we use NodaTime for the TZDB).
   
   I don't mind which direction we take for this PR, but have a minor preference to keep `DelegatingArrayBuilder` (because we know that we will need to refactor `TimestampArray` in order for the C# codebase to become Production-ready).
   
   I've raised https://issues.apache.org/jira/browse/ARROW-9515 for the enhancements to `TimestampArray` in future.

##########
File path: csharp/src/Apache.Arrow/Arrays/DelegatingArrayBuilder.cs
##########
@@ -0,0 +1,96 @@
+//------------------------------------------------------------------------------
+// <copyright file="DelegatingArrayBuilder.cs" company="Jetstone Asset Management LLP">
+// Copyright (C) Jetstone Asset Management LLP.  All rights reserved.
+// Unauthorised copying of this file, via any medium, is strictly prohibited.
+// Proprietary and confidential
+// </copyright>
+// <author>Adam Szmigin</author>
+//------------------------------------------------------------------------------
+
+using System;
+using Apache.Arrow.Memory;
+
+namespace Apache.Arrow
+{
+    /// <summary>
+    /// The <see cref="DelegatingArrayBuilder{T,TArray,TBuilder}"/> class can be used as the base for any array builder
+    /// that needs to delegate most of its functionality to an inner array builder.
+    /// </summary>
+    /// <remarks>
+    /// The typical use case is when an array builder may accept a number of different types as input, but which are
+    /// all internally converted to a single type for assembly into an array.
+    /// </remarks>
+    /// <typeparam name="T">Type of item accepted by inner array builder.</typeparam>
+    /// <typeparam name="TArray">Type of array produced by this (and the inner) builder.</typeparam>
+    /// <typeparam name="TBuilder">Type of builder (see Curiously-Recurring Template Pattern).</typeparam>
+    public abstract class DelegatingArrayBuilder<T, TArray, TBuilder> : IArrowArrayBuilder<TArray, TBuilder>

Review comment:
       If I were to refactor `TimestampArray` today, I would use `DelegatingArrayBuilder` as the builder's base class, adding methods to accept `DateTimeOffset` (and probably `NodaTime.Instant` if we use NodaTime for the TZDB).
   
   I don't mind which direction we take for this PR, but have a preference to keep `DelegatingArrayBuilder` (because we know that we will need to refactor `TimestampArray` in order for the C# codebase to become Production-ready).
   
   I've raised https://issues.apache.org/jira/browse/ARROW-9515 for the enhancements to `TimestampArray` in future.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org