You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/12/21 21:55:44 UTC

[GitHub] [arrow] JabariBooker opened a new pull request #12014: ARROW-10924: [C++] Validate temporal data in ValidateArrayFull

JabariBooker opened a new pull request #12014:
URL: https://github.com/apache/arrow/pull/12014


   Validate that temporal data in arrays meet the restrictions of their data type specifications.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot commented on pull request #12014: ARROW-10924: [C++] Validate temporal data in ValidateArrayFull

Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #12014:
URL: https://github.com/apache/arrow/pull/12014#issuecomment-1028017983


   Benchmark runs are scheduled for baseline = 9b53235611a1cda81d5fbe801503acf45f5bf74e and contender = c715bebbd89089f385c9996560866da23ea1ddda. c715bebbd89089f385c9996560866da23ea1ddda is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Scheduled] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/3d98628293494948b70a2de31fbfbf68...4ec74154e4a440a2a7410cb74cf581c9/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/d1a59396625248a3935bc5ad2d556e7a...2d0495c8ce394d50b20505b928336533/)
   [Scheduled] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/f2326e71b90e4299a2e580d26b3cdd9a...71d8e5ea695a4903a81e2c6ece99f09a/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce commented on pull request #12014: ARROW-10924: [C++] Validate temporal data in ValidateArrayFull

Posted by GitBox <gi...@apache.org>.
edponce commented on pull request #12014:
URL: https://github.com/apache/arrow/pull/12014#issuecomment-1006674558


   @pitrou So there is a limit for `Date64Type` but not for `Date32Type`? I now understand that `TimestampType` does not has limits.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #12014: ARROW-10924: [C++] Validate temporal data in ValidateArrayFull

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #12014:
URL: https://github.com/apache/arrow/pull/12014#issuecomment-1003124230


   @edponce There are no limits on `Date32Type` and `TimestampType` AFAICT.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] JabariBooker commented on pull request #12014: ARROW-10924: [C++] Validate temporal data in ValidateArrayFull

Posted by GitBox <gi...@apache.org>.
JabariBooker commented on pull request #12014:
URL: https://github.com/apache/arrow/pull/12014#issuecomment-1024358953


   @lidavidm @edponce @pitrou There are still failing integration tests (all are also failing due to invalid values for date64). The data involved in these tests seem to be generated by some mechanism I'm unaware of.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] JabariBooker commented on a change in pull request #12014: ARROW-10924: [C++] Validate temporal data in ValidateArrayFull

Posted by GitBox <gi...@apache.org>.
JabariBooker commented on a change in pull request #12014:
URL: https://github.com/apache/arrow/pull/12014#discussion_r794894926



##########
File path: cpp/src/arrow/testing/random.cc
##########
@@ -772,12 +789,42 @@ std::shared_ptr<Array> RandomArrayGenerator::ArrayOf(const Field& field, int64_t
     }
 
       GENERATE_INTEGRAL_CASE_VIEW(Int32Type, Date32Type);
-      GENERATE_INTEGRAL_CASE_VIEW(Int64Type, Date64Type);
       GENERATE_INTEGRAL_CASE_VIEW(Int64Type, TimestampType);
-      GENERATE_INTEGRAL_CASE_VIEW(Int32Type, Time32Type);
-      GENERATE_INTEGRAL_CASE_VIEW(Int64Type, Time64Type);
       GENERATE_INTEGRAL_CASE_VIEW(Int32Type, MonthIntervalType);
 
+    case Type::type::DATE64: {
+      using c_type = typename Date64Type::c_type;
+      constexpr c_type kFullDayMillis = 1000 * 60 * 60 * 24;
+      constexpr c_type min_value = std::numeric_limits<c_type>::min() / kFullDayMillis;
+      constexpr c_type max_value = std::numeric_limits<c_type>::max() / kFullDayMillis;

Review comment:
       As far as I understand, negative values for Date64Type should be permitted.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm closed pull request #12014: ARROW-10924: [C++] Validate temporal data in ValidateArrayFull

Posted by GitBox <gi...@apache.org>.
lidavidm closed pull request #12014:
URL: https://github.com/apache/arrow/pull/12014


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] JabariBooker commented on a change in pull request #12014: ARROW-10924: [C++] Validate temporal data in ValidateArrayFull

Posted by GitBox <gi...@apache.org>.
JabariBooker commented on a change in pull request #12014:
URL: https://github.com/apache/arrow/pull/12014#discussion_r794900959



##########
File path: cpp/src/arrow/testing/random.cc
##########
@@ -182,6 +188,9 @@ static std::shared_ptr<NumericArray<ArrowType>> GenerateNumericArray(int64_t siz
 
   buffers[1] = *AllocateBuffer(sizeof(CType) * size);
   options.GenerateData(buffers[1]->mutable_data(), size);
+  if (std::is_same<ArrowType, Date64Type>::value) {
+    GenerateFullDayMillisNoNan(buffers[1]->mutable_data(), size);
+  }

Review comment:
       This is intentional. Since we need to generate multiples of 86400000 for `Date64Type`, random numbers are generated in `GenerateData` and ultimately multiplied by 86400000 in [`GenerateFullDayMillisNoNan`](https://github.com/JabariBooker/arrow/blob/ARROW-10924/cpp/src/arrow/testing/random.cc#L172-L176). The behavior of `GenerateFullDayMillisNoNan` is separated from `GenerateOptions` since this class doesn't know the arrow type, only the underlying C type. However, behavior of `GenerateFullDayMillisNoNan` could be moved into the above `if` statement if needed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #12014: ARROW-10924: [C++] Validate temporal data in ValidateArrayFull

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #12014:
URL: https://github.com/apache/arrow/pull/12014#issuecomment-1025599895


   @JabariBooker Integration data is generated in Python by Archery here: https://github.com/apache/arrow/blob/master/dev/archery/archery/integration/datagen.py#L196
   
   You can find more info about the JSON integration format here, and how to run the integration tests locally, here: https://arrow.apache.org/docs/format/Integration.html
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on pull request #12014: ARROW-10924: [C++] Validate temporal data in ValidateArrayFull

Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #12014:
URL: https://github.com/apache/arrow/pull/12014#issuecomment-1006699051


   Date32Type's unit is days and so there's nothing to check, but Date64Type's unit is milliseconds, hence it doesn't have a limit per se, but it must be a multiple of 86400000 (=1 day in milliseconds).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] JabariBooker commented on a change in pull request #12014: ARROW-10924: [C++] Validate temporal data in ValidateArrayFull

Posted by GitBox <gi...@apache.org>.
JabariBooker commented on a change in pull request #12014:
URL: https://github.com/apache/arrow/pull/12014#discussion_r794900959



##########
File path: cpp/src/arrow/testing/random.cc
##########
@@ -182,6 +188,9 @@ static std::shared_ptr<NumericArray<ArrowType>> GenerateNumericArray(int64_t siz
 
   buffers[1] = *AllocateBuffer(sizeof(CType) * size);
   options.GenerateData(buffers[1]->mutable_data(), size);
+  if (std::is_same<ArrowType, Date64Type>::value) {
+    GenerateFullDayMillisNoNan(buffers[1]->mutable_data(), size);
+  }

Review comment:
       This is intentional. Since we need to generate multiples of 86400000 for `Date64Type`, random numbers are generated in `GenerateData` and ultimately multiplied by 86400000 in [`GenerateFullDayMillisNoNan`](https://github.com/JabariBooker/arrow/blob/ARROW-10924/cpp/src/arrow/testing/random.cc#L172). The behavior of `GenerateFullDayMillisNoNan` is separated from `GenerateOptions` since this class doesn't know the arrow type, only the underlying C type. However, behavior of `GenerateFullDayMillisNoNan` could be moved into the above `if` statement if needed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce commented on a change in pull request #12014: ARROW-10924: [C++] Validate temporal data in ValidateArrayFull

Posted by GitBox <gi...@apache.org>.
edponce commented on a change in pull request #12014:
URL: https://github.com/apache/arrow/pull/12014#discussion_r794749664



##########
File path: cpp/src/arrow/array/array_test.cc
##########
@@ -534,7 +534,7 @@ static ScalarVector GetScalars() {
       std::make_shared<UInt64Scalar>(3),
       std::make_shared<DoubleScalar>(3.0),
       std::make_shared<Date32Scalar>(10),
-      std::make_shared<Date64Scalar>(11),
+      std::make_shared<Date64Scalar>(864000000),

Review comment:
       Is this literal meant to be an invalid value?
   There is an extra 0 compared to `1000 x 60 x 60 x 24 = 86400000`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce commented on a change in pull request #12014: ARROW-10924: [C++] Validate temporal data in ValidateArrayFull

Posted by GitBox <gi...@apache.org>.
edponce commented on a change in pull request #12014:
URL: https://github.com/apache/arrow/pull/12014#discussion_r794797795



##########
File path: cpp/src/arrow/testing/random.cc
##########
@@ -182,6 +188,9 @@ static std::shared_ptr<NumericArray<ArrowType>> GenerateNumericArray(int64_t siz
 
   buffers[1] = *AllocateBuffer(sizeof(CType) * size);
   options.GenerateData(buffers[1]->mutable_data(), size);
+  if (std::is_same<ArrowType, Date64Type>::value) {
+    GenerateFullDayMillisNoNan(buffers[1]->mutable_data(), size);
+  }

Review comment:
       This should be an if-else. `buffers[1]->mutable_data()` is modified twice if `Date64Type`.

##########
File path: cpp/src/arrow/testing/random.cc
##########
@@ -772,12 +789,42 @@ std::shared_ptr<Array> RandomArrayGenerator::ArrayOf(const Field& field, int64_t
     }
 
       GENERATE_INTEGRAL_CASE_VIEW(Int32Type, Date32Type);
-      GENERATE_INTEGRAL_CASE_VIEW(Int64Type, Date64Type);
       GENERATE_INTEGRAL_CASE_VIEW(Int64Type, TimestampType);
-      GENERATE_INTEGRAL_CASE_VIEW(Int32Type, Time32Type);
-      GENERATE_INTEGRAL_CASE_VIEW(Int64Type, Time64Type);
       GENERATE_INTEGRAL_CASE_VIEW(Int32Type, MonthIntervalType);
 
+    case Type::type::DATE64: {
+      using c_type = typename Date64Type::c_type;
+      constexpr c_type kFullDayMillis = 1000 * 60 * 60 * 24;
+      constexpr c_type min_value = std::numeric_limits<c_type>::min() / kFullDayMillis;
+      constexpr c_type max_value = std::numeric_limits<c_type>::max() / kFullDayMillis;

Review comment:
       `std::numeric_limits<int64_t>::min()` would allow negative values, so maybe set to 0.
   Although, there are tests with negative `Date64Type` values.

##########
File path: cpp/src/arrow/testing/random.cc
##########
@@ -182,6 +188,9 @@ static std::shared_ptr<NumericArray<ArrowType>> GenerateNumericArray(int64_t siz
 
   buffers[1] = *AllocateBuffer(sizeof(CType) * size);
   options.GenerateData(buffers[1]->mutable_data(), size);
+  if (std::is_same<ArrowType, Date64Type>::value) {
+    GenerateFullDayMillisNoNan(buffers[1]->mutable_data(), size);
+  }

Review comment:
       I think a more suitable place for this type-specific generation of numeric arrays, is in [`GenerateTypeDataNoNan()` from `GenerateOptions`](https://github.com/apache/arrow/pull/12014/files#diff-a128a62d4300b880f0c4652a53eaae67f4ff3fa0a6bd3ad8d6bc15e2930f3e6aR97). Make [`GenerateFullDayMillisNoNan`](https://github.com/apache/arrow/pull/12014/files#diff-a128a62d4300b880f0c4652a53eaae67f4ff3fa0a6bd3ad8d6bc15e2930f3e6aR172) as a private member to `GenerateOptions` and perform the `if-else` check in `GenerateTypedDataNoNan()`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] JabariBooker commented on a change in pull request #12014: ARROW-10924: [C++] Validate temporal data in ValidateArrayFull

Posted by GitBox <gi...@apache.org>.
JabariBooker commented on a change in pull request #12014:
URL: https://github.com/apache/arrow/pull/12014#discussion_r777676579



##########
File path: cpp/src/arrow/array/validate.cc
##########
@@ -166,6 +166,80 @@ struct ValidateArrayImpl {
     return Status::OK();
   }
 
+  Status Visit(const Date64Type& type) {
+    RETURN_NOT_OK(ValidateFixedWidthBuffers());
+
+    using c_type = typename Date64Type::c_type;
+    if (full_validation) {
+      constexpr c_type kFullDay = 1000 * 60 * 60 * 24;
+      return VisitArrayDataInline<Date64Type>(
+          data,
+          [&](c_type date) {
+            if (date % kFullDay != 0) {
+              return Status::Invalid(type, date,
+                                     "ms does not represent a whole number of days");

Review comment:
       You are correct. I was referring to after the addition of whitespace.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] JabariBooker commented on a change in pull request #12014: ARROW-10924: [C++] Validate temporal data in ValidateArrayFull

Posted by GitBox <gi...@apache.org>.
JabariBooker commented on a change in pull request #12014:
URL: https://github.com/apache/arrow/pull/12014#discussion_r777631342



##########
File path: cpp/src/arrow/array/validate.cc
##########
@@ -166,6 +166,80 @@ struct ValidateArrayImpl {
     return Status::OK();
   }
 
+  Status Visit(const Date64Type& type) {
+    RETURN_NOT_OK(ValidateFixedWidthBuffers());
+
+    using c_type = typename Date64Type::c_type;
+    if (full_validation) {
+      constexpr c_type kFullDay = 1000 * 60 * 60 * 24;
+      return VisitArrayDataInline<Date64Type>(
+          data,
+          [&](c_type date) {
+            if (date % kFullDay != 0) {
+              return Status::Invalid(type, date,
+                                     "ms does not represent a whole number of days");

Review comment:
       The status would read something like: "Date64Type 1234ms does not represent a whole number of days" which should be grammatically correct. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] JabariBooker commented on a change in pull request #12014: ARROW-10924: [C++] Validate temporal data in ValidateArrayFull

Posted by GitBox <gi...@apache.org>.
JabariBooker commented on a change in pull request #12014:
URL: https://github.com/apache/arrow/pull/12014#discussion_r794752401



##########
File path: cpp/src/arrow/array/array_test.cc
##########
@@ -534,7 +534,7 @@ static ScalarVector GetScalars() {
       std::make_shared<UInt64Scalar>(3),
       std::make_shared<DoubleScalar>(3.0),
       std::make_shared<Date32Scalar>(10),
-      std::make_shared<Date64Scalar>(11),
+      std::make_shared<Date64Scalar>(864000000),

Review comment:
       While I did not mean for it have an extra 0, it is still a valid value. So long as values for date64 are multiples of 86400000, they are valid (in this case the number just represents 10 whole days in milliseconds rather than 1 day).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #12014: ARROW-10924: [C++] Validate temporal data in ValidateArrayFull

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12014:
URL: https://github.com/apache/arrow/pull/12014#issuecomment-1028017983


   Benchmark runs are scheduled for baseline = 9b53235611a1cda81d5fbe801503acf45f5bf74e and contender = c715bebbd89089f385c9996560866da23ea1ddda. c715bebbd89089f385c9996560866da23ea1ddda is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/3d98628293494948b70a2de31fbfbf68...4ec74154e4a440a2a7410cb74cf581c9/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/d1a59396625248a3935bc5ad2d556e7a...2d0495c8ce394d50b20505b928336533/)
   [Scheduled] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/f2326e71b90e4299a2e580d26b3cdd9a...71d8e5ea695a4903a81e2c6ece99f09a/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #12014: ARROW-10924: [C++] Validate temporal data in ValidateArrayFull

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12014:
URL: https://github.com/apache/arrow/pull/12014#issuecomment-999119473






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce edited a comment on pull request #12014: ARROW-10924: [C++] Validate temporal data in ValidateArrayFull

Posted by GitBox <gi...@apache.org>.
edponce edited a comment on pull request #12014:
URL: https://github.com/apache/arrow/pull/12014#issuecomment-1006674558


   @pitrou So there is a limit for `Date64Type` but not for `Date32Type`? I now understand that `TimestampType` does not have limits.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #12014: ARROW-10924: [C++] Validate temporal data in ValidateArrayFull

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #12014:
URL: https://github.com/apache/arrow/pull/12014#discussion_r777648679



##########
File path: cpp/src/arrow/array/validate.cc
##########
@@ -166,6 +166,80 @@ struct ValidateArrayImpl {
     return Status::OK();
   }
 
+  Status Visit(const Date64Type& type) {
+    RETURN_NOT_OK(ValidateFixedWidthBuffers());
+
+    using c_type = typename Date64Type::c_type;
+    if (full_validation) {
+      constexpr c_type kFullDay = 1000 * 60 * 60 * 24;
+      return VisitArrayDataInline<Date64Type>(
+          data,
+          [&](c_type date) {
+            if (date % kFullDay != 0) {
+              return Status::Invalid(type, date,
+                                     "ms does not represent a whole number of days");

Review comment:
       I _think_ it will be "Date64Type1234ms does not represent a whole number of days" (note the missing space: whitespace is not implicitly inserted between elements).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #12014: ARROW-10924: [C++] Validate temporal data in ValidateArrayFull

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12014:
URL: https://github.com/apache/arrow/pull/12014#issuecomment-1028017983


   Benchmark runs are scheduled for baseline = 9b53235611a1cda81d5fbe801503acf45f5bf74e and contender = c715bebbd89089f385c9996560866da23ea1ddda. c715bebbd89089f385c9996560866da23ea1ddda is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/3d98628293494948b70a2de31fbfbf68...4ec74154e4a440a2a7410cb74cf581c9/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/d1a59396625248a3935bc5ad2d556e7a...2d0495c8ce394d50b20505b928336533/)
   [Finished :arrow_down:0.13% :arrow_up:0.0%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/f2326e71b90e4299a2e580d26b3cdd9a...71d8e5ea695a4903a81e2c6ece99f09a/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on pull request #12014: ARROW-10924: [C++] Validate temporal data in ValidateArrayFull

Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #12014:
URL: https://github.com/apache/arrow/pull/12014#issuecomment-1009961789


   FYI, that test failure in `GroupBy.MinMaxTypes` is here: https://github.com/apache/arrow/blob/d88e23273fd4eb7945a5fb94cfdb6315f412ea83/cpp/src/arrow/compute/kernels/hash_aggregate_test.cc#L1614
   
   The test uses one set of data to go through several data types, so you could just update the hardcoded data there too.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #12014: ARROW-10924: [C++] Validate temporal data in ValidateArrayFull

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #12014:
URL: https://github.com/apache/arrow/pull/12014#discussion_r773897638



##########
File path: cpp/src/arrow/array/validate.cc
##########
@@ -166,6 +166,86 @@ struct ValidateArrayImpl {
     return Status::OK();
   }
 
+  Status Visit(const Date64Type& type) {
+    // check that data is divisible by 8.64e7 (= 1000ms * 60s * 60mins * 24hrs)
+    RETURN_NOT_OK(ValidateFixedWidthBuffers());
+    
+    using c_type = typename Date64Type::c_type;
+    if (full_validation) {
+      c_type fullDay = 1000 * 60 * 60 * 24;
+      return VisitArrayDataInline<Date64Type>(
+          data,
+          [&](c_type date) {
+            if(date % fullDay != 0) {
+              return Status::Invalid(type, date, "ms does not represent a whole number of days");
+            }
+            return Status::OK();
+          },
+          []() { return Status::OK(); });
+    }
+    return Status::OK();
+  }
+
+  Status Visit(const Time32Type& type) {
+    // check unit
+    // if unit is s => data must be within [0, 8.64e4)
+    // if unit is ms => data must be within [0, 8.64e7)
+    RETURN_NOT_OK(ValidateFixedWidthBuffers());
+    
+    using c_type = typename Time32Type::c_type;
+    if (full_validation) {
+      c_type fullDay_s = 60 * 60 * 24;
+      c_type fullDay_ms = fullDay_s * 1000;
+      return VisitArrayDataInline<Time32Type>(
+          data,
+          [&](c_type time) {
+            if(type.unit() == TimeUnit::SECOND && (time < 0 || time >= fullDay_s)) {
+              return Status::Invalid(type, " ", time,
+                                     "s does not fit within the acceptable range of [0, ",
+                                     fullDay_s, ") s");
+            }
+            if(type.unit() == TimeUnit::MILLI && (time < 0 || time >= fullDay_ms)) {
+              return Status::Invalid(type, " ", time,
+                                     "ms does not fit within the acceptable range of [0, ",
+                                     fullDay_ms, ") ms");
+            }
+            return Status::OK();
+          },
+          []() { return Status::OK(); });
+    }
+    return Status::OK();
+  }
+
+  Status Visit(const Time64Type& type) {
+    // check unit
+    // if unit is us => data must be within [0, 8.64e10)
+    // if unit is ns => data must be within [0, 8.64e13)
+    RETURN_NOT_OK(ValidateFixedWidthBuffers());
+
+    using c_type = typename Time64Type::c_type;
+    if (full_validation) {
+      c_type fullDay_us = 1000000ll * 60 * 60 * 24;

Review comment:
       nit: use the `L` suffix over `l` so it's a little easier to distinguish

##########
File path: cpp/src/arrow/array/validate.cc
##########
@@ -166,6 +166,86 @@ struct ValidateArrayImpl {
     return Status::OK();
   }
 
+  Status Visit(const Date64Type& type) {
+    // check that data is divisible by 8.64e7 (= 1000ms * 60s * 60mins * 24hrs)
+    RETURN_NOT_OK(ValidateFixedWidthBuffers());
+    
+    using c_type = typename Date64Type::c_type;
+    if (full_validation) {
+      c_type fullDay = 1000 * 60 * 60 * 24;
+      return VisitArrayDataInline<Date64Type>(
+          data,
+          [&](c_type date) {
+            if(date % fullDay != 0) {
+              return Status::Invalid(type, date, "ms does not represent a whole number of days");
+            }
+            return Status::OK();
+          },
+          []() { return Status::OK(); });
+    }
+    return Status::OK();
+  }
+
+  Status Visit(const Time32Type& type) {
+    // check unit
+    // if unit is s => data must be within [0, 8.64e4)
+    // if unit is ms => data must be within [0, 8.64e7)
+    RETURN_NOT_OK(ValidateFixedWidthBuffers());
+    
+    using c_type = typename Time32Type::c_type;
+    if (full_validation) {
+      c_type fullDay_s = 60 * 60 * 24;
+      c_type fullDay_ms = fullDay_s * 1000;

Review comment:
       Note that for regular variable names, `snake_case` is preferred.

##########
File path: cpp/src/arrow/array/validate.cc
##########
@@ -166,6 +166,86 @@ struct ValidateArrayImpl {
     return Status::OK();
   }
 
+  Status Visit(const Date64Type& type) {
+    // check that data is divisible by 8.64e7 (= 1000ms * 60s * 60mins * 24hrs)
+    RETURN_NOT_OK(ValidateFixedWidthBuffers());
+    
+    using c_type = typename Date64Type::c_type;
+    if (full_validation) {
+      c_type fullDay = 1000 * 60 * 60 * 24;

Review comment:
       `constexpr c_type kFullDayMillis`? (Constants in the [Google style guide](https://arrow.apache.org/docs/developers/cpp/development.html#code-style-linting-and-ci) use [`kCamelCase`](https://google.github.io/styleguide/cppguide.html#Constant_Names).)

##########
File path: cpp/src/arrow/array/validate.cc
##########
@@ -166,6 +166,86 @@ struct ValidateArrayImpl {
     return Status::OK();
   }
 
+  Status Visit(const Date64Type& type) {
+    // check that data is divisible by 8.64e7 (= 1000ms * 60s * 60mins * 24hrs)
+    RETURN_NOT_OK(ValidateFixedWidthBuffers());
+    
+    using c_type = typename Date64Type::c_type;
+    if (full_validation) {
+      c_type fullDay = 1000 * 60 * 60 * 24;
+      return VisitArrayDataInline<Date64Type>(
+          data,
+          [&](c_type date) {
+            if(date % fullDay != 0) {
+              return Status::Invalid(type, date, "ms does not represent a whole number of days");
+            }
+            return Status::OK();
+          },
+          []() { return Status::OK(); });
+    }
+    return Status::OK();
+  }
+
+  Status Visit(const Time32Type& type) {
+    // check unit
+    // if unit is s => data must be within [0, 8.64e4)
+    // if unit is ms => data must be within [0, 8.64e7)
+    RETURN_NOT_OK(ValidateFixedWidthBuffers());
+    
+    using c_type = typename Time32Type::c_type;
+    if (full_validation) {
+      c_type fullDay_s = 60 * 60 * 24;
+      c_type fullDay_ms = fullDay_s * 1000;

Review comment:
       Use constexpr here too.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #12014: ARROW-10924: [C++] Validate temporal data in ValidateArrayFull

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12014:
URL: https://github.com/apache/arrow/pull/12014#issuecomment-1028017983


   Benchmark runs are scheduled for baseline = 9b53235611a1cda81d5fbe801503acf45f5bf74e and contender = c715bebbd89089f385c9996560866da23ea1ddda. c715bebbd89089f385c9996560866da23ea1ddda is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/3d98628293494948b70a2de31fbfbf68...4ec74154e4a440a2a7410cb74cf581c9/)
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/d1a59396625248a3935bc5ad2d556e7a...2d0495c8ce394d50b20505b928336533/)
   [Finished :arrow_down:0.13% :arrow_up:0.0%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/f2326e71b90e4299a2e580d26b3cdd9a...71d8e5ea695a4903a81e2c6ece99f09a/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #12014: ARROW-10924: [C++] Validate temporal data in ValidateArrayFull

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #12014:
URL: https://github.com/apache/arrow/pull/12014#discussion_r796581517



##########
File path: dev/archery/archery/integration/datagen.py
##########
@@ -221,6 +221,30 @@ def _get_type(self):
             ('unit', 'DAY' if self.unit == self.DAY else 'MILLISECOND')
         ])
 
+    def generate_range(self, size, lower, upper, name=None,
+                       include_extremes=False):
+        if self.unit == self.DAY:
+            return super().generate_range(size, lower, upper, name)
+
+        full_day_millis = 1000 * 60 * 60 * 24
+        lower //= full_day_millis

Review comment:
       Python appears to round towards -Inf so a lower negative bound that is not exactly even will get rounded to an out-of-bound value




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #12014: ARROW-10924: [C++] Validate temporal data in ValidateArrayFull

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #12014:
URL: https://github.com/apache/arrow/pull/12014#discussion_r797106217



##########
File path: dev/archery/archery/integration/tester_cpp.py
##########
@@ -60,6 +60,10 @@ def _run(self, arrow_path=None, json_path=None, command='VALIDATE',
         if quirks:
             if "no_decimal_validate" in quirks:
                 cmd.append("--validate_decimals=false")
+            if "no_date64_validate" in quirks:
+                cmd.append("--validate_date64=false")
+            if "no_times_validate" in quirks:
+                cmd.append("--validate_times=false")

Review comment:
       Just a question - do all existing files have invalid times generated, or just some? Since it seems the current code path is correct, so presumably it was fixed a while back.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] JabariBooker commented on a change in pull request #12014: ARROW-10924: [C++] Validate temporal data in ValidateArrayFull

Posted by GitBox <gi...@apache.org>.
JabariBooker commented on a change in pull request #12014:
URL: https://github.com/apache/arrow/pull/12014#discussion_r797114295



##########
File path: dev/archery/archery/integration/tester_cpp.py
##########
@@ -60,6 +60,10 @@ def _run(self, arrow_path=None, json_path=None, command='VALIDATE',
         if quirks:
             if "no_decimal_validate" in quirks:
                 cmd.append("--validate_decimals=false")
+            if "no_date64_validate" in quirks:
+                cmd.append("--validate_date64=false")
+            if "no_times_validate" in quirks:
+                cmd.append("--validate_times=false")

Review comment:
       There were 3 golden files that had invalid times resulting failure in the integration tests. So not all of them. However I think that all files did had invalid date64 value.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce commented on a change in pull request #12014: ARROW-10924: [C++] Validate temporal data in ValidateArrayFull

Posted by GitBox <gi...@apache.org>.
edponce commented on a change in pull request #12014:
URL: https://github.com/apache/arrow/pull/12014#discussion_r776479002



##########
File path: cpp/src/arrow/array/validate.cc
##########
@@ -166,6 +166,80 @@ struct ValidateArrayImpl {
     return Status::OK();
   }
 
+  Status Visit(const Date64Type& type) {
+    RETURN_NOT_OK(ValidateFixedWidthBuffers());
+
+    using c_type = typename Date64Type::c_type;
+    if (full_validation) {
+      constexpr c_type kFullDay = 1000 * 60 * 60 * 24;
+      return VisitArrayDataInline<Date64Type>(
+          data,
+          [&](c_type date) {
+            if (date % kFullDay != 0) {
+              return Status::Invalid(type, date,
+                                     "ms does not represent a whole number of days");

Review comment:
       Is a whitespace missing between `type` and `date`?
   Also, it should be _"does not represents"_ ('s' at the end).

##########
File path: cpp/src/arrow/array/validate.cc
##########
@@ -166,6 +166,80 @@ struct ValidateArrayImpl {
     return Status::OK();
   }
 
+  Status Visit(const Date64Type& type) {
+    RETURN_NOT_OK(ValidateFixedWidthBuffers());
+
+    using c_type = typename Date64Type::c_type;
+    if (full_validation) {
+      constexpr c_type kFullDay = 1000 * 60 * 60 * 24;

Review comment:
       `kFullDay` --> `kFullDayMillis` to be consistent with the other cases.

##########
File path: cpp/src/arrow/array/validate.cc
##########
@@ -166,6 +166,80 @@ struct ValidateArrayImpl {
     return Status::OK();
   }
 
+  Status Visit(const Date64Type& type) {
+    RETURN_NOT_OK(ValidateFixedWidthBuffers());
+
+    using c_type = typename Date64Type::c_type;
+    if (full_validation) {
+      constexpr c_type kFullDay = 1000 * 60 * 60 * 24;

Review comment:
       Also, I suggest to move these constexpr statements inside the lambdas.

##########
File path: cpp/src/arrow/array/validate.cc
##########
@@ -166,6 +166,80 @@ struct ValidateArrayImpl {
     return Status::OK();
   }
 
+  Status Visit(const Date64Type& type) {
+    RETURN_NOT_OK(ValidateFixedWidthBuffers());
+
+    using c_type = typename Date64Type::c_type;

Review comment:
       Nit: Move `using` statement inside `if (full_validation) { .. }` block.

##########
File path: cpp/src/arrow/array/validate.cc
##########
@@ -166,6 +166,80 @@ struct ValidateArrayImpl {
     return Status::OK();
   }
 
+  Status Visit(const Date64Type& type) {
+    RETURN_NOT_OK(ValidateFixedWidthBuffers());
+
+    using c_type = typename Date64Type::c_type;
+    if (full_validation) {
+      constexpr c_type kFullDay = 1000 * 60 * 60 * 24;
+      return VisitArrayDataInline<Date64Type>(
+          data,
+          [&](c_type date) {
+            if (date % kFullDay != 0) {
+              return Status::Invalid(type, date,
+                                     "ms does not represent a whole number of days");
+            }
+            return Status::OK();
+          },
+          []() { return Status::OK(); });
+    }
+    return Status::OK();
+  }
+
+  Status Visit(const Time32Type& type) {
+    RETURN_NOT_OK(ValidateFixedWidthBuffers());
+
+    using c_type = typename Time32Type::c_type;
+    if (full_validation) {
+      constexpr c_type kFullDaySeconds = 60 * 60 * 24;
+      constexpr c_type kFullDayMillis = kFullDaySeconds * 1000;
+      return VisitArrayDataInline<Time32Type>(
+          data,
+          [&](c_type time) {
+            if (type.unit() == TimeUnit::SECOND && (time < 0 || time >= kFullDaySeconds)) {
+              return Status::Invalid(type, " ", time,
+                                     "s does not fit within the acceptable range of ",

Review comment:
       _"does not fit within"_ --> _"is not within"_ I think this message represents better the invalid value. 
   Same applies for the other cases below.

##########
File path: cpp/src/arrow/array/validate.cc
##########
@@ -166,6 +166,80 @@ struct ValidateArrayImpl {
     return Status::OK();
   }
 
+  Status Visit(const Date64Type& type) {
+    RETURN_NOT_OK(ValidateFixedWidthBuffers());
+
+    using c_type = typename Date64Type::c_type;
+    if (full_validation) {
+      constexpr c_type kFullDay = 1000 * 60 * 60 * 24;
+      return VisitArrayDataInline<Date64Type>(
+          data,
+          [&](c_type date) {
+            if (date % kFullDay != 0) {
+              return Status::Invalid(type, date,
+                                     "ms does not represent a whole number of days");
+            }
+            return Status::OK();
+          },
+          []() { return Status::OK(); });
+    }
+    return Status::OK();
+  }
+
+  Status Visit(const Time32Type& type) {
+    RETURN_NOT_OK(ValidateFixedWidthBuffers());
+
+    using c_type = typename Time32Type::c_type;
+    if (full_validation) {
+      constexpr c_type kFullDaySeconds = 60 * 60 * 24;
+      constexpr c_type kFullDayMillis = kFullDaySeconds * 1000;
+      return VisitArrayDataInline<Time32Type>(

Review comment:
       Move these `constexpr` variables inside the lambda before their respective use.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce commented on pull request #12014: ARROW-10924: [C++] Validate temporal data in ValidateArrayFull

Posted by GitBox <gi...@apache.org>.
edponce commented on pull request #12014:
URL: https://github.com/apache/arrow/pull/12014#issuecomment-1002768779


   @JabariBooker Thanks for working on this! I left some minor comments.
   Also, we are missing validation for the following temporal types: `Date32Type` and `TimestampType`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce commented on a change in pull request #12014: ARROW-10924: [C++] Validate temporal data in ValidateArrayFull

Posted by GitBox <gi...@apache.org>.
edponce commented on a change in pull request #12014:
URL: https://github.com/apache/arrow/pull/12014#discussion_r794749664



##########
File path: cpp/src/arrow/array/array_test.cc
##########
@@ -534,7 +534,7 @@ static ScalarVector GetScalars() {
       std::make_shared<UInt64Scalar>(3),
       std::make_shared<DoubleScalar>(3.0),
       std::make_shared<Date32Scalar>(10),
-      std::make_shared<Date64Scalar>(11),
+      std::make_shared<Date64Scalar>(864000000),

Review comment:
       Is this literal meant to be an invalid value?
   There is an extra 0 compared to `1000 x 60 x 60 x 24`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #12014: ARROW-10924: [C++] Validate temporal data in ValidateArrayFull

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #12014:
URL: https://github.com/apache/arrow/pull/12014#discussion_r794516464



##########
File path: cpp/src/arrow/compute/kernels/hash_aggregate_test.cc
##########
@@ -1652,13 +1693,7 @@ TEST(GroupBy, MinMaxTypes) {
                 field("hash_min_max", struct_({field("min", ty), field("max", ty)})),
                 field("key_0", int64()),
             }),
-            R"([
-    [{"min": 1, "max": 3},       1],
-    [{"min": 0, "max": 0},       2],
-    [{"min": null, "max": null}, 3],
-    [{"min": 3, "max": 5},       4],
-    [{"min": 1, "max": 4},       null]
-  ])"),
+            (ty->name() == "date64") ? date64_expected : default_expected),

Review comment:
       nit: ditto here

##########
File path: cpp/src/arrow/compute/kernels/hash_aggregate_test.cc
##########
@@ -1632,11 +1630,54 @@ TEST(GroupBy, MinMaxTypes) {
     [3,    1],
     [0,    2]
 ])",
-                                           R"([
+                                                  R"([
     [0,    2],
     [1,    null],
     [null, 3]
-])"});
+])"};
+
+  const std::vector<std::string> date64_table = {R"([
+    [86400000,    1],
+    [null, 1]
+])",
+                                                 R"([
+    [0,    2],
+    [null, 3],
+    [259200000,    4],
+    [432000000,    4],
+    [345600000,    null],
+    [259200000,    1],
+    [0,    2]
+])",
+                                                 R"([
+    [0,    2],
+    [86400000,    null],
+    [null, 3]
+])"};
+
+  const std::string default_expected =
+      R"([
+    [{"min": 1, "max": 3},       1],
+    [{"min": 0, "max": 0},       2],
+    [{"min": null, "max": null}, 3],
+    [{"min": 3, "max": 5},       4],
+    [{"min": 1, "max": 4},       null]
+    ])";
+
+  const std::string date64_expected =
+      R"([
+    [{"min": 86400000, "max": 259200000},       1],
+    [{"min": 0, "max": 0},       2],
+    [{"min": null, "max": null}, 3],
+    [{"min": 259200000, "max": 432000000},       4],
+    [{"min": 86400000, "max": 345600000},       null]
+    ])";
+
+  for (const auto& ty : types) {
+    SCOPED_TRACE(ty->ToString());
+    auto in_schema = schema({field("argument0", ty), field("key", int64())});
+    auto table =
+        TableFromJSON(in_schema, (ty->name() == "date64") ? date64_table : default_table);

Review comment:
       nit: can check `ty->id() == Type::DATE64`

##########
File path: cpp/src/arrow/compute/kernels/vector_hash_test.cc
##########
@@ -161,42 +161,95 @@ TYPED_TEST_SUITE(TestHashKernelPrimitive, PrimitiveDictionaries);
 TYPED_TEST(TestHashKernelPrimitive, Unique) {
   using T = typename TypeParam::c_type;
   auto type = TypeTraits<TypeParam>::type_singleton();
-  CheckUnique<TypeParam, T>(type, {2, 1, 2, 1}, {true, false, true, true}, {2, 0, 1},
-                            {1, 0, 1});
-  CheckUnique<TypeParam, T>(type, {2, 1, 3, 1}, {false, false, true, true}, {0, 3, 1},
-                            {0, 1, 1});
 
-  // Sliced
-  CheckUnique(ArrayFromJSON(type, "[1, 2, null, 3, 2, null]")->Slice(1, 4),
-              ArrayFromJSON(type, "[2, null, 3]"));
+  if (type->name() == "date64") {

Review comment:
       and here and so on (sorry)

##########
File path: cpp/src/arrow/testing/random.cc
##########
@@ -772,12 +789,42 @@ std::shared_ptr<Array> RandomArrayGenerator::ArrayOf(const Field& field, int64_t
     }
 
       GENERATE_INTEGRAL_CASE_VIEW(Int32Type, Date32Type);
-      GENERATE_INTEGRAL_CASE_VIEW(Int64Type, Date64Type);
       GENERATE_INTEGRAL_CASE_VIEW(Int64Type, TimestampType);
-      GENERATE_INTEGRAL_CASE_VIEW(Int32Type, Time32Type);
-      GENERATE_INTEGRAL_CASE_VIEW(Int64Type, Time64Type);
       GENERATE_INTEGRAL_CASE_VIEW(Int32Type, MonthIntervalType);
 
+    case Type::type::DATE64: {
+      using c_type = typename Date64Type::c_type;
+      constexpr c_type kFullDayMillis = 1000 * 60 * 60 * 24;
+      constexpr c_type min_value = std::numeric_limits<c_type>::min() / kFullDayMillis;
+      constexpr c_type max_value = std::numeric_limits<c_type>::max() / kFullDayMillis;
+
+      return *Numeric<Date64Type>(length, min_value, max_value, null_probability)
+                  ->View(field.type());
+    }
+
+    case Type::type::TIME32: {
+      TimeUnit::type unit = std::dynamic_pointer_cast<Time32Type>(field.type())->unit();

Review comment:
       nit: we have `internal::checked_[pointer_]cast` which will use dynamic or static cast depending on debug/release build




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org