You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/09 19:18:38 UTC

[GitHub] [arrow-datafusion] andre-cc-natzka opened a new pull request, #4156: Support time32 and time64

andre-cc-natzka opened a new pull request, #4156:
URL: https://github.com/apache/arrow-datafusion/pull/4156

   # Which issue does this PR close?
   
   To my knowledge, this PR does not tackle an existing issue.
   
   Closes #.
   
   # Rationale for this change
   
   Time coercion did not support `Time` types, while it did support `Date` and `Timestamp`. It would be useful to support both `Time32` and `Time64` for type coercion as well, thus ensuring that, for instance, a pair composed by a `Utf8` and a `Time32` is coerced to a `Time32`. The implementation has to take into account the valid `TimeUnit` for both `Time32` and `Time64` and implement them consistently throughout the code. Additionally, it would also be useful to implement support for `Time32` and `Time64` in `hash_join` and `hash_utils`.
   
   # What changes are included in this PR?
   
   A series of adaptations were implemented throughout the code as to consistently support `Time32(TimeUnit::Second)`, `Time32(TimeUnit::Millisecond)`, `Time64(TimeUnit::Microsecond)` and `Time64(TimeUnit::Nanosecond)`. The changes are mostly localized in the `binary.rs` file from `type_coercion`, but support for time types also included numerous changes to `scalar.rs` from the `common` crate and some modules from the `physical_expr` crate. 
   
   Although most changes are straightforward, some non-trivial modifications were added in the proto crate. The `datafusion.proto` file is not changed, so it keeps supporting only a generic `Time64` type, with no unit specification, interpreting it to nanosecond accuracy. The `datafusion/proto/src/to_proto.rs` and `datafusion/proto/src/from_proto.rs` files are adapted consistently, as the proto `Time64` translates into a `Time64Nanosecond`. On the other hand, a proto `Time64` can be obtained from all four `Time64Nanosecond`, `Time64Microsecond`, `Time32Millisecond` and `Time32Second` scalar value types, as implemented in `datafusion/proto/src/to_proto.rs`.
   
   # Are these changes tested?
   
   Yes, the changes in `type_coercion` are tested in the `binary.rs` module, where the existing tests have been extended to account for the cases of type coercion involving `Time32` and `Time64` types, and tests are passing.
   
   # Are there any user-facing changes?
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] andre-cc-natzka commented on pull request #4156: Full support for time32 and time64 literal values (`ScalarValue`)

Posted by GitBox <gi...@apache.org>.
andre-cc-natzka commented on PR #4156:
URL: https://github.com/apache/arrow-datafusion/pull/4156#issuecomment-1317716233

   Thanks for the hint, @alamb! I have done it and all tests are passing now. Actually there was one failing due to an experiment I had made and had forgotten it, now it is fixed!
   
   Thanks for helping with the merge, and no problem at all, I'll wait for you to be available. Having said that, I am looking forward to it! 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #4156: Full support for time32 and time64 literal values (`ScalarValue`)

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #4156:
URL: https://github.com/apache/arrow-datafusion/pull/4156#issuecomment-1317470123

   > There are also 101 failed tests and 340 passing tests, which is the same as the result one gets when running the most recent version of the master, so that makes me think that this is potentially ok?
   
   I suspect this has to do with perhaps not having submodules checked out (do the tests pass with `git submodule update --init`?
   
   THanks for the work here @andre-cc-natzka  -- I have this PR on my radar and want to help push it over the line (aka get the CI clean and passing). I will do so but may not have a chance to do for another day or two


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] andre-cc-natzka commented on pull request #4156: Full support for time32 and time64 literal values (`ScalarValue`)

Posted by GitBox <gi...@apache.org>.
andre-cc-natzka commented on PR #4156:
URL: https://github.com/apache/arrow-datafusion/pull/4156#issuecomment-1314574846

   So, I have added the tests required by @alamb and I rebased the branch on top of the most recent version of the master. I had a few troubles while rebasing but I could solve them and everything seems to be working fine now. I will now wait for your final revision (and approval of workflows) before merge. Thanks for your support!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] andre-cc-natzka commented on a diff in pull request #4156: Support time32 and time64

Posted by GitBox <gi...@apache.org>.
andre-cc-natzka commented on code in PR #4156:
URL: https://github.com/apache/arrow-datafusion/pull/4156#discussion_r1018802183


##########
datafusion/proto/src/to_proto.rs:
##########
@@ -1188,7 +1188,28 @@ impl TryFrom<&ScalarValue> for protobuf::ScalarValue {
                 })
             }
 
-            datafusion::scalar::ScalarValue::Time64(v) => {
+            // Since the protos only support Time64 and always interpret it to nanosecond accuracy,

Review Comment:
   I don't think so, in fact I would guess we wanted to represent the other types as well, i.e. protos would support `Time32Second`, `Time32Millisecond`, `Time64Microsecond`, `Time64Nanosecond`. I didn't want to change the protos before discussing them with you first. I would suggest the following changes:
   
   `// Contains all valid datafusion scalar type except for
   // List
   enum PrimitiveScalarType{
   
       (...)
       INTERVAL_MONTHDAYNANO = 29;
   
       BINARY = 25;
       LARGE_BINARY = 26;
   
       TIME32 = 27;
       TIME64 = 28;
   }`
   
   and
   
   `message ScalarValue{
       oneof value {
   
           (...)
   
           ScalarTime32Value time32_value = 30;
           ScalarTime64Value time64_value = 31;
           IntervalMonthDayNanoValue interval_month_day_nano = 32;
           StructValue struct_value = 33;
       }
   }`
   
   with the new messages:
   
   `message ScalarTime32Value {
     oneof value {
       int32 time32_second_value = 1;
       int32 time32_millisecond_value = 2;
     };
   }
   
   message ScalarTime64Value {
     oneof value {
       int64 time64_microsecond_value = 1;
       int64 time64_nanosecond_value = 2;
     };
   }`
   
   Then, I would change  the `to_proto.rs` and `from_proto.rs` files consistently. What do you think, @isidentical, @alamb, @waitingkuo, @avantgardnerio?  



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] andre-cc-natzka commented on pull request #4156: Full support for time32 and time64 literal values (`ScalarValue`)

Posted by GitBox <gi...@apache.org>.
andre-cc-natzka commented on PR #4156:
URL: https://github.com/apache/arrow-datafusion/pull/4156#issuecomment-1311906372

   Thanks to @alamb and @waitingkuo for your hints! I will implement the missing tests and ask you if I have questions early next week, then I'll follow @isidentical's suggestion and rebase this into the current master branch, solve conflicts and merge!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] isidentical commented on pull request #4156: Support time32 and time64

Posted by GitBox <gi...@apache.org>.
isidentical commented on PR #4156:
URL: https://github.com/apache/arrow-datafusion/pull/4156#issuecomment-1310336192

   @andre-cc-natzka I also highly recommend rebasing against the current master branch, since there were a couple of changes in how we process primitive scalars. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #4156: Full support for time32 and time64 literal values (`ScalarValue`)

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #4156:
URL: https://github.com/apache/arrow-datafusion/pull/4156#discussion_r1022929133


##########
datafusion/core/src/physical_plan/hash_utils.rs:
##########
@@ -0,0 +1,830 @@
+// Licensed to the Apache Software Foundation (ASF) under one

Review Comment:
   I don't understand this file -- it looks like maybe it was previously moved into physical-expr and maybe a merge conflict caused it to end up here?
   
   ```
   find . -name 'hash_utils.rs'
   ./datafusion/core/src/physical_plan/hash_utils.rs
   ./datafusion/physical-expr/src/hash_utils.rs
   ``



##########
datafusion/core/tests/sql/select.rs:
##########
@@ -814,6 +814,8 @@ async fn query_on_string_dictionary() -> Result<()> {
     ];
     assert_batches_eq!(expected, &actual);
 
+    // filtering with Time32 and Time64 types

Review Comment:
   this comment seems out of place



##########
datafusion/proto/src/lib.rs:
##########
@@ -606,9 +606,21 @@ mod roundtrip_tests {
             ScalarValue::Date32(Some(0)),
             ScalarValue::Date32(Some(i32::MAX)),
             ScalarValue::Date32(None),
-            ScalarValue::Time64(Some(0)),
-            ScalarValue::Time64(Some(i64::MAX)),
-            ScalarValue::Time64(None),
+            ScalarValue::Date64(Some(0)),

Review Comment:
   👍 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #4156: Full support for time32 and time64 literal values (`ScalarValue`)

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #4156:
URL: https://github.com/apache/arrow-datafusion/pull/4156#issuecomment-1315470038

   > out of bound value make the casting crash, i think it's worth fixing as well. Perhaps another follow on issue/pr to fix
   
   FWIW I reran those examples from @waitingkuo  using datafusion-cli on this branch and they no longer panics (not sure if it is this PR or something else). 
   
   ```
   cd datafusion-cli
   cargo run datafusion-cli
   ...
   
   DataFusion CLI v14.0.0
   ❯ select 86400000000000::time;
   +-----------------------+
   | Int64(86400000000000) |
   +-----------------------+
   | ERROR CONVERTING DATE |
   +-----------------------+
   1 row in set. Query took 0.041 seconds.
   
   ❯ select (-1)::time;
   +-----------------------+
   | Int64(-1)             |
   +-----------------------+
   | ERROR CONVERTING DATE |
   +-----------------------+
   1 row in set. Query took 0.002 seconds.
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] waitingkuo commented on pull request #4156: Full support for time32 and time64 literal values (`ScalarValue`)

Posted by GitBox <gi...@apache.org>.
waitingkuo commented on PR #4156:
URL: https://github.com/apache/arrow-datafusion/pull/4156#issuecomment-1311270098

   @alamb  @andre-cc-natzka 
   out of bound value make the casting crash, i think it's worth fixing as well. Perhaps another follow on issue/pr to fix
   
   ```bash
   ➜  datafusion-cli git:(Support_Time32_and_Time64) ✗ cargo run
       Finished dev [unoptimized + debuginfo] target(s) in 0.48s
        Running `target/debug/datafusion-cli`
   DataFusion CLI v13.0.0
   ❯ select 86400000000000::time;
   thread 'main' panicked at 'invalid time', /Users/willy/.cargo/registry/src/github.com-1ecc6299db9ec823/chrono-0.4.22/src/naive/time/mod.rs:422:67
   note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
   ```
   
   ```bash
   ➜  datafusion-cli git:(Support_Time32_and_Time64) ✗ cargo run
       Finished dev [unoptimized + debuginfo] target(s) in 0.46s
        Running `target/debug/datafusion-cli`
   DataFusion CLI v13.0.0
   ❯ select (-1)::time;
   thread 'main' panicked at 'invalid time', /Users/willy/.cargo/registry/src/github.com-1ecc6299db9ec823/chrono-0.4.22/src/naive/time/mod.rs:422:67
   note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] andre-cc-natzka commented on pull request #4156: Full support for time32 and time64 literal values (`ScalarValue`)

Posted by GitBox <gi...@apache.org>.
andre-cc-natzka commented on PR #4156:
URL: https://github.com/apache/arrow-datafusion/pull/4156#issuecomment-1322437617

   Thank you very much for your commitment into making this through, @alamb! I am very happy with this outcome.
   
   In the meanwhile, I've realized that type coercion is not supported for a (`Timestamp`, `Utf8`) pair, which in my opinion would be very interesting and relevant. I looked a bit into this and have a suggestion to add this feature, so I took the freedom to create a new issue [#4311](https://github.com/apache/arrow-datafusion/issues/4311) to address it. 
   
   As I mention there, unfortunately Arrow does not support type coercion for (`Timestamp`, `Utf8`) if the time unit considered is other than nanoseconds, so the output type for type coercion between a string and a timestamp has to be a timestamp to nanosecond accuracy, no matter the original time unit. 
   
   I will open a PR with the suggested changes, and if you agree we could add that feature as well.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ursabot commented on pull request #4156: Full support for time32 and time64 literal values (`ScalarValue`)

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #4156:
URL: https://github.com/apache/arrow-datafusion/pull/4156#issuecomment-1321239949

   Benchmark runs are scheduled for baseline = 0bb5d4d9b885b909bbdd27230880c8a94602f295 and contender = bea645cc99f955e0233e742e9e4b7266cfacbbe7. bea645cc99f955e0233e742e9e4b7266cfacbbe7 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/5a335e90230d414195344e2f36cf8a3a...80d64e4469434795b8e3b6ed259c0948/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] [test-mac-arm](https://conbench.ursa.dev/compare/runs/4529533f4a0a445e90e887a11b85a4a4...07ccc245877941b382795781cdec80ca/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/5d9142922a6341a78b1c8dde580ee9c7...52d42b0b461b48bdbf554dbb4f95c1d6/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/add7a79edfff4940a3beb0e4352ab3ee...880eb9e2b1784157a3272d5fceb6648b/)
   Buildkite builds:
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #4156: Full support for time32 and time64 literal values (`ScalarValue`)

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #4156:
URL: https://github.com/apache/arrow-datafusion/pull/4156#issuecomment-1321119628

   I took the liberty of merging master into this PR, resolving the conflicts, and updating a chrono usage that was deprecated. I plan to merge it once the CI checks have completed. 
   
   Thanks again @andre-cc-natzka 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #4156: Full support for time32 and time64 literal values (`ScalarValue`)

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #4156:
URL: https://github.com/apache/arrow-datafusion/pull/4156#issuecomment-1321239077

   CI failure due to https://github.com/apache/arrow-datafusion/issues/4294 (not this PR)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #4156: Support time32 and time64

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #4156:
URL: https://github.com/apache/arrow-datafusion/pull/4156#discussion_r1019496373


##########
datafusion/proto/src/to_proto.rs:
##########
@@ -1188,7 +1188,28 @@ impl TryFrom<&ScalarValue> for protobuf::ScalarValue {
                 })
             }
 
-            datafusion::scalar::ScalarValue::Time64(v) => {
+            // Since the protos only support Time64 and always interpret it to nanosecond accuracy,

Review Comment:
   I don't think we have worried about backwards compatibility in serialization before -- I recommend we simply make the protobuf definitions match the changes to `ScalarValue`s



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #4156: Full support for time32 and time64 literal values (`ScalarValue`)

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #4156:
URL: https://github.com/apache/arrow-datafusion/pull/4156#discussion_r1019502686


##########
datafusion/expr/src/type_coercion/binary.rs:
##########
@@ -824,6 +854,30 @@ mod tests {
             Operator::Lt,
             DataType::Date64
         );
+        test_coercion_binary_rule!(

Review Comment:
   👍  nice
   
   I double checked that this casting is supported in arrow-rs, which they are ✅ 
   
   https://github.com/apache/arrow-rs/blob/e44cb5b478257751916d1674292123eaf5d80a7c/arrow-cast/src/cast.rs#L149-L158
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] andre-cc-natzka commented on pull request #4156: Support time32 and time64

Posted by GitBox <gi...@apache.org>.
andre-cc-natzka commented on PR #4156:
URL: https://github.com/apache/arrow-datafusion/pull/4156#issuecomment-1309959837

   > Thank you @andre-cc-natzka -- this looks great. I will plan to review this carefully in the morning but after a quick skim this looks very nice
   > 
   > cc @waitingkuo and @avantgardnerio
   
   Thank you for your prompt appreciation, @alamb! I am awaiting your more detailed review and willing to address your comments. For the time being there was already important point raised and I left a suggestion to change `datafusion.proto` in the answer. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] waitingkuo commented on pull request #4156: Support time32 and time64

Posted by GitBox <gi...@apache.org>.
waitingkuo commented on PR #4156:
URL: https://github.com/apache/arrow-datafusion/pull/4156#issuecomment-1310534121

   hi @andre-cc-natzka  this looks great, thanks!
   
   it'll be great if we can some basic the test cases for time64(micro), time32(millis), time32(seconds).
   right now we only have it for time64(nanos)
   
   https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/tests/sql/timestamp.rs#L1555-L1571


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] isidentical commented on a diff in pull request #4156: Support time32 and time64

Posted by GitBox <gi...@apache.org>.
isidentical commented on code in PR #4156:
URL: https://github.com/apache/arrow-datafusion/pull/4156#discussion_r1018530191


##########
datafusion/proto/src/to_proto.rs:
##########
@@ -1188,7 +1188,28 @@ impl TryFrom<&ScalarValue> for protobuf::ScalarValue {
                 })
             }
 
-            datafusion::scalar::ScalarValue::Time64(v) => {
+            // Since the protos only support Time64 and always interpret it to nanosecond accuracy,

Review Comment:
   Question: is there a particular reason why we can't represent these types directly in our serialization format other than backwards compatibility?



##########
datafusion/common/src/scalar.rs:
##########
@@ -2100,15 +2188,16 @@ impl TryFrom<ScalarValue> for i32 {
     }
 }
 
-// special implementation for i64 because of TimeNanosecond
+// special implementation for i64 because of Date64, Time64, Time64 and Timestamp

Review Comment:
   nit: Seems like `Time64` is repeated
   ```suggestion
   // special implementation for i64 because of Date64, Time64, and Timestamp
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] andre-cc-natzka commented on pull request #4156: Full support for time32 and time64 literal values (`ScalarValue`)

Posted by GitBox <gi...@apache.org>.
andre-cc-natzka commented on PR #4156:
URL: https://github.com/apache/arrow-datafusion/pull/4156#issuecomment-1315517118

   I have made the changes and pushed the new version. I would expect this to be ready to merge now, however I am a bit puzzled with the failed Rust workflow runs. There are also 101 failed tests and 340 passing tests, which is the same as the result one gets when running the most recent version of the master, so that makes me think that this is potentially ok? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] andre-cc-natzka commented on a diff in pull request #4156: Support time32 and time64

Posted by GitBox <gi...@apache.org>.
andre-cc-natzka commented on code in PR #4156:
URL: https://github.com/apache/arrow-datafusion/pull/4156#discussion_r1018771826


##########
datafusion/common/src/scalar.rs:
##########
@@ -2100,15 +2188,16 @@ impl TryFrom<ScalarValue> for i32 {
     }
 }
 
-// special implementation for i64 because of TimeNanosecond
+// special implementation for i64 because of Date64, Time64, Time64 and Timestamp

Review Comment:
   Thanks @isidentical!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] andre-cc-natzka commented on a diff in pull request #4156: Support time32 and time64

Posted by GitBox <gi...@apache.org>.
andre-cc-natzka commented on code in PR #4156:
URL: https://github.com/apache/arrow-datafusion/pull/4156#discussion_r1018802183


##########
datafusion/proto/src/to_proto.rs:
##########
@@ -1188,7 +1188,28 @@ impl TryFrom<&ScalarValue> for protobuf::ScalarValue {
                 })
             }
 
-            datafusion::scalar::ScalarValue::Time64(v) => {
+            // Since the protos only support Time64 and always interpret it to nanosecond accuracy,

Review Comment:
   I don't think so, in fact I would guess we wanted to represent the other types as well, i.e. protos would support `Time32Second`, `Time32Millisecond`, `Time64Microsecond`, `Time64Nanosecond`. I didn't want to change the protos before discussing them with you first. I would suggest the following changes:
   
   ```
   enum PrimitiveScalarType{
   
       [...]
       INTERVAL_MONTHDAYNANO = 29;
   
       BINARY = 25;
       LARGE_BINARY = 26;
   
       TIME32 = 27;
       TIME64 = 28;
   }
   ```
   
   and
   
   ```
   message ScalarValue{
       oneof value {
   
           [...]
   
           ScalarTime32Value time32_value = 30;
           ScalarTime64Value time64_value = 31;
           IntervalMonthDayNanoValue interval_month_day_nano = 32;
           StructValue struct_value = 33;
       }
   }
   ```
   
   with the new messages:
   
   ```
   message ScalarTime32Value {
     oneof value {
       int32 time32_second_value = 1;
       int32 time32_millisecond_value = 2;
     };
   }
   
   message ScalarTime64Value {
     oneof value {
       int64 time64_microsecond_value = 1;
       int64 time64_nanosecond_value = 2;
     };
   }
   ```
   
   Then, I would change  the `to_proto.rs` and `from_proto.rs` files consistently. What do you think, @isidentical, @alamb, @waitingkuo, @avantgardnerio?  



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] andre-cc-natzka commented on a diff in pull request #4156: Support time32 and time64

Posted by GitBox <gi...@apache.org>.
andre-cc-natzka commented on code in PR #4156:
URL: https://github.com/apache/arrow-datafusion/pull/4156#discussion_r1018802183


##########
datafusion/proto/src/to_proto.rs:
##########
@@ -1188,7 +1188,28 @@ impl TryFrom<&ScalarValue> for protobuf::ScalarValue {
                 })
             }
 
-            datafusion::scalar::ScalarValue::Time64(v) => {
+            // Since the protos only support Time64 and always interpret it to nanosecond accuracy,

Review Comment:
   I don't think so, in fact I would guess we wanted to represent the other types as well, i.e. protos would support `Time32Second`, `Time32Millisecond`, `Time64Microsecond`, `Time64Nanosecond`. I didn't want to change the protos before discussing them with you first. I would suggest the following changes:
   
   `enum PrimitiveScalarType{
   
       [...]
       INTERVAL_MONTHDAYNANO = 29;
   
       BINARY = 25;
       LARGE_BINARY = 26;
   
       TIME32 = 27;
       TIME64 = 28;
   }`
   
   and
   
   `message ScalarValue{
       oneof value {
   
           [...]
   
           ScalarTime32Value time32_value = 30;
           ScalarTime64Value time64_value = 31;
           IntervalMonthDayNanoValue interval_month_day_nano = 32;
           StructValue struct_value = 33;
       }
   }`
   
   with the new messages:
   
   `message ScalarTime32Value {
     oneof value {
       int32 time32_second_value = 1;
       int32 time32_millisecond_value = 2;
     };
   }
   
   message ScalarTime64Value {
     oneof value {
       int64 time64_microsecond_value = 1;
       int64 time64_nanosecond_value = 2;
     };
   }`
   
   Then, I would change  the `to_proto.rs` and `from_proto.rs` files consistently. What do you think, @isidentical, @alamb, @waitingkuo, @avantgardnerio?  



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] waynexia commented on a diff in pull request #4156: Support time32 and time64

Posted by GitBox <gi...@apache.org>.
waynexia commented on code in PR #4156:
URL: https://github.com/apache/arrow-datafusion/pull/4156#discussion_r1018850468


##########
datafusion/proto/src/to_proto.rs:
##########
@@ -1188,7 +1188,28 @@ impl TryFrom<&ScalarValue> for protobuf::ScalarValue {
                 })
             }
 
-            datafusion::scalar::ScalarValue::Time64(v) => {
+            // Since the protos only support Time64 and always interpret it to nanosecond accuracy,

Review Comment:
   This change looks reasonable to me. It also corresponds to arrow's definition: https://github.com/apache/arrow/blob/master/format/Schema.fbs#L214-L231



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] andre-cc-natzka commented on a diff in pull request #4156: Support time32 and time64

Posted by GitBox <gi...@apache.org>.
andre-cc-natzka commented on code in PR #4156:
URL: https://github.com/apache/arrow-datafusion/pull/4156#discussion_r1018912511


##########
datafusion/proto/src/to_proto.rs:
##########
@@ -1188,7 +1188,28 @@ impl TryFrom<&ScalarValue> for protobuf::ScalarValue {
                 })
             }
 
-            datafusion::scalar::ScalarValue::Time64(v) => {
+            // Since the protos only support Time64 and always interpret it to nanosecond accuracy,

Review Comment:
   Thanks @waynexia! I will thus push my most up-to-date version where I implement these changes, with consistent modifications in `to_proto.rs` and `from_proto.rs`. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] andre-cc-natzka commented on a diff in pull request #4156: Full support for time32 and time64 literal values (`ScalarValue`)

Posted by GitBox <gi...@apache.org>.
andre-cc-natzka commented on code in PR #4156:
URL: https://github.com/apache/arrow-datafusion/pull/4156#discussion_r1022965056


##########
datafusion/core/src/physical_plan/hash_utils.rs:
##########
@@ -0,0 +1,830 @@
+// Licensed to the Apache Software Foundation (ASF) under one

Review Comment:
   Hello @alamb! I have been looking into the merges and I think that is exactly what happened. Indeed, this file was present in version 13, but it is no longer there. I had to change `/datafusion/core/src/physical_plan/joins/hash_join.rs` and I ended up modifying `datafusion/core/src/physical_plan/hash_utils.rs` as well. Then I wasn't sure if this file had been eliminated or not, and while solving the merge conflicts I thought it had not been, but I was wrong. 
   
   It seems clear to me now that this file has to be removed and everything will be solved. But maybe I am missing something? I will push the new version without this file, please let me know what you think.
   
   And thank you very much for your words and for going through my changes! We are happy to cooperate with you!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] andre-cc-natzka commented on a diff in pull request #4156: Full support for time32 and time64 literal values (`ScalarValue`)

Posted by GitBox <gi...@apache.org>.
andre-cc-natzka commented on code in PR #4156:
URL: https://github.com/apache/arrow-datafusion/pull/4156#discussion_r1020343920


##########
datafusion/proto/src/to_proto.rs:
##########
@@ -1188,7 +1188,28 @@ impl TryFrom<&ScalarValue> for protobuf::ScalarValue {
                 })
             }
 
-            datafusion::scalar::ScalarValue::Time64(v) => {
+            // Since the protos only support Time64 and always interpret it to nanosecond accuracy,

Review Comment:
   Thanks for the clarification, @alamb.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] andre-cc-natzka commented on a diff in pull request #4156: Full support for time32 and time64 literal values (`ScalarValue`)

Posted by GitBox <gi...@apache.org>.
andre-cc-natzka commented on code in PR #4156:
URL: https://github.com/apache/arrow-datafusion/pull/4156#discussion_r1020335882


##########
datafusion/expr/src/type_coercion/binary.rs:
##########
@@ -824,6 +854,30 @@ mod tests {
             Operator::Lt,
             DataType::Date64
         );
+        test_coercion_binary_rule!(

Review Comment:
   Thank you very much for checking this, @alamb! 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] andre-cc-natzka commented on a diff in pull request #4156: Full support for time32 and time64 literal values (`ScalarValue`)

Posted by GitBox <gi...@apache.org>.
andre-cc-natzka commented on code in PR #4156:
URL: https://github.com/apache/arrow-datafusion/pull/4156#discussion_r1022945017


##########
datafusion/core/tests/sql/select.rs:
##########
@@ -814,6 +814,8 @@ async fn query_on_string_dictionary() -> Result<()> {
     ];
     assert_batches_eq!(expected, &actual);
 
+    // filtering with Time32 and Time64 types

Review Comment:
   Thanks, that one escaped me somehow



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #4156: Full support for time32 and time64 literal values (`ScalarValue`)

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #4156:
URL: https://github.com/apache/arrow-datafusion/pull/4156#issuecomment-1321239104

   Thanks again @andre-cc-natzka 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb merged pull request #4156: Full support for time32 and time64 literal values (`ScalarValue`)

Posted by GitBox <gi...@apache.org>.
alamb merged PR #4156:
URL: https://github.com/apache/arrow-datafusion/pull/4156


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org