You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/12/14 21:19:34 UTC

[GitHub] [arrow] seddonm1 opened a new pull request #8892: ARROW-10817: [Rust] [DataFusion] Implement TypedString and DATE coercion

seddonm1 opened a new pull request #8892:
URL: https://github.com/apache/arrow/pull/8892


   This PR adds support for what the `sqlparser` crate calls `TypedString` which is basically syntactic sugar for an inline-cast. As this was an effort to get the `TPC-H` queries behaving correctly I then went a step further and added support for `Date` (temporal) coercion. I can split this PR if needed.
   
   ```sql
   where
       l_shipdate <= date '1998-09-02'
   ```
   
   is equivalent to
   
   ```sql
   where
       l_shipdate <= CAST('1998-09-02' AS DATE)
   ```
   
   FYI I am planning to tackle `INTERVAL` next.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on pull request #8892: ARROW-10817: [Rust] [DataFusion] Implement TypedString and DATE coercion

Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #8892:
URL: https://github.com/apache/arrow/pull/8892#issuecomment-744064489


   @andygrove makes sense


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] seddonm1 commented on a change in pull request #8892: ARROW-10817: [Rust] [DataFusion] Implement TypedString and DATE coercion

Posted by GitBox <gi...@apache.org>.
seddonm1 commented on a change in pull request #8892:
URL: https://github.com/apache/arrow/pull/8892#discussion_r542000995



##########
File path: rust/benchmarks/src/bin/tpch.rs
##########
@@ -363,9 +363,9 @@ fn create_logical_plan(ctx: &mut ExecutionContext, query: usize) -> Result<Logic
             from
                 lineitem
             where
-                l_shipdate >= '1994-01-01'
-                and l_shipdate < '1995-01-01'
-                and l_discount between 0.06 - 0.01 and 0.06 + 0.01
+                l_shipdate >= date '1994-01-01'
+                and l_shipdate < date '1995-01-01'
+                and l_discount > 0.06 - 0.01 and l_discount < 0.06 + 0.01

Review comment:
       Oh yes, sorry I thought you meant add it to the `l_shipdate` component. Yes, i will fix that.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on pull request #8892: ARROW-10817: [Rust] [DataFusion] Implement TypedString and DATE coercion

Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #8892:
URL: https://github.com/apache/arrow/pull/8892#issuecomment-744038027


   @andygrove I am actually expecting that comparison on date (in ms sine epoch / int64) should be faster than on strings (if no weird implementation)?
   
   @seddonm1 unfortunately I am getting an error on master, as the CSV reader doesn't support dates yet.
   
   ```
   Running benchmarks with the following options: BenchmarkOpt { query: 10, debug: false, iterations: 10, concurrency: 10, batch_size: 20000, path: "/home/danielheres/Code/gdd/arrow/rust/benchmarks/tpch-dbgen", file_format: "tbl", mem_table: true }
   Loading table 'part' into memory
   Loaded table 'part' into memory in 157 ms
   Loading table 'supplier' into memory
   Loaded table 'supplier' into memory in 12 ms
   Loading table 'partsupp' into memory
   Loaded table 'partsupp' into memory in 553 ms
   Loading table 'customer' into memory
   Loaded table 'customer' into memory in 139 ms
   Loading table 'orders' into memory
   Error: ArrowError(ParseError("Unsupported data type Date32(Day)"))
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] seddonm1 commented on a change in pull request #8892: ARROW-10817: [Rust] [DataFusion] Implement TypedString and DATE coercion

Posted by GitBox <gi...@apache.org>.
seddonm1 commented on a change in pull request #8892:
URL: https://github.com/apache/arrow/pull/8892#discussion_r541999967



##########
File path: rust/benchmarks/src/bin/tpch.rs
##########
@@ -363,9 +363,9 @@ fn create_logical_plan(ctx: &mut ExecutionContext, query: usize) -> Result<Logic
             from
                 lineitem
             where
-                l_shipdate >= '1994-01-01'
-                and l_shipdate < '1995-01-01'
-                and l_discount between 0.06 - 0.01 and 0.06 + 0.01
+                l_shipdate >= date '1994-01-01'
+                and l_shipdate < date '1995-01-01'
+                and l_discount > 0.06 - 0.01 and l_discount < 0.06 + 0.01

Review comment:
       Actually, no. The raw query as per TPC-H does not use between for that clause:
   
   ```sql
   where l_shipdate >= date '[DATE]' 
   and l_shipdate < date '[DATE]' + interval '1' year
   and l_discount between [DISCOUNT] - 0.01 and [DISCOUNT] + 0.01
   ```
   
   Which is different as BETWEEN is inclusive (>= AND <=)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #8892: ARROW-10817: [Rust] [DataFusion] Implement TypedString and DATE coercion

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #8892:
URL: https://github.com/apache/arrow/pull/8892#issuecomment-744717221


   Sorry I reopened this by accident. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] seddonm1 commented on a change in pull request #8892: ARROW-10817: [Rust] [DataFusion] Implement TypedString and DATE coercion

Posted by GitBox <gi...@apache.org>.
seddonm1 commented on a change in pull request #8892:
URL: https://github.com/apache/arrow/pull/8892#discussion_r541999967



##########
File path: rust/benchmarks/src/bin/tpch.rs
##########
@@ -363,9 +363,9 @@ fn create_logical_plan(ctx: &mut ExecutionContext, query: usize) -> Result<Logic
             from
                 lineitem
             where
-                l_shipdate >= '1994-01-01'
-                and l_shipdate < '1995-01-01'
-                and l_discount between 0.06 - 0.01 and 0.06 + 0.01
+                l_shipdate >= date '1994-01-01'
+                and l_shipdate < date '1995-01-01'
+                and l_discount > 0.06 - 0.01 and l_discount < 0.06 + 0.01

Review comment:
       Actually, no. The raw query as per TPC-H does not use between for that clause:
   
   ```sql
   where l_shipdate >= date '[DATE]' 
   and l_shipdate < date '[DATE]' + interval '1' year
   and l_discount between [DISCOUNT] - 0.01 and [DISCOUNT] + 0.01
   ```
   
   Which is different as BETWEEN is inclusive (`>=` AND `<=`)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan edited a comment on pull request #8892: ARROW-10817: [Rust] [DataFusion] Implement TypedString and DATE coercion

Posted by GitBox <gi...@apache.org>.
Dandandan edited a comment on pull request #8892:
URL: https://github.com/apache/arrow/pull/8892#issuecomment-744038027


   @andygrove I am actually expecting that comparison on date (in ms since epoch / int64) should be faster than on strings (if no weird implementation)?
   
   @seddonm1 unfortunately I am getting an error on master, as the CSV reader doesn't support dates yet.
   
   ```
   Running benchmarks with the following options: BenchmarkOpt { query: 10, debug: false, iterations: 10, concurrency: 10, batch_size: 20000, path: "/home/danielheres/Code/gdd/arrow/rust/benchmarks/tpch-dbgen", file_format: "tbl", mem_table: true }
   Loading table 'part' into memory
   Loaded table 'part' into memory in 157 ms
   Loading table 'supplier' into memory
   Loaded table 'supplier' into memory in 12 ms
   Loading table 'partsupp' into memory
   Loaded table 'partsupp' into memory in 553 ms
   Loading table 'customer' into memory
   Loaded table 'customer' into memory in 139 ms
   Loading table 'orders' into memory
   Error: ArrowError(ParseError("Unsupported data type Date32(Day)"))
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] andygrove commented on pull request #8892: ARROW-10817: [Rust] [DataFusion] Implement TypedString and DATE coercion

Posted by GitBox <gi...@apache.org>.
andygrove commented on pull request #8892:
URL: https://github.com/apache/arrow/pull/8892#issuecomment-743924925


   I should note that these changes make the test more realistic and will likely reduce performance so we will need to bear this in mind when comparing benchmark results to previous results.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #8892: ARROW-10817: [Rust] [DataFusion] Implement TypedString and DATE coercion

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #8892:
URL: https://github.com/apache/arrow/pull/8892#issuecomment-744717032


   > @andygrove checked it, in #8913 , indeed parsing is a bit slower, queries are faster
   
   
   That is the right tradeoff in my opinion


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on a change in pull request #8892: ARROW-10817: [Rust] [DataFusion] Implement TypedString and DATE coercion

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #8892:
URL: https://github.com/apache/arrow/pull/8892#discussion_r542000454



##########
File path: rust/benchmarks/src/bin/tpch.rs
##########
@@ -363,9 +363,9 @@ fn create_logical_plan(ctx: &mut ExecutionContext, query: usize) -> Result<Logic
             from
                 lineitem
             where
-                l_shipdate >= '1994-01-01'
-                and l_shipdate < '1995-01-01'
-                and l_discount between 0.06 - 0.01 and 0.06 + 0.01
+                l_shipdate >= date '1994-01-01'
+                and l_shipdate < date '1995-01-01'
+                and l_discount > 0.06 - 0.01 and l_discount < 0.06 + 0.01

Review comment:
       I see a between on the last line?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] seddonm1 commented on pull request #8892: ARROW-10817: [Rust] [DataFusion] Implement TypedString and DATE coercion

Posted by GitBox <gi...@apache.org>.
seddonm1 commented on pull request #8892:
URL: https://github.com/apache/arrow/pull/8892#issuecomment-743928132


   Rebased so should be good to merge.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] seddonm1 commented on a change in pull request #8892: ARROW-10817: [Rust] [DataFusion] Implement TypedString and DATE coercion

Posted by GitBox <gi...@apache.org>.
seddonm1 commented on a change in pull request #8892:
URL: https://github.com/apache/arrow/pull/8892#discussion_r541998828



##########
File path: rust/benchmarks/src/bin/tpch.rs
##########
@@ -363,9 +363,9 @@ fn create_logical_plan(ctx: &mut ExecutionContext, query: usize) -> Result<Logic
             from
                 lineitem
             where
-                l_shipdate >= '1994-01-01'
-                and l_shipdate < '1995-01-01'
-                and l_discount between 0.06 - 0.01 and 0.06 + 0.01
+                l_shipdate >= date '1994-01-01'
+                and l_shipdate < date '1995-01-01'
+                and l_discount > 0.06 - 0.01 and l_discount < 0.06 + 0.01

Review comment:
       yes, sorry i will fix this.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] seddonm1 commented on pull request #8892: ARROW-10817: [Rust] [DataFusion] Implement TypedString and DATE coercion

Posted by GitBox <gi...@apache.org>.
seddonm1 commented on pull request #8892:
URL: https://github.com/apache/arrow/pull/8892#issuecomment-743913249


   @andygrove could you please have a look?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on a change in pull request #8892: ARROW-10817: [Rust] [DataFusion] Implement TypedString and DATE coercion

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #8892:
URL: https://github.com/apache/arrow/pull/8892#discussion_r541944358



##########
File path: rust/benchmarks/src/bin/tpch.rs
##########
@@ -363,9 +363,9 @@ fn create_logical_plan(ctx: &mut ExecutionContext, query: usize) -> Result<Logic
             from
                 lineitem
             where
-                l_shipdate >= '1994-01-01'
-                and l_shipdate < '1995-01-01'
-                and l_discount between 0.06 - 0.01 and 0.06 + 0.01
+                l_shipdate >= date '1994-01-01'
+                and l_shipdate < date '1995-01-01'
+                and l_discount > 0.06 - 0.01 and l_discount < 0.06 + 0.01

Review comment:
       Shouldn't this have `between` that was also added recently? @seddonm1 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] seddonm1 commented on pull request #8892: ARROW-10817: [Rust] [DataFusion] Implement TypedString and DATE coercion

Posted by GitBox <gi...@apache.org>.
seddonm1 commented on pull request #8892:
URL: https://github.com/apache/arrow/pull/8892#issuecomment-743926150


   Thanks Andy. I will resolve the merge conflict.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb closed pull request #8892: ARROW-10817: [Rust] [DataFusion] Implement TypedString and DATE coercion

Posted by GitBox <gi...@apache.org>.
alamb closed pull request #8892:
URL: https://github.com/apache/arrow/pull/8892


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] seddonm1 commented on a change in pull request #8892: ARROW-10817: [Rust] [DataFusion] Implement TypedString and DATE coercion

Posted by GitBox <gi...@apache.org>.
seddonm1 commented on a change in pull request #8892:
URL: https://github.com/apache/arrow/pull/8892#discussion_r541999967



##########
File path: rust/benchmarks/src/bin/tpch.rs
##########
@@ -363,9 +363,9 @@ fn create_logical_plan(ctx: &mut ExecutionContext, query: usize) -> Result<Logic
             from
                 lineitem
             where
-                l_shipdate >= '1994-01-01'
-                and l_shipdate < '1995-01-01'
-                and l_discount between 0.06 - 0.01 and 0.06 + 0.01
+                l_shipdate >= date '1994-01-01'
+                and l_shipdate < date '1995-01-01'
+                and l_discount > 0.06 - 0.01 and l_discount < 0.06 + 0.01

Review comment:
       Actually, no. The raw query as per TPC-H does not use between for that clause:
   
   ```sql
   where l_shipdate >= date '[DATE]' 
   and l_shipdate < date '[DATE]' + interval '1' year
   and l_discount between [DISCOUNT] - 0.01 and [DISCOUNT] + 0.01
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #8892: ARROW-10817: [Rust] [DataFusion] Implement TypedString

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8892:
URL: https://github.com/apache/arrow/pull/8892#issuecomment-742766152


   https://issues.apache.org/jira/browse/ARROW-10817


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #8892: ARROW-10817: [Rust] [DataFusion] Implement TypedString

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #8892:
URL: https://github.com/apache/arrow/pull/8892#issuecomment-742770895


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/8892?src=pr&el=h1) Report
   > Merging [#8892](https://codecov.io/gh/apache/arrow/pull/8892?src=pr&el=desc) (536bb1b) into [master](https://codecov.io/gh/apache/arrow/commit/3deae8dd50da773ba215704e567d9937b04b02c5?el=desc) (3deae8d) will **increase** coverage by `24.05%`.
   > The diff coverage is `77.41%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/8892/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/8892?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff             @@
   ##           master    #8892       +/-   ##
   ===========================================
   + Coverage   52.98%   77.04%   +24.05%     
   ===========================================
     Files         172      173        +1     
     Lines       30750    40172     +9422     
   ===========================================
   + Hits        16294    30950    +14656     
   + Misses      14456     9222     -5234     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/8892?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/datafusion/src/physical\_plan/expressions.rs](https://codecov.io/gh/apache/arrow/pull/8892/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2V4cHJlc3Npb25zLnJz) | `0.00% <ø> (ø)` | |
   | [rust/datafusion/src/sql/planner.rs](https://codecov.io/gh/apache/arrow/pull/8892/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zcWwvcGxhbm5lci5ycw==) | `0.00% <0.00%> (ø)` | |
   | [rust/arrow/src/compute/kernels/cast.rs](https://codecov.io/gh/apache/arrow/pull/8892/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2Nhc3QucnM=) | `96.33% <96.00%> (+0.12%)` | :arrow_up: |
   | [...ntegration-testing/src/bin/arrow-file-to-stream.rs](https://codecov.io/gh/apache/arrow/pull/8892/diff?src=pr&el=tree#diff-cnVzdC9pbnRlZ3JhdGlvbi10ZXN0aW5nL3NyYy9iaW4vYXJyb3ctZmlsZS10by1zdHJlYW0ucnM=) | | |
   | [...ion-testing/src/bin/arrow-json-integration-test.rs](https://codecov.io/gh/apache/arrow/pull/8892/diff?src=pr&el=tree#diff-cnVzdC9pbnRlZ3JhdGlvbi10ZXN0aW5nL3NyYy9iaW4vYXJyb3ctanNvbi1pbnRlZ3JhdGlvbi10ZXN0LnJz) | | |
   | [...ntegration-testing/src/bin/arrow-stream-to-file.rs](https://codecov.io/gh/apache/arrow/pull/8892/diff?src=pr&el=tree#diff-cnVzdC9pbnRlZ3JhdGlvbi10ZXN0aW5nL3NyYy9iaW4vYXJyb3ctc3RyZWFtLXRvLWZpbGUucnM=) | | |
   | [rust/parquet/src/util/hash\_util.rs](https://codecov.io/gh/apache/arrow/pull/8892/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy91dGlsL2hhc2hfdXRpbC5ycw==) | `95.77% <0.00%> (ø)` | |
   | [rust/parquet/src/util/bit\_packing.rs](https://codecov.io/gh/apache/arrow/pull/8892/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy91dGlsL2JpdF9wYWNraW5nLnJz) | `99.96% <0.00%> (ø)` | |
   | [rust/parquet/src/file/footer.rs](https://codecov.io/gh/apache/arrow/pull/8892/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9maWxlL2Zvb3Rlci5ycw==) | `96.22% <0.00%> (ø)` | |
   | [rust/parquet/src/util/test\_common/file\_util.rs](https://codecov.io/gh/apache/arrow/pull/8892/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy91dGlsL3Rlc3RfY29tbW9uL2ZpbGVfdXRpbC5ycw==) | `77.77% <0.00%> (ø)` | |
   | ... and [49 more](https://codecov.io/gh/apache/arrow/pull/8892/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/8892?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/8892?src=pr&el=footer). Last update [b0b9269...536bb1b](https://codecov.io/gh/apache/arrow/pull/8892?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on a change in pull request #8892: ARROW-10817: [Rust] [DataFusion] Implement TypedString and DATE coercion

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #8892:
URL: https://github.com/apache/arrow/pull/8892#discussion_r541944358



##########
File path: rust/benchmarks/src/bin/tpch.rs
##########
@@ -363,9 +363,9 @@ fn create_logical_plan(ctx: &mut ExecutionContext, query: usize) -> Result<Logic
             from
                 lineitem
             where
-                l_shipdate >= '1994-01-01'
-                and l_shipdate < '1995-01-01'
-                and l_discount between 0.06 - 0.01 and 0.06 + 0.01
+                l_shipdate >= date '1994-01-01'
+                and l_shipdate < date '1995-01-01'
+                and l_discount > 0.06 - 0.01 and l_discount < 0.06 + 0.01

Review comment:
       Shouldn't this have between that was also added recently? @seddonm1 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] andygrove commented on pull request #8892: ARROW-10817: [Rust] [DataFusion] Implement TypedString and DATE coercion

Posted by GitBox <gi...@apache.org>.
andygrove commented on pull request #8892:
URL: https://github.com/apache/arrow/pull/8892#issuecomment-744062498


   @Dandandan I was thinking about the overhead of converting the strings to dates, but you're right, if the data is stored natively in date format in Parquet then it should be faster. It would be slower than before against CSV though, probably.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb closed pull request #8892: ARROW-10817: [Rust] [DataFusion] Implement TypedString and DATE coercion

Posted by GitBox <gi...@apache.org>.
alamb closed pull request #8892:
URL: https://github.com/apache/arrow/pull/8892


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io commented on pull request #8892: ARROW-10817: [Rust] [DataFusion] Implement TypedString

Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #8892:
URL: https://github.com/apache/arrow/pull/8892#issuecomment-742770895


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/8892?src=pr&el=h1) Report
   > Merging [#8892](https://codecov.io/gh/apache/arrow/pull/8892?src=pr&el=desc) (d21097e) into [master](https://codecov.io/gh/apache/arrow/commit/3deae8dd50da773ba215704e567d9937b04b02c5?el=desc) (3deae8d) will **increase** coverage by `0.02%`.
   > The diff coverage is `77.41%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/8892/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/8892?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #8892      +/-   ##
   ==========================================
   + Coverage   52.98%   53.01%   +0.02%     
   ==========================================
     Files         172      172              
     Lines       30750    30781      +31     
   ==========================================
   + Hits        16294    16319      +25     
   - Misses      14456    14462       +6     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/8892?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/datafusion/src/physical\_plan/expressions.rs](https://codecov.io/gh/apache/arrow/pull/8892/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2V4cHJlc3Npb25zLnJz) | `0.00% <ø> (ø)` | |
   | [rust/datafusion/src/sql/planner.rs](https://codecov.io/gh/apache/arrow/pull/8892/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zcWwvcGxhbm5lci5ycw==) | `0.00% <0.00%> (ø)` | |
   | [rust/arrow/src/compute/kernels/cast.rs](https://codecov.io/gh/apache/arrow/pull/8892/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2Nhc3QucnM=) | `96.20% <96.00%> (-0.01%)` | :arrow_down: |
   | [rust/arrow/src/compute/kernels/take.rs](https://codecov.io/gh/apache/arrow/pull/8892/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL3Rha2UucnM=) | `94.81% <0.00%> (+0.21%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/8892?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/8892?src=pr&el=footer). Last update [b0b9269...d21097e](https://codecov.io/gh/apache/arrow/pull/8892?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #8892: ARROW-10817: [Rust] [DataFusion] Implement TypedString

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #8892:
URL: https://github.com/apache/arrow/pull/8892#issuecomment-742770895


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/8892?src=pr&el=h1) Report
   > Merging [#8892](https://codecov.io/gh/apache/arrow/pull/8892?src=pr&el=desc) (056a199) into [master](https://codecov.io/gh/apache/arrow/commit/3deae8dd50da773ba215704e567d9937b04b02c5?el=desc) (3deae8d) will **increase** coverage by `24.06%`.
   > The diff coverage is `81.08%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/8892/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/8892?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff             @@
   ##           master    #8892       +/-   ##
   ===========================================
   + Coverage   52.98%   77.04%   +24.06%     
   ===========================================
     Files         172      173        +1     
     Lines       30750    40178     +9428     
   ===========================================
   + Hits        16294    30957    +14663     
   + Misses      14456     9221     -5235     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/8892?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/datafusion/src/physical\_plan/expressions.rs](https://codecov.io/gh/apache/arrow/pull/8892/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2V4cHJlc3Npb25zLnJz) | `0.00% <ø> (ø)` | |
   | [rust/datafusion/src/sql/planner.rs](https://codecov.io/gh/apache/arrow/pull/8892/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zcWwvcGxhbm5lci5ycw==) | `0.00% <0.00%> (ø)` | |
   | [rust/arrow/src/compute/kernels/cast.rs](https://codecov.io/gh/apache/arrow/pull/8892/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2Nhc3QucnM=) | `96.34% <96.77%> (+0.13%)` | :arrow_up: |
   | [...ntegration-testing/src/bin/arrow-stream-to-file.rs](https://codecov.io/gh/apache/arrow/pull/8892/diff?src=pr&el=tree#diff-cnVzdC9pbnRlZ3JhdGlvbi10ZXN0aW5nL3NyYy9iaW4vYXJyb3ctc3RyZWFtLXRvLWZpbGUucnM=) | | |
   | [...ion-testing/src/bin/arrow-json-integration-test.rs](https://codecov.io/gh/apache/arrow/pull/8892/diff?src=pr&el=tree#diff-cnVzdC9pbnRlZ3JhdGlvbi10ZXN0aW5nL3NyYy9iaW4vYXJyb3ctanNvbi1pbnRlZ3JhdGlvbi10ZXN0LnJz) | | |
   | [...ntegration-testing/src/bin/arrow-file-to-stream.rs](https://codecov.io/gh/apache/arrow/pull/8892/diff?src=pr&el=tree#diff-cnVzdC9pbnRlZ3JhdGlvbi10ZXN0aW5nL3NyYy9iaW4vYXJyb3ctZmlsZS10by1zdHJlYW0ucnM=) | | |
   | [rust/parquet/src/file/footer.rs](https://codecov.io/gh/apache/arrow/pull/8892/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9maWxlL2Zvb3Rlci5ycw==) | `96.22% <0.00%> (ø)` | |
   | [rust/parquet/src/util/bit\_packing.rs](https://codecov.io/gh/apache/arrow/pull/8892/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy91dGlsL2JpdF9wYWNraW5nLnJz) | `99.96% <0.00%> (ø)` | |
   | [rust/parquet/src/util/test\_common/file\_util.rs](https://codecov.io/gh/apache/arrow/pull/8892/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy91dGlsL3Rlc3RfY29tbW9uL2ZpbGVfdXRpbC5ycw==) | `77.77% <0.00%> (ø)` | |
   | [rust/parquet/src/util/hash\_util.rs](https://codecov.io/gh/apache/arrow/pull/8892/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy91dGlsL2hhc2hfdXRpbC5ycw==) | `95.77% <0.00%> (ø)` | |
   | ... and [49 more](https://codecov.io/gh/apache/arrow/pull/8892/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/8892?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/8892?src=pr&el=footer). Last update [b0b9269...056a199](https://codecov.io/gh/apache/arrow/pull/8892?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on pull request #8892: ARROW-10817: [Rust] [DataFusion] Implement TypedString and DATE coercion

Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #8892:
URL: https://github.com/apache/arrow/pull/8892#issuecomment-744637812


   @andygrove checked it, in https://github.com/apache/arrow/pull/8913 , indeed parsing is a bit slower, queries are faster


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org