You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/29 10:47:19 UTC
[GitHub] [arrow] Dandandan opened a new pull request #9359: ARROW-11426: [Rust][DataFusion] Start of EXTRACT support for DataFusion [WIP]
Dandandan opened a new pull request #9359:
URL: https://github.com/apache/arrow/pull/9359
This PR starts implementing support for the expression, to retrieve date parts (hours, minutes, days, etc.) from temporal data types:
`EXTRACT (HOUR from dt)`
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] alamb commented on pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support
Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#issuecomment-782847077
Integration failure looks like https://issues.apache.org/jira/browse/ARROW-11717 and is related to this RP
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] Dandandan commented on pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support
Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#issuecomment-769918230
This is ready for review now. @alamb @nevi-me tagging you, because I think you would be interested in more temporal support.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] Dandandan commented on a change in pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support
Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#discussion_r567414429
##########
File path: rust/datafusion/src/physical_plan/expressions/extract.rs
##########
@@ -0,0 +1,115 @@
+use crate::error::Result;
+use core::fmt;
+use std::{any::Any, sync::Arc};
+
+use arrow::{
+ array::{
+ Date32Array, Date64Array, TimestampMicrosecondArray, TimestampMillisecondArray,
+ TimestampNanosecondArray, TimestampSecondArray,
+ },
+ compute::hour,
+ datatypes::{DataType, Schema, TimeUnit},
+ record_batch::RecordBatch,
+};
+
+use crate::{
+ error::DataFusionError,
+ logical_plan::DatePart,
+ physical_plan::{ColumnarValue, PhysicalExpr},
+};
+
+impl fmt::Display for Extract {
+ fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
+ write!(f, "EXTRACT({} AS {:?})", self.date_part, self.expr)
+ }
+}
+
+/// Extract
+#[derive(Debug)]
+pub struct Extract {
+ date_part: DatePart,
+ expr: Arc<dyn PhysicalExpr>,
+}
+
+impl Extract {
+ /// Create new Extract expression
+ pub fn new(date_part: DatePart, expr: Arc<dyn PhysicalExpr>) -> Self {
+ Self { date_part, expr }
+ }
+}
+
+impl PhysicalExpr for Extract {
+ fn as_any(&self) -> &dyn Any {
+ self
+ }
+
+ fn data_type(&self, _input_schema: &Schema) -> Result<DataType> {
+ Ok(DataType::Int32)
+ }
+
+ fn nullable(&self, input_schema: &Schema) -> Result<bool> {
+ self.expr.nullable(input_schema)
+ }
+
+ fn evaluate(&self, batch: &RecordBatch) -> Result<ColumnarValue> {
+ let value = self.expr.evaluate(batch)?;
+ let data_type = value.data_type();
+ let array = match value {
+ ColumnarValue::Array(array) => array,
+ ColumnarValue::Scalar(scalar) => scalar.to_array(),
+ };
+
+ match data_type {
+ DataType::Date32 => {
+ let array = array.as_any().downcast_ref::<Date32Array>().unwrap();
+ Ok(ColumnarValue::Array(Arc::new(hour(array)?)))
Review comment:
There is no other datepart for now in the enum, so I guess that might generate some clippy warnings. But we could match directly on the datepart. The errorshould be generated elsewhere already (when building the logical plan), agree makes sense to add a test for that π.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] Dandandan commented on pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support
Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#issuecomment-782839678
@alamb conflict solved :+1:
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] Dandandan commented on a change in pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support
Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#discussion_r567304269
##########
File path: rust/datafusion/src/logical_plan/expr.rs
##########
@@ -169,6 +170,13 @@ pub enum Expr {
},
/// Represents a reference to all fields in a schema.
Wildcard,
+ /// Extract date parts (day, hour, minute) from a date / time expression
+ Extract {
Review comment:
πΊ awesome! π
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] alamb commented on a change in pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support
Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#discussion_r567411606
##########
File path: rust/datafusion/tests/sql.rs
##########
@@ -1964,6 +1964,20 @@ async fn crypto_expressions() -> Result<()> {
Ok(())
}
+#[tokio::test]
+async fn extract_date_part() -> Result<()> {
+ let mut ctx = ExecutionContext::new();
+ let sql = "SELECT
+ EXTRACT(HOUR FROM CAST('2020-01-01' AS DATE)),
+ EXTRACT(HOUR FROM to_timestamp('2020-09-08T12:00:00+00:00'))
Review comment:
I think adding coverage for the other date parts would be valuable here (even if they error)
##########
File path: rust/datafusion/src/logical_plan/expr.rs
##########
@@ -169,6 +170,13 @@ pub enum Expr {
},
/// Represents a reference to all fields in a schema.
Wildcard,
+ /// Extract date parts (day, hour, minute) from a date / time expression
+ Extract {
Review comment:
I also agree (belatedly) that having a more efficient implementation of constant function arguments (e.g. #9376) is the way to go!
In terms of adding variants to `Expr` I think it will need be done when the semantics of whatever expression is being added can't realistically be expressed as a function (e.g. CASE).
So in this case, given @jorgecarleitao is cranking along with #9376 it seems like perhaps this PR should perhaps try and translate `EXTRACT` into a function.
##########
File path: rust/datafusion/src/physical_plan/expressions/extract.rs
##########
@@ -0,0 +1,115 @@
+use crate::error::Result;
+use core::fmt;
+use std::{any::Any, sync::Arc};
+
+use arrow::{
+ array::{
+ Date32Array, Date64Array, TimestampMicrosecondArray, TimestampMillisecondArray,
+ TimestampNanosecondArray, TimestampSecondArray,
+ },
+ compute::hour,
+ datatypes::{DataType, Schema, TimeUnit},
+ record_batch::RecordBatch,
+};
+
+use crate::{
+ error::DataFusionError,
+ logical_plan::DatePart,
+ physical_plan::{ColumnarValue, PhysicalExpr},
+};
+
+impl fmt::Display for Extract {
+ fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
+ write!(f, "EXTRACT({} AS {:?})", self.date_part, self.expr)
+ }
+}
+
+/// Extract
+#[derive(Debug)]
+pub struct Extract {
+ date_part: DatePart,
+ expr: Arc<dyn PhysicalExpr>,
+}
+
+impl Extract {
+ /// Create new Extract expression
+ pub fn new(date_part: DatePart, expr: Arc<dyn PhysicalExpr>) -> Self {
+ Self { date_part, expr }
+ }
+}
+
+impl PhysicalExpr for Extract {
+ fn as_any(&self) -> &dyn Any {
+ self
+ }
+
+ fn data_type(&self, _input_schema: &Schema) -> Result<DataType> {
+ Ok(DataType::Int32)
+ }
+
+ fn nullable(&self, input_schema: &Schema) -> Result<bool> {
+ self.expr.nullable(input_schema)
+ }
+
+ fn evaluate(&self, batch: &RecordBatch) -> Result<ColumnarValue> {
+ let value = self.expr.evaluate(batch)?;
+ let data_type = value.data_type();
+ let array = match value {
+ ColumnarValue::Array(array) => array,
+ ColumnarValue::Scalar(scalar) => scalar.to_array(),
+ };
+
+ match data_type {
+ DataType::Date32 => {
+ let array = array.as_any().downcast_ref::<Date32Array>().unwrap();
+ Ok(ColumnarValue::Array(Arc::new(hour(array)?)))
Review comment:
I find it confusing that `date_part` is passed all the way down in the Exprs / trees only to be ignored in the actual implementation which directly calls `hour. I can see that the expr seems to always be made with `DatePart::Hour` but I am not 100% sure.
I am fine with not supporting all the various date parts initially, but I would recommend the following as a way of documenting through code / safe guard against future bugs:
1. Add a test for `EXTRACT DAY from timestamp` and show that it generates a useful error
2. Add a check in this function for `date_part != DataPart::Hour` and throw an error
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] codecov-io commented on pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support [WIP]
Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#issuecomment-769847991
# [Codecov](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=h1) Report
> Merging [#9359](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=desc) (dbbb647) into [master](https://codecov.io/gh/apache/arrow/commit/71dab27098775228340ecb4e8a15870967d7cf91?el=desc) (71dab27) will **decrease** coverage by `0.02%`.
> The diff coverage is `63.01%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9359/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=tree)
```diff
@@ Coverage Diff @@
## master #9359 +/- ##
==========================================
- Coverage 81.98% 81.96% -0.03%
==========================================
Files 216 217 +1
Lines 53299 53366 +67
==========================================
+ Hits 43699 43743 +44
- Misses 9600 9623 +23
```
| [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=tree) | Coverage Ξ | |
|---|---|---|
| [rust/datafusion/src/logical\_plan/temporal.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9sb2dpY2FsX3BsYW4vdGVtcG9yYWwucnM=) | `0.00% <0.00%> (ΓΈ)` | |
| [rust/datafusion/src/optimizer/utils.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9vcHRpbWl6ZXIvdXRpbHMucnM=) | `48.26% <0.00%> (-0.76%)` | :arrow_down: |
| [rust/datafusion/src/sql/utils.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zcWwvdXRpbHMucnM=) | `38.88% <0.00%> (-0.55%)` | :arrow_down: |
| [rust/datafusion/src/logical\_plan/expr.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9sb2dpY2FsX3BsYW4vZXhwci5ycw==) | `78.68% <54.16%> (-1.27%)` | :arrow_down: |
| [rust/datafusion/src/physical\_plan/planner.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL3BsYW5uZXIucnM=) | `78.61% <75.00%> (-0.03%)` | :arrow_down: |
| [rust/datafusion/src/sql/planner.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zcWwvcGxhbm5lci5ycw==) | `84.02% <75.00%> (-0.13%)` | :arrow_down: |
| [rust/datafusion/src/physical\_plan/expressions.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2V4cHJlc3Npb25zLnJz) | `81.03% <82.35%> (+0.01%)` | :arrow_up: |
| [rust/datafusion/tests/sql.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3Rlc3RzL3NxbC5ycw==) | `99.84% <100.00%> (+<0.01%)` | :arrow_up: |
| [rust/datafusion/src/scalar.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zY2FsYXIucnM=) | `60.15% <0.00%> (+1.19%)` | :arrow_up: |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Ξ = absolute <relative> (impact)`, `ΓΈ = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=footer). Last update [71dab27...dbbb647](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] Dandandan commented on a change in pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support
Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#discussion_r579655053
##########
File path: rust/datafusion/src/physical_plan/datetime_expressions.rs
##########
@@ -344,6 +347,98 @@ pub fn date_trunc(args: &[ColumnarValue]) -> Result<ColumnarValue> {
})
}
+macro_rules! extract_date_part {
+ ($ARRAY: expr, $FN:expr) => {
+ match $ARRAY.data_type() {
+ DataType::Date32 => {
+ let array = $ARRAY.as_any().downcast_ref::<Date32Array>().unwrap();
+ Ok($FN(array)?)
+ }
+ DataType::Date64 => {
+ let array = $ARRAY.as_any().downcast_ref::<Date64Array>().unwrap();
+ Ok($FN(array)?)
+ }
+ DataType::Timestamp(time_unit, None) => match time_unit {
+ TimeUnit::Second => {
+ let array = $ARRAY
+ .as_any()
+ .downcast_ref::<TimestampSecondArray>()
+ .unwrap();
+ Ok($FN(array)?)
+ }
+ TimeUnit::Millisecond => {
+ let array = $ARRAY
+ .as_any()
+ .downcast_ref::<TimestampMillisecondArray>()
+ .unwrap();
+ Ok($FN(array)?)
+ }
+ TimeUnit::Microsecond => {
+ let array = $ARRAY
+ .as_any()
+ .downcast_ref::<TimestampMicrosecondArray>()
+ .unwrap();
+ Ok($FN(array)?)
+ }
+ TimeUnit::Nanosecond => {
+ let array = $ARRAY
+ .as_any()
+ .downcast_ref::<TimestampNanosecondArray>()
+ .unwrap();
+ Ok($FN(array)?)
+ }
+ },
+ datatype => Err(DataFusionError::Internal(format!(
+ "Extract does not support datatype {:?}",
+ datatype
+ ))),
+ }
+ };
+}
+
+/// DATE_PART SQL function
+pub fn date_part(args: &[ColumnarValue]) -> Result<ColumnarValue> {
+ if args.len() != 2 {
+ return Err(DataFusionError::Execution(
+ "Expected two arguments in DATE_PART".to_string(),
+ ));
+ }
+ let (date_part, array) = (&args[0], &args[1]);
+
+ let date_part = if let ColumnarValue::Scalar(ScalarValue::Utf8(Some(v))) = date_part {
+ v
+ } else {
+ return Err(DataFusionError::Execution(
+ "First argument of `DATE_PART` must be non-null scalar Utf8".to_string(),
+ ));
+ };
+
+ let is_scalar = matches!(array, ColumnarValue::Scalar(_));
+
+ let array = match array {
Review comment:
Yes, indeed. For now we can use this approach to avoid reimplementing hours/years etc, with a bit of overhead.
Maybe longer term would be nice to have something like `Datum` in Arrow in order to both gain some performance and avoid reimplementing things for the scalar case.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] codecov-io edited a comment on pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support
Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#issuecomment-769847991
# [Codecov](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=h1) Report
> Merging [#9359](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=desc) (40e184b) into [master](https://codecov.io/gh/apache/arrow/commit/aebabca047a8adeb9f6d4fc81e29d12cc42e629b?el=desc) (aebabca) will **increase** coverage by `0.01%`.
> The diff coverage is `87.14%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9359/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=tree)
```diff
@@ Coverage Diff @@
## master #9359 +/- ##
==========================================
+ Coverage 82.27% 82.29% +0.01%
==========================================
Files 244 244
Lines 55555 55616 +61
==========================================
+ Hits 45708 45767 +59
- Misses 9847 9849 +2
```
| [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=tree) | Coverage Ξ | |
|---|---|---|
| [...st/datafusion/src/physical\_plan/expressions/mod.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2V4cHJlc3Npb25zL21vZC5ycw==) | `71.42% <ΓΈ> (ΓΈ)` | |
| [rust/datafusion/src/logical\_plan/expr.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9sb2dpY2FsX3BsYW4vZXhwci5ycw==) | `81.13% <50.00%> (ΓΈ)` | |
| [...tafusion/src/physical\_plan/datetime\_expressions.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2RhdGV0aW1lX2V4cHJlc3Npb25zLnJz) | `68.83% <66.66%> (-0.35%)` | :arrow_down: |
| [rust/datafusion/src/sql/planner.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zcWwvcGxhbm5lci5ycw==) | `83.22% <80.00%> (-0.02%)` | :arrow_down: |
| [rust/datafusion/src/physical\_plan/functions.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2Z1bmN0aW9ucy5ycw==) | `73.82% <100.00%> (+1.46%)` | :arrow_up: |
| [rust/datafusion/src/physical\_plan/type\_coercion.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL3R5cGVfY29lcmNpb24ucnM=) | `98.62% <100.00%> (+0.09%)` | :arrow_up: |
| [rust/datafusion/tests/sql.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3Rlc3RzL3NxbC5ycw==) | `99.92% <100.00%> (+<0.01%)` | :arrow_up: |
| [rust/datafusion/src/scalar.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zY2FsYXIucnM=) | `53.27% <0.00%> (+1.63%)` | :arrow_up: |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Ξ = absolute <relative> (impact)`, `ΓΈ = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=footer). Last update [aebabca...40e184b](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] Dandandan commented on pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support
Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#issuecomment-780474154
Thanks @jorgecarleitao . Not blocked ATM, this was waiting before on the scalar changes. Now that's done I hope to continue with this soon!
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] alamb edited a comment on pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support
Posted by GitBox <gi...@apache.org>.
alamb edited a comment on pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#issuecomment-782847077
Integration failure looks like https://issues.apache.org/jira/browse/ARROW-11717 and is related to this PR
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] Dandandan commented on a change in pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support
Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#discussion_r567253020
##########
File path: rust/datafusion/src/logical_plan/expr.rs
##########
@@ -169,6 +170,13 @@ pub enum Expr {
},
/// Represents a reference to all fields in a schema.
Wildcard,
+ /// Extract date parts (day, hour, minute) from a date / time expression
+ Extract {
Review comment:
I just looked into this. The downside currently is w.r.t. performance and being able to utilize Arrow kernels. The `ScalarFunction` repeats scalar values, the date part, e.g. 'HOUR' for `date_part('HOUR', dt)` will be for repeated for each row.
In PostgreSQL, expressions are not allowed for `date_part` / `extract`, `date_trunc` etc. :
https://www.postgresql.org/docs/13/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT
> Note that here the field parameter needs to be a string value, not a name. The valid field names for date_part are the same as for extract.
This is also happening with the existing `date_trunc` function where currently the date part (as string) is repeated / matched against for each row and also evaluated per row (see below). That won't work with the `hour` kernel for obvious reasons.
```rust
let result = range
.map(|i| {
if array.is_null(i) {
Ok(0_i64)
} else {
let date_time = match granularity_array.value(i) {
"second" => array
.value_as_datetime(i)
.and_then(|d| d.with_nanosecond(0)),
"minute" => array
.value_as_datetime(i)
.and_then(|d| d.with_nanosecond(0))
.and_then(|d| d.with_second(0)),
[...]
```
So I think here we have a few options:
* Refactor/Optimize `ScalarFunction` to also allow for scalar values, be able to check on them + support literals (I guess it should use `ColumnarValue` instead of just `Array`s).
* Have a similar (inefficient) implementation for extract from / date_part to compute as currently `date_trunc`. I think that
* Refactor/Optimize `ScalarFunction` later and keep the Extract as is for now.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support
Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#discussion_r567202684
##########
File path: rust/datafusion/src/logical_plan/expr.rs
##########
@@ -237,6 +245,25 @@ impl Expr {
Expr::Wildcard => Err(DataFusionError::Internal(
"Wildcard expressions are not valid in a logical query plan".to_owned(),
)),
+ Expr::Extract {
+ date_part: DatePart::Hour,
+ expr,
+ } => {
+ let inner = expr.get_type(schema)?;
+ match inner {
+ DataType::Date32(_)
+ | DataType::Date64(_)
+ | DataType::Time32(_)
+ | DataType::Time64(_)
+ | DataType::Timestamp(arrow::datatypes::TimeUnit::Nanosecond, None) => {
Review comment:
Is there a particular reason to only support `Timestamp(Nanosecond)`? `Millisecond,Second,Microsecond` should also work, no?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jorgecarleitao commented on pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support
Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#issuecomment-780266711
Are you blocked here, @Dandandan ?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] Dandandan commented on a change in pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support
Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#discussion_r567253020
##########
File path: rust/datafusion/src/logical_plan/expr.rs
##########
@@ -169,6 +170,13 @@ pub enum Expr {
},
/// Represents a reference to all fields in a schema.
Wildcard,
+ /// Extract date parts (day, hour, minute) from a date / time expression
+ Extract {
Review comment:
I just looked into this. The downside currently is w.r.t. performance and being able to utilize Arrow kernels. The `ScalarFunction` implementation repeats scalar values, the date part, e.g. 'HOUR' for `date_part('HOUR', dt)` will be for repeated for each row.
In PostgreSQL, expressions are not allowed for `date_part` / `extract`, `date_trunc` etc. :
https://www.postgresql.org/docs/13/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT
> Note that here the field parameter needs to be a string value, not a name. The valid field names for date_part are the same as for extract.
This is also happening with the existing `date_trunc` function where currently the date part (as string) is repeated / matched against for each row and also evaluated per row (see below). That won't work with the `hour` kernel for obvious reasons.
```rust
let result = range
.map(|i| {
if array.is_null(i) {
Ok(0_i64)
} else {
let date_time = match granularity_array.value(i) {
"second" => array
.value_as_datetime(i)
.and_then(|d| d.with_nanosecond(0)),
"minute" => array
.value_as_datetime(i)
.and_then(|d| d.with_nanosecond(0))
.and_then(|d| d.with_second(0)),
[...]
```
So I think here we have a few options:
* Refactor/Optimize `ScalarFunction` to also allow for scalar values, be able to check on them + support literals (I guess it should use `ColumnarValue` instead of just `Array`s).
* Have a similar (inefficient) implementation for extract from / date_part to compute as currently `date_trunc`. I think that
* Refactor/Optimize `ScalarFunction` later and keep the Extract as is for now.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support
Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#discussion_r567202041
##########
File path: rust/datafusion/src/logical_plan/expr.rs
##########
@@ -169,6 +170,13 @@ pub enum Expr {
},
/// Represents a reference to all fields in a schema.
Wildcard,
+ /// Extract date parts (day, hour, minute) from a date / time expression
+ Extract {
Review comment:
I have been trying to move them to the `ScalarFunction` to avoid having many variants around, mostly because adding a new item to an enum is backward incompatible and thus it may be beneficial to reserve that for operations that cannot be described by `ScalarFunction`.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] alamb commented on pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support
Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#issuecomment-782672779
Thanks @Dandandan -- I am about to run out of time for today but I will plan to merge this in tomorrow if someone doesn't beat me to it
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] Dandandan commented on a change in pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support
Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#discussion_r579656523
##########
File path: rust/datafusion/src/physical_plan/type_coercion.rs
##########
@@ -68,6 +68,29 @@ pub fn data_types(
current_types: &[DataType],
signature: &Signature,
) -> Result<Vec<DataType>> {
+ let valid_types = get_valid_types(signature, current_types)?;
+
+ if valid_types.contains(¤t_types.to_owned()) {
Review comment:
seems `¤t_types` isn't possible with `contains`, made it use `any` instead.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] alamb closed pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support
Posted by GitBox <gi...@apache.org>.
alamb closed pull request #9359:
URL: https://github.com/apache/arrow/pull/9359
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] seddonm1 commented on a change in pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support
Posted by GitBox <gi...@apache.org>.
seddonm1 commented on a change in pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#discussion_r579863202
##########
File path: rust/datafusion/src/physical_plan/functions.rs
##########
@@ -71,6 +71,8 @@ pub enum Signature {
Exact(Vec<DataType>),
/// fixed number of arguments of arbitrary types
Any(usize),
+ /// One of a list of signatures
+ OneOf(Vec<Signature>),
Review comment:
Yes, I missed this but all good. This is actually better :D
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] Dandandan commented on a change in pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support
Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#discussion_r567415842
##########
File path: rust/datafusion/src/logical_plan/expr.rs
##########
@@ -169,6 +170,13 @@ pub enum Expr {
},
/// Represents a reference to all fields in a schema.
Wildcard,
+ /// Extract date parts (day, hour, minute) from a date / time expression
+ Extract {
Review comment:
Yes, I think it is worth it to wait for the other PR to land and convert it to use the scalar function. Then we could also add the function alias `date_part` easily next to the `extract` syntax, which are both supported by PostgreSQL.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] codecov-io edited a comment on pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support
Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#issuecomment-769847991
# [Codecov](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=h1) Report
> Merging [#9359](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=desc) (a23e4c9) into [master](https://codecov.io/gh/apache/arrow/commit/c100a5cca357f83d5bad21e9ba2d7fb9eb15a0d1?el=desc) (c100a5c) will **increase** coverage by `0.01%`.
> The diff coverage is `86.95%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9359/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=tree)
```diff
@@ Coverage Diff @@
## master #9359 +/- ##
==========================================
+ Coverage 82.27% 82.28% +0.01%
==========================================
Files 244 244
Lines 55371 55431 +60
==========================================
+ Hits 45554 45612 +58
- Misses 9817 9819 +2
```
| [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=tree) | Coverage Ξ | |
|---|---|---|
| [...st/datafusion/src/physical\_plan/expressions/mod.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2V4cHJlc3Npb25zL21vZC5ycw==) | `71.42% <ΓΈ> (ΓΈ)` | |
| [rust/datafusion/src/logical\_plan/expr.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9sb2dpY2FsX3BsYW4vZXhwci5ycw==) | `80.75% <50.00%> (ΓΈ)` | |
| [...tafusion/src/physical\_plan/datetime\_expressions.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2RhdGV0aW1lX2V4cHJlc3Npb25zLnJz) | `68.83% <66.66%> (-0.35%)` | :arrow_down: |
| [rust/datafusion/src/sql/planner.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zcWwvcGxhbm5lci5ycw==) | `83.22% <80.00%> (-0.02%)` | :arrow_down: |
| [rust/datafusion/src/physical\_plan/functions.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2Z1bmN0aW9ucy5ycw==) | `77.70% <100.00%> (+1.39%)` | :arrow_up: |
| [rust/datafusion/src/physical\_plan/type\_coercion.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL3R5cGVfY29lcmNpb24ucnM=) | `98.61% <100.00%> (+0.08%)` | :arrow_up: |
| [rust/datafusion/tests/sql.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3Rlc3RzL3NxbC5ycw==) | `99.92% <100.00%> (+<0.01%)` | :arrow_up: |
| [rust/datafusion/src/scalar.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zY2FsYXIucnM=) | `53.27% <0.00%> (+1.63%)` | :arrow_up: |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Ξ = absolute <relative> (impact)`, `ΓΈ = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=footer). Last update [5647e90...a23e4c9](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] alamb commented on pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support
Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#issuecomment-782668510
Integration test failure looks like https://issues.apache.org/jira/browse/ARROW-11717. I am retriggering it on this PR
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] codecov-io edited a comment on pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support
Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#issuecomment-769847991
# [Codecov](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=h1) Report
> Merging [#9359](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=desc) (31f8d28) into [master](https://codecov.io/gh/apache/arrow/commit/c100a5cca357f83d5bad21e9ba2d7fb9eb15a0d1?el=desc) (c100a5c) will **decrease** coverage by `0.01%`.
> The diff coverage is `59.61%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9359/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=tree)
```diff
@@ Coverage Diff @@
## master #9359 +/- ##
==========================================
- Coverage 82.27% 82.25% -0.02%
==========================================
Files 244 244
Lines 55371 55420 +49
==========================================
+ Hits 45554 45586 +32
- Misses 9817 9834 +17
```
| [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=tree) | Coverage Ξ | |
|---|---|---|
| [...st/datafusion/src/physical\_plan/expressions/mod.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2V4cHJlc3Npb25zL21vZC5ycw==) | `71.42% <ΓΈ> (ΓΈ)` | |
| [...tafusion/src/physical\_plan/datetime\_expressions.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2RhdGV0aW1lX2V4cHJlc3Npb25zLnJz) | `63.58% <37.93%> (-5.60%)` | :arrow_down: |
| [rust/datafusion/src/logical\_plan/expr.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9sb2dpY2FsX3BsYW4vZXhwci5ycw==) | `80.75% <50.00%> (ΓΈ)` | |
| [rust/datafusion/src/sql/planner.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zcWwvcGxhbm5lci5ycw==) | `83.17% <75.00%> (-0.07%)` | :arrow_down: |
| [rust/datafusion/src/physical\_plan/functions.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2Z1bmN0aW9ucy5ycw==) | `76.63% <100.00%> (+0.32%)` | :arrow_up: |
| [rust/datafusion/tests/sql.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3Rlc3RzL3NxbC5ycw==) | `99.92% <100.00%> (+<0.01%)` | :arrow_up: |
| [rust/datafusion/src/scalar.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zY2FsYXIucnM=) | `52.45% <0.00%> (+0.81%)` | :arrow_up: |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Ξ = absolute <relative> (impact)`, `ΓΈ = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=footer). Last update [5647e90...af8792f](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support
Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#discussion_r567302576
##########
File path: rust/datafusion/src/logical_plan/expr.rs
##########
@@ -169,6 +170,13 @@ pub enum Expr {
},
/// Represents a reference to all fields in a schema.
Wildcard,
+ /// Extract date parts (day, hour, minute) from a date / time expression
+ Extract {
Review comment:
hold my beer: #9376 :)
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] Dandandan commented on pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support
Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#issuecomment-782604533
@jorgecarleitao @alamb
This is now ready for review
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] alamb commented on a change in pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support
Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#discussion_r567411585
##########
File path: rust/datafusion/src/physical_plan/expressions/extract.rs
##########
@@ -0,0 +1,115 @@
+use crate::error::Result;
+use core::fmt;
+use std::{any::Any, sync::Arc};
+
+use arrow::{
+ array::{
+ Date32Array, Date64Array, TimestampMicrosecondArray, TimestampMillisecondArray,
+ TimestampNanosecondArray, TimestampSecondArray,
+ },
+ compute::hour,
+ datatypes::{DataType, Schema, TimeUnit},
+ record_batch::RecordBatch,
+};
+
+use crate::{
+ error::DataFusionError,
+ logical_plan::DatePart,
+ physical_plan::{ColumnarValue, PhysicalExpr},
+};
+
+impl fmt::Display for Extract {
+ fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
+ write!(f, "EXTRACT({} AS {:?})", self.date_part, self.expr)
+ }
+}
+
+/// Extract
+#[derive(Debug)]
+pub struct Extract {
+ date_part: DatePart,
+ expr: Arc<dyn PhysicalExpr>,
+}
+
+impl Extract {
+ /// Create new Extract expression
+ pub fn new(date_part: DatePart, expr: Arc<dyn PhysicalExpr>) -> Self {
+ Self { date_part, expr }
+ }
+}
+
+impl PhysicalExpr for Extract {
+ fn as_any(&self) -> &dyn Any {
+ self
+ }
+
+ fn data_type(&self, _input_schema: &Schema) -> Result<DataType> {
+ Ok(DataType::Int32)
+ }
+
+ fn nullable(&self, input_schema: &Schema) -> Result<bool> {
+ self.expr.nullable(input_schema)
+ }
+
+ fn evaluate(&self, batch: &RecordBatch) -> Result<ColumnarValue> {
+ let value = self.expr.evaluate(batch)?;
+ let data_type = value.data_type();
+ let array = match value {
+ ColumnarValue::Array(array) => array,
+ ColumnarValue::Scalar(scalar) => scalar.to_array(),
+ };
+
+ match data_type {
+ DataType::Date32 => {
+ let array = array.as_any().downcast_ref::<Date32Array>().unwrap();
+ Ok(ColumnarValue::Array(Arc::new(hour(array)?)))
Review comment:
I find it confusing that `date_part` is passed all the way down in the Exprs / trees only to be ignored in the actual implementation which directly calls `hour`. I can see that the expr seems to always be made with `DatePart::Hour` but I am not 100% sure.
I am fine with not supporting all the various date parts initially, but I would recommend the following as a way of documenting through code / safe guard against future bugs:
1. Add a test for `EXTRACT DAY from timestamp` and show that it generates a useful error
2. Add a check in this function for `date_part != DataPart::Hour` and throw an error
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] Dandandan commented on pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support
Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#issuecomment-780820072
Most of it converted to use scalar functions now. There is some missing support to support multiple types for one argument but not for the other in scalar functions, will have a look at that later.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] Dandandan commented on a change in pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support
Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#discussion_r567414429
##########
File path: rust/datafusion/src/physical_plan/expressions/extract.rs
##########
@@ -0,0 +1,115 @@
+use crate::error::Result;
+use core::fmt;
+use std::{any::Any, sync::Arc};
+
+use arrow::{
+ array::{
+ Date32Array, Date64Array, TimestampMicrosecondArray, TimestampMillisecondArray,
+ TimestampNanosecondArray, TimestampSecondArray,
+ },
+ compute::hour,
+ datatypes::{DataType, Schema, TimeUnit},
+ record_batch::RecordBatch,
+};
+
+use crate::{
+ error::DataFusionError,
+ logical_plan::DatePart,
+ physical_plan::{ColumnarValue, PhysicalExpr},
+};
+
+impl fmt::Display for Extract {
+ fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
+ write!(f, "EXTRACT({} AS {:?})", self.date_part, self.expr)
+ }
+}
+
+/// Extract
+#[derive(Debug)]
+pub struct Extract {
+ date_part: DatePart,
+ expr: Arc<dyn PhysicalExpr>,
+}
+
+impl Extract {
+ /// Create new Extract expression
+ pub fn new(date_part: DatePart, expr: Arc<dyn PhysicalExpr>) -> Self {
+ Self { date_part, expr }
+ }
+}
+
+impl PhysicalExpr for Extract {
+ fn as_any(&self) -> &dyn Any {
+ self
+ }
+
+ fn data_type(&self, _input_schema: &Schema) -> Result<DataType> {
+ Ok(DataType::Int32)
+ }
+
+ fn nullable(&self, input_schema: &Schema) -> Result<bool> {
+ self.expr.nullable(input_schema)
+ }
+
+ fn evaluate(&self, batch: &RecordBatch) -> Result<ColumnarValue> {
+ let value = self.expr.evaluate(batch)?;
+ let data_type = value.data_type();
+ let array = match value {
+ ColumnarValue::Array(array) => array,
+ ColumnarValue::Scalar(scalar) => scalar.to_array(),
+ };
+
+ match data_type {
+ DataType::Date32 => {
+ let array = array.as_any().downcast_ref::<Date32Array>().unwrap();
+ Ok(ColumnarValue::Array(Arc::new(hour(array)?)))
Review comment:
There is no other datepart for now in the enum, so I guess that might generate some clippy warnings. But we could match directly on the datepart. The warning should be generated elsewhere already (when building the logical plan), agree makes sense to add a test for that π.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] Dandandan commented on a change in pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support
Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#discussion_r579655411
##########
File path: rust/datafusion/src/physical_plan/type_coercion.rs
##########
@@ -68,6 +68,29 @@ pub fn data_types(
current_types: &[DataType],
signature: &Signature,
) -> Result<Vec<DataType>> {
+ let valid_types = get_valid_types(signature, current_types)?;
+
+ if valid_types.contains(¤t_types.to_owned()) {
Review comment:
Will have a look... I think it was auto generated by the new "extract function" functionality in rust-analyzer (which doesn't work 100% reliably, but still is very useful).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] codecov-io edited a comment on pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support
Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#issuecomment-769847991
# [Codecov](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=h1) Report
> Merging [#9359](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=desc) (dc67999) into [master](https://codecov.io/gh/apache/arrow/commit/f05b49bb08c0a4cc0cbfcfb07103dcf374c7fd38?el=desc) (f05b49b) will **decrease** coverage by `0.03%`.
> The diff coverage is `57.89%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9359/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=tree)
```diff
@@ Coverage Diff @@
## master #9359 +/- ##
==========================================
- Coverage 81.94% 81.91% -0.04%
==========================================
Files 231 233 +2
Lines 53374 53464 +90
==========================================
+ Hits 43739 43794 +55
- Misses 9635 9670 +35
```
| [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=tree) | Coverage Ξ | |
|---|---|---|
| [rust/datafusion/src/optimizer/utils.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9vcHRpbWl6ZXIvdXRpbHMucnM=) | `48.26% <0.00%> (-0.76%)` | :arrow_down: |
| [...st/datafusion/src/physical\_plan/expressions/mod.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2V4cHJlc3Npb25zL21vZC5ycw==) | `90.00% <ΓΈ> (ΓΈ)` | |
| [rust/datafusion/src/sql/utils.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zcWwvdXRpbHMucnM=) | `38.88% <0.00%> (-0.55%)` | :arrow_down: |
| [rust/datafusion/src/logical\_plan/temporal.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9sb2dpY2FsX3BsYW4vdGVtcG9yYWwucnM=) | `50.00% <50.00%> (ΓΈ)` | |
| [...atafusion/src/physical\_plan/expressions/extract.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2V4cHJlc3Npb25zL2V4dHJhY3QucnM=) | `52.50% <52.50%> (ΓΈ)` | |
| [rust/datafusion/src/logical\_plan/expr.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9sb2dpY2FsX3BsYW4vZXhwci5ycw==) | `78.91% <58.33%> (-1.05%)` | :arrow_down: |
| [rust/datafusion/src/physical\_plan/planner.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL3BsYW5uZXIucnM=) | `78.61% <75.00%> (-0.03%)` | :arrow_down: |
| [rust/datafusion/src/sql/planner.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zcWwvcGxhbm5lci5ycw==) | `84.02% <75.00%> (-0.13%)` | :arrow_down: |
| [rust/datafusion/tests/sql.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3Rlc3RzL3NxbC5ycw==) | `99.84% <100.00%> (+<0.01%)` | :arrow_up: |
| [rust/parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9lbmNvZGluZ3MvZW5jb2RpbmcucnM=) | `94.86% <0.00%> (-0.20%)` | :arrow_down: |
| ... and [3 more](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree-more) | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Ξ = absolute <relative> (impact)`, `ΓΈ = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=footer). Last update [f05b49b...dc67999](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] codecov-io edited a comment on pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support
Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#issuecomment-769847991
# [Codecov](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=h1) Report
> Merging [#9359](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=desc) (7ce35ab) into [master](https://codecov.io/gh/apache/arrow/commit/71dab27098775228340ecb4e8a15870967d7cf91?el=desc) (71dab27) will **decrease** coverage by `0.05%`.
> The diff coverage is `65.11%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9359/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=tree)
```diff
@@ Coverage Diff @@
## master #9359 +/- ##
==========================================
- Coverage 81.98% 81.93% -0.06%
==========================================
Files 216 217 +1
Lines 53299 53426 +127
==========================================
+ Hits 43699 43773 +74
- Misses 9600 9653 +53
```
| [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=tree) | Coverage Ξ | |
|---|---|---|
| [rust/datafusion/src/optimizer/utils.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9vcHRpbWl6ZXIvdXRpbHMucnM=) | `48.26% <0.00%> (-0.76%)` | :arrow_down: |
| [rust/datafusion/src/sql/utils.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zcWwvdXRpbHMucnM=) | `38.88% <0.00%> (-0.55%)` | :arrow_down: |
| [rust/datafusion/src/logical\_plan/temporal.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9sb2dpY2FsX3BsYW4vdGVtcG9yYWwucnM=) | `50.00% <50.00%> (ΓΈ)` | |
| [rust/datafusion/src/logical\_plan/expr.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9sb2dpY2FsX3BsYW4vZXhwci5ycw==) | `78.91% <58.33%> (-1.05%)` | :arrow_down: |
| [rust/datafusion/src/physical\_plan/expressions.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2V4cHJlc3Npb25zLnJz) | `80.81% <68.96%> (-0.21%)` | :arrow_down: |
| [rust/datafusion/src/physical\_plan/planner.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL3BsYW5uZXIucnM=) | `78.61% <75.00%> (-0.03%)` | :arrow_down: |
| [rust/datafusion/src/sql/planner.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zcWwvcGxhbm5lci5ycw==) | `84.02% <75.00%> (-0.13%)` | :arrow_down: |
| [rust/datafusion/tests/sql.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3Rlc3RzL3NxbC5ycw==) | `99.84% <100.00%> (+<0.01%)` | :arrow_up: |
| [rust/datafusion/src/scalar.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zY2FsYXIucnM=) | `57.85% <0.00%> (-1.11%)` | :arrow_down: |
| [rust/parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9lbmNvZGluZ3MvZW5jb2RpbmcucnM=) | `94.86% <0.00%> (-0.20%)` | :arrow_down: |
| ... and [1 more](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree-more) | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Ξ = absolute <relative> (impact)`, `ΓΈ = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=footer). Last update [71dab27...0cad90f](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #9359: ARROW-11426: [Rust][DataFusion] Start of EXTRACT support for DataFusion [WIP]
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#issuecomment-769729658
https://issues.apache.org/jira/browse/ARROW-11426
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] codecov-io edited a comment on pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support
Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#issuecomment-769847991
# [Codecov](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=h1) Report
> Merging [#9359](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=desc) (a471f57) into [master](https://codecov.io/gh/apache/arrow/commit/71dab27098775228340ecb4e8a15870967d7cf91?el=desc) (71dab27) will **decrease** coverage by `0.02%`.
> The diff coverage is `63.21%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9359/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=tree)
```diff
@@ Coverage Diff @@
## master #9359 +/- ##
==========================================
- Coverage 81.98% 81.96% -0.03%
==========================================
Files 216 217 +1
Lines 53299 53380 +81
==========================================
+ Hits 43699 43753 +54
- Misses 9600 9627 +27
```
| [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=tree) | Coverage Ξ | |
|---|---|---|
| [rust/datafusion/src/optimizer/utils.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9vcHRpbWl6ZXIvdXRpbHMucnM=) | `48.26% <0.00%> (-0.76%)` | :arrow_down: |
| [rust/datafusion/src/sql/utils.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zcWwvdXRpbHMucnM=) | `38.88% <0.00%> (-0.55%)` | :arrow_down: |
| [rust/datafusion/src/logical\_plan/temporal.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9sb2dpY2FsX3BsYW4vdGVtcG9yYWwucnM=) | `50.00% <50.00%> (ΓΈ)` | |
| [rust/datafusion/src/logical\_plan/expr.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9sb2dpY2FsX3BsYW4vZXhwci5ycw==) | `78.91% <58.33%> (-1.05%)` | :arrow_down: |
| [rust/datafusion/src/physical\_plan/expressions.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2V4cHJlc3Npb25zLnJz) | `80.71% <64.51%> (-0.31%)` | :arrow_down: |
| [rust/datafusion/src/physical\_plan/planner.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL3BsYW5uZXIucnM=) | `78.61% <75.00%> (-0.03%)` | :arrow_down: |
| [rust/datafusion/src/sql/planner.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zcWwvcGxhbm5lci5ycw==) | `84.02% <75.00%> (-0.13%)` | :arrow_down: |
| [rust/datafusion/tests/sql.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3Rlc3RzL3NxbC5ycw==) | `99.84% <100.00%> (+<0.01%)` | :arrow_up: |
| [rust/datafusion/src/scalar.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zY2FsYXIucnM=) | `60.55% <0.00%> (+1.59%)` | :arrow_up: |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Ξ = absolute <relative> (impact)`, `ΓΈ = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=footer). Last update [71dab27...5de73f6](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] alamb commented on pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support
Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#issuecomment-782835501
@Dandandan sadly this PR has merge conflicts (probably from https://github.com/apache/arrow/pull/9509) -- can you possibly rebase it?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] codecov-io edited a comment on pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support
Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#issuecomment-769847991
# [Codecov](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=h1) Report
> Merging [#9359](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=desc) (ca771de) into [master](https://codecov.io/gh/apache/arrow/commit/c100a5cca357f83d5bad21e9ba2d7fb9eb15a0d1?el=desc) (c100a5c) will **increase** coverage by `0.01%`.
> The diff coverage is `87.50%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9359/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=tree)
```diff
@@ Coverage Diff @@
## master #9359 +/- ##
==========================================
+ Coverage 82.27% 82.28% +0.01%
==========================================
Files 244 244
Lines 55371 55426 +55
==========================================
+ Hits 45554 45609 +55
Misses 9817 9817
```
| [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=tree) | Coverage Ξ | |
|---|---|---|
| [...st/datafusion/src/physical\_plan/expressions/mod.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2V4cHJlc3Npb25zL21vZC5ycw==) | `71.42% <ΓΈ> (ΓΈ)` | |
| [rust/datafusion/src/logical\_plan/expr.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9sb2dpY2FsX3BsYW4vZXhwci5ycw==) | `80.75% <50.00%> (ΓΈ)` | |
| [...tafusion/src/physical\_plan/datetime\_expressions.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2RhdGV0aW1lX2V4cHJlc3Npb25zLnJz) | `68.45% <62.50%> (-0.72%)` | :arrow_down: |
| [rust/datafusion/src/sql/planner.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zcWwvcGxhbm5lci5ycw==) | `83.22% <80.00%> (-0.02%)` | :arrow_down: |
| [rust/datafusion/src/physical\_plan/functions.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2Z1bmN0aW9ucy5ycw==) | `77.70% <100.00%> (+1.39%)` | :arrow_up: |
| [rust/datafusion/src/physical\_plan/type\_coercion.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL3R5cGVfY29lcmNpb24ucnM=) | `98.61% <100.00%> (+0.08%)` | :arrow_up: |
| [rust/datafusion/tests/sql.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3Rlc3RzL3NxbC5ycw==) | `99.92% <100.00%> (+<0.01%)` | :arrow_up: |
| [rust/parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9lbmNvZGluZ3MvZW5jb2RpbmcucnM=) | `95.05% <0.00%> (+0.19%)` | :arrow_up: |
| [rust/datafusion/src/scalar.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zY2FsYXIucnM=) | `53.27% <0.00%> (+1.63%)` | :arrow_up: |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Ξ = absolute <relative> (impact)`, `ΓΈ = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=footer). Last update [5647e90...ca771de](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] alamb commented on a change in pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support
Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#discussion_r579652526
##########
File path: rust/datafusion/src/physical_plan/type_coercion.rs
##########
@@ -68,6 +68,29 @@ pub fn data_types(
current_types: &[DataType],
signature: &Signature,
) -> Result<Vec<DataType>> {
+ let valid_types = get_valid_types(signature, current_types)?;
+
+ if valid_types.contains(¤t_types.to_owned()) {
Review comment:
Why can't this be `¤t_types` (aka why does it need a call to `to_owned` just to immediately borrow from it?)
##########
File path: rust/datafusion/src/physical_plan/functions.rs
##########
@@ -71,6 +71,8 @@ pub enum Signature {
Exact(Vec<DataType>),
/// fixed number of arguments of arbitrary types
Any(usize),
+ /// One of a list of signatures
+ OneOf(Vec<Signature>),
Review comment:
FYI @seddonm1 I am not sure how this affects your string functions / other postgres function plans
##########
File path: rust/datafusion/tests/sql.rs
##########
@@ -2134,6 +2134,24 @@ async fn crypto_expressions() -> Result<()> {
Ok(())
}
+#[tokio::test]
+async fn extract_date_part() -> Result<()> {
+ let mut ctx = ExecutionContext::new();
+ let sql = "SELECT
Review comment:
π
##########
File path: rust/datafusion/src/physical_plan/datetime_expressions.rs
##########
@@ -344,6 +347,98 @@ pub fn date_trunc(args: &[ColumnarValue]) -> Result<ColumnarValue> {
})
}
+macro_rules! extract_date_part {
+ ($ARRAY: expr, $FN:expr) => {
+ match $ARRAY.data_type() {
+ DataType::Date32 => {
+ let array = $ARRAY.as_any().downcast_ref::<Date32Array>().unwrap();
+ Ok($FN(array)?)
+ }
+ DataType::Date64 => {
+ let array = $ARRAY.as_any().downcast_ref::<Date64Array>().unwrap();
+ Ok($FN(array)?)
+ }
+ DataType::Timestamp(time_unit, None) => match time_unit {
+ TimeUnit::Second => {
+ let array = $ARRAY
+ .as_any()
+ .downcast_ref::<TimestampSecondArray>()
+ .unwrap();
+ Ok($FN(array)?)
+ }
+ TimeUnit::Millisecond => {
+ let array = $ARRAY
+ .as_any()
+ .downcast_ref::<TimestampMillisecondArray>()
+ .unwrap();
+ Ok($FN(array)?)
+ }
+ TimeUnit::Microsecond => {
+ let array = $ARRAY
+ .as_any()
+ .downcast_ref::<TimestampMicrosecondArray>()
+ .unwrap();
+ Ok($FN(array)?)
+ }
+ TimeUnit::Nanosecond => {
+ let array = $ARRAY
+ .as_any()
+ .downcast_ref::<TimestampNanosecondArray>()
+ .unwrap();
+ Ok($FN(array)?)
+ }
+ },
+ datatype => Err(DataFusionError::Internal(format!(
+ "Extract does not support datatype {:?}",
+ datatype
+ ))),
+ }
+ };
+}
+
+/// DATE_PART SQL function
+pub fn date_part(args: &[ColumnarValue]) -> Result<ColumnarValue> {
+ if args.len() != 2 {
+ return Err(DataFusionError::Execution(
+ "Expected two arguments in DATE_PART".to_string(),
+ ));
+ }
+ let (date_part, array) = (&args[0], &args[1]);
+
+ let date_part = if let ColumnarValue::Scalar(ScalarValue::Utf8(Some(v))) = date_part {
+ v
+ } else {
+ return Err(DataFusionError::Execution(
+ "First argument of `DATE_PART` must be non-null scalar Utf8".to_string(),
+ ));
+ };
+
+ let is_scalar = matches!(array, ColumnarValue::Scalar(_));
+
+ let array = match array {
Review comment:
I assume the longer term plan will be to handle the `Scalar` case more efficiently. This (converting to an array) is fine for now I think
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] Dandandan commented on a change in pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support
Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#discussion_r566951425
##########
File path: rust/datafusion/src/logical_plan/expr.rs
##########
@@ -169,6 +170,13 @@ pub enum Expr {
},
/// Represents a reference to all fields in a schema.
Wildcard,
+ /// Extract date parts (day, hour, minute) from a date / time expression
+ Extract {
Review comment:
Alternatively, this could use `ScalarFunction` e.g. using "date_part" (which is also supported by PostgreSQL) to avoid an extra enum option, and convert the extract sql syntax to this `ScalarFunction`. I am not sure which is better?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] Dandandan commented on a change in pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support
Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#discussion_r567235809
##########
File path: rust/datafusion/src/logical_plan/expr.rs
##########
@@ -237,6 +245,25 @@ impl Expr {
Expr::Wildcard => Err(DataFusionError::Internal(
"Wildcard expressions are not valid in a logical query plan".to_owned(),
)),
+ Expr::Extract {
+ date_part: DatePart::Hour,
+ expr,
+ } => {
+ let inner = expr.get_type(schema)?;
+ match inner {
+ DataType::Date32(_)
+ | DataType::Date64(_)
+ | DataType::Time32(_)
+ | DataType::Time64(_)
+ | DataType::Timestamp(arrow::datatypes::TimeUnit::Nanosecond, None) => {
Review comment:
I wasn't convinced yet the `hour` kernel was working correctly, and it's currently a bit hard to test from DataFusion. But I am convinced now that it should work as expected. I think it can use some more tests at the Arrow side as well.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org