You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/29 10:47:19 UTC

[GitHub] [arrow] Dandandan opened a new pull request #9359: ARROW-11426: [Rust][DataFusion] Start of EXTRACT support for DataFusion [WIP]

Dandandan opened a new pull request #9359:
URL: https://github.com/apache/arrow/pull/9359


   This PR starts implementing support for the expression, to retrieve date parts (hours, minutes, days, etc.) from temporal data types:
   
   `EXTRACT (HOUR from dt)` 
   
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#issuecomment-782847077


   Integration failure looks like https://issues.apache.org/jira/browse/ARROW-11717 and is related to this RP


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support

Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#issuecomment-769918230


   This is ready for review now.  @alamb @nevi-me tagging you, because I think you would be interested in more temporal support.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on a change in pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#discussion_r567414429



##########
File path: rust/datafusion/src/physical_plan/expressions/extract.rs
##########
@@ -0,0 +1,115 @@
+use crate::error::Result;
+use core::fmt;
+use std::{any::Any, sync::Arc};
+
+use arrow::{
+    array::{
+        Date32Array, Date64Array, TimestampMicrosecondArray, TimestampMillisecondArray,
+        TimestampNanosecondArray, TimestampSecondArray,
+    },
+    compute::hour,
+    datatypes::{DataType, Schema, TimeUnit},
+    record_batch::RecordBatch,
+};
+
+use crate::{
+    error::DataFusionError,
+    logical_plan::DatePart,
+    physical_plan::{ColumnarValue, PhysicalExpr},
+};
+
+impl fmt::Display for Extract {
+    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
+        write!(f, "EXTRACT({} AS {:?})", self.date_part, self.expr)
+    }
+}
+
+/// Extract
+#[derive(Debug)]
+pub struct Extract {
+    date_part: DatePart,
+    expr: Arc<dyn PhysicalExpr>,
+}
+
+impl Extract {
+    /// Create new Extract expression
+    pub fn new(date_part: DatePart, expr: Arc<dyn PhysicalExpr>) -> Self {
+        Self { date_part, expr }
+    }
+}
+
+impl PhysicalExpr for Extract {
+    fn as_any(&self) -> &dyn Any {
+        self
+    }
+
+    fn data_type(&self, _input_schema: &Schema) -> Result<DataType> {
+        Ok(DataType::Int32)
+    }
+
+    fn nullable(&self, input_schema: &Schema) -> Result<bool> {
+        self.expr.nullable(input_schema)
+    }
+
+    fn evaluate(&self, batch: &RecordBatch) -> Result<ColumnarValue> {
+        let value = self.expr.evaluate(batch)?;
+        let data_type = value.data_type();
+        let array = match value {
+            ColumnarValue::Array(array) => array,
+            ColumnarValue::Scalar(scalar) => scalar.to_array(),
+        };
+
+        match data_type {
+            DataType::Date32 => {
+                let array = array.as_any().downcast_ref::<Date32Array>().unwrap();
+                Ok(ColumnarValue::Array(Arc::new(hour(array)?)))

Review comment:
       There is no other datepart for now in the enum, so I guess that might generate some clippy warnings. But we could match directly on the datepart. The errorshould be generated elsewhere already (when building the logical plan), agree makes sense to add a test for that πŸ‘.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support

Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#issuecomment-782839678


   @alamb conflict solved :+1: 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on a change in pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#discussion_r567304269



##########
File path: rust/datafusion/src/logical_plan/expr.rs
##########
@@ -169,6 +170,13 @@ pub enum Expr {
     },
     /// Represents a reference to all fields in a schema.
     Wildcard,
+    /// Extract date parts (day, hour, minute) from a date / time expression
+    Extract {

Review comment:
       🍺 awesome! 😎




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on a change in pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#discussion_r567411606



##########
File path: rust/datafusion/tests/sql.rs
##########
@@ -1964,6 +1964,20 @@ async fn crypto_expressions() -> Result<()> {
     Ok(())
 }
 
+#[tokio::test]
+async fn extract_date_part() -> Result<()> {
+    let mut ctx = ExecutionContext::new();
+    let sql = "SELECT
+        EXTRACT(HOUR FROM CAST('2020-01-01' AS DATE)),
+        EXTRACT(HOUR FROM to_timestamp('2020-09-08T12:00:00+00:00'))

Review comment:
       I think adding coverage for the other date parts would be valuable here (even if they error)

##########
File path: rust/datafusion/src/logical_plan/expr.rs
##########
@@ -169,6 +170,13 @@ pub enum Expr {
     },
     /// Represents a reference to all fields in a schema.
     Wildcard,
+    /// Extract date parts (day, hour, minute) from a date / time expression
+    Extract {

Review comment:
       I also agree (belatedly) that having a more efficient implementation of constant function arguments (e.g. #9376) is the way to go!
   
   In terms of adding variants to `Expr` I think it will need be done when the semantics of whatever expression is being added can't realistically be expressed as a function (e.g. CASE). 
   
   So in this case, given @jorgecarleitao is cranking along with #9376 it seems like perhaps this PR should perhaps try and translate `EXTRACT` into a function.

##########
File path: rust/datafusion/src/physical_plan/expressions/extract.rs
##########
@@ -0,0 +1,115 @@
+use crate::error::Result;
+use core::fmt;
+use std::{any::Any, sync::Arc};
+
+use arrow::{
+    array::{
+        Date32Array, Date64Array, TimestampMicrosecondArray, TimestampMillisecondArray,
+        TimestampNanosecondArray, TimestampSecondArray,
+    },
+    compute::hour,
+    datatypes::{DataType, Schema, TimeUnit},
+    record_batch::RecordBatch,
+};
+
+use crate::{
+    error::DataFusionError,
+    logical_plan::DatePart,
+    physical_plan::{ColumnarValue, PhysicalExpr},
+};
+
+impl fmt::Display for Extract {
+    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
+        write!(f, "EXTRACT({} AS {:?})", self.date_part, self.expr)
+    }
+}
+
+/// Extract
+#[derive(Debug)]
+pub struct Extract {
+    date_part: DatePart,
+    expr: Arc<dyn PhysicalExpr>,
+}
+
+impl Extract {
+    /// Create new Extract expression
+    pub fn new(date_part: DatePart, expr: Arc<dyn PhysicalExpr>) -> Self {
+        Self { date_part, expr }
+    }
+}
+
+impl PhysicalExpr for Extract {
+    fn as_any(&self) -> &dyn Any {
+        self
+    }
+
+    fn data_type(&self, _input_schema: &Schema) -> Result<DataType> {
+        Ok(DataType::Int32)
+    }
+
+    fn nullable(&self, input_schema: &Schema) -> Result<bool> {
+        self.expr.nullable(input_schema)
+    }
+
+    fn evaluate(&self, batch: &RecordBatch) -> Result<ColumnarValue> {
+        let value = self.expr.evaluate(batch)?;
+        let data_type = value.data_type();
+        let array = match value {
+            ColumnarValue::Array(array) => array,
+            ColumnarValue::Scalar(scalar) => scalar.to_array(),
+        };
+
+        match data_type {
+            DataType::Date32 => {
+                let array = array.as_any().downcast_ref::<Date32Array>().unwrap();
+                Ok(ColumnarValue::Array(Arc::new(hour(array)?)))

Review comment:
       I find it confusing that `date_part` is passed all the way down in the Exprs / trees only to be ignored in the actual implementation which directly calls `hour. I can see that the expr seems to always be made with `DatePart::Hour` but I am not 100% sure. 
   
   I am fine with not supporting all the various date parts initially, but I would recommend the following as a way of documenting through code / safe guard against future bugs:
   
   1. Add a test for `EXTRACT DAY from timestamp` and show that it generates a useful error
   2. Add a check in this function for `date_part != DataPart::Hour` and throw an error




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io commented on pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support [WIP]

Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#issuecomment-769847991


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=h1) Report
   > Merging [#9359](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=desc) (dbbb647) into [master](https://codecov.io/gh/apache/arrow/commit/71dab27098775228340ecb4e8a15870967d7cf91?el=desc) (71dab27) will **decrease** coverage by `0.02%`.
   > The diff coverage is `63.01%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9359/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9359      +/-   ##
   ==========================================
   - Coverage   81.98%   81.96%   -0.03%     
   ==========================================
     Files         216      217       +1     
     Lines       53299    53366      +67     
   ==========================================
   + Hits        43699    43743      +44     
   - Misses       9600     9623      +23     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=tree) | Coverage Ξ” | |
   |---|---|---|
   | [rust/datafusion/src/logical\_plan/temporal.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9sb2dpY2FsX3BsYW4vdGVtcG9yYWwucnM=) | `0.00% <0.00%> (ΓΈ)` | |
   | [rust/datafusion/src/optimizer/utils.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9vcHRpbWl6ZXIvdXRpbHMucnM=) | `48.26% <0.00%> (-0.76%)` | :arrow_down: |
   | [rust/datafusion/src/sql/utils.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zcWwvdXRpbHMucnM=) | `38.88% <0.00%> (-0.55%)` | :arrow_down: |
   | [rust/datafusion/src/logical\_plan/expr.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9sb2dpY2FsX3BsYW4vZXhwci5ycw==) | `78.68% <54.16%> (-1.27%)` | :arrow_down: |
   | [rust/datafusion/src/physical\_plan/planner.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL3BsYW5uZXIucnM=) | `78.61% <75.00%> (-0.03%)` | :arrow_down: |
   | [rust/datafusion/src/sql/planner.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zcWwvcGxhbm5lci5ycw==) | `84.02% <75.00%> (-0.13%)` | :arrow_down: |
   | [rust/datafusion/src/physical\_plan/expressions.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2V4cHJlc3Npb25zLnJz) | `81.03% <82.35%> (+0.01%)` | :arrow_up: |
   | [rust/datafusion/tests/sql.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3Rlc3RzL3NxbC5ycw==) | `99.84% <100.00%> (+<0.01%)` | :arrow_up: |
   | [rust/datafusion/src/scalar.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zY2FsYXIucnM=) | `60.15% <0.00%> (+1.19%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Ξ” = absolute <relative> (impact)`, `ΓΈ = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=footer). Last update [71dab27...dbbb647](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on a change in pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#discussion_r579655053



##########
File path: rust/datafusion/src/physical_plan/datetime_expressions.rs
##########
@@ -344,6 +347,98 @@ pub fn date_trunc(args: &[ColumnarValue]) -> Result<ColumnarValue> {
     })
 }
 
+macro_rules! extract_date_part {
+    ($ARRAY: expr, $FN:expr) => {
+        match $ARRAY.data_type() {
+            DataType::Date32 => {
+                let array = $ARRAY.as_any().downcast_ref::<Date32Array>().unwrap();
+                Ok($FN(array)?)
+            }
+            DataType::Date64 => {
+                let array = $ARRAY.as_any().downcast_ref::<Date64Array>().unwrap();
+                Ok($FN(array)?)
+            }
+            DataType::Timestamp(time_unit, None) => match time_unit {
+                TimeUnit::Second => {
+                    let array = $ARRAY
+                        .as_any()
+                        .downcast_ref::<TimestampSecondArray>()
+                        .unwrap();
+                    Ok($FN(array)?)
+                }
+                TimeUnit::Millisecond => {
+                    let array = $ARRAY
+                        .as_any()
+                        .downcast_ref::<TimestampMillisecondArray>()
+                        .unwrap();
+                    Ok($FN(array)?)
+                }
+                TimeUnit::Microsecond => {
+                    let array = $ARRAY
+                        .as_any()
+                        .downcast_ref::<TimestampMicrosecondArray>()
+                        .unwrap();
+                    Ok($FN(array)?)
+                }
+                TimeUnit::Nanosecond => {
+                    let array = $ARRAY
+                        .as_any()
+                        .downcast_ref::<TimestampNanosecondArray>()
+                        .unwrap();
+                    Ok($FN(array)?)
+                }
+            },
+            datatype => Err(DataFusionError::Internal(format!(
+                "Extract does not support datatype {:?}",
+                datatype
+            ))),
+        }
+    };
+}
+
+/// DATE_PART SQL function
+pub fn date_part(args: &[ColumnarValue]) -> Result<ColumnarValue> {
+    if args.len() != 2 {
+        return Err(DataFusionError::Execution(
+            "Expected two arguments in DATE_PART".to_string(),
+        ));
+    }
+    let (date_part, array) = (&args[0], &args[1]);
+
+    let date_part = if let ColumnarValue::Scalar(ScalarValue::Utf8(Some(v))) = date_part {
+        v
+    } else {
+        return Err(DataFusionError::Execution(
+            "First argument of `DATE_PART` must be non-null scalar Utf8".to_string(),
+        ));
+    };
+
+    let is_scalar = matches!(array, ColumnarValue::Scalar(_));
+
+    let array = match array {

Review comment:
       Yes, indeed. For now we can use this approach to avoid reimplementing hours/years etc, with a bit of overhead.
   Maybe longer term would be nice to have something like `Datum` in Arrow in order to both gain some performance and avoid reimplementing things for the scalar case.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#issuecomment-769847991


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=h1) Report
   > Merging [#9359](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=desc) (40e184b) into [master](https://codecov.io/gh/apache/arrow/commit/aebabca047a8adeb9f6d4fc81e29d12cc42e629b?el=desc) (aebabca) will **increase** coverage by `0.01%`.
   > The diff coverage is `87.14%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9359/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9359      +/-   ##
   ==========================================
   + Coverage   82.27%   82.29%   +0.01%     
   ==========================================
     Files         244      244              
     Lines       55555    55616      +61     
   ==========================================
   + Hits        45708    45767      +59     
   - Misses       9847     9849       +2     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=tree) | Coverage Ξ” | |
   |---|---|---|
   | [...st/datafusion/src/physical\_plan/expressions/mod.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2V4cHJlc3Npb25zL21vZC5ycw==) | `71.42% <ΓΈ> (ΓΈ)` | |
   | [rust/datafusion/src/logical\_plan/expr.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9sb2dpY2FsX3BsYW4vZXhwci5ycw==) | `81.13% <50.00%> (ΓΈ)` | |
   | [...tafusion/src/physical\_plan/datetime\_expressions.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2RhdGV0aW1lX2V4cHJlc3Npb25zLnJz) | `68.83% <66.66%> (-0.35%)` | :arrow_down: |
   | [rust/datafusion/src/sql/planner.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zcWwvcGxhbm5lci5ycw==) | `83.22% <80.00%> (-0.02%)` | :arrow_down: |
   | [rust/datafusion/src/physical\_plan/functions.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2Z1bmN0aW9ucy5ycw==) | `73.82% <100.00%> (+1.46%)` | :arrow_up: |
   | [rust/datafusion/src/physical\_plan/type\_coercion.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL3R5cGVfY29lcmNpb24ucnM=) | `98.62% <100.00%> (+0.09%)` | :arrow_up: |
   | [rust/datafusion/tests/sql.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3Rlc3RzL3NxbC5ycw==) | `99.92% <100.00%> (+<0.01%)` | :arrow_up: |
   | [rust/datafusion/src/scalar.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zY2FsYXIucnM=) | `53.27% <0.00%> (+1.63%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Ξ” = absolute <relative> (impact)`, `ΓΈ = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=footer). Last update [aebabca...40e184b](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support

Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#issuecomment-780474154


   Thanks @jorgecarleitao . Not blocked ATM, this was waiting before on the scalar changes. Now that's done I hope to continue with this soon!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb edited a comment on pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support

Posted by GitBox <gi...@apache.org>.
alamb edited a comment on pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#issuecomment-782847077


   Integration failure looks like https://issues.apache.org/jira/browse/ARROW-11717 and is related to this PR


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on a change in pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#discussion_r567253020



##########
File path: rust/datafusion/src/logical_plan/expr.rs
##########
@@ -169,6 +170,13 @@ pub enum Expr {
     },
     /// Represents a reference to all fields in a schema.
     Wildcard,
+    /// Extract date parts (day, hour, minute) from a date / time expression
+    Extract {

Review comment:
       I just looked into this. The downside currently is w.r.t. performance and being able to utilize Arrow kernels. The `ScalarFunction` repeats scalar values, the date part, e.g. 'HOUR' for `date_part('HOUR', dt)` will be for repeated for each row.
   In PostgreSQL, expressions are not allowed for `date_part` / `extract`, `date_trunc` etc. :
   
   https://www.postgresql.org/docs/13/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT
   
   > Note that here the field parameter needs to be a string value, not a name. The valid field names for date_part are the same as for extract. 
   
   This is also happening with the existing `date_trunc` function where currently the date part (as string) is repeated / matched against for each row and also evaluated per row (see below). That won't work with the `hour` kernel for obvious reasons.
   
   ```rust
       let result = range
           .map(|i| {
               if array.is_null(i) {
                   Ok(0_i64)
               } else {
                   let date_time = match granularity_array.value(i) {
                       "second" => array
                           .value_as_datetime(i)
                           .and_then(|d| d.with_nanosecond(0)),
                       "minute" => array
                           .value_as_datetime(i)
                           .and_then(|d| d.with_nanosecond(0))
                           .and_then(|d| d.with_second(0)),
   [...]
   ```
   So I think here we have a few options:
   
   * Refactor/Optimize `ScalarFunction` to also allow for scalar values, be able to check on them + support literals (I guess it should use `ColumnarValue` instead of just `Array`s).
   * Have a similar (inefficient) implementation for extract from / date_part to compute as currently `date_trunc`. I think that 
   * Refactor/Optimize `ScalarFunction` later and keep the Extract as is for now.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#discussion_r567202684



##########
File path: rust/datafusion/src/logical_plan/expr.rs
##########
@@ -237,6 +245,25 @@ impl Expr {
             Expr::Wildcard => Err(DataFusionError::Internal(
                 "Wildcard expressions are not valid in a logical query plan".to_owned(),
             )),
+            Expr::Extract {
+                date_part: DatePart::Hour,
+                expr,
+            } => {
+                let inner = expr.get_type(schema)?;
+                match inner {
+                    DataType::Date32(_)
+                    | DataType::Date64(_)
+                    | DataType::Time32(_)
+                    | DataType::Time64(_)
+                    | DataType::Timestamp(arrow::datatypes::TimeUnit::Nanosecond, None) => {

Review comment:
       Is there a particular reason to only support `Timestamp(Nanosecond)`? `Millisecond,Second,Microsecond` should also work, no?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#issuecomment-780266711


   Are you blocked here, @Dandandan ?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on a change in pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#discussion_r567253020



##########
File path: rust/datafusion/src/logical_plan/expr.rs
##########
@@ -169,6 +170,13 @@ pub enum Expr {
     },
     /// Represents a reference to all fields in a schema.
     Wildcard,
+    /// Extract date parts (day, hour, minute) from a date / time expression
+    Extract {

Review comment:
       I just looked into this. The downside currently is w.r.t. performance and being able to utilize Arrow kernels. The `ScalarFunction` implementation repeats scalar values, the date part, e.g. 'HOUR' for `date_part('HOUR', dt)` will be for repeated for each row.
   In PostgreSQL, expressions are not allowed for `date_part` / `extract`, `date_trunc` etc. :
   
   https://www.postgresql.org/docs/13/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT
   
   > Note that here the field parameter needs to be a string value, not a name. The valid field names for date_part are the same as for extract. 
   
   This is also happening with the existing `date_trunc` function where currently the date part (as string) is repeated / matched against for each row and also evaluated per row (see below). That won't work with the `hour` kernel for obvious reasons.
   
   ```rust
       let result = range
           .map(|i| {
               if array.is_null(i) {
                   Ok(0_i64)
               } else {
                   let date_time = match granularity_array.value(i) {
                       "second" => array
                           .value_as_datetime(i)
                           .and_then(|d| d.with_nanosecond(0)),
                       "minute" => array
                           .value_as_datetime(i)
                           .and_then(|d| d.with_nanosecond(0))
                           .and_then(|d| d.with_second(0)),
   [...]
   ```
   So I think here we have a few options:
   
   * Refactor/Optimize `ScalarFunction` to also allow for scalar values, be able to check on them + support literals (I guess it should use `ColumnarValue` instead of just `Array`s).
   * Have a similar (inefficient) implementation for extract from / date_part to compute as currently `date_trunc`. I think that 
   * Refactor/Optimize `ScalarFunction` later and keep the Extract as is for now.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#discussion_r567202041



##########
File path: rust/datafusion/src/logical_plan/expr.rs
##########
@@ -169,6 +170,13 @@ pub enum Expr {
     },
     /// Represents a reference to all fields in a schema.
     Wildcard,
+    /// Extract date parts (day, hour, minute) from a date / time expression
+    Extract {

Review comment:
       I have been trying to move them to the `ScalarFunction` to avoid having many variants around, mostly because adding a new item to an enum is backward incompatible and thus it may be beneficial to reserve that for operations that cannot be described by `ScalarFunction`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#issuecomment-782672779


   Thanks @Dandandan  -- I am about to run out of time for today but I will plan to merge this in tomorrow if someone doesn't beat me to it


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on a change in pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#discussion_r579656523



##########
File path: rust/datafusion/src/physical_plan/type_coercion.rs
##########
@@ -68,6 +68,29 @@ pub fn data_types(
     current_types: &[DataType],
     signature: &Signature,
 ) -> Result<Vec<DataType>> {
+    let valid_types = get_valid_types(signature, current_types)?;
+
+    if valid_types.contains(&current_types.to_owned()) {

Review comment:
       seems `&current_types` isn't possible with `contains`, made it use `any`  instead.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb closed pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support

Posted by GitBox <gi...@apache.org>.
alamb closed pull request #9359:
URL: https://github.com/apache/arrow/pull/9359


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] seddonm1 commented on a change in pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support

Posted by GitBox <gi...@apache.org>.
seddonm1 commented on a change in pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#discussion_r579863202



##########
File path: rust/datafusion/src/physical_plan/functions.rs
##########
@@ -71,6 +71,8 @@ pub enum Signature {
     Exact(Vec<DataType>),
     /// fixed number of arguments of arbitrary types
     Any(usize),
+    /// One of a list of signatures
+    OneOf(Vec<Signature>),

Review comment:
       Yes, I missed this but all good. This is actually better :D




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on a change in pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#discussion_r567415842



##########
File path: rust/datafusion/src/logical_plan/expr.rs
##########
@@ -169,6 +170,13 @@ pub enum Expr {
     },
     /// Represents a reference to all fields in a schema.
     Wildcard,
+    /// Extract date parts (day, hour, minute) from a date / time expression
+    Extract {

Review comment:
       Yes, I think it is worth it to wait for the other PR to land and convert it to use the scalar function. Then we could also add the function alias `date_part` easily next to the `extract` syntax, which are both supported by PostgreSQL.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#issuecomment-769847991


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=h1) Report
   > Merging [#9359](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=desc) (a23e4c9) into [master](https://codecov.io/gh/apache/arrow/commit/c100a5cca357f83d5bad21e9ba2d7fb9eb15a0d1?el=desc) (c100a5c) will **increase** coverage by `0.01%`.
   > The diff coverage is `86.95%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9359/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9359      +/-   ##
   ==========================================
   + Coverage   82.27%   82.28%   +0.01%     
   ==========================================
     Files         244      244              
     Lines       55371    55431      +60     
   ==========================================
   + Hits        45554    45612      +58     
   - Misses       9817     9819       +2     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=tree) | Coverage Ξ” | |
   |---|---|---|
   | [...st/datafusion/src/physical\_plan/expressions/mod.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2V4cHJlc3Npb25zL21vZC5ycw==) | `71.42% <ΓΈ> (ΓΈ)` | |
   | [rust/datafusion/src/logical\_plan/expr.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9sb2dpY2FsX3BsYW4vZXhwci5ycw==) | `80.75% <50.00%> (ΓΈ)` | |
   | [...tafusion/src/physical\_plan/datetime\_expressions.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2RhdGV0aW1lX2V4cHJlc3Npb25zLnJz) | `68.83% <66.66%> (-0.35%)` | :arrow_down: |
   | [rust/datafusion/src/sql/planner.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zcWwvcGxhbm5lci5ycw==) | `83.22% <80.00%> (-0.02%)` | :arrow_down: |
   | [rust/datafusion/src/physical\_plan/functions.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2Z1bmN0aW9ucy5ycw==) | `77.70% <100.00%> (+1.39%)` | :arrow_up: |
   | [rust/datafusion/src/physical\_plan/type\_coercion.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL3R5cGVfY29lcmNpb24ucnM=) | `98.61% <100.00%> (+0.08%)` | :arrow_up: |
   | [rust/datafusion/tests/sql.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3Rlc3RzL3NxbC5ycw==) | `99.92% <100.00%> (+<0.01%)` | :arrow_up: |
   | [rust/datafusion/src/scalar.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zY2FsYXIucnM=) | `53.27% <0.00%> (+1.63%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Ξ” = absolute <relative> (impact)`, `ΓΈ = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=footer). Last update [5647e90...a23e4c9](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#issuecomment-782668510


   Integration test failure looks like https://issues.apache.org/jira/browse/ARROW-11717. I am retriggering it on this PR


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#issuecomment-769847991


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=h1) Report
   > Merging [#9359](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=desc) (31f8d28) into [master](https://codecov.io/gh/apache/arrow/commit/c100a5cca357f83d5bad21e9ba2d7fb9eb15a0d1?el=desc) (c100a5c) will **decrease** coverage by `0.01%`.
   > The diff coverage is `59.61%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9359/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9359      +/-   ##
   ==========================================
   - Coverage   82.27%   82.25%   -0.02%     
   ==========================================
     Files         244      244              
     Lines       55371    55420      +49     
   ==========================================
   + Hits        45554    45586      +32     
   - Misses       9817     9834      +17     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=tree) | Coverage Ξ” | |
   |---|---|---|
   | [...st/datafusion/src/physical\_plan/expressions/mod.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2V4cHJlc3Npb25zL21vZC5ycw==) | `71.42% <ΓΈ> (ΓΈ)` | |
   | [...tafusion/src/physical\_plan/datetime\_expressions.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2RhdGV0aW1lX2V4cHJlc3Npb25zLnJz) | `63.58% <37.93%> (-5.60%)` | :arrow_down: |
   | [rust/datafusion/src/logical\_plan/expr.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9sb2dpY2FsX3BsYW4vZXhwci5ycw==) | `80.75% <50.00%> (ΓΈ)` | |
   | [rust/datafusion/src/sql/planner.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zcWwvcGxhbm5lci5ycw==) | `83.17% <75.00%> (-0.07%)` | :arrow_down: |
   | [rust/datafusion/src/physical\_plan/functions.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2Z1bmN0aW9ucy5ycw==) | `76.63% <100.00%> (+0.32%)` | :arrow_up: |
   | [rust/datafusion/tests/sql.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3Rlc3RzL3NxbC5ycw==) | `99.92% <100.00%> (+<0.01%)` | :arrow_up: |
   | [rust/datafusion/src/scalar.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zY2FsYXIucnM=) | `52.45% <0.00%> (+0.81%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Ξ” = absolute <relative> (impact)`, `ΓΈ = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=footer). Last update [5647e90...af8792f](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#discussion_r567302576



##########
File path: rust/datafusion/src/logical_plan/expr.rs
##########
@@ -169,6 +170,13 @@ pub enum Expr {
     },
     /// Represents a reference to all fields in a schema.
     Wildcard,
+    /// Extract date parts (day, hour, minute) from a date / time expression
+    Extract {

Review comment:
       hold my beer: #9376 :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support

Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#issuecomment-782604533


   @jorgecarleitao @alamb 
   
   This is now ready for review


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on a change in pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#discussion_r567411585



##########
File path: rust/datafusion/src/physical_plan/expressions/extract.rs
##########
@@ -0,0 +1,115 @@
+use crate::error::Result;
+use core::fmt;
+use std::{any::Any, sync::Arc};
+
+use arrow::{
+    array::{
+        Date32Array, Date64Array, TimestampMicrosecondArray, TimestampMillisecondArray,
+        TimestampNanosecondArray, TimestampSecondArray,
+    },
+    compute::hour,
+    datatypes::{DataType, Schema, TimeUnit},
+    record_batch::RecordBatch,
+};
+
+use crate::{
+    error::DataFusionError,
+    logical_plan::DatePart,
+    physical_plan::{ColumnarValue, PhysicalExpr},
+};
+
+impl fmt::Display for Extract {
+    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
+        write!(f, "EXTRACT({} AS {:?})", self.date_part, self.expr)
+    }
+}
+
+/// Extract
+#[derive(Debug)]
+pub struct Extract {
+    date_part: DatePart,
+    expr: Arc<dyn PhysicalExpr>,
+}
+
+impl Extract {
+    /// Create new Extract expression
+    pub fn new(date_part: DatePart, expr: Arc<dyn PhysicalExpr>) -> Self {
+        Self { date_part, expr }
+    }
+}
+
+impl PhysicalExpr for Extract {
+    fn as_any(&self) -> &dyn Any {
+        self
+    }
+
+    fn data_type(&self, _input_schema: &Schema) -> Result<DataType> {
+        Ok(DataType::Int32)
+    }
+
+    fn nullable(&self, input_schema: &Schema) -> Result<bool> {
+        self.expr.nullable(input_schema)
+    }
+
+    fn evaluate(&self, batch: &RecordBatch) -> Result<ColumnarValue> {
+        let value = self.expr.evaluate(batch)?;
+        let data_type = value.data_type();
+        let array = match value {
+            ColumnarValue::Array(array) => array,
+            ColumnarValue::Scalar(scalar) => scalar.to_array(),
+        };
+
+        match data_type {
+            DataType::Date32 => {
+                let array = array.as_any().downcast_ref::<Date32Array>().unwrap();
+                Ok(ColumnarValue::Array(Arc::new(hour(array)?)))

Review comment:
       I find it confusing that `date_part` is passed all the way down in the Exprs / trees only to be ignored in the actual implementation which directly calls `hour`. I can see that the expr seems to always be made with `DatePart::Hour` but I am not 100% sure. 
   
   I am fine with not supporting all the various date parts initially, but I would recommend the following as a way of documenting through code / safe guard against future bugs:
   
   1. Add a test for `EXTRACT DAY from timestamp` and show that it generates a useful error
   2. Add a check in this function for `date_part != DataPart::Hour` and throw an error




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support

Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#issuecomment-780820072


   Most of it converted to use scalar functions now. There is some missing support to support multiple types for one argument but not for the other in scalar functions, will have a look at that later.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on a change in pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#discussion_r567414429



##########
File path: rust/datafusion/src/physical_plan/expressions/extract.rs
##########
@@ -0,0 +1,115 @@
+use crate::error::Result;
+use core::fmt;
+use std::{any::Any, sync::Arc};
+
+use arrow::{
+    array::{
+        Date32Array, Date64Array, TimestampMicrosecondArray, TimestampMillisecondArray,
+        TimestampNanosecondArray, TimestampSecondArray,
+    },
+    compute::hour,
+    datatypes::{DataType, Schema, TimeUnit},
+    record_batch::RecordBatch,
+};
+
+use crate::{
+    error::DataFusionError,
+    logical_plan::DatePart,
+    physical_plan::{ColumnarValue, PhysicalExpr},
+};
+
+impl fmt::Display for Extract {
+    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
+        write!(f, "EXTRACT({} AS {:?})", self.date_part, self.expr)
+    }
+}
+
+/// Extract
+#[derive(Debug)]
+pub struct Extract {
+    date_part: DatePart,
+    expr: Arc<dyn PhysicalExpr>,
+}
+
+impl Extract {
+    /// Create new Extract expression
+    pub fn new(date_part: DatePart, expr: Arc<dyn PhysicalExpr>) -> Self {
+        Self { date_part, expr }
+    }
+}
+
+impl PhysicalExpr for Extract {
+    fn as_any(&self) -> &dyn Any {
+        self
+    }
+
+    fn data_type(&self, _input_schema: &Schema) -> Result<DataType> {
+        Ok(DataType::Int32)
+    }
+
+    fn nullable(&self, input_schema: &Schema) -> Result<bool> {
+        self.expr.nullable(input_schema)
+    }
+
+    fn evaluate(&self, batch: &RecordBatch) -> Result<ColumnarValue> {
+        let value = self.expr.evaluate(batch)?;
+        let data_type = value.data_type();
+        let array = match value {
+            ColumnarValue::Array(array) => array,
+            ColumnarValue::Scalar(scalar) => scalar.to_array(),
+        };
+
+        match data_type {
+            DataType::Date32 => {
+                let array = array.as_any().downcast_ref::<Date32Array>().unwrap();
+                Ok(ColumnarValue::Array(Arc::new(hour(array)?)))

Review comment:
       There is no other datepart for now in the enum, so I guess that might generate some clippy warnings. But we could match directly on the datepart. The warning should be generated elsewhere already (when building the logical plan), agree makes sense to add a test for that πŸ‘.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on a change in pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#discussion_r579655411



##########
File path: rust/datafusion/src/physical_plan/type_coercion.rs
##########
@@ -68,6 +68,29 @@ pub fn data_types(
     current_types: &[DataType],
     signature: &Signature,
 ) -> Result<Vec<DataType>> {
+    let valid_types = get_valid_types(signature, current_types)?;
+
+    if valid_types.contains(&current_types.to_owned()) {

Review comment:
       Will have a look... I think it was auto generated by the new "extract function" functionality in rust-analyzer (which doesn't work 100% reliably, but still is very useful).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#issuecomment-769847991


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=h1) Report
   > Merging [#9359](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=desc) (dc67999) into [master](https://codecov.io/gh/apache/arrow/commit/f05b49bb08c0a4cc0cbfcfb07103dcf374c7fd38?el=desc) (f05b49b) will **decrease** coverage by `0.03%`.
   > The diff coverage is `57.89%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9359/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9359      +/-   ##
   ==========================================
   - Coverage   81.94%   81.91%   -0.04%     
   ==========================================
     Files         231      233       +2     
     Lines       53374    53464      +90     
   ==========================================
   + Hits        43739    43794      +55     
   - Misses       9635     9670      +35     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=tree) | Coverage Ξ” | |
   |---|---|---|
   | [rust/datafusion/src/optimizer/utils.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9vcHRpbWl6ZXIvdXRpbHMucnM=) | `48.26% <0.00%> (-0.76%)` | :arrow_down: |
   | [...st/datafusion/src/physical\_plan/expressions/mod.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2V4cHJlc3Npb25zL21vZC5ycw==) | `90.00% <ΓΈ> (ΓΈ)` | |
   | [rust/datafusion/src/sql/utils.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zcWwvdXRpbHMucnM=) | `38.88% <0.00%> (-0.55%)` | :arrow_down: |
   | [rust/datafusion/src/logical\_plan/temporal.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9sb2dpY2FsX3BsYW4vdGVtcG9yYWwucnM=) | `50.00% <50.00%> (ΓΈ)` | |
   | [...atafusion/src/physical\_plan/expressions/extract.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2V4cHJlc3Npb25zL2V4dHJhY3QucnM=) | `52.50% <52.50%> (ΓΈ)` | |
   | [rust/datafusion/src/logical\_plan/expr.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9sb2dpY2FsX3BsYW4vZXhwci5ycw==) | `78.91% <58.33%> (-1.05%)` | :arrow_down: |
   | [rust/datafusion/src/physical\_plan/planner.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL3BsYW5uZXIucnM=) | `78.61% <75.00%> (-0.03%)` | :arrow_down: |
   | [rust/datafusion/src/sql/planner.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zcWwvcGxhbm5lci5ycw==) | `84.02% <75.00%> (-0.13%)` | :arrow_down: |
   | [rust/datafusion/tests/sql.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3Rlc3RzL3NxbC5ycw==) | `99.84% <100.00%> (+<0.01%)` | :arrow_up: |
   | [rust/parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9lbmNvZGluZ3MvZW5jb2RpbmcucnM=) | `94.86% <0.00%> (-0.20%)` | :arrow_down: |
   | ... and [3 more](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Ξ” = absolute <relative> (impact)`, `ΓΈ = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=footer). Last update [f05b49b...dc67999](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#issuecomment-769847991


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=h1) Report
   > Merging [#9359](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=desc) (7ce35ab) into [master](https://codecov.io/gh/apache/arrow/commit/71dab27098775228340ecb4e8a15870967d7cf91?el=desc) (71dab27) will **decrease** coverage by `0.05%`.
   > The diff coverage is `65.11%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9359/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9359      +/-   ##
   ==========================================
   - Coverage   81.98%   81.93%   -0.06%     
   ==========================================
     Files         216      217       +1     
     Lines       53299    53426     +127     
   ==========================================
   + Hits        43699    43773      +74     
   - Misses       9600     9653      +53     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=tree) | Coverage Ξ” | |
   |---|---|---|
   | [rust/datafusion/src/optimizer/utils.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9vcHRpbWl6ZXIvdXRpbHMucnM=) | `48.26% <0.00%> (-0.76%)` | :arrow_down: |
   | [rust/datafusion/src/sql/utils.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zcWwvdXRpbHMucnM=) | `38.88% <0.00%> (-0.55%)` | :arrow_down: |
   | [rust/datafusion/src/logical\_plan/temporal.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9sb2dpY2FsX3BsYW4vdGVtcG9yYWwucnM=) | `50.00% <50.00%> (ΓΈ)` | |
   | [rust/datafusion/src/logical\_plan/expr.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9sb2dpY2FsX3BsYW4vZXhwci5ycw==) | `78.91% <58.33%> (-1.05%)` | :arrow_down: |
   | [rust/datafusion/src/physical\_plan/expressions.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2V4cHJlc3Npb25zLnJz) | `80.81% <68.96%> (-0.21%)` | :arrow_down: |
   | [rust/datafusion/src/physical\_plan/planner.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL3BsYW5uZXIucnM=) | `78.61% <75.00%> (-0.03%)` | :arrow_down: |
   | [rust/datafusion/src/sql/planner.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zcWwvcGxhbm5lci5ycw==) | `84.02% <75.00%> (-0.13%)` | :arrow_down: |
   | [rust/datafusion/tests/sql.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3Rlc3RzL3NxbC5ycw==) | `99.84% <100.00%> (+<0.01%)` | :arrow_up: |
   | [rust/datafusion/src/scalar.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zY2FsYXIucnM=) | `57.85% <0.00%> (-1.11%)` | :arrow_down: |
   | [rust/parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9lbmNvZGluZ3MvZW5jb2RpbmcucnM=) | `94.86% <0.00%> (-0.20%)` | :arrow_down: |
   | ... and [1 more](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Ξ” = absolute <relative> (impact)`, `ΓΈ = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=footer). Last update [71dab27...0cad90f](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #9359: ARROW-11426: [Rust][DataFusion] Start of EXTRACT support for DataFusion [WIP]

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#issuecomment-769729658


   https://issues.apache.org/jira/browse/ARROW-11426


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#issuecomment-769847991


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=h1) Report
   > Merging [#9359](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=desc) (a471f57) into [master](https://codecov.io/gh/apache/arrow/commit/71dab27098775228340ecb4e8a15870967d7cf91?el=desc) (71dab27) will **decrease** coverage by `0.02%`.
   > The diff coverage is `63.21%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9359/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9359      +/-   ##
   ==========================================
   - Coverage   81.98%   81.96%   -0.03%     
   ==========================================
     Files         216      217       +1     
     Lines       53299    53380      +81     
   ==========================================
   + Hits        43699    43753      +54     
   - Misses       9600     9627      +27     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=tree) | Coverage Ξ” | |
   |---|---|---|
   | [rust/datafusion/src/optimizer/utils.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9vcHRpbWl6ZXIvdXRpbHMucnM=) | `48.26% <0.00%> (-0.76%)` | :arrow_down: |
   | [rust/datafusion/src/sql/utils.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zcWwvdXRpbHMucnM=) | `38.88% <0.00%> (-0.55%)` | :arrow_down: |
   | [rust/datafusion/src/logical\_plan/temporal.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9sb2dpY2FsX3BsYW4vdGVtcG9yYWwucnM=) | `50.00% <50.00%> (ΓΈ)` | |
   | [rust/datafusion/src/logical\_plan/expr.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9sb2dpY2FsX3BsYW4vZXhwci5ycw==) | `78.91% <58.33%> (-1.05%)` | :arrow_down: |
   | [rust/datafusion/src/physical\_plan/expressions.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2V4cHJlc3Npb25zLnJz) | `80.71% <64.51%> (-0.31%)` | :arrow_down: |
   | [rust/datafusion/src/physical\_plan/planner.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL3BsYW5uZXIucnM=) | `78.61% <75.00%> (-0.03%)` | :arrow_down: |
   | [rust/datafusion/src/sql/planner.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zcWwvcGxhbm5lci5ycw==) | `84.02% <75.00%> (-0.13%)` | :arrow_down: |
   | [rust/datafusion/tests/sql.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3Rlc3RzL3NxbC5ycw==) | `99.84% <100.00%> (+<0.01%)` | :arrow_up: |
   | [rust/datafusion/src/scalar.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zY2FsYXIucnM=) | `60.55% <0.00%> (+1.59%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Ξ” = absolute <relative> (impact)`, `ΓΈ = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=footer). Last update [71dab27...5de73f6](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#issuecomment-782835501


   @Dandandan  sadly this PR has merge conflicts (probably from https://github.com/apache/arrow/pull/9509) -- can you possibly rebase it?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#issuecomment-769847991


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=h1) Report
   > Merging [#9359](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=desc) (ca771de) into [master](https://codecov.io/gh/apache/arrow/commit/c100a5cca357f83d5bad21e9ba2d7fb9eb15a0d1?el=desc) (c100a5c) will **increase** coverage by `0.01%`.
   > The diff coverage is `87.50%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9359/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9359      +/-   ##
   ==========================================
   + Coverage   82.27%   82.28%   +0.01%     
   ==========================================
     Files         244      244              
     Lines       55371    55426      +55     
   ==========================================
   + Hits        45554    45609      +55     
     Misses       9817     9817              
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=tree) | Coverage Ξ” | |
   |---|---|---|
   | [...st/datafusion/src/physical\_plan/expressions/mod.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2V4cHJlc3Npb25zL21vZC5ycw==) | `71.42% <ΓΈ> (ΓΈ)` | |
   | [rust/datafusion/src/logical\_plan/expr.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9sb2dpY2FsX3BsYW4vZXhwci5ycw==) | `80.75% <50.00%> (ΓΈ)` | |
   | [...tafusion/src/physical\_plan/datetime\_expressions.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2RhdGV0aW1lX2V4cHJlc3Npb25zLnJz) | `68.45% <62.50%> (-0.72%)` | :arrow_down: |
   | [rust/datafusion/src/sql/planner.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zcWwvcGxhbm5lci5ycw==) | `83.22% <80.00%> (-0.02%)` | :arrow_down: |
   | [rust/datafusion/src/physical\_plan/functions.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2Z1bmN0aW9ucy5ycw==) | `77.70% <100.00%> (+1.39%)` | :arrow_up: |
   | [rust/datafusion/src/physical\_plan/type\_coercion.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL3R5cGVfY29lcmNpb24ucnM=) | `98.61% <100.00%> (+0.08%)` | :arrow_up: |
   | [rust/datafusion/tests/sql.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3Rlc3RzL3NxbC5ycw==) | `99.92% <100.00%> (+<0.01%)` | :arrow_up: |
   | [rust/parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9lbmNvZGluZ3MvZW5jb2RpbmcucnM=) | `95.05% <0.00%> (+0.19%)` | :arrow_up: |
   | [rust/datafusion/src/scalar.rs](https://codecov.io/gh/apache/arrow/pull/9359/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zY2FsYXIucnM=) | `53.27% <0.00%> (+1.63%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Ξ” = absolute <relative> (impact)`, `ΓΈ = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=footer). Last update [5647e90...ca771de](https://codecov.io/gh/apache/arrow/pull/9359?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on a change in pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#discussion_r579652526



##########
File path: rust/datafusion/src/physical_plan/type_coercion.rs
##########
@@ -68,6 +68,29 @@ pub fn data_types(
     current_types: &[DataType],
     signature: &Signature,
 ) -> Result<Vec<DataType>> {
+    let valid_types = get_valid_types(signature, current_types)?;
+
+    if valid_types.contains(&current_types.to_owned()) {

Review comment:
       Why can't this be `&current_types` (aka why does it need a call to `to_owned` just to immediately borrow from it?)

##########
File path: rust/datafusion/src/physical_plan/functions.rs
##########
@@ -71,6 +71,8 @@ pub enum Signature {
     Exact(Vec<DataType>),
     /// fixed number of arguments of arbitrary types
     Any(usize),
+    /// One of a list of signatures
+    OneOf(Vec<Signature>),

Review comment:
       FYI @seddonm1  I am not sure how this affects your string functions / other postgres function plans

##########
File path: rust/datafusion/tests/sql.rs
##########
@@ -2134,6 +2134,24 @@ async fn crypto_expressions() -> Result<()> {
     Ok(())
 }
 
+#[tokio::test]
+async fn extract_date_part() -> Result<()> {
+    let mut ctx = ExecutionContext::new();
+    let sql = "SELECT

Review comment:
       πŸ‘ 

##########
File path: rust/datafusion/src/physical_plan/datetime_expressions.rs
##########
@@ -344,6 +347,98 @@ pub fn date_trunc(args: &[ColumnarValue]) -> Result<ColumnarValue> {
     })
 }
 
+macro_rules! extract_date_part {
+    ($ARRAY: expr, $FN:expr) => {
+        match $ARRAY.data_type() {
+            DataType::Date32 => {
+                let array = $ARRAY.as_any().downcast_ref::<Date32Array>().unwrap();
+                Ok($FN(array)?)
+            }
+            DataType::Date64 => {
+                let array = $ARRAY.as_any().downcast_ref::<Date64Array>().unwrap();
+                Ok($FN(array)?)
+            }
+            DataType::Timestamp(time_unit, None) => match time_unit {
+                TimeUnit::Second => {
+                    let array = $ARRAY
+                        .as_any()
+                        .downcast_ref::<TimestampSecondArray>()
+                        .unwrap();
+                    Ok($FN(array)?)
+                }
+                TimeUnit::Millisecond => {
+                    let array = $ARRAY
+                        .as_any()
+                        .downcast_ref::<TimestampMillisecondArray>()
+                        .unwrap();
+                    Ok($FN(array)?)
+                }
+                TimeUnit::Microsecond => {
+                    let array = $ARRAY
+                        .as_any()
+                        .downcast_ref::<TimestampMicrosecondArray>()
+                        .unwrap();
+                    Ok($FN(array)?)
+                }
+                TimeUnit::Nanosecond => {
+                    let array = $ARRAY
+                        .as_any()
+                        .downcast_ref::<TimestampNanosecondArray>()
+                        .unwrap();
+                    Ok($FN(array)?)
+                }
+            },
+            datatype => Err(DataFusionError::Internal(format!(
+                "Extract does not support datatype {:?}",
+                datatype
+            ))),
+        }
+    };
+}
+
+/// DATE_PART SQL function
+pub fn date_part(args: &[ColumnarValue]) -> Result<ColumnarValue> {
+    if args.len() != 2 {
+        return Err(DataFusionError::Execution(
+            "Expected two arguments in DATE_PART".to_string(),
+        ));
+    }
+    let (date_part, array) = (&args[0], &args[1]);
+
+    let date_part = if let ColumnarValue::Scalar(ScalarValue::Utf8(Some(v))) = date_part {
+        v
+    } else {
+        return Err(DataFusionError::Execution(
+            "First argument of `DATE_PART` must be non-null scalar Utf8".to_string(),
+        ));
+    };
+
+    let is_scalar = matches!(array, ColumnarValue::Scalar(_));
+
+    let array = match array {

Review comment:
       I assume the longer term plan will be to handle the `Scalar` case more efficiently. This (converting to an array) is fine for now I think




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on a change in pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#discussion_r566951425



##########
File path: rust/datafusion/src/logical_plan/expr.rs
##########
@@ -169,6 +170,13 @@ pub enum Expr {
     },
     /// Represents a reference to all fields in a schema.
     Wildcard,
+    /// Extract date parts (day, hour, minute) from a date / time expression
+    Extract {

Review comment:
       Alternatively, this could use `ScalarFunction` e.g. using "date_part" (which is also supported by PostgreSQL) to avoid an extra enum option, and convert the extract sql syntax to this `ScalarFunction`. I am not sure which is better?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on a change in pull request #9359: ARROW-11426: [Rust][DataFusion] EXTRACT support

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9359:
URL: https://github.com/apache/arrow/pull/9359#discussion_r567235809



##########
File path: rust/datafusion/src/logical_plan/expr.rs
##########
@@ -237,6 +245,25 @@ impl Expr {
             Expr::Wildcard => Err(DataFusionError::Internal(
                 "Wildcard expressions are not valid in a logical query plan".to_owned(),
             )),
+            Expr::Extract {
+                date_part: DatePart::Hour,
+                expr,
+            } => {
+                let inner = expr.get_type(schema)?;
+                match inner {
+                    DataType::Date32(_)
+                    | DataType::Date64(_)
+                    | DataType::Time32(_)
+                    | DataType::Time64(_)
+                    | DataType::Timestamp(arrow::datatypes::TimeUnit::Nanosecond, None) => {

Review comment:
       I wasn't convinced yet the `hour` kernel was working correctly, and it's currently a bit hard to test from DataFusion. But I am convinced now that it should work as expected. I think it can use some more tests at the Arrow side as well.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org