Posted to github@arrow.apache.org by "BubbaJoe (via GitHub)" <gi...@apache.org> on 2023/03/18 02:05:43 UTC

[GitHub] [arrow-datafusion] BubbaJoe opened a new issue, #5635: Unable to "GROUP BY" udf result

BubbaJoe opened a new issue, #5635:
URL: https://github.com/apache/arrow-datafusion/issues/5635

   **Describe the bug**
   When I try to add a UDF (`to_date`), I am able to run the following:
   - `SELECT to_date(\"updatedAt\") as u from 'table';`
   - `SELECT DISTINCT to_date(\"updatedAt\") as u from 'table';`
   - `SELECT to_date(\"updatedAt\") as u from 'table' where to_date(\"updatedAt\") != '2023-03-04';`
   
   But I am _NOT_ able to run the following:
   - `SELECT to_date(\"updatedAt\") as u from 'table' group by u;`
   - `SELECT to_date(\"updatedAt\") as u from 'table' group by to_date(\"updatedAt\");`
     - Error log: `Join Error\ncaused by\nExternal error: task 554 panicked`: <https://pastebin.com/AcxWjh9Y>
   
   **To Reproduce**
   An example `updatedAt` value looks like: "2023-03-05T05:01:15.274000+00:00"
   - `SELECT to_date(\"updatedAt\") as u from 'table' group by u;`
   
   The `to_date` UDF uses the `chrono` crate:
   
   ```
   pub fn to_date(args: &[ArrayRef]) -> DFResult<ArrayRef> {
       if args.is_empty() || args.len() > 1 {
           return Err(DataFusionError::Internal(format!(
               "to_date was called with {} arguments. It requires only 1.",
               args.len()
           )));
       }
       let arg = &args[0].as_any().downcast_ref::<StringArray>().unwrap();
       let mut builder = Date32Builder::new();
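        // NOTE: only the first element (arg.value(0)) is converted, so the
        // returned array always has length 1, regardless of the input length.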
       let date_string: &str = arg.value(0);
       let date_time = match DateTime::parse_from_rfc3339(date_string) {
           Ok(dt) => dt,
           Err(e) => {
               return Result::Err(DataFusionError::Internal(e.to_string()));
           }
       };
       builder.append_value((date_time.timestamp() / 86400) as i32);
       Ok(Arc::new(builder.finish()))
   }
   ```
   
   **Expected behavior**
   Expecting it to group the dates, as this query does:
   - `SELECT u FROM (SELECT to_date(\"updatedAt\") as u from 'table') group by u;`
   
   **Additional context**
   Another issue I am having is:
   - `SELECT to_date(\"updatedAt\") as u from 'table' where u != '2023-03-04';`
     - `Schema error: No field named 'u'. Valid fields are 'table'.'field', ...`
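   
   A workaround sketch (hypothetical, not taken from this report): most SQL engines do not let the WHERE clause reference a SELECT alias like `u` from the same query level, so the filter can either repeat the `to_date(...)` expression (as in the third working query above) or reference the alias from a subquery:
   
    ```
    use datafusion::error::Result;
    use datafusion::prelude::SessionContext;
    
    // Hypothetical sketch: filter on the alias from a subquery instead of
    // referencing it directly in the outer WHERE clause.
    async fn filter_on_alias(ctx: &SessionContext) -> Result<()> {
        ctx.sql(
            "SELECT u FROM (SELECT to_date(\"updatedAt\") as u from 'table') \
             WHERE u != '2023-03-04'",
        )
        .await?
        .show()
        .await
    }
    ```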




[GitHub] [arrow-datafusion] jiangzhx commented on issue #5635: Unable to "GROUP BY" udf result

Posted by "jiangzhx (via GitHub)" <gi...@apache.org>.
jiangzhx commented on issue #5635:
URL: https://github.com/apache/arrow-datafusion/issues/5635#issuecomment-1477217345

   @BubbaJoe I am still working on your "getting an error when calling record_batches_to_json_rows on the queries" problem.
   
   Something very weird is going on.
   ```
   
       let sql = "SELECT to_timestamp(a) as u from t";            //fine
       // let sql = "SELECT to_date(a) as u from t";                   //fine
       // let sql = "SELECT to_date(a) as u from t group by u";        //got error
       // let sql = "SELECT to_timestamp(a) as u from t group by u";   //got error
   
       let batches = ctx.sql(sql).await?.collect().await.unwrap();
       let list: Vec<_> = record_batches_to_json_rows(&batches)?;    //got error here
   
   ```
   
   I also wrote a test case for record_batches_to_json_rows; it looks like everything works fine there.
   ```
   #[cfg(test)]
   mod tests {
       use arrow::array::Date32Builder;
       use arrow::json::writer::record_batches_to_json_rows;
       use arrow::json::LineDelimitedWriter;
       use arrow::record_batch::RecordBatch;
       use arrow_schema::{DataType, Field, Schema};
       use chrono::DateTime;
       use serde_json::Value;
       use std::sync::Arc;
   
       #[test]
       fn work_well() {
           let mut builder = Date32Builder::new();
   
           builder.append_value(
               i32::try_from(
                   DateTime::parse_from_rfc3339("2021-02-04T05:01:14.274000+00:00")
                       .unwrap()
                       .timestamp()
                       / (60 * 60 * 24),
               )
               .unwrap(),
           );
           builder.append_value(
               i32::try_from(
                   DateTime::parse_from_rfc3339("2023-03-05T05:01:14.274000+00:00")
                       .unwrap()
                       .timestamp()
                       / (60 * 60 * 24),
               )
               .unwrap(),
           );
   
           let arr_date32 = builder.finish();
   
           let schema = Schema::new(vec![Field::new("date32", DataType::Date32, true)]);
           let schema = Arc::new(schema);
   
           let batch = RecordBatch::try_new(schema, vec![Arc::new(arr_date32)]).unwrap();
   
           let mut buf = Vec::new();
           {
               let mut writer = LineDelimitedWriter::new(&mut buf);
               writer.write_batches(&[batch.clone()]).unwrap();
           }
   
           let actual: Vec<Option<Value>> = buf
               .split(|b| *b == b'\n')
               .map(|s| (!s.is_empty()).then(|| serde_json::from_slice(s).unwrap()))
               .collect();
   
           println!("{:?}", actual);
   
           //everything works fine.
            let _rows = record_batches_to_json_rows(&[batch]).unwrap();
       }
   }
   
   ```




[GitHub] [arrow-datafusion] alamb commented on issue #5635: Unable to "GROUP BY" udf result

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #5635:
URL: https://github.com/apache/arrow-datafusion/issues/5635#issuecomment-1476058729

   > Hmmm... or should we check the result size of the UDF? I'm not sure whether it's proper that the sizes of input and result could be different. cc @alamb @mingmwang @tustvold
   
   
   I think this is a great idea @jiangzhx. For a UDF (rather than a user-defined aggregate, for example), the number of output rows should be the same as the number of input rows; if that is not the case, producing a clearer error would be 👍
   
   




[GitHub] [arrow-datafusion] jiangzhx commented on issue #5635: Unable to "GROUP BY" udf result

Posted by "jiangzhx (via GitHub)" <gi...@apache.org>.
jiangzhx commented on issue #5635:
URL: https://github.com/apache/arrow-datafusion/issues/5635#issuecomment-1475087150

   @BubbaJoe, based on your code I made some modifications. Now everything looks normal:
   
   ```
       pub fn to_date(args: &[ArrayRef]) -> Result<ArrayRef> {
           if args.is_empty() || args.len() > 1 {
               return Err(DataFusionError::Internal(format!(
                   "to_date was called with {} arguments. It requires only 1.",
                   args.len()
               )));
           }
           let arg = &args[0].as_any().downcast_ref::<StringArray>().unwrap();
           let mut builder = Date32Builder::new();
           for date_string in arg.iter() {
               let date_time = match DateTime::parse_from_rfc3339(
                   date_string.expect("date_string is null"),
               ) {
                   Ok(dt) => dt,
                   Err(e) => {
                       return Result::Err(DataFusionError::Internal(e.to_string()));
                   }
               };
               builder.append_value((date_time.timestamp() / 86400) as i32);
           }
           Ok(Arc::new(builder.finish()))
       }
   
       let to_date = make_scalar_function(to_date);
       let to_date = create_udf(
           "to_date",
           vec![DataType::Utf8],
           Arc::new(DataType::Date32),
           Volatility::Immutable,
           to_date,
       );
   
       ctx.register_udf(to_date.clone());
   ```




[GitHub] [arrow-datafusion] BubbaJoe commented on issue #5635: Unable to "GROUP BY" udf result

Posted by "BubbaJoe (via GitHub)" <gi...@apache.org>.
BubbaJoe commented on issue #5635:
URL: https://github.com/apache/arrow-datafusion/issues/5635#issuecomment-1482199206

   Thanks @doki23 
   
   I am curious: why do the record batches look like this when I use `select distinct to_date(...) from t` or `select to_date(...) as x from t group by x`?
   
   ```
   batch 1: []
   batch 2: []
   batch 3: []
   batch 4: []
   batch 5: []
   batch 6: []
   batch 7: [2012-04-27]
   batch 8: [2012-04-23, 2012-04-24]
   batch 9: [2012-04-25]
   batch 10: [2012-04-26]
   ```
   
   You would expect them to look something like this, right?
   
   ```
   batch 1: [2012-04-27]
   batch 2: [2012-04-23]
   batch 3: [2012-04-24]
   batch 4: [2012-04-25]
   batch 5: [2012-04-26]
   ```
   




[GitHub] [arrow-datafusion] BubbaJoe commented on issue #5635: Unable to "GROUP BY" udf result

Posted by "BubbaJoe (via GitHub)" <gi...@apache.org>.
BubbaJoe commented on issue #5635:
URL: https://github.com/apache/arrow-datafusion/issues/5635#issuecomment-1483719196

   I would like to test this! When is the next release?




[GitHub] [arrow-datafusion] doki23 commented on issue #5635: Unable to "GROUP BY" udf result

Posted by "doki23 (via GitHub)" <gi...@apache.org>.
doki23 commented on issue #5635:
URL: https://github.com/apache/arrow-datafusion/issues/5635#issuecomment-1482042404

   Sorry for the horrible description. Actually, my suggestion is:
   ```sql
   select distinct to_date(...) from t 
   ```
   




[GitHub] [arrow-datafusion] BubbaJoe commented on issue #5635: Unable to "GROUP BY" udf result

Posted by "BubbaJoe (via GitHub)" <gi...@apache.org>.
BubbaJoe commented on issue #5635:
URL: https://github.com/apache/arrow-datafusion/issues/5635#issuecomment-1480427900

   @alamb @jiangzhx I am getting the same error here, please check!




[GitHub] [arrow-datafusion] doki23 commented on issue #5635: Unable to "GROUP BY" udf result

Posted by "doki23 (via GitHub)" <gi...@apache.org>.
doki23 commented on issue #5635:
URL: https://github.com/apache/arrow-datafusion/issues/5635#issuecomment-1475092781

   Hmmm... or should we check the result size of the UDF? I'm not sure whether it's proper that the sizes of input and result could be different. cc @alamb @mingmwang @tustvold




[GitHub] [arrow-datafusion] BubbaJoe commented on issue #5635: Unable to "GROUP BY" udf result

Posted by "BubbaJoe (via GitHub)" <gi...@apache.org>.
BubbaJoe commented on issue #5635:
URL: https://github.com/apache/arrow-datafusion/issues/5635#issuecomment-1480427571

   `main.rs`
   ```
   use datafusion::error::Result;
   use datafusion::{
       arrow::{
           array::{
               ArrayRef,
               StringArray,
               Date32Builder,
               Array,
           },
           datatypes::DataType,
       },
       physical_plan::functions::make_scalar_function,
       logical_expr::Volatility,
       prelude::{
           create_udf,
           SessionContext,
           NdJsonReadOptions,
       },
       error::DataFusionError,
   };
   use std::sync::Arc;
   use chrono::DateTime;
   
   
   #[tokio::main]
   async fn main() -> Result<()> {
       let ctx = SessionContext::new();
   
       // let testdata = datafusion::test_util::parquet_test_data();
   
    // register the newline-delimited JSON file with the execution context
       ctx.register_json(
           "users",
        "./data.jsonl",
           NdJsonReadOptions::default().
               file_extension(".jsonl"),
       ).await?;
   
       let to_date = make_scalar_function(to_date);
       let to_date = create_udf(
           "to_date",
           vec![DataType::Utf8],
           Arc::new(DataType::Date32),
           Volatility::Immutable,
           to_date,
       );
   
       ctx.register_udf(to_date.clone());
   
       let batches = {
           let df = ctx.sql(
               "SELECT to_date(\"date\") as a from \
               'users' group by a;").await?;
           df.collect().await?
       };
       let list: Vec<_> = datafusion::arrow::json::writer::
           record_batches_to_json_rows(&batches[..])?;
       
       println!("{:?}", list);
   
       Ok(())
   }
   
   fn to_date(args: &[ArrayRef]) -> Result<ArrayRef> {
       if args.is_empty() || args.len() > 1 {
           return Err(DataFusionError::Internal(format!(
               "to_date was called with {} arguments. It requires only 1.",
               args.len()
           )));
       }
       let arg = &args[0].as_any().downcast_ref::<StringArray>().unwrap();
       let mut builder = Date32Builder::new();
       for date_string in arg.iter() {
           let date_time = match DateTime::parse_from_rfc3339(
               date_string.expect("date_string is null"),
           ) {
               Ok(dt) => dt,
               Err(e) => {
                   // builder.append_value((date_time.timestamp() / 86400) as i32);
                   return Result::Err(DataFusionError::Internal(e.to_string()));
               }
           };
           builder.append_value((date_time.timestamp() / 86400) as i32);
       }
       Ok(Arc::new(builder.finish()))
   }
   ```
   `Cargo.toml`
   ```
   [package]
   name = "df-test"
   version = "0.1.0"
   edition = "2021"
   
   # See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
   
   [dependencies]
   tokio = { version = "1.0", features = ["full"] }
   datafusion = { version = "20.0"}
   chrono = "0.4.24"
   ```
   `data.jsonl`
   ```
   {"date":"2012-04-23T18:25:43.511Z","id":0}
   {"date":"2012-04-23T18:25:43.511Z","id":1}
   {"date":"2012-04-23T18:25:43.511Z","id":2}
   {"date":"2012-04-23T18:25:43.511Z","id":3}
   {"date":"2012-04-23T18:25:43.511Z","id":4}
   {"date":"2012-04-23T18:25:43.511Z","id":5}
   {"date":"2012-04-23T18:25:43.511Z","id":6}
   {"date":"2012-04-23T18:25:43.511Z","id":7}
   {"date":"2012-04-23T18:25:43.511Z","id":8}
   {"date":"2012-04-23T18:25:43.511Z","id":9}
   ```




[GitHub] [arrow-datafusion] BubbaJoe commented on issue #5635: Unable to "GROUP BY" udf result

Posted by "BubbaJoe (via GitHub)" <gi...@apache.org>.
BubbaJoe commented on issue #5635:
URL: https://github.com/apache/arrow-datafusion/issues/5635#issuecomment-1475294290

   Thanks @jiangzhx, the code worked for me, but I am still getting an error when calling `record_batches_to_json_rows` on the queries:
   - SELECT DISTINCT to_date(\"createdAt\") as u from 'table';
   - SELECT to_date(\"createdAt\") as u from 'table' group by u;
   
   Error is the same as the above error log:
   ```
   thread 'tokio-runtime-worker' panicked at 'Trying to access an element at index 0 from a PrimitiveArray of length 0', /Users/user/.cargo/registry/src/github.com-1ecc6299db9ec823/arrow-array-34.0.0/src/array/primitive_array.rs:337:9
   ...
   ```




[GitHub] [arrow-datafusion] BubbaJoe commented on issue #5635: Unable to "GROUP BY" udf result

Posted by "BubbaJoe (via GitHub)" <gi...@apache.org>.
BubbaJoe commented on issue #5635:
URL: https://github.com/apache/arrow-datafusion/issues/5635#issuecomment-1484416537

   @jiangzhx Can you check this example code here?
   
   I am still getting the error: https://github.com/BubbaJoe/datafusion-arrow-test




[GitHub] [arrow-datafusion] jiangzhx commented on issue #5635: Unable to "GROUP BY" udf result

Posted by "jiangzhx (via GitHub)" <gi...@apache.org>.
jiangzhx commented on issue #5635:
URL: https://github.com/apache/arrow-datafusion/issues/5635#issuecomment-1474855866

   Looks like Date32 can't be hashed:
   
   https://github.com/apache/arrow-datafusion/blob/d3280d72438556b596c5c7653056d7c9ee6848e1/datafusion/physical-expr/src/hash_utils.rs#L96-L106
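   
   (A standalone illustration, not DataFusion's hash_utils code: a Date32 value is just an i32 "days since the epoch" under the hood, so hashing the column amounts to hashing its i32 values.)
   
   ```
   use std::collections::hash_map::DefaultHasher;
   use std::hash::{Hash, Hasher};
   
   use datafusion::arrow::array::{Array, Date32Array};
   
   // Hash every row of a Date32 column by its underlying i32 representation;
   // null slots just keep the hasher's default state. Illustrative only.
   fn hash_date32(array: &Date32Array) -> Vec<u64> {
       (0..array.len())
           .map(|i| {
               let mut hasher = DefaultHasher::new();
               if array.is_valid(i) {
                   array.value(i).hash(&mut hasher); // i32 days since 1970-01-01
               }
               hasher.finish()
           })
           .collect()
   }
   ```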




[GitHub] [arrow-datafusion] jiangzhx commented on issue #5635: Unable to "GROUP BY" udf result

Posted by "jiangzhx (via GitHub)" <gi...@apache.org>.
jiangzhx commented on issue #5635:
URL: https://github.com/apache/arrow-datafusion/issues/5635#issuecomment-1484450061

   > @jiangzhx Can you check this example code here?
   > 
   > I am still getting the error: https://github.com/BubbaJoe/datafusion-arrow-test
   
   There are some steps that need to be done before you can test the master version of arrow-rs:
   1. Use the arrow-rs master version in arrow-datafusion.
   2. In your project, reference that arrow-datafusion.
   
   Or, you can try my arrow-datafusion repo:
   datafusion = { git="https://github.com/jiangzhx/arrow-datafusion/", branch="test_arrow_master", features = ["avro"]}
   
   **Do not use my arrow-datafusion branch in production.
   It is recommended that you wait for the latest version of arrow-datafusion to be released.**
   
   
   




[GitHub] [arrow-datafusion] BubbaJoe commented on issue #5635: Unable to "GROUP BY" udf result

Posted by "BubbaJoe (via GitHub)" <gi...@apache.org>.
BubbaJoe commented on issue #5635:
URL: https://github.com/apache/arrow-datafusion/issues/5635#issuecomment-1474866771

   Hey @jiangzhx, thanks for your response! How hard would it be to implement something like this?
   
   I am not too familiar with Rust, let alone the code base. If you could give some pointers, I wouldn't mind creating a PR.




[GitHub] [arrow-datafusion] doki23 commented on issue #5635: Unable to "GROUP BY" udf result

Posted by "doki23 (via GitHub)" <gi...@apache.org>.
doki23 commented on issue #5635:
URL: https://github.com/apache/arrow-datafusion/issues/5635#issuecomment-1482192500

   @BubbaJoe It's a bug in the JSON writer when batches have duration fields; I'm trying to fix it.




[GitHub] [arrow-datafusion] tustvold closed issue #5635: Unable to "GROUP BY" udf result

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold closed issue #5635: Unable to "GROUP BY" udf result
URL: https://github.com/apache/arrow-datafusion/issues/5635




[GitHub] [arrow-datafusion] doki23 commented on issue #5635: Unable to "GROUP BY" udf result

Posted by "doki23 (via GitHub)" <gi...@apache.org>.
doki23 commented on issue #5635:
URL: https://github.com/apache/arrow-datafusion/issues/5635#issuecomment-1482201429

   I have no idea, maybe it's the default partitions? I'll dive into this then.




[GitHub] [arrow-datafusion] jiangzhx commented on issue #5635: Unable to "GROUP BY" udf result

Posted by "jiangzhx (via GitHub)" <gi...@apache.org>.
jiangzhx commented on issue #5635:
URL: https://github.com/apache/arrow-datafusion/issues/5635#issuecomment-1484396525

   > I would like to test this! When is the next release?
   
   @BubbaJoe you can try the master branch with a dependency like this:
   arrow = { git="https://github.com/apache/arrow-rs.git", branch="master" }
   




[GitHub] [arrow-datafusion] jiangzhx commented on issue #5635: Unable to "GROUP BY" udf result

Posted by "jiangzhx (via GitHub)" <gi...@apache.org>.
jiangzhx commented on issue #5635:
URL: https://github.com/apache/arrow-datafusion/issues/5635#issuecomment-1475086294

   cc @BubbaJoe 
   I wrote a to_date UDF myself, which seems to be correct:
   
   ```
   
   use arrow::array::{Date32Array, Date32Builder, StringArray};
   use chrono::DateTime;
   use datafusion::from_slice::FromSlice;
   use datafusion::prelude::*;
   use datafusion::{
       arrow::{datatypes::DataType, record_batch::RecordBatch},
       logical_expr::Volatility,
   };
   use datafusion_common::{downcast_value, ScalarValue};
   use datafusion_common::{DataFusionError, Result};
   use datafusion_expr::ColumnarValue;
   use std::sync::Arc;
   
   // create local execution context with an in-memory table
   fn create_context() -> Result<SessionContext> {
       use datafusion::arrow::datatypes::{Field, Schema};
       // define a schema.
       let schema = Arc::new(Schema::new(vec![Field::new("a", DataType::Utf8, false)]));
   
       // define data.
       let batch = RecordBatch::try_new(
           schema,
           vec![Arc::new(StringArray::from_slice([
               "2023-03-04T05:01:14.274000+00:00",
               "2023-03-05T05:02:15.274000+00:00",
               "2023-03-06T05:03:16.274000+00:00",
           ]))],
       )?;
   
       // declare a new context. In spark API, this corresponds to a new spark SQLsession
       let ctx = SessionContext::new();
   
       // declare a table in memory. In spark API, this corresponds to createDataFrame(...).
       ctx.register_batch("t", batch)?;
       Ok(ctx)
   }
   
   #[tokio::main]
   async fn main() -> Result<()> {
       let ctx = create_context()?;
   
       pub fn to_date(args: &[ColumnarValue]) -> Result<ColumnarValue> {
           if args.is_empty() || args.len() > 1 {
               return Err(DataFusionError::Internal(format!(
                   "to_date was called with {} arguments. It requires only 1.",
                   args.len()
               )));
           }
   
           match &args[0] {
               ColumnarValue::Array(array) => {
                   let args = downcast_value!(array, StringArray);
                   let mut builder = Date32Builder::new();
   
                   for arg in args {
                       let date_time = match DateTime::parse_from_rfc3339(arg.unwrap()) {
                           Ok(dt) => dt,
                           Err(e) => {
                               return Result::Err(DataFusionError::Internal(e.to_string()));
                           }
                       };
                       builder.append_value((date_time.timestamp() / 86400) as i32);
                   }
                   let date32 = Arc::new(Date32Array::from(builder.finish()));
   
                   Ok(ColumnarValue::Array(date32))
               }
               ColumnarValue::Scalar(ScalarValue::Utf8(v)) => {
                   let date_time =
                       match DateTime::parse_from_rfc3339((v.clone().unwrap()).as_str()) {
                           Ok(dt) => dt,
                           Err(e) => {
                               return Result::Err(DataFusionError::Internal(e.to_string()));
                           }
                       };
                   let mut builder = Date32Builder::new();
   
                   builder.append_value((date_time.timestamp() / 86400) as i32);
                   let date32 = Arc::new(Date32Array::from(builder.finish()));
   
                   Ok(ColumnarValue::Array(date32))
               }
               _ => {
                   return Err(DataFusionError::Execution(
                       "array of `to_date` must be non-null scalar Utf8".to_string(),
                   ));
               }
           }
       }
   
       // let to_date = make_scalar_function(to_date);
       let to_date = create_udf(
           "to_date",
           vec![DataType::Utf8],
           Arc::new(DataType::Date32),
           Volatility::Immutable,
           Arc::new(to_date),
       );
   
       ctx.register_udf(to_date.clone()); 
   
       let expr = to_date.call(vec![col("a")]);
       let df = ctx.table("t").await?;
       let df = df.select(vec![expr])?;
       df.show().await?;
   
       ctx.sql("SELECT to_date(a) as u from t group by u")
           .await?
           .show()
           .await?;
   
       ctx.sql("SELECT COUNT(*), T, SUM(COUNT(*)) Over (Order By T) From (SELECT *, to_date(a) as T from t) Group BY T")
           .await?
           .show()
           .await?;
       Ok(())
   }
   
   ```




[GitHub] [arrow-datafusion] alamb commented on issue #5635: Unable to "GROUP BY" udf result

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #5635:
URL: https://github.com/apache/arrow-datafusion/issues/5635#issuecomment-1481551641

   > Hmm, this doesn't seem very intuitive. Why would a to_date() function need to be an aggregate function? Seems more like a bug tbh.
   
   I think @doki23 means that this query doesn't make sense because it needs to produce a single output per distinct value of `u` (not `a`), so to_date would need to be an aggregate function (one that takes in multiple rows and produces a single output row):
   
   ```sql
   SELECT to_date(a) as u from t group by u
   ```
   
   What I think is *super* confusing is that DataFusion does run this query (when there is an ORDER BY):
   
   ```sql
   SELECT to_date(a) as u from t group by u order by u
   ```
   
   IOx users also recently hit this confusing behavior: 
   
   https://github.com/influxdata/docs-v2/issues/4812#issuecomment-1478395592
   
   I will write a ticket to fix it (basically proposing to make `SELECT to_date(a) as u from t group by u order by u` an error).




[GitHub] [arrow-datafusion] jiangzhx commented on issue #5635: Unable to "GROUP BY" udf result

Posted by "jiangzhx (via GitHub)" <gi...@apache.org>.
jiangzhx commented on issue #5635:
URL: https://github.com/apache/arrow-datafusion/issues/5635#issuecomment-1475089605

   > The `to_date` only returns 1 value per input array, is that expected?
   cc @BubbaJoe @doki23 
   
   The current error message is confusing; maybe we should add some checks here.
   
   Maybe we can check whether the lengths of `group_values` and `batch_hashes` are the same, and if not, return a clear error message.
   
   https://github.com/apache/arrow-datafusion/blob/3ccf1aebb6959fbc6bbbf74d2821522ddfd7d484/datafusion/core/src/physical_plan/aggregates/row_hash.rs#L330-L335
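   
   A minimal sketch of the kind of check being proposed (illustrative only; the function name and placement are hypothetical, not DataFusion's actual row_hash.rs code):
   
   ```
   use datafusion::arrow::array::Array;
   use datafusion::error::DataFusionError;
   
   // If a scalar UDF returned a different number of rows than it was given,
   // fail with a clear error instead of panicking later in the aggregation.
   fn check_udf_output_len(
       udf_name: &str,
       input_rows: usize,
       output: &dyn Array,
   ) -> Result<(), DataFusionError> {
       if output.len() != input_rows {
           return Err(DataFusionError::Execution(format!(
               "scalar UDF '{}' returned {} rows for {} input rows; \
                a scalar function must return exactly one value per input row",
               udf_name,
               output.len(),
               input_rows
           )));
       }
       Ok(())
   }
   ```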
   




[GitHub] [arrow-datafusion] alamb commented on issue #5635: Unable to "GROUP BY" udf result

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #5635:
URL: https://github.com/apache/arrow-datafusion/issues/5635#issuecomment-1476059872

   > Error is the same as the above error log:
   
   @BubbaJoe would it be possible to share a self-contained reproducer?




[GitHub] [arrow-datafusion] alamb commented on issue #5635: Unable to "GROUP BY" udf result

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #5635:
URL: https://github.com/apache/arrow-datafusion/issues/5635#issuecomment-1483817127

   > I would like to test this! When is the next release?
   
   It is currently out for a vote and should be released early next week: https://lists.apache.org/thread/fnnmrn83spnb2y2l2vdw2v1hd54pfjl7




[GitHub] [arrow-datafusion] BubbaJoe commented on issue #5635: Unable to "GROUP BY" udf result

Posted by "BubbaJoe (via GitHub)" <gi...@apache.org>.
BubbaJoe commented on issue #5635:
URL: https://github.com/apache/arrow-datafusion/issues/5635#issuecomment-1484692820

   Thanks @jiangzhx, it's working!




[GitHub] [arrow-datafusion] BubbaJoe commented on issue #5635: Unable to "GROUP BY" udf result

Posted by "BubbaJoe (via GitHub)" <gi...@apache.org>.
BubbaJoe commented on issue #5635:
URL: https://github.com/apache/arrow-datafusion/issues/5635#issuecomment-1474842469

   
   I updated from 15.0 to 20.0 and something really weird happened. When I run this:
   `SELECT COUNT(*), T, SUM(COUNT(*)) Over (Order By T) From (SELECT *, to_date(\"updatedAt\") as T from 'table') Group BY T`
   I get the error below; this query was working fine on version 15.0.
   
   I modified my UDF to process multiple values:
   ```
   pub fn to_date(args: &[ArrayRef]) -> DFResult<ArrayRef> {
       if args.is_empty() || args.len() > 1 {
           return Err(DataFusionError::Internal(format!(
               "to_date was called with {} arguments. It requires only 1.",
               args.len()
           )));
       }
       let arg = &args[0].as_any().downcast_ref::<StringArray>().unwrap();
       let mut builder = Date32Builder::new();
       for date_string in arg.iter() {
           let date_time = match DateTime::parse_from_rfc3339(date_string.expect("date_string is null")) {
               Ok(dt) => dt,
               Err(e) => {
                   return Result::Err(DataFusionError::Internal(e.to_string()));
               }
           };
           builder.append_value((date_time.timestamp() / 86400) as i32);
       }
       Ok(Arc::new(builder.finish()))
   }
   ```
   
   Error log:
   ```
   thread 'tokio-runtime-worker' panicked at 'Trying to access an element at index 17 from a PrimitiveArray of length 17', /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/arrow-array-34.0.0/src/array/primitive_array.rs:337:9
   stack backtrace:
      0: rust_begin_unwind
                at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/std/src/panicking.rs:575:5
      1: core::panicking::panic_fmt
                at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/core/src/panicking.rs:64:14
      2: arrow_array::array::primitive_array::PrimitiveArray<T>::value
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/arrow-array-34.0.0/src/array/primitive_array.rs:337:9
      3: <&arrow_array::array::primitive_array::PrimitiveArray<T> as arrow_array::array::ArrayAccessor>::value
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/arrow-array-34.0.0/src/array/primitive_array.rs:711:9
      4: <&arrow_array::array::primitive_array::PrimitiveArray<arrow_array::types::Date32Type> as arrow_cast::display::DisplayIndexState>::write
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/arrow-cast-34.0.0/src/display.rs:505:29
      5: <arrow_cast::display::ArrayFormat<F> as arrow_cast::display::DisplayIndex>::write
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/arrow-cast-34.0.0/src/display.rs:361:9
      6: <arrow_cast::display::ValueFormatter as core::fmt::Display>::fmt
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/arrow-cast-34.0.0/src/display.rs:162:15
      7: <T as alloc::string::ToString>::to_string
                at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/alloc/src/string.rs:2536:9
      8: arrow_json::writer::set_column_for_json_rows::{{closure}}
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/arrow-json-34.0.0/src/writer.rs:309:25
      9: core::iter::traits::iterator::Iterator::for_each::call::{{closure}}
                at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/core/src/iter/traits/iterator.rs:828:29
     10: <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold::enumerate::{{closure}}
                at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/core/src/iter/adapters/enumerate.rs:106:27
     11: core::iter::traits::iterator::Iterator::fold
                at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/core/src/iter/traits/iterator.rs:2414:21
     12: <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold
                at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/core/src/iter/adapters/enumerate.rs:112:9
     13: core::iter::traits::iterator::Iterator::for_each
                at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/core/src/iter/traits/iterator.rs:831:9
     14: arrow_json::writer::set_column_for_json_rows
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/arrow-json-34.0.0/src/writer.rs:305:13
     15: arrow_json::writer::record_batches_to_json_rows
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/arrow-json-34.0.0/src/writer.rs:422:17
     16: dataqueen_back::routes::query::QueryResponse::new::{{closure}}
                at ./src/routes/query.rs:49:19
     17: dataqueen_back::routes::query::execute_query::{{closure}}
                at ./src/routes/query.rs:22:9
     18: <F as axum::handler::Handler<(M,T1,T2),S,B>>::call::{{closure}}
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/axum-0.6.0-rc.4/src/handler/mod.rs:209:52
     19: <core::pin::Pin<P> as core::future::future::Future>::poll
                at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/core/src/future/future.rs:124:9
     20: <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-util-0.3.25/src/future/future/map.rs:55:37
     21: <futures_util::future::future::Map<Fut,F> as core::future::future::Future>::poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-util-0.3.25/src/lib.rs:91:13
     22: <axum::handler::future::IntoServiceFuture<F> as core::future::future::Future>::poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/axum-0.6.0-rc.4/src/macros.rs:42:17
     23: <F as futures_core::future::TryFuture>::try_poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-core-0.3.25/src/future.rs:82:9
     24: <futures_util::future::try_future::into_future::IntoFuture<Fut> as core::future::future::Future>::poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-util-0.3.25/src/future/try_future/into_future.rs:34:9
     25: <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-util-0.3.25/src/future/future/map.rs:55:37
     26: <futures_util::future::future::Map<Fut,F> as core::future::future::Future>::poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-util-0.3.25/src/lib.rs:91:13
     27: <futures_util::future::try_future::MapOk<Fut,F> as core::future::future::Future>::poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-util-0.3.25/src/lib.rs:91:13
     28: <tower::util::map_response::MapResponseFuture<F,N> as core::future::future::Future>::poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/tower-0.4.13/src/macros.rs:38:17
     29: <core::pin::Pin<P> as core::future::future::Future>::poll
                at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/core/src/future/future.rs:124:9
     30: <tower::util::oneshot::Oneshot<S,Req> as core::future::future::Future>::poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/tower-0.4.13/src/util/oneshot.rs:97:38
     31: <axum::routing::route::RouteFuture<B,E> as core::future::future::Future>::poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/axum-0.6.0-rc.4/src/routing/route.rs:144:61
     32: <tower_http::cors::ResponseFuture<F> as core::future::future::Future>::poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/tower-http-0.3.4/src/cors/mod.rs:662:56
     33: <F as futures_core::future::TryFuture>::try_poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-core-0.3.25/src/future.rs:82:9
     34: <futures_util::future::try_future::into_future::IntoFuture<Fut> as core::future::future::Future>::poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-util-0.3.25/src/future/try_future/into_future.rs:34:9
     35: <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-util-0.3.25/src/future/future/map.rs:55:37
     36: <futures_util::future::future::Map<Fut,F> as core::future::future::Future>::poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-util-0.3.25/src/lib.rs:91:13
     37: <futures_util::future::try_future::MapOk<Fut,F> as core::future::future::Future>::poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-util-0.3.25/src/lib.rs:91:13
     38: <tower::util::map_response::MapResponseFuture<F,N> as core::future::future::Future>::poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/tower-0.4.13/src/macros.rs:38:17
     39: <F as futures_core::future::TryFuture>::try_poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-core-0.3.25/src/future.rs:82:9
     40: <futures_util::future::try_future::into_future::IntoFuture<Fut> as core::future::future::Future>::poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-util-0.3.25/src/future/try_future/into_future.rs:34:9
     41: <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-util-0.3.25/src/future/future/map.rs:55:37
     42: <futures_util::future::future::Map<Fut,F> as core::future::future::Future>::poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-util-0.3.25/src/lib.rs:91:13
     43: <futures_util::future::try_future::MapErr<Fut,F> as core::future::future::Future>::poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-util-0.3.25/src/lib.rs:91:13
     44: <tower::util::map_err::MapErrFuture<F,N> as core::future::future::Future>::poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/tower-0.4.13/src/macros.rs:38:17
     45: <F as futures_core::future::TryFuture>::try_poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-core-0.3.25/src/future.rs:82:9
     46: <futures_util::future::try_future::into_future::IntoFuture<Fut> as core::future::future::Future>::poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-util-0.3.25/src/future/try_future/into_future.rs:34:9
     47: <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-util-0.3.25/src/future/future/map.rs:55:37
     48: <futures_util::future::future::Map<Fut,F> as core::future::future::Future>::poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-util-0.3.25/src/lib.rs:91:13
     49: <futures_util::future::try_future::MapOk<Fut,F> as core::future::future::Future>::poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-util-0.3.25/src/lib.rs:91:13
     50: <tower::util::map_response::MapResponseFuture<F,N> as core::future::future::Future>::poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/tower-0.4.13/src/macros.rs:38:17
     51: <F as futures_core::future::TryFuture>::try_poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-core-0.3.25/src/future.rs:82:9
     52: <futures_util::future::try_future::into_future::IntoFuture<Fut> as core::future::future::Future>::poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-util-0.3.25/src/future/try_future/into_future.rs:34:9
     53: <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-util-0.3.25/src/future/future/map.rs:55:37
     54: <futures_util::future::future::Map<Fut,F> as core::future::future::Future>::poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-util-0.3.25/src/lib.rs:91:13
     55: <futures_util::future::try_future::MapOk<Fut,F> as core::future::future::Future>::poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-util-0.3.25/src/lib.rs:91:13
     56: <tower::util::map_response::MapResponseFuture<F,N> as core::future::future::Future>::poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/tower-0.4.13/src/macros.rs:38:17
     57: <core::pin::Pin<P> as core::future::future::Future>::poll
                at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/core/src/future/future.rs:124:9
     58: <tower::util::oneshot::Oneshot<S,Req> as core::future::future::Future>::poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/tower-0.4.13/src/util/oneshot.rs:97:38
     59: <axum::routing::route::RouteFuture<B,E> as core::future::future::Future>::poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/axum-0.6.0-rc.4/src/routing/route.rs:144:61
     60: <F as futures_core::future::TryFuture>::try_poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-core-0.3.25/src/future.rs:82:9
     61: <futures_util::future::try_future::into_future::IntoFuture<Fut> as core::future::future::Future>::poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-util-0.3.25/src/future/try_future/into_future.rs:34:9
     62: <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-util-0.3.25/src/future/future/map.rs:55:37
     63: <futures_util::future::future::Map<Fut,F> as core::future::future::Future>::poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-util-0.3.25/src/lib.rs:91:13
     64: <futures_util::future::try_future::MapOk<Fut,F> as core::future::future::Future>::poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-util-0.3.25/src/lib.rs:91:13
     65: <tower::util::map_response::MapResponseFuture<F,N> as core::future::future::Future>::poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/tower-0.4.13/src/macros.rs:38:17
     66: <core::pin::Pin<P> as core::future::future::Future>::poll
                at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/core/src/future/future.rs:124:9
     67: <tower::util::oneshot::Oneshot<S,Req> as core::future::future::Future>::poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/tower-0.4.13/src/util/oneshot.rs:97:38
     68: <axum::routing::route::RouteFuture<B,E> as core::future::future::Future>::poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/axum-0.6.0-rc.4/src/routing/route.rs:144:61
     69: <hyper::proto::h1::dispatch::Server<S,hyper::body::body::Body> as hyper::proto::h1::dispatch::Dispatch>::poll_msg
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.14.20/src/proto/h1/dispatch.rs:491:35
     70: hyper::proto::h1::dispatch::Dispatcher<D,Bs,I,T>::poll_write
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.14.20/src/proto/h1/dispatch.rs:297:43
     71: hyper::proto::h1::dispatch::Dispatcher<D,Bs,I,T>::poll_loop
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.14.20/src/proto/h1/dispatch.rs:161:21
     72: hyper::proto::h1::dispatch::Dispatcher<D,Bs,I,T>::poll_inner
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.14.20/src/proto/h1/dispatch.rs:137:16
     73: hyper::proto::h1::dispatch::Dispatcher<D,Bs,I,T>::poll_catch
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.14.20/src/proto/h1/dispatch.rs:120:28
     74: <hyper::proto::h1::dispatch::Dispatcher<D,Bs,I,T> as core::future::future::Future>::poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.14.20/src/proto/h1/dispatch.rs:424:9
     75: <hyper::server::conn::ProtoServer<T,B,S,E> as core::future::future::Future>::poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.14.20/src/server/conn.rs:952:47
     76: <hyper::server::conn::upgrades::UpgradeableConnection<I,S,E> as core::future::future::Future>::poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.14.20/src/server/conn.rs:1012:30
     77: <hyper::server::server::new_svc::NewSvcTask<I,N,S,E,W> as core::future::future::Future>::poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.14.20/src/server/server.rs:728:36
     78: tokio::runtime::task::core::Core<T,S>::poll::{{closure}}
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.26.0/src/runtime/task/core.rs:223:17
     79: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.26.0/src/loom/std/unsafe_cell.rs:14:9
     80: tokio::runtime::task::core::Core<T,S>::poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.26.0/src/runtime/task/core.rs:212:13
     81: tokio::runtime::task::harness::poll_future::{{closure}}
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.26.0/src/runtime/task/harness.rs:476:19
     82: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
                at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/core/src/panic/unwind_safe.rs:271:9
     83: std::panicking::try::do_call
                at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/std/src/panicking.rs:483:40
     84: ___rust_try
     85: std::panicking::try
                at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/std/src/panicking.rs:447:19
     86: std::panic::catch_unwind
                at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/std/src/panic.rs:137:14
     87: tokio::runtime::task::harness::poll_future
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.26.0/src/runtime/task/harness.rs:464:18
     88: tokio::runtime::task::harness::Harness<T,S>::poll_inner
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.26.0/src/runtime/task/harness.rs:198:27
     89: tokio::runtime::task::harness::Harness<T,S>::poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.26.0/src/runtime/task/harness.rs:152:15
     90: tokio::runtime::task::raw::poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.26.0/src/runtime/task/raw.rs:255:5
     91: tokio::runtime::task::raw::RawTask::poll
                at /Users/joe/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.26.0/src/runtime/task/raw.rs:200:18
   note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
   
   ```




[GitHub] [arrow-datafusion] doki23 commented on issue #5635: Unable to "GROUP BY" udf result

Posted by "doki23 (via GitHub)" <gi...@apache.org>.
doki23 commented on issue #5635:
URL: https://github.com/apache/arrow-datafusion/issues/5635#issuecomment-1475086400

   The `to_date` only returns 1 value per input array, is that expected?




[GitHub] [arrow-datafusion] jiangzhx commented on issue #5635: Unable to "GROUP BY" udf result

Posted by "jiangzhx (via GitHub)" <gi...@apache.org>.
jiangzhx commented on issue #5635:
URL: https://github.com/apache/arrow-datafusion/issues/5635#issuecomment-1477285559

    ```
       // let sql = "SELECT to_date(a) as u from t group by u";        //got error
       // let sql = "SELECT to_timestamp(a) as u from t group by u";   //got error
   
       // let sql = "SELECT to_date(a) as u from t group by u order by u";        //got right
       // let sql = "SELECT to_timestamp(a) as u from t group by u order by u";   //got right
   ```
   The difference is appending `order by u`.
   




[GitHub] [arrow-datafusion] doki23 commented on issue #5635: Unable to "GROUP BY" udf result

Posted by "doki23 (via GitHub)" <gi...@apache.org>.
doki23 commented on issue #5635:
URL: https://github.com/apache/arrow-datafusion/issues/5635#issuecomment-1483808708

   > But I am curious about the batching format, why do the record batches look like this?
   
   It's because of 'target partitions', whose default value is determined by the number of CPU cores.
   Setting target partitions to 1 will make the final `batches` vector contain only 1 batch, @BubbaJoe.
   
   ```rust
       let mut session_cfg = SessionConfig::new();
       session_cfg = session_cfg.with_target_partitions(1);
       let ctx = SessionContext::with_config(session_cfg);
   ```
   
   




[GitHub] [arrow-datafusion] doki23 commented on issue #5635: Unable to "GROUP BY" udf result

Posted by "doki23 (via GitHub)" <gi...@apache.org>.
doki23 commented on issue #5635:
URL: https://github.com/apache/arrow-datafusion/issues/5635#issuecomment-1481347909

   It's because `to_date` is not an aggregate function. I notice that the size of `batches` is 10 (9 nulls + '2012-04-23'); that's the reason.




[GitHub] [arrow-datafusion] BubbaJoe commented on issue #5635: Unable to "GROUP BY" udf result

Posted by "BubbaJoe (via GitHub)" <gi...@apache.org>.
BubbaJoe commented on issue #5635:
URL: https://github.com/apache/arrow-datafusion/issues/5635#issuecomment-1481452392

   Hmm, this doesn't seem very intuitive. Why would a to_date() function need to be an aggregate function? Seems more like a bug tbh.




[GitHub] [arrow-datafusion] BubbaJoe commented on issue #5635: Unable to "GROUP BY" udf result

Posted by "BubbaJoe (via GitHub)" <gi...@apache.org>.
BubbaJoe commented on issue #5635:
URL: https://github.com/apache/arrow-datafusion/issues/5635#issuecomment-1482106211

   @doki23 Actually, I just tried using `DISTINCT` and that also doesn't work. Were you not able to reproduce it on your side?

