Posted to github@arrow.apache.org by "mustafasrepo (via GitHub)" <gi...@apache.org> on 2023/05/31 10:21:35 UTC

[GitHub] [arrow-datafusion] mustafasrepo opened a new issue, #6502: FIRST accumulator gives wrong result when fed with multiple batches

mustafasrepo opened a new issue, #6502:
URL: https://github.com/apache/arrow-datafusion/issues/6502

   ### Describe the bug
   
   The FIRST accumulator, as currently implemented, does not track whether its value has already been set. Hence, when fed multiple batches, it overwrites the value on every batch and returns the first value of the last batch.
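   
   To illustrate the kind of state tracking described above, here is a minimal sketch. It is not DataFusion's `Accumulator` trait or its actual fix: `FirstValue`, `is_set`, `update_batch`, `evaluate`, and `first_across_batches` are hypothetical names, and a batch is modeled as a plain slice of nullable `i64` values.
   
   ```rust
   /// Hypothetical, simplified stand-in for a FIRST_VALUE accumulator.
   /// Illustrative only; this is not DataFusion's real `Accumulator` API.
   #[derive(Debug, Default)]
   struct FirstValue {
       /// The captured first value; `None` models SQL NULL.
       value: Option<i64>,
       /// Whether a value was already captured. Kept separate from `value`
       /// because the first row may itself be NULL.
       is_set: bool,
   }
   
   impl FirstValue {
       /// Consume one batch. Only the first non-empty batch may set the value;
       /// later batches are ignored once `is_set` is true.
       fn update_batch(&mut self, batch: &[Option<i64>]) {
           if !self.is_set {
               if let Some(first) = batch.first() {
                   self.value = *first;
                   self.is_set = true;
               }
           }
       }
   
       /// Return the captured value (NULL if nothing was ever seen).
       fn evaluate(&self) -> Option<i64> {
           self.value
       }
   }
   
   /// Feed every batch through a single accumulator and return its result.
   fn first_across_batches(batches: &[Vec<Option<i64>>]) -> Option<i64> {
       let mut acc = FirstValue::default();
       for batch in batches {
           acc.update_batch(batch);
       }
       acc.evaluate()
   }
   ```
   
   With the two batches `[1, 2, 3, 4]` and `[5, 6, 7, 8]` from the reproduction, `first_across_batches` yields `Some(1)`. Dropping the `is_set` check makes every batch overwrite the value, which reproduces the reported result of 5.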
   
   ### To Reproduce
   
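   The test below reproduces the problem:
   ```rust
   // Imports assume the public `datafusion` crate re-exports; adjust the paths
   // when running inside the DataFusion repository itself.
   use std::sync::Arc;

   use datafusion::arrow::array::{ArrayRef, Int64Array};
   use datafusion::arrow::datatypes::{DataType, Field, Schema};
   use datafusion::arrow::record_batch::RecordBatch;
   use datafusion::arrow::util::pretty::print_batches;
   use datafusion::datasource::MemTable;
   use datafusion::error::Result;
   use datafusion::physical_plan::collect;
   use datafusion::prelude::{SessionConfig, SessionContext};

   #[tokio::test]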
   async fn test_first_value_multi_partition() -> Result<()> {
       let config = SessionConfig::new()
           .with_target_partitions(1);
       let ctx = SessionContext::with_config(config);
       let fields = vec![Field::new("a", DataType::Int64, false)];
       let schema = Arc::new(Schema::new(fields));
       let batch1 = RecordBatch::try_new(schema.clone(), vec![Arc::new(Int64Array::from(vec![1,2,3,4])) as ArrayRef])?;
       let batch2 = RecordBatch::try_new(schema.clone(), vec![Arc::new(Int64Array::from(vec![5,6,7,8])) as ArrayRef])?;
       let partitions = vec![vec![batch1, batch2]];
       let mem_table = MemTable::try_new(schema, partitions)?;
       ctx.register_table("table1", Arc::new(mem_table))?;
   
       let sql = "SELECT FIRST_VALUE(a)
                       FROM table1";
   
       let msg = format!("Creating logical plan for '{sql}'");
       let dataframe = ctx.sql(sql).await.expect(&msg);
       let physical_plan = dataframe.create_physical_plan().await?;
       let batches = collect(physical_plan, ctx.task_ctx()).await?;
       print_batches(&batches)?;
       Ok(())
   }
   ```
   The result should be 1, whereas it currently returns 5.
   
   ### Expected behavior
   
   The test above should return 1.
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] mustafasrepo closed issue #6502: FIRST accumulator gives wrong result when fed with multiple batches

Posted by "mustafasrepo (via GitHub)" <gi...@apache.org>.
mustafasrepo closed issue #6502: FIRST accumulator gives wrong result when fed with multiple batches
URL: https://github.com/apache/arrow-datafusion/issues/6502

