Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/10/07 12:15:50 UTC

[GitHub] [arrow-datafusion] Dandandan commented on issue #3747: DataFusionError(Internal("The size of the sorted batch is larger than the size of the input batch: 2120 > 2312"))

Dandandan commented on issue #3747:
URL: https://github.com/apache/arrow-datafusion/issues/3747#issuecomment-1271514648

   I think the code should update the size.
   
   > I think I am able to trigger it using the query below:
   > 
   > ```sql
   > CREATE EXTERNAL TABLE decimal_simple (
   >             c1  DECIMAL(10,6) NOT NULL,
   >             c2  DOUBLE NOT NULL,
   >             c3  BIGINT NOT NULL,
   >             c4  BOOLEAN NOT NULL,
   >             c5  DECIMAL(12,7) NOT NULL
   >             )
   >             STORED AS CSV
   >             WITH HEADER ROW
   >             LOCATION '../datafusion/core/tests/decimal_data.csv';
   > 
   > select * from decimal_simple where c1 >= 0.00004 order by c1 limit 100;
   > ```
   > 
   > But still not sure about the underlying issue.
   > 
   > The reported size of the input array changes even though the contents stay the same (so the buffer sizes probably change too). The assumption that the sorted batch is never larger than the input may not hold when we recreate the arrays (and/or the ordering may change how the buffers are built). So instead of triggering an assertion, we might need to account for this case and only take this branch when we are sure that `new_size` is less than `size`.
   
   It does seem we should handle the case of a sorted batch being somehow bigger than the input batch, and open an issue in arrow-rs to tackle the size discrepancy there.
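
A minimal sketch of what that handling might look like, assuming the operator keeps a running byte count for the batches it holds. `TrackedMemory` and `replace_batch` are hypothetical names for illustration and not DataFusion's actual API; the point is only that the accounting shrinks when `new_size` is less than `size` and grows otherwise, rather than returning an internal error.

```rust
/// Hypothetical stand-in for the memory accounting kept by a sort operator.
/// This is NOT DataFusion's real type; it only illustrates the idea.
#[derive(Debug, Default)]
struct TrackedMemory {
    /// Bytes currently accounted for.
    used: usize,
}

impl TrackedMemory {
    /// Swap the accounting of an input batch (`size` bytes) for its sorted
    /// copy (`new_size` bytes) without assuming the copy is smaller.
    fn replace_batch(&mut self, size: usize, new_size: usize) {
        if new_size < size {
            // Sorted copy is smaller: release the difference.
            self.used = self.used.saturating_sub(size - new_size);
        } else {
            // Sorted copy is the same size or larger: account for the growth
            // instead of treating it as an internal error.
            self.used = self.used.saturating_add(new_size - size);
        }
    }
}
```

For example, calling `replace_batch(2120, 2312)` on a tracker that holds 2120 bytes leaves it at 2312 bytes instead of failing. A real implementation would presumably also have to request the extra bytes from the memory manager, which this sketch leaves out.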

