Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/10/07 00:02:37 UTC

[GitHub] [arrow-datafusion] isidentical commented on issue #3747: DataFusionError(Internal("The size of the sorted batch is larger than the size of the input batch: 2120 > 2312"))

isidentical commented on issue #3747:
URL: https://github.com/apache/arrow-datafusion/issues/3747#issuecomment-1270856556

   I think I am able to trigger it using the query below:
   ```sql
   CREATE EXTERNAL TABLE decimal_simple (
               c1  DECIMAL(10,6) NOT NULL,
               c2  DOUBLE NOT NULL,
               c3  BIGINT NOT NULL,
               c4  BOOLEAN NOT NULL,
               c5  DECIMAL(12,7) NOT NULL
               )
               STORED AS CSV
               WITH HEADER ROW
               LOCATION '../datafusion/core/tests/decimal_data.csv';
   
   select * from decimal_simple where c1 >= 0.00004 order by c1 limit 100;
   ```
   
   But I'm still not sure about the underlying issue.
   
   The reported size of the arrays changes after sorting even though their contents stay the same (so the underlying buffers probably change too). Maybe the assumption that sorting never makes a batch larger doesn't hold when we recreate the arrays (and/or the new ordering actually changes how the buffers are built). If so, instead of failing with an assertion-style error, we might need to account for this case and only take the shrinking branch when we are sure that `new_size` is less than `size`.
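   A minimal sketch of the guard suggested above, assuming a simplified accounting model: `Reservation` and `account_sorted_batch` are hypothetical stand-ins, not DataFusion's actual types or functions. It only releases memory when the sorted batch is strictly smaller than the input, and otherwise leaves the reservation untouched instead of returning an error.

   ```rust
   /// Hypothetical stand-in for the memory tracked by the sort operator.
   pub struct Reservation {
       pub bytes: usize,
   }

   impl Reservation {
       /// Release `n` bytes, never going below zero.
       pub fn shrink(&mut self, n: usize) {
           self.bytes = self.bytes.saturating_sub(n);
       }
   }

   /// Adjust the reservation after sorting, without assuming `new_size <= size`.
   /// `size` is the input batch size and `new_size` the sorted batch size.
   pub fn account_sorted_batch(res: &mut Reservation, size: usize, new_size: usize) {
       if new_size < size {
           // Sorted batch is smaller: give back the difference.
           res.shrink(size - new_size);
       }
       // Otherwise (equal or larger): keep the reservation as-is instead of
       // treating the mismatch as an internal error.
   }
   ```

   Growing the reservation when `new_size > size` would keep the accounting more accurate; that is a design choice beyond what this comment proposes.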

