Posted to github@arrow.apache.org by "kazuyukitanimura (via GitHub)" <gi...@apache.org> on 2023/03/24 08:39:14 UTC

[GitHub] [arrow-datafusion] kazuyukitanimura opened a new issue, #5713: Make an end to end reproducer for zero column batch issues

kazuyukitanimura opened a new issue, #5713:
URL: https://github.com/apache/arrow-datafusion/issues/5713

   ### Is your feature request related to a problem or challenge?
   
   Based on the review https://github.com/apache/arrow-datafusion/pull/5709#pullrequestreview-1355510439, it would be ideal to make an end to end reproducer for zero column batch issues (either with SQL or with the DataFrame API). It would also be nice to ensure there aren't other issues with creating zero column batches.
   
   We have found two execs so far that return an `Err` with `ArrowError(InvalidArgumentError("must either specify a row count or at least one column"))` when a batch has no columns: https://github.com/apache/arrow-datafusion/issues/4911 and https://github.com/apache/arrow-datafusion/issues/5701
   
   I suspect there are other execs that may have the same issue.
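
To illustrate why this error can appear, here is a minimal toy model in Python. It is not arrow-rs or DataFusion code: `ToyBatch`, `make_batch`, `project`, and `limit` are hypothetical names invented for this sketch. The assumption it encodes is the one the error message states: with at least one column the row count can be inferred, but with zero columns it has to be passed explicitly, so any operator that rebuilds a batch without carrying the row count forward fails.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ToyBatch:
    """Toy stand-in for a record batch: equal-length columns plus a row count."""
    columns: tuple
    num_rows: int


def make_batch(columns: Sequence[Sequence], row_count: Optional[int] = None) -> ToyBatch:
    """Build a batch, inferring the row count from the first column when possible."""
    if columns:
        inferred = len(columns[0])
        if any(len(c) != inferred for c in columns):
            raise ValueError("all columns must have the same length")
        if row_count is not None and row_count != inferred:
            raise ValueError("row count does not match column length")
        return ToyBatch(tuple(tuple(c) for c in columns), inferred)
    if row_count is None:
        # With no columns there is nothing to infer the row count from.
        raise ValueError("must either specify a row count or at least one column")
    return ToyBatch((), row_count)


def project(batch: ToyBatch, indices: Sequence[int]) -> ToyBatch:
    """Keep only the selected columns, always carrying the row count forward."""
    kept = [batch.columns[i] for i in indices]
    return make_batch(kept, row_count=batch.num_rows)


def limit(batch: ToyBatch, n: int) -> ToyBatch:
    """Keep at most n rows; this also works when there are no columns."""
    rows = min(n, batch.num_rows)
    return make_batch([c[:rows] for c in batch.columns], row_count=rows)
```

In this sketch, `project(make_batch([[1, 3], [2, 4]]), [])` yields a batch with no columns and `num_rows == 2`, and applying `limit(..., 1)` to it yields `num_rows == 1`. Calling `make_batch([])` without a row count raises the error, which is the kind of failure an end to end reproducer would need to trigger in a real exec.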
   
   ### Describe the solution you'd like
   
   We need to find a good example that projects an empty-column relation (with a non-empty row count).
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] Make an end to end reproducer for zero column batch issues [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #5713:
URL: https://github.com/apache/arrow-datafusion/issues/5713#issuecomment-1784080358

   I think this is a good first issue for someone with some familiarity with SQL and who wants to learn about how the sqllogictests work -- see https://github.com/apache/arrow-datafusion/tree/main/datafusion/sqllogictest#readme




Re: [I] Make an end to end reproducer for zero column batch issues [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #5713:
URL: https://github.com/apache/arrow-datafusion/issues/5713#issuecomment-1784080134

   One classic list of things to test is Filter, Projection, Grouping, Join, Window, and Limit. I am not sure how far we should go with this.
   
   BTW, here is one potential "end to end" test for zero column batches and LIMIT (maybe someone could just put this into an slt file, following the model of @Jefffrey in https://github.com/apache/arrow-datafusion/pull/7945).
   
   Adding some simple variations with `GROUP BY` and `JOIN` could also help
   
   ```sql
   ❯ create table t as values (1, 2), (3, 4);
   0 rows in set. Query took 0.002 seconds.
   
   ❯ select * except (column1, column2) from t ;
   ++
   ++
   ++
   2 rows in set. Query took 0.001 seconds.
   
   ❯ select * except (column1, column2) from t limit 1;
   ++
   ++
   ++
   1 row in set. Query took 0.000 seconds.
   ```




Re: [I] Make an end to end reproducer for zero column batch issues [arrow-datafusion]

Posted by "Jefffrey (via GitHub)" <gi...@apache.org>.
Jefffrey commented on issue #5713:
URL: https://github.com/apache/arrow-datafusion/issues/5713#issuecomment-1783791564

   I think excluding columns is a valid way to get zero column batches with non-empty row count: https://github.com/apache/arrow-datafusion/issues/6510#issuecomment-1782524357
   
   https://github.com/apache/arrow-datafusion/pull/7945 added a test to sqllogictests which could be considered end to end; however, it is very simple, involving only a projection.
   
   Is there a list of things to test for zero-column batches? I see limit and projection are the only ones mentioned in the original issue text.

