Posted to github@arrow.apache.org by "zhangxffff (via GitHub)" <gi...@apache.org> on 2023/11/23 14:55:59 UTC

[PR] Port tests in limit.rs to sqllogictest [arrow-datafusion]

zhangxffff opened a new pull request, #8315:
URL: https://github.com/apache/arrow-datafusion/pull/8315

   ## Which issue does this PR close?
   
   Closes #8204.
   Subtask of #6195.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Port tests in limit.rs to sqllogictest [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #8315:
URL: https://github.com/apache/arrow-datafusion/pull/8315#discussion_r1407841847


##########
datafusion/sqllogictest/test_files/limit.slt:
##########
@@ -379,6 +379,110 @@ SELECT COUNT(*) FROM (SELECT a FROM t1 WHERE a > 3 LIMIT 3 OFFSET 6);
 ----
 1
 
+# generate BIGINT data from 1 to 1000
+statement ok
+CREATE TABLE t1000 (i BIGINT) AS
+WITH t AS (VALUES (0), (0), (0), (0), (0), (0), (0), (0), (0), (0))
+SELECT ROW_NUMBER() OVER (PARTITION BY t1.column1) FROM t t1, t t2, t t3;

Review Comment:
   I took the liberty of updating some comments and adding an explain test in [26510c9](https://github.com/apache/arrow-datafusion/pull/8315/commits/26510c98247fee5bdcad8203b14e1c4f6ce29d54).





Re: [PR] Port tests in limit.rs to sqllogictest [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #8315:
URL: https://github.com/apache/arrow-datafusion/pull/8315#discussion_r1407830562


##########
datafusion/sqllogictest/test_files/limit.slt:
##########
@@ -379,6 +379,110 @@ SELECT COUNT(*) FROM (SELECT a FROM t1 WHERE a > 3 LIMIT 3 OFFSET 6);
 ----
 1
 
+# generate BIGINT data from 1 to 1000
+statement ok
+CREATE TABLE t1000 (i BIGINT) AS
+WITH t AS (VALUES (0), (0), (0), (0), (0), (0), (0), (0), (0), (0))
+SELECT ROW_NUMBER() OVER (PARTITION BY t1.column1) FROM t t1, t t2, t t3;

Review Comment:
   I double-checked that this actually generates partitioned input, and it does.
   
   You can see from my local setup that the MemoryExec has multiple partitions:
   
   ```
   ❯ explain select distinct i  from t1000;
   +---------------+-----------------------------------------------------------------------------------------------------+
   | plan_type     | plan                                                                                                |
   +---------------+-----------------------------------------------------------------------------------------------------+
   | logical_plan  | Aggregate: groupBy=[[t1000.i]], aggr=[[]]                                                           |
   |               |   TableScan: t1000 projection=[i]                                                                   |
   | physical_plan | AggregateExec: mode=FinalPartitioned, gby=[i@0 as i], aggr=[]                                       |
   |               |   CoalesceBatchesExec: target_batch_size=8192                                                       |
   |               |     RepartitionExec: partitioning=Hash([i@0], 16), input_partitions=16                              |
   |               |       AggregateExec: mode=Partial, gby=[i@0 as i], aggr=[]                                          |
   |               |         MemoryExec: partitions=16, partition_sizes=[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1] |
   |               |                                                                                                     |
   +---------------+-----------------------------------------------------------------------------------------------------+
   ```
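
As a rough illustration of why the quoted `CREATE TABLE t1000` statement yields the values 1 to 1000, here is a minimal Python sketch. It only emulates the row arithmetic: ten identical rows cross-joined three times, then a row number counted within each distinct `t1.column1` value. All variable names are illustrative assumptions, and the sketch does not model how DataFusion executes the query or how it splits rows across partitions.

```python
from itertools import product

# Ten identical rows, mirroring VALUES (0), (0), ... in the CTE `t`.
t = [0] * 10

# The comma-separated FROM list `t t1, t t2, t t3` is a cross join:
# 10 * 10 * 10 = 1000 combined rows.
joined = list(product(t, t, t))

# Emulate ROW_NUMBER() OVER (PARTITION BY t1.column1):
# keep one running counter per distinct partition key.
counters = {}
i_values = []
for t1_col, _t2_col, _t3_col in joined:
    counters[t1_col] = counters.get(t1_col, 0) + 1
    i_values.append(counters[t1_col])

print(len(i_values), min(i_values), max(i_values))  # prints: 1000 1 1000
```

Because every row carries the same key (0), the window has a single partition, so each of the 1000 rows gets a distinct number and `SELECT DISTINCT i` has 1000 groups to aggregate.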







Re: [PR] Port tests in limit.rs to sqllogictest [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #8315:
URL: https://github.com/apache/arrow-datafusion/pull/8315#issuecomment-1829941841

   Since I had it checked out anyway, I fixed the clippy issues in [f5015ad](https://github.com/apache/arrow-datafusion/pull/8315/commits/f5015adbdccbca2988273d60f1153083a72b2826) and pushed to this branch.




Re: [PR] Port tests in limit.rs to sqllogictest [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #8315:
URL: https://github.com/apache/arrow-datafusion/pull/8315#issuecomment-1829942415

   Thank you very much @zhangxffff -- this is a great initial contribution. I really appreciate it.




Re: [PR] Port tests in limit.rs to sqllogictest [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb merged PR #8315:
URL: https://github.com/apache/arrow-datafusion/pull/8315

