You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "aprimadi (via GitHub)" <gi...@apache.org> on 2023/04/21 14:03:40 UTC

[GitHub] [arrow-datafusion] aprimadi opened a new pull request, #6088: refactor(sqllogictests): wip refactor group by test to sqllogic

aprimadi opened a new pull request, #6088:
URL: https://github.com/apache/arrow-datafusion/pull/6088

   # Which issue does this PR close?
   
   Port group by tests from group_by.rs.
   
   Work towards #4495 
   
   # Rationale for this change
   
   N/A
   
   # What changes are included in this PR?
   
   N/A
   
   # Are these changes tested?
   
   Yes
   
   # Are there any user-facing changes?
   
   No
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #6088: refactor(sqllogictests): port group by test to sqllogic

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #6088:
URL: https://github.com/apache/arrow-datafusion/pull/6088#discussion_r1174398296


##########
datafusion/core/tests/sqllogictests/src/setup.rs:
##########
@@ -39,6 +41,95 @@ pub async fn register_aggregate_tables(ctx: &SessionContext) {
     register_test_data(ctx);
 }
 
+pub async fn register_group_tables(ctx: &SessionContext) {
+    register_time_test_tables(ctx);
+}
+
+/// Create and populate time32_s, time32_ms, time64_us, time64_ns tables
+fn register_time_test_tables(ctx: &SessionContext) {

Review Comment:
   I don't think you need rust code to do this setup anymore
   
   You can create arbitrary Arrow types that have no corresponding SQL types with the `arrow_cast` function -- https://arrow.apache.org/datafusion/user-guide/sql/data_types.html
   
   For example, here is one way to do it:
   https://github.com/apache/arrow-datafusion/blob/7759d96e892e7c159136af35ebc0def49ed3b3b1/datafusion/core/tests/sqllogictests/test_files/timestamps.slt#L113-L129
   
   So you can make a value of `Time32(Seconds)` with something like
   
   ```sql
   ❯ select arrow_cast(5000::int, 'Time32(Second)');
   +-------------+
   | Int64(5000) |
   +-------------+
   | 01:23:20    |
   +-------------+
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] aprimadi commented on pull request #6088: refactor(sqllogictests): port group by test to sqllogic

Posted by "aprimadi (via GitHub)" <gi...@apache.org>.
aprimadi commented on PR #6088:
URL: https://github.com/apache/arrow-datafusion/pull/6088#issuecomment-1519657290

   Btw,, those are the only two tests from group_by.rs not being moved to sqllogic.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] jiangzhx commented on pull request #6088: refactor(sqllogictests): port group by test to sqllogic

Posted by "jiangzhx (via GitHub)" <gi...@apache.org>.
jiangzhx commented on PR #6088:
URL: https://github.com/apache/arrow-datafusion/pull/6088#issuecomment-1519601265

   > 
   
   My idea is mainly based on these two test cases and what they cover.
   
   1. [group_by_date_trunc](https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/tests/sql/group_by.rs#L648) based on the method name, is to test date_trunc. I think we can use an existing test data and move to sqllogictests
   3. [group_by_dictionary](https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/tests/sql/group_by.rs#L745) based on the code, is testing DictionaryArray. In my opinion, it should not be migrated to sqllogictest.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] aprimadi commented on pull request #6088: refactor(sqllogictests): port group by test to sqllogic

Posted by "aprimadi (via GitHub)" <gi...@apache.org>.
aprimadi commented on PR #6088:
URL: https://github.com/apache/arrow-datafusion/pull/6088#issuecomment-1519655729

   I think both `group_by_date_trunc` and `group_by_dictionary` test should not be moved.
   
   For `group_by_date_trunc` the data is procedurally generated and it also generates multiple partition file.
   
   For `group_by_dictionary`, I totally agree with you @jiangzhx
   
   IMO, the data setup should be as close to the code that test it except if it's being used multiple times.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] aprimadi commented on pull request #6088: refactor(sqllogictests): port group by test to sqllogic

Posted by "aprimadi (via GitHub)" <gi...@apache.org>.
aprimadi commented on PR #6088:
URL: https://github.com/apache/arrow-datafusion/pull/6088#issuecomment-1518939691

   @alamb not sure how to port [group_by_date_trunc](https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/tests/sql/group_by.rs#L648) and [group_by_dictionary](https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/tests/sql/group_by.rs#L745) without writing setup code. Do we want to migrate those also?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb merged pull request #6088: refactor(sqllogictests): port group by test to sqllogic

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb merged PR #6088:
URL: https://github.com/apache/arrow-datafusion/pull/6088


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] aprimadi commented on a diff in pull request #6088: refactor(sqllogictests): port group by test to sqllogic

Posted by "aprimadi (via GitHub)" <gi...@apache.org>.
aprimadi commented on code in PR #6088:
URL: https://github.com/apache/arrow-datafusion/pull/6088#discussion_r1174400607


##########
datafusion/core/tests/sqllogictests/src/setup.rs:
##########
@@ -39,6 +41,95 @@ pub async fn register_aggregate_tables(ctx: &SessionContext) {
     register_test_data(ctx);
 }
 
+pub async fn register_group_tables(ctx: &SessionContext) {
+    register_time_test_tables(ctx);
+}
+
+/// Create and populate time32_s, time32_ms, time64_us, time64_ns tables
+fn register_time_test_tables(ctx: &SessionContext) {

Review Comment:
   Alright, thank you for the feedback @alamb. Didn't know that. I'm done for the day, will probably work more on this tomorrow if nothing comes up.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #6088: refactor(sqllogictests): port group by test to sqllogic

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #6088:
URL: https://github.com/apache/arrow-datafusion/pull/6088#issuecomment-1518260234

   Thank you @aprimadi  -- we really appreciate the help


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #6088: refactor(sqllogictests): port group by test to sqllogic

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #6088:
URL: https://github.com/apache/arrow-datafusion/pull/6088#issuecomment-1520372269

   I really appreciate the help @aprimadi  ❤️ 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] aprimadi commented on pull request #6088: refactor(sqllogictests): port group by test to sqllogic

Posted by "aprimadi (via GitHub)" <gi...@apache.org>.
aprimadi commented on PR #6088:
URL: https://github.com/apache/arrow-datafusion/pull/6088#issuecomment-1518539872

   Not sure if we should port [group_by_dictionary](https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/tests/sql/group_by.rs#L745) test case @alamb. 
   
   It seems that if we port it into sqllogic, the test would be longer. While doing it programmatically seems to be very concise.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] aprimadi commented on pull request #6088: refactor(sqllogictests): port group by test to sqllogic

Posted by "aprimadi (via GitHub)" <gi...@apache.org>.
aprimadi commented on PR #6088:
URL: https://github.com/apache/arrow-datafusion/pull/6088#issuecomment-1518549472

   This is probably enough for now, all the other tests use a one-time-use test data.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #6088: refactor(sqllogictests): port group by test to sqllogic

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #6088:
URL: https://github.com/apache/arrow-datafusion/pull/6088#issuecomment-1520371751

   
   
   > For group_by_date_trunc the data is procedurally generated and it also generates multiple partition file.
   
   > For group_by_dictionary, I totally agree with you @jiangzhx
   
   I agree -- btw I am hoping eventually to be able to create partitioned files using `COPY TO ...` https://github.com/apache/arrow-datafusion/issues/5654


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org