You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "aprimadi (via GitHub)" <gi...@apache.org> on 2023/04/29 08:29:26 UTC

[GitHub] [arrow-datafusion] aprimadi opened a new pull request, #6158: Port test in select.rs to sqllogic

aprimadi opened a new pull request, #6158:
URL: https://github.com/apache/arrow-datafusion/pull/6158

   # Which issue does this PR close?
   
   More work toward #4495
   
   # Rationale for this change
   
   Cause I like easy task
   
   # What changes are included in this PR?
   
   Just porting some tests to sqllogic
   
   # Are these changes tested?
   
   Yes
   
   # Are there any user-facing changes?
   
   No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #6158: Port test in select.rs to sqllogic

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #6158:
URL: https://github.com/apache/arrow-datafusion/pull/6158#issuecomment-1529172082

   > No worries @alamb. This keeps me engaged while I'm learning through the more difficult bits of the project.
   
   It is much appreciated. While you have a newcomer's eyes, I would be interested in the parts you found the most difficult to pick up (really I am interested in how to make it less difficult). 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #6158: Port test in select.rs to sqllogic

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #6158:
URL: https://github.com/apache/arrow-datafusion/pull/6158#issuecomment-1530147582

   > K-way sort preserving merge because there is a clever bit of the code that uses an array representation of a binary tree (to implement losers tree).
   
   
   
   Yes, I agree this is quite a nice piece of code that is non trivial. I think the parts of DataFusion that are the most performance critical are also the parts that are the hardest to understand. It would be great to improve the understandability. 
   
   @aprimadi  I wonder if you might be willing to take a shot at adding some comments (perhaps copy/paste/modify your comments in https://github.com/apache/arrow-datafusion/pull/6158#issuecomment-1529252222) to help other future readers
   
   Also, I will try and be better about triaging tickets as they come in and marking the relevant ones as "good first issues" -- https://github.com/apache/arrow-datafusion/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22
   
   It is a little lacking now


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] aprimadi commented on pull request #6158: Port test in select.rs to sqllogic

Posted by "aprimadi (via GitHub)" <gi...@apache.org>.
aprimadi commented on PR #6158:
URL: https://github.com/apache/arrow-datafusion/pull/6158#issuecomment-1529074909

   No worries @alamb. This keeps me engaged while I'm learning through the more difficult bits of the project.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb merged pull request #6158: Port test in select.rs to sqllogic

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb merged PR #6158:
URL: https://github.com/apache/arrow-datafusion/pull/6158


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] aprimadi commented on pull request #6158: Port test in select.rs to sqllogic

Posted by "aprimadi (via GitHub)" <gi...@apache.org>.
aprimadi commented on PR #6158:
URL: https://github.com/apache/arrow-datafusion/pull/6158#issuecomment-1528953134

   Tests that cannot be ported (yet):
   - `query_get_indexed_field` - unsupported SQL types ARRAY
   - `query_nested_get_indexed_field` - unsupported SQL types ARRAY
   - `query_nested_get_indexed_field_on_struct` - unsupported SQL types ARRAY
   - `query_on_string_dictionary` - using DictionaryArray
   - `sort_on_window_null_string` - using DictionaryArray
   - `test_prepare_statement` - using partitioned csv
   - `parallel_query_with_filter` - using partitioned csv
   - `boolean_literal` - using partitioned csv
   - `unprojected_filter` - this test is using dataframe interface


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] aprimadi commented on pull request #6158: Port test in select.rs to sqllogic

Posted by "aprimadi (via GitHub)" <gi...@apache.org>.
aprimadi commented on PR #6158:
URL: https://github.com/apache/arrow-datafusion/pull/6158#issuecomment-1529252222

   The external sort come to mind and also the k-way sort preserving merge.
   
   K-way sort preserving merge because there is a clever bit of the code that uses an array representation of a binary tree (to implement losers tree).
   
   Like for example this code:
   https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/src/physical_plan/sorts/merge.rs#L252
   https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/src/physical_plan/sorts/merge.rs#L271
   These really find the leaf node of the loser tree that this `cursor index` belongs to but it's not immediately apparent without realizing that the binary tree is represented this way:
   
   ```
      0
      1 
    2   3
   4 5 6 7
   ```
   i.e. the root of the tree is an element at index 1 in the vector and the first child of the root is an element at index 2 in the vector and so on...
   
   I guess I'm just hesitant to work on larger tasks due to:
   1. Whether someone else is currently working on it
   2. Whether it's high priority. Don't want to block the issue for a prolonged period.
   3. If the issue touch areas that I'm currently not very familiar with. Again for the same reason as 2, don't want to block the issue for a prolonged period.
   
   Currently I prefer to work on issue that I can finish on one weekend and yeah currently #4495 is my default go to issue if I don't find issues that are interesting or that I'm sure I can finish on one weekend 😆.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #6158: Port test in select.rs to sqllogic

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #6158:
URL: https://github.com/apache/arrow-datafusion/pull/6158#discussion_r1181236816


##########
datafusion/core/tests/sqllogictests/test_files/select.slt:
##########
@@ -265,3 +296,394 @@ query B
 select column1 is not distinct from column2 from t;
 ----
 false
+
+
+# select all
+# these two queries should return the same result
+query R
+SELECT c1 FROM aggregate_simple order by c1
+----
+0.00001
+0.00002
+0.00002
+0.00003
+0.00003
+0.00003
+0.00004
+0.00004
+0.00004
+0.00004
+0.00005
+0.00005
+0.00005
+0.00005
+0.00005
+
+query R
+SELECT ALL c1 FROM aggregate_simple order by c1
+----
+0.00001
+0.00002
+0.00002
+0.00003
+0.00003
+0.00003
+0.00004
+0.00004
+0.00004
+0.00004
+0.00005
+0.00005
+0.00005
+0.00005
+0.00005
+
+# select distinct
+query RRB rowsort
+SELECT DISTINCT * FROM aggregate_simple
+----
+0.00001 0.000000000001 true
+0.00002 0.000000000002 false
+0.00003 0.000000000003 true
+0.00004 0.000000000004 false
+0.00005 0.000000000005 true
+
+# select distinct with projection and order by
+query R
+SELECT DISTINCT c1 FROM aggregate_simple order by c1
+----
+0.00001
+0.00002
+0.00003
+0.00004
+0.00005
+
+# select distinct with multi-columns projection and single-column order by
+query RR
+SELECT DISTINCT c1, c2 FROM aggregate_simple order by c1
+----
+0.00001 0.000000000001
+0.00002 0.000000000002
+0.00003 0.000000000003
+0.00004 0.000000000004
+0.00005 0.000000000005
+
+# select distinct boolean column
+query B
+SELECT DISTINCT c3 FROM aggregate_simple order by c3
+----
+false
+true
+
+# select distinct with addition expression
+query R rowsort
+SELECT DISTINCT c1 + c2 AS a FROM aggregate_simple
+----
+0.000010000001
+0.000020000001
+0.000030000002
+0.000040000003
+0.000050000004
+
+# select distinct from
+query BBBBBBBB
+select
+1 IS DISTINCT FROM CAST(NULL as INT) as a,
+1 IS DISTINCT FROM 1 as b,
+1 IS NOT DISTINCT FROM CAST(NULL as INT) as c,
+1 IS NOT DISTINCT FROM 1 as d,
+NULL IS DISTINCT FROM NULL as e,
+NULL IS NOT DISTINCT FROM NULL as f,
+NULL is DISTINCT FROM 1 as g,
+NULL is NOT DISTINCT FROM 1 as h
+----
+true false false true false true true false
+
+query BBBBBBBB
+select
+NULL IS DISTINCT FROM NULL as a,
+NULL IS NOT DISTINCT FROM NULL as b,
+NULL is DISTINCT FROM 1 as c,
+NULL is NOT DISTINCT FROM 1 as d,
+1 IS DISTINCT FROM CAST(NULL as INT) as e,
+1 IS DISTINCT FROM 1 as f,
+1 IS NOT DISTINCT FROM CAST(NULL as INT) as g,
+1 IS NOT DISTINCT FROM 1 as h
+----
+false true true false true false false true
+
+# select distinct from utf8
+query BBBB
+select
+'x' IS DISTINCT FROM NULL as a,
+'x' IS DISTINCT FROM 'x' as b,
+'x' IS NOT DISTINCT FROM NULL as c,
+'x' IS NOT DISTINCT FROM 'x' as d
+----
+true false false true
+
+# select between simple expression
+query B
+SELECT 1 NOT BETWEEN 3 AND 5
+----
+true
+
+
+statement ok
+create table select_between_data(c1 bigint) as values (1), (2), (3), (4);
+
+# select between complex expression
+
+query B
+SELECT abs(c1) BETWEEN 0 AND LoG(c1 * 100 ) FROM select_between_data ORDER BY c1
+----
+true
+true
+false
+false
+
+# explain select between

Review Comment:
   👍  for testing the whole plan



##########
datafusion/core/tests/sqllogictests/src/main.rs:
##########
@@ -168,7 +168,7 @@ async fn context_for_test_file(relative_path: &Path) -> SessionContext {
     let ctx = SessionContext::with_config(config);
 
     match relative_path.file_name().unwrap().to_str().unwrap() {
-        "aggregate.slt" | "select.slt" => {
+        "aggregate.slt" => {

Review Comment:
   ❤️ 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org