You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/07/13 11:56:04 UTC

[GitHub] [arrow-datafusion] alamb opened a new pull request, #6952: Minor: Improve aggregate test coverage more

alamb opened a new pull request, #6952:
URL: https://github.com/apache/arrow-datafusion/pull/6952

   # Which issue does this PR close?
   
   Follow on to https://github.com/apache/arrow-datafusion/pull/6939
   Related to https://github.com/apache/arrow-datafusion/issues/6889
   
   # Rationale for this change
   I almost merged code that had incorrect aggregate null handling and the tests didn't catch it
   
   This is my attempt to improve coverage
   
   # What changes are included in this PR?
   Add some more tests of aggregates (several types were missing coverage with `GROUP BY`). I'll comment inline
   
   # Are these changes tested?
   
   <!--
   We typically require tests for all PRs in order to:
   1. Prevent the code from being accidentally broken by subsequent changes
   2. Serve as another way to document the expected behavior of the code
   
   If tests are not included in your PR, please explain why (for example, are they covered by existing tests)?
   -->
   
   # Are there any user-facing changes?
   
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!--
   If there are any breaking changes to public APIs, please add the `api change` label.
   -->


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #6952: Minor: Improve aggregate test coverage more

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #6952:
URL: https://github.com/apache/arrow-datafusion/pull/6952#issuecomment-1641877915

   I think this is important coverage to add so I am merging it in


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #6952: Minor: Improve aggregate test coverage more

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #6952:
URL: https://github.com/apache/arrow-datafusion/pull/6952#issuecomment-1634397300

   And this found a bug https://github.com/apache/arrow-datafusion/issues/6955 🎉 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #6952: Minor: Improve aggregate test coverage more

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #6952:
URL: https://github.com/apache/arrow-datafusion/pull/6952#discussion_r1262453491


##########
datafusion/core/tests/sqllogictests/test_files/aggregate.slt:
##########
@@ -1420,65 +1420,95 @@ select var(sq.column1), var_pop(sq.column1), stddev(sq.column1), stddev_pop(sq.c
 2 1 1.414213562373 1
 
 
-# sum / count for all nulls
-statement ok
-create table the_nulls as values (null::bigint, 1), (null::bigint, 1), (null::bigint, 2);
 
-# counts should be zeros (even for nulls)
-query II
-SELECT count(column1), column2 from the_nulls group by column2 order by column2;
+# aggregates on empty tables
+statement ok
+CREATE TABLE empty (column1 bigint, column2 int);
+
+# no group by column

Review Comment:
   The null tests didn't originally have coverage for the no group by case so I added that and consolidated some tests



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb merged pull request #6952: Minor: Improve aggregate test coverage more

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb merged PR #6952:
URL: https://github.com/apache/arrow-datafusion/pull/6952


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #6952: Minor: Improve aggregate test coverage more

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #6952:
URL: https://github.com/apache/arrow-datafusion/pull/6952#discussion_r1262454044


##########
datafusion/core/tests/sqllogictests/test_files/aggregate.slt:
##########
@@ -1489,29 +1519,49 @@ create table bit_aggregate_functions (
   c1 SMALLINT NOT NULL,
   c2 SMALLINT NOT NULL,
   c3 SMALLINT,
+  tag varchar
 )
 as values
-  (5, 10, 11),
-  (33, 11, null),
-  (9, 12, null);
-
-# query_bit_and
-query III
-SELECT bit_and(c1), bit_and(c2), bit_and(c3) FROM bit_aggregate_functions
-----
-1 8 11
-
-# query_bit_or
-query III
-SELECT bit_or(c1), bit_or(c2), bit_or(c3) FROM bit_aggregate_functions
-----
-45 15 11
+  (5,  10, 11,   'A'),
+  (33, 11, null, 'B'),
+  (9,  12, null, 'A');
+
+# query_bit_and, query_bit_or, query_bit_xor

Review Comment:
   These also didn't have coverage for GROUP BY , so I consolidated the tests and added grouping coverage



##########
datafusion/core/tests/sqllogictests/test_files/aggregate.slt:
##########
@@ -1883,69 +1933,101 @@ CREATE TABLE test_table (c1 INT, c2 INT, c3 INT)
 
 # Inserting data
 statement ok
-INSERT INTO test_table VALUES (1, 10, 50), (1, 20, 60), (2, 10, 70), (2, 20, 80), (3, 10, NULL)
+INSERT INTO test_table VALUES
+  (1, 10, 50),
+  (1, 20, 60),
+  (2, 10, 70),
+  (2, 20, 80),
+  (3, 10, NULL)
 
 # query_group_by_with_filter
-query II rowsort
-SELECT c1, SUM(c2) FILTER (WHERE c2 >= 20) as result FROM test_table GROUP BY c1
-----
-1 20
-2 20
-3 NULL
+query III
+SELECT
+  c1,
+  SUM(c2) FILTER (WHERE c2 >= 20),
+  SUM(c2) FILTER (WHERE c2 < 1) -- no rows pass filter, so the output should be NULL

Review Comment:
   added some additional coverage with filtering



##########
datafusion/core/tests/sqllogictests/test_files/aggregate.slt:
##########
@@ -1883,69 +1933,101 @@ CREATE TABLE test_table (c1 INT, c2 INT, c3 INT)
 
 # Inserting data
 statement ok
-INSERT INTO test_table VALUES (1, 10, 50), (1, 20, 60), (2, 10, 70), (2, 20, 80), (3, 10, NULL)
+INSERT INTO test_table VALUES
+  (1, 10, 50),
+  (1, 20, 60),
+  (2, 10, 70),
+  (2, 20, 80),
+  (3, 10, NULL)
 
 # query_group_by_with_filter
-query II rowsort
-SELECT c1, SUM(c2) FILTER (WHERE c2 >= 20) as result FROM test_table GROUP BY c1
-----
-1 20
-2 20
-3 NULL
+query III
+SELECT
+  c1,
+  SUM(c2) FILTER (WHERE c2 >= 20),
+  SUM(c2) FILTER (WHERE c2 < 1) -- no rows pass filter, so the output should be NULL
+FROM test_table GROUP BY c1
+----
+1 20 NULL
+2 20 NULL
+3 NULL NULL
 
 # query_group_by_avg_with_filter
-query IR rowsort
-SELECT c1, AVG(c2) FILTER (WHERE c2 >= 20) AS avg_c2 FROM test_table GROUP BY c1
-----
-1 20
-2 20
-3 NULL
+query IRR rowsort
+SELECT
+  c1,
+  AVG(c2) FILTER (WHERE c2 >= 20),
+  AVG(c2) FILTER (WHERE c2 < 1)  -- no rows pass filter, so output should be null
+FROM test_table GROUP BY c1
+----
+1 20 NULL
+2 20 NULL
+3 NULL NULL
 
 # query_group_by_with_multiple_filters
 query IIR rowsort
-SELECT c1, SUM(c2) FILTER (WHERE c2 >= 20) AS sum_c2, AVG(c3) FILTER (WHERE c3 <= 70) AS avg_c3 FROM test_table GROUP BY c1

Review Comment:
   this was just some whitespace OCD



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org